linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
       [not found] <20220125193530.3272386-1-bob.beckett@collabora.com>
@ 2022-01-25 19:35 ` Robert Beckett
  2022-01-26 13:49   ` [Intel-gfx] " Thomas Hellström (Intel)
  2022-01-25 19:35 ` [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards Robert Beckett
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-25 19:35 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Ramalingam C, Matthew Auld, Robert Beckett, intel-gfx, dri-devel,
	linux-kernel

From: Ramalingam C <ramalingam.c@intel.com>

Add a new platform flag, needs_compact_pt, to mark the requirement of
compact pt layout support for the ppGTT when using 64K GTT pages.

With this flag has_64k_pages will only indicate requirement of 64K
GTT page sizes or larger for device local memory access.

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 10 +++++++---
 drivers/gpu/drm/i915/i915_pci.c          |  2 ++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 44c1f98144b4..1258b7779705 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 /*
  * Set this flag, when platform requires 64K GTT page sizes or larger for
- * device local memory access. Also this flag implies that we require or
- * at least support the compact PT layout for the ppGTT when using the 64K
- * GTT pages.
+ * device local memory access.
  */
 #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
 
+/* Set this flag when platform doesn't allow both 64k pages and 4k pages in
+ * the same PT. this flag means we need to support compact PT layout for the
+ * ppGTT when using the 64K GTT pages.
+ */
+#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt)
+
 #define HAS_IPC(dev_priv)		 (INTEL_INFO(dev_priv)->display.has_ipc)
 
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 4081fd50ba9d..799b56569ef5 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = {
 	PLATFORM(INTEL_XEHPSDV),
 	.display = { },
 	.has_64k_pages = 1,
+	.needs_compact_pt = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) |
@@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = {
 	.media.rel = 55,
 	PLATFORM(INTEL_DG2),
 	.has_64k_pages = 1,
+	.needs_compact_pt = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) |
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 3699b1c539ea..c8aaf646430c 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -130,6 +130,7 @@ enum intel_ppgtt_type {
 	/* Keep has_* in alphabetical order */ \
 	func(has_64bit_reloc); \
 	func(has_64k_pages); \
+	func(needs_compact_pt); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_global_mocs); \
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards
       [not found] <20220125193530.3272386-1-bob.beckett@collabora.com>
  2022-01-25 19:35 ` [PATCH v5 1/5] drm/i915: add needs_compact_pt flag Robert Beckett
@ 2022-01-25 19:35 ` Robert Beckett
  2022-01-26 15:45   ` [Intel-gfx] " Thomas Hellström (Intel)
  2022-01-25 19:35 ` [PATCH v5 3/5] drm/i915: support 64K GTT pages " Robert Beckett
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-25 19:35 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Matthew Auld, Ramalingam C, Robert Beckett, intel-gfx, dri-devel,
	linux-kernel

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

For compact-pt we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.

v3:
	* use needs_compact_pt flag to discriminate between
	  64K and 64K with compact-pt
	* add i915_vm_obj_min_alignment
	* use i915_vm_obj_min_alignment to round up vma reservation
	  if compact-pt instead of hard coding
v5:
	* fix i915_vm_obj_min_alignment for internal objects which
	  have no memory region

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 12 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 18 ++++
 drivers/gpu/drm/i915/i915_vma.c               |  9 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 5 files changed, 117 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index c8ff8bf0986d..f0bfce53258f 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 46be4197b93f..df23ebdfc994 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -223,6 +223,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+	} else if (HAS_64K_PAGES(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 8073438b67c8..ba9f040f8606 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -29,6 +29,8 @@
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -223,6 +225,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -384,6 +387,21 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	return vm->min_alignment[type];
+}
+
+static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
+					    struct drm_i915_gem_object  *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+	enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM;
+
+	return i915_vm_min_alignment(vm, type);
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index b959e904c4d3..3b31d2ba6864 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -745,6 +745,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE);
 	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
 
+	alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj));
+	/*
+	 * for compact-pt we round up the reservation to prevent
+	 * any smaller pages being used within the same PDE
+	 */
+	if (NEEDS_COMPACT_PT(vma->vm->i915))
+		size = round_up(size, alignment);
+
 	/* If binding the object/GGTT view requires more space than the entire
 	 * aperture has, reject it early before evicting everything in a vain
 	 * attempt to find space.
@@ -757,6 +765,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	}
 
 	color = 0;
+
 	if (i915_vm_has_cache_coloring(vma->vm))
 		color = vma->obj->cache_level;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fba1c8be1649..b80788a2b7f9 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma_resource *mock_vma_res;
 	unsigned int size;
@@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma_res->bi.pages = obj->mm.pages;
-			mock_vma_res->node_size = BIT_ULL(size);
+			mock_vma_res->node_size = BIT_ULL(aligned_size);
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 3/5] drm/i915: support 64K GTT pages for discrete cards
       [not found] <20220125193530.3272386-1-bob.beckett@collabora.com>
  2022-01-25 19:35 ` [PATCH v5 1/5] drm/i915: add needs_compact_pt flag Robert Beckett
  2022-01-25 19:35 ` [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards Robert Beckett
@ 2022-01-25 19:35 ` Robert Beckett
  2022-01-31 11:19   ` [Intel-gfx] " Thomas Hellström
  2022-01-25 19:35 ` [PATCH v5 4/5] drm/i915: add gtt misalignment test Robert Beckett
  2022-01-25 19:35 ` [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support Robert Beckett
  4 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-25 19:35 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Matthew Auld, kernel test robot, Stuart Summers, Ramalingam C,
	Robert Beckett, intel-gfx, dri-devel, linux-kernel

From: Matthew Auld <matthew.auld@intel.com>

discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

v4: don't return uninitialized err in igt_ppgtt_compact
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index f36191ebf964..a7d9bdb85d70 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B that gives us 32 entries per pt, however since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If ever hit such a case this
+	 * test should catch it since some of our writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(i915, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 
 	if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c43e724afa9f..62471730266c 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
+			  struct i915_vma_resource *vma_res,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma_res->start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(pt->is_compact);
+				index =  __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				if (!iter->sg)
+					break;
+
+				rem = sg_dma_len(iter->sg);
+				if (!rem)
+					break;
+
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma_res->page_sizes_gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
@@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma_res);
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index ba9f040f8606..e6ce0be6d484 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -160,6 +162,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 48e6e2f87700..043652dc6892 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 4/5] drm/i915: add gtt misalignment test
       [not found] <20220125193530.3272386-1-bob.beckett@collabora.com>
                   ` (2 preceding siblings ...)
  2022-01-25 19:35 ` [PATCH v5 3/5] drm/i915: support 64K GTT pages " Robert Beckett
@ 2022-01-25 19:35 ` Robert Beckett
  2022-01-26 14:05   ` [Intel-gfx] " Thomas Hellström (Intel)
  2022-01-25 19:35 ` [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support Robert Beckett
  4 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-25 19:35 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Robert Beckett, kernel test robot, intel-gfx, dri-devel, linux-kernel

add test to check handling of misaligned offsets and sizes

v4:
	* remove spurious blank lines
	* explicitly cast intel_region_id to intel_memory_type in misaligned_pin
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index b80788a2b7f9..f082b5ff3b5e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -22,10 +22,12 @@
  *
  */
 
+#include "gt/intel_gtt.h"
 #include <linux/list_sort.h>
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_region.h"
 #include "gem/selftests/mock_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
@@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm,
 	return err;
 }
 
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr,
+			   u64 addr, u64 size, unsigned long flags)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err = 0;
+	u64 expected_vma_size, expected_node_size;
+
+	obj = i915_gem_object_create_region(mr, size, 0, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	vma = i915_vma_instance(obj, vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_put;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, addr | flags);
+	if (err)
+		goto err_put;
+	i915_vma_unpin(vma);
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1));
+	expected_node_size = expected_vma_size;
+
+	if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
+		/* dg2 should expand lmem node to 2MB */
+		expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
+		expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+	}
+
+	if (vma->size != expected_vma_size || vma->node.size != expected_node_size) {
+		err = i915_vma_unbind(vma);
+		err = -EBADSLT;
+		goto err_put;
+	}
+
+	err = i915_vma_unbind(vma);
+	if (err)
+		goto err_put;
+
+	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+
+err_put:
+	i915_gem_object_put(obj);
+	cleanup_freed_objects(vm->i915);
+	return err;
+}
+
+static int misaligned_pin(struct i915_address_space *vm,
+			  u64 hole_start, u64 hole_end,
+			  unsigned long end_time)
+{
+	struct intel_memory_region *mr;
+	enum intel_region_id id;
+	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	int err = 0;
+	u64 hole_size = hole_end - hole_start;
+
+	if (i915_is_ggtt(vm))
+		flags |= PIN_GLOBAL;
+
+	for_each_memory_region(mr, vm->i915, id) {
+		u64 min_alignment = i915_vm_min_alignment(vm, (enum intel_memory_type)id);
+		u64 size = min_alignment;
+		u64 addr = round_up(hole_start + (hole_size / 2), min_alignment);
+
+		/* we can't test < 4k alignment due to flags being encoded in lower bits */
+		if (min_alignment != I915_GTT_PAGE_SIZE_4K) {
+			err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags);
+			/* misaligned should error with -EINVAL*/
+			if (!err)
+				err = -EBADSLT;
+			if (err != -EINVAL)
+				return err;
+		}
+
+		/* test for vma->size expansion to min page size */
+		err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
+		if (min_alignment > hole_size) {
+			if (!err)
+				err = -EBADSLT;
+			else if (err == -ENOSPC)
+				err = 0;
+		}
+		if (err)
+			return err;
+
+		/* test for intermediate size not expanding vma->size for large alignments */
+		err = misaligned_case(vm, mr, addr, size / 2, flags);
+		if (min_alignment > hole_size) {
+			if (!err)
+				err = -EBADSLT;
+			else if (err == -ENOSPC)
+				err = 0;
+		}
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 			  int (*func)(struct i915_address_space *vm,
 				      u64 hole_start, u64 hole_end,
@@ -1136,6 +1252,11 @@ static int igt_ppgtt_shrink_boom(void *arg)
 	return exercise_ppgtt(arg, shrink_boom);
 }
 
+static int igt_ppgtt_misaligned_pin(void *arg)
+{
+	return exercise_ppgtt(arg, misaligned_pin);
+}
+
 static int sort_holes(void *priv, const struct list_head *A,
 		      const struct list_head *B)
 {
@@ -1208,6 +1329,11 @@ static int igt_ggtt_lowlevel(void *arg)
 	return exercise_ggtt(arg, lowlevel_hole);
 }
 
+static int igt_ggtt_misaligned_pin(void *arg)
+{
+	return exercise_ggtt(arg, misaligned_pin);
+}
+
 static int igt_ggtt_page(void *arg)
 {
 	const unsigned int count = PAGE_SIZE/sizeof(u32);
@@ -2180,12 +2306,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_ppgtt_fill),
 		SUBTEST(igt_ppgtt_shrink),
 		SUBTEST(igt_ppgtt_shrink_boom),
+		SUBTEST(igt_ppgtt_misaligned_pin),
 		SUBTEST(igt_ggtt_lowlevel),
 		SUBTEST(igt_ggtt_drunk),
 		SUBTEST(igt_ggtt_walk),
 		SUBTEST(igt_ggtt_pot),
 		SUBTEST(igt_ggtt_fill),
 		SUBTEST(igt_ggtt_page),
+		SUBTEST(igt_ggtt_misaligned_pin),
 		SUBTEST(igt_cs_tlb),
 	};
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support
       [not found] <20220125193530.3272386-1-bob.beckett@collabora.com>
                   ` (3 preceding siblings ...)
  2022-01-25 19:35 ` [PATCH v5 4/5] drm/i915: add gtt misalignment test Robert Beckett
@ 2022-01-25 19:35 ` Robert Beckett
  2022-01-26 14:11   ` [Intel-gfx] " Thomas Hellström (Intel)
  4 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-25 19:35 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Matthew Auld, Ramalingam C, Robert Beckett, Jordan Justen,
	Simon Ser, Pekka Paalanen, Kenneth Graunke, mesa-dev, Tony Ye,
	Slawomir Milczarek, intel-gfx, dri-devel, linux-kernel

From: Matthew Auld <matthew.auld@intel.com>

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v3: fix typos and less emphasis
v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5e678917da70..77e5e74c32c1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
 	/**
 	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 	 * the user with the GTT offset at which this object will be pinned.
+	 *
 	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 	 * presumed_offset of the object.
+	 *
 	 * During execbuffer2 the kernel populates it with the value of the
 	 * current GTT offset of the object, for future presumed_offset writes.
+	 *
+	 * See struct drm_i915_gem_create_ext for the rules when dealing with
+	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+	 * minimum page sizes, like DG2.
 	 */
 	__u64 offset;
 
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 * Note that for some devices we have might have further minimum
-	 * page-size restrictions(larger than 4K), like for device local-memory.
-	 * However in general the final size here should always reflect any
-	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
-	 * extension to place the object in device local-memory.
+	 *
+	 * DG2 64K min page size implications:
+	 *
+	 * On discrete platforms, starting from DG2, we have to contend with GTT
+	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+	 * objects.  Specifically the hardware only supports 64K or larger GTT
+	 * page sizes for such memory. The kernel will already ensure that all
+	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+	 * sizes underneath.
+	 *
+	 * Note that the returned size here will always reflect any required
+	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
+	 * such as DG2.
+	 *
+	 * Special DG2 GTT address alignment requirement:
+	 *
+	 * The GTT alignment will also need to be at least 2M for such objects.
+	 *
+	 * Note that due to how the hardware implements 64K GTT page support, we
+	 * have some further complications:
+	 *
+	 *   1) The entire PDE (which covers a 2MB virtual address range), must
+	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+	 *   PDE is forbidden by the hardware.
+	 *
+	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+	 *   objects.
+	 *
+	 * To keep things simple for userland, we mandate that any GTT mappings
+	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
+	 * address space and avoids userland having to copy any needlessly
+	 * complicated PDE sharing scheme (coloring) and only affects DG2, this
+	 * is deemed to be a good compromise.
 	 */
 	__u64 size;
 	/**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
  2022-01-25 19:35 ` [PATCH v5 1/5] drm/i915: add needs_compact_pt flag Robert Beckett
@ 2022-01-26 13:49   ` Thomas Hellström (Intel)
  2022-01-26 17:11     ` Robert Beckett
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Hellström (Intel) @ 2022-01-26 13:49 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld


On 1/25/22 20:35, Robert Beckett wrote:
> From: Ramalingam C <ramalingam.c@intel.com>
>
> Add a new platform flag, needs_compact_pt, to mark the requirement of
> compact pt layout support for the ppGTT when using 64K GTT pages.
>
> With this flag has_64k_pages will only indicate requirement of 64K
> GTT page sizes or larger for device local memory access.
>
> Suggested-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          | 10 +++++++---
>   drivers/gpu/drm/i915/i915_pci.c          |  2 ++
>   drivers/gpu/drm/i915/intel_device_info.h |  1 +
>   3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 44c1f98144b4..1258b7779705 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>   
>   /*
>    * Set this flag, when platform requires 64K GTT page sizes or larger for
> - * device local memory access. Also this flag implies that we require or
> - * at least support the compact PT layout for the ppGTT when using the 64K
> - * GTT pages.

Why do we remove these comment lines?


> + * device local memory access.
>    */
>   #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
>   
> +/* Set this flag when platform doesn't allow both 64k pages and 4k pages in

First line of multi-line comments should be empty.


> + * the same PT. this flag means we need to support compact PT layout for the
> + * ppGTT when using the 64K GTT pages.
> + */
> +#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt)
> +
>   #define HAS_IPC(dev_priv)		 (INTEL_INFO(dev_priv)->display.has_ipc)
>   
>   #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 4081fd50ba9d..799b56569ef5 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = {
>   	PLATFORM(INTEL_XEHPSDV),
>   	.display = { },
>   	.has_64k_pages = 1,
> +	.needs_compact_pt = 1,
>   	.platform_engine_mask =
>   		BIT(RCS0) | BIT(BCS0) |
>   		BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) |
> @@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = {
>   	.media.rel = 55,
>   	PLATFORM(INTEL_DG2),
>   	.has_64k_pages = 1,
> +	.needs_compact_pt = 1,
>   	.platform_engine_mask =
>   		BIT(RCS0) | BIT(BCS0) |
>   		BIT(VECS0) | BIT(VECS1) |
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index 3699b1c539ea..c8aaf646430c 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -130,6 +130,7 @@ enum intel_ppgtt_type {
>   	/* Keep has_* in alphabetical order */ \
>   	func(has_64bit_reloc); \
>   	func(has_64k_pages); \
> +	func(needs_compact_pt); \
>   	func(gpu_reset_clobbers_display); \
>   	func(has_reset_engine); \
>   	func(has_global_mocs); \

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 4/5] drm/i915: add gtt misalignment test
  2022-01-25 19:35 ` [PATCH v5 4/5] drm/i915: add gtt misalignment test Robert Beckett
@ 2022-01-26 14:05   ` Thomas Hellström (Intel)
  2022-01-26 17:22     ` Robert Beckett
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Hellström (Intel) @ 2022-01-26 14:05 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, dri-devel, linux-kernel


On 1/25/22 20:35, Robert Beckett wrote:
> add test to check handling of misaligned offsets and sizes
>
> v4:
> 	* remove spurious blank lines
> 	* explicitly cast intel_region_id to intel_memory_type in misaligned_pin
> Reported-by: kernel test robot <lkp@intel.com>
>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++
>   1 file changed, 128 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index b80788a2b7f9..f082b5ff3b5e 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -22,10 +22,12 @@
>    *
>    */
>   
> +#include "gt/intel_gtt.h"
>   #include <linux/list_sort.h>
>   #include <linux/prime_numbers.h>
>   
>   #include "gem/i915_gem_context.h"
> +#include "gem/i915_gem_region.h"
>   #include "gem/selftests/mock_context.h"
>   #include "gt/intel_context.h"
>   #include "gt/intel_gpu_commands.h"
> @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm,
>   	return err;
>   }
>   
> +static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr,
> +			   u64 addr, u64 size, unsigned long flags)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	int err = 0;
> +	u64 expected_vma_size, expected_node_size;
> +
> +	obj = i915_gem_object_create_region(mr, size, 0, 0);
> +	if (IS_ERR(obj))
> +		return PTR_ERR(obj);
> +
> +	vma = i915_vma_instance(obj, vm, NULL);
> +	if (IS_ERR(vma)) {
> +		err = PTR_ERR(vma);
> +		goto err_put;
> +	}
> +
> +	err = i915_vma_pin(vma, 0, 0, addr | flags);
> +	if (err)
> +		goto err_put;
> +	i915_vma_unpin(vma);
> +
> +	if (!drm_mm_node_allocated(&vma->node)) {
> +		err = -EINVAL;
> +		goto err_put;
> +	}
> +
> +	if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
> +		err = -EINVAL;
> +		goto err_put;
> +	}
> +
> +	expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1));
> +	expected_node_size = expected_vma_size;
> +
> +	if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
> +		/* dg2 should expand lmem node to 2MB */

Should this test be NEEDS_COMPACT_PT()?

Otherwise LGTM. Reviewed-by: Thomas Hellström 
<thomas.hellstrom@linux.intel.com>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support
  2022-01-25 19:35 ` [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support Robert Beckett
@ 2022-01-26 14:11   ` Thomas Hellström (Intel)
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström (Intel) @ 2022-01-26 14:11 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, Kenneth Graunke, dri-devel, Slawomir Milczarek,
	Pekka Paalanen, Matthew Auld, Simon Ser, mesa-dev, linux-kernel


On 1/25/22 20:35, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
>
> On discrete platforms like DG2, we need to support a minimum page size
> of 64K when dealing with device local-memory. This is quite tricky for
> various reasons, so try to document the new implicit uapi for this.
>
> v3: fix typos and less emphasis
> v2: Fixed suggestions on formatting [Daniel]
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> Acked-by: Jordan Justen <jordan.l.justen@intel.com>
> Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Simon Ser <contact@emersion.fr>
> cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: mesa-dev@lists.freedesktop.org
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
> ---

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-25 19:35 ` [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards Robert Beckett
@ 2022-01-26 15:45   ` Thomas Hellström (Intel)
  2022-01-26 17:21     ` Robert Beckett
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Hellström (Intel) @ 2022-01-26 15:45 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld


On 1/25/22 20:35, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
>
> For local-memory objects we need to align the GTT addresses
> to 64K, both for the ppgtt and ggtt.
>
> We need to support vm->min_alignment > 4K, depending
> on the vm itself and the type of object we are inserting.
> With this in mind update the GTT selftests to take this
> into account.
>
> For compact-pt we further align and pad lmem object GTT addresses
> to 2MB to ensure PDEs contain consistent page sizes as
> required by the HW.
>
> v3:
> 	* use needs_compact_pt flag to discriminate between
> 	  64K and 64K with compact-pt
> 	* add i915_vm_obj_min_alignment
> 	* use i915_vm_obj_min_alignment to round up vma reservation
> 	  if compact-pt instead of hard coding
> v5:
> 	* fix i915_vm_obj_min_alignment for internal objects which
> 	  have no memory region
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 12 +++
>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 18 ++++
>   drivers/gpu/drm/i915/i915_vma.c               |  9 ++
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>   5 files changed, 117 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index c8ff8bf0986d..f0bfce53258f 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -39,6 +39,7 @@ struct tiled_blits {
>   	struct blit_buffer scratch;
>   	struct i915_vma *batch;
>   	u64 hole;
> +	u64 align;
>   	u32 width;
>   	u32 height;
>   };
> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>   		goto err_free;
>   	}
>   
> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));

Don't we always end up with 2M here, regardless of the vm restrictions?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
  2022-01-26 13:49   ` [Intel-gfx] " Thomas Hellström (Intel)
@ 2022-01-26 17:11     ` Robert Beckett
  2022-01-27  9:37       ` Thomas Hellström (Intel)
  0 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-26 17:11 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld



On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
> 
> On 1/25/22 20:35, Robert Beckett wrote:
>> From: Ramalingam C <ramalingam.c@intel.com>
>>
>> Add a new platform flag, needs_compact_pt, to mark the requirement of
>> compact pt layout support for the ppGTT when using 64K GTT pages.
>>
>> With this flag has_64k_pages will only indicate requirement of 64K
>> GTT page sizes or larger for device local memory access.
>>
>> Suggested-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h          | 10 +++++++---
>>   drivers/gpu/drm/i915/i915_pci.c          |  2 ++
>>   drivers/gpu/drm/i915/intel_device_info.h |  1 +
>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 44c1f98144b4..1258b7779705 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private 
>> *i915,
>>   /*
>>    * Set this flag, when platform requires 64K GTT page sizes or 
>> larger for
>> - * device local memory access. Also this flag implies that we require or
>> - * at least support the compact PT layout for the ppGTT when using 
>> the 64K
>> - * GTT pages.
> 
> Why do we remove these comment lines?
Because HAS_64K_PAGES now means just 64K page, it no longer means also 
requires compact pt.
This is to support other products that will have 64K but not have the 
PDE non-sharing restriction in future.

Those lines moved to the next change NEEDS_COMPACT_PT, which is now 
separate.

> 
> 
>> + * device local memory access.
>>    */
>>   #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
>> +/* Set this flag when platform doesn't allow both 64k pages and 4k 
>> pages in
> 
> First line of multi-line comments should be empty.
thanks. I'm surprised checkpatch didnt spot that.

> 
> 
>> + * the same PT. this flag means we need to support compact PT layout 
>> for the
>> + * ppGTT when using the 64K GTT pages.
>> + */
>> +#define NEEDS_COMPACT_PT(dev_priv) 
>> (INTEL_INFO(dev_priv)->needs_compact_pt)
>> +
>>   #define HAS_IPC(dev_priv)         
>> (INTEL_INFO(dev_priv)->display.has_ipc)
>>   #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
>> diff --git a/drivers/gpu/drm/i915/i915_pci.c 
>> b/drivers/gpu/drm/i915/i915_pci.c
>> index 4081fd50ba9d..799b56569ef5 100644
>> --- a/drivers/gpu/drm/i915/i915_pci.c
>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>> @@ -1028,6 +1028,7 @@ static const struct intel_device_info 
>> xehpsdv_info = {
>>       PLATFORM(INTEL_XEHPSDV),
>>       .display = { },
>>       .has_64k_pages = 1,
>> +    .needs_compact_pt = 1,
>>       .platform_engine_mask =
>>           BIT(RCS0) | BIT(BCS0) |
>>           BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) |
>> @@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = {
>>       .media.rel = 55,
>>       PLATFORM(INTEL_DG2),
>>       .has_64k_pages = 1,
>> +    .needs_compact_pt = 1,
>>       .platform_engine_mask =
>>           BIT(RCS0) | BIT(BCS0) |
>>           BIT(VECS0) | BIT(VECS1) |
>> diff --git a/drivers/gpu/drm/i915/intel_device_info.h 
>> b/drivers/gpu/drm/i915/intel_device_info.h
>> index 3699b1c539ea..c8aaf646430c 100644
>> --- a/drivers/gpu/drm/i915/intel_device_info.h
>> +++ b/drivers/gpu/drm/i915/intel_device_info.h
>> @@ -130,6 +130,7 @@ enum intel_ppgtt_type {
>>       /* Keep has_* in alphabetical order */ \
>>       func(has_64bit_reloc); \
>>       func(has_64k_pages); \
>> +    func(needs_compact_pt); \
>>       func(gpu_reset_clobbers_display); \
>>       func(has_reset_engine); \
>>       func(has_global_mocs); \

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-26 15:45   ` [Intel-gfx] " Thomas Hellström (Intel)
@ 2022-01-26 17:21     ` Robert Beckett
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Beckett @ 2022-01-26 17:21 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld



On 26/01/2022 15:45, Thomas Hellström (Intel) wrote:
> 
> On 1/25/22 20:35, Robert Beckett wrote:
>> From: Matthew Auld <matthew.auld@intel.com>
>>
>> For local-memory objects we need to align the GTT addresses
>> to 64K, both for the ppgtt and ggtt.
>>
>> We need to support vm->min_alignment > 4K, depending
>> on the vm itself and the type of object we are inserting.
>> With this in mind update the GTT selftests to take this
>> into account.
>>
>> For compact-pt we further align and pad lmem object GTT addresses
>> to 2MB to ensure PDEs contain consistent page sizes as
>> required by the HW.
>>
>> v3:
>>     * use needs_compact_pt flag to discriminate between
>>       64K and 64K with compact-pt
>>     * add i915_vm_obj_min_alignment
>>     * use i915_vm_obj_min_alignment to round up vma reservation
>>       if compact-pt instead of hard coding
>> v5:
>>     * fix i915_vm_obj_min_alignment for internal objects which
>>       have no memory region
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 12 +++
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           | 18 ++++
>>   drivers/gpu/drm/i915/i915_vma.c               |  9 ++
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>>   5 files changed, 117 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> index c8ff8bf0986d..f0bfce53258f 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>       struct blit_buffer scratch;
>>       struct i915_vma *batch;
>>       u64 hole;
>> +    u64 align;
>>       u32 width;
>>       u32 height;
>>   };
>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>> *engine, struct rnd_state *prng)
>>           goto err_free;
>>       }
>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from 
>> vm! */
>> +    t->align = max(t->align,
>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>> +    t->align = max(t->align,
>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
> 
> Don't we always end up with 2M here, regardless of the vm restrictions?
agreed. I will drop the 2M worst case.
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 4/5] drm/i915: add gtt misalignment test
  2022-01-26 14:05   ` [Intel-gfx] " Thomas Hellström (Intel)
@ 2022-01-26 17:22     ` Robert Beckett
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Beckett @ 2022-01-26 17:22 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: intel-gfx, dri-devel, linux-kernel



On 26/01/2022 14:05, Thomas Hellström (Intel) wrote:
> 
> On 1/25/22 20:35, Robert Beckett wrote:
>> add test to check handling of misaligned offsets and sizes
>>
>> v4:
>>     * remove spurious blank lines
>>     * explicitly cast intel_region_id to intel_memory_type in 
>> misaligned_pin
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> ---
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++
>>   1 file changed, 128 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index b80788a2b7f9..f082b5ff3b5e 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -22,10 +22,12 @@
>>    *
>>    */
>> +#include "gt/intel_gtt.h"
>>   #include <linux/list_sort.h>
>>   #include <linux/prime_numbers.h>
>>   #include "gem/i915_gem_context.h"
>> +#include "gem/i915_gem_region.h"
>>   #include "gem/selftests/mock_context.h"
>>   #include "gt/intel_context.h"
>>   #include "gt/intel_gpu_commands.h"
>> @@ -1067,6 +1069,120 @@ static int shrink_boom(struct 
>> i915_address_space *vm,
>>       return err;
>>   }
>> +static int misaligned_case(struct i915_address_space *vm, struct 
>> intel_memory_region *mr,
>> +               u64 addr, u64 size, unsigned long flags)
>> +{
>> +    struct drm_i915_gem_object *obj;
>> +    struct i915_vma *vma;
>> +    int err = 0;
>> +    u64 expected_vma_size, expected_node_size;
>> +
>> +    obj = i915_gem_object_create_region(mr, size, 0, 0);
>> +    if (IS_ERR(obj))
>> +        return PTR_ERR(obj);
>> +
>> +    vma = i915_vma_instance(obj, vm, NULL);
>> +    if (IS_ERR(vma)) {
>> +        err = PTR_ERR(vma);
>> +        goto err_put;
>> +    }
>> +
>> +    err = i915_vma_pin(vma, 0, 0, addr | flags);
>> +    if (err)
>> +        goto err_put;
>> +    i915_vma_unpin(vma);
>> +
>> +    if (!drm_mm_node_allocated(&vma->node)) {
>> +        err = -EINVAL;
>> +        goto err_put;
>> +    }
>> +
>> +    if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
>> +        err = -EINVAL;
>> +        goto err_put;
>> +    }
>> +
>> +    expected_vma_size = round_up(size, 1 << 
>> (ffs(vma->resource->page_sizes_gtt) - 1));
>> +    expected_node_size = expected_vma_size;
>> +
>> +    if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
>> +        /* dg2 should expand lmem node to 2MB */
> 
> Should this test be NEEDS_COMPACT_PT()?
> 
> Otherwise LGTM. Reviewed-by: Thomas Hellström 
> <thomas.hellstrom@linux.intel.com>
Thanks. Good catch, forgot to retrofit the new macro here.
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
  2022-01-26 17:11     ` Robert Beckett
@ 2022-01-27  9:37       ` Thomas Hellström (Intel)
  2022-01-31 14:19         ` Robert Beckett
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Hellström (Intel) @ 2022-01-27  9:37 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld


On 1/26/22 18:11, Robert Beckett wrote:
>
>
> On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
>>
>> On 1/25/22 20:35, Robert Beckett wrote:
>>> From: Ramalingam C <ramalingam.c@intel.com>
>>>
>>> Add a new platform flag, needs_compact_pt, to mark the requirement of
>>> compact pt layout support for the ppGTT when using 64K GTT pages.
>>>
>>> With this flag has_64k_pages will only indicate requirement of 64K
>>> GTT page sizes or larger for device local memory access.
>>>
>>> Suggested-by: Matthew Auld <matthew.auld@intel.com>
>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h          | 10 +++++++---
>>>   drivers/gpu/drm/i915/i915_pci.c          |  2 ++
>>>   drivers/gpu/drm/i915/intel_device_info.h |  1 +
>>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 44c1f98144b4..1258b7779705 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private 
>>> *i915,
>>>   /*
>>>    * Set this flag, when platform requires 64K GTT page sizes or 
>>> larger for
>>> - * device local memory access. Also this flag implies that we 
>>> require or
>>> - * at least support the compact PT layout for the ppGTT when using 
>>> the 64K
>>> - * GTT pages.
>>
>> Why do we remove these comment lines?
> Because HAS_64K_PAGES now means just 64K page, it no longer means also 
> requires compact pt.
> This is to support other products that will have 64K but not have the 
> PDE non-sharing restriction in future.
>
> Those lines moved to the next change NEEDS_COMPACT_PT, which is now 
> separate.

Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does 
"HAS_64K_PAGES" still mean compact is supported? That information is lost.

/Thomas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 3/5] drm/i915: support 64K GTT pages for discrete cards
  2022-01-25 19:35 ` [PATCH v5 3/5] drm/i915: support 64K GTT pages " Robert Beckett
@ 2022-01-31 11:19   ` Thomas Hellström
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2022-01-31 11:19 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld


On 1/25/22 20:35, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
>
> discrete cards optimise 64K GTT pages for local-memory, since everything
> should be allocated at 64K granularity. We say goodbye to sparse
> entries, and instead get a compact 256B page-table for 64K pages,
> which should be more cache friendly. 4K pages for local-memory
> are no longer supported by the HW.
>
> v4: don't return uninitialized err in igt_ppgtt_compact
> Reported-by: kernel test robot <lkp@intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
>   4 files changed, 169 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index f36191ebf964..a7d9bdb85d70 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
>   	return err;
>   }
>   
> +static int igt_ppgtt_compact(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct drm_i915_gem_object *obj;
> +	int err;
> +
> +	/*
> +	 * Simple test to catch issues with compact 64K pages -- since the pt is
> +	 * compacted to 256B that gives us 32 entries per pt, however since the
> +	 * backing page for the pt is 4K, any extra entries we might incorrectly
> +	 * write out should be ignored by the HW. If ever hit such a case this
> +	 * test should catch it since some of our writes would land in scratch.
> +	 */
> +
> +	if (!HAS_64K_PAGES(i915)) {
> +		pr_info("device lacks compact 64K page support, skipping\n");
> +		return 0;
> +	}
> +
> +	if (!HAS_LMEM(i915)) {
> +		pr_info("device lacks LMEM support, skipping\n");
> +		return 0;
> +	}
> +
> +	/* We want the range to cover multiple page-table boundaries. */
> +	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
> +	if (IS_ERR(obj))
> +		return PTR_ERR(obj);
> +
> +	err = i915_gem_object_pin_pages_unlocked(obj);
> +	if (err)
> +		goto out_put;
> +
> +	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
> +		pr_info("LMEM compact unable to allocate huge-page(s)\n");
> +		goto out_unpin;
> +	}
> +
> +	/*
> +	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
> +	 * insertion.
> +	 */
> +	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
> +
> +	err = igt_write_huge(i915, obj);
> +	if (err)
> +		pr_err("LMEM compact write-huge failed\n");
> +
> +out_unpin:
> +	i915_gem_object_unpin_pages(obj);
> +out_put:
> +	i915_gem_object_put(obj);
> +
> +	if (err == -ENOMEM)
> +		err = 0;
> +
> +	return err;
> +}
> +
>   static int igt_tmpfs_fallback(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(igt_tmpfs_fallback),
>   		SUBTEST(igt_ppgtt_smoke_huge),
>   		SUBTEST(igt_ppgtt_sanity_check),
> +		SUBTEST(igt_ppgtt_compact),
>   	};
>   
>   	if (!HAS_PPGTT(i915)) {
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index c43e724afa9f..62471730266c 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   						   start, end, lvl);
>   		} else {
>   			unsigned int count;
> +			unsigned int pte = gen8_pd_index(start, 0);
> +			unsigned int num_ptes;
>   			u64 *vaddr;
>   
>   			count = gen8_pt_count(start, end);
> @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   			    atomic_read(&pt->used));
>   			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
>   
> +			num_ptes = count;
> +			if (pt->is_compact) {
> +				GEM_BUG_ON(num_ptes % 16);
> +				GEM_BUG_ON(pte % 16);
> +				num_ptes /= 16;
> +				pte /= 16;
> +			}
> +
>   			vaddr = px_vaddr(pt);
> -			memset64(vaddr + gen8_pd_index(start, 0),
> +			memset64(vaddr + pte,
>   				 vm->scratch[0]->encode,
> -				 count);
> +				 num_ptes);
>   
>   			atomic_sub(count, &pt->used);
>   			start += count;
> @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   	return idx;
>   }
>   
> +static void
> +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
> +			  struct i915_vma_resource *vma_res,
> +			  struct sgt_dma *iter,
> +			  enum i915_cache_level cache_level,
> +			  u32 flags)
> +{
> +	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
> +	unsigned int rem = sg_dma_len(iter->sg);
> +	u64 start = vma_res->start;
> +
> +	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
> +
> +	do {
> +		struct i915_page_directory * const pdp =
> +			gen8_pdp_for_page_address(vm, start);
> +		struct i915_page_directory * const pd =
> +			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
> +		struct i915_page_table *pt =
> +			i915_pt_entry(pd, __gen8_pte_index(start, 1));
> +		gen8_pte_t encode = pte_encode;
> +		unsigned int page_size;
> +		gen8_pte_t *vaddr;
> +		u16 index, max;
> +
> +		max = I915_PDES;
> +
> +		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
> +		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
> +		    rem >= I915_GTT_PAGE_SIZE_2M &&
> +		    !__gen8_pte_index(start, 0)) {
> +			index = __gen8_pte_index(start, 1);
> +			encode |= GEN8_PDE_PS_2M;
> +			page_size = I915_GTT_PAGE_SIZE_2M;
> +
> +			vaddr = px_vaddr(pd);
> +		} else {
> +			if (encode & GEN12_PPGTT_PTE_LM) {
> +				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
> +				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
> +				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
> +						       I915_GTT_PAGE_SIZE_64K));
> +
> +				index = __gen8_pte_index(start, 0) / 16;
> +				page_size = I915_GTT_PAGE_SIZE_64K;
> +
> +				max /= 16;
> +
> +				vaddr = px_vaddr(pd);
> +				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
> +
> +				pt->is_compact = true;
> +			} else {
> +				GEM_BUG_ON(pt->is_compact);
> +				index =  __gen8_pte_index(start, 0);
> +				page_size = I915_GTT_PAGE_SIZE;
> +			}
> +
> +			vaddr = px_vaddr(pt);
> +		}
> +
> +		do {
> +			GEM_BUG_ON(rem < page_size);
> +			vaddr[index++] = encode | iter->dma;
> +
> +			start += page_size;
> +			iter->dma += page_size;
> +			rem -= page_size;
> +			if (iter->dma >= iter->max) {
> +				iter->sg = __sg_next(iter->sg);
> +				if (!iter->sg)
> +					break;
> +
> +				rem = sg_dma_len(iter->sg);
> +				if (!rem)
> +					break;
> +
> +				iter->dma = sg_dma_address(iter->sg);
> +				iter->max = iter->dma + rem;
> +
> +				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
> +					break;
> +			}
> +		} while (rem >= page_size && index < max);
> +
> +		vma_res->page_sizes_gtt |= page_size;
> +	} while (iter->sg && sg_dma_len(iter->sg));
> +}
> +
>   static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
>   				   struct i915_vma_resource *vma_res,
>   				   struct sgt_dma *iter,
> @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   	struct sgt_dma iter = sgt_dma(vma_res);
>   
>   	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
> -		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +		if (HAS_64K_PAGES(vm->i915))
> +			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
> +		else
> +			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>   	} else  {
>   		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index ba9f040f8606..e6ce0be6d484 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
>   
>   #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
>   
> +#define GEN12_PDE_64K BIT(6)
> +
>   /*
>    * Cacheability Control is a 4-bit value. The low three bits are stored in bits
>    * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> @@ -160,6 +162,7 @@ struct i915_page_table {
>   		atomic_t used;
>   		struct i915_page_table *stash;
>   	};
> +	bool is_compact;
>   };
>   
>   struct i915_page_directory {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 48e6e2f87700..043652dc6892 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> +	pt->is_compact = false;
>   	atomic_set(&pt->used, 0);
>   	return pt;
>   }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
  2022-01-27  9:37       ` Thomas Hellström (Intel)
@ 2022-01-31 14:19         ` Robert Beckett
  2022-01-31 14:58           ` Thomas Hellström
  0 siblings, 1 reply; 16+ messages in thread
From: Robert Beckett @ 2022-01-31 14:19 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld



On 27/01/2022 09:37, Thomas Hellström (Intel) wrote:
> 
> On 1/26/22 18:11, Robert Beckett wrote:
>>
>>
>> On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
>>>
>>> On 1/25/22 20:35, Robert Beckett wrote:
>>>> From: Ramalingam C <ramalingam.c@intel.com>
>>>>
>>>> Add a new platform flag, needs_compact_pt, to mark the requirement of
>>>> compact pt layout support for the ppGTT when using 64K GTT pages.
>>>>
>>>> With this flag has_64k_pages will only indicate requirement of 64K
>>>> GTT page sizes or larger for device local memory access.
>>>>
>>>> Suggested-by: Matthew Auld <matthew.auld@intel.com>
>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_drv.h          | 10 +++++++---
>>>>   drivers/gpu/drm/i915/i915_pci.c          |  2 ++
>>>>   drivers/gpu/drm/i915/intel_device_info.h |  1 +
>>>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>> index 44c1f98144b4..1258b7779705 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private 
>>>> *i915,
>>>>   /*
>>>>    * Set this flag, when platform requires 64K GTT page sizes or 
>>>> larger for
>>>> - * device local memory access. Also this flag implies that we 
>>>> require or
>>>> - * at least support the compact PT layout for the ppGTT when using 
>>>> the 64K
>>>> - * GTT pages.
>>>
>>> Why do we remove these comment lines?
>> Because HAS_64K_PAGES now means just 64K page, it no longer means also 
>> requires compact pt.
>> This is to support other products that will have 64K but not have the 
>> PDE non-sharing restriction in future.
>>
>> Those lines moved to the next change NEEDS_COMPACT_PT, which is now 
>> separate.
> 
> Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does 
> "HAS_64K_PAGES" still mean compact is supported? That information is lost.
Not any more.
I discussed the ambiguity of the original wording with mauld on irc.
We came to the conclusion that HAS_64K_PAGES should mean just that, that 
the hw has support for 64K pages, and says nothing about compact-pt at all.

NEEDS_COMPACT_PT means that the hw has compact-pt support and the driver 
is required to use it as it is a hw limitation.

There will be other devices that can support compact-pt but do not 
mandate its use. In this case, the current code would not use them, but 
there is potential for some future opportunistic use of that in the 
driver if desired (currently expected to include accelerated 
move/clear). If any opportunistic use is added to the driver, a new flag 
can be added along with the code that uses it to indicate compact-pt 
availability that is not mandatory (HAS_COMPACT_PT most likely), but as 
there is no code requiring it currently it should not be added yet, and 
the comments left as this patch does.

> 
> /Thomas
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
  2022-01-31 14:19         ` Robert Beckett
@ 2022-01-31 14:58           ` Thomas Hellström
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2022-01-31 14:58 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: intel-gfx, linux-kernel, dri-devel, Matthew Auld


On 1/31/22 15:19, Robert Beckett wrote:
>
>
> On 27/01/2022 09:37, Thomas Hellström (Intel) wrote:
>>
>> On 1/26/22 18:11, Robert Beckett wrote:
>>>
>>>
>>> On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
>>>>
>>>> On 1/25/22 20:35, Robert Beckett wrote:
>>>>> From: Ramalingam C <ramalingam.c@intel.com>
>>>>>
>>>>> Add a new platform flag, needs_compact_pt, to mark the requirement of
>>>>> compact pt layout support for the ppGTT when using 64K GTT pages.
>>>>>
>>>>> With this flag has_64k_pages will only indicate requirement of 64K
>>>>> GTT page sizes or larger for device local memory access.
>>>>>
>>>>> Suggested-by: Matthew Auld <matthew.auld@intel.com>
>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_drv.h          | 10 +++++++---
>>>>>   drivers/gpu/drm/i915/i915_pci.c          |  2 ++
>>>>>   drivers/gpu/drm/i915/intel_device_info.h |  1 +
>>>>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index 44c1f98144b4..1258b7779705 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct 
>>>>> drm_i915_private *i915,
>>>>>   /*
>>>>>    * Set this flag, when platform requires 64K GTT page sizes or 
>>>>> larger for
>>>>> - * device local memory access. Also this flag implies that we 
>>>>> require or
>>>>> - * at least support the compact PT layout for the ppGTT when 
>>>>> using the 64K
>>>>> - * GTT pages.
>>>>
>>>> Why do we remove these comment lines?
>>> Because HAS_64K_PAGES now means just 64K page, it no longer means 
>>> also requires compact pt.
>>> This is to support other products that will have 64K but not have 
>>> the PDE non-sharing restriction in future.
>>>
>>> Those lines moved to the next change NEEDS_COMPACT_PT, which is now 
>>> separate.
>>
>> Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does 
>> "HAS_64K_PAGES" still mean compact is supported? That information is 
>> lost.
> Not any more.
> I discussed the ambiguity of the original wording with mauld on irc.
> We came to the conclusion that HAS_64K_PAGES should mean just that, 
> that the hw has support for 64K pages, and says nothing about 
> compact-pt at all.
>
> NEEDS_COMPACT_PT means that the hw has compact-pt support and the 
> driver is required to use it as it is a hw limitation.
>
> There will be other devices that can support compact-pt but do not 
> mandate its use. In this case, the current code would not use them, 
> but there is potential for some future opportunistic use of that in 
> the driver if desired (currently expected to include accelerated 
> move/clear). If any opportunistic use is added to the driver, a new 
> flag can be added along with the code that uses it to indicate 
> compact-pt availability that is not mandatory (HAS_COMPACT_PT most 
> likely), but as there is no code requiring it currently it should not 
> be added yet, and the comments left as this patch does.
>
Ok, Thanks for the clarification.

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>



>>
>> /Thomas
>>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-01-31 14:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220125193530.3272386-1-bob.beckett@collabora.com>
2022-01-25 19:35 ` [PATCH v5 1/5] drm/i915: add needs_compact_pt flag Robert Beckett
2022-01-26 13:49   ` [Intel-gfx] " Thomas Hellström (Intel)
2022-01-26 17:11     ` Robert Beckett
2022-01-27  9:37       ` Thomas Hellström (Intel)
2022-01-31 14:19         ` Robert Beckett
2022-01-31 14:58           ` Thomas Hellström
2022-01-25 19:35 ` [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards Robert Beckett
2022-01-26 15:45   ` [Intel-gfx] " Thomas Hellström (Intel)
2022-01-26 17:21     ` Robert Beckett
2022-01-25 19:35 ` [PATCH v5 3/5] drm/i915: support 64K GTT pages " Robert Beckett
2022-01-31 11:19   ` [Intel-gfx] " Thomas Hellström
2022-01-25 19:35 ` [PATCH v5 4/5] drm/i915: add gtt misalignment test Robert Beckett
2022-01-26 14:05   ` [Intel-gfx] " Thomas Hellström (Intel)
2022-01-26 17:22     ` Robert Beckett
2022-01-25 19:35 ` [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support Robert Beckett
2022-01-26 14:11   ` [Intel-gfx] " Thomas Hellström (Intel)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).