* [PATCH v2 0/4] discrete card 64K page support
@ 2022-01-18 17:50 ` Robert Beckett
  0 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-18 17:50 UTC (permalink / raw)
  To: Intel GFX, dri-devel; +Cc: Robert Beckett, Matthew Auld, Stuart Summers

This series continues support for 64K pages for discrete cards.
It supersedes the 64K patches from https://patchwork.freedesktop.org/series/95686/#rev4
Changes since that series:

- set min alignment for DG2 to 2MB in i915_address_space_init
- replace coloring with simpler 2MB VA alignment for lmem buffers (sketched below)
	- enforce alignment to 2MB for lmem objects on DG2 in i915_vma_insert
	- expand vma reservation to round up to 2MB on DG2 in i915_vma_insert
- add alignment test

v2: rebase and fix up for the async vma work that landed
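
A minimal sketch of the 2MB VA alignment rule described above, as it ends
up in the vma insert path (see patch 1 for the actual change; the helper
and macro names are the ones that patch uses):

	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
		/* lmem VAs need at least 64K alignment on 64K-page platforms */
		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
		if (IS_DG2(vma->vm->i915)) {
			/* a 2MB PDE must not mix page sizes, so pad to whole PDEs */
			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
		}
	}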

Matthew Auld (3):
  drm/i915: enforce min GTT alignment for discrete cards
  drm/i915: support 64K GTT pages for discrete cards
  drm/i915/uapi: document behaviour for DG2 64K support

Robert Beckett (1):
  drm/i915: add gtt misalignment test

 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 +++++
 .../i915/gem/selftests/i915_gem_client_blt.c  |  23 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 ++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  14 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  12 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 drivers/gpu/drm/i915/i915_vma.c               |  14 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 226 +++++++++++++++---
 include/uapi/drm/i915_drm.h                   |  44 +++-
 9 files changed, 453 insertions(+), 49 deletions(-)

-- 
2.25.1


* [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
@ 2022-01-18 17:50   ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-18 17:50 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Matthew Auld, Ramalingam C, Robert Beckett, intel-gfx, dri-devel,
	linux-kernel

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind, update the GTT selftests to take this
into account.

For DG2 we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.
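
As a quick illustration (not part of the patch) of how the new
i915_vm_min_alignment() helper added below is meant to be consumed, a
caller such as the updated selftests picks its VA stride like this:

	/* sketch: honour the vm's per-region minimum alignment */
	u64 align = i915_vm_min_alignment(vm, INTEL_MEMORY_LOCAL); /* 2M on DG2 */
	u64 stride = round_up(obj->base.size, align);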

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
 drivers/gpu/drm/i915/i915_vma.c               | 14 +++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 5 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index c08f766e6e15..7fee95a65414 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 46be4197b93f..7c92b25c0f26 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915)) {
+		if (IS_DG2(vm->i915)) {
+			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+		} else {
+			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		}
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 8073438b67c8..b8da2514d601 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -29,6 +29,8 @@
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -223,6 +225,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	return vm->min_alignment[type];
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 1f15c3298112..9ac92e7a3566 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	}
 
 	color = 0;
+
+	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
+		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
+		/*
+	 * DG2 cannot have different sized pages in any given PDE (2MB range).
+	 * Keeping things simple, we force any lmem object to reserve
+	 * 2MB chunks, preventing any smaller pages from being used alongside.
+		 */
+		if (IS_DG2(vma->vm->i915)) {
+			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
+			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+		}
+	}
+
 	if (i915_vm_has_cache_coloring(vma->vm))
 		color = vma->obj->cache_level;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 076d860ce01a..2f3f0c01786b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma_resource *mock_vma_res;
 	unsigned int size;
@@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma_res->bi.pages = obj->mm.pages;
-			mock_vma_res->node_size = BIT_ULL(size);
+			mock_vma_res->node_size = BIT_ULL(aligned_size);
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.25.1


* [PATCH v2 2/4] drm/i915: support 64K GTT pages for discrete cards
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-01-18 17:50   ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-18 17:50 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Matthew Auld, Stuart Summers, Ramalingam C, intel-gfx, dri-devel,
	linux-kernel

From: Matthew Auld <matthew.auld@intel.com>

Discrete cards optimise for 64K GTT pages with local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.
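
A minimal sketch of the compact indexing (standalone C with hypothetical
names; the constants simply mirror the arithmetic in
xehpsdv_ppgtt_insert_huge() further down): each 64K PTE occupies the slot
that sixteen 4K PTEs would otherwise use, so a 4K-granular index is divided
by 16, and at most 32 entries (256B of 8-byte PTEs) fit in one table.

#include <assert.h>

#define PTES_PER_PT		512u			/* 4K-granular page table     */
#define COMPACT_PTES_PER_PT	(PTES_PER_PT / 16)	/* 32 entries == 256B of PTEs */

/* Map a 4K-granular PTE index onto its slot in a compact 64K page table. */
static unsigned int compact_pte_index(unsigned int idx_4k)
{
	assert(idx_4k < PTES_PER_PT);
	assert(idx_4k % 16 == 0);	/* 64K entries start on 64K boundaries */
	return idx_4k / 16;		/* 0 .. COMPACT_PTES_PER_PT - 1 */
}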

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 26f997c376a2..7efa6a598b03 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B that gives us 32 entries per pt, however since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If ever hit such a case this
+	 * test should catch it since some of our writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(i915, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 
 	if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c43e724afa9f..62471730266c 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
+			  struct i915_vma_resource *vma_res,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma_res->start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(pt->is_compact);
+				index = __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				if (!iter->sg)
+					break;
+
+				rem = sg_dma_len(iter->sg);
+				if (!rem)
+					break;
+
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma_res->page_sizes_gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
@@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma_res);
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b8da2514d601..0ab1fee9c587 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -160,6 +162,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 48e6e2f87700..043652dc6892 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 3/4] drm/i915: add gtt misalignment test
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-01-18 17:50   ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-18 17:50 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Robert Beckett, intel-gfx, dri-devel, linux-kernel

Add a test to check handling of misaligned offsets and sizes.
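
A minimal sketch of the size expectations the test encodes for DG2 local
memory (standalone C with hypothetical helpers; the 64K and 2M values come
from the expected_vma_size/expected_node_size checks below):

#include <stdint.h>

#define SZ_64K	(64ull << 10)
#define SZ_2M	(2ull << 20)

/* Round x up to a power-of-two alignment. */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* A small lmem object is still backed by 64K pages and consumes a 2M VA node. */
static void dg2_expected_sizes(uint64_t size, uint64_t *vma_size, uint64_t *node_size)
{
	*vma_size = round_up_u64(size, SZ_64K);
	*node_size = round_up_u64(size, SZ_2M);
}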

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 130 ++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 2f3f0c01786b..76696a5e547e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -22,10 +22,12 @@
  *
  */
 
+#include "gt/intel_gtt.h"
 #include <linux/list_sort.h>
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_region.h"
 #include "gem/selftests/mock_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
@@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm,
 	return err;
 }
 
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr,
+			   u64 addr, u64 size, unsigned long flags)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err = 0;
+	u64 expected_vma_size, expected_node_size;
+
+	obj = i915_gem_object_create_region(mr, size, 0, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	vma = i915_vma_instance(obj, vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_put;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, addr | flags);
+	if (err)
+		goto err_put;
+	i915_vma_unpin(vma);
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1));
+	expected_node_size = expected_vma_size;
+
+	if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
+		/* dg2 should expand lmem node to 2MB */
+		expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
+		expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+	}
+
+	if (vma->size != expected_vma_size || vma->node.size != expected_node_size) {
+		err = i915_vma_unbind(vma);
+		err = -EBADSLT;
+		goto err_put;
+	}
+
+	err = i915_vma_unbind(vma);
+	if (err)
+		goto err_put;
+
+	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+
+err_put:
+	i915_gem_object_put(obj);
+	cleanup_freed_objects(vm->i915);
+	return err;
+}
+
+static int misaligned_pin(struct i915_address_space *vm,
+			  u64 hole_start, u64 hole_end,
+			  unsigned long end_time)
+{
+	struct intel_memory_region *mr;
+	enum intel_region_id id;
+	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	int err = 0;
+	u64 hole_size = hole_end - hole_start;
+
+	if (i915_is_ggtt(vm))
+		flags |= PIN_GLOBAL;
+
+	for_each_memory_region(mr, vm->i915, id) {
+		u64 min_alignment = i915_vm_min_alignment(vm, id);
+		u64 size = min_alignment;
+		u64 addr = round_up(hole_start + (hole_size / 2), min_alignment);
+
+		/* we can't test < 4k alignment due to flags being encoded in lower bits */
+		if (min_alignment != I915_GTT_PAGE_SIZE_4K) {
+			err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags);
+			/* misaligned should error with -EINVAL */
+			if (!err)
+				err = -EBADSLT;
+			if (err != -EINVAL)
+				return err;
+		}
+
+		/* test for vma->size expansion to min page size */
+		err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
+		if (min_alignment > hole_size) {
+			if (!err)
+				err = -EBADSLT;
+			else if (err == -ENOSPC)
+				err = 0;
+		}
+		if (err)
+			return err;
+
+		/* test for intermediate size not expanding vma->size for large alignments */
+		err = misaligned_case(vm, mr, addr, size / 2, flags);
+		if (min_alignment > hole_size) {
+			if (!err)
+				err = -EBADSLT;
+			else if (err == -ENOSPC)
+				err = 0;
+		}
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 			  int (*func)(struct i915_address_space *vm,
 				      u64 hole_start, u64 hole_end,
@@ -1136,6 +1252,12 @@ static int igt_ppgtt_shrink_boom(void *arg)
 	return exercise_ppgtt(arg, shrink_boom);
 }
 
+static int igt_ppgtt_misaligned_pin(void *arg)
+{
+	return exercise_ppgtt(arg, misaligned_pin);
+}
+
+
 static int sort_holes(void *priv, const struct list_head *A,
 		      const struct list_head *B)
 {
@@ -1208,6 +1330,12 @@ static int igt_ggtt_lowlevel(void *arg)
 	return exercise_ggtt(arg, lowlevel_hole);
 }
 
+static int igt_ggtt_misaligned_pin(void *arg)
+{
+	return exercise_ggtt(arg, misaligned_pin);
+}
+
+
 static int igt_ggtt_page(void *arg)
 {
 	const unsigned int count = PAGE_SIZE/sizeof(u32);
@@ -2180,12 +2308,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_ppgtt_fill),
 		SUBTEST(igt_ppgtt_shrink),
 		SUBTEST(igt_ppgtt_shrink_boom),
+		SUBTEST(igt_ppgtt_misaligned_pin),
 		SUBTEST(igt_ggtt_lowlevel),
 		SUBTEST(igt_ggtt_drunk),
 		SUBTEST(igt_ggtt_walk),
 		SUBTEST(igt_ggtt_pot),
 		SUBTEST(igt_ggtt_fill),
 		SUBTEST(igt_ggtt_page),
+		SUBTEST(igt_ggtt_misaligned_pin),
 		SUBTEST(igt_cs_tlb),
 	};
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
  (?)
@ 2022-01-18 17:50   ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-18 17:50 UTC (permalink / raw)
  To: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter
  Cc: Matthew Auld, Ramalingam C, Robert Beckett, Simon Ser,
	Pekka Paalanen, Jordan Justen, Kenneth Graunke, mesa-dev,
	Tony Ye, Slawomir Milczarek, intel-gfx, dri-devel, linux-kernel

From: Matthew Auld <matthew.auld@intel.com>

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.
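
The practical rule for userspace is simple: on DG2, choose softpin offsets
and reserve GTT VA for local-memory objects in 2MiB multiples. A minimal
sketch (standalone C, hypothetical helper; not a binding API):

#include <stdint.h>

#define DG2_GTT_MIN_ALIGNMENT	(2ull << 20)	/* 2MiB */

/* Round a proposed GTT offset or reserved VA size up to the DG2 minimum. */
static uint64_t dg2_gtt_round_up(uint64_t v)
{
	return (v + DG2_GTT_MIN_ALIGNMENT - 1) & ~(DG2_GTT_MIN_ALIGNMENT - 1);
}

Both the offset placed in drm_i915_gem_exec_object2.offset (when using
EXEC_OBJECT_PINNED) and the amount of VA set aside for the object would go
through such a helper.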

v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5e678917da70..486b7b96291e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
 	/**
 	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 	 * the user with the GTT offset at which this object will be pinned.
+	 *
 	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 	 * presumed_offset of the object.
+	 *
 	 * During execbuffer2 the kernel populates it with the value of the
 	 * current GTT offset of the object, for future presumed_offset writes.
+	 *
+	 * See struct drm_i915_gem_create_ext for the rules when dealing with
+	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+	 * minimum page sizes, like DG2.
 	 */
 	__u64 offset;
 
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 * Note that for some devices we have might have further minimum
-	 * page-size restrictions(larger than 4K), like for device local-memory.
-	 * However in general the final size here should always reflect any
-	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
-	 * extension to place the object in device local-memory.
+	 *
+	 * **DG2 64K min page size implications:**
+	 *
+	 * On discrete platforms, starting from DG2, we have to contend with GTT
+	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+	 * objects.  Specifically the hardware only supports 64K or larger GTT
+	 * page sizes for such memory. The kernel will already ensure that all
+	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+	 * sizes underneath.
+	 *
+	 * Note that the returned size here will always reflect any required
+	 * rounding up done by the kernel, i.e. 4K will now become 64K on devices
+	 * such as DG2.
+	 *
+	 * **Special DG2 GTT address alignment requirement:**
+	 *
+	 * The GTT alignment will also need to be at least 2M for such objects.
+	 *
+	 * Note that due to how the hardware implements 64K GTT page support, we
+	 * have some further complications:
+	 *
+	 *   1) The entire PDE (which covers a 2MB virtual address range) must
+	 *   contain only 64K PTEs, i.e. mixing 4K and 64K PTEs in the same
+	 *   PDE is forbidden by the hardware.
+	 *
+	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+	 *   objects.
+	 *
+	 * To keep things simple for userland, we mandate that any GTT mappings
+	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
+	 * address space and avoids userland having to cope with a needlessly
+	 * complicated PDE sharing scheme (coloring), and only affects DG2, this
+	 * is deemed to be a good compromise.
 	 */
 	__u64 size;
 	/**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for discsrete card 64K page support
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
                   ` (4 preceding siblings ...)
  (?)
@ 2022-01-18 18:02 ` Patchwork
  -1 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2022-01-18 18:02 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

== Series Details ==

Series: discsrete card 64K page support
URL   : https://patchwork.freedesktop.org/series/98996/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
3fc953fbf8ce drm/i915: enforce min GTT alignment for discrete cards
-:275: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#275: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:457:
+						if (offset < hole_start + aligned_size)

-:287: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#287: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:481:
+						if (offset + aligned_size > hole_end)

-:305: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#305: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:497:
+						if (offset < hole_start + aligned_size)

-:317: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#317: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:520:
+						if (offset + aligned_size > hole_end)

-:335: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#335: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:536:
+						if (offset < hole_start + aligned_size)

-:347: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#347: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:560:
+						if (offset + aligned_size > hole_end)

-:365: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#365: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:576:
+						if (offset < hole_start + aligned_size)

-:377: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#377: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:599:
+						if (offset + aligned_size > hole_end)

total: 0 errors, 8 warnings, 0 checks, 428 lines checked
a68ac5de3a3c drm/i915: support 64K GTT pages for discrete cards
f8049933a42f drm/i915: add gtt misalignment test
-:157: CHECK:LINE_SPACING: Please don't use multiple blank lines
#157: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:1260:
+
+

-:170: CHECK:LINE_SPACING: Please don't use multiple blank lines
#170: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:1338:
+
+

total: 0 errors, 0 warnings, 2 checks, 170 lines checked
b496821c4212 drm/i915/uapi: document behaviour for DG2 64K support



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for discsrete card 64K page support
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
                   ` (5 preceding siblings ...)
  (?)
@ 2022-01-18 18:03 ` Patchwork
  -1 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2022-01-18 18:03 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

== Series Details ==

Series: discsrete card 64K page support
URL   : https://patchwork.freedesktop.org/series/98996/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for discsrete card 64K page support
  2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
                   ` (6 preceding siblings ...)
  (?)
@ 2022-01-18 18:34 ` Patchwork
  -1 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2022-01-18 18:34 UTC (permalink / raw)
  To: Robert Beckett; +Cc: intel-gfx

== Series Details ==

Series: discsrete card 64K page support
URL   : https://patchwork.freedesktop.org/series/98996/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11094 -> Patchwork_22016
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22016 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22016, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/index.html

Participating hosts (46 -> 43)
------------------------------

  Additional (2): fi-kbl-soraka fi-icl-u2 
  Missing    (5): shard-tglu fi-bsw-cyan shard-rkl shard-dg1 fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22016:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gtt:
    - fi-skl-guc:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-skl-guc/igt@i915_selftest@live@gtt.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-skl-guc/igt@i915_selftest@live@gtt.html
    - fi-kbl-8809g:       [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-kbl-8809g/igt@i915_selftest@live@gtt.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-8809g/igt@i915_selftest@live@gtt.html
    - fi-tgl-1115g4:      [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-tgl-1115g4/igt@i915_selftest@live@gtt.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-tgl-1115g4/igt@i915_selftest@live@gtt.html
    - fi-cfl-8700k:       [PASS][7] -> [DMESG-FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-cfl-8700k/igt@i915_selftest@live@gtt.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-cfl-8700k/igt@i915_selftest@live@gtt.html
    - fi-cfl-guc:         [PASS][9] -> [DMESG-FAIL][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-cfl-guc/igt@i915_selftest@live@gtt.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-cfl-guc/igt@i915_selftest@live@gtt.html
    - fi-skl-6700k2:      [PASS][11] -> [DMESG-FAIL][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-skl-6700k2/igt@i915_selftest@live@gtt.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-skl-6700k2/igt@i915_selftest@live@gtt.html
    - fi-rkl-11600:       [PASS][13] -> [DMESG-FAIL][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-rkl-11600/igt@i915_selftest@live@gtt.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-rkl-11600/igt@i915_selftest@live@gtt.html
    - fi-bsw-kefka:       [PASS][15] -> [DMESG-FAIL][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-bsw-kefka/igt@i915_selftest@live@gtt.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-bsw-kefka/igt@i915_selftest@live@gtt.html
    - fi-kbl-7567u:       [PASS][17] -> [DMESG-FAIL][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-kbl-7567u/igt@i915_selftest@live@gtt.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-7567u/igt@i915_selftest@live@gtt.html
    - fi-glk-j4005:       [PASS][19] -> [DMESG-FAIL][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-glk-j4005/igt@i915_selftest@live@gtt.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-glk-j4005/igt@i915_selftest@live@gtt.html
    - fi-bsw-nick:        [PASS][21] -> [DMESG-FAIL][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-bsw-nick/igt@i915_selftest@live@gtt.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-bsw-nick/igt@i915_selftest@live@gtt.html
    - fi-cfl-8109u:       [PASS][23] -> [DMESG-FAIL][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-cfl-8109u/igt@i915_selftest@live@gtt.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-cfl-8109u/igt@i915_selftest@live@gtt.html
    - fi-bxt-dsi:         [PASS][25] -> [DMESG-FAIL][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-bxt-dsi/igt@i915_selftest@live@gtt.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-bxt-dsi/igt@i915_selftest@live@gtt.html
    - fi-icl-u2:          NOTRUN -> [DMESG-FAIL][27]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@i915_selftest@live@gtt.html
    - fi-cml-u2:          [PASS][28] -> [DMESG-FAIL][29]
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-cml-u2/igt@i915_selftest@live@gtt.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-cml-u2/igt@i915_selftest@live@gtt.html
    - fi-glk-dsi:         [PASS][30] -> [DMESG-FAIL][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-glk-dsi/igt@i915_selftest@live@gtt.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-glk-dsi/igt@i915_selftest@live@gtt.html
    - fi-ivb-3770:        [PASS][32] -> [DMESG-FAIL][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-ivb-3770/igt@i915_selftest@live@gtt.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-ivb-3770/igt@i915_selftest@live@gtt.html
    - fi-rkl-guc:         [PASS][34] -> [DMESG-FAIL][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-rkl-guc/igt@i915_selftest@live@gtt.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-rkl-guc/igt@i915_selftest@live@gtt.html
    - fi-kbl-x1275:       [PASS][36] -> [DMESG-FAIL][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-kbl-x1275/igt@i915_selftest@live@gtt.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-x1275/igt@i915_selftest@live@gtt.html
    - fi-kbl-7500u:       [PASS][38] -> [DMESG-FAIL][39]
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-kbl-7500u/igt@i915_selftest@live@gtt.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-7500u/igt@i915_selftest@live@gtt.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@gtt:
    - {bat-jsl-2}:        [PASS][40] -> [DMESG-FAIL][41]
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/bat-jsl-2/igt@i915_selftest@live@gtt.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/bat-jsl-2/igt@i915_selftest@live@gtt.html
    - {fi-ehl-2}:         [PASS][42] -> [DMESG-FAIL][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-ehl-2/igt@i915_selftest@live@gtt.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-ehl-2/igt@i915_selftest@live@gtt.html
    - {bat-adlp-6}:       [PASS][44] -> [DMESG-FAIL][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/bat-adlp-6/igt@i915_selftest@live@gtt.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/bat-adlp-6/igt@i915_selftest@live@gtt.html

  
Known issues
------------

  Here are the changes found in Patchwork_22016 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_cs_nop@fork-gfx0:
    - fi-icl-u2:          NOTRUN -> [SKIP][46] ([fdo#109315]) +17 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@amdgpu/amd_cs_nop@fork-gfx0.html

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
    - fi-snb-2600:        NOTRUN -> [SKIP][47] ([fdo#109271]) +17 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-snb-2600/igt@amdgpu/amd_cs_nop@sync-fork-compute0.html

  * igt@gem_exec_fence@basic-busy@bcs0:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][48] ([fdo#109271]) +8 similar issues
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-soraka/igt@gem_exec_fence@basic-busy@bcs0.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][49] ([fdo#109271] / [i915#2190])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html
    - fi-icl-u2:          NOTRUN -> [SKIP][50] ([i915#2190])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - fi-icl-u2:          NOTRUN -> [SKIP][51] ([i915#4613]) +3 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@gem_lmem_swapping@parallel-random-engines.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][52] ([fdo#109271] / [i915#4613]) +3 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-soraka/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][53] ([i915#1886] / [i915#2291])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][54] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-soraka/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          NOTRUN -> [SKIP][55] ([fdo#111827]) +8 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-icl-u2:          NOTRUN -> [SKIP][56] ([fdo#109278]) +2 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-icl-u2:          NOTRUN -> [SKIP][57] ([fdo#109285])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][58] ([fdo#109271] / [i915#533])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-kbl-soraka/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@prime_vgem@basic-userptr:
    - fi-icl-u2:          NOTRUN -> [SKIP][59] ([i915#3301])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-icl-u2/igt@prime_vgem@basic-userptr.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_heartbeat:
    - {fi-tgl-dsi}:       [INCOMPLETE][60] -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-tgl-dsi/igt@i915_selftest@live@gt_heartbeat.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-tgl-dsi/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          [DMESG-FAIL][62] ([i915#4494]) -> [PASS][63]
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
    - fi-snb-2600:        [INCOMPLETE][64] ([i915#3921]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4898]: https://gitlab.freedesktop.org/drm/intel/issues/4898
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * Linux: CI_DRM_11094 -> Patchwork_22016

  CI-20190529: 20190529
  CI_DRM_11094: 6ce31c986ee8beaa0f98fd4e200b7a421fd4adf9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6327: 0d559158c2d3b5723abbfc2cb4b04532e28663b2 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22016: b496821c4212562b294299f4557c932e3a6ec826 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

b496821c4212 drm/i915/uapi: document behaviour for DG2 64K support
f8049933a42f drm/i915: add gtt misalignment test
a68ac5de3a3c drm/i915: support 64K GTT pages for discrete cards
3fc953fbf8ce drm/i915: enforce min GTT alignment for discrete cards

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22016/index.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support
  2022-01-18 17:50   ` Robert Beckett
  (?)
@ 2022-01-19 18:36     ` Jordan Justen
  -1 siblings, 0 replies; 50+ messages in thread
From: Jordan Justen @ 2022-01-19 18:36 UTC (permalink / raw)
  To: Robert Beckett, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Matthew Auld, Ramalingam C, Robert Beckett, Simon Ser,
	Pekka Paalanen, Kenneth Graunke, mesa-dev, Tony Ye,
	Slawomir Milczarek, intel-gfx, dri-devel, linux-kernel

Robert Beckett <bob.beckett@collabora.com> writes:

> From: Matthew Auld <matthew.auld@intel.com>
>
> On discrete platforms like DG2, we need to support a minimum page size
> of 64K when dealing with device local-memory. This is quite tricky for
> various reasons, so try to document the new implicit uapi for this.
>
> v2: Fixed suggestions on formatting [Daniel]
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> cc: Simon Ser <contact@emersion.fr>
> cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: mesa-dev@lists.freedesktop.org
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
> ---
>  include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 5e678917da70..486b7b96291e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
>  	/**
>  	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
>  	 * the user with the GTT offset at which this object will be pinned.
> +	 *
>  	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
>  	 * presumed_offset of the object.
> +	 *
>  	 * During execbuffer2 the kernel populates it with the value of the
>  	 * current GTT offset of the object, for future presumed_offset writes.
> +	 *
> +	 * See struct drm_i915_gem_create_ext for the rules when dealing with
> +	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
> +	 * minimum page sizes, like DG2.
>  	 */
>  	__u64 offset;
>  
> @@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
>  	 *
>  	 * The (page-aligned) allocated size for the object will be returned.
>  	 *
> -	 * Note that for some devices we have might have further minimum
> -	 * page-size restrictions(larger than 4K), like for device local-memory.
> -	 * However in general the final size here should always reflect any
> -	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> -	 * extension to place the object in device local-memory.
> +	 *
> +	 * **DG2 64K min page size implications:**

Long term, I'm not sure that the "**" (for emphasis) is needed here or
below. It's interesting at the moment, but will be just another thing
baked into the kernel/user code in a month from now. :)

> +	 *
> +	 * On discrete platforms, starting from DG2, we have to contend with GTT
> +	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
> +	 * objects.  Specifically the hardware only supports 64K or larger GTT
> +	 * page sizes for such memory. The kernel will already ensure that all
> +	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
> +	 * sizes underneath.
> +	 *
> +	 * Note that the returned size here will always reflect any required
> +	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
> +	 * such as DG2.
> +	 *
> +	 * **Special DG2 GTT address alignment requirement:**
> +	 *
> +	 * The GTT alignment will also need be at least 2M for  such objects.
> +	 *
> +	 * Note that due to how the hardware implements 64K GTT page support, we
> +	 * have some further complications:
> +	 *
> +	 *   1) The entire PDE(which covers a 2MB virtual address range), must
> +	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
> +	 *   PDE is forbidden by the hardware.
> +	 *
> +	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
> +	 *   objects.
> +	 *
> +	 * To keep things simple for userland, we mandate that any GTT mappings
> +	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
> +	 * address space and avoids userland having to copy any needlessly
> +	 * complicated PDE sharing scheme (coloring) and only affects GD2, this
> +	 * id deemed to be a good compromise.

typos: GD2, id

Isn't much of this more relevant to the vma offset at exec time? Is
there actually any new restriction on the size field during buffer
creation?

I see Matthew references these notes from the offset comments, so if the
kernel devs prefer it here, then you can add my Acked-by on this patch.

-Jordan

>  	 */
>  	__u64 size;
>  	/**
> -- 
> 2.25.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support
  2022-01-19 18:36     ` Jordan Justen
  (?)
@ 2022-01-19 19:49       ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-19 19:49 UTC (permalink / raw)
  To: Jordan Justen, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, David Airlie, Daniel Vetter
  Cc: Matthew Auld, Ramalingam C, Simon Ser, Pekka Paalanen,
	Kenneth Graunke, mesa-dev, Tony Ye, Slawomir Milczarek,
	intel-gfx, dri-devel, linux-kernel



On 19/01/2022 18:36, Jordan Justen wrote:
> Robert Beckett <bob.beckett@collabora.com> writes:
> 
>> From: Matthew Auld <matthew.auld@intel.com>
>>
>> On discrete platforms like DG2, we need to support a minimum page size
>> of 64K when dealing with device local-memory. This is quite tricky for
>> various reasons, so try to document the new implicit uapi for this.
>>
>> v2: Fixed suggestions on formatting [Daniel]
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> cc: Simon Ser <contact@emersion.fr>
>> cc: Pekka Paalanen <ppaalanen@gmail.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Kenneth Graunke <kenneth@whitecape.org>
>> Cc: mesa-dev@lists.freedesktop.org
>> Cc: Tony Ye <tony.ye@intel.com>
>> Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
>> ---
>>   include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
>>   1 file changed, 39 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 5e678917da70..486b7b96291e 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
>>   	/**
>>   	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
>>   	 * the user with the GTT offset at which this object will be pinned.
>> +	 *
>>   	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
>>   	 * presumed_offset of the object.
>> +	 *
>>   	 * During execbuffer2 the kernel populates it with the value of the
>>   	 * current GTT offset of the object, for future presumed_offset writes.
>> +	 *
>> +	 * See struct drm_i915_gem_create_ext for the rules when dealing with
>> +	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
>> +	 * minimum page sizes, like DG2.
>>   	 */
>>   	__u64 offset;
>>   
>> @@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
>>   	 *
>>   	 * The (page-aligned) allocated size for the object will be returned.
>>   	 *
>> -	 * Note that for some devices we have might have further minimum
>> -	 * page-size restrictions(larger than 4K), like for device local-memory.
>> -	 * However in general the final size here should always reflect any
>> -	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
>> -	 * extension to place the object in device local-memory.
>> +	 *
>> +	 * **DG2 64K min page size implications:**
> 
> Long term, I'm not sure that the "**" (for emphasis) is needed here or
> below. It's interesting at the moment, but will be just another thing
> baked into the kernel/user code in a month from now. :)

fair point, I'll make it less shouty

> 
>> +	 *
>> +	 * On discrete platforms, starting from DG2, we have to contend with GTT
>> +	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
>> +	 * objects.  Specifically the hardware only supports 64K or larger GTT
>> +	 * page sizes for such memory. The kernel will already ensure that all
>> +	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
>> +	 * sizes underneath.
>> +	 *
>> +	 * Note that the returned size here will always reflect any required
>> +	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
>> +	 * such as DG2.
>> +	 *
>> +	 * **Special DG2 GTT address alignment requirement:**
>> +	 *
>> +	 * The GTT alignment will also need be at least 2M for  such objects.
>> +	 *
>> +	 * Note that due to how the hardware implements 64K GTT page support, we
>> +	 * have some further complications:
>> +	 *
>> +	 *   1) The entire PDE(which covers a 2MB virtual address range), must
>> +	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
>> +	 *   PDE is forbidden by the hardware.
>> +	 *
>> +	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
>> +	 *   objects.
>> +	 *
>> +	 * To keep things simple for userland, we mandate that any GTT mappings
>> +	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
>> +	 * address space and avoids userland having to copy any needlessly
>> +	 * complicated PDE sharing scheme (coloring) and only affects GD2, this
>> +	 * id deemed to be a good compromise.
> 
> typos: GD2, id

thanks

> 
> Isn't much of this more relavent to the vma offset at exec time? Is
> there actually any new restriction on the size field during buffer
> creation?

No new restriction on size, just placement, which mesa is already doing.
The request for ack was just to get an ack from mesa folks that they are 
happy with the mandatory 2MB alignment for DG2 vma.
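
For illustration only (the names below are made up, nothing here is from the
series): in a userspace softpin allocator the placement rule boils down to
something like

        #include <stdint.h>

        /* sketch: DG2 lmem placement - 2MB-aligned start, 2MB-rounded span */
        static uint64_t place_lmem_vma(uint64_t *cursor, uint64_t bo_size)
        {
                const uint64_t align = 2ull << 20;      /* 2MB */
                uint64_t offset = (*cursor + align - 1) & ~(align - 1);

                *cursor = offset + ((bo_size + align - 1) & ~(align - 1));
                return offset;
        }

i.e. the object's allocated size is untouched, only where (and how much of)
the GTT it occupies changes.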

> 
> I see Matthew references these notes from the offset comments, so if the
> kernel devs prefer it here, then you can add my Acked-by on this patch.

thanks!

> 
> -Jordan
> 
>>   	 */
>>   	__u64 size;
>>   	/**
>> -- 
>> 2.25.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-18 17:50   ` Robert Beckett
  (?)
@ 2022-01-20 11:46     ` Ramalingam C
  -1 siblings, 0 replies; 50+ messages in thread
From: Ramalingam C @ 2022-01-20 11:46 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, Matthew Auld, intel-gfx, dri-devel,
	linux-kernel

On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> For local-memory objects we need to align the GTT addresses
> to 64K, both for the ppgtt and ggtt.
> 
> We need to support vm->min_alignment > 4K, depending
> on the vm itself and the type of object we are inserting.
> With this in mind update the GTT selftests to take this
> into account.
> 
> For DG2 we further align and pad lmem object GTT addresses
> to 2MB to ensure PDEs contain consistent page sizes as
> required by the HW.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>  drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>  drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>  5 files changed, 115 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index c08f766e6e15..7fee95a65414 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -39,6 +39,7 @@ struct tiled_blits {
>  	struct blit_buffer scratch;
>  	struct i915_vma *batch;
>  	u64 hole;
> +	u64 align;
>  	u32 width;
>  	u32 height;
>  };
> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>  		goto err_free;
>  	}
>  
> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
> +
> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>  	hole_size *= 2; /* room to maneuver */
> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> +	hole_size += 2 * t->align; /* padding on either side */
>  
>  	mutex_lock(&t->ce->vm->mutex);
>  	memset(&hole, 0, sizeof(hole));
>  	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
> +					  hole_size, t->align,
> +					  I915_COLOR_UNEVICTABLE,
>  					  0, U64_MAX,
>  					  DRM_MM_INSERT_BEST);
>  	if (!err)
> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>  		goto err_put;
>  	}
>  
> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> +	t->hole = hole.start + t->align;
>  	pr_info("Using hole at %llx\n", t->hole);
>  
>  	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>  static int tiled_blits_prepare(struct tiled_blits *t,
>  			       struct rnd_state *prng)
>  {
> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>  	u32 *map;
>  	int err;
>  	int i;
> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>  
>  static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>  {
> -	u64 offset =
> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>  	int err;
>  
>  	/* We want to check position invariant tiling across GTT eviction */
> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>  
>  	/* Reposition so that we overlap the old addresses, and slightly off */
>  	err = tiled_blit(t,
> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> +			 &t->buffers[2], t->hole + t->align,
>  			 &t->buffers[1], t->hole + 3 * offset / 2);
>  	if (err)
>  		return err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 46be4197b93f..7c92b25c0f26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>  
>  	GEM_BUG_ON(!vm->total);
>  	drm_mm_init(&vm->mm, 0, vm->total);
> +
> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> +		 ARRAY_SIZE(vm->min_alignment));
> +
> +	if (HAS_64K_PAGES(vm->i915)) {
> +		if (IS_DG2(vm->i915)) {
I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
not only for DG2.
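Untested sketch of what I mean, using only the identifiers already in
this hunk:

        if (HAS_64K_PAGES(vm->i915)) {
                vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
                vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
        }
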
> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
> +		} else {
> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +		}
> +	}
> +
>  	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>  
>  	INIT_LIST_HEAD(&vm->bound_list);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 8073438b67c8..b8da2514d601 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -29,6 +29,8 @@
>  #include "i915_selftest.h"
>  #include "i915_vma_resource.h"
>  #include "i915_vma_types.h"
> +#include "i915_params.h"
> +#include "intel_memory_region.h"
>  
>  #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>  
> @@ -223,6 +225,7 @@ struct i915_address_space {
>  	struct device *dma;
>  	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>  	u64 reserved;		/* size addr space reserved */
> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>  
>  	unsigned int bind_async_flags;
>  
> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>  	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>  }
>  
> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
> +					enum intel_memory_type type)
> +{
> +	return vm->min_alignment[type];
> +}
> +
>  static inline bool
>  i915_vm_has_cache_coloring(struct i915_address_space *vm)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f15c3298112..9ac92e7a3566 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>  	}
>  
>  	color = 0;
> +
> +	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
> +		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
> +		/*
> +		 * DG2 can not have different sized pages in any given PDE (2MB range).
> +		 * Keeping things simple, we force any lmem object to reserve
> +		 * 2MB chunks, preventing any smaller pages being used alongside
> +		 */
> +		if (IS_DG2(vma->vm->i915)) {
Similarly, here we don't need a special case for DG2.
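An untested sketch (identifiers from this hunk only); the whole block
could then reduce to:

        if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
                alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
                size = round_up(size, I915_GTT_PAGE_SIZE_2M);
        }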

Ram
> +			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
> +			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
> +		}
> +	}
> +
>  	if (i915_vm_has_cache_coloring(vma->vm))
>  		color = vma->obj->cache_level;
>  
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 076d860ce01a..2f3f0c01786b 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			 u64 hole_start, u64 hole_end,
>  			 unsigned long end_time)
>  {
> +	const unsigned int min_alignment =
> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>  	I915_RND_STATE(seed_prng);
>  	struct i915_vma_resource *mock_vma_res;
>  	unsigned int size;
> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		I915_RND_SUBSTATE(prng, seed_prng);
>  		struct drm_i915_gem_object *obj;
>  		unsigned int *order, count, n;
> -		u64 hole_size;
> +		u64 hole_size, aligned_size;
>  
> -		hole_size = (hole_end - hole_start) >> size;
> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
> +		hole_size = (hole_end - hole_start) >> aligned_size;
>  		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>  			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>  		count = hole_size >> 1;
> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		}
>  		GEM_BUG_ON(!order);
>  
> -		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
> -		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
> +		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
> +		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
>  
>  		/* Ignore allocation failures (i.e. don't report them as
>  		 * a test failure) as we are purposefully allocating very
> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		}
>  
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  			intel_wakeref_t wakeref;
>  
> -			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> +			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>  
>  			if (igt_timeout(end_time,
>  					"%s timed out before %d/%d\n",
> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			}
>  
>  			mock_vma_res->bi.pages = obj->mm.pages;
> -			mock_vma_res->node_size = BIT_ULL(size);
> +			mock_vma_res->node_size = BIT_ULL(aligned_size);
>  			mock_vma_res->start = addr;
>  
>  			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  
>  		i915_random_reorder(order, count, &prng);
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  			intel_wakeref_t wakeref;
>  
>  			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>  {
>  	const u64 hole_size = hole_end - hole_start;
>  	struct drm_i915_gem_object *obj;
> +	const unsigned int min_alignment =
> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>  	const unsigned long max_pages =
> -		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
> +		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
>  	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>  	unsigned long npages, prime, flags;
>  	struct i915_vma *vma;
> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
>  
>  				offset = p->offset;
>  				list_for_each_entry(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					err = i915_vma_pin(vma, 0, 0, offset | flags);
> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					i915_vma_unpin(vma);
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					if (!drm_mm_node_allocated(&vma->node) ||
> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					}
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry_reverse(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					err = i915_vma_pin(vma, 0, 0, offset | flags);
> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					i915_vma_unpin(vma);
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry_reverse(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					if (!drm_mm_node_allocated(&vma->node) ||
> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>  					}
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  			}
> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>  	const u64 hole_size = hole_end - hole_start;
>  	const unsigned long max_pages =
>  		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
> +	unsigned long min_alignment;
>  	unsigned long flags;
>  	u64 size;
>  
> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	for_each_prime_number_from(size, 1, max_pages) {
>  		struct drm_i915_gem_object *obj;
>  		struct i915_vma *vma;
> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>  
>  		for (addr = hole_start;
>  		     addr + obj->base.size < hole_end;
> -		     addr += obj->base.size) {
> +		     addr += round_up(obj->base.size, min_alignment)) {
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
>  				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>  {
>  	struct drm_i915_gem_object *obj;
>  	struct i915_vma *vma;
> +	unsigned int min_alignment;
>  	unsigned long flags;
>  	unsigned int pot;
>  	int err = 0;
> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
>  	if (IS_ERR(obj))
>  		return PTR_ERR(obj);
> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>  
>  	/* Insert a pair of pages across every pot boundary within the hole */
>  	for (pot = fls64(hole_end - 1) - 1;
> -	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
> +	     pot > ilog2(2 * min_alignment);
>  	     pot--) {
>  		u64 step = BIT_ULL(pot);
>  		u64 addr;
>  
> -		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> -		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> +		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
> +		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
>  		     addr += step) {
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>  		      unsigned long end_time)
>  {
>  	I915_RND_STATE(prng);
> +	unsigned int min_alignment;
>  	unsigned int size;
>  	unsigned long flags;
>  
> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	/* Keep creating larger objects until one cannot fit into the hole */
>  	for (size = 12; (hole_end - hole_start) >> size; size++) {
>  		struct drm_i915_gem_object *obj;
>  		unsigned int *order, count, n;
>  		struct i915_vma *vma;
> -		u64 hole_size;
> +		u64 hole_size, aligned_size;
>  		int err = -ENODEV;
>  
> -		hole_size = (hole_end - hole_start) >> size;
> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
> +		hole_size = (hole_end - hole_start) >> aligned_size;
>  		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>  			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>  		count = hole_size >> 1;
> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>  		GEM_BUG_ON(vma->size != BIT_ULL(size));
>  
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
> @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
>  {
>  	struct drm_i915_gem_object *obj;
>  	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
> +	unsigned int min_alignment;
>  	unsigned int order = 12;
>  	LIST_HEAD(objects);
>  	int err = 0;
>  	u64 addr;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	/* Keep creating larger objects until one cannot fit into the hole */
>  	for (addr = hole_start; addr < hole_end; ) {
>  		struct i915_vma *vma;
> @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>  		}
>  
>  		i915_vma_unpin(vma);
> -		addr += size;
> +		addr += round_up(size, min_alignment);
>  
>  		/*
>  		 * Since we are injecting allocation faults at random intervals,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 11:46     ` Ramalingam C
  0 siblings, 0 replies; 50+ messages in thread
From: Ramalingam C @ 2022-01-20 11:46 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Tvrtko Ursulin, dri-devel, David Airlie, intel-gfx, linux-kernel,
	Matthew Auld, Rodrigo Vivi

On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> For local-memory objects we need to align the GTT addresses
> to 64K, both for the ppgtt and ggtt.
> 
> We need to support vm->min_alignment > 4K, depending
> on the vm itself and the type of object we are inserting.
> With this in mind update the GTT selftests to take this
> into account.
> 
> For DG2 we further align and pad lmem object GTT addresses
> to 2MB to ensure PDEs contain consistent page sizes as
> required by the HW.
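
To make the padding concrete (illustrative numbers, not from the patch):
a 68K lmem object is backed by two 64K pages (128K of memory), and its
vma is then placed at a 2M-aligned GTT address with a 2M-rounded size,
so the PDE covering it never mixes 4K and 64K PTEs:

	alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);	/* >= 2M */
	size = round_up(128 * 1024, I915_GTT_PAGE_SIZE_2M);	/* -> 2M */
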
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>  drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>  drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>  5 files changed, 115 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index c08f766e6e15..7fee95a65414 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -39,6 +39,7 @@ struct tiled_blits {
>  	struct blit_buffer scratch;
>  	struct i915_vma *batch;
>  	u64 hole;
> +	u64 align;
>  	u32 width;
>  	u32 height;
>  };
> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>  		goto err_free;
>  	}
>  
> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
> +
> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>  	hole_size *= 2; /* room to maneuver */
> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> +	hole_size += 2 * t->align; /* padding on either side */
>  
>  	mutex_lock(&t->ce->vm->mutex);
>  	memset(&hole, 0, sizeof(hole));
>  	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
> +					  hole_size, t->align,
> +					  I915_COLOR_UNEVICTABLE,
>  					  0, U64_MAX,
>  					  DRM_MM_INSERT_BEST);
>  	if (!err)
> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>  		goto err_put;
>  	}
>  
> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> +	t->hole = hole.start + t->align;
>  	pr_info("Using hole at %llx\n", t->hole);
>  
>  	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>  static int tiled_blits_prepare(struct tiled_blits *t,
>  			       struct rnd_state *prng)
>  {
> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>  	u32 *map;
>  	int err;
>  	int i;
> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>  
>  static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>  {
> -	u64 offset =
> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>  	int err;
>  
>  	/* We want to check position invariant tiling across GTT eviction */
> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>  
>  	/* Reposition so that we overlap the old addresses, and slightly off */
>  	err = tiled_blit(t,
> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> +			 &t->buffers[2], t->hole + t->align,
>  			 &t->buffers[1], t->hole + 3 * offset / 2);
>  	if (err)
>  		return err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 46be4197b93f..7c92b25c0f26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>  
>  	GEM_BUG_ON(!vm->total);
>  	drm_mm_init(&vm->mm, 0, vm->total);
> +
> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> +		 ARRAY_SIZE(vm->min_alignment));
> +
> +	if (HAS_64K_PAGES(vm->i915)) {
> +		if (IS_DG2(vm->i915)) {
I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
not only for DG2.
> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
> +		} else {
> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +		}
> +	}
> +
>  	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>  
>  	INIT_LIST_HEAD(&vm->bound_list);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 8073438b67c8..b8da2514d601 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -29,6 +29,8 @@
>  #include "i915_selftest.h"
>  #include "i915_vma_resource.h"
>  #include "i915_vma_types.h"
> +#include "i915_params.h"
> +#include "intel_memory_region.h"
>  
>  #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>  
> @@ -223,6 +225,7 @@ struct i915_address_space {
>  	struct device *dma;
>  	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>  	u64 reserved;		/* size addr space reserved */
> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>  
>  	unsigned int bind_async_flags;
>  
> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>  	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>  }
>  
> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
> +					enum intel_memory_type type)
> +{
> +	return vm->min_alignment[type];
> +}
> +
>  static inline bool
>  i915_vm_has_cache_coloring(struct i915_address_space *vm)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f15c3298112..9ac92e7a3566 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>  	}
>  
>  	color = 0;
> +
> +	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
> +		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
> +		/*
> +		 * DG2 can not have different sized pages in any given PDE (2MB range).
> +		 * Keeping things simple, we force any lmem object to reserve
> +		 * 2MB chunks, preventing any smaller pages being used alongside
> +		 */
> +		if (IS_DG2(vma->vm->i915)) {
Similarly, here we don't need a special case for DG2.

Ram
> +			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
> +			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
> +		}
> +	}
> +
>  	if (i915_vm_has_cache_coloring(vma->vm))
>  		color = vma->obj->cache_level;
>  
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 076d860ce01a..2f3f0c01786b 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			 u64 hole_start, u64 hole_end,
>  			 unsigned long end_time)
>  {
> +	const unsigned int min_alignment =
> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>  	I915_RND_STATE(seed_prng);
>  	struct i915_vma_resource *mock_vma_res;
>  	unsigned int size;
> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		I915_RND_SUBSTATE(prng, seed_prng);
>  		struct drm_i915_gem_object *obj;
>  		unsigned int *order, count, n;
> -		u64 hole_size;
> +		u64 hole_size, aligned_size;
>  
> -		hole_size = (hole_end - hole_start) >> size;
> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
> +		hole_size = (hole_end - hole_start) >> aligned_size;
>  		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>  			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>  		count = hole_size >> 1;
> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		}
>  		GEM_BUG_ON(!order);
>  
> -		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
> -		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
> +		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
> +		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
>  
>  		/* Ignore allocation failures (i.e. don't report them as
>  		 * a test failure) as we are purposefully allocating very
> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		}
>  
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  			intel_wakeref_t wakeref;
>  
> -			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> +			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>  
>  			if (igt_timeout(end_time,
>  					"%s timed out before %d/%d\n",
> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			}
>  
>  			mock_vma_res->bi.pages = obj->mm.pages;
> -			mock_vma_res->node_size = BIT_ULL(size);
> +			mock_vma_res->node_size = BIT_ULL(aligned_size);
>  			mock_vma_res->start = addr;
>  
>  			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  
>  		i915_random_reorder(order, count, &prng);
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  			intel_wakeref_t wakeref;
>  
>  			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>  {
>  	const u64 hole_size = hole_end - hole_start;
>  	struct drm_i915_gem_object *obj;
> +	const unsigned int min_alignment =
> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>  	const unsigned long max_pages =
> -		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
> +		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
>  	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>  	unsigned long npages, prime, flags;
>  	struct i915_vma *vma;
> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
>  
>  				offset = p->offset;
>  				list_for_each_entry(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					err = i915_vma_pin(vma, 0, 0, offset | flags);
> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					i915_vma_unpin(vma);
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					if (!drm_mm_node_allocated(&vma->node) ||
> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					}
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry_reverse(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					err = i915_vma_pin(vma, 0, 0, offset | flags);
> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					i915_vma_unpin(vma);
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry_reverse(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					if (!drm_mm_node_allocated(&vma->node) ||
> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>  					}
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  			}
> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>  	const u64 hole_size = hole_end - hole_start;
>  	const unsigned long max_pages =
>  		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
> +	unsigned long min_alignment;
>  	unsigned long flags;
>  	u64 size;
>  
> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	for_each_prime_number_from(size, 1, max_pages) {
>  		struct drm_i915_gem_object *obj;
>  		struct i915_vma *vma;
> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>  
>  		for (addr = hole_start;
>  		     addr + obj->base.size < hole_end;
> -		     addr += obj->base.size) {
> +		     addr += round_up(obj->base.size, min_alignment)) {
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
>  				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>  {
>  	struct drm_i915_gem_object *obj;
>  	struct i915_vma *vma;
> +	unsigned int min_alignment;
>  	unsigned long flags;
>  	unsigned int pot;
>  	int err = 0;
> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
>  	if (IS_ERR(obj))
>  		return PTR_ERR(obj);
> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>  
>  	/* Insert a pair of pages across every pot boundary within the hole */
>  	for (pot = fls64(hole_end - 1) - 1;
> -	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
> +	     pot > ilog2(2 * min_alignment);
>  	     pot--) {
>  		u64 step = BIT_ULL(pot);
>  		u64 addr;
>  
> -		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> -		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> +		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
> +		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
>  		     addr += step) {
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>  		      unsigned long end_time)
>  {
>  	I915_RND_STATE(prng);
> +	unsigned int min_alignment;
>  	unsigned int size;
>  	unsigned long flags;
>  
> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	/* Keep creating larger objects until one cannot fit into the hole */
>  	for (size = 12; (hole_end - hole_start) >> size; size++) {
>  		struct drm_i915_gem_object *obj;
>  		unsigned int *order, count, n;
>  		struct i915_vma *vma;
> -		u64 hole_size;
> +		u64 hole_size, aligned_size;
>  		int err = -ENODEV;
>  
> -		hole_size = (hole_end - hole_start) >> size;
> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
> +		hole_size = (hole_end - hole_start) >> aligned_size;
>  		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>  			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>  		count = hole_size >> 1;
> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>  		GEM_BUG_ON(vma->size != BIT_ULL(size));
>  
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
> @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
>  {
>  	struct drm_i915_gem_object *obj;
>  	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
> +	unsigned int min_alignment;
>  	unsigned int order = 12;
>  	LIST_HEAD(objects);
>  	int err = 0;
>  	u64 addr;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	/* Keep creating larger objects until one cannot fit into the hole */
>  	for (addr = hole_start; addr < hole_end; ) {
>  		struct i915_vma *vma;
> @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>  		}
>  
>  		i915_vma_unpin(vma);
> -		addr += size;
> +		addr += round_up(size, min_alignment);
>  
>  		/*
>  		 * Since we are injecting allocation faults at random intervals,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 11:46     ` Ramalingam C
  0 siblings, 0 replies; 50+ messages in thread
From: Ramalingam C @ 2022-01-20 11:46 UTC (permalink / raw)
  To: Robert Beckett
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Matthew Auld

On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> For local-memory objects we need to align the GTT addresses
> to 64K, both for the ppgtt and ggtt.
> 
> We need to support vm->min_alignment > 4K, depending
> on the vm itself and the type of object we are inserting.
> With this in mind update the GTT selftests to take this
> into account.
> 
> For DG2 we further align and pad lmem object GTT addresses
> to 2MB to ensure PDEs contain consistent page sizes as
> required by the HW.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>  drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>  drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>  5 files changed, 115 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index c08f766e6e15..7fee95a65414 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -39,6 +39,7 @@ struct tiled_blits {
>  	struct blit_buffer scratch;
>  	struct i915_vma *batch;
>  	u64 hole;
> +	u64 align;
>  	u32 width;
>  	u32 height;
>  };
> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>  		goto err_free;
>  	}
>  
> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
> +
> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>  	hole_size *= 2; /* room to maneuver */
> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> +	hole_size += 2 * t->align; /* padding on either side */
>  
>  	mutex_lock(&t->ce->vm->mutex);
>  	memset(&hole, 0, sizeof(hole));
>  	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
> +					  hole_size, t->align,
> +					  I915_COLOR_UNEVICTABLE,
>  					  0, U64_MAX,
>  					  DRM_MM_INSERT_BEST);
>  	if (!err)
> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>  		goto err_put;
>  	}
>  
> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> +	t->hole = hole.start + t->align;
>  	pr_info("Using hole at %llx\n", t->hole);
>  
>  	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>  static int tiled_blits_prepare(struct tiled_blits *t,
>  			       struct rnd_state *prng)
>  {
> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>  	u32 *map;
>  	int err;
>  	int i;
> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>  
>  static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>  {
> -	u64 offset =
> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>  	int err;
>  
>  	/* We want to check position invariant tiling across GTT eviction */
> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>  
>  	/* Reposition so that we overlap the old addresses, and slightly off */
>  	err = tiled_blit(t,
> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> +			 &t->buffers[2], t->hole + t->align,
>  			 &t->buffers[1], t->hole + 3 * offset / 2);
>  	if (err)
>  		return err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 46be4197b93f..7c92b25c0f26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>  
>  	GEM_BUG_ON(!vm->total);
>  	drm_mm_init(&vm->mm, 0, vm->total);
> +
> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> +		 ARRAY_SIZE(vm->min_alignment));
> +
> +	if (HAS_64K_PAGES(vm->i915)) {
> +		if (IS_DG2(vm->i915)) {
I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
not only for DG2.
> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
> +		} else {
> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +		}
> +	}
> +
>  	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>  
>  	INIT_LIST_HEAD(&vm->bound_list);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 8073438b67c8..b8da2514d601 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -29,6 +29,8 @@
>  #include "i915_selftest.h"
>  #include "i915_vma_resource.h"
>  #include "i915_vma_types.h"
> +#include "i915_params.h"
> +#include "intel_memory_region.h"
>  
>  #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>  
> @@ -223,6 +225,7 @@ struct i915_address_space {
>  	struct device *dma;
>  	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>  	u64 reserved;		/* size addr space reserved */
> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>  
>  	unsigned int bind_async_flags;
>  
> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>  	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>  }
>  
> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
> +					enum intel_memory_type type)
> +{
> +	return vm->min_alignment[type];
> +}
> +
>  static inline bool
>  i915_vm_has_cache_coloring(struct i915_address_space *vm)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 1f15c3298112..9ac92e7a3566 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>  	}
>  
>  	color = 0;
> +
> +	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
> +		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
> +		/*
> +		 * DG2 can not have different sized pages in any given PDE (2MB range).
> +		 * Keeping things simple, we force any lmem object to reserve
> +		 * 2MB chunks, preventing any smaller pages being used alongside
> +		 */
> +		if (IS_DG2(vma->vm->i915)) {
Similarly, here we don't need a special case for DG2.

Ram
> +			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
> +			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
> +		}
> +	}
> +
>  	if (i915_vm_has_cache_coloring(vma->vm))
>  		color = vma->obj->cache_level;
>  
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 076d860ce01a..2f3f0c01786b 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			 u64 hole_start, u64 hole_end,
>  			 unsigned long end_time)
>  {
> +	const unsigned int min_alignment =
> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>  	I915_RND_STATE(seed_prng);
>  	struct i915_vma_resource *mock_vma_res;
>  	unsigned int size;
> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		I915_RND_SUBSTATE(prng, seed_prng);
>  		struct drm_i915_gem_object *obj;
>  		unsigned int *order, count, n;
> -		u64 hole_size;
> +		u64 hole_size, aligned_size;
>  
> -		hole_size = (hole_end - hole_start) >> size;
> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
> +		hole_size = (hole_end - hole_start) >> aligned_size;
>  		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>  			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>  		count = hole_size >> 1;
> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		}
>  		GEM_BUG_ON(!order);
>  
> -		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
> -		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
> +		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
> +		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
>  
>  		/* Ignore allocation failures (i.e. don't report them as
>  		 * a test failure) as we are purposefully allocating very
> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  		}
>  
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  			intel_wakeref_t wakeref;
>  
> -			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> +			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>  
>  			if (igt_timeout(end_time,
>  					"%s timed out before %d/%d\n",
> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  			}
>  
>  			mock_vma_res->bi.pages = obj->mm.pages;
> -			mock_vma_res->node_size = BIT_ULL(size);
> +			mock_vma_res->node_size = BIT_ULL(aligned_size);
>  			mock_vma_res->start = addr;
>  
>  			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>  
>  		i915_random_reorder(order, count, &prng);
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  			intel_wakeref_t wakeref;
>  
>  			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>  {
>  	const u64 hole_size = hole_end - hole_start;
>  	struct drm_i915_gem_object *obj;
> +	const unsigned int min_alignment =
> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>  	const unsigned long max_pages =
> -		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
> +		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
>  	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>  	unsigned long npages, prime, flags;
>  	struct i915_vma *vma;
> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
>  
>  				offset = p->offset;
>  				list_for_each_entry(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					err = i915_vma_pin(vma, 0, 0, offset | flags);
> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					i915_vma_unpin(vma);
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					if (!drm_mm_node_allocated(&vma->node) ||
> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					}
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry_reverse(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					err = i915_vma_pin(vma, 0, 0, offset | flags);
> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
>  					i915_vma_unpin(vma);
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  
>  				offset = p->offset;
>  				list_for_each_entry_reverse(obj, &objects, st_link) {
> +					u64 aligned_size = round_up(obj->base.size,
> +								    min_alignment);
> +
>  					vma = i915_vma_instance(obj, vm, NULL);
>  					if (IS_ERR(vma))
>  						continue;
>  
>  					if (p->step < 0) {
> -						if (offset < hole_start + obj->base.size)
> +						if (offset < hole_start + aligned_size)
>  							break;
> -						offset -= obj->base.size;
> +						offset -= aligned_size;
>  					}
>  
>  					if (!drm_mm_node_allocated(&vma->node) ||
> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>  					}
>  
>  					if (p->step > 0) {
> -						if (offset + obj->base.size > hole_end)
> +						if (offset + aligned_size > hole_end)
>  							break;
> -						offset += obj->base.size;
> +						offset += aligned_size;
>  					}
>  				}
>  			}
> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>  	const u64 hole_size = hole_end - hole_start;
>  	const unsigned long max_pages =
>  		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
> +	unsigned long min_alignment;
>  	unsigned long flags;
>  	u64 size;
>  
> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	for_each_prime_number_from(size, 1, max_pages) {
>  		struct drm_i915_gem_object *obj;
>  		struct i915_vma *vma;
> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>  
>  		for (addr = hole_start;
>  		     addr + obj->base.size < hole_end;
> -		     addr += obj->base.size) {
> +		     addr += round_up(obj->base.size, min_alignment)) {
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
>  				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>  {
>  	struct drm_i915_gem_object *obj;
>  	struct i915_vma *vma;
> +	unsigned int min_alignment;
>  	unsigned long flags;
>  	unsigned int pot;
>  	int err = 0;
> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
>  	if (IS_ERR(obj))
>  		return PTR_ERR(obj);
> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>  
>  	/* Insert a pair of pages across every pot boundary within the hole */
>  	for (pot = fls64(hole_end - 1) - 1;
> -	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
> +	     pot > ilog2(2 * min_alignment);
>  	     pot--) {
>  		u64 step = BIT_ULL(pot);
>  		u64 addr;
>  
> -		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> -		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> +		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
> +		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
>  		     addr += step) {
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>  		      unsigned long end_time)
>  {
>  	I915_RND_STATE(prng);
> +	unsigned int min_alignment;
>  	unsigned int size;
>  	unsigned long flags;
>  
> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
>  	if (i915_is_ggtt(vm))
>  		flags |= PIN_GLOBAL;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	/* Keep creating larger objects until one cannot fit into the hole */
>  	for (size = 12; (hole_end - hole_start) >> size; size++) {
>  		struct drm_i915_gem_object *obj;
>  		unsigned int *order, count, n;
>  		struct i915_vma *vma;
> -		u64 hole_size;
> +		u64 hole_size, aligned_size;
>  		int err = -ENODEV;
>  
> -		hole_size = (hole_end - hole_start) >> size;
> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
> +		hole_size = (hole_end - hole_start) >> aligned_size;
>  		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>  			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>  		count = hole_size >> 1;
> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>  		GEM_BUG_ON(vma->size != BIT_ULL(size));
>  
>  		for (n = 0; n < count; n++) {
> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>  
>  			err = i915_vma_pin(vma, 0, 0, addr | flags);
>  			if (err) {
> @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
>  {
>  	struct drm_i915_gem_object *obj;
>  	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
> +	unsigned int min_alignment;
>  	unsigned int order = 12;
>  	LIST_HEAD(objects);
>  	int err = 0;
>  	u64 addr;
>  
> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> +
>  	/* Keep creating larger objects until one cannot fit into the hole */
>  	for (addr = hole_start; addr < hole_end; ) {
>  		struct i915_vma *vma;
> @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>  		}
>  
>  		i915_vma_unpin(vma);
> -		addr += size;
> +		addr += round_up(size, min_alignment);
>  
>  		/*
>  		 * Since we are injecting allocation faults at random intervals,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support
  2022-01-18 17:50   ` Robert Beckett
  (?)
@ 2022-01-20 11:53     ` Ramalingam C
  -1 siblings, 0 replies; 50+ messages in thread
From: Ramalingam C @ 2022-01-20 11:53 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, Matthew Auld, Simon Ser,
	Pekka Paalanen, Jordan Justen, Kenneth Graunke, mesa-dev,
	Tony Ye, Slawomir Milczarek, intel-gfx, dri-devel, linux-kernel

On 2022-01-18 at 17:50:37 +0000, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> On discrete platforms like DG2, we need to support a minimum page size
> of 64K when dealing with device local-memory. This is quite tricky for
> various reasons, so try to document the new implicit uapi for this.
> 
> v2: Fixed suggestions on formatting [Daniel]
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> cc: Simon Ser <contact@emersion.fr>
> cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: mesa-dev@lists.freedesktop.org
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
> ---
>  include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 5e678917da70..486b7b96291e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
>  	/**
>  	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
>  	 * the user with the GTT offset at which this object will be pinned.
> +	 *
>  	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
>  	 * presumed_offset of the object.
> +	 *
>  	 * During execbuffer2 the kernel populates it with the value of the
>  	 * current GTT offset of the object, for future presumed_offset writes.
> +	 *
> +	 * See struct drm_i915_gem_create_ext for the rules when dealing with
> +	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
> +	 * minimum page sizes, like DG2.
>  	 */
>  	__u64 offset;
>  
> @@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
>  	 *
>  	 * The (page-aligned) allocated size for the object will be returned.
>  	 *
> -	 * Note that for some devices we have might have further minimum
> -	 * page-size restrictions(larger than 4K), like for device local-memory.
> -	 * However in general the final size here should always reflect any
> -	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> -	 * extension to place the object in device local-memory.
> +	 *
> +	 * **DG2 64K min page size implications:**
> +	 *
> +	 * On discrete platforms, starting from DG2, we have to contend with GTT
> +	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
> +	 * objects.  Specifically the hardware only supports 64K or larger GTT
> +	 * page sizes for such memory. The kernel will already ensure that all
> +	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
> +	 * sizes underneath.
> +	 *
> +	 * Note that the returned size here will always reflect any required
> +	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
> +	 * such as DG2.
> +	 *
> +	 * **Special DG2 GTT address alignment requirement:**
> +	 *
> +	 * The GTT alignment will also need be at least 2M for  such objects.
> +	 *
> +	 * Note that due to how the hardware implements 64K GTT page support, we
> +	 * have some further complications:
> +	 *
> +	 *   1) The entire PDE(which covers a 2MB virtual address range), must
> +	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
> +	 *   PDE is forbidden by the hardware.
> +	 *
> +	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
> +	 *   objects.
> +	 *
> +	 * To keep things simple for userland, we mandate that any GTT mappings
> +	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
> +	 * address space and avoids userland having to copy any needlessly
> +	 * complicated PDE sharing scheme (coloring) and only affects GD2, this
> +	 * id deemed to be a good compromise.
"only affects DG2, this is" 

Except for these typos, the patch looks good to me.
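
For reference, a minimal sketch of what the 2M mandate means for
userspace that softpins lmem objects (hypothetical helper and variable
names, assuming the usual <stdint.h> and <drm/i915_drm.h> includes):
both the pinned GTT offset and the reserved VA footprint get rounded
up to 2M.

	#define ALIGN_2M(x) (((x) + (2ull << 20) - 1) & ~((2ull << 20) - 1))

	struct drm_i915_gem_exec_object2 exec_obj = {
		.handle = handle,		/* GEM handle of the lmem object */
		.offset = ALIGN_2M(va_hint),	/* 2M-aligned GTT address */
		.flags  = EXEC_OBJECT_PINNED,
	};
	/* reserve the rounded-up footprint in the userspace VA allocator */
	uint64_t footprint = ALIGN_2M(size_from_gem_create_ext);
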

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
>  	 */
>  	__u64 size;
>  	/**
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support
@ 2022-01-20 11:53     ` Ramalingam C
  0 siblings, 0 replies; 50+ messages in thread
From: Ramalingam C @ 2022-01-20 11:53 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Tony Ye, Tvrtko Ursulin, dri-devel, Jordan Justen, David Airlie,
	intel-gfx, Kenneth Graunke, Slawomir Milczarek, Matthew Auld,
	Rodrigo Vivi, mesa-dev, linux-kernel

On 2022-01-18 at 17:50:37 +0000, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> On discrete platforms like DG2, we need to support a minimum page size
> of 64K when dealing with device local-memory. This is quite tricky for
> various reasons, so try to document the new implicit uapi for this.
> 
> v2: Fixed suggestions on formatting [Daniel]
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> cc: Simon Ser <contact@emersion.fr>
> cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: mesa-dev@lists.freedesktop.org
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
> ---
>  include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 5e678917da70..486b7b96291e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
>  	/**
>  	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
>  	 * the user with the GTT offset at which this object will be pinned.
> +	 *
>  	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
>  	 * presumed_offset of the object.
> +	 *
>  	 * During execbuffer2 the kernel populates it with the value of the
>  	 * current GTT offset of the object, for future presumed_offset writes.
> +	 *
> +	 * See struct drm_i915_gem_create_ext for the rules when dealing with
> +	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
> +	 * minimum page sizes, like DG2.
>  	 */
>  	__u64 offset;
>  
> @@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
>  	 *
>  	 * The (page-aligned) allocated size for the object will be returned.
>  	 *
> -	 * Note that for some devices we have might have further minimum
> -	 * page-size restrictions(larger than 4K), like for device local-memory.
> -	 * However in general the final size here should always reflect any
> -	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> -	 * extension to place the object in device local-memory.
> +	 *
> +	 * **DG2 64K min page size implications:**
> +	 *
> +	 * On discrete platforms, starting from DG2, we have to contend with GTT
> +	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
> +	 * objects.  Specifically the hardware only supports 64K or larger GTT
> +	 * page sizes for such memory. The kernel will already ensure that all
> +	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
> +	 * sizes underneath.
> +	 *
> +	 * Note that the returned size here will always reflect any required
> +	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
> +	 * such as DG2.
> +	 *
> +	 * **Special DG2 GTT address alignment requirement:**
> +	 *
> +	 * The GTT alignment will also need be at least 2M for  such objects.
> +	 *
> +	 * Note that due to how the hardware implements 64K GTT page support, we
> +	 * have some further complications:
> +	 *
> +	 *   1) The entire PDE(which covers a 2MB virtual address range), must
> +	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
> +	 *   PDE is forbidden by the hardware.
> +	 *
> +	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
> +	 *   objects.
> +	 *
> +	 * To keep things simple for userland, we mandate that any GTT mappings
> +	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
> +	 * address space and avoids userland having to copy any needlessly
> +	 * complicated PDE sharing scheme (coloring) and only affects GD2, this
> +	 * id deemed to be a good compromise.
"only affects DG2, this is" 

Except for these typos, the patch looks good to me.

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
>  	 */
>  	__u64 size;
>  	/**
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread
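
To make the documented rules above concrete, here is a minimal userspace sketch of
creating a local-memory object and choosing a soft-pin offset that satisfies the
2 MiB rule on DG2. Only the create_ext ioctl, the memory-regions extension and the
2 MiB alignment requirement come from the patch itself; the region instance 0, the
helper names and the placement policy are illustrative assumptions (real code would
query the memory-regions uapi for the instance and handle errors properly).

/* Illustrative sketch only (C, built against the i915 uapi headers). */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#define LMEM_GTT_ALIGN	(2ull << 20)	/* 2 MiB, per the DG2 rule above */

static int create_lmem_bo(int drm_fd, uint64_t size, uint32_t *handle,
			  uint64_t *actual_size)
{
	struct drm_i915_gem_memory_class_instance region = {
		.memory_class = I915_MEMORY_CLASS_DEVICE,
		.memory_instance = 0,		/* assumed; query in real code */
	};
	struct drm_i915_gem_create_ext_memory_regions regions = {
		.base.name = I915_GEM_CREATE_EXT_MEMORY_REGIONS,
		.num_regions = 1,
		.regions = (uintptr_t)&region,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&regions,
	};

	if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
		return -1;

	*handle = create.handle;
	*actual_size = create.size;	/* already rounded up by the kernel */
	return 0;
}

/*
 * When soft-pinning such an object with EXEC_OBJECT_PINNED, both the chosen
 * offset and the virtual-address span reserved for it must honour the 2 MiB
 * rule on DG2.
 */
static uint64_t pick_pinned_offset(uint64_t free_base)
{
	return (free_base + LMEM_GTT_ALIGN - 1) & ~(LMEM_GTT_ALIGN - 1);
}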

* Re: [Intel-gfx] [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support
@ 2022-01-20 11:53     ` Ramalingam C
  0 siblings, 0 replies; 50+ messages in thread
From: Ramalingam C @ 2022-01-20 11:53 UTC (permalink / raw)
  To: Robert Beckett
  Cc: dri-devel, David Airlie, Simon Ser, intel-gfx, Kenneth Graunke,
	Slawomir Milczarek, Pekka Paalanen, Matthew Auld, mesa-dev,
	linux-kernel

On 2022-01-18 at 17:50:37 +0000, Robert Beckett wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> On discrete platforms like DG2, we need to support a minimum page size
> of 64K when dealing with device local-memory. This is quite tricky for
> various reasons, so try to document the new implicit uapi for this.
> 
> v2: Fixed suggestions on formatting [Daniel]
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> cc: Simon Ser <contact@emersion.fr>
> cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Cc: mesa-dev@lists.freedesktop.org
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
> ---
>  include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 5e678917da70..486b7b96291e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
>  	/**
>  	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
>  	 * the user with the GTT offset at which this object will be pinned.
> +	 *
>  	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
>  	 * presumed_offset of the object.
> +	 *
>  	 * During execbuffer2 the kernel populates it with the value of the
>  	 * current GTT offset of the object, for future presumed_offset writes.
> +	 *
> +	 * See struct drm_i915_gem_create_ext for the rules when dealing with
> +	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
> +	 * minimum page sizes, like DG2.
>  	 */
>  	__u64 offset;
>  
> @@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
>  	 *
>  	 * The (page-aligned) allocated size for the object will be returned.
>  	 *
> -	 * Note that for some devices we have might have further minimum
> -	 * page-size restrictions(larger than 4K), like for device local-memory.
> -	 * However in general the final size here should always reflect any
> -	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
> -	 * extension to place the object in device local-memory.
> +	 *
> +	 * **DG2 64K min page size implications:**
> +	 *
> +	 * On discrete platforms, starting from DG2, we have to contend with GTT
> +	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
> +	 * objects.  Specifically the hardware only supports 64K or larger GTT
> +	 * page sizes for such memory. The kernel will already ensure that all
> +	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
> +	 * sizes underneath.
> +	 *
> +	 * Note that the returned size here will always reflect any required
> +	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
> +	 * such as DG2.
> +	 *
> +	 * **Special DG2 GTT address alignment requirement:**
> +	 *
> +	 * The GTT alignment will also need be at least 2M for  such objects.
> +	 *
> +	 * Note that due to how the hardware implements 64K GTT page support, we
> +	 * have some further complications:
> +	 *
> +	 *   1) The entire PDE(which covers a 2MB virtual address range), must
> +	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
> +	 *   PDE is forbidden by the hardware.
> +	 *
> +	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
> +	 *   objects.
> +	 *
> +	 * To keep things simple for userland, we mandate that any GTT mappings
> +	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
> +	 * address space and avoids userland having to copy any needlessly
> +	 * complicated PDE sharing scheme (coloring) and only affects GD2, this
> +	 * id deemed to be a good compromise.
"only affects DG2, this is" 

Except for these typos, the patch looks good to me.

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
>  	 */
>  	__u64 size;
>  	/**
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 11:46     ` Ramalingam C
  (?)
@ 2022-01-20 13:15       ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 13:15 UTC (permalink / raw)
  To: Ramalingam C
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, Matthew Auld, intel-gfx, dri-devel,
	linux-kernel



On 20/01/2022 11:46, Ramalingam C wrote:
> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>> From: Matthew Auld <matthew.auld@intel.com>
>>
>> For local-memory objects we need to align the GTT addresses
>> to 64K, both for the ppgtt and ggtt.
>>
>> We need to support vm->min_alignment > 4K, depending
>> on the vm itself and the type of object we are inserting.
>> With this in mind update the GTT selftests to take this
>> into account.
>>
>> For DG2 we further align and pad lmem object GTT addresses
>> to 2MB to ensure PDEs contain consistent page sizes as
>> required by the HW.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> index c08f766e6e15..7fee95a65414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>   	struct blit_buffer scratch;
>>   	struct i915_vma *batch;
>>   	u64 hole;
>> +	u64 align;
>>   	u32 width;
>>   	u32 height;
>>   };
>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>>   		goto err_free;
>>   	}
>>   
>> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
>> +	t->align = max(t->align,
>> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>> +	t->align = max(t->align,
>> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>> +
>> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>   	hole_size *= 2; /* room to maneuver */
>> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>> +	hole_size += 2 * t->align; /* padding on either side */
>>   
>>   	mutex_lock(&t->ce->vm->mutex);
>>   	memset(&hole, 0, sizeof(hole));
>>   	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
>> +					  hole_size, t->align,
>> +					  I915_COLOR_UNEVICTABLE,
>>   					  0, U64_MAX,
>>   					  DRM_MM_INSERT_BEST);
>>   	if (!err)
>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>>   		goto err_put;
>>   	}
>>   
>> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>> +	t->hole = hole.start + t->align;
>>   	pr_info("Using hole at %llx\n", t->hole);
>>   
>>   	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>   			       struct rnd_state *prng)
>>   {
>> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>>   	u32 *map;
>>   	int err;
>>   	int i;
>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>>   
>>   static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>>   {
>> -	u64 offset =
>> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
>> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>   	int err;
>>   
>>   	/* We want to check position invariant tiling across GTT eviction */
>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>>   
>>   	/* Reposition so that we overlap the old addresses, and slightly off */
>>   	err = tiled_blit(t,
>> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>> +			 &t->buffers[2], t->hole + t->align,
>>   			 &t->buffers[1], t->hole + 3 * offset / 2);
>>   	if (err)
>>   		return err;
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 46be4197b93f..7c92b25c0f26 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>   
>>   	GEM_BUG_ON(!vm->total);
>>   	drm_mm_init(&vm->mm, 0, vm->total);
>> +
>> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>> +		 ARRAY_SIZE(vm->min_alignment));
>> +
>> +	if (HAS_64K_PAGES(vm->i915)) {
>> +		if (IS_DG2(vm->i915)) {
> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
> Not only for DG2.

Really? Can we get confirmation of this?
This contradicts the documentation in patch 4, which you reviewed, so I
am confused now.

>> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
>> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
>> +		} else {
>> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
>> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
>> +		}
>> +	}
>> +
>>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>   
>>   	INIT_LIST_HEAD(&vm->bound_list);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 8073438b67c8..b8da2514d601 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -29,6 +29,8 @@
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   #include "i915_vma_types.h"
>> +#include "i915_params.h"
>> +#include "intel_memory_region.h"
>>   
>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>>   
>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>   	struct device *dma;
>>   	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>>   	u64 reserved;		/* size addr space reserved */
>> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>   
>>   	unsigned int bind_async_flags;
>>   
>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>>   	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>   }
>>   
>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>> +					enum intel_memory_type type)
>> +{
>> +	return vm->min_alignment[type];
>> +}
>> +
>>   static inline bool
>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 1f15c3298112..9ac92e7a3566 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>>   	}
>>   
>>   	color = 0;
>> +
>> +	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
>> +		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>> +		/*
>> +		 * DG2 can not have different sized pages in any given PDE (2MB range).
>> +		 * Keeping things simple, we force any lmem object to reserve
>> +		 * 2MB chunks, preventing any smaller pages being used alongside
>> +		 */
>> +		if (IS_DG2(vma->vm->i915)) {
> Similarly, here we don't need a special case for DG2.
> 
> Ram
>> +			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>> +			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>> +		}
>> +	}
>> +
>>   	if (i915_vm_has_cache_coloring(vma->vm))
>>   		color = vma->obj->cache_level;
>>   
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index 076d860ce01a..2f3f0c01786b 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			 u64 hole_start, u64 hole_end,
>>   			 unsigned long end_time)
>>   {
>> +	const unsigned int min_alignment =
>> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>   	I915_RND_STATE(seed_prng);
>>   	struct i915_vma_resource *mock_vma_res;
>>   	unsigned int size;
>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		I915_RND_SUBSTATE(prng, seed_prng);
>>   		struct drm_i915_gem_object *obj;
>>   		unsigned int *order, count, n;
>> -		u64 hole_size;
>> +		u64 hole_size, aligned_size;
>>   
>> -		hole_size = (hole_end - hole_start) >> size;
>> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
>> +		hole_size = (hole_end - hole_start) >> aligned_size;
>>   		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>   			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>   		count = hole_size >> 1;
>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		}
>>   		GEM_BUG_ON(!order);
>>   
>> -		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>> -		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>> +		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>> +		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
>>   
>>   		/* Ignore allocation failures (i.e. don't report them as
>>   		 * a test failure) as we are purposefully allocating very
>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		}
>>   
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   			intel_wakeref_t wakeref;
>>   
>> -			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>> +			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>   
>>   			if (igt_timeout(end_time,
>>   					"%s timed out before %d/%d\n",
>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			}
>>   
>>   			mock_vma_res->bi.pages = obj->mm.pages;
>> -			mock_vma_res->node_size = BIT_ULL(size);
>> +			mock_vma_res->node_size = BIT_ULL(aligned_size);
>>   			mock_vma_res->start = addr;
>>   
>>   			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   
>>   		i915_random_reorder(order, count, &prng);
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   			intel_wakeref_t wakeref;
>>   
>>   			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>>   {
>>   	const u64 hole_size = hole_end - hole_start;
>>   	struct drm_i915_gem_object *obj;
>> +	const unsigned int min_alignment =
>> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>   	const unsigned long max_pages =
>> -		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>> +		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
>>   	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>   	unsigned long npages, prime, flags;
>>   	struct i915_vma *vma;
>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					err = i915_vma_pin(vma, 0, 0, offset | flags);
>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					i915_vma_unpin(vma);
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					if (!drm_mm_node_allocated(&vma->node) ||
>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					}
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry_reverse(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					err = i915_vma_pin(vma, 0, 0, offset | flags);
>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					i915_vma_unpin(vma);
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry_reverse(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					if (!drm_mm_node_allocated(&vma->node) ||
>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>   					}
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   			}
>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>   	const u64 hole_size = hole_end - hole_start;
>>   	const unsigned long max_pages =
>>   		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>> +	unsigned long min_alignment;
>>   	unsigned long flags;
>>   	u64 size;
>>   
>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	for_each_prime_number_from(size, 1, max_pages) {
>>   		struct drm_i915_gem_object *obj;
>>   		struct i915_vma *vma;
>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>   
>>   		for (addr = hole_start;
>>   		     addr + obj->base.size < hole_end;
>> -		     addr += obj->base.size) {
>> +		     addr += round_up(obj->base.size, min_alignment)) {
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>>   				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   	struct i915_vma *vma;
>> +	unsigned int min_alignment;
>>   	unsigned long flags;
>>   	unsigned int pot;
>>   	int err = 0;
>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
>>   	if (IS_ERR(obj))
>>   		return PTR_ERR(obj);
>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>>   
>>   	/* Insert a pair of pages across every pot boundary within the hole */
>>   	for (pot = fls64(hole_end - 1) - 1;
>> -	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>> +	     pot > ilog2(2 * min_alignment);
>>   	     pot--) {
>>   		u64 step = BIT_ULL(pot);
>>   		u64 addr;
>>   
>> -		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
>> -		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
>> +		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
>> +		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
>>   		     addr += step) {
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>   		      unsigned long end_time)
>>   {
>>   	I915_RND_STATE(prng);
>> +	unsigned int min_alignment;
>>   	unsigned int size;
>>   	unsigned long flags;
>>   
>> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	/* Keep creating larger objects until one cannot fit into the hole */
>>   	for (size = 12; (hole_end - hole_start) >> size; size++) {
>>   		struct drm_i915_gem_object *obj;
>>   		unsigned int *order, count, n;
>>   		struct i915_vma *vma;
>> -		u64 hole_size;
>> +		u64 hole_size, aligned_size;
>>   		int err = -ENODEV;
>>   
>> -		hole_size = (hole_end - hole_start) >> size;
>> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
>> +		hole_size = (hole_end - hole_start) >> aligned_size;
>>   		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>   			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>   		count = hole_size >> 1;
>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>   		GEM_BUG_ON(vma->size != BIT_ULL(size));
>>   
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>> +	unsigned int min_alignment;
>>   	unsigned int order = 12;
>>   	LIST_HEAD(objects);
>>   	int err = 0;
>>   	u64 addr;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	/* Keep creating larger objects until one cannot fit into the hole */
>>   	for (addr = hole_start; addr < hole_end; ) {
>>   		struct i915_vma *vma;
>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>>   		}
>>   
>>   		i915_vma_unpin(vma);
>> -		addr += size;
>> +		addr += round_up(size, min_alignment);
>>   
>>   		/*
>>   		 * Since we are injecting allocation faults at random intervals,
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread
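
As a concrete illustration of what the i915_vma_insert() hunk discussed above does
for an lmem object on DG2, here is a small standalone arithmetic sketch (ordinary
userspace C, not kernel code). The 4K request size is an assumed example; the 64K
backing and the 2M GTT padding follow the commit message and the quoted hunks.

#include <stdint.h>
#include <stdio.h>

#define SZ_64K	(64ull << 10)
#define SZ_2M	(2ull << 20)

/* Same semantics as the kernel's round_up() for power-of-two alignments. */
static uint64_t round_up_pow2(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

int main(void)
{
	/* A 4K create request for lmem is backed with 64K pages... */
	uint64_t obj_size = round_up_pow2(4096, SZ_64K);	/* 65536 */

	/*
	 * ...and on DG2 its GTT reservation is aligned and padded to a whole
	 * 2M PDE, so no 4K PTEs can end up sharing that PDE.
	 */
	uint64_t alignment = SZ_2M;
	uint64_t node_size = round_up_pow2(obj_size, SZ_2M);	/* 2097152 */

	printf("object %llu B, GTT node %llu B, GTT alignment %llu B\n",
	       (unsigned long long)obj_size,
	       (unsigned long long)node_size,
	       (unsigned long long)alignment);
	return 0;
}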

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 13:15       ` Robert Beckett
  0 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 13:15 UTC (permalink / raw)
  To: Ramalingam C
  Cc: Tvrtko Ursulin, dri-devel, David Airlie, intel-gfx, linux-kernel,
	Matthew Auld, Rodrigo Vivi



On 20/01/2022 11:46, Ramalingam C wrote:
> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>> From: Matthew Auld <matthew.auld@intel.com>
>>
>> For local-memory objects we need to align the GTT addresses
>> to 64K, both for the ppgtt and ggtt.
>>
>> We need to support vm->min_alignment > 4K, depending
>> on the vm itself and the type of object we are inserting.
>> With this in mind update the GTT selftests to take this
>> into account.
>>
>> For DG2 we further align and pad lmem object GTT addresses
>> to 2MB to ensure PDEs contain consistent page sizes as
>> required by the HW.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> index c08f766e6e15..7fee95a65414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>   	struct blit_buffer scratch;
>>   	struct i915_vma *batch;
>>   	u64 hole;
>> +	u64 align;
>>   	u32 width;
>>   	u32 height;
>>   };
>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>>   		goto err_free;
>>   	}
>>   
>> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
>> +	t->align = max(t->align,
>> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>> +	t->align = max(t->align,
>> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>> +
>> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>   	hole_size *= 2; /* room to maneuver */
>> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>> +	hole_size += 2 * t->align; /* padding on either side */
>>   
>>   	mutex_lock(&t->ce->vm->mutex);
>>   	memset(&hole, 0, sizeof(hole));
>>   	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
>> +					  hole_size, t->align,
>> +					  I915_COLOR_UNEVICTABLE,
>>   					  0, U64_MAX,
>>   					  DRM_MM_INSERT_BEST);
>>   	if (!err)
>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>>   		goto err_put;
>>   	}
>>   
>> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>> +	t->hole = hole.start + t->align;
>>   	pr_info("Using hole at %llx\n", t->hole);
>>   
>>   	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>   			       struct rnd_state *prng)
>>   {
>> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>>   	u32 *map;
>>   	int err;
>>   	int i;
>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>>   
>>   static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>>   {
>> -	u64 offset =
>> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
>> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>   	int err;
>>   
>>   	/* We want to check position invariant tiling across GTT eviction */
>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>>   
>>   	/* Reposition so that we overlap the old addresses, and slightly off */
>>   	err = tiled_blit(t,
>> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>> +			 &t->buffers[2], t->hole + t->align,
>>   			 &t->buffers[1], t->hole + 3 * offset / 2);
>>   	if (err)
>>   		return err;
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 46be4197b93f..7c92b25c0f26 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>   
>>   	GEM_BUG_ON(!vm->total);
>>   	drm_mm_init(&vm->mm, 0, vm->total);
>> +
>> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>> +		 ARRAY_SIZE(vm->min_alignment));
>> +
>> +	if (HAS_64K_PAGES(vm->i915)) {
>> +		if (IS_DG2(vm->i915)) {
> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
> Not only for DG2.

Really? Can we get confirmation of this?
This contradicts the documentation in patch 4, which you reviewed, so I
am confused now.

>> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
>> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
>> +		} else {
>> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
>> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
>> +		}
>> +	}
>> +
>>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>   
>>   	INIT_LIST_HEAD(&vm->bound_list);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 8073438b67c8..b8da2514d601 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -29,6 +29,8 @@
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   #include "i915_vma_types.h"
>> +#include "i915_params.h"
>> +#include "intel_memory_region.h"
>>   
>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>>   
>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>   	struct device *dma;
>>   	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>>   	u64 reserved;		/* size addr space reserved */
>> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>   
>>   	unsigned int bind_async_flags;
>>   
>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>>   	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>   }
>>   
>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>> +					enum intel_memory_type type)
>> +{
>> +	return vm->min_alignment[type];
>> +}
>> +
>>   static inline bool
>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 1f15c3298112..9ac92e7a3566 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>>   	}
>>   
>>   	color = 0;
>> +
>> +	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
>> +		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>> +		/*
>> +		 * DG2 can not have different sized pages in any given PDE (2MB range).
>> +		 * Keeping things simple, we force any lmem object to reserve
>> +		 * 2MB chunks, preventing any smaller pages being used alongside
>> +		 */
>> +		if (IS_DG2(vma->vm->i915)) {
> Similarly, here we don't need a special case for DG2.
> 
> Ram
>> +			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>> +			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>> +		}
>> +	}
>> +
>>   	if (i915_vm_has_cache_coloring(vma->vm))
>>   		color = vma->obj->cache_level;
>>   
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index 076d860ce01a..2f3f0c01786b 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			 u64 hole_start, u64 hole_end,
>>   			 unsigned long end_time)
>>   {
>> +	const unsigned int min_alignment =
>> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>   	I915_RND_STATE(seed_prng);
>>   	struct i915_vma_resource *mock_vma_res;
>>   	unsigned int size;
>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		I915_RND_SUBSTATE(prng, seed_prng);
>>   		struct drm_i915_gem_object *obj;
>>   		unsigned int *order, count, n;
>> -		u64 hole_size;
>> +		u64 hole_size, aligned_size;
>>   
>> -		hole_size = (hole_end - hole_start) >> size;
>> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
>> +		hole_size = (hole_end - hole_start) >> aligned_size;
>>   		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>   			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>   		count = hole_size >> 1;
>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		}
>>   		GEM_BUG_ON(!order);
>>   
>> -		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>> -		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>> +		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>> +		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
>>   
>>   		/* Ignore allocation failures (i.e. don't report them as
>>   		 * a test failure) as we are purposefully allocating very
>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		}
>>   
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   			intel_wakeref_t wakeref;
>>   
>> -			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>> +			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>   
>>   			if (igt_timeout(end_time,
>>   					"%s timed out before %d/%d\n",
>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			}
>>   
>>   			mock_vma_res->bi.pages = obj->mm.pages;
>> -			mock_vma_res->node_size = BIT_ULL(size);
>> +			mock_vma_res->node_size = BIT_ULL(aligned_size);
>>   			mock_vma_res->start = addr;
>>   
>>   			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   
>>   		i915_random_reorder(order, count, &prng);
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   			intel_wakeref_t wakeref;
>>   
>>   			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>>   {
>>   	const u64 hole_size = hole_end - hole_start;
>>   	struct drm_i915_gem_object *obj;
>> +	const unsigned int min_alignment =
>> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>   	const unsigned long max_pages =
>> -		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>> +		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
>>   	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>   	unsigned long npages, prime, flags;
>>   	struct i915_vma *vma;
>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					err = i915_vma_pin(vma, 0, 0, offset | flags);
>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					i915_vma_unpin(vma);
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					if (!drm_mm_node_allocated(&vma->node) ||
>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					}
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry_reverse(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					err = i915_vma_pin(vma, 0, 0, offset | flags);
>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					i915_vma_unpin(vma);
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry_reverse(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					if (!drm_mm_node_allocated(&vma->node) ||
>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>   					}
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   			}
>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>   	const u64 hole_size = hole_end - hole_start;
>>   	const unsigned long max_pages =
>>   		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>> +	unsigned long min_alignment;
>>   	unsigned long flags;
>>   	u64 size;
>>   
>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	for_each_prime_number_from(size, 1, max_pages) {
>>   		struct drm_i915_gem_object *obj;
>>   		struct i915_vma *vma;
>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>   
>>   		for (addr = hole_start;
>>   		     addr + obj->base.size < hole_end;
>> -		     addr += obj->base.size) {
>> +		     addr += round_up(obj->base.size, min_alignment)) {
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>>   				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   	struct i915_vma *vma;
>> +	unsigned int min_alignment;
>>   	unsigned long flags;
>>   	unsigned int pot;
>>   	int err = 0;
>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
>>   	if (IS_ERR(obj))
>>   		return PTR_ERR(obj);
>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>>   
>>   	/* Insert a pair of pages across every pot boundary within the hole */
>>   	for (pot = fls64(hole_end - 1) - 1;
>> -	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>> +	     pot > ilog2(2 * min_alignment);
>>   	     pot--) {
>>   		u64 step = BIT_ULL(pot);
>>   		u64 addr;
>>   
>> -		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
>> -		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
>> +		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
>> +		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
>>   		     addr += step) {
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>   		      unsigned long end_time)
>>   {
>>   	I915_RND_STATE(prng);
>> +	unsigned int min_alignment;
>>   	unsigned int size;
>>   	unsigned long flags;
>>   
>> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	/* Keep creating larger objects until one cannot fit into the hole */
>>   	for (size = 12; (hole_end - hole_start) >> size; size++) {
>>   		struct drm_i915_gem_object *obj;
>>   		unsigned int *order, count, n;
>>   		struct i915_vma *vma;
>> -		u64 hole_size;
>> +		u64 hole_size, aligned_size;
>>   		int err = -ENODEV;
>>   
>> -		hole_size = (hole_end - hole_start) >> size;
>> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
>> +		hole_size = (hole_end - hole_start) >> aligned_size;
>>   		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>   			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>   		count = hole_size >> 1;
>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>   		GEM_BUG_ON(vma->size != BIT_ULL(size));
>>   
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>> +	unsigned int min_alignment;
>>   	unsigned int order = 12;
>>   	LIST_HEAD(objects);
>>   	int err = 0;
>>   	u64 addr;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	/* Keep creating larger objects until one cannot fit into the hole */
>>   	for (addr = hole_start; addr < hole_end; ) {
>>   		struct i915_vma *vma;
>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>>   		}
>>   
>>   		i915_vma_unpin(vma);
>> -		addr += size;
>> +		addr += round_up(size, min_alignment);
>>   
>>   		/*
>>   		 * Since we are injecting allocation faults at random intervals,
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 13:15       ` Robert Beckett
  0 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 13:15 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Matthew Auld



On 20/01/2022 11:46, Ramalingam C wrote:
> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>> From: Matthew Auld <matthew.auld@intel.com>
>>
>> For local-memory objects we need to align the GTT addresses
>> to 64K, both for the ppgtt and ggtt.
>>
>> We need to support vm->min_alignment > 4K, depending
>> on the vm itself and the type of object we are inserting.
>> With this in mind update the GTT selftests to take this
>> into account.
>>
>> For DG2 we further align and pad lmem object GTT addresses
>> to 2MB to ensure PDEs contain consistent page sizes as
>> required by the HW.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> index c08f766e6e15..7fee95a65414 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>   	struct blit_buffer scratch;
>>   	struct i915_vma *batch;
>>   	u64 hole;
>> +	u64 align;
>>   	u32 width;
>>   	u32 height;
>>   };
>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>>   		goto err_free;
>>   	}
>>   
>> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
>> +	t->align = max(t->align,
>> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>> +	t->align = max(t->align,
>> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>> +
>> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>   	hole_size *= 2; /* room to maneuver */
>> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>> +	hole_size += 2 * t->align; /* padding on either side */
>>   
>>   	mutex_lock(&t->ce->vm->mutex);
>>   	memset(&hole, 0, sizeof(hole));
>>   	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
>> +					  hole_size, t->align,
>> +					  I915_COLOR_UNEVICTABLE,
>>   					  0, U64_MAX,
>>   					  DRM_MM_INSERT_BEST);
>>   	if (!err)
>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>>   		goto err_put;
>>   	}
>>   
>> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>> +	t->hole = hole.start + t->align;
>>   	pr_info("Using hole at %llx\n", t->hole);
>>   
>>   	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>   			       struct rnd_state *prng)
>>   {
>> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>>   	u32 *map;
>>   	int err;
>>   	int i;
>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>>   
>>   static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>>   {
>> -	u64 offset =
>> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
>> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>   	int err;
>>   
>>   	/* We want to check position invariant tiling across GTT eviction */
>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>>   
>>   	/* Reposition so that we overlap the old addresses, and slightly off */
>>   	err = tiled_blit(t,
>> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>> +			 &t->buffers[2], t->hole + t->align,
>>   			 &t->buffers[1], t->hole + 3 * offset / 2);
>>   	if (err)
>>   		return err;
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 46be4197b93f..7c92b25c0f26 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>>   
>>   	GEM_BUG_ON(!vm->total);
>>   	drm_mm_init(&vm->mm, 0, vm->total);
>> +
>> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>> +		 ARRAY_SIZE(vm->min_alignment));
>> +
>> +	if (HAS_64K_PAGES(vm->i915)) {
>> +		if (IS_DG2(vm->i915)) {
> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
> Not only for DG2.

Really? Can we get confirmation of this?
This contradicts the documentation in patch 4, which you reviewed, so I
am confused now.

>> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
>> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
>> +		} else {
>> +			vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
>> +			vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
>> +		}
>> +	}
>> +
>>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>   
>>   	INIT_LIST_HEAD(&vm->bound_list);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index 8073438b67c8..b8da2514d601 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -29,6 +29,8 @@
>>   #include "i915_selftest.h"
>>   #include "i915_vma_resource.h"
>>   #include "i915_vma_types.h"
>> +#include "i915_params.h"
>> +#include "intel_memory_region.h"
>>   
>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>>   
>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>   	struct device *dma;
>>   	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>>   	u64 reserved;		/* size addr space reserved */
>> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>   
>>   	unsigned int bind_async_flags;
>>   
>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>>   	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>   }
>>   
>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>> +					enum intel_memory_type type)
>> +{
>> +	return vm->min_alignment[type];
>> +}
>> +
>>   static inline bool
>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>> index 1f15c3298112..9ac92e7a3566 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>>   	}
>>   
>>   	color = 0;
>> +
>> +	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
>> +		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>> +		/*
>> +		 * DG2 can not have different sized pages in any given PDE (2MB range).
>> +		 * Keeping things simple, we force any lmem object to reserve
>> +		 * 2MB chunks, preventing any smaller pages being used alongside
>> +		 */
>> +		if (IS_DG2(vma->vm->i915)) {
> Similarly, here we don't need a special case for DG2.
> 
> Ram
>> +			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>> +			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>> +		}
>> +	}
>> +
>>   	if (i915_vm_has_cache_coloring(vma->vm))
>>   		color = vma->obj->cache_level;
>>   
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index 076d860ce01a..2f3f0c01786b 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			 u64 hole_start, u64 hole_end,
>>   			 unsigned long end_time)
>>   {
>> +	const unsigned int min_alignment =
>> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>   	I915_RND_STATE(seed_prng);
>>   	struct i915_vma_resource *mock_vma_res;
>>   	unsigned int size;
>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		I915_RND_SUBSTATE(prng, seed_prng);
>>   		struct drm_i915_gem_object *obj;
>>   		unsigned int *order, count, n;
>> -		u64 hole_size;
>> +		u64 hole_size, aligned_size;
>>   
>> -		hole_size = (hole_end - hole_start) >> size;
>> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
>> +		hole_size = (hole_end - hole_start) >> aligned_size;
>>   		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>   			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>   		count = hole_size >> 1;
>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		}
>>   		GEM_BUG_ON(!order);
>>   
>> -		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>> -		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>> +		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>> +		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
>>   
>>   		/* Ignore allocation failures (i.e. don't report them as
>>   		 * a test failure) as we are purposefully allocating very
>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   		}
>>   
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   			intel_wakeref_t wakeref;
>>   
>> -			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>> +			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>   
>>   			if (igt_timeout(end_time,
>>   					"%s timed out before %d/%d\n",
>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   			}
>>   
>>   			mock_vma_res->bi.pages = obj->mm.pages;
>> -			mock_vma_res->node_size = BIT_ULL(size);
>> +			mock_vma_res->node_size = BIT_ULL(aligned_size);
>>   			mock_vma_res->start = addr;
>>   
>>   			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
>>   
>>   		i915_random_reorder(order, count, &prng);
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   			intel_wakeref_t wakeref;
>>   
>>   			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>>   {
>>   	const u64 hole_size = hole_end - hole_start;
>>   	struct drm_i915_gem_object *obj;
>> +	const unsigned int min_alignment =
>> +		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>   	const unsigned long max_pages =
>> -		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>> +		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
>>   	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>   	unsigned long npages, prime, flags;
>>   	struct i915_vma *vma;
>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					err = i915_vma_pin(vma, 0, 0, offset | flags);
>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					i915_vma_unpin(vma);
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					if (!drm_mm_node_allocated(&vma->node) ||
>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					}
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry_reverse(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					err = i915_vma_pin(vma, 0, 0, offset | flags);
>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
>>   					i915_vma_unpin(vma);
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   
>>   				offset = p->offset;
>>   				list_for_each_entry_reverse(obj, &objects, st_link) {
>> +					u64 aligned_size = round_up(obj->base.size,
>> +								    min_alignment);
>> +
>>   					vma = i915_vma_instance(obj, vm, NULL);
>>   					if (IS_ERR(vma))
>>   						continue;
>>   
>>   					if (p->step < 0) {
>> -						if (offset < hole_start + obj->base.size)
>> +						if (offset < hole_start + aligned_size)
>>   							break;
>> -						offset -= obj->base.size;
>> +						offset -= aligned_size;
>>   					}
>>   
>>   					if (!drm_mm_node_allocated(&vma->node) ||
>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>   					}
>>   
>>   					if (p->step > 0) {
>> -						if (offset + obj->base.size > hole_end)
>> +						if (offset + aligned_size > hole_end)
>>   							break;
>> -						offset += obj->base.size;
>> +						offset += aligned_size;
>>   					}
>>   				}
>>   			}
>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>   	const u64 hole_size = hole_end - hole_start;
>>   	const unsigned long max_pages =
>>   		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>> +	unsigned long min_alignment;
>>   	unsigned long flags;
>>   	u64 size;
>>   
>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	for_each_prime_number_from(size, 1, max_pages) {
>>   		struct drm_i915_gem_object *obj;
>>   		struct i915_vma *vma;
>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>   
>>   		for (addr = hole_start;
>>   		     addr + obj->base.size < hole_end;
>> -		     addr += obj->base.size) {
>> +		     addr += round_up(obj->base.size, min_alignment)) {
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>>   				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   	struct i915_vma *vma;
>> +	unsigned int min_alignment;
>>   	unsigned long flags;
>>   	unsigned int pot;
>>   	int err = 0;
>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
>>   	if (IS_ERR(obj))
>>   		return PTR_ERR(obj);
>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>>   
>>   	/* Insert a pair of pages across every pot boundary within the hole */
>>   	for (pot = fls64(hole_end - 1) - 1;
>> -	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>> +	     pot > ilog2(2 * min_alignment);
>>   	     pot--) {
>>   		u64 step = BIT_ULL(pot);
>>   		u64 addr;
>>   
>> -		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
>> -		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
>> +		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
>> +		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
>>   		     addr += step) {
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>   		      unsigned long end_time)
>>   {
>>   	I915_RND_STATE(prng);
>> +	unsigned int min_alignment;
>>   	unsigned int size;
>>   	unsigned long flags;
>>   
>> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
>>   	if (i915_is_ggtt(vm))
>>   		flags |= PIN_GLOBAL;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	/* Keep creating larger objects until one cannot fit into the hole */
>>   	for (size = 12; (hole_end - hole_start) >> size; size++) {
>>   		struct drm_i915_gem_object *obj;
>>   		unsigned int *order, count, n;
>>   		struct i915_vma *vma;
>> -		u64 hole_size;
>> +		u64 hole_size, aligned_size;
>>   		int err = -ENODEV;
>>   
>> -		hole_size = (hole_end - hole_start) >> size;
>> +		aligned_size = max_t(u32, ilog2(min_alignment), size);
>> +		hole_size = (hole_end - hole_start) >> aligned_size;
>>   		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>   			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>   		count = hole_size >> 1;
>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>   		GEM_BUG_ON(vma->size != BIT_ULL(size));
>>   
>>   		for (n = 0; n < count; n++) {
>> -			u64 addr = hole_start + order[n] * BIT_ULL(size);
>> +			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>   
>>   			err = i915_vma_pin(vma, 0, 0, addr | flags);
>>   			if (err) {
>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>> +	unsigned int min_alignment;
>>   	unsigned int order = 12;
>>   	LIST_HEAD(objects);
>>   	int err = 0;
>>   	u64 addr;
>>   
>> +	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>> +
>>   	/* Keep creating larger objects until one cannot fit into the hole */
>>   	for (addr = hole_start; addr < hole_end; ) {
>>   		struct i915_vma *vma;
>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
>>   		}
>>   
>>   		i915_vma_unpin(vma);
>> -		addr += size;
>> +		addr += round_up(size, min_alignment);
>>   
>>   		/*
>>   		 * Since we are injecting allocation faults at random intervals,
>> -- 
>> 2.25.1
>>
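
To make the insert-path behaviour under discussion easier to follow, the
alignment/size derivation for lmem objects boils down to the standalone
sketch below (illustrative only; the helpers, flags and constants are
stand-ins that mirror the i915 names, not the driver code itself):

#include <stdint.h>
#include <stdio.h>

#define SZ_64K	(64ULL << 10)
#define SZ_2M	(2ULL << 20)

/* power-of-two round up, same contract as the kernel's round_up() */
static uint64_t round_up_pow2(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

int main(void)
{
	/* stand-ins for the real checks on the vma/object */
	int has_64k_pages = 1, is_lmem = 1, needs_2m_padding = 1; /* e.g. DG2 */
	uint64_t alignment = 4096;		/* caller-requested alignment */
	uint64_t size = 3 * SZ_64K + 512;	/* object size before padding */

	if (has_64k_pages && is_lmem) {
		if (alignment < SZ_64K)
			alignment = SZ_64K;
		if (needs_2m_padding) {
			if (alignment < SZ_2M)
				alignment = SZ_2M;
			size = round_up_pow2(size, SZ_2M);
		}
	}

	printf("alignment=0x%llx size=0x%llx\n",
	       (unsigned long long)alignment, (unsigned long long)size);
	return 0;
}

Compiled and run, this prints alignment=0x200000 size=0x200000 for the
example values, i.e. even a sub-2M lmem object ends up reserving a full
2M chunk of address space when the padding rule applies.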

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 13:15       ` Robert Beckett
  (?)
@ 2022-01-20 14:59         ` Matthew Auld
  -1 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 14:59 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel, linux-kernel

On 20/01/2022 13:15, Robert Beckett wrote:
> 
> 
> On 20/01/2022 11:46, Ramalingam C wrote:
>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>> From: Matthew Auld <matthew.auld@intel.com>
>>>
>>> For local-memory objects we need to align the GTT addresses
>>> to 64K, both for the ppgtt and ggtt.
>>>
>>> We need to support vm->min_alignment > 4K, depending
>>> on the vm itself and the type of object we are inserting.
>>> With this in mind update the GTT selftests to take this
>>> into account.
>>>
>>> For DG2 we further align and pad lmem object GTT addresses
>>> to 2MB to ensure PDEs contain consistent page sizes as
>>> required by the HW.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> index c08f766e6e15..7fee95a65414 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>       struct blit_buffer scratch;
>>>       struct i915_vma *batch;
>>>       u64 hole;
>>> +    u64 align;
>>>       u32 width;
>>>       u32 height;
>>>   };
>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>> *engine, struct rnd_state *prng)
>>>           goto err_free;
>>>       }
>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from 
>>> vm! */
>>> +    t->align = max(t->align,
>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>> +    t->align = max(t->align,
>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>>> +
>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>       hole_size *= 2; /* room to maneuver */
>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>       mutex_lock(&t->ce->vm->mutex);
>>>       memset(&hole, 0, sizeof(hole));
>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>> +                      hole_size, t->align,
>>> +                      I915_COLOR_UNEVICTABLE,
>>>                         0, U64_MAX,
>>>                         DRM_MM_INSERT_BEST);
>>>       if (!err)
>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>> *engine, struct rnd_state *prng)
>>>           goto err_put;
>>>       }
>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>> +    t->hole = hole.start + t->align;
>>>       pr_info("Using hole at %llx\n", t->hole);
>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>> tiled_blits *t)
>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>                      struct rnd_state *prng)
>>>   {
>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>       u32 *map;
>>>       int err;
>>>       int i;
>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits 
>>> *t,
>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>> rnd_state *prng)
>>>   {
>>> -    u64 offset =
>>> -        round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>       int err;
>>>       /* We want to check position invariant tiling across GTT 
>>> eviction */
>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits 
>>> *t, struct rnd_state *prng)
>>>       /* Reposition so that we overlap the old addresses, and 
>>> slightly off */
>>>       err = tiled_blit(t,
>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>> +             &t->buffers[2], t->hole + t->align,
>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>       if (err)
>>>           return err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 46be4197b93f..7c92b25c0f26 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>> i915_address_space *vm, int subclass)
>>>       GEM_BUG_ON(!vm->total);
>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>> +
>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>> +         ARRAY_SIZE(vm->min_alignment));
>>> +
>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>> +        if (IS_DG2(vm->i915)) {
>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
>> not only for DG2.
> 
> Really? Can we get confirmation of this?
> This contradicts the documentation in patch 4, which you reviewed, so I
> am confused now.

Starting from DG2, some platforms will have this new 64K GTT page size 
restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
to cover exactly that, AFAIK.
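
If the IS_DG2() check were dropped as Ramalingam suggests, the table
initialisation would collapse to a single branch keyed off
HAS_64K_PAGES(). The standalone sketch below models that variant; the
names and values are stand-ins mirroring the patch, and whether 2M is
really the right minimum for every such platform is exactly the open
question in this thread:

#include <stdint.h>
#include <stdio.h>

/* stand-in region indices and page sizes mirroring the i915 names */
enum mem_type { MEM_SYSTEM, MEM_LOCAL, MEM_STOLEN_LOCAL, MEM_TYPES };
#define GTT_MIN_ALIGNMENT	(1ULL << 12)	/* 4K */
#define GTT_PAGE_SIZE_2M	(1ULL << 21)	/* 2M */

int main(void)
{
	uint64_t min_alignment[MEM_TYPES];
	int has_64k_pages = 1;	/* pretend we are on such a platform */
	int i;

	/* default every region to the 4K minimum */
	for (i = 0; i < MEM_TYPES; i++)
		min_alignment[i] = GTT_MIN_ALIGNMENT;

	/* the suggested simplification: no separate DG2 branch */
	if (has_64k_pages) {
		min_alignment[MEM_LOCAL] = GTT_PAGE_SIZE_2M;
		min_alignment[MEM_STOLEN_LOCAL] = GTT_PAGE_SIZE_2M;
	}

	for (i = 0; i < MEM_TYPES; i++)
		printf("region %d -> min alignment 0x%llx\n",
		       i, (unsigned long long)min_alignment[i]);
	return 0;
}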

> 
>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_2M;
>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_2M;
>>> +        } else {
>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_64K;
>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_64K;
>>> +        }
>>> +    }
>>> +
>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>       INIT_LIST_HEAD(&vm->bound_list);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index 8073438b67c8..b8da2514d601 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -29,6 +29,8 @@
>>>   #include "i915_selftest.h"
>>>   #include "i915_vma_resource.h"
>>>   #include "i915_vma_types.h"
>>> +#include "i915_params.h"
>>> +#include "intel_memory_region.h"
>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>> __GFP_NOWARN)
>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>       struct device *dma;
>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>       u64 reserved;        /* size addr space reserved */
>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>       unsigned int bind_async_flags;
>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>> i915_address_space *vm)
>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>   }
>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>>> +                    enum intel_memory_type type)
>>> +{
>>> +    return vm->min_alignment[type];
>>> +}
>>> +
>>>   static inline bool
>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>   {
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>> b/drivers/gpu/drm/i915/i915_vma.c
>>> index 1f15c3298112..9ac92e7a3566 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, 
>>> u64 alignment, u64 flags)
>>>       }
>>>       color = 0;
>>> +
>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>> i915_gem_object_is_lmem(vma->obj)) {
>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>> +        /*
>>> +         * DG2 can not have different sized pages in any given PDE 
>>> (2MB range).
>>> +         * Keeping things simple, we force any lmem object to reserve
>>> +         * 2MB chunks, preventing any smaller pages being used 
>>> alongside
>>> +         */
>>> +        if (IS_DG2(vma->vm->i915)) {
>> Similarly, here we don't need a special case for DG2.
>>
>> Ram
>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>> +        }
>>> +    }
>>> +
>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>           color = vma->obj->cache_level;
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> index 076d860ce01a..2f3f0c01786b 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>                u64 hole_start, u64 hole_end,
>>>                unsigned long end_time)
>>>   {
>>> +    const unsigned int min_alignment =
>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>       I915_RND_STATE(seed_prng);
>>>       struct i915_vma_resource *mock_vma_res;
>>>       unsigned int size;
>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>           struct drm_i915_gem_object *obj;
>>>           unsigned int *order, count, n;
>>> -        u64 hole_size;
>>> +        u64 hole_size, aligned_size;
>>> -        hole_size = (hole_end - hole_start) >> size;
>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>           count = hole_size >> 1;
>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           }
>>>           GEM_BUG_ON(!order);
>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>> hole_end);
>>>           /* Ignore allocation failures (i.e. don't report them as
>>>            * a test failure) as we are purposefully allocating very
>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           }
>>>           for (n = 0; n < count; n++) {
>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>               intel_wakeref_t wakeref;
>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>               if (igt_timeout(end_time,
>>>                       "%s timed out before %d/%d\n",
>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>               }
>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>               mock_vma_res->start = addr;
>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           i915_random_reorder(order, count, &prng);
>>>           for (n = 0; n < count; n++) {
>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>               intel_wakeref_t wakeref;
>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>>>   {
>>>       const u64 hole_size = hole_end - hole_start;
>>>       struct drm_i915_gem_object *obj;
>>> +    const unsigned int min_alignment =
>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>       const unsigned long max_pages =
>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>> ilog2(min_alignment));
>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>       unsigned long npages, prime, flags;
>>>       struct i915_vma *vma;
>>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                   offset = p->offset;
>>>                   list_for_each_entry(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                       i915_vma_unpin(vma);
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>                   offset = p->offset;
>>>                   list_for_each_entry(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                       }
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>                   offset = p->offset;
>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                       i915_vma_unpin(vma);
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>                   offset = p->offset;
>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>>                       }
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>               }
>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>       const u64 hole_size = hole_end - hole_start;
>>>       const unsigned long max_pages =
>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>> +    unsigned long min_alignment;
>>>       unsigned long flags;
>>>       u64 size;
>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>>       if (i915_is_ggtt(vm))
>>>           flags |= PIN_GLOBAL;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>           struct drm_i915_gem_object *obj;
>>>           struct i915_vma *vma;
>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>           for (addr = hole_start;
>>>                addr + obj->base.size < hole_end;
>>> -             addr += obj->base.size) {
>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>               if (err) {
>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>> %llx] with err=%d\n",
>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>   {
>>>       struct drm_i915_gem_object *obj;
>>>       struct i915_vma *vma;
>>> +    unsigned int min_alignment;
>>>       unsigned long flags;
>>>       unsigned int pot;
>>>       int err = 0;
>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>       if (i915_is_ggtt(vm))
>>>           flags |= PIN_GLOBAL;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>> I915_GTT_PAGE_SIZE);
>>>       if (IS_ERR(obj))
>>>           return PTR_ERR(obj);
>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>>>       /* Insert a pair of pages across every pot boundary within the 
>>> hole */
>>>       for (pot = fls64(hole_end - 1) - 1;
>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>> +         pot > ilog2(2 * min_alignment);
>>>            pot--) {
>>>           u64 step = BIT_ULL(pot);
>>>           u64 addr;
>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) 
>>> - I915_GTT_PAGE_SIZE;
>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>> step) - I915_GTT_PAGE_SIZE;
>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>> min_alignment;
>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>> step) - min_alignment;
>>>                addr += step) {
>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>               if (err) {
>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>>                 unsigned long end_time)
>>>   {
>>>       I915_RND_STATE(prng);
>>> +    unsigned int min_alignment;
>>>       unsigned int size;
>>>       unsigned long flags;
>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space 
>>> *vm,
>>>       if (i915_is_ggtt(vm))
>>>           flags |= PIN_GLOBAL;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       /* Keep creating larger objects until one cannot fit into the 
>>> hole */
>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>           struct drm_i915_gem_object *obj;
>>>           unsigned int *order, count, n;
>>>           struct i915_vma *vma;
>>> -        u64 hole_size;
>>> +        u64 hole_size, aligned_size;
>>>           int err = -ENODEV;
>>> -        hole_size = (hole_end - hole_start) >> size;
>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>           count = hole_size >> 1;
>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>           for (n = 0; n < count; n++) {
>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>               if (err) {
>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>> i915_address_space *vm,
>>>   {
>>>       struct drm_i915_gem_object *obj;
>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>> +    unsigned int min_alignment;
>>>       unsigned int order = 12;
>>>       LIST_HEAD(objects);
>>>       int err = 0;
>>>       u64 addr;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       /* Keep creating larger objects until one cannot fit into the 
>>> hole */
>>>       for (addr = hole_start; addr < hole_end; ) {
>>>           struct i915_vma *vma;
>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>> i915_address_space *vm,
>>>           }
>>>           i915_vma_unpin(vma);
>>> -        addr += size;
>>> +        addr += round_up(size, min_alignment);
>>>           /*
>>>            * Since we are injecting allocation faults at random 
>>> intervals,
>>> -- 
>>> 2.25.1
>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 14:59         ` Matthew Auld
  0 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 14:59 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel

On 20/01/2022 13:15, Robert Beckett wrote:
> 
> 
> On 20/01/2022 11:46, Ramalingam C wrote:
>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>> From: Matthew Auld <matthew.auld@intel.com>
>>>
>>> For local-memory objects we need to align the GTT addresses
>>> to 64K, both for the ppgtt and ggtt.
>>>
>>> We need to support vm->min_alignment > 4K, depending
>>> on the vm itself and the type of object we are inserting.
>>> With this in mind update the GTT selftests to take this
>>> into account.
>>>
>>> For DG2 we further align and pad lmem object GTT addresses
>>> to 2MB to ensure PDEs contain consistent page sizes as
>>> required by the HW.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> index c08f766e6e15..7fee95a65414 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>       struct blit_buffer scratch;
>>>       struct i915_vma *batch;
>>>       u64 hole;
>>> +    u64 align;
>>>       u32 width;
>>>       u32 height;
>>>   };
>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>> *engine, struct rnd_state *prng)
>>>           goto err_free;
>>>       }
>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from 
>>> vm! */
>>> +    t->align = max(t->align,
>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>> +    t->align = max(t->align,
>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>>> +
>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>       hole_size *= 2; /* room to maneuver */
>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>       mutex_lock(&t->ce->vm->mutex);
>>>       memset(&hole, 0, sizeof(hole));
>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>> +                      hole_size, t->align,
>>> +                      I915_COLOR_UNEVICTABLE,
>>>                         0, U64_MAX,
>>>                         DRM_MM_INSERT_BEST);
>>>       if (!err)
>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>> *engine, struct rnd_state *prng)
>>>           goto err_put;
>>>       }
>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>> +    t->hole = hole.start + t->align;
>>>       pr_info("Using hole at %llx\n", t->hole);
>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>> tiled_blits *t)
>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>                      struct rnd_state *prng)
>>>   {
>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>       u32 *map;
>>>       int err;
>>>       int i;
>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits 
>>> *t,
>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>> rnd_state *prng)
>>>   {
>>> -    u64 offset =
>>> -        round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>       int err;
>>>       /* We want to check position invariant tiling across GTT 
>>> eviction */
>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits 
>>> *t, struct rnd_state *prng)
>>>       /* Reposition so that we overlap the old addresses, and 
>>> slightly off */
>>>       err = tiled_blit(t,
>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>> +             &t->buffers[2], t->hole + t->align,
>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>       if (err)
>>>           return err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> index 46be4197b93f..7c92b25c0f26 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>> i915_address_space *vm, int subclass)
>>>       GEM_BUG_ON(!vm->total);
>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>> +
>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>> +         ARRAY_SIZE(vm->min_alignment));
>>> +
>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>> +        if (IS_DG2(vm->i915)) {
>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
>> Not only for DG2.
> 
> really? can we get confirmation of this?
> this contradicts the documentation in patch 4, which you reviewed, so I 
> am confused now

Starting from DG2, some platforms will have this new 64K GTT page size 
restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
to cover exactly that, AFAIK.
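
Assuming that holds (and it is exactly the point that needs confirming), Ram's 
suggestion would collapse the i915_address_space_init() hunk above to something 
like the following untested sketch, using only names already present in this 
patch:

	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
		 ARRAY_SIZE(vm->min_alignment));

	if (HAS_64K_PAGES(vm->i915)) {
		/*
		 * Assumption: every HAS_64K_PAGES() platform also forbids
		 * mixing 4K and 64K pages within a 2MB PDE, so pad lmem
		 * to 2MB unconditionally rather than special-casing DG2.
		 */
		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
	}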

> 
>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_2M;
>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_2M;
>>> +        } else {
>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_64K;
>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>> I915_GTT_PAGE_SIZE_64K;
>>> +        }
>>> +    }
>>> +
>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>       INIT_LIST_HEAD(&vm->bound_list);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index 8073438b67c8..b8da2514d601 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -29,6 +29,8 @@
>>>   #include "i915_selftest.h"
>>>   #include "i915_vma_resource.h"
>>>   #include "i915_vma_types.h"
>>> +#include "i915_params.h"
>>> +#include "intel_memory_region.h"
>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>> __GFP_NOWARN)
>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>       struct device *dma;
>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>       u64 reserved;        /* size addr space reserved */
>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>       unsigned int bind_async_flags;
>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>> i915_address_space *vm)
>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>   }
>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>>> +                    enum intel_memory_type type)
>>> +{
>>> +    return vm->min_alignment[type];
>>> +}
>>> +
>>>   static inline bool
>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>   {
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>> b/drivers/gpu/drm/i915/i915_vma.c
>>> index 1f15c3298112..9ac92e7a3566 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, 
>>> u64 alignment, u64 flags)
>>>       }
>>>       color = 0;
>>> +
>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>> i915_gem_object_is_lmem(vma->obj)) {
>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>> +        /*
>>> +         * DG2 can not have different sized pages in any given PDE 
>>> (2MB range).
>>> +         * Keeping things simple, we force any lmem object to reserve
>>> +         * 2MB chunks, preventing any smaller pages being used 
>>> alongside
>>> +         */
>>> +        if (IS_DG2(vma->vm->i915)) {
>> Similarly here we dont need special case for DG2.
>>
>> Ram
>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>> +        }
>>> +    }
>>> +
>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>           color = vma->obj->cache_level;
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> index 076d860ce01a..2f3f0c01786b 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>                u64 hole_start, u64 hole_end,
>>>                unsigned long end_time)
>>>   {
>>> +    const unsigned int min_alignment =
>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>       I915_RND_STATE(seed_prng);
>>>       struct i915_vma_resource *mock_vma_res;
>>>       unsigned int size;
>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>           struct drm_i915_gem_object *obj;
>>>           unsigned int *order, count, n;
>>> -        u64 hole_size;
>>> +        u64 hole_size, aligned_size;
>>> -        hole_size = (hole_end - hole_start) >> size;
>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>           count = hole_size >> 1;
>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           }
>>>           GEM_BUG_ON(!order);
>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>> hole_end);
>>>           /* Ignore allocation failures (i.e. don't report them as
>>>            * a test failure) as we are purposefully allocating very
>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           }
>>>           for (n = 0; n < count; n++) {
>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>               intel_wakeref_t wakeref;
>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>               if (igt_timeout(end_time,
>>>                       "%s timed out before %d/%d\n",
>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>               }
>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>               mock_vma_res->start = addr;
>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>> i915_address_space *vm,
>>>           i915_random_reorder(order, count, &prng);
>>>           for (n = 0; n < count; n++) {
>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>               intel_wakeref_t wakeref;
>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
>>>   {
>>>       const u64 hole_size = hole_end - hole_start;
>>>       struct drm_i915_gem_object *obj;
>>> +    const unsigned int min_alignment =
>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>       const unsigned long max_pages =
>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>> ilog2(min_alignment));
>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>       unsigned long npages, prime, flags;
>>>       struct i915_vma *vma;
>>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                   offset = p->offset;
>>>                   list_for_each_entry(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                       i915_vma_unpin(vma);
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>                   offset = p->offset;
>>>                   list_for_each_entry(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                       }
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>                   offset = p->offset;
>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space 
>>> *vm,
>>>                       i915_vma_unpin(vma);
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>                   offset = p->offset;
>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>> +                    u64 aligned_size = round_up(obj->base.size,
>>> +                                    min_alignment);
>>> +
>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>                       if (IS_ERR(vma))
>>>                           continue;
>>>                       if (p->step < 0) {
>>> -                        if (offset < hole_start + obj->base.size)
>>> +                        if (offset < hole_start + aligned_size)
>>>                               break;
>>> -                        offset -= obj->base.size;
>>> +                        offset -= aligned_size;
>>>                       }
>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>>                       }
>>>                       if (p->step > 0) {
>>> -                        if (offset + obj->base.size > hole_end)
>>> +                        if (offset + aligned_size > hole_end)
>>>                               break;
>>> -                        offset += obj->base.size;
>>> +                        offset += aligned_size;
>>>                       }
>>>                   }
>>>               }
>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>       const u64 hole_size = hole_end - hole_start;
>>>       const unsigned long max_pages =
>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>> +    unsigned long min_alignment;
>>>       unsigned long flags;
>>>       u64 size;
>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>>       if (i915_is_ggtt(vm))
>>>           flags |= PIN_GLOBAL;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>           struct drm_i915_gem_object *obj;
>>>           struct i915_vma *vma;
>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>           for (addr = hole_start;
>>>                addr + obj->base.size < hole_end;
>>> -             addr += obj->base.size) {
>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>               if (err) {
>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>> %llx] with err=%d\n",
>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>   {
>>>       struct drm_i915_gem_object *obj;
>>>       struct i915_vma *vma;
>>> +    unsigned int min_alignment;
>>>       unsigned long flags;
>>>       unsigned int pot;
>>>       int err = 0;
>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>       if (i915_is_ggtt(vm))
>>>           flags |= PIN_GLOBAL;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>> I915_GTT_PAGE_SIZE);
>>>       if (IS_ERR(obj))
>>>           return PTR_ERR(obj);
>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
>>>       /* Insert a pair of pages across every pot boundary within the 
>>> hole */
>>>       for (pot = fls64(hole_end - 1) - 1;
>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>> +         pot > ilog2(2 * min_alignment);
>>>            pot--) {
>>>           u64 step = BIT_ULL(pot);
>>>           u64 addr;
>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) 
>>> - I915_GTT_PAGE_SIZE;
>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>> step) - I915_GTT_PAGE_SIZE;
>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>> min_alignment;
>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>> step) - min_alignment;
>>>                addr += step) {
>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>               if (err) {
>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>>                 unsigned long end_time)
>>>   {
>>>       I915_RND_STATE(prng);
>>> +    unsigned int min_alignment;
>>>       unsigned int size;
>>>       unsigned long flags;
>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space 
>>> *vm,
>>>       if (i915_is_ggtt(vm))
>>>           flags |= PIN_GLOBAL;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       /* Keep creating larger objects until one cannot fit into the 
>>> hole */
>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>           struct drm_i915_gem_object *obj;
>>>           unsigned int *order, count, n;
>>>           struct i915_vma *vma;
>>> -        u64 hole_size;
>>> +        u64 hole_size, aligned_size;
>>>           int err = -ENODEV;
>>> -        hole_size = (hole_end - hole_start) >> size;
>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>           count = hole_size >> 1;
>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>           for (n = 0; n < count; n++) {
>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>               if (err) {
>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>> i915_address_space *vm,
>>>   {
>>>       struct drm_i915_gem_object *obj;
>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>> +    unsigned int min_alignment;
>>>       unsigned int order = 12;
>>>       LIST_HEAD(objects);
>>>       int err = 0;
>>>       u64 addr;
>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>> +
>>>       /* Keep creating larger objects until one cannot fit into the 
>>> hole */
>>>       for (addr = hole_start; addr < hole_end; ) {
>>>           struct i915_vma *vma;
>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>> i915_address_space *vm,
>>>           }
>>>           i915_vma_unpin(vma);
>>> -        addr += size;
>>> +        addr += round_up(size, min_alignment);
>>>           /*
>>>            * Since we are injecting allocation faults at random 
>>> intervals,
>>> -- 
>>> 2.25.1
>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 14:59         ` Matthew Auld
  (?)
@ 2022-01-20 15:44           ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 15:44 UTC (permalink / raw)
  To: Matthew Auld, Ramalingam C
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel, linux-kernel



On 20/01/2022 14:59, Matthew Auld wrote:
> On 20/01/2022 13:15, Robert Beckett wrote:
>>
>>
>> On 20/01/2022 11:46, Ramalingam C wrote:
>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>
>>>> For local-memory objects we need to align the GTT addresses
>>>> to 64K, both for the ppgtt and ggtt.
>>>>
>>>> We need to support vm->min_alignment > 4K, depending
>>>> on the vm itself and the type of object we are inserting.
>>>> With this in mind update the GTT selftests to take this
>>>> into account.
>>>>
>>>> For DG2 we further align and pad lmem object GTT addresses
>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>> required by the HW.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> ---
>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>> ++++++++++++-------
>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git 
>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> index c08f766e6e15..7fee95a65414 100644
>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>       struct blit_buffer scratch;
>>>>       struct i915_vma *batch;
>>>>       u64 hole;
>>>> +    u64 align;
>>>>       u32 width;
>>>>       u32 height;
>>>>   };
>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>> *engine, struct rnd_state *prng)
>>>>           goto err_free;
>>>>       }
>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>> from vm! */
>>>> +    t->align = max(t->align,
>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>>> +    t->align = max(t->align,
>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>>>> +
>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>       hole_size *= 2; /* room to maneuver */
>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>       memset(&hole, 0, sizeof(hole));
>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>> +                      hole_size, t->align,
>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>                         0, U64_MAX,
>>>>                         DRM_MM_INSERT_BEST);
>>>>       if (!err)
>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>> *engine, struct rnd_state *prng)
>>>>           goto err_put;
>>>>       }
>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>> +    t->hole = hole.start + t->align;
>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>> tiled_blits *t)
>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>                      struct rnd_state *prng)
>>>>   {
>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>       u32 *map;
>>>>       int err;
>>>>       int i;
>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>> tiled_blits *t,
>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>> rnd_state *prng)
>>>>   {
>>>> -    u64 offset =
>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>> I915_GTT_MIN_ALIGNMENT);
>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>       int err;
>>>>       /* We want to check position invariant tiling across GTT 
>>>> eviction */
>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits 
>>>> *t, struct rnd_state *prng)
>>>>       /* Reposition so that we overlap the old addresses, and 
>>>> slightly off */
>>>>       err = tiled_blit(t,
>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>> +             &t->buffers[2], t->hole + t->align,
>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>       if (err)
>>>>           return err;
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>> i915_address_space *vm, int subclass)
>>>>       GEM_BUG_ON(!vm->total);
>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>> +
>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>> +
>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>> +        if (IS_DG2(vm->i915)) {
>>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
>>> Not only for DG2.
>>
>> really? can we get confirmation of this?
>> this contradicts the documentation in patch 4, which you reviewed, so 
>> I am confused now
> 
> Starting from DG2, some platforms will have this new 64K GTT page size 
> restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
> to cover exactly that, AFAIK.

As I understood it, 64K pages are only a requirement going forward for 
discrete cards, but the restriction of not sharing PDEs between 4K and 64K 
pages was specific to DG2.

e.g. xehpsdv is also defined as having 64K pages, and others in future 
are likely to, but without the PDE sharing restrictions.

If this is not the case, and all 64K page devices will also necessitate 
not sharing PDEs, then we can just use HAS_64K_PAGES and use 2MB 
everywhere, but so far this sounds unconfirmed.
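
To make the two readings concrete, the i915_vma_insert() hunk in this patch 
effectively does the following (sketch lifted from the patch itself, not 
re-tested here); dropping the inner IS_DG2() check is what adopting Ram's 
suggestion would amount to:

	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
		/* 64K GTT alignment applies to lmem on all 64K-page platforms */
		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
		/*
		 * 2MB padding keeps each PDE to a single page size; per this
		 * patch that is needed on DG2 only, per Ram's comment it
		 * would apply to every HAS_64K_PAGES() platform.
		 */
		if (IS_DG2(vma->vm->i915)) {
			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
		}
	}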

> 
>>
>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_2M;
>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_2M;
>>>> +        } else {
>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_64K;
>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_64K;
>>>> +        }
>>>> +    }
>>>> +
>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> index 8073438b67c8..b8da2514d601 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> @@ -29,6 +29,8 @@
>>>>   #include "i915_selftest.h"
>>>>   #include "i915_vma_resource.h"
>>>>   #include "i915_vma_types.h"
>>>> +#include "i915_params.h"
>>>> +#include "intel_memory_region.h"
>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>> __GFP_NOWARN)
>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>       struct device *dma;
>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>       u64 reserved;        /* size addr space reserved */
>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>       unsigned int bind_async_flags;
>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>> i915_address_space *vm)
>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>   }
>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>>>> +                    enum intel_memory_type type)
>>>> +{
>>>> +    return vm->min_alignment[type];
>>>> +}
>>>> +
>>>>   static inline bool
>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>   {
>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, 
>>>> u64 alignment, u64 flags)
>>>>       }
>>>>       color = 0;
>>>> +
>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>> +        /*
>>>> +         * DG2 can not have different sized pages in any given PDE 
>>>> (2MB range).
>>>> +         * Keeping things simple, we force any lmem object to reserve
>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>> alongside
>>>> +         */
>>>> +        if (IS_DG2(vma->vm->i915)) {
>>> Similarly here we dont need special case for DG2.
>>>
>>> Ram
>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>> +        }
>>>> +    }
>>>> +
>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>           color = vma->obj->cache_level;
>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>                u64 hole_start, u64 hole_end,
>>>>                unsigned long end_time)
>>>>   {
>>>> +    const unsigned int min_alignment =
>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>       I915_RND_STATE(seed_prng);
>>>>       struct i915_vma_resource *mock_vma_res;
>>>>       unsigned int size;
>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>           struct drm_i915_gem_object *obj;
>>>>           unsigned int *order, count, n;
>>>> -        u64 hole_size;
>>>> +        u64 hole_size, aligned_size;
>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>           count = hole_size >> 1;
>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           GEM_BUG_ON(!order);
>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>> hole_end);
>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>            * a test failure) as we are purposefully allocating very
>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               intel_wakeref_t wakeref;
>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>               if (igt_timeout(end_time,
>>>>                       "%s timed out before %d/%d\n",
>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>               }
>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>               mock_vma_res->start = addr;
>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           i915_random_reorder(order, count, &prng);
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               intel_wakeref_t wakeref;
>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>   {
>>>>       const u64 hole_size = hole_end - hole_start;
>>>>       struct drm_i915_gem_object *obj;
>>>> +    const unsigned int min_alignment =
>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>       const unsigned long max_pages =
>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>> ilog2(min_alignment));
>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>       unsigned long npages, prime, flags;
>>>>       struct i915_vma *vma;
>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       i915_vma_unpin(vma);
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       }
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       i915_vma_unpin(vma);
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>>>                       }
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>               }
>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>>       const u64 hole_size = hole_end - hole_start;
>>>>       const unsigned long max_pages =
>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>> +    unsigned long min_alignment;
>>>>       unsigned long flags;
>>>>       u64 size;
>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>           struct drm_i915_gem_object *obj;
>>>>           struct i915_vma *vma;
>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>>           for (addr = hole_start;
>>>>                addr + obj->base.size < hole_end;
>>>> -             addr += obj->base.size) {
>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>>> %llx] with err=%d\n",
>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>>   {
>>>>       struct drm_i915_gem_object *obj;
>>>>       struct i915_vma *vma;
>>>> +    unsigned int min_alignment;
>>>>       unsigned long flags;
>>>>       unsigned int pot;
>>>>       int err = 0;
>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>> I915_GTT_PAGE_SIZE);
>>>>       if (IS_ERR(obj))
>>>>           return PTR_ERR(obj);
>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space 
>>>> *vm,
>>>>       /* Insert a pair of pages across every pot boundary within the 
>>>> hole */
>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>> +         pot > ilog2(2 * min_alignment);
>>>>            pot--) {
>>>>           u64 step = BIT_ULL(pot);
>>>>           u64 addr;
>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) 
>>>> - I915_GTT_PAGE_SIZE;
>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>> step) - I915_GTT_PAGE_SIZE;
>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>> min_alignment;
>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>> step) - min_alignment;
>>>>                addr += step) {
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space 
>>>> *vm,
>>>>                 unsigned long end_time)
>>>>   {
>>>>       I915_RND_STATE(prng);
>>>> +    unsigned int min_alignment;
>>>>       unsigned int size;
>>>>       unsigned long flags;
>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>> i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>> hole */
>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>           struct drm_i915_gem_object *obj;
>>>>           unsigned int *order, count, n;
>>>>           struct i915_vma *vma;
>>>> -        u64 hole_size;
>>>> +        u64 hole_size, aligned_size;
>>>>           int err = -ENODEV;
>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>           count = hole_size >> 1;
>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space 
>>>> *vm,
>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>> i915_address_space *vm,
>>>>   {
>>>>       struct drm_i915_gem_object *obj;
>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>> +    unsigned int min_alignment;
>>>>       unsigned int order = 12;
>>>>       LIST_HEAD(objects);
>>>>       int err = 0;
>>>>       u64 addr;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>> hole */
>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>           struct i915_vma *vma;
>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           i915_vma_unpin(vma);
>>>> -        addr += size;
>>>> +        addr += round_up(size, min_alignment);
>>>>           /*
>>>>            * Since we are injecting allocation faults at random 
>>>> intervals,
>>>> -- 
>>>> 2.25.1
>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 15:44           ` Robert Beckett
  0 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 15:44 UTC (permalink / raw)
  To: Matthew Auld, Ramalingam C
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, linux-kernel, dri-devel,
	Rodrigo Vivi



On 20/01/2022 14:59, Matthew Auld wrote:
> On 20/01/2022 13:15, Robert Beckett wrote:
>>
>>
>> On 20/01/2022 11:46, Ramalingam C wrote:
>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>
>>>> For local-memory objects we need to align the GTT addresses
>>>> to 64K, both for the ppgtt and ggtt.
>>>>
>>>> We need to support vm->min_alignment > 4K, depending
>>>> on the vm itself and the type of object we are inserting.
>>>> With this in mind update the GTT selftests to take this
>>>> into account.
>>>>
>>>> For DG2 we further align and pad lmem object GTT addresses
>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>> required by the HW.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> ---
>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>> ++++++++++++-------
>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git 
>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> index c08f766e6e15..7fee95a65414 100644
>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>       struct blit_buffer scratch;
>>>>       struct i915_vma *batch;
>>>>       u64 hole;
>>>> +    u64 align;
>>>>       u32 width;
>>>>       u32 height;
>>>>   };
>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>> *engine, struct rnd_state *prng)
>>>>           goto err_free;
>>>>       }
>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>> from vm! */
>>>> +    t->align = max(t->align,
>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>>> +    t->align = max(t->align,
>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>>>> +
>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>       hole_size *= 2; /* room to maneuver */
>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>       memset(&hole, 0, sizeof(hole));
>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>> +                      hole_size, t->align,
>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>                         0, U64_MAX,
>>>>                         DRM_MM_INSERT_BEST);
>>>>       if (!err)
>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>> *engine, struct rnd_state *prng)
>>>>           goto err_put;
>>>>       }
>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>> +    t->hole = hole.start + t->align;
>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>> tiled_blits *t)
>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>                      struct rnd_state *prng)
>>>>   {
>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>       u32 *map;
>>>>       int err;
>>>>       int i;
>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>> tiled_blits *t,
>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>> rnd_state *prng)
>>>>   {
>>>> -    u64 offset =
>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>> I915_GTT_MIN_ALIGNMENT);
>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>       int err;
>>>>       /* We want to check position invariant tiling across GTT 
>>>> eviction */
>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits 
>>>> *t, struct rnd_state *prng)
>>>>       /* Reposition so that we overlap the old addresses, and 
>>>> slightly off */
>>>>       err = tiled_blit(t,
>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>> +             &t->buffers[2], t->hole + t->align,
>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>       if (err)
>>>>           return err;
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>> i915_address_space *vm, int subclass)
>>>>       GEM_BUG_ON(!vm->total);
>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>> +
>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>> +
>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>> +        if (IS_DG2(vm->i915)) {
>>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
>>> Not only for DG2.
>>
>> really? can we get confirmation of this?
>> this contradicts the documentation in patch 4, which you reviewed, so 
>> I am confused now
> 
> Starting from DG2, some platforms will have this new 64K GTT page size 
> restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
> to cover exactly that, AFAIK.

As I understood it, 64K pages are only a requirement going forward for 
discrete cards, but the restriction of not sharing PDEs between 4K and 64K 
pages was specific to DG2.

e.g. xehpsdv is also defined as having 64K pages, and others in future 
are likely to, but without the PDE sharing restrictions.

If this is not the case, and all 64K page devices will also necessitate 
not sharing PDEs, then we can just use HAS_64K_PAGES and use 2MB 
everywhere, but so far this sounds unconfirmed.

> 
>>
>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_2M;
>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_2M;
>>>> +        } else {
>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_64K;
>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_64K;
>>>> +        }
>>>> +    }
>>>> +
>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> index 8073438b67c8..b8da2514d601 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> @@ -29,6 +29,8 @@
>>>>   #include "i915_selftest.h"
>>>>   #include "i915_vma_resource.h"
>>>>   #include "i915_vma_types.h"
>>>> +#include "i915_params.h"
>>>> +#include "intel_memory_region.h"
>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>> __GFP_NOWARN)
>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>       struct device *dma;
>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>       u64 reserved;        /* size addr space reserved */
>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>       unsigned int bind_async_flags;
>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>> i915_address_space *vm)
>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>   }
>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>>>> +                    enum intel_memory_type type)
>>>> +{
>>>> +    return vm->min_alignment[type];
>>>> +}
>>>> +
>>>>   static inline bool
>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>   {
>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, 
>>>> u64 alignment, u64 flags)
>>>>       }
>>>>       color = 0;
>>>> +
>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>> +        /*
>>>> +         * DG2 can not have different sized pages in any given PDE 
>>>> (2MB range).
>>>> +         * Keeping things simple, we force any lmem object to reserve
>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>> alongside
>>>> +         */
>>>> +        if (IS_DG2(vma->vm->i915)) {
>>> Similarly here we dont need special case for DG2.
>>>
>>> Ram
>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>> +        }
>>>> +    }
>>>> +
>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>           color = vma->obj->cache_level;
>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>                u64 hole_start, u64 hole_end,
>>>>                unsigned long end_time)
>>>>   {
>>>> +    const unsigned int min_alignment =
>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>       I915_RND_STATE(seed_prng);
>>>>       struct i915_vma_resource *mock_vma_res;
>>>>       unsigned int size;
>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>           struct drm_i915_gem_object *obj;
>>>>           unsigned int *order, count, n;
>>>> -        u64 hole_size;
>>>> +        u64 hole_size, aligned_size;
>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>           count = hole_size >> 1;
>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           GEM_BUG_ON(!order);
>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>> hole_end);
>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>            * a test failure) as we are purposefully allocating very
>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               intel_wakeref_t wakeref;
>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>               if (igt_timeout(end_time,
>>>>                       "%s timed out before %d/%d\n",
>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>               }
>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>               mock_vma_res->start = addr;
>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           i915_random_reorder(order, count, &prng);
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               intel_wakeref_t wakeref;
>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>   {
>>>>       const u64 hole_size = hole_end - hole_start;
>>>>       struct drm_i915_gem_object *obj;
>>>> +    const unsigned int min_alignment =
>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>       const unsigned long max_pages =
>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>> ilog2(min_alignment));
>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>       unsigned long npages, prime, flags;
>>>>       struct i915_vma *vma;
>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       i915_vma_unpin(vma);
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       }
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       i915_vma_unpin(vma);
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>>>                       }
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>               }
>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>>       const u64 hole_size = hole_end - hole_start;
>>>>       const unsigned long max_pages =
>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>> +    unsigned long min_alignment;
>>>>       unsigned long flags;
>>>>       u64 size;
>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>           struct drm_i915_gem_object *obj;
>>>>           struct i915_vma *vma;
>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>>           for (addr = hole_start;
>>>>                addr + obj->base.size < hole_end;
>>>> -             addr += obj->base.size) {
>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>>> %llx] with err=%d\n",
>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>>   {
>>>>       struct drm_i915_gem_object *obj;
>>>>       struct i915_vma *vma;
>>>> +    unsigned int min_alignment;
>>>>       unsigned long flags;
>>>>       unsigned int pot;
>>>>       int err = 0;
>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>> I915_GTT_PAGE_SIZE);
>>>>       if (IS_ERR(obj))
>>>>           return PTR_ERR(obj);
>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space 
>>>> *vm,
>>>>       /* Insert a pair of pages across every pot boundary within the 
>>>> hole */
>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>> +         pot > ilog2(2 * min_alignment);
>>>>            pot--) {
>>>>           u64 step = BIT_ULL(pot);
>>>>           u64 addr;
>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) 
>>>> - I915_GTT_PAGE_SIZE;
>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>> step) - I915_GTT_PAGE_SIZE;
>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>> min_alignment;
>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>> step) - min_alignment;
>>>>                addr += step) {
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space 
>>>> *vm,
>>>>                 unsigned long end_time)
>>>>   {
>>>>       I915_RND_STATE(prng);
>>>> +    unsigned int min_alignment;
>>>>       unsigned int size;
>>>>       unsigned long flags;
>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>> i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>> hole */
>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>           struct drm_i915_gem_object *obj;
>>>>           unsigned int *order, count, n;
>>>>           struct i915_vma *vma;
>>>> -        u64 hole_size;
>>>> +        u64 hole_size, aligned_size;
>>>>           int err = -ENODEV;
>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>           count = hole_size >> 1;
>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space 
>>>> *vm,
>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>> i915_address_space *vm,
>>>>   {
>>>>       struct drm_i915_gem_object *obj;
>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>> +    unsigned int min_alignment;
>>>>       unsigned int order = 12;
>>>>       LIST_HEAD(objects);
>>>>       int err = 0;
>>>>       u64 addr;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>> hole */
>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>           struct i915_vma *vma;
>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           i915_vma_unpin(vma);
>>>> -        addr += size;
>>>> +        addr += round_up(size, min_alignment);
>>>>           /*
>>>>            * Since we are injecting allocation faults at random 
>>>> intervals,
>>>> -- 
>>>> 2.25.1
>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 15:44           ` Robert Beckett
  0 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 15:44 UTC (permalink / raw)
  To: Matthew Auld, Ramalingam C
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel



On 20/01/2022 14:59, Matthew Auld wrote:
> On 20/01/2022 13:15, Robert Beckett wrote:
>>
>>
>> On 20/01/2022 11:46, Ramalingam C wrote:
>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>
>>>> For local-memory objects we need to align the GTT addresses
>>>> to 64K, both for the ppgtt and ggtt.
>>>>
>>>> We need to support vm->min_alignment > 4K, depending
>>>> on the vm itself and the type of object we are inserting.
>>>> With this in mind update the GTT selftests to take this
>>>> into account.
>>>>
>>>> For DG2 we further align and pad lmem object GTT addresses
>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>> required by the HW.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> ---
>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>> ++++++++++++-------
>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git 
>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> index c08f766e6e15..7fee95a65414 100644
>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>       struct blit_buffer scratch;
>>>>       struct i915_vma *batch;
>>>>       u64 hole;
>>>> +    u64 align;
>>>>       u32 width;
>>>>       u32 height;
>>>>   };
>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>> *engine, struct rnd_state *prng)
>>>>           goto err_free;
>>>>       }
>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>> from vm! */
>>>> +    t->align = max(t->align,
>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>>> +    t->align = max(t->align,
>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
>>>> +
>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>       hole_size *= 2; /* room to maneuver */
>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>       memset(&hole, 0, sizeof(hole));
>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>> +                      hole_size, t->align,
>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>                         0, U64_MAX,
>>>>                         DRM_MM_INSERT_BEST);
>>>>       if (!err)
>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>> *engine, struct rnd_state *prng)
>>>>           goto err_put;
>>>>       }
>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>> +    t->hole = hole.start + t->align;
>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>> tiled_blits *t)
>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>                      struct rnd_state *prng)
>>>>   {
>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>       u32 *map;
>>>>       int err;
>>>>       int i;
>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>> tiled_blits *t,
>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>> rnd_state *prng)
>>>>   {
>>>> -    u64 offset =
>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>> I915_GTT_MIN_ALIGNMENT);
>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>       int err;
>>>>       /* We want to check position invariant tiling across GTT 
>>>> eviction */
>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits 
>>>> *t, struct rnd_state *prng)
>>>>       /* Reposition so that we overlap the old addresses, and 
>>>> slightly off */
>>>>       err = tiled_blit(t,
>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>> +             &t->buffers[2], t->hole + t->align,
>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>       if (err)
>>>>           return err;
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>> i915_address_space *vm, int subclass)
>>>>       GEM_BUG_ON(!vm->total);
>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>> +
>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>> +
>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>> +        if (IS_DG2(vm->i915)) {
>>> I think we need this 2M alignment for all platform with HAS_64K_PAGES.
>>> Not only for DG2.
>>
>> really? can we get confirmation of this?
>> this contradicts the documentation in patch 4, which you reviewed, so 
>> I am confused now
> 
> Starting from DG2, some platforms will have this new 64K GTT page size 
> restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
> to cover exactly that, AFAIK.

As I understood it, the 64K page requirement applies to discrete cards
generally going forward, but the restriction of not sharing PDEs between
4K and 64K pages was specific to DG2.

e.g. xehpsdv is also defined as having 64K pages, and others in future
are likely to, but without the PDE sharing restriction.

If this is not the case, and all 64K page devices will also necessitate
not sharing PDEs, then we can just use the HAS_64K_PAGES check and use
2MB everywhere, but so far this sounds unconfirmed.

> 
>>
>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_2M;
>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_2M;
>>>> +        } else {
>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_64K;
>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>> I915_GTT_PAGE_SIZE_64K;
>>>> +        }
>>>> +    }
>>>> +
>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> index 8073438b67c8..b8da2514d601 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>> @@ -29,6 +29,8 @@
>>>>   #include "i915_selftest.h"
>>>>   #include "i915_vma_resource.h"
>>>>   #include "i915_vma_types.h"
>>>> +#include "i915_params.h"
>>>> +#include "intel_memory_region.h"
>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>> __GFP_NOWARN)
>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>       struct device *dma;
>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>       u64 reserved;        /* size addr space reserved */
>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>       unsigned int bind_async_flags;
>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>> i915_address_space *vm)
>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>   }
>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
>>>> +                    enum intel_memory_type type)
>>>> +{
>>>> +    return vm->min_alignment[type];
>>>> +}
>>>> +
>>>>   static inline bool
>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>   {
>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 size, 
>>>> u64 alignment, u64 flags)
>>>>       }
>>>>       color = 0;
>>>> +
>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>> +        /*
>>>> +         * DG2 can not have different sized pages in any given PDE 
>>>> (2MB range).
>>>> +         * Keeping things simple, we force any lmem object to reserve
>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>> alongside
>>>> +         */
>>>> +        if (IS_DG2(vma->vm->i915)) {
>>> Similarly here we dont need special case for DG2.
>>>
>>> Ram
>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>> +        }
>>>> +    }
>>>> +
>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>           color = vma->obj->cache_level;
>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>                u64 hole_start, u64 hole_end,
>>>>                unsigned long end_time)
>>>>   {
>>>> +    const unsigned int min_alignment =
>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>       I915_RND_STATE(seed_prng);
>>>>       struct i915_vma_resource *mock_vma_res;
>>>>       unsigned int size;
>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>           struct drm_i915_gem_object *obj;
>>>>           unsigned int *order, count, n;
>>>> -        u64 hole_size;
>>>> +        u64 hole_size, aligned_size;
>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>           count = hole_size >> 1;
>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           GEM_BUG_ON(!order);
>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>> hole_end);
>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>            * a test failure) as we are purposefully allocating very
>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               intel_wakeref_t wakeref;
>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>               if (igt_timeout(end_time,
>>>>                       "%s timed out before %d/%d\n",
>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>               }
>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>               mock_vma_res->start = addr;
>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>> i915_address_space *vm,
>>>>           i915_random_reorder(order, count, &prng);
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               intel_wakeref_t wakeref;
>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>   {
>>>>       const u64 hole_size = hole_end - hole_start;
>>>>       struct drm_i915_gem_object *obj;
>>>> +    const unsigned int min_alignment =
>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>       const unsigned long max_pages =
>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>> ilog2(min_alignment));
>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>       unsigned long npages, prime, flags;
>>>>       struct i915_vma *vma;
>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       i915_vma_unpin(vma);
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       }
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space 
>>>> *vm,
>>>>                       i915_vma_unpin(vma);
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>                   offset = p->offset;
>>>>                   list_for_each_entry_reverse(obj, &objects, st_link) {
>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>> +                                    min_alignment);
>>>> +
>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>                       if (IS_ERR(vma))
>>>>                           continue;
>>>>                       if (p->step < 0) {
>>>> -                        if (offset < hole_start + obj->base.size)
>>>> +                        if (offset < hole_start + aligned_size)
>>>>                               break;
>>>> -                        offset -= obj->base.size;
>>>> +                        offset -= aligned_size;
>>>>                       }
>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
>>>>                       }
>>>>                       if (p->step > 0) {
>>>> -                        if (offset + obj->base.size > hole_end)
>>>> +                        if (offset + aligned_size > hole_end)
>>>>                               break;
>>>> -                        offset += obj->base.size;
>>>> +                        offset += aligned_size;
>>>>                       }
>>>>                   }
>>>>               }
>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>>       const u64 hole_size = hole_end - hole_start;
>>>>       const unsigned long max_pages =
>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>> +    unsigned long min_alignment;
>>>>       unsigned long flags;
>>>>       u64 size;
>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>           struct drm_i915_gem_object *obj;
>>>>           struct i915_vma *vma;
>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
>>>>           for (addr = hole_start;
>>>>                addr + obj->base.size < hole_end;
>>>> -             addr += obj->base.size) {
>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>>> %llx] with err=%d\n",
>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>>   {
>>>>       struct drm_i915_gem_object *obj;
>>>>       struct i915_vma *vma;
>>>> +    unsigned int min_alignment;
>>>>       unsigned long flags;
>>>>       unsigned int pot;
>>>>       int err = 0;
>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>> I915_GTT_PAGE_SIZE);
>>>>       if (IS_ERR(obj))
>>>>           return PTR_ERR(obj);
>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space 
>>>> *vm,
>>>>       /* Insert a pair of pages across every pot boundary within the 
>>>> hole */
>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>> +         pot > ilog2(2 * min_alignment);
>>>>            pot--) {
>>>>           u64 step = BIT_ULL(pot);
>>>>           u64 addr;
>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) 
>>>> - I915_GTT_PAGE_SIZE;
>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>> step) - I915_GTT_PAGE_SIZE;
>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>> min_alignment;
>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>> step) - min_alignment;
>>>>                addr += step) {
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space 
>>>> *vm,
>>>>                 unsigned long end_time)
>>>>   {
>>>>       I915_RND_STATE(prng);
>>>> +    unsigned int min_alignment;
>>>>       unsigned int size;
>>>>       unsigned long flags;
>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>> i915_address_space *vm,
>>>>       if (i915_is_ggtt(vm))
>>>>           flags |= PIN_GLOBAL;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>> hole */
>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>           struct drm_i915_gem_object *obj;
>>>>           unsigned int *order, count, n;
>>>>           struct i915_vma *vma;
>>>> -        u64 hole_size;
>>>> +        u64 hole_size, aligned_size;
>>>>           int err = -ENODEV;
>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>           count = hole_size >> 1;
>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space 
>>>> *vm,
>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>           for (n = 0; n < count; n++) {
>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>               if (err) {
>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>> i915_address_space *vm,
>>>>   {
>>>>       struct drm_i915_gem_object *obj;
>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>> +    unsigned int min_alignment;
>>>>       unsigned int order = 12;
>>>>       LIST_HEAD(objects);
>>>>       int err = 0;
>>>>       u64 addr;
>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>> +
>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>> hole */
>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>           struct i915_vma *vma;
>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>> i915_address_space *vm,
>>>>           }
>>>>           i915_vma_unpin(vma);
>>>> -        addr += size;
>>>> +        addr += round_up(size, min_alignment);
>>>>           /*
>>>>            * Since we are injecting allocation faults at random 
>>>> intervals,
>>>> -- 
>>>> 2.25.1
>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 15:44           ` Robert Beckett
  (?)
@ 2022-01-20 15:58             ` Matthew Auld
  -1 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 15:58 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel, linux-kernel

On 20/01/2022 15:44, Robert Beckett wrote:
> 
> 
> On 20/01/2022 14:59, Matthew Auld wrote:
>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>
>>>
>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>
>>>>> For local-memory objects we need to align the GTT addresses
>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>
>>>>> We need to support vm->min_alignment > 4K, depending
>>>>> on the vm itself and the type of object we are inserting.
>>>>> With this in mind update the GTT selftests to take this
>>>>> into account.
>>>>>
>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>> required by the HW.
>>>>>
>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>> ---
>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>> ++++++++++++-------
>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>
>>>>> diff --git 
>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>       struct blit_buffer scratch;
>>>>>       struct i915_vma *batch;
>>>>>       u64 hole;
>>>>> +    u64 align;
>>>>>       u32 width;
>>>>>       u32 height;
>>>>>   };
>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>> *engine, struct rnd_state *prng)
>>>>>           goto err_free;
>>>>>       }
>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>> from vm! */
>>>>> +    t->align = max(t->align,
>>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>>>> +    t->align = max(t->align,
>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>> INTEL_MEMORY_SYSTEM));
>>>>> +
>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>       hole_size *= 2; /* room to maneuver */
>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>> +                      hole_size, t->align,
>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>                         0, U64_MAX,
>>>>>                         DRM_MM_INSERT_BEST);
>>>>>       if (!err)
>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>> *engine, struct rnd_state *prng)
>>>>>           goto err_put;
>>>>>       }
>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>> +    t->hole = hole.start + t->align;
>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>> tiled_blits *t)
>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>                      struct rnd_state *prng)
>>>>>   {
>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>       u32 *map;
>>>>>       int err;
>>>>>       int i;
>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>> tiled_blits *t,
>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>> rnd_state *prng)
>>>>>   {
>>>>> -    u64 offset =
>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>       int err;
>>>>>       /* We want to check position invariant tiling across GTT 
>>>>> eviction */
>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>> slightly off */
>>>>>       err = tiled_blit(t,
>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>       if (err)
>>>>>           return err;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>> i915_address_space *vm, int subclass)
>>>>>       GEM_BUG_ON(!vm->total);
>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>> +
>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>> +
>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>> +        if (IS_DG2(vm->i915)) {
>>>> I think we need this 2M alignment for all platform with HAS_64K_PAGES.
>>>> Not only for DG2.
>>>
>>> really? can we get confirmation of this?
>>> this contradicts the documentation in patch 4, which you reviewed, so 
>>> I am confused now
>>
>> Starting from DG2, some platforms will have this new 64K GTT page size 
>> restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
>> to cover exactly that, AFAIK.
> 
> As I understood it, the 64K page requirement applies to discrete cards
> generally going forward, but the restriction of not sharing PDEs between
> 4K and 64K pages was specific to DG2.
> 
> e.g. xehpsdv is also defined as having 64K pages, and others in future
> are likely to, but without the PDE sharing restriction.

Yeah, pretty much. But there is one other platform lurking.

From chatting with Ram, it might also make sense to disentangle 
HAS_64K_PAGES(), since it currently means both that we need min 64K page 
granularity, and that there is this compact-pt layout thing which 
doesn't allow mixing 64K and 4K in the same page-table.
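
Roughly something like this, as a sketch of the split (the field and
macro names are illustrative only, not necessarily what would land in
the tree):

	/* lmem PTEs must use at least 64K granularity */
	#define HAS_64K_PAGES(i915)	(INTEL_INFO(i915)->has_64k_pages)
	/* compact pt layout: 64K and 4K PTEs cannot share a page-table */
	#define NEEDS_COMPACT_PT(i915)	(INTEL_INFO(i915)->needs_compact_pt)

with only the latter driving the 2M min_alignment and padding in
i915_address_space_init() and i915_vma_insert().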

> 
> If this is not the case, and all 64K page devices will also necessitate
> not sharing PDEs, then we can just use the HAS_64K_PAGES check and use
> 2MB everywhere, but so far this sounds unconfirmed.
> 
>>
>>>
>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>> +        } else {
>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> @@ -29,6 +29,8 @@
>>>>>   #include "i915_selftest.h"
>>>>>   #include "i915_vma_resource.h"
>>>>>   #include "i915_vma_types.h"
>>>>> +#include "i915_params.h"
>>>>> +#include "intel_memory_region.h"
>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>> __GFP_NOWARN)
>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>       struct device *dma;
>>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>>       u64 reserved;        /* size addr space reserved */
>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>       unsigned int bind_async_flags;
>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>> i915_address_space *vm)
>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>   }
>>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space 
>>>>> *vm,
>>>>> +                    enum intel_memory_type type)
>>>>> +{
>>>>> +    return vm->min_alignment[type];
>>>>> +}
>>>>> +
>>>>>   static inline bool
>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>   {
>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>> size, u64 alignment, u64 flags)
>>>>>       }
>>>>>       color = 0;
>>>>> +
>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>> +        /*
>>>>> +         * DG2 can not have different sized pages in any given PDE 
>>>>> (2MB range).
>>>>> +         * Keeping things simple, we force any lmem object to reserve
>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>> alongside
>>>>> +         */
>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>> Similarly here we dont need special case for DG2.
>>>>
>>>> Ram
>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>           color = vma->obj->cache_level;
>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                u64 hole_start, u64 hole_end,
>>>>>                unsigned long end_time)
>>>>>   {
>>>>> +    const unsigned int min_alignment =
>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>       I915_RND_STATE(seed_prng);
>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>       unsigned int size;
>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           unsigned int *order, count, n;
>>>>> -        u64 hole_size;
>>>>> +        u64 hole_size, aligned_size;
>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>           count = hole_size >> 1;
>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           GEM_BUG_ON(!order);
>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>> hole_end);
>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>            * a test failure) as we are purposefully allocating very
>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               intel_wakeref_t wakeref;
>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>               if (igt_timeout(end_time,
>>>>>                       "%s timed out before %d/%d\n",
>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>               }
>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>               mock_vma_res->start = addr;
>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           i915_random_reorder(order, count, &prng);
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               intel_wakeref_t wakeref;
>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space 
>>>>> *vm,
>>>>>   {
>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>       struct drm_i915_gem_object *obj;
>>>>> +    const unsigned int min_alignment =
>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>       const unsigned long max_pages =
>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>> ilog2(min_alignment));
>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>       unsigned long npages, prime, flags;
>>>>>       struct i915_vma *vma;
>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       i915_vma_unpin(vma);
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       }
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>> st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       i915_vma_unpin(vma);
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>> st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space 
>>>>> *vm,
>>>>>                       }
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>               }
>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>       const unsigned long max_pages =
>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>> +    unsigned long min_alignment;
>>>>>       unsigned long flags;
>>>>>       u64 size;
>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           struct i915_vma *vma;
>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>           for (addr = hole_start;
>>>>>                addr + obj->base.size < hole_end;
>>>>> -             addr += obj->base.size) {
>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>>>> %llx] with err=%d\n",
>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>>>   {
>>>>>       struct drm_i915_gem_object *obj;
>>>>>       struct i915_vma *vma;
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned long flags;
>>>>>       unsigned int pot;
>>>>>       int err = 0;
>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>> I915_GTT_PAGE_SIZE);
>>>>>       if (IS_ERR(obj))
>>>>>           return PTR_ERR(obj);
>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>> the hole */
>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>            pot--) {
>>>>>           u64 step = BIT_ULL(pot);
>>>>>           u64 addr;
>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>> min_alignment;
>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>> step) - min_alignment;
>>>>>                addr += step) {
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>                 unsigned long end_time)
>>>>>   {
>>>>>       I915_RND_STATE(prng);
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned int size;
>>>>>       unsigned long flags;
>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>> i915_address_space *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>>> hole */
>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           unsigned int *order, count, n;
>>>>>           struct i915_vma *vma;
>>>>> -        u64 hole_size;
>>>>> +        u64 hole_size, aligned_size;
>>>>>           int err = -ENODEV;
>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>           count = hole_size >> 1;
>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>> i915_address_space *vm,
>>>>>   {
>>>>>       struct drm_i915_gem_object *obj;
>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned int order = 12;
>>>>>       LIST_HEAD(objects);
>>>>>       int err = 0;
>>>>>       u64 addr;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>>> hole */
>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>           struct i915_vma *vma;
>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           i915_vma_unpin(vma);
>>>>> -        addr += size;
>>>>> +        addr += round_up(size, min_alignment);
>>>>>           /*
>>>>>            * Since we are injecting allocation faults at random 
>>>>> intervals,
>>>>> -- 
>>>>> 2.25.1
>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 15:58             ` Matthew Auld
  0 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 15:58 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, linux-kernel, dri-devel,
	Rodrigo Vivi

On 20/01/2022 15:44, Robert Beckett wrote:
> 
> 
> On 20/01/2022 14:59, Matthew Auld wrote:
>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>
>>>
>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>
>>>>> For local-memory objects we need to align the GTT addresses
>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>
>>>>> We need to support vm->min_alignment > 4K, depending
>>>>> on the vm itself and the type of object we are inserting.
>>>>> With this in mind update the GTT selftests to take this
>>>>> into account.
>>>>>
>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>> required by the HW.
>>>>>
>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>> ---
>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>> ++++++++++++-------
>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>
>>>>> diff --git 
>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>       struct blit_buffer scratch;
>>>>>       struct i915_vma *batch;
>>>>>       u64 hole;
>>>>> +    u64 align;
>>>>>       u32 width;
>>>>>       u32 height;
>>>>>   };
>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>> *engine, struct rnd_state *prng)
>>>>>           goto err_free;
>>>>>       }
>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>> from vm! */
>>>>> +    t->align = max(t->align,
>>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>>>> +    t->align = max(t->align,
>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>> INTEL_MEMORY_SYSTEM));
>>>>> +
>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>       hole_size *= 2; /* room to maneuver */
>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>> +                      hole_size, t->align,
>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>                         0, U64_MAX,
>>>>>                         DRM_MM_INSERT_BEST);
>>>>>       if (!err)
>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>> *engine, struct rnd_state *prng)
>>>>>           goto err_put;
>>>>>       }
>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>> +    t->hole = hole.start + t->align;
>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>> tiled_blits *t)
>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>                      struct rnd_state *prng)
>>>>>   {
>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>       u32 *map;
>>>>>       int err;
>>>>>       int i;
>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>> tiled_blits *t,
>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>> rnd_state *prng)
>>>>>   {
>>>>> -    u64 offset =
>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>       int err;
>>>>>       /* We want to check position invariant tiling across GTT 
>>>>> eviction */
>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>> slightly off */
>>>>>       err = tiled_blit(t,
>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>       if (err)
>>>>>           return err;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>> i915_address_space *vm, int subclass)
>>>>>       GEM_BUG_ON(!vm->total);
>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>> +
>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>> +
>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>> +        if (IS_DG2(vm->i915)) {
>>>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
>>>> not only for DG2.
>>>
>>> Really? Can we get confirmation of this?
>>> This contradicts the documentation in patch 4, which you reviewed, so 
>>> I am confused now.
>>
>> Starting from DG2, some platforms will have this new 64K GTT page size 
>> restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
>> to cover exactly that, AFAIK.
> 
> As I understood it, 64K pages are only a requirement going forward for 
> discrete cards, but the restriction of not sharing PDEs between 4K and 64K 
> pages was specific to DG2.
> 
> e.g. xehpsdv is also defined as having 64K pages, and others in future 
> are likely to, but without the PDE sharing restrictions.

Yeah, pretty much. But there is one other platform lurking.

From chatting with Ram, it might also make sense to disentangle 
HAS_64K_PAGES(), since it currently means both that we need a minimum 64K 
page granularity, and that there is the compact-pt layout, which doesn't 
allow mixing 64K and 4K entries in the same page-table.
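
Very roughly, something like the sketch below is what I have in mind. To be 
clear, the predicate names and the needs_compact_pt flag are made-up 
placeholders for illustration, not existing code:

/*
 * Sketch only: HAS_MIN_64K_PAGES()/NEEDS_COMPACT_PT() and the
 * needs_compact_pt device-info flag are hypothetical names used to
 * illustrate splitting the two meanings of HAS_64K_PAGES().
 */
#define HAS_MIN_64K_PAGES(i915)	(INTEL_INFO(i915)->has_64k_pages)
#define NEEDS_COMPACT_PT(i915)	(INTEL_INFO(i915)->needs_compact_pt)

static u64 lmem_min_alignment(struct drm_i915_private *i915)
{
	/* No 64K restriction: plain 4K GTT alignment is fine for lmem. */
	if (!HAS_MIN_64K_PAGES(i915))
		return I915_GTT_PAGE_SIZE_4K;

	/*
	 * With the compact-pt layout, 64K and 4K entries cannot share a
	 * PDE, so pad every lmem binding out to a whole 2M PDE range;
	 * otherwise 64K alignment is enough.
	 */
	return NEEDS_COMPACT_PT(i915) ? I915_GTT_PAGE_SIZE_2M :
					I915_GTT_PAGE_SIZE_64K;
}

i915_address_space_init() could then derive the per-region min_alignment from 
something like that, and i915_vma_insert() would only need the extra 2M size 
round-up in the compact-pt case rather than keying off IS_DG2().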

> 
> If this is not the case, and all 64K-page devices will also necessitate 
> not sharing PDEs, then we can just key off HAS_64K_PAGES() and use 2MB 
> everywhere, but so far this sounds unconfirmed.
> 
>>
>>>
>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>> +        } else {
>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> @@ -29,6 +29,8 @@
>>>>>   #include "i915_selftest.h"
>>>>>   #include "i915_vma_resource.h"
>>>>>   #include "i915_vma_types.h"
>>>>> +#include "i915_params.h"
>>>>> +#include "intel_memory_region.h"
>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>> __GFP_NOWARN)
>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>       struct device *dma;
>>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>>       u64 reserved;        /* size addr space reserved */
>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>       unsigned int bind_async_flags;
>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>> i915_address_space *vm)
>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>   }
>>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space 
>>>>> *vm,
>>>>> +                    enum intel_memory_type type)
>>>>> +{
>>>>> +    return vm->min_alignment[type];
>>>>> +}
>>>>> +
>>>>>   static inline bool
>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>   {
>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>> size, u64 alignment, u64 flags)
>>>>>       }
>>>>>       color = 0;
>>>>> +
>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>> +        /*
>>>>> +         * DG2 can not have different sized pages in any given PDE 
>>>>> (2MB range).
>>>>> +         * Keeping things simple, we force any lmem object to reserve
>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>> alongside
>>>>> +         */
>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>> Similarly here we don't need a special case for DG2.
>>>>
>>>> Ram
>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>           color = vma->obj->cache_level;
>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                u64 hole_start, u64 hole_end,
>>>>>                unsigned long end_time)
>>>>>   {
>>>>> +    const unsigned int min_alignment =
>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>       I915_RND_STATE(seed_prng);
>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>       unsigned int size;
>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           unsigned int *order, count, n;
>>>>> -        u64 hole_size;
>>>>> +        u64 hole_size, aligned_size;
>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>           count = hole_size >> 1;
>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           GEM_BUG_ON(!order);
>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>> hole_end);
>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>            * a test failure) as we are purposefully allocating very
>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               intel_wakeref_t wakeref;
>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>               if (igt_timeout(end_time,
>>>>>                       "%s timed out before %d/%d\n",
>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>               }
>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>               mock_vma_res->start = addr;
>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           i915_random_reorder(order, count, &prng);
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               intel_wakeref_t wakeref;
>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space 
>>>>> *vm,
>>>>>   {
>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>       struct drm_i915_gem_object *obj;
>>>>> +    const unsigned int min_alignment =
>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>       const unsigned long max_pages =
>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>> ilog2(min_alignment));
>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>       unsigned long npages, prime, flags;
>>>>>       struct i915_vma *vma;
>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       i915_vma_unpin(vma);
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       }
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>> st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       i915_vma_unpin(vma);
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>> st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space 
>>>>> *vm,
>>>>>                       }
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>               }
>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>       const unsigned long max_pages =
>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>> +    unsigned long min_alignment;
>>>>>       unsigned long flags;
>>>>>       u64 size;
>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           struct i915_vma *vma;
>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>           for (addr = hole_start;
>>>>>                addr + obj->base.size < hole_end;
>>>>> -             addr += obj->base.size) {
>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>>>> %llx] with err=%d\n",
>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>>>   {
>>>>>       struct drm_i915_gem_object *obj;
>>>>>       struct i915_vma *vma;
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned long flags;
>>>>>       unsigned int pot;
>>>>>       int err = 0;
>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>> I915_GTT_PAGE_SIZE);
>>>>>       if (IS_ERR(obj))
>>>>>           return PTR_ERR(obj);
>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>> the hole */
>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>            pot--) {
>>>>>           u64 step = BIT_ULL(pot);
>>>>>           u64 addr;
>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>> min_alignment;
>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>> step) - min_alignment;
>>>>>                addr += step) {
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>                 unsigned long end_time)
>>>>>   {
>>>>>       I915_RND_STATE(prng);
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned int size;
>>>>>       unsigned long flags;
>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>> i915_address_space *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>>> hole */
>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           unsigned int *order, count, n;
>>>>>           struct i915_vma *vma;
>>>>> -        u64 hole_size;
>>>>> +        u64 hole_size, aligned_size;
>>>>>           int err = -ENODEV;
>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>           count = hole_size >> 1;
>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>> i915_address_space *vm,
>>>>>   {
>>>>>       struct drm_i915_gem_object *obj;
>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned int order = 12;
>>>>>       LIST_HEAD(objects);
>>>>>       int err = 0;
>>>>>       u64 addr;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>>> hole */
>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>           struct i915_vma *vma;
>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           i915_vma_unpin(vma);
>>>>> -        addr += size;
>>>>> +        addr += round_up(size, min_alignment);
>>>>>           /*
>>>>>            * Since we are injecting allocation faults at random 
>>>>> intervals,
>>>>> -- 
>>>>> 2.25.1
>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 15:58             ` Matthew Auld
  0 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 15:58 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel

On 20/01/2022 15:44, Robert Beckett wrote:
> 
> 
> On 20/01/2022 14:59, Matthew Auld wrote:
>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>
>>>
>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>
>>>>> For local-memory objects we need to align the GTT addresses
>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>
>>>>> We need to support vm->min_alignment > 4K, depending
>>>>> on the vm itself and the type of object we are inserting.
>>>>> With this in mind update the GTT selftests to take this
>>>>> into account.
>>>>>
>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>> required by the HW.
>>>>>
>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>> ---
>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>> ++++++++++++-------
>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>
>>>>> diff --git 
>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>       struct blit_buffer scratch;
>>>>>       struct i915_vma *batch;
>>>>>       u64 hole;
>>>>> +    u64 align;
>>>>>       u32 width;
>>>>>       u32 height;
>>>>>   };
>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>> *engine, struct rnd_state *prng)
>>>>>           goto err_free;
>>>>>       }
>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>> from vm! */
>>>>> +    t->align = max(t->align,
>>>>> +               i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
>>>>> +    t->align = max(t->align,
>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>> INTEL_MEMORY_SYSTEM));
>>>>> +
>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>       hole_size *= 2; /* room to maneuver */
>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>> +                      hole_size, t->align,
>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>                         0, U64_MAX,
>>>>>                         DRM_MM_INSERT_BEST);
>>>>>       if (!err)
>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>> *engine, struct rnd_state *prng)
>>>>>           goto err_put;
>>>>>       }
>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>> +    t->hole = hole.start + t->align;
>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>> tiled_blits *t)
>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>                      struct rnd_state *prng)
>>>>>   {
>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>       u32 *map;
>>>>>       int err;
>>>>>       int i;
>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>> tiled_blits *t,
>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>> rnd_state *prng)
>>>>>   {
>>>>> -    u64 offset =
>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>       int err;
>>>>>       /* We want to check position invariant tiling across GTT 
>>>>> eviction */
>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>> slightly off */
>>>>>       err = tiled_blit(t,
>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>       if (err)
>>>>>           return err;
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>> i915_address_space *vm, int subclass)
>>>>>       GEM_BUG_ON(!vm->total);
>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>> +
>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>> +
>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>> +        if (IS_DG2(vm->i915)) {
>>>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
>>>> not only for DG2.
>>>
>>> Really? Can we get confirmation of this?
>>> This contradicts the documentation in patch 4, which you reviewed, so 
>>> I am confused now.
>>
>> Starting from DG2, some platforms will have this new 64K GTT page size 
>> restriction when dealing with LMEM. The HAS_64K_PAGES() macro is meant 
>> to cover exactly that, AFAIK.
> 
> As I understood it, 64K pages are only a requirement going forward for 
> discrete cards, but the restriction of not sharing PDEs between 4K and 64K 
> pages was specific to DG2.
> 
> e.g. xehpsdv is also defined as having 64K pages, and others in future 
> are likely to, but without the PDE sharing restrictions.

Yeah, pretty much. But there is one other platform lurking.

From chatting with Ram, it might also make sense to disentangle 
HAS_64K_PAGES(), since it currently means both that we need a minimum 64K 
page granularity, and that there is the compact-pt layout, which doesn't 
allow mixing 64K and 4K entries in the same page-table.

> 
> If this is not the case, and all 64K-page devices will also necessitate 
> not sharing PDEs, then we can just key off HAS_64K_PAGES() and use 2MB 
> everywhere, but so far this sounds unconfirmed.
> 
>>
>>>
>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>> +        } else {
>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>> @@ -29,6 +29,8 @@
>>>>>   #include "i915_selftest.h"
>>>>>   #include "i915_vma_resource.h"
>>>>>   #include "i915_vma_types.h"
>>>>> +#include "i915_params.h"
>>>>> +#include "intel_memory_region.h"
>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>> __GFP_NOWARN)
>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>       struct device *dma;
>>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>>       u64 reserved;        /* size addr space reserved */
>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>       unsigned int bind_async_flags;
>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>> i915_address_space *vm)
>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>   }
>>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space 
>>>>> *vm,
>>>>> +                    enum intel_memory_type type)
>>>>> +{
>>>>> +    return vm->min_alignment[type];
>>>>> +}
>>>>> +
>>>>>   static inline bool
>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>   {
>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>> size, u64 alignment, u64 flags)
>>>>>       }
>>>>>       color = 0;
>>>>> +
>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>> +        /*
>>>>> +         * DG2 can not have different sized pages in any given PDE 
>>>>> (2MB range).
>>>>> +         * Keeping things simple, we force any lmem object to reserve
>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>> alongside
>>>>> +         */
>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>> Similarly here we don't need a special case for DG2.
>>>>
>>>> Ram
>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>           color = vma->obj->cache_level;
>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                u64 hole_start, u64 hole_end,
>>>>>                unsigned long end_time)
>>>>>   {
>>>>> +    const unsigned int min_alignment =
>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>       I915_RND_STATE(seed_prng);
>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>       unsigned int size;
>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           unsigned int *order, count, n;
>>>>> -        u64 hole_size;
>>>>> +        u64 hole_size, aligned_size;
>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>           count = hole_size >> 1;
>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           GEM_BUG_ON(!order);
>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>> hole_end);
>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>            * a test failure) as we are purposefully allocating very
>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               intel_wakeref_t wakeref;
>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>               if (igt_timeout(end_time,
>>>>>                       "%s timed out before %d/%d\n",
>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>               }
>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>               mock_vma_res->start = addr;
>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           i915_random_reorder(order, count, &prng);
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               intel_wakeref_t wakeref;
>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space 
>>>>> *vm,
>>>>>   {
>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>       struct drm_i915_gem_object *obj;
>>>>> +    const unsigned int min_alignment =
>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>       const unsigned long max_pages =
>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>> ilog2(min_alignment));
>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>       unsigned long npages, prime, flags;
>>>>>       struct i915_vma *vma;
>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       i915_vma_unpin(vma);
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       }
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>> st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>> i915_address_space *vm,
>>>>>                       i915_vma_unpin(vma);
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>                   offset = p->offset;
>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>> st_link) {
>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>> +                                    min_alignment);
>>>>> +
>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>                       if (IS_ERR(vma))
>>>>>                           continue;
>>>>>                       if (p->step < 0) {
>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>                               break;
>>>>> -                        offset -= obj->base.size;
>>>>> +                        offset -= aligned_size;
>>>>>                       }
>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space 
>>>>> *vm,
>>>>>                       }
>>>>>                       if (p->step > 0) {
>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>                               break;
>>>>> -                        offset += obj->base.size;
>>>>> +                        offset += aligned_size;
>>>>>                       }
>>>>>                   }
>>>>>               }
>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>       const unsigned long max_pages =
>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>> +    unsigned long min_alignment;
>>>>>       unsigned long flags;
>>>>>       u64 size;
>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           struct i915_vma *vma;
>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>           for (addr = hole_start;
>>>>>                addr + obj->base.size < hole_end;
>>>>> -             addr += obj->base.size) {
>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>>                   pr_err("%s bind failed at %llx + %llx [hole %llx- 
>>>>> %llx] with err=%d\n",
>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
>>>>>   {
>>>>>       struct drm_i915_gem_object *obj;
>>>>>       struct i915_vma *vma;
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned long flags;
>>>>>       unsigned int pot;
>>>>>       int err = 0;
>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>> I915_GTT_PAGE_SIZE);
>>>>>       if (IS_ERR(obj))
>>>>>           return PTR_ERR(obj);
>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space 
>>>>> *vm,
>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>> the hole */
>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>            pot--) {
>>>>>           u64 step = BIT_ULL(pot);
>>>>>           u64 addr;
>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>> min_alignment;
>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>> step) - min_alignment;
>>>>>                addr += step) {
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>                 unsigned long end_time)
>>>>>   {
>>>>>       I915_RND_STATE(prng);
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned int size;
>>>>>       unsigned long flags;
>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>> i915_address_space *vm,
>>>>>       if (i915_is_ggtt(vm))
>>>>>           flags |= PIN_GLOBAL;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>>> hole */
>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>           struct drm_i915_gem_object *obj;
>>>>>           unsigned int *order, count, n;
>>>>>           struct i915_vma *vma;
>>>>> -        u64 hole_size;
>>>>> +        u64 hole_size, aligned_size;
>>>>>           int err = -ENODEV;
>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>           count = hole_size >> 1;
>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space 
>>>>> *vm,
>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>           for (n = 0; n < count; n++) {
>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>> +            u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>               if (err) {
>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>> i915_address_space *vm,
>>>>>   {
>>>>>       struct drm_i915_gem_object *obj;
>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>> +    unsigned int min_alignment;
>>>>>       unsigned int order = 12;
>>>>>       LIST_HEAD(objects);
>>>>>       int err = 0;
>>>>>       u64 addr;
>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>> +
>>>>>       /* Keep creating larger objects until one cannot fit into the 
>>>>> hole */
>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>           struct i915_vma *vma;
>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>> i915_address_space *vm,
>>>>>           }
>>>>>           i915_vma_unpin(vma);
>>>>> -        addr += size;
>>>>> +        addr += round_up(size, min_alignment);
>>>>>           /*
>>>>>            * Since we are injecting allocation faults at random 
>>>>> intervals,
>>>>> -- 
>>>>> 2.25.1
>>>>>
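
For anyone skimming the selftest changes quoted above: the common pattern is that
every hole walker now pads and steps by the VM's minimum alignment instead of the
raw object size. A standalone sketch of that stepping (plain C with illustrative
values and helper names, not the selftest code itself):

#include <stdint.h>
#include <stdio.h>

/* power-of-two round_up(), as used throughout the selftests above */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

int main(void)
{
	const uint64_t hole_start    = 0x400000;  /* illustrative values only */
	const uint64_t hole_end      = 0x800000;
	const uint64_t obj_size      = 0x11000;   /* 68K object */
	const uint64_t min_alignment = 0x200000;  /* 2M, e.g. DG2 lmem */
	const uint64_t step = round_up_pow2(obj_size, min_alignment);
	uint64_t addr;

	/* step by the padded size so reservations never share a 2M range */
	for (addr = hole_start; addr + obj_size < hole_end; addr += step)
		printf("would bind at 0x%llx, reserving 0x%llx\n",
		       (unsigned long long)addr, (unsigned long long)step);

	return 0;
}

With a 68K object and a 2M minimum alignment this reserves a full 2M per
placement, which is the padding the DG2 path relies on.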

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 15:58             ` Matthew Auld
  (?)
@ 2022-01-20 16:09               ` Robert Beckett
  -1 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 16:09 UTC (permalink / raw)
  To: Matthew Auld, Ramalingam C
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel, linux-kernel



On 20/01/2022 15:58, Matthew Auld wrote:
> On 20/01/2022 15:44, Robert Beckett wrote:
>>
>>
>> On 20/01/2022 14:59, Matthew Auld wrote:
>>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>>
>>>>
>>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>>
>>>>>> For local-memory objects we need to align the GTT addresses
>>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>>
>>>>>> We need to support vm->min_alignment > 4K, depending
>>>>>> on the vm itself and the type of object we are inserting.
>>>>>> With this in mind update the GTT selftests to take this
>>>>>> into account.
>>>>>>
>>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>>> required by the HW.
>>>>>>
>>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>> ---
>>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>>> ++++++++++++-------
>>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>>
>>>>>> diff --git 
>>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>>       struct blit_buffer scratch;
>>>>>>       struct i915_vma *batch;
>>>>>>       u64 hole;
>>>>>> +    u64 align;
>>>>>>       u32 width;
>>>>>>       u32 height;
>>>>>>   };
>>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>> *engine, struct rnd_state *prng)
>>>>>>           goto err_free;
>>>>>>       }
>>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>>> from vm! */
>>>>>> +    t->align = max(t->align,
>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>> INTEL_MEMORY_LOCAL));
>>>>>> +    t->align = max(t->align,
>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>> INTEL_MEMORY_SYSTEM));
>>>>>> +
>>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>>       hole_size *= 2; /* room to maneuver */
>>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>>> +                      hole_size, t->align,
>>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>>                         0, U64_MAX,
>>>>>>                         DRM_MM_INSERT_BEST);
>>>>>>       if (!err)
>>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>> *engine, struct rnd_state *prng)
>>>>>>           goto err_put;
>>>>>>       }
>>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>>> +    t->hole = hole.start + t->align;
>>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>>> tiled_blits *t)
>>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>>                      struct rnd_state *prng)
>>>>>>   {
>>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>>       u32 *map;
>>>>>>       int err;
>>>>>>       int i;
>>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>>> tiled_blits *t,
>>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>>> rnd_state *prng)
>>>>>>   {
>>>>>> -    u64 offset =
>>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>>       int err;
>>>>>>       /* We want to check position invariant tiling across GTT 
>>>>>> eviction */
>>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>>> slightly off */
>>>>>>       err = tiled_blit(t,
>>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>>       if (err)
>>>>>>           return err;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>>> i915_address_space *vm, int subclass)
>>>>>>       GEM_BUG_ON(!vm->total);
>>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>>> +
>>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>>> +
>>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>>> +        if (IS_DG2(vm->i915)) {
>>>>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
>>>>> not only for DG2.
>>>>
>>>> Really? Can we get confirmation of this?
>>>> This contradicts the documentation in patch 4, which you reviewed, 
>>>> so I am confused now.
>>>
>>> Starting from DG2, some platforms will have this new 64K GTT page 
>>> size restriction when dealing with LMEM. The HAS_64K_PAGES() macro is 
>>> meant to cover exactly that, AFAIK.
>>
>> As I understood it, 64K pages are a requirement going forward for 
>> discrete cards, but the restriction of not sharing PDEs between 4K and 
>> 64K pages was specific to DG2.
>>
>> e.g. xehpsdv is also defined as having 64K pages, and others in the 
>> future are likely to as well, but without the PDE sharing restrictions.
> 
> Yeah, pretty much. But there is one other platform lurking.
> 
>  From chatting with Ram, it might also make sense to disentangle 
> HAS_64K_PAGES(), since it currently means both that we need min 64K page 
> granularity, and that there is this compact-pt layout thing which 
> doesn't allow mixing 64K and 4K in the same page-table.

Okay, so it sounds to me like the IS_DG2 check here is appropriate. 
Other 64K page systems will not have the 2MB alignment requirement.

If any future platform does require the compact-pt layout, we can then 
add a HAS_COMPACT_PT macro or similar when adding that platform, which 
would be set for DG2 and the future platform.

For now, this code seems correct to me as it currently only affects DG2.
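
As a rough sketch of what that disentangling could look like, assuming a
hypothetical HAS_COMPACT_PT-style flag alongside the existing 64K requirement
(standalone C for illustration only; these are not the real i915 macros or
helpers):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_4K  0x1000ULL
#define SZ_64K 0x10000ULL
#define SZ_2M  0x200000ULL

/* hypothetical capability split; these are not the real i915 flags */
struct platform_caps {
	bool has_64k_pages;   /* lmem needs at least 64K GTT granularity */
	bool has_compact_pt;  /* cannot mix 4K and 64K pages within one 2M PDE */
};

static uint64_t lmem_min_alignment(const struct platform_caps *caps)
{
	if (!caps->has_64k_pages)
		return SZ_4K;
	return caps->has_compact_pt ? SZ_2M : SZ_64K;
}

int main(void)
{
	const struct platform_caps dg2_like  = { .has_64k_pages = true, .has_compact_pt = true };
	const struct platform_caps other_64k = { .has_64k_pages = true, .has_compact_pt = false };

	printf("DG2-like lmem min alignment:  0x%llx\n",
	       (unsigned long long)lmem_min_alignment(&dg2_like));
	printf("other 64K lmem min alignment: 0x%llx\n",
	       (unsigned long long)lmem_min_alignment(&other_64k));
	return 0;
}

With a split like that, a future compact-pt platform only needs the extra flag
set, while platforms that merely require 64K granularity keep the cheaper 64K
alignment.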
> 
>>
>> If this is not the case, and all 64K page devices will also 
>> necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES 
>> check and use 2MB everywhere, but so far this sounds unconfirmed.
>>
>>>
>>>>
>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>> +        } else {
>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>> @@ -29,6 +29,8 @@
>>>>>>   #include "i915_selftest.h"
>>>>>>   #include "i915_vma_resource.h"
>>>>>>   #include "i915_vma_types.h"
>>>>>> +#include "i915_params.h"
>>>>>> +#include "intel_memory_region.h"
>>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>>> __GFP_NOWARN)
>>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>>       struct device *dma;
>>>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>>>       u64 reserved;        /* size addr space reserved */
>>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>>       unsigned int bind_async_flags;
>>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>>> i915_address_space *vm)
>>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>>   }
>>>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space 
>>>>>> *vm,
>>>>>> +                    enum intel_memory_type type)
>>>>>> +{
>>>>>> +    return vm->min_alignment[type];
>>>>>> +}
>>>>>> +
>>>>>>   static inline bool
>>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>>   {
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>>> size, u64 alignment, u64 flags)
>>>>>>       }
>>>>>>       color = 0;
>>>>>> +
>>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>>> +        /*
>>>>>> +         * DG2 can not have different sized pages in any given 
>>>>>> PDE (2MB range).
>>>>>> +         * Keeping things simple, we force any lmem object to 
>>>>>> reserve
>>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>>> alongside
>>>>>> +         */
>>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>>> Similarly here we don't need a special case for DG2.
>>>>>
>>>>> Ram
>>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>>           color = vma->obj->cache_level;
>>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                u64 hole_start, u64 hole_end,
>>>>>>                unsigned long end_time)
>>>>>>   {
>>>>>> +    const unsigned int min_alignment =
>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>       I915_RND_STATE(seed_prng);
>>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>>       unsigned int size;
>>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>           unsigned int *order, count, n;
>>>>>> -        u64 hole_size;
>>>>>> +        u64 hole_size, aligned_size;
>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>           count = hole_size >> 1;
>>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           }
>>>>>>           GEM_BUG_ON(!order);
>>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>>> hole_end);
>>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>>            * a test failure) as we are purposefully allocating very
>>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           }
>>>>>>           for (n = 0; n < count; n++) {
>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>> BIT_ULL(aligned_size);
>>>>>>               intel_wakeref_t wakeref;
>>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>>               if (igt_timeout(end_time,
>>>>>>                       "%s timed out before %d/%d\n",
>>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>               }
>>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>>               mock_vma_res->start = addr;
>>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           i915_random_reorder(order, count, &prng);
>>>>>>           for (n = 0; n < count; n++) {
>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>> BIT_ULL(aligned_size);
>>>>>>               intel_wakeref_t wakeref;
>>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>   {
>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>       struct drm_i915_gem_object *obj;
>>>>>> +    const unsigned int min_alignment =
>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>       const unsigned long max_pages =
>>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>>> ilog2(min_alignment));
>>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>>       unsigned long npages, prime, flags;
>>>>>>       struct i915_vma *vma;
>>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                       i915_vma_unpin(vma);
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                       }
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>> st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                       i915_vma_unpin(vma);
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>> st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>                       }
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>               }
>>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>       const unsigned long max_pages =
>>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>>> +    unsigned long min_alignment;
>>>>>>       unsigned long flags;
>>>>>>       u64 size;
>>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>       if (i915_is_ggtt(vm))
>>>>>>           flags |= PIN_GLOBAL;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>           struct i915_vma *vma;
>>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>           for (addr = hole_start;
>>>>>>                addr + obj->base.size < hole_end;
>>>>>> -             addr += obj->base.size) {
>>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>               if (err) {
>>>>>>                   pr_err("%s bind failed at %llx + %llx [hole 
>>>>>> %llx- %llx] with err=%d\n",
>>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>   {
>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>       struct i915_vma *vma;
>>>>>> +    unsigned int min_alignment;
>>>>>>       unsigned long flags;
>>>>>>       unsigned int pot;
>>>>>>       int err = 0;
>>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>       if (i915_is_ggtt(vm))
>>>>>>           flags |= PIN_GLOBAL;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>>> I915_GTT_PAGE_SIZE);
>>>>>>       if (IS_ERR(obj))
>>>>>>           return PTR_ERR(obj);
>>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>>> the hole */
>>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>>            pot--) {
>>>>>>           u64 step = BIT_ULL(pot);
>>>>>>           u64 addr;
>>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>>> min_alignment;
>>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>>> step) - min_alignment;
>>>>>>                addr += step) {
>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>               if (err) {
>>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                 unsigned long end_time)
>>>>>>   {
>>>>>>       I915_RND_STATE(prng);
>>>>>> +    unsigned int min_alignment;
>>>>>>       unsigned int size;
>>>>>>       unsigned long flags;
>>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>       if (i915_is_ggtt(vm))
>>>>>>           flags |= PIN_GLOBAL;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>> the hole */
>>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>           unsigned int *order, count, n;
>>>>>>           struct i915_vma *vma;
>>>>>> -        u64 hole_size;
>>>>>> +        u64 hole_size, aligned_size;
>>>>>>           int err = -ENODEV;
>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>           count = hole_size >> 1;
>>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>>           for (n = 0; n < count; n++) {
>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>> BIT_ULL(aligned_size);
>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>               if (err) {
>>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>   {
>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>>> +    unsigned int min_alignment;
>>>>>>       unsigned int order = 12;
>>>>>>       LIST_HEAD(objects);
>>>>>>       int err = 0;
>>>>>>       u64 addr;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>> the hole */
>>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>>           struct i915_vma *vma;
>>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           }
>>>>>>           i915_vma_unpin(vma);
>>>>>> -        addr += size;
>>>>>> +        addr += round_up(size, min_alignment);
>>>>>>           /*
>>>>>>            * Since we are injecting allocation faults at random 
>>>>>> intervals,
>>>>>> -- 
>>>>>> 2.25.1
>>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 16:09               ` Robert Beckett
  0 siblings, 0 replies; 50+ messages in thread
From: Robert Beckett @ 2022-01-20 16:09 UTC (permalink / raw)
  To: Matthew Auld, Ramalingam C
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel



On 20/01/2022 15:58, Matthew Auld wrote:
> On 20/01/2022 15:44, Robert Beckett wrote:
>>
>>
>> On 20/01/2022 14:59, Matthew Auld wrote:
>>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>>
>>>>
>>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>>
>>>>>> For local-memory objects we need to align the GTT addresses
>>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>>
>>>>>> We need to support vm->min_alignment > 4K, depending
>>>>>> on the vm itself and the type of object we are inserting.
>>>>>> With this in mind update the GTT selftests to take this
>>>>>> into account.
>>>>>>
>>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>>> required by the HW.
>>>>>>
>>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>> ---
>>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>>> ++++++++++++-------
>>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>>
>>>>>> diff --git 
>>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>>       struct blit_buffer scratch;
>>>>>>       struct i915_vma *batch;
>>>>>>       u64 hole;
>>>>>> +    u64 align;
>>>>>>       u32 width;
>>>>>>       u32 height;
>>>>>>   };
>>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>> *engine, struct rnd_state *prng)
>>>>>>           goto err_free;
>>>>>>       }
>>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>>> from vm! */
>>>>>> +    t->align = max(t->align,
>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>> INTEL_MEMORY_LOCAL));
>>>>>> +    t->align = max(t->align,
>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>> INTEL_MEMORY_SYSTEM));
>>>>>> +
>>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>>       hole_size *= 2; /* room to maneuver */
>>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>>> +                      hole_size, t->align,
>>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>>                         0, U64_MAX,
>>>>>>                         DRM_MM_INSERT_BEST);
>>>>>>       if (!err)
>>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>> *engine, struct rnd_state *prng)
>>>>>>           goto err_put;
>>>>>>       }
>>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>>> +    t->hole = hole.start + t->align;
>>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>>> tiled_blits *t)
>>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>>                      struct rnd_state *prng)
>>>>>>   {
>>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>>       u32 *map;
>>>>>>       int err;
>>>>>>       int i;
>>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>>> tiled_blits *t,
>>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>>> rnd_state *prng)
>>>>>>   {
>>>>>> -    u64 offset =
>>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>>       int err;
>>>>>>       /* We want to check position invariant tiling across GTT 
>>>>>> eviction */
>>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>>> slightly off */
>>>>>>       err = tiled_blit(t,
>>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>>       if (err)
>>>>>>           return err;
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>>> i915_address_space *vm, int subclass)
>>>>>>       GEM_BUG_ON(!vm->total);
>>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>>> +
>>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>>> +
>>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>>> +        if (IS_DG2(vm->i915)) {
>>>>> I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
>>>>> Not only for DG2.
>>>>
>>>> really? can we get confirmation of this?
>>>> this contradicts the documentation in patch 4, which you reviewed, 
>>>> so I am confused now
>>>
>>> Starting from DG2, some platforms will have this new 64K GTT page 
>>> size restriction when dealing with LMEM. The HAS_64K_PAGES() macro is 
>>> meant to cover exactly that, AFAIK.
>>
>> As I understood it, only the 64K page requirement applies going forward for
>> discrete cards, while the restriction on not sharing PDEs between 4K and 64K
>> pages was specific to DG2.
>>
>> e.g. xehpsdv is also defined as having 64K pages, and others in future are
>> likely to, but without the PDE sharing restriction.
> 
> Yeah, pretty much. But there is one other platform lurking.
> 
>  From chatting with Ram, it might also make sense to disentangle 
> HAS_64K_PAGES(), since it currently means both that we need min 64K page 
> granularity, and that there is this compact-pt layout thing which 
> doesn't allow mixing 64K and 4K in the same page-table.

okay, so it sounds to me like the IS_DG2 check here is appropriate. 
Other 64K page systems will not have the 2MB alignment requirement.

If any future platform does require compact-pt layout, when adding that 
platform we can then add a HAS_COMPACT_PT macro or something, which 
would be set for DG2 and the future platform (see the sketch below).

For now, this code seems correct to me as it currently only affects DG2.
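
For illustration only, a minimal sketch of how that split could look.
HAS_COMPACT_PT(), the has_compact_pt device info flag and the helper name
are hypothetical, made up here and not existing i915 code; the rest reuses
the helpers introduced by this patch:

/*
 * Hypothetical split of HAS_64K_PAGES(), as discussed above:
 *   has_64k_pages  - lmem needs at least 64K GTT page granularity
 *   has_compact_pt - 4K and 64K entries may not share a PDE (2MB range)
 */
#define HAS_COMPACT_PT(i915)	(INTEL_INFO(i915)->has_compact_pt)

static void i915_vm_set_min_alignment(struct i915_address_space *vm)
{
	u64 lmem_align = I915_GTT_MIN_ALIGNMENT;

	if (HAS_64K_PAGES(vm->i915))
		lmem_align = I915_GTT_PAGE_SIZE_64K;

	/* platforms with the compact-pt restriction pad lmem VAs to 2MB */
	if (HAS_COMPACT_PT(vm->i915))
		lmem_align = I915_GTT_PAGE_SIZE_2M;

	vm->min_alignment[INTEL_MEMORY_LOCAL] = lmem_align;
	vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = lmem_align;
}

The system memory entries would stay at I915_GTT_MIN_ALIGNMENT via the
memset64() in i915_address_space_init(), exactly as in the patch.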
> 
>>
>> If this is not the case, and all 64K page devices will also 
>> necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES 
>> and use 2MB everywhere, but so far this sounds unconfirmed.
>>
>>>
>>>>
>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>> +        } else {
>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>> @@ -29,6 +29,8 @@
>>>>>>   #include "i915_selftest.h"
>>>>>>   #include "i915_vma_resource.h"
>>>>>>   #include "i915_vma_types.h"
>>>>>> +#include "i915_params.h"
>>>>>> +#include "intel_memory_region.h"
>>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>>> __GFP_NOWARN)
>>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>>       struct device *dma;
>>>>>>       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
>>>>>>       u64 reserved;        /* size addr space reserved */
>>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>>       unsigned int bind_async_flags;
>>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>>> i915_address_space *vm)
>>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>>   }
>>>>>> +static inline u64 i915_vm_min_alignment(struct i915_address_space 
>>>>>> *vm,
>>>>>> +                    enum intel_memory_type type)
>>>>>> +{
>>>>>> +    return vm->min_alignment[type];
>>>>>> +}
>>>>>> +
>>>>>>   static inline bool
>>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>>   {
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>>> size, u64 alignment, u64 flags)
>>>>>>       }
>>>>>>       color = 0;
>>>>>> +
>>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>>> +        /*
>>>>>> +         * DG2 can not have different sized pages in any given 
>>>>>> PDE (2MB range).
>>>>>> +         * Keeping things simple, we force any lmem object to 
>>>>>> reserve
>>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>>> alongside
>>>>>> +         */
>>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>>> Similarly here we don't need a special case for DG2.
>>>>>
>>>>> Ram
>>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>>           color = vma->obj->cache_level;
>>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                u64 hole_start, u64 hole_end,
>>>>>>                unsigned long end_time)
>>>>>>   {
>>>>>> +    const unsigned int min_alignment =
>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>       I915_RND_STATE(seed_prng);
>>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>>       unsigned int size;
>>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>           unsigned int *order, count, n;
>>>>>> -        u64 hole_size;
>>>>>> +        u64 hole_size, aligned_size;
>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>           count = hole_size >> 1;
>>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           }
>>>>>>           GEM_BUG_ON(!order);
>>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>>> hole_end);
>>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>>            * a test failure) as we are purposefully allocating very
>>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           }
>>>>>>           for (n = 0; n < count; n++) {
>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>> BIT_ULL(aligned_size);
>>>>>>               intel_wakeref_t wakeref;
>>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>>               if (igt_timeout(end_time,
>>>>>>                       "%s timed out before %d/%d\n",
>>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>               }
>>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>>               mock_vma_res->start = addr;
>>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           i915_random_reorder(order, count, &prng);
>>>>>>           for (n = 0; n < count; n++) {
>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>> BIT_ULL(aligned_size);
>>>>>>               intel_wakeref_t wakeref;
>>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>   {
>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>       struct drm_i915_gem_object *obj;
>>>>>> +    const unsigned int min_alignment =
>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>       const unsigned long max_pages =
>>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>>> ilog2(min_alignment));
>>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>>       unsigned long npages, prime, flags;
>>>>>>       struct i915_vma *vma;
>>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                       i915_vma_unpin(vma);
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                       }
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>> st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                       i915_vma_unpin(vma);
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>                   offset = p->offset;
>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>> st_link) {
>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>> +                                    min_alignment);
>>>>>> +
>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>                       if (IS_ERR(vma))
>>>>>>                           continue;
>>>>>>                       if (p->step < 0) {
>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>                               break;
>>>>>> -                        offset -= obj->base.size;
>>>>>> +                        offset -= aligned_size;
>>>>>>                       }
>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>                       }
>>>>>>                       if (p->step > 0) {
>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>                               break;
>>>>>> -                        offset += obj->base.size;
>>>>>> +                        offset += aligned_size;
>>>>>>                       }
>>>>>>                   }
>>>>>>               }
>>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>       const unsigned long max_pages =
>>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>>> +    unsigned long min_alignment;
>>>>>>       unsigned long flags;
>>>>>>       u64 size;
>>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>       if (i915_is_ggtt(vm))
>>>>>>           flags |= PIN_GLOBAL;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>           struct i915_vma *vma;
>>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>           for (addr = hole_start;
>>>>>>                addr + obj->base.size < hole_end;
>>>>>> -             addr += obj->base.size) {
>>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>               if (err) {
>>>>>>                   pr_err("%s bind failed at %llx + %llx [hole 
>>>>>> %llx- %llx] with err=%d\n",
>>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>   {
>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>       struct i915_vma *vma;
>>>>>> +    unsigned int min_alignment;
>>>>>>       unsigned long flags;
>>>>>>       unsigned int pot;
>>>>>>       int err = 0;
>>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space 
>>>>>> *vm,
>>>>>>       if (i915_is_ggtt(vm))
>>>>>>           flags |= PIN_GLOBAL;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>>> I915_GTT_PAGE_SIZE);
>>>>>>       if (IS_ERR(obj))
>>>>>>           return PTR_ERR(obj);
>>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>>> the hole */
>>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>>            pot--) {
>>>>>>           u64 step = BIT_ULL(pot);
>>>>>>           u64 addr;
>>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>>> min_alignment;
>>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>>> step) - min_alignment;
>>>>>>                addr += step) {
>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>               if (err) {
>>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>                 unsigned long end_time)
>>>>>>   {
>>>>>>       I915_RND_STATE(prng);
>>>>>> +    unsigned int min_alignment;
>>>>>>       unsigned int size;
>>>>>>       unsigned long flags;
>>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>       if (i915_is_ggtt(vm))
>>>>>>           flags |= PIN_GLOBAL;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>> the hole */
>>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>           unsigned int *order, count, n;
>>>>>>           struct i915_vma *vma;
>>>>>> -        u64 hole_size;
>>>>>> +        u64 hole_size, aligned_size;
>>>>>>           int err = -ENODEV;
>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>           count = hole_size >> 1;
>>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>>           for (n = 0; n < count; n++) {
>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>> BIT_ULL(aligned_size);
>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>               if (err) {
>>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>   {
>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>>> +    unsigned int min_alignment;
>>>>>>       unsigned int order = 12;
>>>>>>       LIST_HEAD(objects);
>>>>>>       int err = 0;
>>>>>>       u64 addr;
>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>> +
>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>> the hole */
>>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>>           struct i915_vma *vma;
>>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>>> i915_address_space *vm,
>>>>>>           }
>>>>>>           i915_vma_unpin(vma);
>>>>>> -        addr += size;
>>>>>> +        addr += round_up(size, min_alignment);
>>>>>>           /*
>>>>>>            * Since we are injecting allocation faults at random 
>>>>>> intervals,
>>>>>> -- 
>>>>>> 2.25.1
>>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 16:09               ` Robert Beckett
  (?)
@ 2022-01-20 16:25                 ` Matthew Auld
  -1 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 16:25 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel, linux-kernel

On 20/01/2022 16:09, Robert Beckett wrote:
> 
> 
> On 20/01/2022 15:58, Matthew Auld wrote:
>> On 20/01/2022 15:44, Robert Beckett wrote:
>>>
>>>
>>> On 20/01/2022 14:59, Matthew Auld wrote:
>>>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>>>
>>>>>
>>>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>>>
>>>>>>> For local-memory objects we need to align the GTT addresses
>>>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>>>
>>>>>>> We need to support vm->min_alignment > 4K, depending
>>>>>>> on the vm itself and the type of object we are inserting.
>>>>>>> With this in mind update the GTT selftests to take this
>>>>>>> into account.
>>>>>>>
>>>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>>>> required by the HW.
>>>>>>>
>>>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>>> ---
>>>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>>>> ++++++++++++-------
>>>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>>>
>>>>>>> diff --git 
>>>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>>>       struct blit_buffer scratch;
>>>>>>>       struct i915_vma *batch;
>>>>>>>       u64 hole;
>>>>>>> +    u64 align;
>>>>>>>       u32 width;
>>>>>>>       u32 height;
>>>>>>>   };
>>>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>>> *engine, struct rnd_state *prng)
>>>>>>>           goto err_free;
>>>>>>>       }
>>>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>>>> from vm! */
>>>>>>> +    t->align = max(t->align,
>>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>>> INTEL_MEMORY_LOCAL));
>>>>>>> +    t->align = max(t->align,
>>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>>> INTEL_MEMORY_SYSTEM));
>>>>>>> +
>>>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>>>       hole_size *= 2; /* room to maneuver */
>>>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>>>> +                      hole_size, t->align,
>>>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>>>                         0, U64_MAX,
>>>>>>>                         DRM_MM_INSERT_BEST);
>>>>>>>       if (!err)
>>>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>>> *engine, struct rnd_state *prng)
>>>>>>>           goto err_put;
>>>>>>>       }
>>>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>>>> +    t->hole = hole.start + t->align;
>>>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>>>> tiled_blits *t)
>>>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>>>                      struct rnd_state *prng)
>>>>>>>   {
>>>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>>>       u32 *map;
>>>>>>>       int err;
>>>>>>>       int i;
>>>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>>>> tiled_blits *t,
>>>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>>>> rnd_state *prng)
>>>>>>>   {
>>>>>>> -    u64 offset =
>>>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>>>       int err;
>>>>>>>       /* We want to check position invariant tiling across GTT 
>>>>>>> eviction */
>>>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>>>> slightly off */
>>>>>>>       err = tiled_blit(t,
>>>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>>>       if (err)
>>>>>>>           return err;
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>>>> i915_address_space *vm, int subclass)
>>>>>>>       GEM_BUG_ON(!vm->total);
>>>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>>>> +
>>>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>>>> +
>>>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>>>> +        if (IS_DG2(vm->i915)) {
>>>>>> I think we need this 2M alignment for all platforms with 
>>>>>> HAS_64K_PAGES.
>>>>>> Not only for DG2.
>>>>>
>>>>> really? can we get confirmation of this?
>>>>> this contradicts the documentation in patch 4, which you reviewed, 
>>>>> so I am confused now
>>>>
>>>> Starting from DG2, some platforms will have this new 64K GTT page 
>>>> size restriction when dealing with LMEM. The HAS_64K_PAGES() macro 
>>>> is meant to cover exactly that, AFAIK.
>>>
>>> As I understood it, only the 64K page requirement applies going forward
>>> for discrete cards, while the restriction on not sharing PDEs between 4K
>>> and 64K pages was specific to DG2.
>>>
>>> e.g. xehpsdv is also defined as having 64K pages, and others in future
>>> are likely to, but without the PDE sharing restriction.
>>
>> Yeah, pretty much. But there is one other platform lurking.
>>
>>  From chatting with Ram, it might also make sense to disentangle 
>> HAS_64K_PAGES(), since it currently means both that we need min 64K 
>> page granularity, and that there is this compact-pt layout thing which 
>> doesn't allow mixing 64K and 4K in the same page-table.
> 
> okay, so it sounds to me like the IS_DG2 check here is appropriate. 
> Other 64K page systems will not have the 2MB alignment requirement.

There are both dg2 and xehpsdv as per i915_pci.c. IIRC xehpsdv came 
first, and then dg2 inherited this feature. For example, the accelerated 
DG2 moves series[1] is meant to work on both platforms.

[1] https://patchwork.freedesktop.org/series/97544/
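
To make the disentangling concrete, here is a rough, purely illustrative
rewrite of the i915_vma_insert() hunk from this patch, again using the
hypothetical HAS_COMPACT_PT() flag rather than any existing macro:

	if (i915_gem_object_is_lmem(vma->obj)) {
		/* all 64K-page platforms need at least 64K GTT alignment */
		if (HAS_64K_PAGES(vma->vm->i915))
			alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);

		/*
		 * only platforms that cannot mix 4K and 64K entries within
		 * a PDE pad lmem reservations out to whole 2MB ranges
		 */
		if (HAS_COMPACT_PT(vma->vm->i915)) {
			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
		}
	}

The selftests would presumably be unaffected, since they already key off
i915_vm_min_alignment() rather than the platform checks directly.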

> 
> If any future platform does require compact-pt layout, when adding that 
> platform we can then add a HAS_COMPACT_PT macro or something, which 
> would be set for DG2 and the future platform.
> 
> For now, this code seems correct to me as it currently only affects DG2.
>>
>>>
>>> If this is not the case, and all 64K page devices will also 
>>> necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES 
>>> and use 2MB everywhere, but so far this sounds unconfirmed.
>>>
>>>>
>>>>>
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>>> +        } else {
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>>> @@ -29,6 +29,8 @@
>>>>>>>   #include "i915_selftest.h"
>>>>>>>   #include "i915_vma_resource.h"
>>>>>>>   #include "i915_vma_types.h"
>>>>>>> +#include "i915_params.h"
>>>>>>> +#include "intel_memory_region.h"
>>>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>>>> __GFP_NOWARN)
>>>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>>>       struct device *dma;
>>>>>>>       u64 total;        /* size addr space maps (ex. 2GB for 
>>>>>>> ggtt) */
>>>>>>>       u64 reserved;        /* size addr space reserved */
>>>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>>>       unsigned int bind_async_flags;
>>>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>>>> i915_address_space *vm)
>>>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>>>   }
>>>>>>> +static inline u64 i915_vm_min_alignment(struct 
>>>>>>> i915_address_space *vm,
>>>>>>> +                    enum intel_memory_type type)
>>>>>>> +{
>>>>>>> +    return vm->min_alignment[type];
>>>>>>> +}
>>>>>>> +
>>>>>>>   static inline bool
>>>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>>>   {
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>>>> size, u64 alignment, u64 flags)
>>>>>>>       }
>>>>>>>       color = 0;
>>>>>>> +
>>>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>>>> +        /*
>>>>>>> +         * DG2 can not have different sized pages in any given 
>>>>>>> PDE (2MB range).
>>>>>>> +         * Keeping things simple, we force any lmem object to 
>>>>>>> reserve
>>>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>>>> alongside
>>>>>>> +         */
>>>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>>>> Similarly here we don't need a special case for DG2.
>>>>>>
>>>>>> Ram
>>>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>>>           color = vma->obj->cache_level;
>>>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                u64 hole_start, u64 hole_end,
>>>>>>>                unsigned long end_time)
>>>>>>>   {
>>>>>>> +    const unsigned int min_alignment =
>>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>>       I915_RND_STATE(seed_prng);
>>>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>>>       unsigned int size;
>>>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>>           unsigned int *order, count, n;
>>>>>>> -        u64 hole_size;
>>>>>>> +        u64 hole_size, aligned_size;
>>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>>           count = hole_size >> 1;
>>>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           }
>>>>>>>           GEM_BUG_ON(!order);
>>>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>>>> hole_end);
>>>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>>>            * a test failure) as we are purposefully allocating very
>>>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           }
>>>>>>>           for (n = 0; n < count; n++) {
>>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>>> BIT_ULL(aligned_size);
>>>>>>>               intel_wakeref_t wakeref;
>>>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>>>               if (igt_timeout(end_time,
>>>>>>>                       "%s timed out before %d/%d\n",
>>>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>               }
>>>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>>>               mock_vma_res->start = addr;
>>>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           i915_random_reorder(order, count, &prng);
>>>>>>>           for (n = 0; n < count; n++) {
>>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>>> BIT_ULL(aligned_size);
>>>>>>>               intel_wakeref_t wakeref;
>>>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>   {
>>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>> +    const unsigned int min_alignment =
>>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>>       const unsigned long max_pages =
>>>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>>>> ilog2(min_alignment));
>>>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>>>       unsigned long npages, prime, flags;
>>>>>>>       struct i915_vma *vma;
>>>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       i915_vma_unpin(vma);
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       }
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>>> st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       i915_vma_unpin(vma);
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>>> st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       }
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>               }
>>>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>>       const unsigned long max_pages =
>>>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>>>> +    unsigned long min_alignment;
>>>>>>>       unsigned long flags;
>>>>>>>       u64 size;
>>>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       if (i915_is_ggtt(vm))
>>>>>>>           flags |= PIN_GLOBAL;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>>           struct i915_vma *vma;
>>>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           for (addr = hole_start;
>>>>>>>                addr + obj->base.size < hole_end;
>>>>>>> -             addr += obj->base.size) {
>>>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>>               if (err) {
>>>>>>>                   pr_err("%s bind failed at %llx + %llx [hole 
>>>>>>> %llx- %llx] with err=%d\n",
>>>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space 
>>>>>>> *vm,
>>>>>>>   {
>>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>>       struct i915_vma *vma;
>>>>>>> +    unsigned int min_alignment;
>>>>>>>       unsigned long flags;
>>>>>>>       unsigned int pot;
>>>>>>>       int err = 0;
>>>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space 
>>>>>>> *vm,
>>>>>>>       if (i915_is_ggtt(vm))
>>>>>>>           flags |= PIN_GLOBAL;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>>>> I915_GTT_PAGE_SIZE);
>>>>>>>       if (IS_ERR(obj))
>>>>>>>           return PTR_ERR(obj);
>>>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>>>> the hole */
>>>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>>>            pot--) {
>>>>>>>           u64 step = BIT_ULL(pot);
>>>>>>>           u64 addr;
>>>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>>>> min_alignment;
>>>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>>>> step) - min_alignment;
>>>>>>>                addr += step) {
>>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>>               if (err) {
>>>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                 unsigned long end_time)
>>>>>>>   {
>>>>>>>       I915_RND_STATE(prng);
>>>>>>> +    unsigned int min_alignment;
>>>>>>>       unsigned int size;
>>>>>>>       unsigned long flags;
>>>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       if (i915_is_ggtt(vm))
>>>>>>>           flags |= PIN_GLOBAL;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>>> the hole */
>>>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>>           unsigned int *order, count, n;
>>>>>>>           struct i915_vma *vma;
>>>>>>> -        u64 hole_size;
>>>>>>> +        u64 hole_size, aligned_size;
>>>>>>>           int err = -ENODEV;
>>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>>           count = hole_size >> 1;
>>>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>>>           for (n = 0; n < count; n++) {
>>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>>> BIT_ULL(aligned_size);
>>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>>               if (err) {
>>>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>   {
>>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>>>> +    unsigned int min_alignment;
>>>>>>>       unsigned int order = 12;
>>>>>>>       LIST_HEAD(objects);
>>>>>>>       int err = 0;
>>>>>>>       u64 addr;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>>> the hole */
>>>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>>>           struct i915_vma *vma;
>>>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           }
>>>>>>>           i915_vma_unpin(vma);
>>>>>>> -        addr += size;
>>>>>>> +        addr += round_up(size, min_alignment);
>>>>>>>           /*
>>>>>>>            * Since we are injecting allocation faults at random 
>>>>>>> intervals,
>>>>>>> -- 
>>>>>>> 2.25.1
>>>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 16:25                 ` Matthew Auld
  0 siblings, 0 replies; 50+ messages in thread
From: Matthew Auld @ 2022-01-20 16:25 UTC (permalink / raw)
  To: Robert Beckett, Ramalingam C
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, linux-kernel, dri-devel,
	Rodrigo Vivi

On 20/01/2022 16:09, Robert Beckett wrote:
> 
> 
> On 20/01/2022 15:58, Matthew Auld wrote:
>> On 20/01/2022 15:44, Robert Beckett wrote:
>>>
>>>
>>> On 20/01/2022 14:59, Matthew Auld wrote:
>>>> On 20/01/2022 13:15, Robert Beckett wrote:
>>>>>
>>>>>
>>>>> On 20/01/2022 11:46, Ramalingam C wrote:
>>>>>> On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
>>>>>>> From: Matthew Auld <matthew.auld@intel.com>
>>>>>>>
>>>>>>> For local-memory objects we need to align the GTT addresses
>>>>>>> to 64K, both for the ppgtt and ggtt.
>>>>>>>
>>>>>>> We need to support vm->min_alignment > 4K, depending
>>>>>>> on the vm itself and the type of object we are inserting.
>>>>>>> With this in mind update the GTT selftests to take this
>>>>>>> into account.
>>>>>>>
>>>>>>> For DG2 we further align and pad lmem object GTT addresses
>>>>>>> to 2MB to ensure PDEs contain consistent page sizes as
>>>>>>> required by the HW.
>>>>>>>
>>>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>>> ---
>>>>>>>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
>>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
>>>>>>>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
>>>>>>>   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
>>>>>>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 
>>>>>>> ++++++++++++-------
>>>>>>>   5 files changed, 115 insertions(+), 41 deletions(-)
>>>>>>>
>>>>>>> diff --git 
>>>>>>> a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c 
>>>>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>>> index c08f766e6e15..7fee95a65414 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
>>>>>>> @@ -39,6 +39,7 @@ struct tiled_blits {
>>>>>>>       struct blit_buffer scratch;
>>>>>>>       struct i915_vma *batch;
>>>>>>>       u64 hole;
>>>>>>> +    u64 align;
>>>>>>>       u32 width;
>>>>>>>       u32 height;
>>>>>>>   };
>>>>>>> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>>> *engine, struct rnd_state *prng)
>>>>>>>           goto err_free;
>>>>>>>       }
>>>>>>> -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
>>>>>>> +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive 
>>>>>>> from vm! */
>>>>>>> +    t->align = max(t->align,
>>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>>> INTEL_MEMORY_LOCAL));
>>>>>>> +    t->align = max(t->align,
>>>>>>> +               i915_vm_min_alignment(t->ce->vm, 
>>>>>>> INTEL_MEMORY_SYSTEM));
>>>>>>> +
>>>>>>> +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>>>>>>>       hole_size *= 2; /* room to maneuver */
>>>>>>> -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
>>>>>>> +    hole_size += 2 * t->align; /* padding on either side */
>>>>>>>       mutex_lock(&t->ce->vm->mutex);
>>>>>>>       memset(&hole, 0, sizeof(hole));
>>>>>>>       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
>>>>>>> -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
>>>>>>> +                      hole_size, t->align,
>>>>>>> +                      I915_COLOR_UNEVICTABLE,
>>>>>>>                         0, U64_MAX,
>>>>>>>                         DRM_MM_INSERT_BEST);
>>>>>>>       if (!err)
>>>>>>> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs 
>>>>>>> *engine, struct rnd_state *prng)
>>>>>>>           goto err_put;
>>>>>>>       }
>>>>>>> -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
>>>>>>> +    t->hole = hole.start + t->align;
>>>>>>>       pr_info("Using hole at %llx\n", t->hole);
>>>>>>>       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
>>>>>>> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct 
>>>>>>> tiled_blits *t)
>>>>>>>   static int tiled_blits_prepare(struct tiled_blits *t,
>>>>>>>                      struct rnd_state *prng)
>>>>>>>   {
>>>>>>> -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
>>>>>>> +    u64 offset = round_up(t->width * t->height * 4, t->align);
>>>>>>>       u32 *map;
>>>>>>>       int err;
>>>>>>>       int i;
>>>>>>> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct 
>>>>>>> tiled_blits *t,
>>>>>>>   static int tiled_blits_bounce(struct tiled_blits *t, struct 
>>>>>>> rnd_state *prng)
>>>>>>>   {
>>>>>>> -    u64 offset =
>>>>>>> -        round_up(t->width * t->height * 4, 2 * 
>>>>>>> I915_GTT_MIN_ALIGNMENT);
>>>>>>> +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>>>>>>>       int err;
>>>>>>>       /* We want to check position invariant tiling across GTT 
>>>>>>> eviction */
>>>>>>> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct 
>>>>>>> tiled_blits *t, struct rnd_state *prng)
>>>>>>>       /* Reposition so that we overlap the old addresses, and 
>>>>>>> slightly off */
>>>>>>>       err = tiled_blit(t,
>>>>>>> -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
>>>>>>> +             &t->buffers[2], t->hole + t->align,
>>>>>>>                &t->buffers[1], t->hole + 3 * offset / 2);
>>>>>>>       if (err)
>>>>>>>           return err;
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
>>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>>> index 46be4197b93f..7c92b25c0f26 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>>>>>>> @@ -223,6 +223,20 @@ void i915_address_space_init(struct 
>>>>>>> i915_address_space *vm, int subclass)
>>>>>>>       GEM_BUG_ON(!vm->total);
>>>>>>>       drm_mm_init(&vm->mm, 0, vm->total);
>>>>>>> +
>>>>>>> +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
>>>>>>> +         ARRAY_SIZE(vm->min_alignment));
>>>>>>> +
>>>>>>> +    if (HAS_64K_PAGES(vm->i915)) {
>>>>>>> +        if (IS_DG2(vm->i915)) {
>>>>>> I think we need this 2M alignment for all platforms with 
>>>>>> HAS_64K_PAGES.
>>>>>> Not only for DG2.
>>>>>
>>>>> Really? Can we get confirmation of this?
>>>>> This contradicts the documentation in patch 4, which you reviewed, 
>>>>> so I am confused now.
>>>>
>>>> Starting from DG2, some platforms will have this new 64K GTT page 
>>>> size restriction when dealing with LMEM. The HAS_64K_PAGES() macro 
>>>> is meant to cover exactly that, AFAIK.
>>>
>>> As I understood it, using only 64K pages is a requirement going forward 
>>> for discrete cards, but the restriction of not sharing PDEs between 4k 
>>> and 64k pages was specific to DG2.
>>>
>>> e.g. xehpsdv is also defined as having 64k pages, and others in 
>>> future are likely to, but without the PDE sharing restrictions.
>>
>> Yeah, pretty much. But there is one other platform lurking.
>>
>> From chatting with Ram, it might also make sense to disentangle 
>> HAS_64K_PAGES(), since it currently means both that we need min 64K 
>> page granularity and that there is this compact-pt layout thing which 
>> doesn't allow mixing 64K and 4K in the same page-table.
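
For reference, the arithmetic behind the two constraints, using only the
page sizes the patch already deals with: a PDE covers a 2MB range, so it
can hold

	2M / 4K  = 512 entries of 4K, or
	2M / 64K =  32 entries of 64K.

The 64K-granularity requirement only forces lmem bindings onto 64K
boundaries, while the compact-pt restriction additionally forbids 4K and
64K entries under the same PDE, which is why the series aligns and pads
lmem VMAs to the full 2MB on DG2.
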
> 
> okay, so it sounds to me like the IS_DG2 check here is appropriate. 
> Other 64K page systems will not have the 2MB alignment requirement.

There are both dg2 and xehpsdv as per i915_pci.c. IIRC xehpsdv came 
first, and then dg2 inherited this feature. For example, the accelerated 
DG2 moves series[1] is meant to work on both platforms.

[1] https://patchwork.freedesktop.org/series/97544/

> 
> If any future platform does require compact-pt layout, when adding that 
> platform, we can then add a HAS_COMPACT_PT macro or something, which 
> would be set for DG2 and the future platform.
> 
> For now, this code seems correct to me as it currently only affects DG2.
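
As an illustration of that suggestion, a minimal sketch of what the split
could look like (HAS_COMPACT_PT() and the has_compact_pt device-info field
are hypothetical names for the proposed flag, nothing this series adds):

	/* hypothetical flag, set per platform in i915_pci.c */
	#define HAS_COMPACT_PT(i915)	(INTEL_INFO(i915)->has_compact_pt)

	/*
	 * i915_vma_insert() would then key the 2MB pad/align on the new
	 * flag instead of IS_DG2(), leaving other HAS_64K_PAGES()
	 * platforms at the plain 64K minimum.
	 */
	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) {
		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
		if (HAS_COMPACT_PT(vma->vm->i915)) {
			alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
			size = round_up(size, I915_GTT_PAGE_SIZE_2M);
		}
	}
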
>>
>>>
>>> If this is not the case, and all 64K page devices will also 
>>> necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES 
>>> check and apply 2MB everywhere, but so far this sounds unconfirmed.
>>>
>>>>
>>>>>
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_2M;
>>>>>>> +        } else {
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>>> +            vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
>>>>>>> I915_GTT_PAGE_SIZE_64K;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>>>>>>>       INIT_LIST_HEAD(&vm->bound_list);
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>>>>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>>> index 8073438b67c8..b8da2514d601 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>>>>>> @@ -29,6 +29,8 @@
>>>>>>>   #include "i915_selftest.h"
>>>>>>>   #include "i915_vma_resource.h"
>>>>>>>   #include "i915_vma_types.h"
>>>>>>> +#include "i915_params.h"
>>>>>>> +#include "intel_memory_region.h"
>>>>>>>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | 
>>>>>>> __GFP_NOWARN)
>>>>>>> @@ -223,6 +225,7 @@ struct i915_address_space {
>>>>>>>       struct device *dma;
>>>>>>>       u64 total;        /* size addr space maps (ex. 2GB for 
>>>>>>> ggtt) */
>>>>>>>       u64 reserved;        /* size addr space reserved */
>>>>>>> +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>>>>>>>       unsigned int bind_async_flags;
>>>>>>> @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct 
>>>>>>> i915_address_space *vm)
>>>>>>>       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>>>>>>>   }
>>>>>>> +static inline u64 i915_vm_min_alignment(struct 
>>>>>>> i915_address_space *vm,
>>>>>>> +                    enum intel_memory_type type)
>>>>>>> +{
>>>>>>> +    return vm->min_alignment[type];
>>>>>>> +}
>>>>>>> +
>>>>>>>   static inline bool
>>>>>>>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>>>>>>>   {
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>>>>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>> index 1f15c3298112..9ac92e7a3566 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>>>>> @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma *vma, u64 
>>>>>>> size, u64 alignment, u64 flags)
>>>>>>>       }
>>>>>>>       color = 0;
>>>>>>> +
>>>>>>> +    if (HAS_64K_PAGES(vma->vm->i915) && 
>>>>>>> i915_gem_object_is_lmem(vma->obj)) {
>>>>>>> +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
>>>>>>> +        /*
>>>>>>> +         * DG2 can not have different sized pages in any given 
>>>>>>> PDE (2MB range).
>>>>>>> +         * Keeping things simple, we force any lmem object to 
>>>>>>> reserve
>>>>>>> +         * 2MB chunks, preventing any smaller pages being used 
>>>>>>> alongside
>>>>>>> +         */
>>>>>>> +        if (IS_DG2(vma->vm->i915)) {
>>>>>> Similarly here we dont need special case for DG2.
>>>>>>
>>>>>> Ram
>>>>>>> +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
>>>>>>> +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>       if (i915_vm_has_cache_coloring(vma->vm))
>>>>>>>           color = vma->obj->cache_level;
>>>>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
>>>>>>> b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>>> index 076d860ce01a..2f3f0c01786b 100644
>>>>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>>>>>>> @@ -238,6 +238,8 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                u64 hole_start, u64 hole_end,
>>>>>>>                unsigned long end_time)
>>>>>>>   {
>>>>>>> +    const unsigned int min_alignment =
>>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>>       I915_RND_STATE(seed_prng);
>>>>>>>       struct i915_vma_resource *mock_vma_res;
>>>>>>>       unsigned int size;
>>>>>>> @@ -251,9 +253,10 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           I915_RND_SUBSTATE(prng, seed_prng);
>>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>>           unsigned int *order, count, n;
>>>>>>> -        u64 hole_size;
>>>>>>> +        u64 hole_size, aligned_size;
>>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>>           count = hole_size >> 1;
>>>>>>> @@ -274,8 +277,8 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           }
>>>>>>>           GEM_BUG_ON(!order);
>>>>>>> -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
>>>>>>> -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
>>>>>>> +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
>>>>>>> +        GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > 
>>>>>>> hole_end);
>>>>>>>           /* Ignore allocation failures (i.e. don't report them as
>>>>>>>            * a test failure) as we are purposefully allocating very
>>>>>>> @@ -298,10 +301,10 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           }
>>>>>>>           for (n = 0; n < count; n++) {
>>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>>> BIT_ULL(aligned_size);
>>>>>>>               intel_wakeref_t wakeref;
>>>>>>> -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>>> +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
>>>>>>>               if (igt_timeout(end_time,
>>>>>>>                       "%s timed out before %d/%d\n",
>>>>>>> @@ -344,7 +347,7 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>               }
>>>>>>>               mock_vma_res->bi.pages = obj->mm.pages;
>>>>>>> -            mock_vma_res->node_size = BIT_ULL(size);
>>>>>>> +            mock_vma_res->node_size = BIT_ULL(aligned_size);
>>>>>>>               mock_vma_res->start = addr;
>>>>>>>               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
>>>>>>> @@ -355,7 +358,7 @@ static int lowlevel_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           i915_random_reorder(order, count, &prng);
>>>>>>>           for (n = 0; n < count; n++) {
>>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>>> BIT_ULL(aligned_size);
>>>>>>>               intel_wakeref_t wakeref;
>>>>>>>               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
>>>>>>> @@ -399,8 +402,10 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>   {
>>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>> +    const unsigned int min_alignment =
>>>>>>> +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>>       const unsigned long max_pages =
>>>>>>> -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
>>>>>>> +        min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> 
>>>>>>> ilog2(min_alignment));
>>>>>>>       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
>>>>>>>       unsigned long npages, prime, flags;
>>>>>>>       struct i915_vma *vma;
>>>>>>> @@ -441,14 +446,17 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>>> @@ -470,22 +478,25 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       i915_vma_unpin(vma);
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry(obj, &objects, st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>>> @@ -506,22 +517,25 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       }
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>>> st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       err = i915_vma_pin(vma, 0, 0, offset | flags);
>>>>>>> @@ -543,22 +557,25 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       i915_vma_unpin(vma);
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>                   offset = p->offset;
>>>>>>>                   list_for_each_entry_reverse(obj, &objects, 
>>>>>>> st_link) {
>>>>>>> +                    u64 aligned_size = round_up(obj->base.size,
>>>>>>> +                                    min_alignment);
>>>>>>> +
>>>>>>>                       vma = i915_vma_instance(obj, vm, NULL);
>>>>>>>                       if (IS_ERR(vma))
>>>>>>>                           continue;
>>>>>>>                       if (p->step < 0) {
>>>>>>> -                        if (offset < hole_start + obj->base.size)
>>>>>>> +                        if (offset < hole_start + aligned_size)
>>>>>>>                               break;
>>>>>>> -                        offset -= obj->base.size;
>>>>>>> +                        offset -= aligned_size;
>>>>>>>                       }
>>>>>>>                       if (!drm_mm_node_allocated(&vma->node) ||
>>>>>>> @@ -579,9 +596,9 @@ static int fill_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                       }
>>>>>>>                       if (p->step > 0) {
>>>>>>> -                        if (offset + obj->base.size > hole_end)
>>>>>>> +                        if (offset + aligned_size > hole_end)
>>>>>>>                               break;
>>>>>>> -                        offset += obj->base.size;
>>>>>>> +                        offset += aligned_size;
>>>>>>>                       }
>>>>>>>                   }
>>>>>>>               }
>>>>>>> @@ -611,6 +628,7 @@ static int walk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       const u64 hole_size = hole_end - hole_start;
>>>>>>>       const unsigned long max_pages =
>>>>>>>           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
>>>>>>> +    unsigned long min_alignment;
>>>>>>>       unsigned long flags;
>>>>>>>       u64 size;
>>>>>>> @@ -620,6 +638,8 @@ static int walk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       if (i915_is_ggtt(vm))
>>>>>>>           flags |= PIN_GLOBAL;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       for_each_prime_number_from(size, 1, max_pages) {
>>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>>           struct i915_vma *vma;
>>>>>>> @@ -638,7 +658,7 @@ static int walk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           for (addr = hole_start;
>>>>>>>                addr + obj->base.size < hole_end;
>>>>>>> -             addr += obj->base.size) {
>>>>>>> +             addr += round_up(obj->base.size, min_alignment)) {
>>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>>               if (err) {
>>>>>>>                   pr_err("%s bind failed at %llx + %llx [hole 
>>>>>>> %llx- %llx] with err=%d\n",
>>>>>>> @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space 
>>>>>>> *vm,
>>>>>>>   {
>>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>>       struct i915_vma *vma;
>>>>>>> +    unsigned int min_alignment;
>>>>>>>       unsigned long flags;
>>>>>>>       unsigned int pot;
>>>>>>>       int err = 0;
>>>>>>> @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space 
>>>>>>> *vm,
>>>>>>>       if (i915_is_ggtt(vm))
>>>>>>>           flags |= PIN_GLOBAL;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       obj = i915_gem_object_create_internal(vm->i915, 2 * 
>>>>>>> I915_GTT_PAGE_SIZE);
>>>>>>>       if (IS_ERR(obj))
>>>>>>>           return PTR_ERR(obj);
>>>>>>> @@ -710,13 +733,13 @@ static int pot_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       /* Insert a pair of pages across every pot boundary within 
>>>>>>> the hole */
>>>>>>>       for (pot = fls64(hole_end - 1) - 1;
>>>>>>> -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
>>>>>>> +         pot > ilog2(2 * min_alignment);
>>>>>>>            pot--) {
>>>>>>>           u64 step = BIT_ULL(pot);
>>>>>>>           u64 addr;
>>>>>>> -        for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, 
>>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>>> -             addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, 
>>>>>>> step) - I915_GTT_PAGE_SIZE;
>>>>>>> +        for (addr = round_up(hole_start + min_alignment, step) - 
>>>>>>> min_alignment;
>>>>>>> +             addr <= round_down(hole_end - (2 * min_alignment), 
>>>>>>> step) - min_alignment;
>>>>>>>                addr += step) {
>>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>>               if (err) {
>>>>>>> @@ -761,6 +784,7 @@ static int drunk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>                 unsigned long end_time)
>>>>>>>   {
>>>>>>>       I915_RND_STATE(prng);
>>>>>>> +    unsigned int min_alignment;
>>>>>>>       unsigned int size;
>>>>>>>       unsigned long flags;
>>>>>>> @@ -768,15 +792,18 @@ static int drunk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>       if (i915_is_ggtt(vm))
>>>>>>>           flags |= PIN_GLOBAL;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>>> the hole */
>>>>>>>       for (size = 12; (hole_end - hole_start) >> size; size++) {
>>>>>>>           struct drm_i915_gem_object *obj;
>>>>>>>           unsigned int *order, count, n;
>>>>>>>           struct i915_vma *vma;
>>>>>>> -        u64 hole_size;
>>>>>>> +        u64 hole_size, aligned_size;
>>>>>>>           int err = -ENODEV;
>>>>>>> -        hole_size = (hole_end - hole_start) >> size;
>>>>>>> +        aligned_size = max_t(u32, ilog2(min_alignment), size);
>>>>>>> +        hole_size = (hole_end - hole_start) >> aligned_size;
>>>>>>>           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
>>>>>>>               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
>>>>>>>           count = hole_size >> 1;
>>>>>>> @@ -816,7 +843,7 @@ static int drunk_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           GEM_BUG_ON(vma->size != BIT_ULL(size));
>>>>>>>           for (n = 0; n < count; n++) {
>>>>>>> -            u64 addr = hole_start + order[n] * BIT_ULL(size);
>>>>>>> +            u64 addr = hole_start + order[n] * 
>>>>>>> BIT_ULL(aligned_size);
>>>>>>>               err = i915_vma_pin(vma, 0, 0, addr | flags);
>>>>>>>               if (err) {
>>>>>>> @@ -868,11 +895,14 @@ static int __shrink_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>   {
>>>>>>>       struct drm_i915_gem_object *obj;
>>>>>>>       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
>>>>>>> +    unsigned int min_alignment;
>>>>>>>       unsigned int order = 12;
>>>>>>>       LIST_HEAD(objects);
>>>>>>>       int err = 0;
>>>>>>>       u64 addr;
>>>>>>> +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
>>>>>>> +
>>>>>>>       /* Keep creating larger objects until one cannot fit into 
>>>>>>> the hole */
>>>>>>>       for (addr = hole_start; addr < hole_end; ) {
>>>>>>>           struct i915_vma *vma;
>>>>>>> @@ -913,7 +943,7 @@ static int __shrink_hole(struct 
>>>>>>> i915_address_space *vm,
>>>>>>>           }
>>>>>>>           i915_vma_unpin(vma);
>>>>>>> -        addr += size;
>>>>>>> +        addr += round_up(size, min_alignment);
>>>>>>>           /*
>>>>>>>            * Since we are injecting allocation faults at random 
>>>>>>> intervals,
>>>>>>> -- 
>>>>>>> 2.25.1
>>>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
  2022-01-20 16:09               ` Robert Beckett
  (?)
@ 2022-01-20 16:29                 ` C, Ramalingam
  -1 siblings, 0 replies; 50+ messages in thread
From: C, Ramalingam @ 2022-01-20 16:29 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Auld, Matthew, Jani Nikula, Joonas Lahtinen, Vivi, Rodrigo,
	Tvrtko Ursulin, David Airlie, Daniel Vetter, intel-gfx,
	dri-devel, linux-kernel

On 2022-01-20 at 16:09:01 +0000, Robert Beckett wrote:
> 
> 
> On 20/01/2022 15:58, Matthew Auld wrote:
> > On 20/01/2022 15:44, Robert Beckett wrote:
> > > 
> > > 
> > > On 20/01/2022 14:59, Matthew Auld wrote:
> > > > On 20/01/2022 13:15, Robert Beckett wrote:
> > > > > 
> > > > > 
> > > > > On 20/01/2022 11:46, Ramalingam C wrote:
> > > > > > On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
> > > > > > > From: Matthew Auld <matthew.auld@intel.com>
> > > > > > > 
> > > > > > > For local-memory objects we need to align the GTT addresses
> > > > > > > to 64K, both for the ppgtt and ggtt.
> > > > > > > 
> > > > > > > We need to support vm->min_alignment > 4K, depending
> > > > > > > on the vm itself and the type of object we are inserting.
> > > > > > > With this in mind update the GTT selftests to take this
> > > > > > > into account.
> > > > > > > 
> > > > > > > For DG2 we further align and pad lmem object GTT addresses
> > > > > > > to 2MB to ensure PDEs contain consistent page sizes as
> > > > > > > required by the HW.
> > > > > > > 
> > > > > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > > > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > > > > > > Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> > > > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > > ---
> > > > > > >   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
> > > > > > >   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
> > > > > > >   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
> > > > > > >   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
> > > > > > >   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96
> > > > > > > ++++++++++++-------
> > > > > > >   5 files changed, 115 insertions(+), 41 deletions(-)
> > > > > > > 
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > index c08f766e6e15..7fee95a65414 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > @@ -39,6 +39,7 @@ struct tiled_blits {
> > > > > > >       struct blit_buffer scratch;
> > > > > > >       struct i915_vma *batch;
> > > > > > >       u64 hole;
> > > > > > > +    u64 align;
> > > > > > >       u32 width;
> > > > > > >       u32 height;
> > > > > > >   };
> > > > > > > @@ -410,14 +411,21 @@ tiled_blits_create(struct
> > > > > > > intel_engine_cs *engine, struct rnd_state *prng)
> > > > > > >           goto err_free;
> > > > > > >       }
> > > > > > > -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> > > > > > > +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst
> > > > > > > case, derive from vm! */
> > > > > > > +    t->align = max(t->align,
> > > > > > > +               i915_vm_min_alignment(t->ce->vm,
> > > > > > > INTEL_MEMORY_LOCAL));
> > > > > > > +    t->align = max(t->align,
> > > > > > > +               i915_vm_min_alignment(t->ce->vm,
> > > > > > > INTEL_MEMORY_SYSTEM));
> > > > > > > +
> > > > > > > +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
> > > > > > >       hole_size *= 2; /* room to maneuver */
> > > > > > > -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> > > > > > > +    hole_size += 2 * t->align; /* padding on either side */
> > > > > > >       mutex_lock(&t->ce->vm->mutex);
> > > > > > >       memset(&hole, 0, sizeof(hole));
> > > > > > >       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> > > > > > > -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
> > > > > > > +                      hole_size, t->align,
> > > > > > > +                      I915_COLOR_UNEVICTABLE,
> > > > > > >                         0, U64_MAX,
> > > > > > >                         DRM_MM_INSERT_BEST);
> > > > > > >       if (!err)
> > > > > > > @@ -428,7 +436,7 @@ tiled_blits_create(struct
> > > > > > > intel_engine_cs *engine, struct rnd_state *prng)
> > > > > > >           goto err_put;
> > > > > > >       }
> > > > > > > -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> > > > > > > +    t->hole = hole.start + t->align;
> > > > > > >       pr_info("Using hole at %llx\n", t->hole);
> > > > > > >       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> > > > > > > @@ -455,7 +463,7 @@ static void
> > > > > > > tiled_blits_destroy(struct tiled_blits *t)
> > > > > > >   static int tiled_blits_prepare(struct tiled_blits *t,
> > > > > > >                      struct rnd_state *prng)
> > > > > > >   {
> > > > > > > -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> > > > > > > +    u64 offset = round_up(t->width * t->height * 4, t->align);
> > > > > > >       u32 *map;
> > > > > > >       int err;
> > > > > > >       int i;
> > > > > > > @@ -486,8 +494,7 @@ static int
> > > > > > > tiled_blits_prepare(struct tiled_blits *t,
> > > > > > >   static int tiled_blits_bounce(struct tiled_blits
> > > > > > > *t, struct rnd_state *prng)
> > > > > > >   {
> > > > > > > -    u64 offset =
> > > > > > > -        round_up(t->width * t->height * 4, 2 *
> > > > > > > I915_GTT_MIN_ALIGNMENT);
> > > > > > > +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
> > > > > > >       int err;
> > > > > > >       /* We want to check position invariant tiling
> > > > > > > across GTT eviction */
> > > > > > > @@ -500,7 +507,7 @@ static int
> > > > > > > tiled_blits_bounce(struct tiled_blits *t, struct
> > > > > > > rnd_state *prng)
> > > > > > >       /* Reposition so that we overlap the old
> > > > > > > addresses, and slightly off */
> > > > > > >       err = tiled_blit(t,
> > > > > > > -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> > > > > > > +             &t->buffers[2], t->hole + t->align,
> > > > > > >                &t->buffers[1], t->hole + 3 * offset / 2);
> > > > > > >       if (err)
> > > > > > >           return err;
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > index 46be4197b93f..7c92b25c0f26 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > @@ -223,6 +223,20 @@ void
> > > > > > > i915_address_space_init(struct i915_address_space
> > > > > > > *vm, int subclass)
> > > > > > >       GEM_BUG_ON(!vm->total);
> > > > > > >       drm_mm_init(&vm->mm, 0, vm->total);
> > > > > > > +
> > > > > > > +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> > > > > > > +         ARRAY_SIZE(vm->min_alignment));
> > > > > > > +
> > > > > > > +    if (HAS_64K_PAGES(vm->i915)) {
> > > > > > > +        if (IS_DG2(vm->i915)) {
> > > > > > I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
> > > > > > Not only for DG2.
> > > > > 
> > > > > Really? Can we get confirmation of this?
> > > > > This contradicts the documentation in patch 4, which you
> > > > > reviewed, so I am confused now.
> > > > 
> > > > Starting from DG2, some platforms will have this new 64K GTT
> > > > page size restriction when dealing with LMEM. The
> > > > HAS_64K_PAGES() macro is meant to cover exactly that, AFAIK.
> > > 
> > > As I understood it, using only 64K pages is a requirement going forward
> > > for discrete cards, but the restriction of not sharing PDEs between 4K
> > > and 64K pages was specific to DG2.
> > > 
> > > e.g. xehpsdv is also defined as having 64K pages, and others in
> > > future are likely to as well, but without the PDE sharing restrictions.
> > 
> > Yeah, pretty much. But there is one other platform lurking.
> > 
> >  From chatting with Ram, it might also make sense to disentangle
> > HAS_64K_PAGES(), since it currently means both that we need min 64K page
> > granularity, and that there is this compact-pt layout thing which
> > doesn't allow mixing 64K and 4K in the same page-table.
> 
> okay, so it sounds to me like the IS_DG2 check here is appropriate. Other
> 64K page systems will not have the 2MB alignment requirement.
> 
> If any future platform does require compact-pt layout, when adding that
> platform, we can then add a HAS_COMPACT_PT macro or something, which would be
> set for DG2 and the future platform.
> 
> For now, this code seems correct to me as it currently only affects DG2.

As Matt mentioned, IMHO we need to split the requirement of a 64K minimum
page size for lmem from the compact-pt requirement, to address existing and
future platform needs.

I have just added another flag called needs_compact_pt in the patch mentioned
below, which will be set for DG2 and XEHPSDV. If (NEEDS_COMPACT_PT() &&
HAS_64K_PAGES()) then we can align to 2MB, else to 64K.
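
A rough sketch of that selection, just to illustrate the idea (the
NEEDS_COMPACT_PT() flag and the lmem_min_alignment() name are placeholders for
the proposed change, not code from this series):

static u64 lmem_min_alignment(struct drm_i915_private *i915)
{
        /* platforms without the 64K requirement keep the 4K minimum */
        if (!HAS_64K_PAGES(i915))
                return I915_GTT_PAGE_SIZE;

        /*
         * Platforms that cannot mix 64K and 4K entries within a page table
         * (DG2 and XEHPSDV in the proposal above) need whole 2M PDEs.
         */
        if (NEEDS_COMPACT_PT(i915))
                return I915_GTT_PAGE_SIZE_2M;

        return I915_GTT_PAGE_SIZE_64K;
}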

Please have a look at
https://patchwork.freedesktop.org/series/99105/

Ram
> > 
> > > 
> > > If this is not the case, and all 64K page devices will also
> > > necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES
> > > and use 2MB everywhere, but so far this sounds unconfirmed.
> > > 
> > > > 
> > > > > 
> > > > > > > +            vm->min_alignment[INTEL_MEMORY_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_2M;
> > > > > > > +           
> > > > > > > vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_2M;
> > > > > > > +        } else {
> > > > > > > +            vm->min_alignment[INTEL_MEMORY_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_64K;
> > > > > > > +           
> > > > > > > vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_64K;
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > >       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
> > > > > > >       INIT_LIST_HEAD(&vm->bound_list);
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > index 8073438b67c8..b8da2514d601 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > @@ -29,6 +29,8 @@
> > > > > > >   #include "i915_selftest.h"
> > > > > > >   #include "i915_vma_resource.h"
> > > > > > >   #include "i915_vma_types.h"
> > > > > > > +#include "i915_params.h"
> > > > > > > +#include "intel_memory_region.h"
> > > > > > >   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL |
> > > > > > > __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> > > > > > > @@ -223,6 +225,7 @@ struct i915_address_space {
> > > > > > >       struct device *dma;
> > > > > > >       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
> > > > > > >       u64 reserved;        /* size addr space reserved */
> > > > > > > +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
> > > > > > >       unsigned int bind_async_flags;
> > > > > > > @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct
> > > > > > > i915_address_space *vm)
> > > > > > >       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
> > > > > > >   }
> > > > > > > +static inline u64 i915_vm_min_alignment(struct
> > > > > > > i915_address_space *vm,
> > > > > > > +                    enum intel_memory_type type)
> > > > > > > +{
> > > > > > > +    return vm->min_alignment[type];
> > > > > > > +}
> > > > > > > +
> > > > > > >   static inline bool
> > > > > > >   i915_vm_has_cache_coloring(struct i915_address_space *vm)
> > > > > > >   {
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > b/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > index 1f15c3298112..9ac92e7a3566 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma
> > > > > > > *vma, u64 size, u64 alignment, u64 flags)
> > > > > > >       }
> > > > > > >       color = 0;
> > > > > > > +
> > > > > > > +    if (HAS_64K_PAGES(vma->vm->i915) &&
> > > > > > > i915_gem_object_is_lmem(vma->obj)) {
> > > > > > > +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
> > > > > > > +        /*
> > > > > > > +         * DG2 can not have different sized pages
> > > > > > > in any given PDE (2MB range).
> > > > > > > +         * Keeping things simple, we force any lmem
> > > > > > > object to reserve
> > > > > > > +         * 2MB chunks, preventing any smaller pages
> > > > > > > being used alongside
> > > > > > > +         */
> > > > > > > +        if (IS_DG2(vma->vm->i915)) {
> > > > > > Similarly, here we don't need a special case for DG2.
> > > > > > 
> > > > > > Ram
> > > > > > > +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
> > > > > > > +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > >       if (i915_vm_has_cache_coloring(vma->vm))
> > > > > > >           color = vma->obj->cache_level;
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > index 076d860ce01a..2f3f0c01786b 100644
> > > > > > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > @@ -238,6 +238,8 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                u64 hole_start, u64 hole_end,
> > > > > > >                unsigned long end_time)
> > > > > > >   {
> > > > > > > +    const unsigned int min_alignment =
> > > > > > > +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > >       I915_RND_STATE(seed_prng);
> > > > > > >       struct i915_vma_resource *mock_vma_res;
> > > > > > >       unsigned int size;
> > > > > > > @@ -251,9 +253,10 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           I915_RND_SUBSTATE(prng, seed_prng);
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           unsigned int *order, count, n;
> > > > > > > -        u64 hole_size;
> > > > > > > +        u64 hole_size, aligned_size;
> > > > > > > -        hole_size = (hole_end - hole_start) >> size;
> > > > > > > +        aligned_size = max_t(u32, ilog2(min_alignment), size);
> > > > > > > +        hole_size = (hole_end - hole_start) >> aligned_size;
> > > > > > >           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
> > > > > > >               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
> > > > > > >           count = hole_size >> 1;
> > > > > > > @@ -274,8 +277,8 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           }
> > > > > > >           GEM_BUG_ON(!order);
> > > > > > > -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
> > > > > > > -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
> > > > > > > +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
> > > > > > > +        GEM_BUG_ON(hole_start + count *
> > > > > > > BIT_ULL(aligned_size) > hole_end);
> > > > > > >           /* Ignore allocation failures (i.e. don't report them as
> > > > > > >            * a test failure) as we are purposefully allocating very
> > > > > > > @@ -298,10 +301,10 @@ static int
> > > > > > > lowlevel_hole(struct i915_address_space *vm,
> > > > > > >           }
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               intel_wakeref_t wakeref;
> > > > > > > -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> > > > > > > +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
> > > > > > >               if (igt_timeout(end_time,
> > > > > > >                       "%s timed out before %d/%d\n",
> > > > > > > @@ -344,7 +347,7 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >               }
> > > > > > >               mock_vma_res->bi.pages = obj->mm.pages;
> > > > > > > -            mock_vma_res->node_size = BIT_ULL(size);
> > > > > > > +            mock_vma_res->node_size = BIT_ULL(aligned_size);
> > > > > > >               mock_vma_res->start = addr;
> > > > > > >               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> > > > > > > @@ -355,7 +358,7 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           i915_random_reorder(order, count, &prng);
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               intel_wakeref_t wakeref;
> > > > > > >               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> > > > > > > @@ -399,8 +402,10 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >   {
> > > > > > >       const u64 hole_size = hole_end - hole_start;
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > > +    const unsigned int min_alignment =
> > > > > > > +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > >       const unsigned long max_pages =
> > > > > > > -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
> > > > > > > +        min_t(u64, ULONG_MAX - 1, (hole_size / 2)
> > > > > > > >> ilog2(min_alignment));
> > > > > > >       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
> > > > > > >       unsigned long npages, prime, flags;
> > > > > > >       struct i915_vma *vma;
> > > > > > > @@ -441,14 +446,17 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry(obj, &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       err = i915_vma_pin(vma, 0, 0, offset | flags);
> > > > > > > @@ -470,22 +478,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       i915_vma_unpin(vma);
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry(obj, &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       if (!drm_mm_node_allocated(&vma->node) ||
> > > > > > > @@ -506,22 +517,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       }
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry_reverse(obj,
> > > > > > > &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       err = i915_vma_pin(vma, 0, 0, offset | flags);
> > > > > > > @@ -543,22 +557,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       i915_vma_unpin(vma);
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry_reverse(obj,
> > > > > > > &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       if (!drm_mm_node_allocated(&vma->node) ||
> > > > > > > @@ -579,9 +596,9 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       }
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >               }
> > > > > > > @@ -611,6 +628,7 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       const u64 hole_size = hole_end - hole_start;
> > > > > > >       const unsigned long max_pages =
> > > > > > >           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
> > > > > > > +    unsigned long min_alignment;
> > > > > > >       unsigned long flags;
> > > > > > >       u64 size;
> > > > > > > @@ -620,6 +638,8 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       for_each_prime_number_from(size, 1, max_pages) {
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           struct i915_vma *vma;
> > > > > > > @@ -638,7 +658,7 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           for (addr = hole_start;
> > > > > > >                addr + obj->base.size < hole_end;
> > > > > > > -             addr += obj->base.size) {
> > > > > > > +             addr += round_up(obj->base.size, min_alignment)) {
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > >                   pr_err("%s bind failed at %llx +
> > > > > > > %llx [hole %llx- %llx] with err=%d\n",
> > > > > > > @@ -690,6 +710,7 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >   {
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > >       struct i915_vma *vma;
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned long flags;
> > > > > > >       unsigned int pot;
> > > > > > >       int err = 0;
> > > > > > > @@ -698,6 +719,8 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       obj =
> > > > > > > i915_gem_object_create_internal(vm->i915, 2 *
> > > > > > > I915_GTT_PAGE_SIZE);
> > > > > > >       if (IS_ERR(obj))
> > > > > > >           return PTR_ERR(obj);
> > > > > > > @@ -710,13 +733,13 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       /* Insert a pair of pages across every pot
> > > > > > > boundary within the hole */
> > > > > > >       for (pot = fls64(hole_end - 1) - 1;
> > > > > > > -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
> > > > > > > +         pot > ilog2(2 * min_alignment);
> > > > > > >            pot--) {
> > > > > > >           u64 step = BIT_ULL(pot);
> > > > > > >           u64 addr;
> > > > > > > -        for (addr = round_up(hole_start +
> > > > > > > I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> > > > > > > -             addr <= round_down(hole_end -
> > > > > > > 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> > > > > > > +        for (addr = round_up(hole_start +
> > > > > > > min_alignment, step) - min_alignment;
> > > > > > > +             addr <= round_down(hole_end - (2 *
> > > > > > > min_alignment), step) - min_alignment;
> > > > > > >                addr += step) {
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > > @@ -761,6 +784,7 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                 unsigned long end_time)
> > > > > > >   {
> > > > > > >       I915_RND_STATE(prng);
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned int size;
> > > > > > >       unsigned long flags;
> > > > > > > @@ -768,15 +792,18 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       /* Keep creating larger objects until one
> > > > > > > cannot fit into the hole */
> > > > > > >       for (size = 12; (hole_end - hole_start) >> size; size++) {
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           unsigned int *order, count, n;
> > > > > > >           struct i915_vma *vma;
> > > > > > > -        u64 hole_size;
> > > > > > > +        u64 hole_size, aligned_size;
> > > > > > >           int err = -ENODEV;
> > > > > > > -        hole_size = (hole_end - hole_start) >> size;
> > > > > > > +        aligned_size = max_t(u32, ilog2(min_alignment), size);
> > > > > > > +        hole_size = (hole_end - hole_start) >> aligned_size;
> > > > > > >           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
> > > > > > >               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
> > > > > > >           count = hole_size >> 1;
> > > > > > > @@ -816,7 +843,7 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           GEM_BUG_ON(vma->size != BIT_ULL(size));
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > > @@ -868,11 +895,14 @@ static int
> > > > > > > __shrink_hole(struct i915_address_space *vm,
> > > > > > >   {
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > >       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned int order = 12;
> > > > > > >       LIST_HEAD(objects);
> > > > > > >       int err = 0;
> > > > > > >       u64 addr;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       /* Keep creating larger objects until one
> > > > > > > cannot fit into the hole */
> > > > > > >       for (addr = hole_start; addr < hole_end; ) {
> > > > > > >           struct i915_vma *vma;
> > > > > > > @@ -913,7 +943,7 @@ static int __shrink_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           }
> > > > > > >           i915_vma_unpin(vma);
> > > > > > > -        addr += size;
> > > > > > > +        addr += round_up(size, min_alignment);
> > > > > > >           /*
> > > > > > >            * Since we are injecting allocation
> > > > > > > faults at random intervals,
> > > > > > > -- 
> > > > > > > 2.25.1
> > > > > > > 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 16:29                 ` C, Ramalingam
  0 siblings, 0 replies; 50+ messages in thread
From: C, Ramalingam @ 2022-01-20 16:29 UTC (permalink / raw)
  To: Robert Beckett
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Auld, Matthew

On 2022-01-20 at 16:09:01 +0000, Robert Beckett wrote:
> 
> 
> On 20/01/2022 15:58, Matthew Auld wrote:
> > On 20/01/2022 15:44, Robert Beckett wrote:
> > > 
> > > 
> > > On 20/01/2022 14:59, Matthew Auld wrote:
> > > > On 20/01/2022 13:15, Robert Beckett wrote:
> > > > > 
> > > > > 
> > > > > On 20/01/2022 11:46, Ramalingam C wrote:
> > > > > > On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
> > > > > > > From: Matthew Auld <matthew.auld@intel.com>
> > > > > > > 
> > > > > > > For local-memory objects we need to align the GTT addresses
> > > > > > > to 64K, both for the ppgtt and ggtt.
> > > > > > > 
> > > > > > > We need to support vm->min_alignment > 4K, depending
> > > > > > > on the vm itself and the type of object we are inserting.
> > > > > > > With this in mind update the GTT selftests to take this
> > > > > > > into account.
> > > > > > > 
> > > > > > > For DG2 we further align and pad lmem object GTT addresses
> > > > > > > to 2MB to ensure PDEs contain consistent page sizes as
> > > > > > > required by the HW.
> > > > > > > 
> > > > > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > > > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > > > > > > Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> > > > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > > ---
> > > > > > >   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
> > > > > > >   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
> > > > > > >   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
> > > > > > >   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
> > > > > > >   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96
> > > > > > > ++++++++++++-------
> > > > > > >   5 files changed, 115 insertions(+), 41 deletions(-)
> > > > > > > 
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > index c08f766e6e15..7fee95a65414 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > @@ -39,6 +39,7 @@ struct tiled_blits {
> > > > > > >       struct blit_buffer scratch;
> > > > > > >       struct i915_vma *batch;
> > > > > > >       u64 hole;
> > > > > > > +    u64 align;
> > > > > > >       u32 width;
> > > > > > >       u32 height;
> > > > > > >   };
> > > > > > > @@ -410,14 +411,21 @@ tiled_blits_create(struct
> > > > > > > intel_engine_cs *engine, struct rnd_state *prng)
> > > > > > >           goto err_free;
> > > > > > >       }
> > > > > > > -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> > > > > > > +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst
> > > > > > > case, derive from vm! */
> > > > > > > +    t->align = max(t->align,
> > > > > > > +               i915_vm_min_alignment(t->ce->vm,
> > > > > > > INTEL_MEMORY_LOCAL));
> > > > > > > +    t->align = max(t->align,
> > > > > > > +               i915_vm_min_alignment(t->ce->vm,
> > > > > > > INTEL_MEMORY_SYSTEM));
> > > > > > > +
> > > > > > > +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
> > > > > > >       hole_size *= 2; /* room to maneuver */
> > > > > > > -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> > > > > > > +    hole_size += 2 * t->align; /* padding on either side */
> > > > > > >       mutex_lock(&t->ce->vm->mutex);
> > > > > > >       memset(&hole, 0, sizeof(hole));
> > > > > > >       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> > > > > > > -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
> > > > > > > +                      hole_size, t->align,
> > > > > > > +                      I915_COLOR_UNEVICTABLE,
> > > > > > >                         0, U64_MAX,
> > > > > > >                         DRM_MM_INSERT_BEST);
> > > > > > >       if (!err)
> > > > > > > @@ -428,7 +436,7 @@ tiled_blits_create(struct
> > > > > > > intel_engine_cs *engine, struct rnd_state *prng)
> > > > > > >           goto err_put;
> > > > > > >       }
> > > > > > > -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> > > > > > > +    t->hole = hole.start + t->align;
> > > > > > >       pr_info("Using hole at %llx\n", t->hole);
> > > > > > >       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> > > > > > > @@ -455,7 +463,7 @@ static void
> > > > > > > tiled_blits_destroy(struct tiled_blits *t)
> > > > > > >   static int tiled_blits_prepare(struct tiled_blits *t,
> > > > > > >                      struct rnd_state *prng)
> > > > > > >   {
> > > > > > > -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> > > > > > > +    u64 offset = round_up(t->width * t->height * 4, t->align);
> > > > > > >       u32 *map;
> > > > > > >       int err;
> > > > > > >       int i;
> > > > > > > @@ -486,8 +494,7 @@ static int
> > > > > > > tiled_blits_prepare(struct tiled_blits *t,
> > > > > > >   static int tiled_blits_bounce(struct tiled_blits
> > > > > > > *t, struct rnd_state *prng)
> > > > > > >   {
> > > > > > > -    u64 offset =
> > > > > > > -        round_up(t->width * t->height * 4, 2 *
> > > > > > > I915_GTT_MIN_ALIGNMENT);
> > > > > > > +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
> > > > > > >       int err;
> > > > > > >       /* We want to check position invariant tiling
> > > > > > > across GTT eviction */
> > > > > > > @@ -500,7 +507,7 @@ static int
> > > > > > > tiled_blits_bounce(struct tiled_blits *t, struct
> > > > > > > rnd_state *prng)
> > > > > > >       /* Reposition so that we overlap the old
> > > > > > > addresses, and slightly off */
> > > > > > >       err = tiled_blit(t,
> > > > > > > -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> > > > > > > +             &t->buffers[2], t->hole + t->align,
> > > > > > >                &t->buffers[1], t->hole + 3 * offset / 2);
> > > > > > >       if (err)
> > > > > > >           return err;
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > index 46be4197b93f..7c92b25c0f26 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > @@ -223,6 +223,20 @@ void
> > > > > > > i915_address_space_init(struct i915_address_space
> > > > > > > *vm, int subclass)
> > > > > > >       GEM_BUG_ON(!vm->total);
> > > > > > >       drm_mm_init(&vm->mm, 0, vm->total);
> > > > > > > +
> > > > > > > +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> > > > > > > +         ARRAY_SIZE(vm->min_alignment));
> > > > > > > +
> > > > > > > +    if (HAS_64K_PAGES(vm->i915)) {
> > > > > > > +        if (IS_DG2(vm->i915)) {
> > > > > > I think we need this 2M alignment for all platforms with HAS_64K_PAGES.
> > > > > > Not only for DG2.
> > > > > 
> > > > > Really? Can we get confirmation of this?
> > > > > This contradicts the documentation in patch 4, which you
> > > > > reviewed, so I am confused now.
> > > > 
> > > > Starting from DG2, some platforms will have this new 64K GTT
> > > > page size restriction when dealing with LMEM. The
> > > > HAS_64K_PAGES() macro is meant to cover exactly that, AFAIK.
> > > 
> > > As I understood it, using only 64K pages is a requirement going forward
> > > for discrete cards, but the restriction of not sharing PDEs between 4K
> > > and 64K pages was specific to DG2.
> > > 
> > > e.g. xehpsdv is also defined as having 64K pages, and others in
> > > future are likely to as well, but without the PDE sharing restrictions.
> > 
> > Yeah, pretty much. But there is one other platform lurking.
> > 
> >  From chatting with Ram, it might also make sense to disentangle
> > HAS_64K_PAGES(), since it currently means both that we need min 64K page
> > granularity, and that there is this compact-pt layout thing which
> > doesn't allow mixing 64K and 4K in the same page-table.
> 
> okay, so it sounds to me like the IS_DG2 check here is appropriate. Other
> 64K page systems will not have the 2MB alignment requirement.
> 
> If any future platform does require compact-pt layout, when adding that
> platform, we can then add a HAS_COMPACT_PT macro or something, which would be
> set for DG2 and the future platform.
> 
> For now, this code seems correct to me as it currently only affects DG2.

As Matt mentioned, IMHO we need to split the requirement of a 64K minimum
page size for lmem from the compact-pt requirement, to address existing and
future platform needs.

I have just added another flag called needs_compact_pt in the patch mentioned
below, which will be set for DG2 and XEHPSDV. If (NEEDS_COMPACT_PT() &&
HAS_64K_PAGES()) then we can align to 2MB, else to 64K.

Please have a look at
https://patchwork.freedesktop.org/series/99105/

Ram
> > 
> > > 
> > > If this is not the case, and all 64K page devices will also
> > > necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES
> > > and use 2MB everywhere, but so far this sounds unconfirmed.
> > > 
> > > > 
> > > > > 
> > > > > > > +            vm->min_alignment[INTEL_MEMORY_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_2M;
> > > > > > > +           
> > > > > > > vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_2M;
> > > > > > > +        } else {
> > > > > > > +            vm->min_alignment[INTEL_MEMORY_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_64K;
> > > > > > > +           
> > > > > > > vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_64K;
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > >       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
> > > > > > >       INIT_LIST_HEAD(&vm->bound_list);
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > index 8073438b67c8..b8da2514d601 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > @@ -29,6 +29,8 @@
> > > > > > >   #include "i915_selftest.h"
> > > > > > >   #include "i915_vma_resource.h"
> > > > > > >   #include "i915_vma_types.h"
> > > > > > > +#include "i915_params.h"
> > > > > > > +#include "intel_memory_region.h"
> > > > > > >   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL |
> > > > > > > __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> > > > > > > @@ -223,6 +225,7 @@ struct i915_address_space {
> > > > > > >       struct device *dma;
> > > > > > >       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
> > > > > > >       u64 reserved;        /* size addr space reserved */
> > > > > > > +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
> > > > > > >       unsigned int bind_async_flags;
> > > > > > > @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct
> > > > > > > i915_address_space *vm)
> > > > > > >       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
> > > > > > >   }
> > > > > > > +static inline u64 i915_vm_min_alignment(struct
> > > > > > > i915_address_space *vm,
> > > > > > > +                    enum intel_memory_type type)
> > > > > > > +{
> > > > > > > +    return vm->min_alignment[type];
> > > > > > > +}
> > > > > > > +
> > > > > > >   static inline bool
> > > > > > >   i915_vm_has_cache_coloring(struct i915_address_space *vm)
> > > > > > >   {
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > b/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > index 1f15c3298112..9ac92e7a3566 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma
> > > > > > > *vma, u64 size, u64 alignment, u64 flags)
> > > > > > >       }
> > > > > > >       color = 0;
> > > > > > > +
> > > > > > > +    if (HAS_64K_PAGES(vma->vm->i915) &&
> > > > > > > i915_gem_object_is_lmem(vma->obj)) {
> > > > > > > +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
> > > > > > > +        /*
> > > > > > > +         * DG2 can not have different sized pages
> > > > > > > in any given PDE (2MB range).
> > > > > > > +         * Keeping things simple, we force any lmem
> > > > > > > object to reserve
> > > > > > > +         * 2MB chunks, preventing any smaller pages
> > > > > > > being used alongside
> > > > > > > +         */
> > > > > > > +        if (IS_DG2(vma->vm->i915)) {
> > > > > > Similarly, here we don't need a special case for DG2.
> > > > > > 
> > > > > > Ram
> > > > > > > +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
> > > > > > > +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > >       if (i915_vm_has_cache_coloring(vma->vm))
> > > > > > >           color = vma->obj->cache_level;
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > index 076d860ce01a..2f3f0c01786b 100644
> > > > > > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > @@ -238,6 +238,8 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                u64 hole_start, u64 hole_end,
> > > > > > >                unsigned long end_time)
> > > > > > >   {
> > > > > > > +    const unsigned int min_alignment =
> > > > > > > +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > >       I915_RND_STATE(seed_prng);
> > > > > > >       struct i915_vma_resource *mock_vma_res;
> > > > > > >       unsigned int size;
> > > > > > > @@ -251,9 +253,10 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           I915_RND_SUBSTATE(prng, seed_prng);
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           unsigned int *order, count, n;
> > > > > > > -        u64 hole_size;
> > > > > > > +        u64 hole_size, aligned_size;
> > > > > > > -        hole_size = (hole_end - hole_start) >> size;
> > > > > > > +        aligned_size = max_t(u32, ilog2(min_alignment), size);
> > > > > > > +        hole_size = (hole_end - hole_start) >> aligned_size;
> > > > > > >           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
> > > > > > >               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
> > > > > > >           count = hole_size >> 1;
> > > > > > > @@ -274,8 +277,8 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           }
> > > > > > >           GEM_BUG_ON(!order);
> > > > > > > -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
> > > > > > > -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
> > > > > > > +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
> > > > > > > +        GEM_BUG_ON(hole_start + count *
> > > > > > > BIT_ULL(aligned_size) > hole_end);
> > > > > > >           /* Ignore allocation failures (i.e. don't report them as
> > > > > > >            * a test failure) as we are purposefully allocating very
> > > > > > > @@ -298,10 +301,10 @@ static int
> > > > > > > lowlevel_hole(struct i915_address_space *vm,
> > > > > > >           }
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               intel_wakeref_t wakeref;
> > > > > > > -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> > > > > > > +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
> > > > > > >               if (igt_timeout(end_time,
> > > > > > >                       "%s timed out before %d/%d\n",
> > > > > > > @@ -344,7 +347,7 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >               }
> > > > > > >               mock_vma_res->bi.pages = obj->mm.pages;
> > > > > > > -            mock_vma_res->node_size = BIT_ULL(size);
> > > > > > > +            mock_vma_res->node_size = BIT_ULL(aligned_size);
> > > > > > >               mock_vma_res->start = addr;
> > > > > > >               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> > > > > > > @@ -355,7 +358,7 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           i915_random_reorder(order, count, &prng);
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               intel_wakeref_t wakeref;
> > > > > > >               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> > > > > > > @@ -399,8 +402,10 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >   {
> > > > > > >       const u64 hole_size = hole_end - hole_start;
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > > +    const unsigned int min_alignment =
> > > > > > > +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > >       const unsigned long max_pages =
> > > > > > > -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
> > > > > > > +        min_t(u64, ULONG_MAX - 1, (hole_size / 2)
> > > > > > > >> ilog2(min_alignment));
> > > > > > >       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
> > > > > > >       unsigned long npages, prime, flags;
> > > > > > >       struct i915_vma *vma;
> > > > > > > @@ -441,14 +446,17 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry(obj, &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       err = i915_vma_pin(vma, 0, 0, offset | flags);
> > > > > > > @@ -470,22 +478,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       i915_vma_unpin(vma);
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry(obj, &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       if (!drm_mm_node_allocated(&vma->node) ||
> > > > > > > @@ -506,22 +517,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       }
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry_reverse(obj,
> > > > > > > &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       err = i915_vma_pin(vma, 0, 0, offset | flags);
> > > > > > > @@ -543,22 +557,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       i915_vma_unpin(vma);
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry_reverse(obj,
> > > > > > > &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       if (!drm_mm_node_allocated(&vma->node) ||
> > > > > > > @@ -579,9 +596,9 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       }
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >               }
> > > > > > > @@ -611,6 +628,7 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       const u64 hole_size = hole_end - hole_start;
> > > > > > >       const unsigned long max_pages =
> > > > > > >           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
> > > > > > > +    unsigned long min_alignment;
> > > > > > >       unsigned long flags;
> > > > > > >       u64 size;
> > > > > > > @@ -620,6 +638,8 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       for_each_prime_number_from(size, 1, max_pages) {
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           struct i915_vma *vma;
> > > > > > > @@ -638,7 +658,7 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           for (addr = hole_start;
> > > > > > >                addr + obj->base.size < hole_end;
> > > > > > > -             addr += obj->base.size) {
> > > > > > > +             addr += round_up(obj->base.size, min_alignment)) {
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > >                   pr_err("%s bind failed at %llx +
> > > > > > > %llx [hole %llx- %llx] with err=%d\n",
> > > > > > > @@ -690,6 +710,7 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >   {
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > >       struct i915_vma *vma;
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned long flags;
> > > > > > >       unsigned int pot;
> > > > > > >       int err = 0;
> > > > > > > @@ -698,6 +719,8 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       obj =
> > > > > > > i915_gem_object_create_internal(vm->i915, 2 *
> > > > > > > I915_GTT_PAGE_SIZE);
> > > > > > >       if (IS_ERR(obj))
> > > > > > >           return PTR_ERR(obj);
> > > > > > > @@ -710,13 +733,13 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       /* Insert a pair of pages across every pot
> > > > > > > boundary within the hole */
> > > > > > >       for (pot = fls64(hole_end - 1) - 1;
> > > > > > > -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
> > > > > > > +         pot > ilog2(2 * min_alignment);
> > > > > > >            pot--) {
> > > > > > >           u64 step = BIT_ULL(pot);
> > > > > > >           u64 addr;
> > > > > > > -        for (addr = round_up(hole_start +
> > > > > > > I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> > > > > > > -             addr <= round_down(hole_end -
> > > > > > > 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> > > > > > > +        for (addr = round_up(hole_start +
> > > > > > > min_alignment, step) - min_alignment;
> > > > > > > +             addr <= round_down(hole_end - (2 *
> > > > > > > min_alignment), step) - min_alignment;
> > > > > > >                addr += step) {
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > > @@ -761,6 +784,7 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                 unsigned long end_time)
> > > > > > >   {
> > > > > > >       I915_RND_STATE(prng);
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned int size;
> > > > > > >       unsigned long flags;
> > > > > > > @@ -768,15 +792,18 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       /* Keep creating larger objects until one
> > > > > > > cannot fit into the hole */
> > > > > > >       for (size = 12; (hole_end - hole_start) >> size; size++) {
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           unsigned int *order, count, n;
> > > > > > >           struct i915_vma *vma;
> > > > > > > -        u64 hole_size;
> > > > > > > +        u64 hole_size, aligned_size;
> > > > > > >           int err = -ENODEV;
> > > > > > > -        hole_size = (hole_end - hole_start) >> size;
> > > > > > > +        aligned_size = max_t(u32, ilog2(min_alignment), size);
> > > > > > > +        hole_size = (hole_end - hole_start) >> aligned_size;
> > > > > > >           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
> > > > > > >               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
> > > > > > >           count = hole_size >> 1;
> > > > > > > @@ -816,7 +843,7 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           GEM_BUG_ON(vma->size != BIT_ULL(size));
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > > @@ -868,11 +895,14 @@ static int
> > > > > > > __shrink_hole(struct i915_address_space *vm,
> > > > > > >   {
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > >       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned int order = 12;
> > > > > > >       LIST_HEAD(objects);
> > > > > > >       int err = 0;
> > > > > > >       u64 addr;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       /* Keep creating larger objects until one
> > > > > > > cannot fit into the hole */
> > > > > > >       for (addr = hole_start; addr < hole_end; ) {
> > > > > > >           struct i915_vma *vma;
> > > > > > > @@ -913,7 +943,7 @@ static int __shrink_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           }
> > > > > > >           i915_vma_unpin(vma);
> > > > > > > -        addr += size;
> > > > > > > +        addr += round_up(size, min_alignment);
> > > > > > >           /*
> > > > > > >            * Since we are injecting allocation
> > > > > > > faults at random intervals,
> > > > > > > -- 
> > > > > > > 2.25.1
> > > > > > > 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-01-20 16:29                 ` C, Ramalingam
  0 siblings, 0 replies; 50+ messages in thread
From: C, Ramalingam @ 2022-01-20 16:29 UTC (permalink / raw)
  To: Robert Beckett
  Cc: Tvrtko Ursulin, dri-devel, David Airlie, intel-gfx, linux-kernel,
	Auld, Matthew, Vivi, Rodrigo

On 2022-01-20 at 16:09:01 +0000, Robert Beckett wrote:
> 
> 
> On 20/01/2022 15:58, Matthew Auld wrote:
> > On 20/01/2022 15:44, Robert Beckett wrote:
> > > 
> > > 
> > > On 20/01/2022 14:59, Matthew Auld wrote:
> > > > On 20/01/2022 13:15, Robert Beckett wrote:
> > > > > 
> > > > > 
> > > > > On 20/01/2022 11:46, Ramalingam C wrote:
> > > > > > On 2022-01-18 at 17:50:34 +0000, Robert Beckett wrote:
> > > > > > > From: Matthew Auld <matthew.auld@intel.com>
> > > > > > > 
> > > > > > > For local-memory objects we need to align the GTT addresses
> > > > > > > to 64K, both for the ppgtt and ggtt.
> > > > > > > 
> > > > > > > We need to support vm->min_alignment > 4K, depending
> > > > > > > on the vm itself and the type of object we are inserting.
> > > > > > > With this in mind update the GTT selftests to take this
> > > > > > > into account.
> > > > > > > 
> > > > > > > For DG2 we further align and pad lmem object GTT addresses
> > > > > > > to 2MB to ensure PDEs contain consistent page sizes as
> > > > > > > required by the HW.
> > > > > > > 
> > > > > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > > > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > > > > > > Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> > > > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > > ---
> > > > > > >   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 +++--
> > > > > > >   drivers/gpu/drm/i915/gt/intel_gtt.c           | 14 +++
> > > > > > >   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++
> > > > > > >   drivers/gpu/drm/i915/i915_vma.c               | 14 +++
> > > > > > >   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96
> > > > > > > ++++++++++++-------
> > > > > > >   5 files changed, 115 insertions(+), 41 deletions(-)
> > > > > > > 
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > index c08f766e6e15..7fee95a65414 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> > > > > > > @@ -39,6 +39,7 @@ struct tiled_blits {
> > > > > > >       struct blit_buffer scratch;
> > > > > > >       struct i915_vma *batch;
> > > > > > >       u64 hole;
> > > > > > > +    u64 align;
> > > > > > >       u32 width;
> > > > > > >       u32 height;
> > > > > > >   };
> > > > > > > @@ -410,14 +411,21 @@ tiled_blits_create(struct
> > > > > > > intel_engine_cs *engine, struct rnd_state *prng)
> > > > > > >           goto err_free;
> > > > > > >       }
> > > > > > > -    hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> > > > > > > +    t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst
> > > > > > > case, derive from vm! */
> > > > > > > +    t->align = max(t->align,
> > > > > > > +               i915_vm_min_alignment(t->ce->vm,
> > > > > > > INTEL_MEMORY_LOCAL));
> > > > > > > +    t->align = max(t->align,
> > > > > > > +               i915_vm_min_alignment(t->ce->vm,
> > > > > > > INTEL_MEMORY_SYSTEM));
> > > > > > > +
> > > > > > > +    hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
> > > > > > >       hole_size *= 2; /* room to maneuver */
> > > > > > > -    hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> > > > > > > +    hole_size += 2 * t->align; /* padding on either side */
> > > > > > >       mutex_lock(&t->ce->vm->mutex);
> > > > > > >       memset(&hole, 0, sizeof(hole));
> > > > > > >       err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> > > > > > > -                      hole_size, 0, I915_COLOR_UNEVICTABLE,
> > > > > > > +                      hole_size, t->align,
> > > > > > > +                      I915_COLOR_UNEVICTABLE,
> > > > > > >                         0, U64_MAX,
> > > > > > >                         DRM_MM_INSERT_BEST);
> > > > > > >       if (!err)
> > > > > > > @@ -428,7 +436,7 @@ tiled_blits_create(struct
> > > > > > > intel_engine_cs *engine, struct rnd_state *prng)
> > > > > > >           goto err_put;
> > > > > > >       }
> > > > > > > -    t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> > > > > > > +    t->hole = hole.start + t->align;
> > > > > > >       pr_info("Using hole at %llx\n", t->hole);
> > > > > > >       err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> > > > > > > @@ -455,7 +463,7 @@ static void
> > > > > > > tiled_blits_destroy(struct tiled_blits *t)
> > > > > > >   static int tiled_blits_prepare(struct tiled_blits *t,
> > > > > > >                      struct rnd_state *prng)
> > > > > > >   {
> > > > > > > -    u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> > > > > > > +    u64 offset = round_up(t->width * t->height * 4, t->align);
> > > > > > >       u32 *map;
> > > > > > >       int err;
> > > > > > >       int i;
> > > > > > > @@ -486,8 +494,7 @@ static int
> > > > > > > tiled_blits_prepare(struct tiled_blits *t,
> > > > > > >   static int tiled_blits_bounce(struct tiled_blits
> > > > > > > *t, struct rnd_state *prng)
> > > > > > >   {
> > > > > > > -    u64 offset =
> > > > > > > -        round_up(t->width * t->height * 4, 2 *
> > > > > > > I915_GTT_MIN_ALIGNMENT);
> > > > > > > +    u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
> > > > > > >       int err;
> > > > > > >       /* We want to check position invariant tiling
> > > > > > > across GTT eviction */
> > > > > > > @@ -500,7 +507,7 @@ static int
> > > > > > > tiled_blits_bounce(struct tiled_blits *t, struct
> > > > > > > rnd_state *prng)
> > > > > > >       /* Reposition so that we overlap the old
> > > > > > > addresses, and slightly off */
> > > > > > >       err = tiled_blit(t,
> > > > > > > -             &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> > > > > > > +             &t->buffers[2], t->hole + t->align,
> > > > > > >                &t->buffers[1], t->hole + 3 * offset / 2);
> > > > > > >       if (err)
> > > > > > >           return err;
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > index 46be4197b93f..7c92b25c0f26 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > > > > > > @@ -223,6 +223,20 @@ void
> > > > > > > i915_address_space_init(struct i915_address_space
> > > > > > > *vm, int subclass)
> > > > > > >       GEM_BUG_ON(!vm->total);
> > > > > > >       drm_mm_init(&vm->mm, 0, vm->total);
> > > > > > > +
> > > > > > > +    memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> > > > > > > +         ARRAY_SIZE(vm->min_alignment));
> > > > > > > +
> > > > > > > +    if (HAS_64K_PAGES(vm->i915)) {
> > > > > > > +        if (IS_DG2(vm->i915)) {
> > > > > > I think we need this 2M alignment for all platforms with HAS_64K_PAGES,
> > > > > > not only for DG2.
> > > > > 
> > > > > really? can we get confirmation of this?
> > > > > this contradicts the documentation in patch 4, which you
> > > > > reviewed, so I am confused now
> > > > 
> > > > Starting from DG2, some platforms will have this new 64K GTT
> > > > page size restriction when dealing with LMEM. The
> > > > HAS_64K_PAGES() macro is meant to cover exactly that, AFAIK.
> > > 
> > > As I understood it, 64K pages are a requirement going forward
> > > for discrete cards, but the restriction of not sharing PDEs between 4K
> > > and 64K pages was specific to DG2.
> > > 
> > > e.g. xehpsdv is also defined as having 64K pages, and others in
> > > future are likely to, but without the PDE sharing restriction.
> > 
> > Yeah, pretty much. But there is one other platform lurking.
> > 
> >  From chatting with Ram, it might also make sense to disentangle
> > HAS_64K_PAGES(), since it currently means both that we need min 64K page
> > granularity, and that there is this compact-pt layout thing which
> > doesn't allow mixing 64K and 4K in the same page-table.
> 
> Okay, so it sounds to me like the IS_DG2 check here is appropriate. Other
> 64K page systems will not have the 2MB alignment requirement.
> 
> If any future platform does require the compact-pt layout, then when adding
> that platform we can add a HAS_COMPACT_PT macro or something, which would be
> set for DG2 and the future platform.
> 
> For now, this code seems correct to me as it currently only affects DG2.

As Matt mentioned, IMHO we need to split the requirement of a 64K minimum
page size for lmem from the compact-pt requirement, to address existing and
future platform needs.

I have just added another flag called needs_compact_pt in the patch series
linked below, which will be set for DG2 and XEHPSDV. If (NEEDS_COMPACT_PT()
&& HAS_64K_PAGES()) then we can align to 2MB, else to 64K.
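
For i915_address_space_init() that would look roughly like the sketch below
(illustrative only; it assumes the needs_compact_pt flag and the
NEEDS_COMPACT_PT() macro from the linked series, so the exact names and
placement may differ there):

	if (HAS_64K_PAGES(vm->i915)) {
		/* 64K is the minimum GTT alignment for lmem on these parts */
		u64 align = I915_GTT_PAGE_SIZE_64K;

		/*
		 * Platforms with the compact-pt layout cannot mix 4K and
		 * 64K entries in the same PDE, so pad lmem out to 2MB.
		 */
		if (NEEDS_COMPACT_PT(vm->i915))
			align = I915_GTT_PAGE_SIZE_2M;

		vm->min_alignment[INTEL_MEMORY_LOCAL] = align;
		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = align;
	}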

Please have a look at
https://patchwork.freedesktop.org/series/99105/

Ram
> > 
> > > 
> > > If this is not the case, and all 64K page devices will also
> > > necessitate not sharing PDEs, then we can just use the HAS_64K_PAGES
> > > and use 2MB everywhere, but so far this sounds unconfirmed.
> > > 
> > > > 
> > > > > 
> > > > > > > +            vm->min_alignment[INTEL_MEMORY_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_2M;
> > > > > > > +           
> > > > > > > vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_2M;
> > > > > > > +        } else {
> > > > > > > +            vm->min_alignment[INTEL_MEMORY_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_64K;
> > > > > > > +           
> > > > > > > vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] =
> > > > > > > I915_GTT_PAGE_SIZE_64K;
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > >       vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
> > > > > > >       INIT_LIST_HEAD(&vm->bound_list);
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > index 8073438b67c8..b8da2514d601 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > > > > > > @@ -29,6 +29,8 @@
> > > > > > >   #include "i915_selftest.h"
> > > > > > >   #include "i915_vma_resource.h"
> > > > > > >   #include "i915_vma_types.h"
> > > > > > > +#include "i915_params.h"
> > > > > > > +#include "intel_memory_region.h"
> > > > > > >   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL |
> > > > > > > __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> > > > > > > @@ -223,6 +225,7 @@ struct i915_address_space {
> > > > > > >       struct device *dma;
> > > > > > >       u64 total;        /* size addr space maps (ex. 2GB for ggtt) */
> > > > > > >       u64 reserved;        /* size addr space reserved */
> > > > > > > +    u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
> > > > > > >       unsigned int bind_async_flags;
> > > > > > > @@ -384,6 +387,12 @@ i915_vm_has_scratch_64K(struct
> > > > > > > i915_address_space *vm)
> > > > > > >       return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
> > > > > > >   }
> > > > > > > +static inline u64 i915_vm_min_alignment(struct
> > > > > > > i915_address_space *vm,
> > > > > > > +                    enum intel_memory_type type)
> > > > > > > +{
> > > > > > > +    return vm->min_alignment[type];
> > > > > > > +}
> > > > > > > +
> > > > > > >   static inline bool
> > > > > > >   i915_vm_has_cache_coloring(struct i915_address_space *vm)
> > > > > > >   {
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > b/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > index 1f15c3298112..9ac92e7a3566 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > > > > > @@ -756,6 +756,20 @@ i915_vma_insert(struct i915_vma
> > > > > > > *vma, u64 size, u64 alignment, u64 flags)
> > > > > > >       }
> > > > > > >       color = 0;
> > > > > > > +
> > > > > > > +    if (HAS_64K_PAGES(vma->vm->i915) &&
> > > > > > > i915_gem_object_is_lmem(vma->obj)) {
> > > > > > > +        alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
> > > > > > > +        /*
> > > > > > > +         * DG2 can not have different sized pages
> > > > > > > in any given PDE (2MB range).
> > > > > > > +         * Keeping things simple, we force any lmem
> > > > > > > object to reserve
> > > > > > > +         * 2MB chunks, preventing any smaller pages
> > > > > > > being used alongside
> > > > > > > +         */
> > > > > > > +        if (IS_DG2(vma->vm->i915)) {
> > > > > > Similarly, here we don't need a special case for DG2.
> > > > > > 
> > > > > > Ram
> > > > > > > +            alignment = max(alignment, I915_GTT_PAGE_SIZE_2M);
> > > > > > > +            size = round_up(size, I915_GTT_PAGE_SIZE_2M);
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > >       if (i915_vm_has_cache_coloring(vma->vm))
> > > > > > >           color = vma->obj->cache_level;
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > index 076d860ce01a..2f3f0c01786b 100644
> > > > > > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> > > > > > > @@ -238,6 +238,8 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                u64 hole_start, u64 hole_end,
> > > > > > >                unsigned long end_time)
> > > > > > >   {
> > > > > > > +    const unsigned int min_alignment =
> > > > > > > +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > >       I915_RND_STATE(seed_prng);
> > > > > > >       struct i915_vma_resource *mock_vma_res;
> > > > > > >       unsigned int size;
> > > > > > > @@ -251,9 +253,10 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           I915_RND_SUBSTATE(prng, seed_prng);
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           unsigned int *order, count, n;
> > > > > > > -        u64 hole_size;
> > > > > > > +        u64 hole_size, aligned_size;
> > > > > > > -        hole_size = (hole_end - hole_start) >> size;
> > > > > > > +        aligned_size = max_t(u32, ilog2(min_alignment), size);
> > > > > > > +        hole_size = (hole_end - hole_start) >> aligned_size;
> > > > > > >           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
> > > > > > >               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
> > > > > > >           count = hole_size >> 1;
> > > > > > > @@ -274,8 +277,8 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           }
> > > > > > >           GEM_BUG_ON(!order);
> > > > > > > -        GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
> > > > > > > -        GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
> > > > > > > +        GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
> > > > > > > +        GEM_BUG_ON(hole_start + count *
> > > > > > > BIT_ULL(aligned_size) > hole_end);
> > > > > > >           /* Ignore allocation failures (i.e. don't report them as
> > > > > > >            * a test failure) as we are purposefully allocating very
> > > > > > > @@ -298,10 +301,10 @@ static int
> > > > > > > lowlevel_hole(struct i915_address_space *vm,
> > > > > > >           }
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               intel_wakeref_t wakeref;
> > > > > > > -            GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> > > > > > > +            GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
> > > > > > >               if (igt_timeout(end_time,
> > > > > > >                       "%s timed out before %d/%d\n",
> > > > > > > @@ -344,7 +347,7 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >               }
> > > > > > >               mock_vma_res->bi.pages = obj->mm.pages;
> > > > > > > -            mock_vma_res->node_size = BIT_ULL(size);
> > > > > > > +            mock_vma_res->node_size = BIT_ULL(aligned_size);
> > > > > > >               mock_vma_res->start = addr;
> > > > > > >               with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
> > > > > > > @@ -355,7 +358,7 @@ static int lowlevel_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           i915_random_reorder(order, count, &prng);
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               intel_wakeref_t wakeref;
> > > > > > >               GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
> > > > > > > @@ -399,8 +402,10 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >   {
> > > > > > >       const u64 hole_size = hole_end - hole_start;
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > > +    const unsigned int min_alignment =
> > > > > > > +        i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > >       const unsigned long max_pages =
> > > > > > > -        min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
> > > > > > > +        min_t(u64, ULONG_MAX - 1, (hole_size / 2)
> > > > > > > >> ilog2(min_alignment));
> > > > > > >       const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
> > > > > > >       unsigned long npages, prime, flags;
> > > > > > >       struct i915_vma *vma;
> > > > > > > @@ -441,14 +446,17 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry(obj, &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       err = i915_vma_pin(vma, 0, 0, offset | flags);
> > > > > > > @@ -470,22 +478,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       i915_vma_unpin(vma);
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry(obj, &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       if (!drm_mm_node_allocated(&vma->node) ||
> > > > > > > @@ -506,22 +517,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       }
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry_reverse(obj,
> > > > > > > &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       err = i915_vma_pin(vma, 0, 0, offset | flags);
> > > > > > > @@ -543,22 +557,25 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       i915_vma_unpin(vma);
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >                   offset = p->offset;
> > > > > > >                   list_for_each_entry_reverse(obj,
> > > > > > > &objects, st_link) {
> > > > > > > +                    u64 aligned_size = round_up(obj->base.size,
> > > > > > > +                                    min_alignment);
> > > > > > > +
> > > > > > >                       vma = i915_vma_instance(obj, vm, NULL);
> > > > > > >                       if (IS_ERR(vma))
> > > > > > >                           continue;
> > > > > > >                       if (p->step < 0) {
> > > > > > > -                        if (offset < hole_start + obj->base.size)
> > > > > > > +                        if (offset < hole_start + aligned_size)
> > > > > > >                               break;
> > > > > > > -                        offset -= obj->base.size;
> > > > > > > +                        offset -= aligned_size;
> > > > > > >                       }
> > > > > > >                       if (!drm_mm_node_allocated(&vma->node) ||
> > > > > > > @@ -579,9 +596,9 @@ static int fill_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                       }
> > > > > > >                       if (p->step > 0) {
> > > > > > > -                        if (offset + obj->base.size > hole_end)
> > > > > > > +                        if (offset + aligned_size > hole_end)
> > > > > > >                               break;
> > > > > > > -                        offset += obj->base.size;
> > > > > > > +                        offset += aligned_size;
> > > > > > >                       }
> > > > > > >                   }
> > > > > > >               }
> > > > > > > @@ -611,6 +628,7 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       const u64 hole_size = hole_end - hole_start;
> > > > > > >       const unsigned long max_pages =
> > > > > > >           min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
> > > > > > > +    unsigned long min_alignment;
> > > > > > >       unsigned long flags;
> > > > > > >       u64 size;
> > > > > > > @@ -620,6 +638,8 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       for_each_prime_number_from(size, 1, max_pages) {
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           struct i915_vma *vma;
> > > > > > > @@ -638,7 +658,7 @@ static int walk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           for (addr = hole_start;
> > > > > > >                addr + obj->base.size < hole_end;
> > > > > > > -             addr += obj->base.size) {
> > > > > > > +             addr += round_up(obj->base.size, min_alignment)) {
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > >                   pr_err("%s bind failed at %llx +
> > > > > > > %llx [hole %llx- %llx] with err=%d\n",
> > > > > > > @@ -690,6 +710,7 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >   {
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > >       struct i915_vma *vma;
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned long flags;
> > > > > > >       unsigned int pot;
> > > > > > >       int err = 0;
> > > > > > > @@ -698,6 +719,8 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       obj =
> > > > > > > i915_gem_object_create_internal(vm->i915, 2 *
> > > > > > > I915_GTT_PAGE_SIZE);
> > > > > > >       if (IS_ERR(obj))
> > > > > > >           return PTR_ERR(obj);
> > > > > > > @@ -710,13 +733,13 @@ static int pot_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       /* Insert a pair of pages across every pot
> > > > > > > boundary within the hole */
> > > > > > >       for (pot = fls64(hole_end - 1) - 1;
> > > > > > > -         pot > ilog2(2 * I915_GTT_PAGE_SIZE);
> > > > > > > +         pot > ilog2(2 * min_alignment);
> > > > > > >            pot--) {
> > > > > > >           u64 step = BIT_ULL(pot);
> > > > > > >           u64 addr;
> > > > > > > -        for (addr = round_up(hole_start +
> > > > > > > I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> > > > > > > -             addr <= round_down(hole_end -
> > > > > > > 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
> > > > > > > +        for (addr = round_up(hole_start +
> > > > > > > min_alignment, step) - min_alignment;
> > > > > > > +             addr <= round_down(hole_end - (2 *
> > > > > > > min_alignment), step) - min_alignment;
> > > > > > >                addr += step) {
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > > @@ -761,6 +784,7 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >                 unsigned long end_time)
> > > > > > >   {
> > > > > > >       I915_RND_STATE(prng);
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned int size;
> > > > > > >       unsigned long flags;
> > > > > > > @@ -768,15 +792,18 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >       if (i915_is_ggtt(vm))
> > > > > > >           flags |= PIN_GLOBAL;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       /* Keep creating larger objects until one
> > > > > > > cannot fit into the hole */
> > > > > > >       for (size = 12; (hole_end - hole_start) >> size; size++) {
> > > > > > >           struct drm_i915_gem_object *obj;
> > > > > > >           unsigned int *order, count, n;
> > > > > > >           struct i915_vma *vma;
> > > > > > > -        u64 hole_size;
> > > > > > > +        u64 hole_size, aligned_size;
> > > > > > >           int err = -ENODEV;
> > > > > > > -        hole_size = (hole_end - hole_start) >> size;
> > > > > > > +        aligned_size = max_t(u32, ilog2(min_alignment), size);
> > > > > > > +        hole_size = (hole_end - hole_start) >> aligned_size;
> > > > > > >           if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
> > > > > > >               hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
> > > > > > >           count = hole_size >> 1;
> > > > > > > @@ -816,7 +843,7 @@ static int drunk_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           GEM_BUG_ON(vma->size != BIT_ULL(size));
> > > > > > >           for (n = 0; n < count; n++) {
> > > > > > > -            u64 addr = hole_start + order[n] * BIT_ULL(size);
> > > > > > > +            u64 addr = hole_start + order[n] *
> > > > > > > BIT_ULL(aligned_size);
> > > > > > >               err = i915_vma_pin(vma, 0, 0, addr | flags);
> > > > > > >               if (err) {
> > > > > > > @@ -868,11 +895,14 @@ static int
> > > > > > > __shrink_hole(struct i915_address_space *vm,
> > > > > > >   {
> > > > > > >       struct drm_i915_gem_object *obj;
> > > > > > >       unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
> > > > > > > +    unsigned int min_alignment;
> > > > > > >       unsigned int order = 12;
> > > > > > >       LIST_HEAD(objects);
> > > > > > >       int err = 0;
> > > > > > >       u64 addr;
> > > > > > > +    min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
> > > > > > > +
> > > > > > >       /* Keep creating larger objects until one
> > > > > > > cannot fit into the hole */
> > > > > > >       for (addr = hole_start; addr < hole_end; ) {
> > > > > > >           struct i915_vma *vma;
> > > > > > > @@ -913,7 +943,7 @@ static int __shrink_hole(struct
> > > > > > > i915_address_space *vm,
> > > > > > >           }
> > > > > > >           i915_vma_unpin(vma);
> > > > > > > -        addr += size;
> > > > > > > +        addr += round_up(size, min_alignment);
> > > > > > >           /*
> > > > > > >            * Since we are injecting allocation
> > > > > > > faults at random intervals,
> > > > > > > -- 
> > > > > > > 2.25.1
> > > > > > > 

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-01-20 16:29 UTC | newest]

Thread overview: 50+ messages
-- links below jump to the message on this page --
2022-01-18 17:50 [PATCH v2 0/4] discsrete card 64K page support Robert Beckett
2022-01-18 17:50 ` [Intel-gfx] " Robert Beckett
2022-01-18 17:50 ` [PATCH v2 1/4] drm/i915: enforce min GTT alignment for discrete cards Robert Beckett
2022-01-18 17:50   ` [Intel-gfx] " Robert Beckett
2022-01-18 17:50   ` Robert Beckett
2022-01-20 11:46   ` Ramalingam C
2022-01-20 11:46     ` [Intel-gfx] " Ramalingam C
2022-01-20 11:46     ` Ramalingam C
2022-01-20 13:15     ` Robert Beckett
2022-01-20 13:15       ` [Intel-gfx] " Robert Beckett
2022-01-20 13:15       ` Robert Beckett
2022-01-20 14:59       ` Matthew Auld
2022-01-20 14:59         ` [Intel-gfx] " Matthew Auld
2022-01-20 14:59         ` Matthew Auld
2022-01-20 15:44         ` Robert Beckett
2022-01-20 15:44           ` [Intel-gfx] " Robert Beckett
2022-01-20 15:44           ` Robert Beckett
2022-01-20 15:58           ` Matthew Auld
2022-01-20 15:58             ` [Intel-gfx] " Matthew Auld
2022-01-20 15:58             ` Matthew Auld
2022-01-20 16:09             ` Robert Beckett
2022-01-20 16:09               ` [Intel-gfx] " Robert Beckett
2022-01-20 16:09               ` Robert Beckett
2022-01-20 16:25               ` Matthew Auld
2022-01-20 16:25                 ` [Intel-gfx] " Matthew Auld
2022-01-20 16:25                 ` Matthew Auld
2022-01-20 16:29               ` C, Ramalingam
2022-01-20 16:29                 ` C, Ramalingam
2022-01-20 16:29                 ` [Intel-gfx] " C, Ramalingam
2022-01-18 17:50 ` [PATCH v2 2/4] drm/i915: support 64K GTT pages " Robert Beckett
2022-01-18 17:50   ` [Intel-gfx] " Robert Beckett
2022-01-18 17:50   ` Robert Beckett
2022-01-18 17:50 ` [PATCH v2 3/4] drm/i915: add gtt misalignment test Robert Beckett
2022-01-18 17:50   ` [Intel-gfx] " Robert Beckett
2022-01-18 17:50   ` Robert Beckett
2022-01-18 17:50 ` [PATCH v2 4/4] drm/i915/uapi: document behaviour for DG2 64K support Robert Beckett
2022-01-18 17:50   ` [Intel-gfx] " Robert Beckett
2022-01-18 17:50   ` Robert Beckett
2022-01-19 18:36   ` Jordan Justen
2022-01-19 18:36     ` [Intel-gfx] " Jordan Justen
2022-01-19 18:36     ` Jordan Justen
2022-01-19 19:49     ` Robert Beckett
2022-01-19 19:49       ` [Intel-gfx] " Robert Beckett
2022-01-19 19:49       ` Robert Beckett
2022-01-20 11:53   ` Ramalingam C
2022-01-20 11:53     ` [Intel-gfx] " Ramalingam C
2022-01-20 11:53     ` Ramalingam C
2022-01-18 18:02 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for discsrete card 64K page support Patchwork
2022-01-18 18:03 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-18 18:34 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
