* [PATCH v2 0/8] DG2 accelerated migration/clearing support
@ 2021-12-03 12:24 ` Matthew Auld
  0 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, adrian.larumbe, dri-devel

Enable accelerated moves and clearing on DG2. On such HW we have minimum page
size restrictions when accessing LMEM from the GTT, where we now have to use 64K
GTT pages or larger. With the ppGTT the page-table also has a slightly different
layout from past generations when using the 64K GTT mode (which is enabled via
some PDE bit), where it is now compacted down to 32 qword entries. Note that on
discrete the paging structures must also be placed in LMEM, and we need to be
able to modify them via the GTT itself (see patch 7), which is one of the
complications here.

The series needs to be applied on top of the DG2 enabling branch:
https://cgit.freedesktop.org/~ramaling/linux/log/?h=dg2_enabling_ww49.3

Patches 2, 7 and 8 have a dependency on patches in that branch, but the rest can
likely already land if the direction makes sense.

Matthew Auld (8):
  drm/i915/migrate: don't check the scratch page
  drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  drm/i915/gtt: add gtt mappable plumbing
  drm/i915/migrate: fix offset calculation
  drm/i915/migrate: fix length calculation
  drm/i915/selftests: handle object rounding
  drm/i915/migrate: add acceleration support for DG2
  drm/i915/migrate: turn on acceleration for DG2

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   4 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          |  53 ++++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h          |   1 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |   2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   7 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   9 +
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 196 ++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  17 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/gt/selftest_migrate.c    |   1 +
 drivers/gpu/drm/i915/gvt/scheduler.c          |   2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   4 +-
 15 files changed, 241 insertions(+), 63 deletions(-)

-- 
2.31.1



* [PATCH v2 1/8] drm/i915/migrate: don't check the scratch page
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

The scratch page might not be allocated in LMEM (like on DG2), so instead
of using that as the deciding factor for where the paging structures
live, let's just query the pt before mapping it.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 765c6d48fe52..2d3188a398dd 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -13,7 +13,6 @@
 
 struct insert_pte_data {
 	u64 offset;
-	bool is_lmem;
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
@@ -41,7 +40,7 @@ static void insert_pte(struct i915_address_space *vm,
 	struct insert_pte_data *d = data;
 
 	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
-			d->is_lmem ? PTE_LM : 0);
+			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
 	d->offset += PAGE_SIZE;
 }
 
@@ -135,7 +134,6 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 			goto err_vm;
 
 		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
-		d.is_lmem = i915_gem_object_is_lmem(vm->vm.scratch[0]);
 		vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d);
 	}
 
-- 
2.31.1



* [PATCH v2 2/8] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

If this is LMEM then we get a 32-entry PT, with each PTE pointing to
some 64K block of memory; otherwise it's just the usual 512-entry PT.
This very much assumes the caller knows what they are doing.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index bd3ca0996a23..312b2267bf87 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -728,13 +728,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 		gen8_pdp_for_page_index(vm, idx);
 	struct i915_page_directory *pd =
 		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
 	gen8_pte_t *vaddr;
 
-	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+	GEM_BUG_ON(pt->is_compact);
+
+	vaddr = px_vaddr(pt);
 	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
 	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
 }
 
+static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
+					    dma_addr_t addr,
+					    u64 offset,
+					    enum i915_cache_level level,
+					    u32 flags)
+{
+	u64 idx = offset >> GEN8_PTE_SHIFT;
+	struct i915_page_directory * const pdp =
+		gen8_pdp_for_page_index(vm, idx);
+	struct i915_page_directory *pd =
+		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
+	gen8_pte_t *vaddr;
+
+	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
+	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
+
+	if (!pt->is_compact) {
+		vaddr = px_vaddr(pd);
+		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
+		pt->is_compact = true;
+	}
+
+	vaddr = px_vaddr(pt);
+	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
+}
+
+static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
+				       dma_addr_t addr,
+				       u64 offset,
+				       enum i915_cache_level level,
+				       u32 flags)
+{
+	if (flags & PTE_LM)
+		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
+						       level, flags);
+
+	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
+}
+
 static int gen8_init_scratch(struct i915_address_space *vm)
 {
 	u32 pte_flags;
@@ -937,7 +980,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 
 	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
 	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
-	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
+	if (HAS_64K_PAGES(gt->i915))
+		ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry;
+	else
+		ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
 	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
 	ppgtt->vm.clear_range = gen8_ppgtt_clear;
 	ppgtt->vm.foreach = gen8_ppgtt_foreach;
-- 
2.31.1



* [PATCH v2 3/8] drm/i915/gtt: add gtt mappable plumbing
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

With object clearing/copying we need to be able to modify the PTEs on
the fly via some batch buffer, which means we need to be able to map the
paging structures (or at the very least the PT, though being able to also
map the PD might be useful at some point) into the GTT. And since the
paging structures must reside in LMEM on discrete, we need to ensure
that these objects have the correct physical alignment, as per any min page
restrictions, like on DG2. This is potentially costly, but it should
be limited to the special migrate_vm, which only needs a few fixed-size
windows.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c     |  4 ++--
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c |  2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c            |  2 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c            |  3 ++-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.h            |  1 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c            |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c              |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c             |  7 +++++++
 drivers/gpu/drm/i915/gt/intel_gtt.h             |  9 +++++++++
 drivers/gpu/drm/i915/gt/intel_migrate.c         |  4 +++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c           | 17 ++++++++++++-----
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c    |  2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c            |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  4 ++--
 14 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ebd775cb1661..b394954726b0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1559,7 +1559,7 @@ i915_gem_create_context(struct drm_i915_private *i915,
 	} else if (HAS_FULL_PPGTT(i915)) {
 		struct i915_ppgtt *ppgtt;
 
-		ppgtt = i915_ppgtt_create(&i915->gt, 0);
+		ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
 		if (IS_ERR(ppgtt)) {
 			drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
 				PTR_ERR(ppgtt));
@@ -1742,7 +1742,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
 	if (args->flags)
 		return -EINVAL;
 
-	ppgtt = i915_ppgtt_create(&i915->gt, 0);
+	ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index bd8dc1a28022..c1b86c7a4754 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_mock_selftests(void)
 	mkwrite_device_info(dev_priv)->ppgtt_type = INTEL_PPGTT_FULL;
 	mkwrite_device_info(dev_priv)->ppgtt_size = 48;
 
-	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
+	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
 	if (IS_ERR(ppgtt)) {
 		err = PTR_ERR(ppgtt);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index c0d149f04949..778472e563aa 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -443,7 +443,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
 
 	mutex_init(&ppgtt->flush);
 
-	ppgtt_init(&ppgtt->base, gt, 0);
+	ppgtt_init(&ppgtt->base, gt, 0, 0);
 	ppgtt->base.vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen6_pte_t));
 	ppgtt->base.vm.top = 1;
 
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 312b2267bf87..dfca803b4ff1 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -912,6 +912,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
  *
  */
 struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
+				     unsigned long vm_flags,
 				     unsigned long lmem_pt_obj_flags)
 {
 	struct i915_ppgtt *ppgtt;
@@ -921,7 +922,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ppgtt_init(ppgtt, gt, lmem_pt_obj_flags);
+	ppgtt_init(ppgtt, gt, vm_flags, lmem_pt_obj_flags);
 	ppgtt->vm.top = i915_vm_is_4lvl(&ppgtt->vm) ? 3 : 2;
 	ppgtt->vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen8_pte_t));
 
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
index f541d19264b4..c0af12593576 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
@@ -13,6 +13,7 @@ struct intel_gt;
 enum i915_cache_level;
 
 struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
+				     unsigned long vm_flags,
 				     unsigned long lmem_pt_obj_flags);
 
 u64 gen8_ggtt_pte_encode(dma_addr_t addr,
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 47f88f031749..938af60fd32f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -661,7 +661,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
 	struct i915_ppgtt *ppgtt;
 	int err;
 
-	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0);
+	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0, 0);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 510cda6a163f..991a514a1dc3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -484,7 +484,7 @@ static void intel_gt_fini_scratch(struct intel_gt *gt)
 static struct i915_address_space *kernel_vm(struct intel_gt *gt)
 {
 	if (INTEL_PPGTT(gt->i915) > INTEL_PPGTT_ALIASING)
-		return &i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY)->vm;
+		return &i915_ppgtt_create(gt, 0, I915_BO_ALLOC_PM_EARLY)->vm;
 	else
 		return i915_vm_get(&gt->ggtt->vm);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 5447615fc6f3..d9bf53dc1d85 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -18,6 +18,13 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
 {
 	struct drm_i915_gem_object *obj;
 
+	if (vm->vm_flags & I915_VM_GTT_MAPPABLE) {
+		struct intel_memory_region *mr =
+			vm->i915->mm.regions[INTEL_REGION_LMEM];
+
+		sz = max_t(int, sz, mr->min_page_size);
+	}
+
 	/*
 	 * To avoid severe over-allocation when dealing with min_page_size
 	 * restrictions, we override that behaviour here by allowing an object
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index cbc0b5266cb4..eee97b46a1f9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -266,6 +266,13 @@ struct i915_address_space {
 	u8 pd_shift;
 	u8 scratch_order;
 
+/*
+ * Paging structures are going to accessed via the GTT itself, and therefore
+ * might need special alignment.
+ */
+#define I915_VM_GTT_MAPPABLE BIT(0)
+	unsigned long vm_flags;
+
 	/* Flags used when creating page-table objects for this vm */
 	unsigned long lmem_pt_obj_flags;
 
@@ -543,6 +550,7 @@ i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
 }
 
 void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
+		unsigned long vm_flags,
 		unsigned long lmem_pt_obj_flags);
 
 int i915_ggtt_probe_hw(struct drm_i915_private *i915);
@@ -562,6 +570,7 @@ static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
 int i915_ppgtt_init_hw(struct intel_gt *gt);
 
 struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
+				     unsigned long vm_flags,
 				     unsigned long lmem_pt_obj_flags);
 
 void i915_ggtt_suspend_vm(struct i915_address_space *vm);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 2d3188a398dd..d553b76b1168 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -78,7 +78,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 	 * TODO: Add support for huge LMEM PTEs
 	 */
 
-	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
+	vm = i915_ppgtt_create(gt,
+			       I915_VM_GTT_MAPPABLE,
+			       I915_BO_ALLOC_PM_EARLY);
 	if (IS_ERR(vm))
 		return ERR_CAST(vm);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index b8238f5bc8b1..1218024dfd57 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -156,20 +156,25 @@ int i915_ppgtt_init_hw(struct intel_gt *gt)
 }
 
 static struct i915_ppgtt *
-__ppgtt_create(struct intel_gt *gt, unsigned long lmem_pt_obj_flags)
+__ppgtt_create(struct intel_gt *gt,
+	       unsigned long vm_flags,
+	       unsigned long lmem_pt_obj_flags)
 {
-	if (GRAPHICS_VER(gt->i915) < 8)
+	if (GRAPHICS_VER(gt->i915) < 8) {
+		WARN_ON_ONCE(vm_flags);
 		return gen6_ppgtt_create(gt);
-	else
-		return gen8_ppgtt_create(gt, lmem_pt_obj_flags);
+	} else {
+		return gen8_ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
+	}
 }
 
 struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
+				     unsigned long vm_flags,
 				     unsigned long lmem_pt_obj_flags)
 {
 	struct i915_ppgtt *ppgtt;
 
-	ppgtt = __ppgtt_create(gt, lmem_pt_obj_flags);
+	ppgtt = __ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
 	if (IS_ERR(ppgtt))
 		return ppgtt;
 
@@ -301,6 +306,7 @@ int ppgtt_set_pages(struct i915_vma *vma)
 }
 
 void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
+		unsigned long vm_flags,
 		unsigned long lmem_pt_obj_flags)
 {
 	struct drm_i915_private *i915 = gt->i915;
@@ -309,6 +315,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
 	ppgtt->vm.i915 = i915;
 	ppgtt->vm.dma = i915->drm.dev;
 	ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size);
+	ppgtt->vm.vm_flags = vm_flags;
 	ppgtt->vm.lmem_pt_obj_flags = lmem_pt_obj_flags;
 
 	dma_resv_init(&ppgtt->vm._resv);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index e5ad4d5a91c0..8c299189e9cb 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1600,7 +1600,7 @@ static int igt_reset_evict_ppgtt(void *arg)
 	if (INTEL_PPGTT(gt->i915) < INTEL_PPGTT_FULL)
 		return 0;
 
-	ppgtt = i915_ppgtt_create(gt, 0);
+	ppgtt = i915_ppgtt_create(gt, 0, 0);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 6c804102528b..d726eee3aba5 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -1386,7 +1386,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu)
 	enum intel_engine_id i;
 	int ret;
 
-	ppgtt = i915_ppgtt_create(&i915->gt, I915_BO_ALLOC_PM_EARLY);
+	ppgtt = i915_ppgtt_create(&i915->gt, 0, I915_BO_ALLOC_PM_EARLY);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fdb4bf88293b..3bcd2bb85d10 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -155,7 +155,7 @@ static int igt_ppgtt_alloc(void *arg)
 	if (!HAS_PPGTT(dev_priv))
 		return 0;
 
-	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
+	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
@@ -1083,7 +1083,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
+	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
 	if (IS_ERR(ppgtt)) {
 		err = PTR_ERR(ppgtt);
 		goto out_free;
-- 
2.31.1


 {
 	struct drm_i915_gem_object *obj;
 
+	if (vm->vm_flags & I915_VM_GTT_MAPPABLE) {
+		struct intel_memory_region *mr =
+			vm->i915->mm.regions[INTEL_REGION_LMEM];
+
+		sz = max_t(int, sz, mr->min_page_size);
+	}
+
 	/*
 	 * To avoid severe over-allocation when dealing with min_page_size
 	 * restrictions, we override that behaviour here by allowing an object
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index cbc0b5266cb4..eee97b46a1f9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -266,6 +266,13 @@ struct i915_address_space {
 	u8 pd_shift;
 	u8 scratch_order;
 
+/*
+ * Paging structures are going to be accessed via the GTT itself, and therefore
+ * might need special alignment.
+ */
+#define I915_VM_GTT_MAPPABLE BIT(0)
+	unsigned long vm_flags;
+
 	/* Flags used when creating page-table objects for this vm */
 	unsigned long lmem_pt_obj_flags;
 
@@ -543,6 +550,7 @@ i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
 }
 
 void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
+		unsigned long vm_flags,
 		unsigned long lmem_pt_obj_flags);
 
 int i915_ggtt_probe_hw(struct drm_i915_private *i915);
@@ -562,6 +570,7 @@ static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
 int i915_ppgtt_init_hw(struct intel_gt *gt);
 
 struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
+				     unsigned long vm_flags,
 				     unsigned long lmem_pt_obj_flags);
 
 void i915_ggtt_suspend_vm(struct i915_address_space *vm);
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 2d3188a398dd..d553b76b1168 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -78,7 +78,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 	 * TODO: Add support for huge LMEM PTEs
 	 */
 
-	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
+	vm = i915_ppgtt_create(gt,
+			       I915_VM_GTT_MAPPABLE,
+			       I915_BO_ALLOC_PM_EARLY);
 	if (IS_ERR(vm))
 		return ERR_CAST(vm);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index b8238f5bc8b1..1218024dfd57 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -156,20 +156,25 @@ int i915_ppgtt_init_hw(struct intel_gt *gt)
 }
 
 static struct i915_ppgtt *
-__ppgtt_create(struct intel_gt *gt, unsigned long lmem_pt_obj_flags)
+__ppgtt_create(struct intel_gt *gt,
+	       unsigned long vm_flags,
+	       unsigned long lmem_pt_obj_flags)
 {
-	if (GRAPHICS_VER(gt->i915) < 8)
+	if (GRAPHICS_VER(gt->i915) < 8) {
+		WARN_ON_ONCE(vm_flags);
 		return gen6_ppgtt_create(gt);
-	else
-		return gen8_ppgtt_create(gt, lmem_pt_obj_flags);
+	} else {
+		return gen8_ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
+	}
 }
 
 struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
+				     unsigned long vm_flags,
 				     unsigned long lmem_pt_obj_flags)
 {
 	struct i915_ppgtt *ppgtt;
 
-	ppgtt = __ppgtt_create(gt, lmem_pt_obj_flags);
+	ppgtt = __ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
 	if (IS_ERR(ppgtt))
 		return ppgtt;
 
@@ -301,6 +306,7 @@ int ppgtt_set_pages(struct i915_vma *vma)
 }
 
 void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
+		unsigned long vm_flags,
 		unsigned long lmem_pt_obj_flags)
 {
 	struct drm_i915_private *i915 = gt->i915;
@@ -309,6 +315,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
 	ppgtt->vm.i915 = i915;
 	ppgtt->vm.dma = i915->drm.dev;
 	ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size);
+	ppgtt->vm.vm_flags = vm_flags;
 	ppgtt->vm.lmem_pt_obj_flags = lmem_pt_obj_flags;
 
 	dma_resv_init(&ppgtt->vm._resv);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index e5ad4d5a91c0..8c299189e9cb 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1600,7 +1600,7 @@ static int igt_reset_evict_ppgtt(void *arg)
 	if (INTEL_PPGTT(gt->i915) < INTEL_PPGTT_FULL)
 		return 0;
 
-	ppgtt = i915_ppgtt_create(gt, 0);
+	ppgtt = i915_ppgtt_create(gt, 0, 0);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 6c804102528b..d726eee3aba5 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -1386,7 +1386,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu)
 	enum intel_engine_id i;
 	int ret;
 
-	ppgtt = i915_ppgtt_create(&i915->gt, I915_BO_ALLOC_PM_EARLY);
+	ppgtt = i915_ppgtt_create(&i915->gt, 0, I915_BO_ALLOC_PM_EARLY);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fdb4bf88293b..3bcd2bb85d10 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -155,7 +155,7 @@ static int igt_ppgtt_alloc(void *arg)
 	if (!HAS_PPGTT(dev_priv))
 		return 0;
 
-	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
+	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
@@ -1083,7 +1083,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
+	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
 	if (IS_ERR(ppgtt)) {
 		err = PTR_ERR(ppgtt);
 		goto out_free;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 4/8] drm/i915/migrate: fix offset calculation
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

Ensure we add the engine base only after we calculate the qword offset
into the PTE window.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index d553b76b1168..cb0bb3b94644 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -284,10 +284,10 @@ static int emit_pte(struct i915_request *rq,
 	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
 
 	/* Compute the page directory offset for the target address range */
-	offset += (u64)rq->engine->instance << 32;
 	offset >>= 12;
 	offset *= sizeof(u64);
 	offset += 2 * CHUNK_SZ;
+	offset += (u64)rq->engine->instance << 32;
 
 	cs = intel_ring_begin(rq, 6);
 	if (IS_ERR(cs))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 5/8] drm/i915/migrate: fix length calculation
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

There is no need to insert PTEs for the PTE window itself. Also, foreach
expects a length, not an end offset, which could become gigantic here with
a second engine.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index cb0bb3b94644..2076e24e0489 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -136,7 +136,7 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 			goto err_vm;
 
 		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
-		vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d);
+		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
 	}
 
 	return &vm->vm;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/8] drm/i915/selftests: handle object rounding
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

Ensure we account for any object rounding due to min_page_size
restrictions.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_migrate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index 12ef2837c89b..e21787301bbd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -49,6 +49,7 @@ static int copy(struct intel_migrate *migrate,
 	if (IS_ERR(src))
 		return 0;
 
+	sz = src->base.size;
 	dst = i915_gem_object_create_internal(i915, sz);
 	if (IS_ERR(dst))
 		goto err_free_src;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 7/8] drm/i915/migrate: add acceleration support for DG2
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

This is all kinds of awkward since we now have to contend with using 64K
GTT pages when mapping anything in LMEM (including the page-tables
themselves).

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 186 +++++++++++++++++++-----
 1 file changed, 147 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 2076e24e0489..a804c57b61df 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -33,6 +33,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine)
 	return true;
 }
 
+static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
+				struct i915_page_table *pt,
+				void *data)
+{
+	struct insert_pte_data *d = data;
+
+	/*
+	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
+	 * we have a correctly setup PDE structure for later use.
+	 */
+	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
+	GEM_BUG_ON(!pt->is_compact);
+	d->offset += SZ_2M;
+}
+
+static void xehpsdv_insert_pte(struct i915_address_space *vm,
+			       struct i915_page_table *pt,
+			       void *data)
+{
+	struct insert_pte_data *d = data;
+
+	/*
+	 * We are playing tricks here, since the actual pt, from the hw
+	 * pov, is only 256bytes with 32 entries, or 4096bytes with 512
+	 * entries, but we are still guaranteed that the physical
+	 * alignment is 64K underneath for the pt, and we are careful
+	 * not to access the space in the void.
+	 */
+	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
+	d->offset += SZ_64K;
+}
+
 static void insert_pte(struct i915_address_space *vm,
 		       struct i915_page_table *pt,
 		       void *data)
@@ -75,7 +107,12 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 	 * i.e. within the same non-preemptible window so that we do not switch
 	 * to another migration context that overwrites the PTE.
 	 *
-	 * TODO: Add support for huge LMEM PTEs
+	 * On platforms with HAS_64K_PAGES support we have three windows, and
+	 * dedicate two windows just for mapping lmem pages(smem <-> smem is not
+	 * a thing), since we are forced to use 64K GTT pages underneath which
+	 * requires also modifying the PDE. An alternative might be to instead
+	 * map the PD into the GTT, and then on the fly toggle the 4K/64K mode
+	 * in the PDE from the same batch that also modifies the PTEs.
 	 */
 
 	vm = i915_ppgtt_create(gt,
@@ -108,14 +145,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 		 * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
 		 * 4x2 page directories for source/destination.
 		 */
-		sz = 2 * CHUNK_SZ;
+		if (HAS_64K_PAGES(gt->i915))
+			sz = 3 * CHUNK_SZ;
+		else
+			sz = 2 * CHUNK_SZ;
 		d.offset = base + sz;
 
 		/*
 		 * We need another page directory setup so that we can write
 		 * the 8x512 PTE in each chunk.
 		 */
-		sz += (sz >> 12) * sizeof(u64);
+		if (HAS_64K_PAGES(gt->i915))
+			sz += (sz / SZ_2M) * SZ_64K;
+		else
+			sz += (sz >> 12) * sizeof(u64);
 
 		err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
 		if (err)
@@ -136,7 +179,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 			goto err_vm;
 
 		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
-		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
+		if (HAS_64K_PAGES(gt->i915)) {
+			vm->vm.foreach(&vm->vm, base, d.offset - base,
+				       xehpsdv_insert_pte, &d);
+			d.offset = base + CHUNK_SZ;
+			vm->vm.foreach(&vm->vm,
+				       d.offset,
+				       2 * CHUNK_SZ,
+				       xehpsdv_toggle_pdes, &d);
+		} else {
+			vm->vm.foreach(&vm->vm, base, d.offset - base,
+				       insert_pte, &d);
+		}
 	}
 
 	return &vm->vm;
@@ -274,19 +328,38 @@ static int emit_pte(struct i915_request *rq,
 		    u64 offset,
 		    int length)
 {
+	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
 	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
 						       is_lmem ? PTE_LM : 0);
 	struct intel_ring *ring = rq->ring;
-	int total = 0;
+	int pkt, dword_length;
+	u32 total = 0;
+	u32 page_size;
 	u32 *hdr, *cs;
-	int pkt;
 
 	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
 
+	page_size = I915_GTT_PAGE_SIZE;
+	dword_length = 0x400;
+
 	/* Compute the page directory offset for the target address range */
-	offset >>= 12;
-	offset *= sizeof(u64);
-	offset += 2 * CHUNK_SZ;
+	if (has_64K_pages) {
+		GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M));
+
+		offset /= SZ_2M;
+		offset *= SZ_64K;
+		offset += 3 * CHUNK_SZ;
+
+		if (is_lmem) {
+			page_size = I915_GTT_PAGE_SIZE_64K;
+			dword_length = 0x40;
+		}
+	} else {
+		offset >>= 12;
+		offset *= sizeof(u64);
+		offset += 2 * CHUNK_SZ;
+	}
+
 	offset += (u64)rq->engine->instance << 32;
 
 	cs = intel_ring_begin(rq, 6);
@@ -294,7 +367,7 @@ static int emit_pte(struct i915_request *rq,
 		return PTR_ERR(cs);
 
 	/* Pack as many PTE updates as possible into a single MI command */
-	pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+	pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5);
 	pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 	hdr = cs;
@@ -304,6 +377,8 @@ static int emit_pte(struct i915_request *rq,
 
 	do {
 		if (cs - hdr >= pkt) {
+			int dword_rem;
+
 			*hdr += cs - hdr - 2;
 			*cs++ = MI_NOOP;
 
@@ -315,7 +390,18 @@ static int emit_pte(struct i915_request *rq,
 			if (IS_ERR(cs))
 				return PTR_ERR(cs);
 
-			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+			dword_rem = dword_length;
+			if (has_64K_pages) {
+				if (IS_ALIGNED(total, SZ_2M)) {
+					offset = round_up(offset, SZ_64K);
+				} else {
+					dword_rem = SZ_2M - (total & (SZ_2M - 1));
+					dword_rem /= page_size;
+					dword_rem *= 2;
+				}
+			}
+
+			pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5);
 			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 			hdr = cs;
@@ -324,13 +410,15 @@ static int emit_pte(struct i915_request *rq,
 			*cs++ = upper_32_bits(offset);
 		}
 
+		GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
+
 		*cs++ = lower_32_bits(encode | it->dma);
 		*cs++ = upper_32_bits(encode | it->dma);
 
 		offset += 8;
-		total += I915_GTT_PAGE_SIZE;
+		total += page_size;
 
-		it->dma += I915_GTT_PAGE_SIZE;
+		it->dma += page_size;
 		if (it->dma >= it->max) {
 			it->sg = __sg_next(it->sg);
 			if (!it->sg || sg_dma_len(it->sg) == 0)
@@ -361,7 +449,8 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
-static int emit_copy(struct i915_request *rq, int size)
+static int emit_copy(struct i915_request *rq,
+		     u32 dst_offset, u32 src_offset, int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
@@ -376,31 +465,31 @@ static int emit_copy(struct i915_request *rq, int size)
 		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else if (ver >= 8) {
 		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else {
 		GEM_BUG_ON(instance);
 		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 	}
 
 	intel_ring_advance(rq, cs);
@@ -428,6 +517,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 src_offset, dst_offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -455,15 +545,28 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
-			       CHUNK_SZ);
+		src_offset = 0;
+		dst_offset = CHUNK_SZ;
+		if (HAS_64K_PAGES(ce->engine->i915)) {
+			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
+
+			src_offset = 0;
+			dst_offset = 0;
+			if (src_is_lmem)
+				src_offset = CHUNK_SZ;
+			if (dst_is_lmem)
+				dst_offset = 2 * CHUNK_SZ;
+		}
+
+		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+			       src_offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
 		}
 
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
-			       CHUNK_SZ, len);
+			       dst_offset, len);
 		if (err < 0)
 			goto out_rq;
 		if (err < len) {
@@ -475,7 +578,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, len);
+		err = emit_copy(rq, dst_offset, src_offset, len);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -573,18 +676,20 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
 }
 
 static int emit_clear(struct i915_request *rq,
+		      u64 offset,
 		      int size,
 		      u32 value,
 		      bool is_lmem)
 {
-	const int ver = GRAPHICS_VER(rq->engine->i915);
-	u32 instance = rq->engine->instance;
-	u32 *cs;
 	struct drm_i915_private *i915 = rq->engine->i915;
+	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 num_ccs_blks, ccs_ring_size;
+	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
+	offset += (u64)rq->engine->instance << 32;
+
 	/* Clear flat css only when value is 0 */
 	ccs_ring_size = (is_lmem && !value) ?
 			 calc_ctrl_surf_instr_size(i915, size)
@@ -599,17 +704,17 @@ static int emit_clear(struct i915_request *rq,
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0; /* offset */
-		*cs++ = instance;
+		*cs++ = lower_32_bits(offset);
+		*cs++ = upper_32_bits(offset);
 		*cs++ = value;
 		*cs++ = MI_NOOP;
 	} else {
-		GEM_BUG_ON(instance);
+		GEM_BUG_ON(upper_32_bits(offset));
 		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0;
+		*cs++ = lower_32_bits(offset);
 		*cs++ = value;
 	}
 
@@ -625,17 +730,15 @@ static int emit_clear(struct i915_request *rq,
 		 * and use it as a source.
 		 */
 
-		cs = i915_flush_dw(cs, (u64)instance << 32,
-				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
 		cs = _i915_ctrl_surf_copy_blt(cs,
-					      (u64)instance << 32,
-					      (u64)instance << 32,
+					      offset,
+					      offset,
 					      DIRECT_ACCESS,
 					      INDIRECT_ACCESS,
 					      1, 1,
 					      num_ccs_blks);
-		cs = i915_flush_dw(cs, (u64)instance << 32,
-				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
 	}
 	intel_ring_advance(rq, cs);
 	return 0;
@@ -660,6 +763,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -687,7 +791,11 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
+		offset = 0;
+		if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
+			offset = CHUNK_SZ;
+
+		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -697,7 +805,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, len, value, is_lmem);
+		err = emit_clear(rq, offset, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

 
-			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+			dword_rem = dword_length;
+			if (has_64K_pages) {
+				if (IS_ALIGNED(total, SZ_2M)) {
+					offset = round_up(offset, SZ_64K);
+				} else {
+					dword_rem = SZ_2M - (total & (SZ_2M - 1));
+					dword_rem /= page_size;
+					dword_rem *= 2;
+				}
+			}
+
+			pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5);
 			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 			hdr = cs;
@@ -324,13 +410,15 @@ static int emit_pte(struct i915_request *rq,
 			*cs++ = upper_32_bits(offset);
 		}
 
+		GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
+
 		*cs++ = lower_32_bits(encode | it->dma);
 		*cs++ = upper_32_bits(encode | it->dma);
 
 		offset += 8;
-		total += I915_GTT_PAGE_SIZE;
+		total += page_size;
 
-		it->dma += I915_GTT_PAGE_SIZE;
+		it->dma += page_size;
 		if (it->dma >= it->max) {
 			it->sg = __sg_next(it->sg);
 			if (!it->sg || sg_dma_len(it->sg) == 0)
@@ -361,7 +449,8 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
-static int emit_copy(struct i915_request *rq, int size)
+static int emit_copy(struct i915_request *rq,
+		     u32 dst_offset, u32 src_offset, int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
@@ -376,31 +465,31 @@ static int emit_copy(struct i915_request *rq, int size)
 		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else if (ver >= 8) {
 		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else {
 		GEM_BUG_ON(instance);
 		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 	}
 
 	intel_ring_advance(rq, cs);
@@ -428,6 +517,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 src_offset, dst_offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -455,15 +545,28 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
-			       CHUNK_SZ);
+		src_offset = 0;
+		dst_offset = CHUNK_SZ;
+		if (HAS_64K_PAGES(ce->engine->i915)) {
+			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
+
+			src_offset = 0;
+			dst_offset = 0;
+			if (src_is_lmem)
+				src_offset = CHUNK_SZ;
+			if (dst_is_lmem)
+				dst_offset = 2 * CHUNK_SZ;
+		}
+
+		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+			       src_offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
 		}
 
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
-			       CHUNK_SZ, len);
+			       dst_offset, len);
 		if (err < 0)
 			goto out_rq;
 		if (err < len) {
@@ -475,7 +578,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, len);
+		err = emit_copy(rq, dst_offset, src_offset, len);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -573,18 +676,20 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
 }
 
 static int emit_clear(struct i915_request *rq,
+		      u64 offset,
 		      int size,
 		      u32 value,
 		      bool is_lmem)
 {
-	const int ver = GRAPHICS_VER(rq->engine->i915);
-	u32 instance = rq->engine->instance;
-	u32 *cs;
 	struct drm_i915_private *i915 = rq->engine->i915;
+	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 num_ccs_blks, ccs_ring_size;
+	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
+	offset += (u64)rq->engine->instance << 32;
+
 	/* Clear flat css only when value is 0 */
 	ccs_ring_size = (is_lmem && !value) ?
 			 calc_ctrl_surf_instr_size(i915, size)
@@ -599,17 +704,17 @@ static int emit_clear(struct i915_request *rq,
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0; /* offset */
-		*cs++ = instance;
+		*cs++ = lower_32_bits(offset);
+		*cs++ = upper_32_bits(offset);
 		*cs++ = value;
 		*cs++ = MI_NOOP;
 	} else {
-		GEM_BUG_ON(instance);
+		GEM_BUG_ON(upper_32_bits(offset));
 		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0;
+		*cs++ = lower_32_bits(offset);
 		*cs++ = value;
 	}
 
@@ -625,17 +730,15 @@ static int emit_clear(struct i915_request *rq,
 		 * and use it as a source.
 		 */
 
-		cs = i915_flush_dw(cs, (u64)instance << 32,
-				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
 		cs = _i915_ctrl_surf_copy_blt(cs,
-					      (u64)instance << 32,
-					      (u64)instance << 32,
+					      offset,
+					      offset,
 					      DIRECT_ACCESS,
 					      INDIRECT_ACCESS,
 					      1, 1,
 					      num_ccs_blks);
-		cs = i915_flush_dw(cs, (u64)instance << 32,
-				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
 	}
 	intel_ring_advance(rq, cs);
 	return 0;
@@ -660,6 +763,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -687,7 +791,11 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
+		offset = 0;
+		if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
+			offset = CHUNK_SZ;
+
+		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -697,7 +805,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, len, value, is_lmem);
+		err = emit_clear(rq, offset, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 8/8] drm/i915/migrate: turn on acceleration for DG2
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 12:24   ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 12:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: bob.beckett, Thomas Hellström, adrian.larumbe, dri-devel

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index a804c57b61df..0da27ec808dc 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -242,8 +242,6 @@ int intel_migrate_init(struct intel_migrate *m, struct intel_gt *gt)
 
 	memset(m, 0, sizeof(*m));
 
-	return 0;
-
 	ce = pinned_context(gt);
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for DG2 accelerated migration/clearing support
  2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
                   ` (8 preceding siblings ...)
  (?)
@ 2021-12-03 14:40 ` Patchwork
  -1 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2021-12-03 14:40 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx

== Series Details ==

Series: DG2 accelerated migration/clearing support
URL   : https://patchwork.freedesktop.org/series/97544/
State : failure

== Summary ==

Applying: drm/i915/migrate: don't check the scratch page
Applying: drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
Applying: drm/i915/gtt: add gtt mappable plumbing
Applying: drm/i915/migrate: fix offset calculation
Applying: drm/i915/migrate: fix length calculation
Applying: drm/i915/selftests: handle object rounding
Applying: drm/i915/migrate: add acceleration support for DG2
error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gt/intel_migrate.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0007 drm/i915/migrate: add acceleration support for DG2
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 1/8] drm/i915/migrate: don't check the scratch page
  2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 16:38     ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 16:38 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:19 +0000, Matthew Auld wrote:
> The scratch page might not be allocated in LMEM (like on DG2), so
> instead of using that as the deciding factor for where the paging
> structures live, let's just query the pt before mapping it.
> 
Looks good to me.

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 765c6d48fe52..2d3188a398dd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -13,7 +13,6 @@
>  
>  struct insert_pte_data {
>  	u64 offset;
> -	bool is_lmem;
>  };
>  
>  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> @@ -41,7 +40,7 @@ static void insert_pte(struct i915_address_space *vm,
>  	struct insert_pte_data *d = data;
>  
>  	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE,
> -			d->is_lmem ? PTE_LM : 0);
> +			i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0);
>  	d->offset += PAGE_SIZE;
>  }
>  
> @@ -135,7 +134,6 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>  			goto err_vm;
>  
>  		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
> -		d.is_lmem = i915_gem_object_is_lmem(vm->vm.scratch[0]);
>  		vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d);
>  	}
>  
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/8] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 16:59     ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 16:59 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:20 +0000, Matthew Auld wrote:
> If this is LMEM then we get a 32-entry PT, with each PTE pointing to
> some 64K block of memory; otherwise it's just the usual 512-entry PT.
> This very much assumes the caller knows what they are doing.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++--
>  1 file changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index bd3ca0996a23..312b2267bf87 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -728,13 +728,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>  		gen8_pdp_for_page_index(vm, idx);
>  	struct i915_page_directory *pd =
>  		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> +	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
>  	gen8_pte_t *vaddr;
>  
> -	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
> +	GEM_BUG_ON(pt->is_compact);

Do we have compact PT for smem with 64k pages?

> +
> +	vaddr = px_vaddr(pt);
>  	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
>  	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>  }
>  
> +static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
> +					    dma_addr_t addr,
> +					    u64 offset,
> +					    enum i915_cache_level level,
> +					    u32 flags)
> +{
> +	u64 idx = offset >> GEN8_PTE_SHIFT;
> +	struct i915_page_directory * const pdp =
> +		gen8_pdp_for_page_index(vm, idx);
> +	struct i915_page_directory *pd =
> +		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> +	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
> +	gen8_pte_t *vaddr;
> +
> +	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
> +	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
> +
> +	if (!pt->is_compact) {
> +		vaddr = px_vaddr(pd);
> +		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
> +		pt->is_compact = true;
> +	}
> +
> +	vaddr = px_vaddr(pt);
> +	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
> +}
> +
> +static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
> +				       dma_addr_t addr,
> +				       u64 offset,
> +				       enum i915_cache_level level,
> +				       u32 flags)
> +{
> +	if (flags & PTE_LM)
> +		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
> +						       level, flags);
> +
> +	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
Matt,

Is this call to gen8_*** for the insertion of smem PTE entries on
64K-capable platforms like DG2?

Ram

> +}
> +
>  static int gen8_init_scratch(struct i915_address_space *vm)
>  {
>  	u32 pte_flags;
> @@ -937,7 +980,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>  
>  	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
>  	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
> -	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
> +	if (HAS_64K_PAGES(gt->i915))
> +		ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry;
> +	else
> +		ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
>  	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
>  	ppgtt->vm.clear_range = gen8_ppgtt_clear;
>  	ppgtt->vm.foreach = gen8_ppgtt_foreach;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/8] drm/i915/gtt: add gtt mappable plumbing
  2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 17:25     ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 17:25 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:21 +0000, Matthew Auld wrote:
> With object clearing/copying we need to be able to modify the PTEs on
> the fly via some batch buffer, which means we need to be able to map
> the paging structures (or at the very least the PT, though being able
> to map the PD might also be useful at some point) into the GTT. And
> since the paging structures must reside in LMEM on discrete, we need
> to ensure that these objects have the correct physical alignment, as
> per any min page restrictions, like on DG2. This is potentially
> costly, but it should be limited to the special migrate_vm, which
> only needs a few fixed-size windows.

Matt,

Just a thought: instead of classifying the whole ppgtt as
VM_GTT_MAPPABLE and rounding up the pt size to min_page_size, could we
just add the size of the pt as a parameter to i915_vm_alloc_pt_stash
and alloc_pt, to be used for vm->alloc_pt_dma() instead of
I915_GTT_PAGE_SIZE_4K?

But a PT for smem entries also needs to be 64K aligned to be mapped
into the GTT, right? So there would be no advantage to having the
pt_stash-level physical alignment.

Any thoughts along this line?

Ram

> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c     |  4 ++--
>  drivers/gpu/drm/i915/gem/selftests/huge_pages.c |  2 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c            |  2 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c            |  3 ++-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h            |  1 +
>  drivers/gpu/drm/i915/gt/intel_ggtt.c            |  2 +-
>  drivers/gpu/drm/i915/gt/intel_gt.c              |  2 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c             |  7 +++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.h             |  9 +++++++++
>  drivers/gpu/drm/i915/gt/intel_migrate.c         |  4 +++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c           | 17 ++++++++++++-----
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c    |  2 +-
>  drivers/gpu/drm/i915/gvt/scheduler.c            |  2 +-
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  4 ++--
>  14 files changed, 44 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ebd775cb1661..b394954726b0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1559,7 +1559,7 @@ i915_gem_create_context(struct drm_i915_private *i915,
>  	} else if (HAS_FULL_PPGTT(i915)) {
>  		struct i915_ppgtt *ppgtt;
>  
> -		ppgtt = i915_ppgtt_create(&i915->gt, 0);
> +		ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
>  		if (IS_ERR(ppgtt)) {
>  			drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
>  				PTR_ERR(ppgtt));
> @@ -1742,7 +1742,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
>  	if (args->flags)
>  		return -EINVAL;
>  
> -	ppgtt = i915_ppgtt_create(&i915->gt, 0);
> +	ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index bd8dc1a28022..c1b86c7a4754 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -1764,7 +1764,7 @@ int i915_gem_huge_page_mock_selftests(void)
>  	mkwrite_device_info(dev_priv)->ppgtt_type = INTEL_PPGTT_FULL;
>  	mkwrite_device_info(dev_priv)->ppgtt_size = 48;
>  
> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>  	if (IS_ERR(ppgtt)) {
>  		err = PTR_ERR(ppgtt);
>  		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index c0d149f04949..778472e563aa 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -443,7 +443,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
>  
>  	mutex_init(&ppgtt->flush);
>  
> -	ppgtt_init(&ppgtt->base, gt, 0);
> +	ppgtt_init(&ppgtt->base, gt, 0, 0);
>  	ppgtt->base.vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen6_pte_t));
>  	ppgtt->base.vm.top = 1;
>  
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 312b2267bf87..dfca803b4ff1 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -912,6 +912,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
>   *
>   */
>  struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags)
>  {
>  	struct i915_ppgtt *ppgtt;
> @@ -921,7 +922,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>  	if (!ppgtt)
>  		return ERR_PTR(-ENOMEM);
>  
> -	ppgtt_init(ppgtt, gt, lmem_pt_obj_flags);
> +	ppgtt_init(ppgtt, gt, vm_flags, lmem_pt_obj_flags);
>  	ppgtt->vm.top = i915_vm_is_4lvl(&ppgtt->vm) ? 3 : 2;
>  	ppgtt->vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen8_pte_t));
>  
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> index f541d19264b4..c0af12593576 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> @@ -13,6 +13,7 @@ struct intel_gt;
>  enum i915_cache_level;
>  
>  struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags);
>  
>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 47f88f031749..938af60fd32f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -661,7 +661,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
>  	struct i915_ppgtt *ppgtt;
>  	int err;
>  
> -	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0);
> +	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 510cda6a163f..991a514a1dc3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -484,7 +484,7 @@ static void intel_gt_fini_scratch(struct intel_gt *gt)
>  static struct i915_address_space *kernel_vm(struct intel_gt *gt)
>  {
>  	if (INTEL_PPGTT(gt->i915) > INTEL_PPGTT_ALIASING)
> -		return &i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY)->vm;
> +		return &i915_ppgtt_create(gt, 0, I915_BO_ALLOC_PM_EARLY)->vm;
>  	else
>  		return i915_vm_get(&gt->ggtt->vm);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 5447615fc6f3..d9bf53dc1d85 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -18,6 +18,13 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
>  {
>  	struct drm_i915_gem_object *obj;
>  
> +	if (vm->vm_flags & I915_VM_GTT_MAPPABLE) {
> +		struct intel_memory_region *mr =
> +			vm->i915->mm.regions[INTEL_REGION_LMEM];
> +
> +		sz = max_t(int, sz, mr->min_page_size);
> +	}
> +
>  	/*
>  	 * To avoid severe over-allocation when dealing with min_page_size
>  	 * restrictions, we override that behaviour here by allowing an object
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index cbc0b5266cb4..eee97b46a1f9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -266,6 +266,13 @@ struct i915_address_space {
>  	u8 pd_shift;
>  	u8 scratch_order;
>  
> +/*
> + * Paging structures are going to be accessed via the GTT itself, and therefore
> + * might need special alignment.
> + */
> +#define I915_VM_GTT_MAPPABLE BIT(0)
> +	unsigned long vm_flags;
> +
>  	/* Flags used when creating page-table objects for this vm */
>  	unsigned long lmem_pt_obj_flags;
>  
> @@ -543,6 +550,7 @@ i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
>  }
>  
>  void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
> +		unsigned long vm_flags,
>  		unsigned long lmem_pt_obj_flags);
>  
>  int i915_ggtt_probe_hw(struct drm_i915_private *i915);
> @@ -562,6 +570,7 @@ static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
>  int i915_ppgtt_init_hw(struct intel_gt *gt);
>  
>  struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags);
>  
>  void i915_ggtt_suspend_vm(struct i915_address_space *vm);
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 2d3188a398dd..d553b76b1168 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -78,7 +78,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>  	 * TODO: Add support for huge LMEM PTEs
>  	 */
>  
> -	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
> +	vm = i915_ppgtt_create(gt,
> +			       I915_VM_GTT_MAPPABLE,
> +			       I915_BO_ALLOC_PM_EARLY);
>  	if (IS_ERR(vm))
>  		return ERR_CAST(vm);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index b8238f5bc8b1..1218024dfd57 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -156,20 +156,25 @@ int i915_ppgtt_init_hw(struct intel_gt *gt)
>  }
>  
>  static struct i915_ppgtt *
> -__ppgtt_create(struct intel_gt *gt, unsigned long lmem_pt_obj_flags)
> +__ppgtt_create(struct intel_gt *gt,
> +	       unsigned long vm_flags,
> +	       unsigned long lmem_pt_obj_flags)
>  {
> -	if (GRAPHICS_VER(gt->i915) < 8)
> +	if (GRAPHICS_VER(gt->i915) < 8) {
> +		WARN_ON_ONCE(vm_flags);
>  		return gen6_ppgtt_create(gt);
> -	else
> -		return gen8_ppgtt_create(gt, lmem_pt_obj_flags);
> +	} else {
> +		return gen8_ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
> +	}
>  }
>  
>  struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags)
>  {
>  	struct i915_ppgtt *ppgtt;
>  
> -	ppgtt = __ppgtt_create(gt, lmem_pt_obj_flags);
> +	ppgtt = __ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
>  	if (IS_ERR(ppgtt))
>  		return ppgtt;
>  
> @@ -301,6 +306,7 @@ int ppgtt_set_pages(struct i915_vma *vma)
>  }
>  
>  void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
> +		unsigned long vm_flags,
>  		unsigned long lmem_pt_obj_flags)
>  {
>  	struct drm_i915_private *i915 = gt->i915;
> @@ -309,6 +315,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>  	ppgtt->vm.i915 = i915;
>  	ppgtt->vm.dma = i915->drm.dev;
>  	ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size);
> +	ppgtt->vm.vm_flags = vm_flags;
>  	ppgtt->vm.lmem_pt_obj_flags = lmem_pt_obj_flags;
>  
>  	dma_resv_init(&ppgtt->vm._resv);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index e5ad4d5a91c0..8c299189e9cb 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1600,7 +1600,7 @@ static int igt_reset_evict_ppgtt(void *arg)
>  	if (INTEL_PPGTT(gt->i915) < INTEL_PPGTT_FULL)
>  		return 0;
>  
> -	ppgtt = i915_ppgtt_create(gt, 0);
> +	ppgtt = i915_ppgtt_create(gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index 6c804102528b..d726eee3aba5 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -1386,7 +1386,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu)
>  	enum intel_engine_id i;
>  	int ret;
>  
> -	ppgtt = i915_ppgtt_create(&i915->gt, I915_BO_ALLOC_PM_EARLY);
> +	ppgtt = i915_ppgtt_create(&i915->gt, 0, I915_BO_ALLOC_PM_EARLY);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index fdb4bf88293b..3bcd2bb85d10 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -155,7 +155,7 @@ static int igt_ppgtt_alloc(void *arg)
>  	if (!HAS_PPGTT(dev_priv))
>  		return 0;
>  
> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> @@ -1083,7 +1083,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
>  
> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>  	if (IS_ERR(ppgtt)) {
>  		err = PTR_ERR(ppgtt);
>  		goto out_free;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Intel-gfx] [PATCH v2 3/8] drm/i915/gtt: add gtt mappable plumbing
@ 2021-12-03 17:25     ` Ramalingam C
  0 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 17:25 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:21 +0000, Matthew Auld wrote:
> With object clearing/copying we need to be able to modify the PTEs on
> the fly via some batch buffer, which means we need to be able to map
> the paging structures (or at the very least the PT, though being able
> to also map the PD might be useful at some point) into the GTT. And
> since the paging structures must reside in LMEM on discrete, we need to
> ensure that these objects have the correct physical alignment, as per
> any min page restrictions, like on DG2. This is potentially costly, but
> it should be limited to the special migrate_vm, which only needs a few
> fixed-size windows.

Matt,

Just a thought: instead of classifying the whole ppgtt as
VM_GTT_MAPPABLE and rounding up the PT size to min_page_size, could we
just pass the PT size as a parameter to i915_vm_alloc_pt_stash() and
alloc_pt(), to be used for vm->alloc_pt_dma() instead of
I915_GTT_PAGE_SIZE_4K?

But a PT for SMEM entries also needs to be 64K aligned to be mapped
into the GTT, right? So there would be no advantage to doing the
physical alignment at the pt_stash level.

Any thoughts on this line?

Ram

> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c     |  4 ++--
>  drivers/gpu/drm/i915/gem/selftests/huge_pages.c |  2 +-
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c            |  2 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c            |  3 ++-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.h            |  1 +
>  drivers/gpu/drm/i915/gt/intel_ggtt.c            |  2 +-
>  drivers/gpu/drm/i915/gt/intel_gt.c              |  2 +-
>  drivers/gpu/drm/i915/gt/intel_gtt.c             |  7 +++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.h             |  9 +++++++++
>  drivers/gpu/drm/i915/gt/intel_migrate.c         |  4 +++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c           | 17 ++++++++++++-----
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c    |  2 +-
>  drivers/gpu/drm/i915/gvt/scheduler.c            |  2 +-
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  4 ++--
>  14 files changed, 44 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ebd775cb1661..b394954726b0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1559,7 +1559,7 @@ i915_gem_create_context(struct drm_i915_private *i915,
>  	} else if (HAS_FULL_PPGTT(i915)) {
>  		struct i915_ppgtt *ppgtt;
>  
> -		ppgtt = i915_ppgtt_create(&i915->gt, 0);
> +		ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
>  		if (IS_ERR(ppgtt)) {
>  			drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
>  				PTR_ERR(ppgtt));
> @@ -1742,7 +1742,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
>  	if (args->flags)
>  		return -EINVAL;
>  
> -	ppgtt = i915_ppgtt_create(&i915->gt, 0);
> +	ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index bd8dc1a28022..c1b86c7a4754 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -1764,7 +1764,7 @@ int i915_gem_huge_page_mock_selftests(void)
>  	mkwrite_device_info(dev_priv)->ppgtt_type = INTEL_PPGTT_FULL;
>  	mkwrite_device_info(dev_priv)->ppgtt_size = 48;
>  
> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>  	if (IS_ERR(ppgtt)) {
>  		err = PTR_ERR(ppgtt);
>  		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index c0d149f04949..778472e563aa 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -443,7 +443,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
>  
>  	mutex_init(&ppgtt->flush);
>  
> -	ppgtt_init(&ppgtt->base, gt, 0);
> +	ppgtt_init(&ppgtt->base, gt, 0, 0);
>  	ppgtt->base.vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen6_pte_t));
>  	ppgtt->base.vm.top = 1;
>  
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 312b2267bf87..dfca803b4ff1 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -912,6 +912,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
>   *
>   */
>  struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags)
>  {
>  	struct i915_ppgtt *ppgtt;
> @@ -921,7 +922,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>  	if (!ppgtt)
>  		return ERR_PTR(-ENOMEM);
>  
> -	ppgtt_init(ppgtt, gt, lmem_pt_obj_flags);
> +	ppgtt_init(ppgtt, gt, vm_flags, lmem_pt_obj_flags);
>  	ppgtt->vm.top = i915_vm_is_4lvl(&ppgtt->vm) ? 3 : 2;
>  	ppgtt->vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen8_pte_t));
>  
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> index f541d19264b4..c0af12593576 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
> @@ -13,6 +13,7 @@ struct intel_gt;
>  enum i915_cache_level;
>  
>  struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags);
>  
>  u64 gen8_ggtt_pte_encode(dma_addr_t addr,
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 47f88f031749..938af60fd32f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -661,7 +661,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
>  	struct i915_ppgtt *ppgtt;
>  	int err;
>  
> -	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0);
> +	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 510cda6a163f..991a514a1dc3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -484,7 +484,7 @@ static void intel_gt_fini_scratch(struct intel_gt *gt)
>  static struct i915_address_space *kernel_vm(struct intel_gt *gt)
>  {
>  	if (INTEL_PPGTT(gt->i915) > INTEL_PPGTT_ALIASING)
> -		return &i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY)->vm;
> +		return &i915_ppgtt_create(gt, 0, I915_BO_ALLOC_PM_EARLY)->vm;
>  	else
>  		return i915_vm_get(&gt->ggtt->vm);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 5447615fc6f3..d9bf53dc1d85 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -18,6 +18,13 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
>  {
>  	struct drm_i915_gem_object *obj;
>  
> +	if (vm->vm_flags & I915_VM_GTT_MAPPABLE) {
> +		struct intel_memory_region *mr =
> +			vm->i915->mm.regions[INTEL_REGION_LMEM];
> +
> +		sz = max_t(int, sz, mr->min_page_size);
> +	}
> +
>  	/*
>  	 * To avoid severe over-allocation when dealing with min_page_size
>  	 * restrictions, we override that behaviour here by allowing an object
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index cbc0b5266cb4..eee97b46a1f9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -266,6 +266,13 @@ struct i915_address_space {
>  	u8 pd_shift;
>  	u8 scratch_order;
>  
> +/*
> + * Paging structures are going to be accessed via the GTT itself, and therefore
> + * might need special alignment.
> + */
> +#define I915_VM_GTT_MAPPABLE BIT(0)
> +	unsigned long vm_flags;
> +
>  	/* Flags used when creating page-table objects for this vm */
>  	unsigned long lmem_pt_obj_flags;
>  
> @@ -543,6 +550,7 @@ i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
>  }
>  
>  void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
> +		unsigned long vm_flags,
>  		unsigned long lmem_pt_obj_flags);
>  
>  int i915_ggtt_probe_hw(struct drm_i915_private *i915);
> @@ -562,6 +570,7 @@ static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
>  int i915_ppgtt_init_hw(struct intel_gt *gt);
>  
>  struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags);
>  
>  void i915_ggtt_suspend_vm(struct i915_address_space *vm);
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 2d3188a398dd..d553b76b1168 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -78,7 +78,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>  	 * TODO: Add support for huge LMEM PTEs
>  	 */
>  
> -	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
> +	vm = i915_ppgtt_create(gt,
> +			       I915_VM_GTT_MAPPABLE,
> +			       I915_BO_ALLOC_PM_EARLY);
>  	if (IS_ERR(vm))
>  		return ERR_CAST(vm);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index b8238f5bc8b1..1218024dfd57 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -156,20 +156,25 @@ int i915_ppgtt_init_hw(struct intel_gt *gt)
>  }
>  
>  static struct i915_ppgtt *
> -__ppgtt_create(struct intel_gt *gt, unsigned long lmem_pt_obj_flags)
> +__ppgtt_create(struct intel_gt *gt,
> +	       unsigned long vm_flags,
> +	       unsigned long lmem_pt_obj_flags)
>  {
> -	if (GRAPHICS_VER(gt->i915) < 8)
> +	if (GRAPHICS_VER(gt->i915) < 8) {
> +		WARN_ON_ONCE(vm_flags);
>  		return gen6_ppgtt_create(gt);
> -	else
> -		return gen8_ppgtt_create(gt, lmem_pt_obj_flags);
> +	} else {
> +		return gen8_ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
> +	}
>  }
>  
>  struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
> +				     unsigned long vm_flags,
>  				     unsigned long lmem_pt_obj_flags)
>  {
>  	struct i915_ppgtt *ppgtt;
>  
> -	ppgtt = __ppgtt_create(gt, lmem_pt_obj_flags);
> +	ppgtt = __ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
>  	if (IS_ERR(ppgtt))
>  		return ppgtt;
>  
> @@ -301,6 +306,7 @@ int ppgtt_set_pages(struct i915_vma *vma)
>  }
>  
>  void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
> +		unsigned long vm_flags,
>  		unsigned long lmem_pt_obj_flags)
>  {
>  	struct drm_i915_private *i915 = gt->i915;
> @@ -309,6 +315,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>  	ppgtt->vm.i915 = i915;
>  	ppgtt->vm.dma = i915->drm.dev;
>  	ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size);
> +	ppgtt->vm.vm_flags = vm_flags;
>  	ppgtt->vm.lmem_pt_obj_flags = lmem_pt_obj_flags;
>  
>  	dma_resv_init(&ppgtt->vm._resv);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index e5ad4d5a91c0..8c299189e9cb 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1600,7 +1600,7 @@ static int igt_reset_evict_ppgtt(void *arg)
>  	if (INTEL_PPGTT(gt->i915) < INTEL_PPGTT_FULL)
>  		return 0;
>  
> -	ppgtt = i915_ppgtt_create(gt, 0);
> +	ppgtt = i915_ppgtt_create(gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index 6c804102528b..d726eee3aba5 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -1386,7 +1386,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu)
>  	enum intel_engine_id i;
>  	int ret;
>  
> -	ppgtt = i915_ppgtt_create(&i915->gt, I915_BO_ALLOC_PM_EARLY);
> +	ppgtt = i915_ppgtt_create(&i915->gt, 0, I915_BO_ALLOC_PM_EARLY);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index fdb4bf88293b..3bcd2bb85d10 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -155,7 +155,7 @@ static int igt_ppgtt_alloc(void *arg)
>  	if (!HAS_PPGTT(dev_priv))
>  		return 0;
>  
> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>  	if (IS_ERR(ppgtt))
>  		return PTR_ERR(ppgtt);
>  
> @@ -1083,7 +1083,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
>  
> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>  	if (IS_ERR(ppgtt)) {
>  		err = PTR_ERR(ppgtt);
>  		goto out_free;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 4/8] drm/i915/migrate: fix offset calculation
  2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 17:30     ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 17:30 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:22 +0000, Matthew Auld wrote:
> Ensure we add the engine base only after we calculate the qword offset
> into the PTE window.

So we didn't hit this issue because we were always using the
engine->instance 0!?

Looks good to me

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index d553b76b1168..cb0bb3b94644 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -284,10 +284,10 @@ static int emit_pte(struct i915_request *rq,
>  	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
>  
>  	/* Compute the page directory offset for the target address range */
> -	offset += (u64)rq->engine->instance << 32;
>  	offset >>= 12;
>  	offset *= sizeof(u64);
>  	offset += 2 * CHUNK_SZ;
> +	offset += (u64)rq->engine->instance << 32;
>  
>  	cs = intel_ring_begin(rq, 6);
>  	if (IS_ERR(cs))
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/8] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  2021-12-03 16:59     ` [Intel-gfx] " Ramalingam C
@ 2021-12-03 17:31       ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 17:31 UTC (permalink / raw)
  To: Ramalingam C
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 03/12/2021 16:59, Ramalingam C wrote:
> On 2021-12-03 at 12:24:20 +0000, Matthew Auld wrote:
>> If this is LMEM then we get a 32 entry PT, with each PTE pointing to
>> some 64K block of memory, otherwise it's just the usual 512 entry PT.
>> This very much assumes the caller knows what they are doing.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Ramalingam C <ramalingam.c@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++--
>>   1 file changed, 48 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index bd3ca0996a23..312b2267bf87 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -728,13 +728,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
>>   		gen8_pdp_for_page_index(vm, idx);
>>   	struct i915_page_directory *pd =
>>   		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
>> +	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
>>   	gen8_pte_t *vaddr;
>>   
>> -	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
>> +	GEM_BUG_ON(pt->is_compact);
> 
> Do we have compact PT for smem with 64k pages?

It's technically possible but we don't bother trying to support it in 
the driver.

> 
>> +
>> +	vaddr = px_vaddr(pt);
>>   	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
>>   	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>>   }
>>   
>> +static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
>> +					    dma_addr_t addr,
>> +					    u64 offset,
>> +					    enum i915_cache_level level,
>> +					    u32 flags)
>> +{
>> +	u64 idx = offset >> GEN8_PTE_SHIFT;
>> +	struct i915_page_directory * const pdp =
>> +		gen8_pdp_for_page_index(vm, idx);
>> +	struct i915_page_directory *pd =
>> +		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
>> +	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
>> +	gen8_pte_t *vaddr;
>> +
>> +	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
>> +	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
>> +
>> +	if (!pt->is_compact) {
>> +		vaddr = px_vaddr(pd);
>> +		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
>> +		pt->is_compact = true;
>> +	}
>> +
>> +	vaddr = px_vaddr(pt);
>> +	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
>> +}
>> +
>> +static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>> +				       dma_addr_t addr,
>> +				       u64 offset,
>> +				       enum i915_cache_level level,
>> +				       u32 flags)
>> +{
>> +	if (flags & PTE_LM)
>> +		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
>> +						       level, flags);
>> +
>> +	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
> Matt,
> 
> Is this call to gen8_*** for the insertion of SMEM PTE entries on
> 64K-capable platforms like DG2?

Yeah, this just falls back to the generic 512 entry layout for the PT.

> 
> Ram
> 
>> +}
>> +
>>   static int gen8_init_scratch(struct i915_address_space *vm)
>>   {
>>   	u32 pte_flags;
>> @@ -937,7 +980,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>>   
>>   	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
>>   	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
>> -	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
>> +	if (HAS_64K_PAGES(gt->i915))
>> +		ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry;
>> +	else
>> +		ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
>>   	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
>>   	ppgtt->vm.clear_range = gen8_ppgtt_clear;
>>   	ppgtt->vm.foreach = gen8_ppgtt_foreach;
>> -- 
>> 2.31.1
>>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 5/8] drm/i915/migrate: fix length calculation
  2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 17:36     ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 17:36 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:23 +0000, Matthew Auld wrote:
> No need to insert PTEs for the PTE window itself; also, foreach expects
> a length, not an end offset, which could be gigantic here with a second
> engine.
> 
Looks good to me

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index cb0bb3b94644..2076e24e0489 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -136,7 +136,7 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>  			goto err_vm;
>  
>  		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
> -		vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d);
> +		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
>  	}
>  
>  	return &vm->vm;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/8] drm/i915/gtt: add gtt mappable plumbing
  2021-12-03 17:25     ` [Intel-gfx] " Ramalingam C
@ 2021-12-03 17:38       ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 17:38 UTC (permalink / raw)
  To: Ramalingam C
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 03/12/2021 17:25, Ramalingam C wrote:
> On 2021-12-03 at 12:24:21 +0000, Matthew Auld wrote:
>> With object clearing/copying we need to be able to modify the PTEs on
>> the fly via some batch buffer, which means we need to be able to map the
>> paging structures (or at the very least the PT, though being able to also
>> map the PD might be useful at some point) into the GTT. And since the
>> paging structures must reside in LMEM on discrete, we need to ensure
>> that these objects have the correct physical alignment, as per any min
>> page restrictions, like on DG2. This is potentially costly, but it should
>> be limited to the special migrate_vm, which only needs a few fixed-size
>> windows.
> 
> Matt,
> 
> Just a thought: instead of classifying the whole ppgtt as VM_GTT_MAPPABLE
> and rounding up the pt size to min_page_size,
> could we just add the size of the pt as a parameter to i915_vm_alloc_pt_stash
> and alloc_pt, which can be used for vm->alloc_pt_dma() instead of
> I915_GTT_PAGE_SIZE_4K?
> 
> But a PT for smem entries also needs to be 64K aligned to be mapped into
> the GTT, right? So there is no advantage to having the pt_stash-level
> physical alignment.
> 
> Any thoughts on this line?

Yes, this sounds like a good idea. Initially I was worried about stuff 
like gen8_alloc_top_pd() which would skip this, but it looks like we 
only really care about the PT and maybe also the PD having correct 
alignment. Will change.

> 
> Ram
> 
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Ramalingam C <ramalingam.c@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_context.c     |  4 ++--
>>   drivers/gpu/drm/i915/gem/selftests/huge_pages.c |  2 +-
>>   drivers/gpu/drm/i915/gt/gen6_ppgtt.c            |  2 +-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c            |  3 ++-
>>   drivers/gpu/drm/i915/gt/gen8_ppgtt.h            |  1 +
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c            |  2 +-
>>   drivers/gpu/drm/i915/gt/intel_gt.c              |  2 +-
>>   drivers/gpu/drm/i915/gt/intel_gtt.c             |  7 +++++++
>>   drivers/gpu/drm/i915/gt/intel_gtt.h             |  9 +++++++++
>>   drivers/gpu/drm/i915/gt/intel_migrate.c         |  4 +++-
>>   drivers/gpu/drm/i915/gt/intel_ppgtt.c           | 17 ++++++++++++-----
>>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c    |  2 +-
>>   drivers/gpu/drm/i915/gvt/scheduler.c            |  2 +-
>>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  4 ++--
>>   14 files changed, 44 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> index ebd775cb1661..b394954726b0 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>> @@ -1559,7 +1559,7 @@ i915_gem_create_context(struct drm_i915_private *i915,
>>   	} else if (HAS_FULL_PPGTT(i915)) {
>>   		struct i915_ppgtt *ppgtt;
>>   
>> -		ppgtt = i915_ppgtt_create(&i915->gt, 0);
>> +		ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
>>   		if (IS_ERR(ppgtt)) {
>>   			drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
>>   				PTR_ERR(ppgtt));
>> @@ -1742,7 +1742,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
>>   	if (args->flags)
>>   		return -EINVAL;
>>   
>> -	ppgtt = i915_ppgtt_create(&i915->gt, 0);
>> +	ppgtt = i915_ppgtt_create(&i915->gt, 0, 0);
>>   	if (IS_ERR(ppgtt))
>>   		return PTR_ERR(ppgtt);
>>   
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> index bd8dc1a28022..c1b86c7a4754 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
>> @@ -1764,7 +1764,7 @@ int i915_gem_huge_page_mock_selftests(void)
>>   	mkwrite_device_info(dev_priv)->ppgtt_type = INTEL_PPGTT_FULL;
>>   	mkwrite_device_info(dev_priv)->ppgtt_size = 48;
>>   
>> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
>> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>>   	if (IS_ERR(ppgtt)) {
>>   		err = PTR_ERR(ppgtt);
>>   		goto out_unlock;
>> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> index c0d149f04949..778472e563aa 100644
>> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
>> @@ -443,7 +443,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
>>   
>>   	mutex_init(&ppgtt->flush);
>>   
>> -	ppgtt_init(&ppgtt->base, gt, 0);
>> +	ppgtt_init(&ppgtt->base, gt, 0, 0);
>>   	ppgtt->base.vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen6_pte_t));
>>   	ppgtt->base.vm.top = 1;
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> index 312b2267bf87..dfca803b4ff1 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -912,6 +912,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm)
>>    *
>>    */
>>   struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>> +				     unsigned long vm_flags,
>>   				     unsigned long lmem_pt_obj_flags)
>>   {
>>   	struct i915_ppgtt *ppgtt;
>> @@ -921,7 +922,7 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>>   	if (!ppgtt)
>>   		return ERR_PTR(-ENOMEM);
>>   
>> -	ppgtt_init(ppgtt, gt, lmem_pt_obj_flags);
>> +	ppgtt_init(ppgtt, gt, vm_flags, lmem_pt_obj_flags);
>>   	ppgtt->vm.top = i915_vm_is_4lvl(&ppgtt->vm) ? 3 : 2;
>>   	ppgtt->vm.pd_shift = ilog2(SZ_4K * SZ_4K / sizeof(gen8_pte_t));
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> index f541d19264b4..c0af12593576 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.h
>> @@ -13,6 +13,7 @@ struct intel_gt;
>>   enum i915_cache_level;
>>   
>>   struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
>> +				     unsigned long vm_flags,
>>   				     unsigned long lmem_pt_obj_flags);
>>   
>>   u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index 47f88f031749..938af60fd32f 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -661,7 +661,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt)
>>   	struct i915_ppgtt *ppgtt;
>>   	int err;
>>   
>> -	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0);
>> +	ppgtt = i915_ppgtt_create(ggtt->vm.gt, 0, 0);
>>   	if (IS_ERR(ppgtt))
>>   		return PTR_ERR(ppgtt);
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>> index 510cda6a163f..991a514a1dc3 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>> @@ -484,7 +484,7 @@ static void intel_gt_fini_scratch(struct intel_gt *gt)
>>   static struct i915_address_space *kernel_vm(struct intel_gt *gt)
>>   {
>>   	if (INTEL_PPGTT(gt->i915) > INTEL_PPGTT_ALIASING)
>> -		return &i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY)->vm;
>> +		return &i915_ppgtt_create(gt, 0, I915_BO_ALLOC_PM_EARLY)->vm;
>>   	else
>>   		return i915_vm_get(&gt->ggtt->vm);
>>   }
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> index 5447615fc6f3..d9bf53dc1d85 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
>> @@ -18,6 +18,13 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz)
>>   {
>>   	struct drm_i915_gem_object *obj;
>>   
>> +	if (vm->vm_flags & I915_VM_GTT_MAPPABLE) {
>> +		struct intel_memory_region *mr =
>> +			vm->i915->mm.regions[INTEL_REGION_LMEM];
>> +
>> +		sz = max_t(int, sz, mr->min_page_size);
>> +	}
>> +
>>   	/*
>>   	 * To avoid severe over-allocation when dealing with min_page_size
>>   	 * restrictions, we override that behaviour here by allowing an object
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> index cbc0b5266cb4..eee97b46a1f9 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>> @@ -266,6 +266,13 @@ struct i915_address_space {
>>   	u8 pd_shift;
>>   	u8 scratch_order;
>>   
>> +/*
>> + * Paging structures are going to be accessed via the GTT itself, and therefore
>> + * might need special alignment.
>> + */
>> +#define I915_VM_GTT_MAPPABLE BIT(0)
>> +	unsigned long vm_flags;
>> +
>>   	/* Flags used when creating page-table objects for this vm */
>>   	unsigned long lmem_pt_obj_flags;
>>   
>> @@ -543,6 +550,7 @@ i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
>>   }
>>   
>>   void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>> +		unsigned long vm_flags,
>>   		unsigned long lmem_pt_obj_flags);
>>   
>>   int i915_ggtt_probe_hw(struct drm_i915_private *i915);
>> @@ -562,6 +570,7 @@ static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
>>   int i915_ppgtt_init_hw(struct intel_gt *gt);
>>   
>>   struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
>> +				     unsigned long vm_flags,
>>   				     unsigned long lmem_pt_obj_flags);
>>   
>>   void i915_ggtt_suspend_vm(struct i915_address_space *vm);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> index 2d3188a398dd..d553b76b1168 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> @@ -78,7 +78,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>>   	 * TODO: Add support for huge LMEM PTEs
>>   	 */
>>   
>> -	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
>> +	vm = i915_ppgtt_create(gt,
>> +			       I915_VM_GTT_MAPPABLE,
>> +			       I915_BO_ALLOC_PM_EARLY);
>>   	if (IS_ERR(vm))
>>   		return ERR_CAST(vm);
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> index b8238f5bc8b1..1218024dfd57 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>> @@ -156,20 +156,25 @@ int i915_ppgtt_init_hw(struct intel_gt *gt)
>>   }
>>   
>>   static struct i915_ppgtt *
>> -__ppgtt_create(struct intel_gt *gt, unsigned long lmem_pt_obj_flags)
>> +__ppgtt_create(struct intel_gt *gt,
>> +	       unsigned long vm_flags,
>> +	       unsigned long lmem_pt_obj_flags)
>>   {
>> -	if (GRAPHICS_VER(gt->i915) < 8)
>> +	if (GRAPHICS_VER(gt->i915) < 8) {
>> +		WARN_ON_ONCE(vm_flags);
>>   		return gen6_ppgtt_create(gt);
>> -	else
>> -		return gen8_ppgtt_create(gt, lmem_pt_obj_flags);
>> +	} else {
>> +		return gen8_ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
>> +	}
>>   }
>>   
>>   struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
>> +				     unsigned long vm_flags,
>>   				     unsigned long lmem_pt_obj_flags)
>>   {
>>   	struct i915_ppgtt *ppgtt;
>>   
>> -	ppgtt = __ppgtt_create(gt, lmem_pt_obj_flags);
>> +	ppgtt = __ppgtt_create(gt, vm_flags, lmem_pt_obj_flags);
>>   	if (IS_ERR(ppgtt))
>>   		return ppgtt;
>>   
>> @@ -301,6 +306,7 @@ int ppgtt_set_pages(struct i915_vma *vma)
>>   }
>>   
>>   void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>> +		unsigned long vm_flags,
>>   		unsigned long lmem_pt_obj_flags)
>>   {
>>   	struct drm_i915_private *i915 = gt->i915;
>> @@ -309,6 +315,7 @@ void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
>>   	ppgtt->vm.i915 = i915;
>>   	ppgtt->vm.dma = i915->drm.dev;
>>   	ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size);
>> +	ppgtt->vm.vm_flags = vm_flags;
>>   	ppgtt->vm.lmem_pt_obj_flags = lmem_pt_obj_flags;
>>   
>>   	dma_resv_init(&ppgtt->vm._resv);
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> index e5ad4d5a91c0..8c299189e9cb 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
>> @@ -1600,7 +1600,7 @@ static int igt_reset_evict_ppgtt(void *arg)
>>   	if (INTEL_PPGTT(gt->i915) < INTEL_PPGTT_FULL)
>>   		return 0;
>>   
>> -	ppgtt = i915_ppgtt_create(gt, 0);
>> +	ppgtt = i915_ppgtt_create(gt, 0, 0);
>>   	if (IS_ERR(ppgtt))
>>   		return PTR_ERR(ppgtt);
>>   
>> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
>> index 6c804102528b..d726eee3aba5 100644
>> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
>> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
>> @@ -1386,7 +1386,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu)
>>   	enum intel_engine_id i;
>>   	int ret;
>>   
>> -	ppgtt = i915_ppgtt_create(&i915->gt, I915_BO_ALLOC_PM_EARLY);
>> +	ppgtt = i915_ppgtt_create(&i915->gt, 0, I915_BO_ALLOC_PM_EARLY);
>>   	if (IS_ERR(ppgtt))
>>   		return PTR_ERR(ppgtt);
>>   
>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> index fdb4bf88293b..3bcd2bb85d10 100644
>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
>> @@ -155,7 +155,7 @@ static int igt_ppgtt_alloc(void *arg)
>>   	if (!HAS_PPGTT(dev_priv))
>>   		return 0;
>>   
>> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
>> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>>   	if (IS_ERR(ppgtt))
>>   		return PTR_ERR(ppgtt);
>>   
>> @@ -1083,7 +1083,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>>   	if (IS_ERR(file))
>>   		return PTR_ERR(file);
>>   
>> -	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0);
>> +	ppgtt = i915_ppgtt_create(&dev_priv->gt, 0, 0);
>>   	if (IS_ERR(ppgtt)) {
>>   		err = PTR_ERR(ppgtt);
>>   		goto out_free;
>> -- 
>> 2.31.1
>>

* Re: [PATCH v2 4/8] drm/i915/migrate: fix offset calculation
  2021-12-03 17:30     ` [Intel-gfx] " Ramalingam C
@ 2021-12-03 17:39       ` Matthew Auld
  -1 siblings, 0 replies; 39+ messages in thread
From: Matthew Auld @ 2021-12-03 17:39 UTC (permalink / raw)
  To: Ramalingam C
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 03/12/2021 17:30, Ramalingam C wrote:
> On 2021-12-03 at 12:24:22 +0000, Matthew Auld wrote:
>> Ensure we add the engine base only after we calculate the qword offset
>> into the PTE window.
> 
> So we didn't hit this issue because we were always using the
> engine->instance 0!?

Yes, AFAIK.

> 
> Looks good to me
> 
> Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
> 
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Ramalingam C <ramalingam.c@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> index d553b76b1168..cb0bb3b94644 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
>> @@ -284,10 +284,10 @@ static int emit_pte(struct i915_request *rq,
>>   	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
>>   
>>   	/* Compute the page directory offset for the target address range */
>> -	offset += (u64)rq->engine->instance << 32;
>>   	offset >>= 12;
>>   	offset *= sizeof(u64);
>>   	offset += 2 * CHUNK_SZ;
>> +	offset += (u64)rq->engine->instance << 32;
>>   
>>   	cs = intel_ring_begin(rq, 6);
>>   	if (IS_ERR(cs))
>> -- 
>> 2.31.1
>>

* Re: [PATCH v2 6/8] drm/i915/selftests: handle object rounding
  2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 17:40     ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 17:40 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 12:24:24 +0000, Matthew Auld wrote:
> Ensure we account for any object rounding due to min_page_size
> restrictions.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>

> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_migrate.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index 12ef2837c89b..e21787301bbd 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -49,6 +49,7 @@ static int copy(struct intel_migrate *migrate,
>  	if (IS_ERR(src))
>  		return 0;
>  
> +	sz = src->base.size;
>  	dst = i915_gem_object_create_internal(i915, sz);
>  	if (IS_ERR(dst))
>  		goto err_free_src;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/8] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  2021-12-03 17:31       ` [Intel-gfx] " Matthew Auld
@ 2021-12-03 17:45         ` Ramalingam C
  -1 siblings, 0 replies; 39+ messages in thread
From: Ramalingam C @ 2021-12-03 17:45 UTC (permalink / raw)
  To: Matthew Auld
  Cc: bob.beckett, Thomas Hellström, intel-gfx, adrian.larumbe, dri-devel

On 2021-12-03 at 17:31:11 +0000, Matthew Auld wrote:
> On 03/12/2021 16:59, Ramalingam C wrote:
> > On 2021-12-03 at 12:24:20 +0000, Matthew Auld wrote:
> > > If this is LMEM then we get a 32 entry PT, with each PTE pointing to
> > > some 64K block of memory, otherwise it's just the usual 512 entry PT.
> > > This very much assumes the caller knows what they are doing.
> > > 
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Cc: Ramalingam C <ramalingam.c@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++--
> > >   1 file changed, 48 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > index bd3ca0996a23..312b2267bf87 100644
> > > --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> > > @@ -728,13 +728,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
> > >   		gen8_pdp_for_page_index(vm, idx);
> > >   	struct i915_page_directory *pd =
> > >   		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> > > +	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
> > >   	gen8_pte_t *vaddr;
> > > -	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
> > > +	GEM_BUG_ON(pt->is_compact);
> > 
> > Do we have compact PT for smem with 64k pages?
> 
> It's technically possible but we don't bother trying to support it in the
> driver.
Ok.

Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
> 
> > 
> > > +
> > > +	vaddr = px_vaddr(pt);
> > >   	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
> > >   	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
> > >   }
> > > +static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
> > > +					    dma_addr_t addr,
> > > +					    u64 offset,
> > > +					    enum i915_cache_level level,
> > > +					    u32 flags)
> > > +{
> > > +	u64 idx = offset >> GEN8_PTE_SHIFT;
> > > +	struct i915_page_directory * const pdp =
> > > +		gen8_pdp_for_page_index(vm, idx);
> > > +	struct i915_page_directory *pd =
> > > +		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
> > > +	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
> > > +	gen8_pte_t *vaddr;
> > > +
> > > +	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
> > > +	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
> > > +
> > > +	if (!pt->is_compact) {
> > > +		vaddr = px_vaddr(pd);
> > > +		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
> > > +		pt->is_compact = true;
> > > +	}
> > > +
> > > +	vaddr = px_vaddr(pt);
> > > +	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
> > > +}
> > > +
> > > +static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
> > > +				       dma_addr_t addr,
> > > +				       u64 offset,
> > > +				       enum i915_cache_level level,
> > > +				       u32 flags)
> > > +{
> > > +	if (flags & PTE_LM)
> > > +		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
> > > +						       level, flags);
> > > +
> > > +	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
> > Matt,
> > 
> > Is this call to gen8_*** for insertion of smem PTE entries on the
> > 64K-capable platforms like DG2?
> 
> Yeah, this just falls back to the generic 512 entry layout for the PT.
> 
> > 
> > Ram
> > 
> > > +}
> > > +
> > >   static int gen8_init_scratch(struct i915_address_space *vm)
> > >   {
> > >   	u32 pte_flags;
> > > @@ -937,7 +980,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
> > >   	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
> > >   	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
> > > -	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
> > > +	if (HAS_64K_PAGES(gt->i915))
> > > +		ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry;
> > > +	else
> > > +		ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
> > >   	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
> > >   	ppgtt->vm.clear_range = gen8_ppgtt_clear;
> > >   	ppgtt->vm.foreach = gen8_ppgtt_foreach;
> > > -- 
> > > 2.31.1
> > > 

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-12-03 17:42 UTC | newest]

Thread overview: 39+ messages
2021-12-03 12:24 [PATCH v2 0/8] DG2 accelerated migration/clearing support Matthew Auld
2021-12-03 12:24 ` [Intel-gfx] " Matthew Auld
2021-12-03 12:24 ` [PATCH v2 1/8] drm/i915/migrate: don't check the scratch page Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 16:38   ` Ramalingam C
2021-12-03 16:38     ` [Intel-gfx] " Ramalingam C
2021-12-03 12:24 ` [PATCH v2 2/8] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 16:59   ` Ramalingam C
2021-12-03 16:59     ` [Intel-gfx] " Ramalingam C
2021-12-03 17:31     ` Matthew Auld
2021-12-03 17:31       ` [Intel-gfx] " Matthew Auld
2021-12-03 17:45       ` Ramalingam C
2021-12-03 17:45         ` [Intel-gfx] " Ramalingam C
2021-12-03 12:24 ` [PATCH v2 3/8] drm/i915/gtt: add gtt mappable plumbing Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 17:25   ` Ramalingam C
2021-12-03 17:25     ` [Intel-gfx] " Ramalingam C
2021-12-03 17:38     ` Matthew Auld
2021-12-03 17:38       ` [Intel-gfx] " Matthew Auld
2021-12-03 12:24 ` [PATCH v2 4/8] drm/i915/migrate: fix offset calculation Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 17:30   ` Ramalingam C
2021-12-03 17:30     ` [Intel-gfx] " Ramalingam C
2021-12-03 17:39     ` Matthew Auld
2021-12-03 17:39       ` [Intel-gfx] " Matthew Auld
2021-12-03 12:24 ` [PATCH v2 5/8] drm/i915/migrate: fix length calculation Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 17:36   ` Ramalingam C
2021-12-03 17:36     ` [Intel-gfx] " Ramalingam C
2021-12-03 12:24 ` [PATCH v2 6/8] drm/i915/selftests: handle object rounding Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 17:40   ` Ramalingam C
2021-12-03 17:40     ` [Intel-gfx] " Ramalingam C
2021-12-03 12:24 ` [PATCH v2 7/8] drm/i915/migrate: add acceleration support for DG2 Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 12:24 ` [PATCH v2 8/8] drm/i915/migrate: turn on acceleration " Matthew Auld
2021-12-03 12:24   ` [Intel-gfx] " Matthew Auld
2021-12-03 14:40 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for DG2 accelerated migration/clearing support Patchwork
