* [PATCH v2 00/18] 48-bit PPGTT
@ 2015-06-10 16:46 Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands Michel Thierry
                   ` (19 more replies)
  0 siblings, 20 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

These are the rebased patches, after Mika's ppgtt clean-up series (and reusing
the macros added). New functions also follow these changes.

In order to expand the GPU address space, a 4th level translation is added, the
Page Map Level 4 (PML4). This PML4 has 512 PML4 Entries (PML4E), PML4[0-511],
each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
used, only adding the 4th level changes. I also updated some remaining
variables that were 32b only.
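
To make the 4-level walk concrete, here is a minimal user-space sketch of how a
48-bit GPU virtual address splits into the four table indices. The shifts follow
the layout comment added to i915_gem_gtt.h in patch 04 (47:39 | 38:30 | 29:21 |
20:12 | 11:0). The struct and function names are hypothetical and are not part
of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts follow the layout comment in patch 04:
 * 47:39 PML4E | 38:30 PDPE | 29:21 PDE | 20:12 PTE | 11:0 offset */
#define PML4E_SHIFT 39
#define PDPE_SHIFT  30
#define PDE_SHIFT   21
#define PTE_SHIFT   12
#define INDEX_MASK  0x1ffu /* 9 index bits per level */

/* Hypothetical helper type, for illustration only. */
struct va_indices {
	uint32_t pml4e, pdpe, pde, pte, offset;
};

static struct va_indices split_va(uint64_t addr)
{
	struct va_indices idx;

	assert(addr < (1ULL << 48)); /* only 48 bits are translated */

	idx.pml4e  = (uint32_t)((addr >> PML4E_SHIFT) & INDEX_MASK);
	idx.pdpe   = (uint32_t)((addr >> PDPE_SHIFT) & INDEX_MASK);
	idx.pde    = (uint32_t)((addr >> PDE_SHIFT) & INDEX_MASK);
	idx.pte    = (uint32_t)((addr >> PTE_SHIFT) & INDEX_MASK);
	idx.offset = (uint32_t)(addr & 0xfffu);
	return idx;
}
```

For example, split_va(1ULL << 32), the first byte above 4 GiB, yields pdpe = 4
with every other field zero, i.e. the first entry beyond the four legacy PDPEs.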

There are 2 hardware workarounds needed to allow correct operation with 48b
addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). I added a
flag (I915_EXEC_SUPPORT_48BADDRESS) that will indicate if a given object can be
allocated outside the first 4 PDPs; if that's the case, insert will use the
DRM_MM_CREATE_TOP flag. I'm also including an igt test for this change.

This feature is only available in BDW and Gen9, requires LRC submission
mode (execlists) and setting i915.enable_ppgtt=3.

Also note that this expanded address space is only available for full PPGTT,
aliasing PPGTT remains 32b.

Michel Thierry (18):
  drm/i915/lrc: Update PDPx registers with lri commands
  drm/i915/gtt: Switch gen8_free_page_tables params
  drm/i915: Remove unnecessary gen8_clamp_pd
  drm/i915/gen8: Make pdp allocation more dynamic
  drm/i915/gen8: Abstract PDP usage
  drm/i915/gen8: Add dynamic page trace events
  drm/i915/gen8: implement alloc/free for 4lvl
  drm/i915/gen8: Add 4 level switching infrastructure and lrc support
  drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
  drm/i915/gen8: Pass sg_iter through pte inserts
  drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  drm/i915/gen8: Initialize PDPs
  drm/i915: Expand error state's address width to 64b
  drm/i915/gen8: Add ppgtt info and debug_dump
  drm/i915: object size needs to be u64
  drm/i915: Check against correct user_size limit in 48b ppgtt mode
  drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  drm/i915/gen8: Flip the 48b switch

 drivers/gpu/drm/i915/i915_debugfs.c        |  18 +-
 drivers/gpu/drm/i915/i915_drv.h            |  12 +-
 drivers/gpu/drm/i915/i915_gem.c            |  24 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  36 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 674 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  66 ++-
 drivers/gpu/drm/i915/i915_gem_userptr.c    |  12 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  17 +-
 drivers/gpu/drm/i915/i915_params.c         |   2 +-
 drivers/gpu/drm/i915/i915_reg.h            |   5 +-
 drivers/gpu/drm/i915/i915_trace.h          |  16 +
 drivers/gpu/drm/i915/intel_lrc.c           |  93 +++-
 include/uapi/drm/i915_drm.h                |   4 +-
 13 files changed, 809 insertions(+), 170 deletions(-)

-- 
2.4.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-11 18:04   ` Mika Kuoppala
  2015-06-26 12:46   ` [PATCH v3] " Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params Michel Thierry
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

A safer way to update the PDPx registers is to send LRI commands, emitted
in the ring before the batchbuffer start. Otherwise, the ctx must be idle
before trying to change anything (other than the ring tail) in the ctx image.
An example where the ctx won't be idle is lite-restore.
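
As a rough illustration of the resulting command stream, the sketch below fills
a plain array with one LRI header, eight (register, value) pairs and a trailing
no-op. That is 18 dwords in total, matching the num_lri_cmds * 2 + 2
reservation in the patch. The header encoding, the register offsets and the
helper names are placeholders chosen for the example, not the real i915
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_PDPES 4

/* Placeholder encodings: stand-ins for MI_LOAD_REGISTER_IMM(),
 * GEN8_RING_PDP_UDW/LDW() and MI_NOOP, NOT the hardware values. */
#define FAKE_LRI_HEADER(n) (0xAA000000u | (uint32_t)(n))
#define FAKE_PDP_LDW(i)    (0x1000u + (uint32_t)(i) * 8u)
#define FAKE_PDP_UDW(i)    (FAKE_PDP_LDW(i) + 4u)
#define FAKE_NOOP          0u

/* Writes the stream into out[] and returns the number of dwords used. */
static size_t emit_pdps(uint32_t *out, size_t cap,
			const uint64_t pd_daddr[LEGACY_PDPES])
{
	const size_t num_lri_cmds = LEGACY_PDPES * 2; /* UDW + LDW per PDP */
	size_t n = 0;
	int i;

	assert(cap >= num_lri_cmds * 2 + 2);

	out[n++] = FAKE_LRI_HEADER(num_lri_cmds);
	for (i = LEGACY_PDPES - 1; i >= 0; i--) {
		out[n++] = FAKE_PDP_UDW(i);
		out[n++] = (uint32_t)(pd_daddr[i] >> 32); /* upper_32_bits() */
		out[n++] = FAKE_PDP_LDW(i);
		out[n++] = (uint32_t)pd_daddr[i];         /* lower_32_bits() */
	}
	out[n++] = FAKE_NOOP; /* trailing no-op, as in the patch */

	return n;
}
```

The loop runs from PDP3 down to PDP0, the same order the patch uses.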

This patch depends on [1], and has the advantage that it doesn't require
pre-allocating the top PDPs, as [2] does.

[1] http://mid.gmane.org/1432314314-23530-2-git-send-email-mika.kuoppala@intel.com
[2] http://mid.gmane.org/1432314314-23530-3-git-send-email-mika.kuoppala@intel.com

v2: Combine lri writes (and save 8 commands). (Mika)

Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 43 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 626949a..51c0e06 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1116,13 +1116,56 @@ static int gen9_init_render_ring(struct intel_engine_cs *ring)
 	return init_workarounds_ring(ring);
 }
 
+static int intel_logical_ring_emit_pdps(struct intel_engine_cs *ring,
+					struct intel_context *ctx)
+{
+	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
+	int i, ret;
+
+	ret = intel_logical_ring_begin(ringbuf, ctx, num_lri_cmds * 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
+
+		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_UDW(ring, i));
+		intel_logical_ring_emit(ringbuf, upper_32_bits(pd_daddr));
+		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_LDW(ring, i));
+		intel_logical_ring_emit(ringbuf, lower_32_bits(pd_daddr));
+	}
+
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
 			      struct intel_context *ctx,
 			      u64 offset, unsigned dispatch_flags)
 {
+	struct intel_engine_cs *ring = ringbuf->ring;
 	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
 
+	/* Don't rely on hw updating PDPs, especially in lite-restore.
+	 * Ideally, we should set Force PD Restore in ctx descriptor,
+	 * but we can't. Force Restore would be a second option, but
+	 * it is unsafe in case of lite-restore (because the ctx is
+	 * not idle). */
+	if (ctx->ppgtt &&
+	    (intel_ring_flag(ring) & ctx->ppgtt->pd_dirty_rings)) {
+		ret = intel_logical_ring_emit_pdps(ring, ctx);
+		if (ret)
+			return ret;
+
+		ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
+	}
+
 	ret = intel_logical_ring_begin(ringbuf, ctx, 4);
 	if (ret)
 		return ret;
-- 
2.4.0


* [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-11 18:05   ` Mika Kuoppala
  2015-06-10 16:46 ` [PATCH v2 03/18] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

After Mika's ppgtt cleanup series, all the other free functions have
drm_device as the first parameter, except this one.

No functional changes.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8f79125..8314e59 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -766,7 +766,8 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_px(ppgtt, pt_vaddr);
 }
 
-static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
+static void gen8_free_page_tables(struct drm_device *dev,
+				  struct i915_page_directory *pd)
 {
 	int i;
 
@@ -792,7 +793,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		gen8_free_page_tables(ppgtt->base.dev,
+				      ppgtt->pdp.page_directory[i]);
 		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
-- 
2.4.0


* [PATCH v2 03/18] drm/i915: Remove unnecessary gen8_clamp_pd
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 04/18] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory boundary.

Furthermore, i915_pte_count also restricts to the next page table
boundary.
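
The redundancy can be checked in isolation. The user-space sketch below
reimplements the removed gen8_clamp_pd() next to a model of the
gen8_for_each_pde loop, and compares the length each one covers. It assumes
gen8_for_each_pde has the same shape as gen8_for_each_pdpe, with GEN8_PDE_SHIFT
and a bound of 512 entries per page directory. ALIGN_UP and the function names
are local stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define PDPE_SHIFT 30
#define PDE_SHIFT  21
#define PDES       512 /* assumed number of PDEs per page directory */

/* Stand-in for the kernel's ALIGN(); a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))

/* Same logic as the removed gen8_clamp_pd(). */
static uint64_t clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = ALIGN_UP(start + 1, 1ULL << PDPE_SHIFT);

	if (next_pd > start + length)
		return length;
	return next_pd - start;
}

/* Bytes walked by a gen8_for_each_pde style loop given the unclamped length. */
static uint64_t covered_by_pde_loop(uint64_t start, uint64_t length)
{
	const uint64_t begin = start;
	uint64_t temp;
	uint32_t iter;

	for (iter = (uint32_t)((start >> PDE_SHIFT) & 0x1ff);
	     length > 0 && iter < PDES;
	     iter++) {
		temp = ALIGN_UP(start + 1, 1ULL << PDE_SHIFT) - start;
		if (temp > length)
			temp = length;
		start += temp;
		length -= temp;
	}
	return start - begin;
}

/* The loop bound alone limits the walk to what the clamp allowed. */
static void check_same(uint64_t start, uint64_t length)
{
	assert(covered_by_pde_loop(start, length) == clamp_pd(start, length));
}
```

Calling check_same() with a range that straddles a 1 GiB boundary, for instance
a start 4 KiB below 1 GiB and a length of 2 MiB, exercises the case the clamp
was meant to handle.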

v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h | 11 -----------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8314e59..d8afda5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1023,7 +1023,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		gen8_pde_t *const page_directory = kmap_px(pd);
 		struct i915_page_table *pt;
-		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_len = length;
 		uint64_t pd_start = start;
 		uint32_t pde;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index ba46374..8ce8894 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -438,17 +438,6 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-/* Clamp length to the next page_directory boundary */
-static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
-{
-	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
-
-	if (next_pd > (start + length))
-		return length;
-
-	return next_pd - start;
-}
-
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.0


* [PATCH v2 04/18] drm/i915/gen8: Make pdp allocation more dynamic
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (2 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 03/18] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 05/18] drm/i915/gen8: Abstract PDP usage Michel Thierry
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier. The patch also introduces the PML4, i.e. the new top level structure
of the page tables.
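
The core of the change is that the PDP's bitmap and directory array are sized
at run time instead of being fixed at GEN8_LEGACY_PDPES. A user-space sketch of
that allocation pattern, with calloc/free standing in for kcalloc/kfree and
with hypothetical names throughout, could look like this:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's BITS_TO_LONGS(). */
#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_SKETCH(n) (((n) + LONG_BITS - 1) / LONG_BITS)

/* Hypothetical mirror of the reworked i915_page_directory_pointer. */
struct pdp_sketch {
	unsigned long *used_pdpes; /* bitmap, one bit per entry */
	void **page_directory;     /* one slot per entry */
};

/* Returns 0 on success, -1 if either allocation fails. */
static int pdp_sketch_init(struct pdp_sketch *pdp, size_t pdpes)
{
	assert(pdpes > 0);

	pdp->used_pdpes = calloc(BITS_TO_LONGS_SKETCH(pdpes),
				 sizeof(unsigned long));
	if (!pdp->used_pdpes)
		return -1;

	pdp->page_directory = calloc(pdpes, sizeof(*pdp->page_directory));
	if (!pdp->page_directory) {
		free(pdp->used_pdpes);
		pdp->used_pdpes = NULL; /* leave the struct clean on failure */
		return -1;
	}
	return 0;
}

static void pdp_sketch_fini(struct pdp_sketch *pdp)
{
	free(pdp->used_pdpes);
	free(pdp->page_directory);
	pdp->used_pdpes = NULL;
	pdp->page_directory = NULL;
}
```

The same init routine then serves both the 4-entry legacy case and the
512-entry case selected by I915_PDPES_PER_PDP().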

v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).

v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.

v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.

v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.

v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and follow
his nomenclature in pdp functions (there is no alloc_pdp yet).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_drv.h     |   7 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 125 ++++++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  41 +++++++++---
 3 files changed, 135 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9adfd12..83b7530 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2436,7 +2436,12 @@ struct drm_i915_cmd_table {
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define USES_PPGTT(dev)		(i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt >= 2)
+#ifdef CONFIG_X86_64
+# define USES_FULL_48BIT_PPGTT(dev)	(i915.enable_ppgtt == 3)
+#else
+# define USES_FULL_48BIT_PPGTT(dev)	false
+#endif
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d8afda5..85e49c5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,13 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 {
 	bool has_aliasing_ppgtt;
 	bool has_full_ppgtt;
+	bool has_full_64bit_ppgtt;
 
 	has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+	has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
+			       (IS_BROADWELL(dev) ||
+				INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
 
 	if (intel_vgpu_active(dev))
 		has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +129,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	if (enable_ppgtt == 2 && has_full_ppgtt)
 		return 2;
 
+	if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+		return 3;
+
 #ifdef CONFIG_INTEL_IOMMU
 	/* Disable ppgtt on SNB if VT-d is on. */
 	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -488,6 +495,45 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 	fill_px(vm->dev, pd, scratch_pde);
 }
 
+static int __pdp_init(struct drm_device *dev,
+		      struct i915_page_directory_pointer *pdp)
+{
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+	pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+				  sizeof(unsigned long),
+				  GFP_KERNEL);
+	if (!pdp->used_pdpes)
+		return -ENOMEM;
+
+	pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
+				      GFP_KERNEL);
+	if (!pdp->page_directory) {
+		kfree(pdp->used_pdpes);
+		/* the PDP might be the statically allocated top level. Keep it
+		 * as clean as possible */
+		pdp->used_pdpes = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void __pdp_fini(struct i915_page_directory_pointer *pdp)
+{
+	kfree(pdp->used_pdpes);
+	kfree(pdp->page_directory);
+	pdp->page_directory = NULL;
+}
+
+static void free_pdp(struct drm_device *dev,
+		     struct i915_page_directory_pointer *pdp)
+{
+	__pdp_fini(pdp);
+	if (USES_FULL_48BIT_PPGTT(dev))
+		kfree(pdp);
+}
+
 #define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
 
 static int alloc_scratch_page(struct i915_address_space *vm)
@@ -739,9 +785,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-			break;
-
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
@@ -789,7 +832,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -798,6 +842,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
+	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
 	cleanup_scratch(vm);
 }
 
@@ -889,8 +934,9 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	struct i915_page_directory *pd;
 	uint64_t temp;
 	uint32_t pdpe;
+	uint32_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
 
-	WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+	WARN_ON(!bitmap_empty(new_pds, pdpes));
 
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		if (pd)
@@ -908,18 +954,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_pds, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
+		       uint32_t pdpes)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+	for (i = 0; i < pdpes; i++)
 		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
@@ -930,23 +977,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
  */
 static
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts)
+					 unsigned long ***new_pts,
+					 uint32_t pdpes)
 {
 	int i;
 	unsigned long *pds;
 	unsigned long **pts;
 
-	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
 	if (!pds)
 		return -ENOMEM;
 
-	pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
+	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
 	if (!pts) {
 		kfree(pds);
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for (i = 0; i < pdpes; i++) {
 		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
 				 sizeof(unsigned long), GFP_KERNEL);
 		if (!pts[i])
@@ -959,7 +1007,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts);
+	free_gen8_temp_bitmaps(pds, pts, pdpes);
 	return -ENOMEM;
 }
 
@@ -984,6 +1032,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
+	uint32_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
 	/* Wrap is never okay since we can only represent 48b, and we don't
@@ -995,7 +1044,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	if (WARN_ON(start + length > ppgtt->base.total))
 		return -ENODEV;
 
-	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
 	if (ret)
 		return ret;
 
@@ -1003,7 +1052,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
 					new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
@@ -1057,7 +1106,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
@@ -1067,10 +1116,10 @@ err_out:
 			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
 	}
 
-	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
@@ -1101,7 +1150,8 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 	/* We allocate temp bitmap for page tables for no gain
 	 * but as this is for init only, lets keep the things simple
 	 */
-	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables,
+				      GEN8_LEGACY_PDPES);
 	if (ret)
 		return ret;
 
@@ -1112,7 +1162,8 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 						0, 1ULL << 32,
 						new_page_dirs);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables,
+			       GEN8_LEGACY_PDPES);
 
 	/* mark all pdps as used, otherwise we won't clean them correctly */
 	bitmap_fill(ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES);
@@ -1132,7 +1183,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	int ret;
 
 	ppgtt->base.start = 0;
-	ppgtt->base.total = 1ULL << 32;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
@@ -1146,17 +1196,36 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	if (hw_wont_flush_pdp_tlbs(ppgtt)) {
-		/* Avoid the tlb flush bug by preallocating
-		 * whole top level pdp structure so it stays
-		 * static even if our va space grows.
-		 */
-		ret = gen8_preallocate_top_level_pdps(ppgtt);
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		ret = __pdp_init(ppgtt->base.dev, &ppgtt->pdp);
+
 		if (ret)
-			return ret;
+			goto clear_scratch;
+
+		ppgtt->base.total = 1ULL << 32;
+		if (hw_wont_flush_pdp_tlbs(ppgtt)) {
+			/* Avoid the tlb flush bug by preallocating
+			 * whole top level pdp structure so it stays
+			 * static even if our va space grows.
+			 * PDP preallocation is only needed in 32-bit mode,
+			 * in 48-bit, there's the one and only PML4.
+			 */
+			ret = gen8_preallocate_top_level_pdps(ppgtt);
+			if (ret)
+				goto clear_pdp;
+		}
+	} else {
+		ppgtt->base.total = 1ULL << 48;
+		return -EPERM; /* Not yet implemented */
 	}
 
 	return 0;
+
+clear_pdp:
+	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
+clear_scratch:
+	cleanup_scratch(&ppgtt->base);
+	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8ce8894..f3135a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
  * PDPE  |  PDE  |  PTE  | offset
  * The difference as compared to normal x86 3 level page table is the PDPEs are
  * programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 |  11:0
+ * PML4E | PDPE  |  PDE  |  PTE  | offset
  */
+#define GEN8_PML4ES_PER_PML4		512
+#define GEN8_PML4E_SHIFT		39
 #define GEN8_PDPE_SHIFT			30
-#define GEN8_PDPE_MASK			0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK			0x1ff
 #define GEN8_PDE_SHIFT			21
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
@@ -98,6 +106,9 @@ typedef uint64_t gen8_pde_t;
 #define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES			I915_PTES(sizeof(gen8_pte_t))
 
+#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+				GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
+
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
 #define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
@@ -235,9 +246,17 @@ struct i915_page_directory {
 };
 
 struct i915_page_directory_pointer {
-	/* struct page *page; */
-	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
-	struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
+	struct i915_page_dma base;
+
+	unsigned long *used_pdpes;
+	struct i915_page_directory **page_directory;
+};
+
+struct i915_pml4 {
+	struct i915_page_dma base;
+
+	DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+	struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
 };
 
 struct i915_address_space {
@@ -335,8 +354,9 @@ struct i915_hw_ppgtt {
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
 	union {
-		struct i915_page_directory_pointer pdp;
-		struct i915_page_directory pd;
+		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
+		struct i915_page_directory_pointer pdp;	/* GEN8+ */
+		struct i915_page_directory pd;		/* GEN6-7 */
 	};
 
 	struct drm_i915_file_private *file_priv;
@@ -430,14 +450,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
-	for (iter = gen8_pdpe_index(start);	\
-	     pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES;	\
+#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b)	\
+	for (iter = gen8_pdpe_index(start); \
+	     pd = (pdp)->page_directory[iter], length > 0 && (iter < b);	\
 	     iter++,				\
 	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.0


* [PATCH v2 05/18] drm/i915/gen8: Abstract PDP usage
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (3 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 04/18] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 06/18] drm/i915/gen8: Add dynamic page trace events Michel Thierry
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.

In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.
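
One way to picture the intended end state is a helper that returns the PDP
responsible for a given address, whatever the top level is. The sketch below is
purely illustrative: the types, the is_48bit field and pdp_for_address() are
hypothetical and do not appear in this patch, which only introduces local pdp
pointers in place of direct ppgtt->pdp accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PML4E_SHIFT 39
#define PML4ES      512

/* Hypothetical stand-in for struct i915_page_directory_pointer. */
struct pdp_stub {
	int id;
};

/* Hypothetical stand-in for the top-level union in struct i915_hw_ppgtt. */
struct ppgtt_sketch {
	bool is_48bit;
	union {
		struct {
			struct pdp_stub *pdps[PML4ES];
		} pml4;              /* 48b: PML4 is the root */
		struct pdp_stub pdp; /* 32b: the PDP itself is the root */
	} top;
};

/* Returns the PDP covering addr, or NULL if none is allocated there. */
static struct pdp_stub *pdp_for_address(struct ppgtt_sketch *ppgtt,
					uint64_t addr)
{
	if (!ppgtt->is_48bit) {
		assert(addr < (1ULL << 32)); /* legacy mode covers 4 GiB */
		return &ppgtt->top.pdp;
	}

	assert(addr < (1ULL << 48));
	return ppgtt->top.pml4.pdps[(addr >> PML4E_SHIFT) & 0x1ff];
}
```

Callers that take a PDP from such a helper no longer need to know which of the
two layouts is in use.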

v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 138 +++++++++++++++++++++++-------------
 1 file changed, 89 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 85e49c5..857e287 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -495,6 +495,25 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 	fill_px(vm->dev, pd, scratch_pde);
 }
 
+/* It's likely we'll map more than one page table at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to pde_encode. The ppgtt is only needed to reuse the kunmap macro. */
+static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory *pd,
+				     uint64_t start,
+				     uint64_t length)
+{
+	gen8_pde_t * const page_directory = kmap_px(pd);
+	struct i915_page_table *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		page_directory[pde] = gen8_pde_encode(px_dma(pt),
+						      I915_CACHE_LLC);
+
+	kunmap_px(ppgtt, page_directory);
+}
+
 static int __pdp_init(struct drm_device *dev,
 		      struct i915_page_directory_pointer *pdp)
 {
@@ -721,6 +740,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_pte_t *pt_vaddr, scratch_pte;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -735,10 +755,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		struct i915_page_directory *pd;
 		struct i915_page_table *pt;
 
-		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+		if (WARN_ON(!pdp->page_directory[pdpe]))
 			continue;
 
-		pd = ppgtt->pdp.page_directory[pdpe];
+		pd = pdp->page_directory[pdpe];
 
 		if (WARN_ON(!pd->page_table[pde]))
 			continue;
@@ -776,6 +796,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -786,7 +807,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_directory *pd = pdp->page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
 			pt_vaddr = kmap_px(pt);
 		}
@@ -832,23 +853,28 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+				continue;
 
-		gen8_free_page_tables(ppgtt->base.dev,
-				      ppgtt->pdp.page_directory[i]);
-		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
+			gen8_free_page_tables(ppgtt->base.dev,
+					      ppgtt->pdp.page_directory[i]);
+			free_pd(ppgtt->base.dev,
+				ppgtt->pdp.page_directory[i]);
+		}
+		free_pdp(ppgtt->base.dev, &ppgtt->pdp);
+	} else {
+		WARN_ON(1); /* to be implemented later */
 	}
 
-	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
 	cleanup_scratch(vm);
 }
 
 /**
  * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pd:		Page directory for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -864,13 +890,15 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 				     struct i915_page_directory *pd,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pts)
 {
-	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_hw_ppgtt *ppgtt =
+	    container_of(vm, struct i915_hw_ppgtt, base);
+	struct drm_device *dev = vm->dev;
 	struct i915_page_table *pt;
 	uint64_t temp;
 	uint32_t pde;
@@ -887,7 +915,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 		if (IS_ERR(pt))
 			goto unwind_out;
 
-		gen8_initialize_pt(&ppgtt->base, pt);
+		gen8_initialize_pt(vm, pt);
 		pd->page_table[pde] = pt;
 		__set_bit(pde, new_pts);
 	}
@@ -903,7 +931,7 @@ unwind_out:
 
 /**
  * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pdp:	Page directory pointer for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -924,17 +952,18 @@ unwind_out:
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
-				     struct i915_page_directory_pointer *pdp,
-				     uint64_t start,
-				     uint64_t length,
-				     unsigned long *new_pds)
+static int
+gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
+				  struct i915_page_directory_pointer *pdp,
+				  uint64_t start,
+				  uint64_t length,
+				  unsigned long *new_pds)
 {
-	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	uint64_t temp;
 	uint32_t pdpe;
-	uint32_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
+	uint32_t pdpes =  I915_PDPES_PER_PDP(vm->dev);
 
 	WARN_ON(!bitmap_empty(new_pds, pdpes));
 
@@ -946,7 +975,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 		if (IS_ERR(pd))
 			goto unwind_out;
 
-		gen8_initialize_pd(&ppgtt->base, pd);
+		gen8_initialize_pd(vm, pd);
 		pdp->page_directory[pdpe] = pd;
 		__set_bit(pdpe, new_pds);
 	}
@@ -1020,13 +1049,15 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start,
-			       uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+				    struct i915_page_directory_pointer *pdp,
+				    uint64_t start,
+				    uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	unsigned long *new_page_dirs, **new_page_tables;
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -1049,16 +1080,15 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return ret;
 
 	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
-					new_page_dirs);
+	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
+						new_page_dirs);
 	if (ret) {
 		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
-	/* For every page directory referenced, allocate page tables */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
 						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
@@ -1067,10 +1097,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	start = orig_start;
 	length = orig_length;
 
-	/* Allocations have completed successfully, so set the bitmaps, and do
-	 * the mappings. */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_pde_t *const page_directory = kmap_px(pd);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		struct i915_page_table *pt;
 		uint64_t pd_len = length;
 		uint64_t pd_start = start;
@@ -1092,18 +1119,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 
 			/* Our pde is now pointing to the pagetable, pt */
 			__set_bit(pde, pd->used_pdes);
-
-			/* Map the PDE to the page table */
-			page_directory[pde] = gen8_pde_encode(px_dma(pt),
-							      I915_CACHE_LLC);
-
-			/* NB: We haven't yet mapped ptes to pages. At this
-			 * point we're still relying on insert_entries() */
 		}
 
-		kunmap_px(ppgtt, page_directory);
-
-		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		__set_bit(pdpe, pdp->used_pdpes);
+		gen8_map_pagetable_range(ppgtt, pd, start, length);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1113,17 +1132,38 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
-			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
+			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
-		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
+		free_pd(dev, pdp->page_directory[pdpe]);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
 
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+				    struct i915_pml4 *pml4,
+				    uint64_t start,
+				    uint64_t length)
+{
+	WARN_ON(1); /* to be implemented later */
+	return 0;
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev))
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+	else
+		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
 /* With some architectures and 32bit legacy mode, hardware pre-loads the
  * top level pdps but the tlb invalidation only invalidates the lower levels.
  * This might lead to hw fetching with stale pdp entries if top level
@@ -1158,7 +1198,7 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 	/* Allocate for all pdps regardless of how the ppgtt
 	 * was defined.
 	 */
-	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp,
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->base, &ppgtt->pdp,
 						0, 1ULL << 32,
 						new_page_dirs);
 
-- 
2.4.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v2 06/18] drm/i915/gen8: Add dynamic page trace events
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (4 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 05/18] drm/i915/gen8: Abstract PDP usage Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 07/18] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

The dynamic page allocation patch series added trace events for GEN6;
this patch adds them for GEN8.

v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  9 ++++++++-
 drivers/gpu/drm/i915/i915_trace.h   | 16 ++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 857e287..736523c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -507,9 +507,14 @@ static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
 	struct i915_page_table *pt;
 	uint64_t temp, pde;
 
-	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
 		page_directory[pde] = gen8_pde_encode(px_dma(pt),
 						      I915_CACHE_LLC);
+		trace_i915_page_table_entry_map(&ppgtt->base, pde, pt,
+						gen8_pte_index(start),
+						gen8_pte_count(start, length),
+						GEN8_PTES);
+	}
 
 	kunmap_px(ppgtt, page_directory);
 }
@@ -918,6 +923,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 		gen8_initialize_pt(vm, pt);
 		pd->page_table[pde] = pt;
 		__set_bit(pde, new_pts);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
 	}
 
 	return 0;
@@ -978,6 +984,7 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 		gen8_initialize_pd(vm, pd);
 		pdp->page_directory[pdpe] = pd;
 		__set_bit(pdpe, new_pds);
+		trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 497cba5..7f68ec3 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -213,6 +213,22 @@ DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
 	     TP_ARGS(vm, pde, start, pde_shift)
 );
 
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+		   TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+		   TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_pointer_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+		   TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+		   TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
2.4.0


* [PATCH v2 07/18] drm/i915/gen8: implement alloc/free for 4lvl
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (5 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 06/18] drm/i915/gen8: Add dynamic page trace events Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 08/18] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.

The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
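
As a rough illustration of how a 48-bit address splits across the four
levels, here is a minimal standalone sketch. GEN8_PML4E_SHIFT (39) and
GEN8_PDPE_SHIFT (30) come from this patch; the PDE/PTE shifts, the 9-bit
index width, and every sk_* name are assumptions made for the example and
are not part of the driver.

```c
#include <stdint.h>

/* PML4E and PDPE shifts match the patch; the rest are assumed values. */
#define SK_PML4E_SHIFT	39
#define SK_PDPE_SHIFT	30
#define SK_PDE_SHIFT	21	/* assumption */
#define SK_PTE_SHIFT	12	/* assumption: 4KB pages */
#define SK_INDEX_MASK	0x1ffu	/* assumption: 512 entries per level */

/* Hypothetical holder for the per-level indices of one address. */
struct sk_walk {
	uint32_t pml4e;
	uint32_t pdpe;
	uint32_t pde;
	uint32_t pte;
};

/* Split an address into the index used at each translation level. */
static struct sk_walk sk_decompose(uint64_t addr)
{
	struct sk_walk w;

	w.pml4e = (uint32_t)((addr >> SK_PML4E_SHIFT) & SK_INDEX_MASK);
	w.pdpe = (uint32_t)((addr >> SK_PDPE_SHIFT) & SK_INDEX_MASK);
	w.pde = (uint32_t)((addr >> SK_PDE_SHIFT) & SK_INDEX_MASK);
	w.pte = (uint32_t)((addr >> SK_PTE_SHIFT) & SK_INDEX_MASK);

	return w;
}
```

Only the pml4e index is new with 4lvl; the three lower indices are what the
existing 3lvl code already consumes, which is why the 4lvl path can hand
each PDP to it unchanged.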

v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.

v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).

v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.

v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.

v6: Simplify gen8_alloc_va_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)

v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.

v8: Change location of pml4_init/fini. It will make next patches
cleaner.

v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
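
The range walk that gen8_for_each_pml4e performs can be sketched outside the
kernel as below. This is an approximation for illustration only: it counts
how many PML4 entries a range touches, using the same "advance to the next
512GB boundary, clamped to the remaining length" stepping as the macro. The
sk_* names are hypothetical.

```c
#include <stdint.h>

#define SK_PML4E_SHIFT	39	/* matches GEN8_PML4E_SHIFT in the patch */
#define SK_PML4ES	512	/* matches GEN8_PML4ES_PER_PML4 */

/* Count the PML4 entries covered by [start, start + length). */
static unsigned int sk_count_pml4es(uint64_t start, uint64_t length)
{
	const uint64_t span = 1ULL << SK_PML4E_SHIFT;
	uint64_t idx = (start >> SK_PML4E_SHIFT) & (SK_PML4ES - 1);
	unsigned int n = 0;

	/* Checking the index bound stops the walk at the last entry. */
	while (length > 0 && idx < SK_PML4ES) {
		/* Bytes left before the next 512GB boundary. */
		uint64_t chunk = span - (start & (span - 1));

		if (chunk > length)
			chunk = length;

		n++;
		idx++;
		start += chunk;
		length -= chunk;
	}

	return n;
}
```

A range that ends exactly on a 512GB boundary touches one entry, while one
byte more touches two.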

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 161 ++++++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
 2 files changed, 145 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 736523c..da1a964 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -550,12 +550,44 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
 	pdp->page_directory = NULL;
 }
 
+static struct
+i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
+{
+	struct i915_page_directory_pointer *pdp;
+	int ret = -ENOMEM;
+
+	WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+	pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
+	if (!pdp)
+		return ERR_PTR(-ENOMEM);
+
+	ret = __pdp_init(dev, pdp);
+	if (ret)
+		goto fail_bitmap;
+
+	ret = setup_px(dev, pdp);
+	if (ret)
+		goto fail_page_m;
+
+	return pdp;
+
+fail_page_m:
+	__pdp_fini(pdp);
+fail_bitmap:
+	kfree(pdp);
+
+	return ERR_PTR(ret);
+}
+
 static void free_pdp(struct drm_device *dev,
 		     struct i915_page_directory_pointer *pdp)
 {
 	__pdp_fini(pdp);
-	if (USES_FULL_48BIT_PPGTT(dev))
+	if (USES_FULL_48BIT_PPGTT(dev)) {
+		cleanup_px(dev, pdp);
 		kfree(pdp);
+	}
 }
 
 #define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
@@ -852,28 +884,46 @@ static void gen8_free_page_tables(struct drm_device *dev,
 	}
 }
 
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
+				    struct i915_page_directory_pointer *pdp)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-				continue;
+	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+		if (WARN_ON(!pdp->page_directory[i]))
+			continue;
 
-			gen8_free_page_tables(ppgtt->base.dev,
-					      ppgtt->pdp.page_directory[i]);
-			free_pd(ppgtt->base.dev,
-				ppgtt->pdp.page_directory[i]);
-		}
-		free_pdp(ppgtt->base.dev, &ppgtt->pdp);
-	} else {
-		WARN_ON(1); /* to be implemented later */
+		gen8_free_page_tables(dev, pdp->page_directory[i]);
+		free_pd(dev, pdp->page_directory[i]);
 	}
 
+	free_pdp(dev, pdp);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+		if (WARN_ON(!ppgtt->pml4.pdps[i]))
+			continue;
+
+		gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
+	}
+
+	cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
+	else
+		gen8_ppgtt_cleanup_4lvl(ppgtt);
+
 	cleanup_scratch(vm);
 }
 
@@ -1155,8 +1205,62 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 				    uint64_t start,
 				    uint64_t length)
 {
-	WARN_ON(1); /* to be implemented later */
+	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
+	uint64_t temp, pml4e;
+	int ret = 0;
+
+	/* Do the pml4 allocations first, so we don't need to track the newly
+	 * allocated tables below the pdp */
+	bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+	/* The pagedirectory and pagetable allocations are done in the shared 3
+	 * and 4 level code. Just allocate the pdps.
+	 */
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		if (!pdp) {
+			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+			pdp = alloc_pdp(vm->dev);
+			if (IS_ERR(pdp))
+				goto err_out;
+
+			pml4->pdps[pml4e] = pdp;
+			__set_bit(pml4e, new_pdps);
+			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
+						   pml4e << GEN8_PML4E_SHIFT,
+						   GEN8_PML4E_SHIFT);
+		}
+	}
+
+	WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+	     "The allocation has spanned more than 512GB. "
+	     "It is highly likely this is incorrect.");
+
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		WARN_ON(!pdp);
+
+		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		if (ret)
+			goto err_out;
+	}
+
+	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+		  GEN8_PML4ES_PER_PML4);
+
 	return 0;
+
+err_out:
+	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+		gen8_ppgtt_cleanup_3lvl(vm->dev, pml4->pdps[pml4e]);
+
+	return ret;
 }
 
 static int gen8_alloc_va_range(struct i915_address_space *vm,
@@ -1165,10 +1269,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	if (!USES_FULL_48BIT_PPGTT(vm->dev))
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-	else
+	if (USES_FULL_48BIT_PPGTT(vm->dev))
 		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	else
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 /* With some architectures and 32bit legacy mode, hardware pre-loads the
@@ -1243,13 +1347,21 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		ret = __pdp_init(false, &ppgtt->pdp);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
+		if (ret)
+			goto clear_scratch;
 
+		ppgtt->base.total = 1ULL << 48;
+	} else {
+		ret = __pdp_init(false, &ppgtt->pdp);
 		if (ret)
 			goto clear_scratch;
 
 		ppgtt->base.total = 1ULL << 32;
+		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
+							      0, 0,
+							      GEN8_PML4E_SHIFT);
 		if (hw_wont_flush_pdp_tlbs(ppgtt)) {
 			/* Avoid the tlb flush bug by preallocating
 			 * whole top level pdp structure so it stays
@@ -1261,9 +1373,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 			if (ret)
 				goto clear_pdp;
 		}
-	} else {
-		ppgtt->base.total = 1ULL << 48;
-		return -EPERM; /* Not yet implemented */
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f3135a1..b038b86 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -95,6 +95,7 @@ typedef uint64_t gen8_pde_t;
  */
 #define GEN8_PML4ES_PER_PML4		512
 #define GEN8_PML4E_SHIFT		39
+#define GEN8_PML4E_MASK			(GEN8_PML4ES_PER_PML4 - 1)
 #define GEN8_PDPE_SHIFT			30
 /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
  * tables */
@@ -458,6 +459,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter)	\
+	for (iter = gen8_pml4e_index(start);	\
+	     pdp = (pml4)->pdps[iter], length > 0 && iter < GEN8_PML4ES_PER_PML4;	\
+	     iter++,				\
+	     temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
@@ -478,8 +487,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
 
 static inline uint32_t gen8_pml4e_index(uint64_t address)
 {
-	WARN_ON(1); /* For 64B */
-	return 0;
+	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
 }
 
 static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
-- 
2.4.0


* [PATCH v2 08/18] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (6 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 07/18] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 09/18] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address of the PML4, while the other PDP registers are ignored.

In LRC, the addressing mode must be specified in every context descriptor.
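
A minimal sketch of the two points above: splitting the PML4 DMA address
into the upper and lower dwords written to PDP0, and placing the addressing
mode in the context descriptor. The shift of 3 is taken from
GEN8_CTX_ADDRESSING_MODE_SHIFT in this patch; the numeric values for the
32b and 64b modes, and all sk_* names, are assumptions for illustration.

```c
#include <stdint.h>

#define SK_ADDRESSING_MODE_SHIFT 3	/* as in the patch */

/* Assumed encodings; the driver's own enum is authoritative. */
enum sk_addressing_mode {
	SK_LEGACY_32B = 1,
	SK_LEGACY_64B = 3,
};

/* Hypothetical pair of dwords for the PDP0 upper/lower registers. */
struct sk_pdp0 {
	uint32_t udw;
	uint32_t ldw;
};

/* Split a PML4 DMA address into the dwords written to PDP0. */
static struct sk_pdp0 sk_split_pml4(uint64_t pml4_dma)
{
	struct sk_pdp0 r;

	r.udw = (uint32_t)(pml4_dma >> 32);
	r.ldw = (uint32_t)(pml4_dma & 0xffffffffu);

	return r;
}

/* Descriptor bits that select the addressing mode for a context. */
static uint64_t sk_mode_bits(int uses_48bit)
{
	uint64_t mode = uses_48bit ? SK_LEGACY_64B : SK_LEGACY_32B;

	return mode << SK_ADDRESSING_MODE_SHIFT;
}
```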

v2: PML4 update in legacy context switch is left for historic reasons;
the preferred mode of operation is with lrc context based submission.

v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)

v4: Squashed lrc-specific code and use a macro to set PML4 register.

v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
 drivers/gpu/drm/i915/i915_reg.h     |  5 +++-
 drivers/gpu/drm/i915/intel_lrc.c    | 58 +++++++++++++++++++++++++++----------
 4 files changed, 96 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index da1a964..cbc6aaf 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,9 @@ static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
 	return pde;
 }
 
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
 static gen6_pte_t snb_pte_encode(dma_addr_t addr,
 				 enum i915_cache_level level,
 				 bool valid, u32 unused)
@@ -590,6 +593,35 @@ static void free_pdp(struct drm_device *dev,
 	}
 }
 
+static void
+gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
+			  struct i915_page_directory_pointer *pdp,
+			  struct i915_page_directory *pd,
+			  int index)
+{
+	gen8_ppgtt_pdpe_t *page_directorypo;
+
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		return;
+
+	page_directorypo = kmap_px(pdp);
+	page_directorypo[index] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
+	kunmap_px(ppgtt, page_directorypo);
+}
+
+static void
+gen8_setup_page_directory_pointer(struct i915_hw_ppgtt *ppgtt,
+				  struct i915_pml4 *pml4,
+				  struct i915_page_directory_pointer *pdp,
+				  int index)
+{
+	gen8_ppgtt_pml4e_t *pagemap = kmap_px(pml4);
+
+	WARN_ON(!USES_FULL_48BIT_PPGTT(ppgtt->base.dev));
+	pagemap[index] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
+	kunmap_px(ppgtt, pagemap);
+}
+
 #define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
 
 static int alloc_scratch_page(struct i915_address_space *vm)
@@ -754,8 +786,8 @@ static int gen8_write_pdp(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct intel_engine_cs *ring)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+				 struct intel_engine_cs *ring)
 {
 	int i, ret;
 
@@ -770,6 +802,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+			      struct intel_engine_cs *ring)
+{
+	return gen8_write_pdp(ring, 0, px_dma(&ppgtt->pml4));
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t start,
 				   uint64_t length,
@@ -1180,6 +1218,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 
 		__set_bit(pdpe, pdp->used_pdpes);
 		gen8_map_pagetable_range(ppgtt, pd, start, length);
+		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1249,6 +1288,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
 		if (ret)
 			goto err_out;
+
+		gen8_setup_page_directory_pointer(ppgtt, pml4, pdp, pml4e);
 	}
 
 	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
@@ -1284,7 +1325,7 @@ static bool hw_wont_flush_pdp_tlbs(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 
-	if (GEN8_CTX_ADDRESSING_MODE != LEGACY_32B_CONTEXT)
+	if (GEN8_CTX_ADDRESSING_MODE(dev) != LEGACY_32B_CONTEXT)
 		return false;
 
 	if (IS_GEN8(dev) || IS_GEN9(dev))
@@ -1341,8 +1382,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.unbind_vma = ppgtt_unbind_vma;
 	ppgtt->base.bind_vma = ppgtt_bind_vma;
 
-	ppgtt->switch_mm = gen8_mm_switch;
-
 	ret = setup_scratch(&ppgtt->base);
 	if (ret)
 		return ret;
@@ -1353,12 +1392,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 			goto clear_scratch;
 
 		ppgtt->base.total = 1ULL << 48;
+		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
 		ret = __pdp_init(false, &ppgtt->pdp);
 		if (ret)
 			goto clear_scratch;
 
 		ppgtt->base.total = 1ULL << 32;
+		ppgtt->switch_mm = gen8_legacy_mm_switch;
 		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
 							      0, 0,
 							      GEN8_PML4E_SHIFT);
@@ -1565,8 +1606,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
 	int j;
 
 	for_each_ring(ring, dev_priv, j) {
+		u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_48B : 0;
 		I915_WRITE(RING_MODE_GEN7(ring),
-			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b038b86..5b04211 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -39,6 +39,8 @@ struct drm_i915_file_private;
 typedef uint32_t gen6_pte_t;
 typedef uint64_t gen8_pte_t;
 typedef uint64_t gen8_pde_t;
+typedef uint64_t gen8_ppgtt_pdpe_t;
+typedef uint64_t gen8_ppgtt_pml4e_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 334324b..7f03a09 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1642,6 +1642,7 @@ enum skl_disp_power_wells {
 #define   GFX_REPLAY_MODE		(1<<11)
 #define   GFX_PSMI_GRANULARITY		(1<<10)
 #define   GFX_PPGTT_ENABLE		(1<<9)
+#define   GEN8_GFX_PPGTT_48B		(1<<7)
 
 #define VLV_DISPLAY_BASE 0x180000
 #define VLV_MIPI_BASE VLV_DISPLAY_BASE
@@ -2792,7 +2793,9 @@ enum {
 };
 
 #define GEN8_CTX_ADDRESSING_MODE_SHIFT	3
-#define GEN8_CTX_ADDRESSING_MODE	LEGACY_32B_CONTEXT
+#define GEN8_CTX_ADDRESSING_MODE(dev)	(USES_FULL_48BIT_PPGTT(dev) ?\
+						LEGACY_64B_CONTEXT :\
+						LEGACY_32B_CONTEXT)
 
 /*
  * Overlay regs
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 51c0e06..55ba5a1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -189,6 +189,11 @@
 	reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
 }
 
+#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
+	reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(px_dma(&ppgtt->pml4)); \
+	reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
+}
+
 enum {
 	FAULT_AND_HANG = 0,
 	FAULT_AND_HALT, /* Debug only */
@@ -258,7 +263,7 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE << GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
 	if (IS_GEN8(ctx_obj->base.dev))
 		desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
@@ -329,10 +334,16 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
-	/* True PPGTT with dynamic page allocation: update PDP registers and
-	 * point the unallocated PDPs to the scratch page
-	 */
-	if (ppgtt) {
+	if (ppgtt && USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* True 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored
+		 */
+		ASSIGN_CTX_PML4(ppgtt, reg_state);
+	} else if (ppgtt) {
+		/* True 32b PPGTT with dynamic page allocation: update PDP
+		 * registers and point the unallocated PDPs to the scratch page
+		 */
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
@@ -1156,12 +1167,16 @@ static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
 	 * Ideally, we should set Force PD Restore in ctx descriptor,
 	 * but we can't. Force Restore would be a second option, but
 	 * it is unsafe in case of lite-restore (because the ctx is
-	 * not idle). */
+	 * not idle). PML4 is allocated during ppgtt init so this is
+	 * not needed in 48-bit.*/
 	if (ctx->ppgtt &&
 	    (intel_ring_flag(ring) & ctx->ppgtt->pd_dirty_rings)) {
-		ret = intel_logical_ring_emit_pdps(ring, ctx);
-		if (ret)
-			return ret;
+		if (GEN8_CTX_ADDRESSING_MODE(ring->dev) == LEGACY_32B_CONTEXT){
+			ret = intel_logical_ring_emit_pdps(ring, ctx);
+
+			if (ret)
+				return ret;
+		}
 
 		ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
 	}
@@ -1805,13 +1820,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
 
-	/* With dynamic page allocation, PDPs may not be allocated at this point,
-	 * Point the unallocated PDPs to the scratch page
-	 */
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored.
+		 */
+		ASSIGN_CTX_PML4(ppgtt, reg_state);
+	} else {
+		/* 32b PPGTT
+		 * PDP*_DESCRIPTOR contains the base address of space supported.
+		 * With dynamic page allocation, PDPs may not be allocated at
+		 * this point. Point the unallocated PDPs to the scratch page
+		 */
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
-- 
2.4.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 09/18] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (7 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 08/18] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 10/18] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

The insert_entries function is what writes PTEs. For the PPGTT it was
"hardcoded" to only understand two level page tables, which was the case
for GEN7. We can reuse it for 4 level page tables and remove the concept
of insert_entries, which was never viable past 2 level page tables
anyway, but that requires some rework to make the function more generic.

This patch begins the generalization work, and it will be heavily relied
upon when the 48b code is complete. The patch series attempts to keep
each function that touches page table code specific to one page table
level, and this patch is no exception.
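
The split done here (a helper that takes the table to operate on, plus a
thin wrapper that keeps the old signature) can be sketched in plain C as
below. This is only an illustration under stated assumptions: every
sketch_* name, the table size and the flat one-level table are
hypothetical and are not the driver's types.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRIES 8	/* hypothetical table size, for illustration only */

struct sketch_table {
	uint64_t entry[SKETCH_ENTRIES];
};

struct sketch_space {
	struct sketch_table legacy;	/* stands in for the single 32b pdp */
};

/* Generic helper: the caller decides which table is written. */
static size_t sketch_clear_range_in(struct sketch_table *t, size_t first,
				    size_t count, uint64_t scratch)
{
	size_t i, done = 0;

	for (i = first; i < SKETCH_ENTRIES && done < count; i++, done++)
		t->entry[i] = scratch;

	return done;	/* number of entries actually rewritten */
}

/* Thin wrapper: keeps the old signature and picks the table itself. */
static size_t sketch_clear_range(struct sketch_space *s, size_t first,
				 size_t count, uint64_t scratch)
{
	return sketch_clear_range_in(&s->legacy, first, count, scratch);
}
```

With this shape, a later caller that walks several tables can call the
helper once per table without touching the wrapper's users.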

v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 51 ++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cbc6aaf..ea20e5a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -808,23 +808,21 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return gen8_write_pdp(ring, 0, px_dma(&ppgtt->pml4));
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
-				   bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
+				       struct i915_page_directory_pointer *pdp,
+				       uint64_t start,
+				       uint64_t length,
+				       gen8_pte_t scratch_pte)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-	gen8_pte_t *pt_vaddr, scratch_pte;
+	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
-	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
-				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
 		struct i915_page_directory *pd;
@@ -864,14 +862,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
-				      struct sg_table *pages,
-				      uint64_t start,
-				      enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+				   uint64_t start,
+				   uint64_t length,
+				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+						 I915_CACHE_LLC, use_scratch);
+
+	gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+}
+
+static void
+gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
+			      struct i915_page_directory_pointer *pdp,
+			      struct sg_table *pages,
+			      uint64_t start,
+			      enum i915_cache_level cache_level)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -905,6 +919,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_px(ppgtt, pt_vaddr);
 }
 
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+				      struct sg_table *pages,
+				      uint64_t start,
+				      enum i915_cache_level cache_level,
+				      u32 unused)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+}
+
 static void gen8_free_page_tables(struct drm_device *dev,
 				  struct i915_page_directory *pd)
 {
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 10/18] drm/i915/gen8: Pass sg_iter through pte inserts
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (8 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 09/18] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 11/18] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function only understands page directory granularity.
An object's pages may span page directories, so using the iter
directly as we write the PTEs allows the iterator to stay coherent
through a VMA insert operation spanning multiple page table levels.
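
To illustrate why the iterator is owned by the caller, here is a minimal
sketch in plain C. It does not use the kernel scatterlist API; the
sketch_* iterator and table layout are hypothetical stand-ins. The point
is only that a second call resumes where the first one stopped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an sg page iterator: a cursor over DMA addresses. */
struct sketch_page_iter {
	const uint64_t *dma;
	size_t count;
	size_t next;
};

/* Returns 1 and stores the next address, or 0 when the pages are exhausted. */
static int sketch_iter_next(struct sketch_page_iter *it, uint64_t *addr)
{
	if (it->next >= it->count)
		return 0;

	*addr = it->dma[it->next++];
	return 1;
}

/*
 * Fill one table starting at slot 'first'. The capacity check comes first,
 * so no page is consumed from the iterator without being written.
 */
static size_t sketch_fill_table(uint64_t *table, size_t first, size_t entries,
				struct sketch_page_iter *it)
{
	size_t slot = first;
	uint64_t addr;

	while (slot < entries && sketch_iter_next(it, &addr))
		table[slot++] = addr;

	return slot - first;	/* entries written into this table */
}
```

Calling sketch_fill_table() on a first table and then on a second one
with the same iterator continues with the first page that did not fit.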

v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ea20e5a..1c9f662 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -880,7 +880,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 static void
 gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 			      struct i915_page_directory_pointer *pdp,
-			      struct sg_table *pages,
+			      struct sg_page_iter *sg_iter,
 			      uint64_t start,
 			      enum i915_cache_level cache_level)
 {
@@ -890,11 +890,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
-	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 
-	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+	while (__sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = pdp->page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
@@ -902,7 +901,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 		}
 
 		pt_vaddr[pte] =
-			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+			gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES) {
 			kunmap_px(ppgtt, pt_vaddr);
@@ -928,8 +927,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct sg_page_iter sg_iter;
 
-	gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+	gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
 }
 
 static void gen8_free_page_tables(struct drm_device *dev,
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 11/18] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (9 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 10/18] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 12/18] drm/i915/gen8: Initialize PDPs Michel Thierry
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.

Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
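
For readers following along, the index selection and the clamp to the
next PDP boundary can be sketched as below. The shift values (12, 21, 30,
39) and the 9-bit index mask are assumptions that mirror the usual x86-64
style layout; the sketch_* names are hypothetical and are not the
driver's macros or helpers.

```c
#include <stdint.h>

/* Assumed layout: 4KiB pages and 9 index bits per level. */
#define SKETCH_PTE_SHIFT	12
#define SKETCH_PDE_SHIFT	21
#define SKETCH_PDPE_SHIFT	30
#define SKETCH_PML4E_SHIFT	39
#define SKETCH_INDEX_MASK	0x1ffu

static inline unsigned sketch_pml4e_index(uint64_t addr)
{
	return (unsigned)((addr >> SKETCH_PML4E_SHIFT) & SKETCH_INDEX_MASK);
}

static inline unsigned sketch_pdpe_index(uint64_t addr)
{
	return (unsigned)((addr >> SKETCH_PDPE_SHIFT) & SKETCH_INDEX_MASK);
}

static inline unsigned sketch_pde_index(uint64_t addr)
{
	return (unsigned)((addr >> SKETCH_PDE_SHIFT) & SKETCH_INDEX_MASK);
}

static inline unsigned sketch_pte_index(uint64_t addr)
{
	return (unsigned)((addr >> SKETCH_PTE_SHIFT) & SKETCH_INDEX_MASK);
}

/* Clamp length so that [start, start + length) stays within one PDP. */
static inline uint64_t sketch_clamp_pdp(uint64_t start, uint64_t length)
{
	uint64_t span = 1ULL << SKETCH_PML4E_SHIFT;
	uint64_t next_pdp = (start & ~(span - 1)) + span;

	if (next_pdp > start + length)
		return length;

	return next_pdp - start;
}
```

A range that starts one page below a PDP boundary and is two pages long
is clamped to one page for the first PDP; the remainder belongs to the
next PML4 entry.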

This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".

v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 51 ++++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 11 ++++++++
 2 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1c9f662..9919c3b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -817,9 +817,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pte_t *pt_vaddr;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
@@ -869,12 +869,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
 	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
 						 I915_CACHE_LLC, use_scratch);
 
-	gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
+					   scratch_pte);
+	} else {
+		uint64_t templ4, pml4e;
+		struct i915_page_directory_pointer *pdp;
+
+		gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+			uint64_t pdp_len = gen8_clamp_pdp(start, length);
+			uint64_t pdp_start = start;
+
+			gen8_ppgtt_clear_pte_range(vm, pdp, pdp_start, pdp_len,
+						   scratch_pte);
+		}
+	}
 }
 
 static void
@@ -887,9 +899,9 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pte_t *pt_vaddr;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 
 	pt_vaddr = NULL;
 
@@ -907,7 +919,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 			kunmap_px(ppgtt, pt_vaddr);
 			pt_vaddr = NULL;
 			if (++pde == I915_PDES) {
-				pdpe++;
+				if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+					break;
 				pde = 0;
 			}
 			pte = 0;
@@ -926,11 +939,25 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	struct sg_page_iter sg_iter;
 
 	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
-	gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
+					      cache_level);
+	} else {
+		struct i915_page_directory_pointer *pdp;
+		uint64_t templ4, pml4e;
+		uint64_t length = (uint64_t)sg_nents(pages->sgl) << PAGE_SHIFT;
+
+		gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+			uint64_t pdp_start = start;
+
+			gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
+						      pdp_start, cache_level);
+		}
+	}
 }
 
 static void gen8_free_page_tables(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 5b04211..1803e91 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -472,6 +472,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
+/* Clamp length to the next page_directory pointer boundary */
+static inline uint64_t gen8_clamp_pdp(uint64_t start, uint64_t length)
+{
+	uint64_t next_pdp = ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT);
+
+	if (next_pdp > (start + length))
+		return length;
+
+	return next_pdp - start;
+}
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 12/18] drm/i915/gen8: Initialize PDPs
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (10 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 11/18] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 13/18] drm/i915: Expand error state's address width to 64b Michel Thierry
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pd before mapping; likewise, make all pml4
entries point to the scratch pdp, so that every unmapped entry ultimately
resolves to the scratch page. This is to be safe in case of out-of-bounds
access or proactive prefetch.

Although the ggtt is always 32-bit, the scratch_pdp is initialized and
destroyed at the same time as the other scratch pages, to keep it consistent.
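
The idea of pre-filling a freshly allocated table with a scratch entry
can be sketched as follows. This is a simplified illustration, not the
driver's fill_px(): the table size, the present/writable bits and the
sketch_* names are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRIES	512	/* assumed entries per table */
#define SKETCH_PRESENT	0x1ULL	/* assumed "present" bit */
#define SKETCH_RW	0x2ULL	/* assumed "writable" bit */

/* Encode an entry pointing at the next-level scratch structure. */
static uint64_t sketch_encode(uint64_t scratch_dma)
{
	return scratch_dma | SKETCH_PRESENT | SKETCH_RW;
}

/*
 * Point every slot at the scratch structure, so that a stray access or a
 * prefetch through an unused slot lands on harmless memory.
 */
static void sketch_fill_table(uint64_t *table, uint64_t scratch_dma)
{
	uint64_t entry = sketch_encode(scratch_dma);
	size_t i;

	assert(table != NULL);

	for (i = 0; i < SKETCH_ENTRIES; i++)
		table[i] = entry;
}
```

Real mappings then overwrite individual slots; slots that are never
mapped keep pointing at the scratch structure.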

v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 53 ++++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9919c3b..65d0787 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -593,6 +593,27 @@ static void free_pdp(struct drm_device *dev,
 	}
 }
 
+static void gen8_initialize_pdp(struct i915_address_space *vm,
+				struct i915_page_directory_pointer *pdp)
+{
+	gen8_ppgtt_pdpe_t scratch_pdpe;
+
+	scratch_pdpe = gen8_pdpe_encode(px_dma(vm->scratch_pd), I915_CACHE_LLC);
+
+	fill_px(vm->dev, pdp, scratch_pdpe);
+}
+
+static void gen8_initialize_pml4(struct i915_address_space *vm,
+				 struct i915_pml4 *pml4)
+{
+	gen8_ppgtt_pml4e_t scratch_pml4e;
+
+	scratch_pml4e = gen8_pml4e_encode(px_dma(vm->scratch_pdp),
+					  I915_CACHE_LLC);
+
+	fill_px(vm->dev, pml4, scratch_pml4e);
+}
+
 static void
 gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
 			  struct i915_page_directory_pointer *pdp,
@@ -693,12 +714,30 @@ static int setup_scratch_ggtt(struct i915_address_space *vm)
 
 		WARN_ON(px_dma(vm->scratch_pt) == 0);
 		gen8_initialize_pd(vm, vm->scratch_pd);
+
+		/* although scratch_pdp is only needed for 48-bit ppgtt,
+		 * keep it with the other scratch pages for consistency.
+		 */
+		if (USES_FULL_48BIT_PPGTT(dev)) {
+			WARN_ON(vm->scratch_pdp);
+
+			vm->scratch_pdp = alloc_pdp(vm->dev);
+			if (IS_ERR(vm->scratch_pdp)) {
+				ret = PTR_ERR(vm->scratch_pdp);
+				goto err_pdp;
+			}
+
+			WARN_ON(px_dma(vm->scratch_pd) == 0);
+			gen8_initialize_pdp(vm, vm->scratch_pdp);
+		}
 	} else {
 		gen6_initialize_pt(vm, vm->scratch_pt);
 	}
 
 	return 0;
 
+err_pdp:
+	free_pd(vm->dev, vm->scratch_pd);
 err_pd:
 	free_pt(vm->dev, vm->scratch_pt);
 	return ret;
@@ -714,6 +753,7 @@ static int setup_scratch(struct i915_address_space *vm)
 	vm->scratch_page = ggtt_vm->scratch_page;
 	vm->scratch_pt = ggtt_vm->scratch_pt;
 	vm->scratch_pd = ggtt_vm->scratch_pd;
+	vm->scratch_pdp = ggtt_vm->scratch_pdp;
 
 	return 0;
 }
@@ -748,8 +788,12 @@ static void cleanup_scratch_ggtt(struct i915_address_space *vm)
 
 	free_pt(vm->dev, vm->scratch_pt);
 
-	if (INTEL_INFO(vm->dev)->gen >= 8)
+	if (INTEL_INFO(vm->dev)->gen >= 8) {
 		free_pd(vm->dev, vm->scratch_pd);
+
+		if (USES_FULL_48BIT_PPGTT(dev))
+			free_pdp(vm->dev, vm->scratch_pdp);
+	}
 }
 
 static void cleanup_scratch(struct i915_address_space *vm)
@@ -760,6 +804,7 @@ static void cleanup_scratch(struct i915_address_space *vm)
 	vm->scratch_page = NULL;
 	vm->scratch_pt = NULL;
 	vm->scratch_pd = NULL;
+	vm->scratch_pdp = NULL;
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -1316,12 +1361,12 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 	 * and 4 level code. Just allocate the pdps.
 	 */
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-		if (!pdp) {
-			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+		if (!test_bit(pml4e, pml4->used_pml4es)) {
 			pdp = alloc_pdp(vm->dev);
 			if (IS_ERR(pdp))
 				goto err_out;
 
+			gen8_initialize_pdp(vm, pdp);
 			pml4->pdps[pml4e] = pdp;
 			__set_bit(pml4e, new_pdps);
 			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
@@ -1446,6 +1491,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		if (ret)
 			goto clear_scratch;
 
+		gen8_initialize_pml4(&ppgtt->base, &ppgtt->pml4);
+
 		ppgtt->base.total = 1ULL << 48;
 		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1803e91..c0a6487 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -272,6 +272,7 @@ struct i915_address_space {
 	struct i915_page_scratch *scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
+	struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
 
 	/**
 	 * List of objects currently involved in rendering.
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 13/18] drm/i915: Expand error state's address width to 64b
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (11 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 12/18] drm/i915/gen8: Initialize PDPs Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 14/18] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 17 +++++++++--------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 83b7530..95f59d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -507,7 +507,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_object {
 			int page_count;
-			u32 gtt_offset;
+			u64 gtt_offset;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -533,7 +533,7 @@ struct drm_i915_error_state {
 		u32 size;
 		u32 name;
 		u32 rseqno[I915_NUM_RINGS], wseqno;
-		u32 gtt_offset;
+		u64 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6f42569..cdbd4c2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -197,7 +197,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x [ ",
+		err_printf(m, "    %016llx %8u %02x %02x [ ",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
@@ -426,7 +426,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				err_printf(m, " (submitted by %s [%d])",
 					   error->ring[i].comm,
 					   error->ring[i].pid);
-			err_printf(m, " --- gtt_offset = 0x%08x\n",
+			err_printf(m, " --- gtt_offset = 0x%016llx\n",
 				   obj->gtt_offset);
 			print_error_obj(m, obj);
 		}
@@ -434,7 +434,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		obj = error->ring[i].wa_batchbuffer;
 		if (obj) {
 			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-				   dev_priv->ring[i].name, obj->gtt_offset);
+				   dev_priv->ring[i].name,
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
@@ -453,14 +454,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
 		if ((obj = error->ring[i].hws_page)) {
 			err_printf(m, "%s --- HW Status = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			offset = 0;
 			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -476,13 +477,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ctx)) {
 			err_printf(m, "%s --- HW Context = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 	}
 
 	if ((obj = error->semaphore_obj)) {
-		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
 		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
 				   elt * 4,
@@ -590,7 +591,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
-	u32 reloc_offset;
+	u64 reloc_offset;
 
 	if (src == NULL || src->pages == NULL)
 		return NULL;
-- 
2.4.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 14/18] drm/i915/gen8: Add ppgtt info and debug_dump
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (12 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 13/18] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 15/18] drm/i915: object size needs to be u64 Michel Thierry
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c | 92 +++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 22770aa..1c876cb 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2227,7 +2227,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	struct drm_file *file;
 	int i;
 
 	if (INTEL_INFO(dev)->gen == 6)
@@ -2250,13 +2249,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		ppgtt->debug_dump(ppgtt, m);
 	}
 
-	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-		struct drm_i915_file_private *file_priv = file->driver_priv;
-
-		seq_printf(m, "proc: %s\n",
-			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
-	}
 	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
@@ -2265,6 +2257,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_file *file;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -2276,6 +2269,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	else if (INTEL_INFO(dev)->gen >= 6)
 		gen6_ppgtt_info(m, dev);
 
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+
+		seq_printf(m, "\nproc: %s\n",
+			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)(unsigned long)m);
+	}
+
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 65d0787..4c41d55 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1463,6 +1463,97 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 	return ret;
 }
 
+static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
+			  uint64_t start, uint64_t length,
+			  gen8_pte_t scratch_pte,
+			  struct seq_file *m)
+{
+	struct i915_page_directory *pd;
+	uint64_t temp;
+	uint32_t pdpe;
+
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		struct i915_page_table *pt;
+		uint64_t pd_len = length;
+		uint64_t pd_start = start;
+		uint32_t pde;
+
+		if (!pd)
+			continue;
+
+		if (!test_bit(pdpe, pdp->used_pdpes))
+			continue;
+
+		seq_printf(m, "\tPDPE #%d\n", pdpe);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			uint32_t pte;
+			gen8_pte_t *pt_vaddr;
+
+			if (!pt)
+				continue;
+
+			pt_vaddr = kmap_px(pt);
+			for (pte = 0; pte < GEN8_PTES; pte += 4) {
+				uint64_t va =
+					((uint64_t)pdpe << GEN8_PDPE_SHIFT) |
+					((uint64_t)pde << GEN8_PDE_SHIFT) |
+					((uint64_t)pte << GEN8_PTE_SHIFT);
+				int i;
+				bool found = false;
+				for (i = 0; i < 4; i++)
+					if (pt_vaddr[pte + i] != scratch_pte)
+						found = true;
+				if (!found)
+					continue;
+
+				seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
+				for (i = 0; i < 4; i++) {
+					if (pt_vaddr[pte + i] != scratch_pte)
+						seq_printf(m, " %llx", pt_vaddr[pte + i]);
+					else
+						seq_puts(m, "  SCRATCH ");
+				}
+				seq_puts(m, "\n");
+			}
+			/* don't use kunmap_px, it could trigger
+			 * an unnecessary flush.
+			 */
+			kunmap_atomic(pt_vaddr);
+		}
+	}
+}
+
+static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
+{
+	struct i915_address_space *vm = &ppgtt->base;
+	uint64_t start = ppgtt->base.start;
+	uint64_t length = ppgtt->base.total;
+	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+						 I915_CACHE_LLC, true);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
+	} else {
+		uint64_t templ4, pml4e;
+		struct i915_pml4 *pml4 = &ppgtt->pml4;
+		struct i915_page_directory_pointer *pdp;
+
+		gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
+			uint64_t pdp_len = length;
+			uint64_t pdp_start = start;
+
+			if (!pdp)
+				continue;
+
+			if (!test_bit(pml4e, pml4->used_pml4es))
+				continue;
+
+			seq_printf(m, "    PML4E #%llu\n", pml4e);
+			gen8_dump_pdp(pdp, pdp_start, pdp_len, scratch_pte, m);
+		}
+	}
+}
+
 /*
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1481,6 +1572,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.unbind_vma = ppgtt_unbind_vma;
 	ppgtt->base.bind_vma = ppgtt_bind_vma;
+	ppgtt->debug_dump = gen8_dump_ppgtt;
 
 	ret = setup_scratch(&ppgtt->base);
 	if (ret)
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 15/18] drm/i915: object size needs to be u64
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (13 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 14/18] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 16/18] drm/i915: Check against correct user_size limit in 48b ppgtt mode Michel Thierry
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.
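
To illustrate the truncation this avoids, here is a minimal userspace
sketch (not driver code). The helper names and SZ_4G are made up for the
example; only the "size > end" comparison mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4G (1ULL << 32)	/* hypothetical name for the 4GB boundary */

/* Stand-in for the old "u32 size" local: only the low 32 bits survive. */
static uint32_t size_as_u32(uint64_t obj_size)
{
	return (uint32_t)obj_size;
}

/* "size > end" with size held in a 32-bit variable (old behaviour). */
static int too_big_u32(uint64_t obj_size, uint64_t end)
{
	uint32_t size = size_as_u32(obj_size);

	return size > end;
}

/* "size > end" with size held in a 64-bit variable (new behaviour). */
static int too_big_u64(uint64_t obj_size, uint64_t end)
{
	uint64_t size = obj_size;

	return size > end;
}

/* A 5GB object wraps to 1GB in 32 bits and slips past a 4GB limit. */
static void truncation_self_check(void)
{
	assert(size_as_u32(5ULL << 30) == (1u << 30));
	assert(!too_big_u32(5ULL << 30, SZ_4G));
	assert(too_big_u64(5ULL << 30, SZ_4G));
}
```

With a 32-bit size the oversized object would be treated as a 1GB one,
which is why the printk format also moves from %u to %llu.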

Also added a warning for illegal bind with size = 0.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 25e375c..35690ef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3670,7 +3670,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 size, fence_size, fence_alignment, unfenced_alignment;
+	u32 fence_alignment, unfenced_alignment;
+	u64 size, fence_size;
 	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	u64 end =
@@ -3729,7 +3730,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	 * attempt to find space.
 	 */
 	if (size > end) {
-		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
+		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
 			  ggtt_view ? ggtt_view->type : 0,
 			  size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4c41d55..320a570 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3433,6 +3433,9 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 	if (WARN_ON(flags == 0))
 		return -EINVAL;
 
+	if (WARN_ON(vma->node.size == 0))
+		return -EINVAL;
+
 	bind_flags = 0;
 	if (flags & PIN_GLOBAL)
 		bind_flags |= GLOBAL_BIND;
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 16/18] drm/i915: Check against correct user_size limit in 48b ppgtt mode
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (14 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 15/18] drm/i915: object size needs to be u64 Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 17:57   ` Chris Wilson
  2015-06-10 16:46 ` [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

GTT is only 32b and its max value is 4GB. In order to allow objects
bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl needs to check
against max 48b range (1ULL << 48).

Whenever possible, read the PPGTT's total instead of the GTT one; this
will be accurate in both 32 and 48 bit modes.
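
As a rough userspace sketch of the check described above (not the driver
code): pick the PPGTT total when there is one, otherwise fall back to
the GTT total. The function name and the "0 means no PPGTT" convention
are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GTT_TOTAL_32B	(1ULL << 32)	/* 4GB, the 32b limit */
#define PPGTT_TOTAL_48B	(1ULL << 48)	/* max 48b range from the text */

/*
 * Hypothetical helper: ppgtt_total == 0 stands for "context has no
 * PPGTT", in which case the GTT total is the limit.
 */
static int check_user_size(uint64_t user_size, uint64_t ppgtt_total,
			   uint64_t gtt_total)
{
	uint64_t limit = ppgtt_total ? ppgtt_total : gtt_total;

	return user_size > limit ? -E2BIG : 0;
}

/* An 8GB request is rejected against the GTT but fits a 48b PPGTT. */
static void user_size_self_check(void)
{
	assert(check_user_size(8ULL << 30, 0, GTT_TOTAL_32B) == -E2BIG);
	assert(check_user_size(8ULL << 30, PPGTT_TOTAL_48B, GTT_TOTAL_32B) == 0);
}
```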

v2: Use the default ctx to infer the ppgtt max size (Akash).

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1f4e5a3..9783415 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -789,8 +789,10 @@ int
 i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_gem_userptr *args = data;
 	struct drm_i915_gem_object *obj;
+	struct intel_context *ctx;
 	int ret;
 	u32 handle;
 
@@ -801,8 +803,14 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 	if (offset_in_page(args->user_ptr | args->user_size))
 		return -EINVAL;
 
-	if (args->user_size > dev_priv->gtt.base.total)
-		return -E2BIG;
+	ctx = i915_gem_context_get(file_priv, DEFAULT_CONTEXT_HANDLE);
+	if (ctx->ppgtt) {
+		if (args->user_size > ctx->ppgtt->base.total)
+			return -E2BIG;
+	} else {
+		if (args->user_size > dev_priv->gtt.base.total)
+			return -E2BIG;
+	}
 
 	if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
 		       (char __user *)(unsigned long)args->user_ptr, args->user_size))
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (15 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 16/18] drm/i915: Check against correct user_size limit in 48b ppgtt mode Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 18:09   ` Chris Wilson
  2015-06-23 12:21   ` [PATCH v3] " Michel Thierry
  2015-06-10 16:46 ` [PATCH v2 18/18] drm/i915/gen8: Flip the 48b switch Michel Thierry
                   ` (2 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

There are some allocations that must only be referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use the DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.

User must pass the I915_EXEC_SUPPORT_48BADDRESS flag to indicate that
an object can be allocated above the 32b address range.

The flag is ignored in 32b PPGTT.
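
The placement policy described above can be sketched in plain C as
follows. This is only an illustration: the enum, struct and function
names are invented for the example, and the 4GB-1 limit assumes the
"4ULL << GEN8_PDPE_SHIFT" expression in the diff evaluates to 4GB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the drm_mm search direction. */
enum search_dir { SEARCH_DEFAULT, SEARCH_BELOW };

struct placement {
	uint64_t end;		/* last usable address for the search */
	enum search_dir dir;	/* bottom-up (default) or top-down */
};

static struct placement pick_placement(int uses_48b_ppgtt,
				       int supports_48b,
				       uint64_t vm_total)
{
	struct placement p = { .end = vm_total, .dir = SEARCH_DEFAULT };

	if (!uses_48b_ppgtt)
		return p;		/* flag is ignored in 32b PPGTT */

	if (supports_48b)
		p.dir = SEARCH_BELOW;	/* allocate from the top down */
	else
		p.end = (1ULL << 32) - 1; /* keep w/a objects below 4GB */

	return p;
}

static void placement_self_check(void)
{
	struct placement p = pick_placement(1, 0, 1ULL << 48);

	assert(p.end == 0xffffffffULL && p.dir == SEARCH_DEFAULT);
}
```

Objects that opt in are pushed towards the top of the address space, so
the low 4GB stays as free as possible for the ones that cannot.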

v2: Changed flag logic from needs_32b to supports_48b.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem.c            | 19 ++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 36 +++++++++++++++++++++---------
 include/uapi/drm/i915_drm.h                |  4 +++-
 4 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 95f59d3..d73dddb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2714,6 +2714,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS	(1<<3)
 #define PIN_USER	(1<<4)
 #define PIN_UPDATE	(1<<5)
+#define PIN_FULL_RANGE	(1<<6)
 #define PIN_OFFSET_MASK (~4095)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 35690ef..e6325a4a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3672,6 +3672,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 fence_alignment, unfenced_alignment;
 	u64 size, fence_size;
+	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
+	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
 	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	u64 end =
@@ -3713,6 +3715,19 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 						   obj->tiling_mode,
 						   false);
 		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+		 * limit address to 4GB-1 for objects requiring this wa; for
+		 * others, set alloc flag to TOP.
+		 */
+		if (USES_FULL_48BIT_PPGTT(dev)) {
+			if (flags & PIN_FULL_RANGE) {
+				search_flag = DRM_MM_SEARCH_BELOW;
+				alloc_flag = DRM_MM_CREATE_TOP;
+			} else {
+				end = ((4ULL << GEN8_PDPE_SHIFT) - 1);
+			}
+		}
 	}
 
 	if (alignment == 0)
@@ -3755,8 +3770,8 @@ search_free:
 						  size, alignment,
 						  obj->cache_level,
 						  start, end,
-						  DRM_MM_SEARCH_DEFAULT,
-						  DRM_MM_CREATE_DEFAULT);
+						  search_flag,
+						  alloc_flag);
 	if (ret) {
 		ret = i915_gem_evict_something(dev, vm, size, alignment,
 					       obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bd0e4bd..04af62b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -577,7 +577,8 @@ static bool only_mappable_for_reloc(unsigned int flags)
 static int
 i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 				struct intel_engine_cs *ring,
-				bool *need_reloc)
+				bool *need_reloc,
+				bool support_48baddr)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
@@ -588,6 +589,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 
+	if (support_48baddr)
+		flags |= PIN_FULL_RANGE;
+
 	if (!drm_mm_node_allocated(&vma->node)) {
 		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
 			flags |= PIN_GLOBAL | PIN_MAPPABLE;
@@ -650,7 +654,7 @@ need_reloc_mappable(struct i915_vma *vma)
 }
 
 static bool
-eb_vma_misplaced(struct i915_vma *vma)
+eb_vma_misplaced(struct i915_vma *vma, bool support_48baddr)
 {
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	struct drm_i915_gem_object *obj = vma->obj;
@@ -670,13 +674,18 @@ eb_vma_misplaced(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
 		return !only_mappable_for_reloc(entry->flags);
 
+	if (!support_48baddr &&
+	    vma->node.start >= (1ULL << 32))
+		return true;
+
 	return false;
 }
 
 static int
 i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 			    struct list_head *vmas,
-			    bool *need_relocs)
+			    bool *need_relocs,
+			    bool support_48baddr)
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -737,10 +746,11 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
-			if (eb_vma_misplaced(vma))
+			if (eb_vma_misplaced(vma, support_48baddr))
 				ret = i915_vma_unbind(vma);
 			else
-				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
+				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs,
+								      support_48baddr);
 			if (ret)
 				goto err;
 		}
@@ -750,7 +760,9 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
 			if (drm_mm_node_allocated(&vma->node))
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
+			ret = i915_gem_execbuffer_reserve_vma(vma, ring,
+							      need_relocs,
+							      support_48baddr);
 			if (ret)
 				goto err;
 		}
@@ -780,7 +792,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 	struct drm_i915_gem_relocation_entry *reloc;
 	struct i915_address_space *vm;
 	struct i915_vma *vma;
-	bool need_relocs;
+	bool need_relocs, support_48baddr;
 	int *reloc_offset;
 	int i, total, ret;
 	unsigned count = args->buffer_count;
@@ -861,7 +873,9 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 		goto err;
 
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
+	support_48baddr = (args->flags & I915_EXEC_SUPPORT_48BADDRESS);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs,
+					  support_48baddr);
 	if (ret)
 		goto err;
 
@@ -1411,7 +1425,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	u64 exec_start = args->batch_start_offset;
 	u32 dispatch_flags;
 	int ret;
-	bool need_relocs;
+	bool need_relocs, support_48baddr;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1519,7 +1533,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
+	support_48baddr = (args->flags & I915_EXEC_SUPPORT_48BADDRESS);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs,
+					  support_48baddr);
 	if (ret)
 		goto err;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4851d66..18df34a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -760,7 +760,9 @@ struct drm_i915_gem_execbuffer2 {
 #define I915_EXEC_BSD_RING1		(1<<13)
 #define I915_EXEC_BSD_RING2		(2<<13)
 
-#define __I915_EXEC_UNKNOWN_FLAGS -(1<<15)
+#define I915_EXEC_SUPPORT_48BADDRESS	(1<<15)
+
+#define __I915_EXEC_UNKNOWN_FLAGS -(1<<16)
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2 18/18] drm/i915/gen8: Flip the 48b switch
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (16 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-06-10 16:46 ` [PATCH v2] tests/gem_ppgtt: Check Wa32bitOffsets workarounds Michel Thierry
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

Use 48b addresses if hw supports it and i915.enable_ppgtt=3.

Note, aliasing PPGTT remains 32b only.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 drivers/gpu/drm/i915/i915_params.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 320a570..a189b4a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -110,7 +110,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
 	has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
 			       (IS_BROADWELL(dev) ||
-				INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+				INTEL_INFO(dev)->gen >= 9);
 
 	if (intel_vgpu_active(dev))
 		has_full_ppgtt = false; /* emulation is too hard */
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8ac5a1b..743eefa 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -116,7 +116,7 @@ MODULE_PARM_DESC(enable_hangcheck,
 module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
 MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
-	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
 
 module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
 MODULE_PARM_DESC(enable_execlists,
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v2] tests/gem_ppgtt: Check Wa32bitOffsets workarounds
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (17 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2 18/18] drm/i915/gen8: Flip the 48b switch Michel Thierry
@ 2015-06-10 16:46 ` Michel Thierry
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
  19 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-10 16:46 UTC (permalink / raw)
  To: intel-gfx

Test the I915_EXEC_SUPPORT_48BADDRESS flag to use the 32b+ segment.
The driver will try to use the lower PDPs of each PPGTT for the objects
requiring Wa32bitGeneralStateOffset or Wa32bitInstructionBaseOffset.

v2: Add flink cases (suggested by Daniel Vetter).

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 tests/gem_ppgtt.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 147 insertions(+), 5 deletions(-)

diff --git a/tests/gem_ppgtt.c b/tests/gem_ppgtt.c
index d1e484a..2471cf6 100644
--- a/tests/gem_ppgtt.c
+++ b/tests/gem_ppgtt.c
@@ -47,8 +47,18 @@
 #define STRIDE (WIDTH*4)
 #define HEIGHT 512
 #define SIZE (HEIGHT*STRIDE)
+#define K (1024l)
+#define M (1024l * K)
+#define G (1024l * M)
+#define LOCAL_EXEC_SUPPORT_48BADDRESS (1<<15)
 
-static bool uses_full_ppgtt(int fd)
+/*
+ * 0 - No PPGTT
+ * 1 - Aliasing PPGTT
+ * 2 - Full PPGTT (32b)
+ * 3 - Full PPGTT (48b)
+ */
+static bool __uses_full_ppgtt(int fd, int min)
 {
 	struct drm_i915_getparam gp;
 	int val = 0;
@@ -61,7 +71,17 @@ static bool uses_full_ppgtt(int fd)
 		return 0;
 
 	errno = 0;
-	return val > 1;
+	return val >= min;
+}
+
+static bool uses_full_ppgtt(int fd)
+{
+	return __uses_full_ppgtt(fd, 2);
+}
+
+static bool uses_48b_full_ppgtt(int fd)
+{
+	return __uses_full_ppgtt(fd, 3);
 }
 
 static drm_intel_bo *create_bo(drm_intel_bufmgr *bufmgr,
@@ -216,7 +236,7 @@ static void surfaces_check(dri_bo **bo, int count, uint32_t expected)
 	}
 }
 
-static uint64_t exec_and_get_offset(int fd, uint32_t batch)
+static uint64_t exec_and_get_offset(int fd, uint32_t batch, bool needs_32b_addr)
 {
 	struct drm_i915_gem_execbuffer2 execbuf;
 	struct drm_i915_gem_exec_object2 exec[1];
@@ -230,6 +250,7 @@ static uint64_t exec_and_get_offset(int fd, uint32_t batch)
 	memset(&execbuf, 0, sizeof(execbuf));
 	execbuf.buffers_ptr = (uintptr_t)exec;
 	execbuf.buffer_count = 1;
+	execbuf.flags = (needs_32b_addr) ? 0 : LOCAL_EXEC_SUPPORT_48BADDRESS;
 
 	gem_execbuf(fd, &execbuf);
 	igt_assert_neq(exec[0].offset, -1);
@@ -252,7 +273,7 @@ static void flink_and_close(void)
 	fd2 = drm_open_any();
 
 	flinked_bo = gem_open(fd2, name);
-	offset = exec_and_get_offset(fd2, flinked_bo);
+	offset = exec_and_get_offset(fd2, flinked_bo, 0);
 	gem_sync(fd2, flinked_bo);
 	gem_close(fd2, flinked_bo);
 
@@ -260,7 +281,7 @@ static void flink_and_close(void)
 	 * same size should get the same offset
 	 */
 	new_bo = gem_create(fd2, 4096);
-	offset_new = exec_and_get_offset(fd2, new_bo);
+	offset_new = exec_and_get_offset(fd2, new_bo, 0);
 	gem_close(fd2, new_bo);
 
 	igt_assert_eq(offset, offset_new);
@@ -270,6 +291,124 @@ static void flink_and_close(void)
 	close(fd2);
 }
 
+static bool is_32b(uint64_t offset)
+{
+	return (offset < (1ULL << 32));
+}
+
+static void reusebo_and_compare_offsets(uint32_t fd,
+					uint64_t buf_size)
+{
+	uint32_t bo;
+	uint64_t offset, offset2;
+
+	bo = gem_create(fd, buf_size);
+	/* support 48b addresses */
+	offset = exec_and_get_offset(fd, bo, 0);
+	gem_sync(fd, bo);
+
+	/* require 32b address */
+	offset2 = exec_and_get_offset(fd, bo, 1);
+	gem_sync(fd, bo);
+
+	igt_assert(is_32b(offset2));
+	igt_assert_neq(offset, offset2);
+	gem_close(fd, bo);
+}
+
+static void flinkbo_and_compare_offsets(uint32_t fd, uint32_t fd2,
+					uint64_t buf_size)
+{
+	uint32_t bo, flinked_bo, name;
+	uint64_t offset, offset2;
+
+	bo = gem_create(fd, buf_size);
+	name = gem_flink(fd, bo);
+
+	/* support 48b addresses */
+	offset = exec_and_get_offset(fd, bo, 0);
+	gem_sync(fd, bo);
+
+	/* require 32b address */
+	flinked_bo = gem_open(fd2, name);
+	offset2 = exec_and_get_offset(fd2, flinked_bo, 1);
+	gem_sync(fd2, flinked_bo);
+
+	igt_assert(is_32b(offset2));
+	igt_assert_neq(offset, offset2);
+	gem_close(fd2, flinked_bo);
+	gem_close(fd, bo);
+}
+
+static void createbo_and_compare_offsets(uint32_t fd, uint32_t fd2,
+					 uint64_t buf_size,
+					 bool needs_32b, bool needs_32b2)
+{
+	uint32_t bo, bo2;
+	uint64_t offset, offset2;
+
+	bo = gem_create(fd, buf_size);
+	offset = exec_and_get_offset(fd, bo, needs_32b);
+	gem_sync(fd, bo);
+
+	bo2 = gem_create(fd2, buf_size);
+	offset2 = exec_and_get_offset(fd2, bo2, needs_32b2);
+	gem_sync(fd2, bo2);
+
+	if (needs_32b == needs_32b2)
+		igt_assert_eq(offset, offset2);
+	else
+		igt_assert_neq(offset, offset2);
+
+
+	/* lower PDPs of each PPGTT are reserved for the objects
+	 * requiring this workaround
+	 */
+	if (needs_32b)
+		igt_assert(is_32b(offset));
+
+	if (needs_32b2)
+		igt_assert(is_32b(offset2));
+
+	gem_close(fd, bo);
+	gem_close(fd2, bo2);
+}
+
+static void wa_32b_offset_test(void)
+{
+	uint32_t fd, fd2, gb;
+
+	fd = drm_open_any();
+	igt_require(uses_48b_full_ppgtt(fd));
+
+	intel_require_memory(1, 4*G, CHECK_RAM);
+
+	fd2 = drm_open_any();
+
+	for (gb = 1; gb < 4; gb++) {
+		/* same bo test */
+		reusebo_and_compare_offsets(fd, gb*G);
+
+
+		/* allow full addr range */
+		createbo_and_compare_offsets(fd, fd2, gb*G, 0, 0);
+
+		/* limit 32b addr range */
+		createbo_and_compare_offsets(fd, fd2, gb*G, 1, 1);
+
+		/* mixed */
+		createbo_and_compare_offsets(fd, fd2, gb*G, 0, 1);
+		createbo_and_compare_offsets(fd, fd2, gb*G, 1, 0);
+
+		/* flink obj */
+		flinkbo_and_compare_offsets(fd, fd2, gb*G);
+	}
+
+	close(fd);
+	close(fd2);
+}
+
+
 #define N_CHILD 8
 int main(int argc, char **argv)
 {
@@ -302,5 +441,8 @@ int main(int argc, char **argv)
 	igt_subtest("flink-and-close-vma-leak")
 		flink_and_close();
 
+	igt_subtest("wa-32b-offset-test")
+		wa_32b_offset_test();
+
 	igt_exit();
 }
-- 
2.3.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 16/18] drm/i915: Check against correct user_size limit in 48b ppgtt mode
  2015-06-10 16:46 ` [PATCH v2 16/18] drm/i915: Check against correct user_size limit in 48b ppgtt mode Michel Thierry
@ 2015-06-10 17:57   ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-06-10 17:57 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx, Akash Goel

On Wed, Jun 10, 2015 at 05:46:53PM +0100, Michel Thierry wrote:
> GTT is only 32b and its max value is 4GB. In order to allow objects
> bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl needs to check
> against max 48b range (1ULL << 48).
> 
> Whenever possible, read the PPGTT's total instead of the GTT one, this
> will be accurate in 32 and 48 bit modes.

Just kill the limit. It is only there for early detection of an error
when it is used for execbuffer - however, we may be using the bo for
other purposes where the limit doesn't apply. Or the check may be
invalid (such as now).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-10 16:46 ` [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-06-10 18:09   ` Chris Wilson
  2015-06-17 12:49     ` Daniel Vetter
  2015-06-23 12:21   ` [PATCH v3] " Michel Thierry
  1 sibling, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2015-06-10 18:09 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> There are some allocations that must be only referenced by 32bit
> offsets. To limit the chances of having the first 4GB already full,
> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> DRM_MM_CREATE_TOP flags
> 
> User must pass the I915_EXEC_SUPPORT_48BADDRESS flag to indicate that
> an object can be allocated above the 32b address range.

This should be a per-object flag not per-execbuffer.
 
> The flag is ignored in 32b PPGTT.
> 
> v2: Changed flag logic from needs_32b to supports_48b.
> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  1 +
>  drivers/gpu/drm/i915/i915_gem.c            | 19 ++++++++++++++--
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 36 +++++++++++++++++++++---------
>  include/uapi/drm/i915_drm.h                |  4 +++-
>  4 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 95f59d3..d73dddb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2714,6 +2714,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
>  #define PIN_OFFSET_BIAS	(1<<3)
>  #define PIN_USER	(1<<4)
>  #define PIN_UPDATE	(1<<5)
> +#define PIN_FULL_RANGE	(1<<6)

Halfway through our flags. We should get around to putting a reminder
that 1<<11 is the last flag.
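
For reference, the reason bit 11 is the last one: the pin flags share a
word with a page-aligned offset, and PIN_OFFSET_MASK (~4095) keeps
everything from bit 12 upwards for that offset. A small userspace sketch
of the packing follows; the helper names are made up, and the mask is
written as a 64-bit constant here for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>

#define PIN_OFFSET_BIAS	(1u << 3)
#define PIN_FULL_RANGE	(1u << 6)
#define PIN_OFFSET_MASK	(~4095ULL)	/* bits 12 and up hold the offset */

/* Hypothetical helper: a flag is usable only if it avoids the offset bits. */
static int flag_fits(uint64_t flag)
{
	return (flag & PIN_OFFSET_MASK) == 0;
}

/* Hypothetical helper: combine a page-aligned offset with pin flags. */
static uint64_t pack_pin(uint64_t offset, uint32_t flags)
{
	return (offset & PIN_OFFSET_MASK) | flags;
}

/* Same expression as the "start" initialiser in the quoted hunk. */
static uint64_t bias_of(uint64_t flags)
{
	return flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
}

static void pin_flags_self_check(void)
{
	assert(flag_fits(1u << 11));	/* last free flag bit */
	assert(!flag_fits(1u << 12));	/* would alias the offset */
}
```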

>  #define PIN_OFFSET_MASK (~4095)
>  int __must_check
>  i915_gem_object_pin(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 35690ef..e6325a4a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3672,6 +3672,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u32 fence_alignment, unfenced_alignment;
>  	u64 size, fence_size;
> +	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
> +	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
>  	u64 start =
>  		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
>  	u64 end =
> @@ -3713,6 +3715,19 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  						   obj->tiling_mode,
>  						   false);
>  		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> +
> +		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> +		 * limit address to 4GB-1 for objects requiring this wa; for
> +		 * others, set alloc flag to TOP.
> +		 */
> +		if (USES_FULL_48BIT_PPGTT(dev)) {
> +			if (flags & PIN_FULL_RANGE) {
> +				search_flag = DRM_MM_SEARCH_BELOW;
> +				alloc_flag = DRM_MM_CREATE_TOP;
> +			} else {
> +				end = ((4ULL << GEN8_PDPE_SHIFT) - 1);

Looking at this, I think this is better as two flags. I have used
SEARCH_BELOW in the past to try and keep objects out of the mappable
aperture. Having that as a separate flag is quite useful in its own right.

Then the second flag is PIN_BELOW_4G which we can set by default (and
clear when the user specifies EXEC_OBJECT_48BIT). Not so sure, but I
think with the right combination of flags you can avoid having device
and w/a specific logic here. (This should be mechanism, push the policy
out to the boundary, preferably into userspace.)

All the i915_gem_execbuffer changes are wrong due to the flag not being
on the object.
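
A rough sketch of what that could look like, in standalone C. The flag
names PIN_BELOW_4G and EXEC_OBJECT_48BIT come from the suggestion above,
but the bit values and both helper functions are placeholders invented
for the example, not existing driver definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EXEC_OBJECT_48BIT	(1u << 3)	/* placeholder bit value */
#define PIN_BELOW_4G		(1u << 7)	/* placeholder bit value */

/* Policy at the boundary: derive pin flags from the per-object flags. */
static uint32_t pin_flags_for_object(uint32_t entry_flags)
{
	uint32_t flags = PIN_BELOW_4G;		/* set by default */

	if (entry_flags & EXEC_OBJECT_48BIT)
		flags &= ~PIN_BELOW_4G;		/* object opted in */

	return flags;
}

/* Mechanism in the binder: no device or w/a specific checks needed. */
static uint64_t range_end(uint32_t pin_flags, uint64_t vm_total)
{
	const uint64_t limit = (1ULL << 32) - 1;

	if ((pin_flags & PIN_BELOW_4G) && vm_total > limit)
		return limit;

	return vm_total;
}

static void per_object_self_check(void)
{
	assert(range_end(pin_flags_for_object(0), 1ULL << 48) == 0xffffffffULL);
	assert(range_end(pin_flags_for_object(EXEC_OBJECT_48BIT),
			 1ULL << 48) == (1ULL << 48));
}
```

Two objects in the same execbuffer can then get different placements,
which a per-execbuffer flag cannot express.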
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands
  2015-06-10 16:46 ` [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands Michel Thierry
@ 2015-06-11 18:04   ` Mika Kuoppala
  2015-06-22  9:18     ` Michel Thierry
  2015-06-26 12:46   ` [PATCH v3] " Michel Thierry
  1 sibling, 1 reply; 74+ messages in thread
From: Mika Kuoppala @ 2015-06-11 18:04 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> A safer way to update the PDPx registers is sending lri commands, added
> in the ring before the batchbuffer start. Otherwise, the ctx must be idle
> before trying to change anything (but the ring-tail) in the ctx image. An
> example where the ctx won't be idle is lite-restore.
>
> This patch depends on [1], and has the advantage that it doesn't require
> to pre-allocate the top pdps like here [2].
>
> [1] http://mid.gmane.org/1432314314-23530-2-git-send-email-mika.kuoppala@intel.com
> [2] http://mid.gmane.org/1432314314-23530-3-git-send-email-mika.kuoppala@intel.com
>
> v2: Combine lri writes (and save 8 commands). (Mika)
>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 43 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 626949a..51c0e06 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1116,13 +1116,56 @@ static int gen9_init_render_ring(struct intel_engine_cs *ring)
>  	return init_workarounds_ring(ring);
>  }
>  
> +static int intel_logical_ring_emit_pdps(struct intel_engine_cs *ring,
> +					struct intel_context *ctx)
> +{
> +	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> +	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
> +	int i, ret;
> +
> +	ret = intel_logical_ring_begin(ringbuf, ctx, num_lri_cmds * 2 + 2);
> +	if (ret)
> +		return ret;
> +
> +	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
> +	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
> +		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
> +
> +		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_UDW(ring, i));
> +		intel_logical_ring_emit(ringbuf, upper_32_bits(pd_daddr));
> +		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_LDW(ring, i));
> +		intel_logical_ring_emit(ringbuf, lower_32_bits(pd_daddr));
> +	}
> +
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
>  static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
>  			      struct intel_context *ctx,
>  			      u64 offset, unsigned dispatch_flags)
>  {
> +	struct intel_engine_cs *ring = ringbuf->ring;
>  	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
>  	int ret;
>  
> +	/* Don't rely in hw updating PDPs, specially in lite-restore.
> +	 * Ideally, we should set Force PD Restore in ctx descriptor,
> +	 * but we can't. Force Restore would be a second option, but
> +	 * it is unsafe in case of lite-restore (because the ctx is
> +	 * not idle). */
> +	if (ctx->ppgtt &&

Is this superfluous? Can the ctx->ppgtt ever be null with
execlists?
-Mika


> +	    (intel_ring_flag(ring) & ctx->ppgtt->pd_dirty_rings)) {
> +		ret = intel_logical_ring_emit_pdps(ring, ctx);
> +		if (ret)
> +			return ret;
> +
> +		ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
> +	}
> +
>  	ret = intel_logical_ring_begin(ringbuf, ctx, 4);
>  	if (ret)
>  		return ret;
> -- 
> 2.4.0
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params
  2015-06-10 16:46 ` [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params Michel Thierry
@ 2015-06-11 18:05   ` Mika Kuoppala
  2015-06-26 16:38     ` Daniel Vetter
  0 siblings, 1 reply; 74+ messages in thread
From: Mika Kuoppala @ 2015-06-11 18:05 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> After Mika's ppgtt cleanup series, all the other free functions have
> drm_device as the first parameter, except this one.
>
> No functional changes.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 8f79125..8314e59 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -766,7 +766,8 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  		kunmap_px(ppgtt, pt_vaddr);
>  }
>  
> -static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
> +static void gen8_free_page_tables(struct drm_device *dev,
> +				  struct i915_page_directory *pd)
>  {
>  	int i;
>  
> @@ -792,7 +793,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
>  			continue;
>  
> -		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
> +		gen8_free_page_tables(ppgtt->base.dev,
> +				      ppgtt->pdp.page_directory[i]);
>  		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
>  	}
>  
> -- 
> 2.4.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-10 18:09   ` Chris Wilson
@ 2015-06-17 12:49     ` Daniel Vetter
  2015-06-17 12:53       ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Daniel Vetter @ 2015-06-17 12:49 UTC (permalink / raw)
  To: Chris Wilson, Michel Thierry, intel-gfx

On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > There are some allocations that must be only referenced by 32bit
> > offsets. To limit the chances of having the first 4GB already full,
> > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > DRM_MM_CREATE_TOP flags
> > 
> > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > be allocated above the 32b address range.
> 
> This should be a per-object flag not per-execbuffer.

We need both. This one to opt into the large address space, the per-object
one to apply the w/a. Also libdrm/mesa patches for this are still missing.
-Daniel

>  
> > The flag is ignored in 32b PPGTT.
> > 
> > v2: Changed flag logic from neeeds_32b, to supports_48b.
> > 
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h            |  1 +
> >  drivers/gpu/drm/i915/i915_gem.c            | 19 ++++++++++++++--
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 36 +++++++++++++++++++++---------
> >  include/uapi/drm/i915_drm.h                |  4 +++-
> >  4 files changed, 47 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 95f59d3..d73dddb 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2714,6 +2714,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
> >  #define PIN_OFFSET_BIAS	(1<<3)
> >  #define PIN_USER	(1<<4)
> >  #define PIN_UPDATE	(1<<5)
> > +#define PIN_FULL_RANGE	(1<<6)
> 
> Halfway through our flags. We should get around to putting a reminder
> that 1<<11 is the last flag.
> 
> >  #define PIN_OFFSET_MASK (~4095)
> >  int __must_check
> >  i915_gem_object_pin(struct drm_i915_gem_object *obj,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 35690ef..e6325a4a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3672,6 +3672,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	u32 fence_alignment, unfenced_alignment;
> >  	u64 size, fence_size;
> > +	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
> > +	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
> >  	u64 start =
> >  		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> >  	u64 end =
> > @@ -3713,6 +3715,19 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> >  						   obj->tiling_mode,
> >  						   false);
> >  		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> > +
> > +		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> > +		 * limit address to 4GB-1 for objects requiring this wa; for
> > +		 * others, set alloc flag to TOP.
> > +		 */
> > +		if (USES_FULL_48BIT_PPGTT(dev)) {
> > +			if (flags & PIN_FULL_RANGE) {
> > +				search_flag = DRM_MM_SEARCH_BELOW;
> > +				alloc_flag = DRM_MM_CREATE_TOP;
> > +			} else {
> > +				end = ((4ULL << GEN8_PDPE_SHIFT) - 1);
> 
> Looking at this, I think this is better as two flags. I have used
> SEARCH_BELOW in the past to try and keep objects out of the mappable
> aperture. Having that as separate flag is quite useful in its own right.
> 
> Then the second flag is PIN_BELOW_4G which we can set by default (and
> cleared when the user specifies EXEC_OBJECT_48BIT). Not so sure, but I
> think with the right combination of flags you can avoid having device
> and w/a specific logic here. (This should be mechanism, push the policy
> out to the boundary, preferrably into userspace.)
> 
> All the i915_gem_execbuffer changes are wrong due to the flag not being
> on the object.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-17 12:49     ` Daniel Vetter
@ 2015-06-17 12:53       ` Chris Wilson
  2015-06-17 15:03         ` Daniel Vetter
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2015-06-17 12:53 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote:
> On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> > On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > > There are some allocations that must be only referenced by 32bit
> > > offsets. To limit the chances of having the first 4GB already full,
> > > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > > DRM_MM_CREATE_TOP flags
> > > 
> > > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > > be allocated above the 32b address range.
> > 
> > This should be a per-object flag not per-execbuffer.
> 
> We need both. This one to opt into the large address space, the per-object
> one to apply the w/a. Also libdrm/mesa patches for this are still missing.

Do we need the opt in on the context? The 48bit vm is lazily
constructed, if no object asks to use the high range, it will never be
populated. Or is there a cost with preparing a 48bit vm?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-17 12:53       ` Chris Wilson
@ 2015-06-17 15:03         ` Daniel Vetter
  2015-06-17 17:37           ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Daniel Vetter @ 2015-06-17 15:03 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx

On Wed, Jun 17, 2015 at 01:53:17PM +0100, Chris Wilson wrote:
> On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> > > On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > > > There are some allocations that must be only referenced by 32bit
> > > > offsets. To limit the chances of having the first 4GB already full,
> > > > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > > > DRM_MM_CREATE_TOP flags
> > > > 
> > > > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > > > be allocated above the 32b address range.
> > > 
> > > This should be a per-object flag not per-execbuffer.
> > 
> > We need both. This one to opt into the large address space, the per-object
> > one to apply the w/a. Also libdrm/mesa patches for this are still missing.
> 
> Do we need the opt in on the context? The 48bit vm is lazily
> constructed, if no object asks to use the high range, it will never be
> populated. Or is there a cost with preparing a 48bit vm?

If we restrict to 4G we'll evict objects if we run out, and will stay
correct even when processing fairly large workloads. With just lazily
eating into 48b that won't be the case. A bit far-fetched, but if we go
to the trouble of implementing this might as well do it right.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-17 15:03         ` Daniel Vetter
@ 2015-06-17 17:37           ` Chris Wilson
  2015-06-18  6:45             ` Daniel Vetter
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2015-06-17 17:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Jun 17, 2015 at 05:03:19PM +0200, Daniel Vetter wrote:
> On Wed, Jun 17, 2015 at 01:53:17PM +0100, Chris Wilson wrote:
> > On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote:
> > > On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> > > > On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > > > > There are some allocations that must be only referenced by 32bit
> > > > > offsets. To limit the chances of having the first 4GB already full,
> > > > > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > > > > DRM_MM_CREATE_TOP flags
> > > > > 
> > > > > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > > > > be allocated above the 32b address range.
> > > > 
> > > > This should be a per-object flag not per-execbuffer.
> > > 
> > > We need both. This one to opt into the large address space, the per-object
> > > one to apply the w/a. Also libdrm/mesa patches for this are still missing.
> > 
> > Do we need the opt in on the context? The 48bit vm is lazily
> > constructed, if no object asks to use the high range, it will never be
> > populated. Or is there a cost with preparing a 48bit vm?
> 
> If we restrict to 4G we'll evict objects if we run out, and will stay
> correct even when processing fairly large workloads. With just lazily
> eating into 48b that won't be the case. A bit far-fetched, but if we go
> to the trouble of implementing this might as well do it right.

i915_evict_something runs between the range requested for pinning. If we
run out of 4G space and the desired pin does not opt into 48bit, we will
evict from the lower 4G.

I obviously missed your concern. Care to elaborate?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-17 17:37           ` Chris Wilson
@ 2015-06-18  6:45             ` Daniel Vetter
  2015-06-18  7:03               ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Daniel Vetter @ 2015-06-18  6:45 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx

On Wed, Jun 17, 2015 at 06:37:03PM +0100, Chris Wilson wrote:
> On Wed, Jun 17, 2015 at 05:03:19PM +0200, Daniel Vetter wrote:
> > On Wed, Jun 17, 2015 at 01:53:17PM +0100, Chris Wilson wrote:
> > > On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote:
> > > > On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> > > > > On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > > > > > There are some allocations that must be only referenced by 32bit
> > > > > > offsets. To limit the chances of having the first 4GB already full,
> > > > > > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > > > > > DRM_MM_CREATE_TOP flags
> > > > > > 
> > > > > > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > > > > > be allocated above the 32b address range.
> > > > > 
> > > > > This should be a per-object flag not per-execbuffer.
> > > > 
> > > > We need both. This one to opt into the large address space, the per-object
> > > > one to apply the w/a. Also libdrm/mesa patches for this are still missing.
> > > 
> > > Do we need the opt in on the context? The 48bit vm is lazily
> > > constructed, if no object asks to use the high range, it will never be
> > > populated. Or is there a cost with preparing a 48bit vm?
> > 
> > If we restrict to 4G we'll evict objects if we run out, and will stay
> > correct even when processing fairly large workloads. With just lazily
> > eating into 48b that won't be the case. A bit far-fetched, but if we go
> > to the trouble of implementing this might as well do it right.
> 
> i915_evict_something runs between the range requested for pinning. If we
> run out of 4G space and the desired pin does not opt into 48bit, we will
> evict from the lower 4G.
> 
> I obviously missed your concern. Care to elaborate?

Current situation: You always get an address below 4G for all objects,
even if you use more than 4G of textures - the evict code will make space.

New situation with 48b address space enabled but existing userspace and a
total BO set bigger than 4G: The kernel will eventually hand out ppgtt
addresses > 4G, which means we could get such an address even for an
object where this wa needs to apply. This would be a regression. But if
we make 48b strictly opt-in the kernel will restrict _all_ objects to
below 4G, creating no regression.

Ofc new userspace on 48b would set both the execbuf opt-in (or context
flag, we have those now) plus the per-obj "I need this below 4G" flag for
the objects that need this wa.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-18  6:45             ` Daniel Vetter
@ 2015-06-18  7:03               ` Chris Wilson
  2015-06-18  7:11                 ` Daniel Vetter
  0 siblings, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2015-06-18  7:03 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 08:45:50AM +0200, Daniel Vetter wrote:
> On Wed, Jun 17, 2015 at 06:37:03PM +0100, Chris Wilson wrote:
> > On Wed, Jun 17, 2015 at 05:03:19PM +0200, Daniel Vetter wrote:
> > > On Wed, Jun 17, 2015 at 01:53:17PM +0100, Chris Wilson wrote:
> > > > On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote:
> > > > > On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> > > > > > On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > > > > > > There are some allocations that must be only referenced by 32bit
> > > > > > > offsets. To limit the chances of having the first 4GB already full,
> > > > > > > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > > > > > > DRM_MM_CREATE_TOP flags
> > > > > > > 
> > > > > > > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > > > > > > be allocated above the 32b address range.
> > > > > > 
> > > > > > This should be a per-object flag not per-execbuffer.
> > > > > 
> > > > > We need both. This one to opt into the large address space, the per-object
> > > > > one to apply the w/a. Also libdrm/mesa patches for this are still missing.
> > > > 
> > > > Do we need the opt in on the context? The 48bit vm is lazily
> > > > constructed, if no object asks to use the high range, it will never be
> > > > populated. Or is there a cost with preparing a 48bit vm?
> > > 
> > > If we restrict to 4G we'll evict objects if we run out, and will stay
> > > correct even when processing fairly large workloads. With just lazily
> > > eating into 48b that won't be the case. A bit far-fetched, but if we go
> > > to the trouble of implementing this might as well do it right.
> > 
> > i915_evict_something runs between the range requested for pinning. If we
> > run out of 4G space and the desired pin does not opt into 48bit, we will
> > evict from the lower 4G.
> > 
> > I obviously missed your concern. Care to elaborate?
> 
> Current situation: You always get an address below 4G for all objects,
> even if you use more than 4G of textures - the evict code will make space.
> 
> New situation with 48b address space enabled but existing userspace and a
> total BO set bigger than 4G: The kernel will eventually hand out ppgtt
> addresses > 4G, which means if we get such an address potentially even for
> an object where this wa needs to apply. This would be a regression. But if
> we make 48b strictly opt-in the kernel will restrict _all_ objects to
> below 4G, creating no regression.

How? The pin code requires PIN_48BIT to be set to hand out higher
addresses. That is only set by execbuffer if execobject->flags is also set.
 
> Ofc new userspace on 48b would set both the execbuf opt-in (or context
> flag, we have those now) plus the per-obj "I need this below 4G" flag for
> the objects that need this wa.

I don't see why we need another flag beyond the per-object flag. If you
are thinking validation, we have to validate per-object flags anyway.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-18  7:03               ` Chris Wilson
@ 2015-06-18  7:11                 ` Daniel Vetter
  2015-06-18  7:34                   ` Chris Wilson
  0 siblings, 1 reply; 74+ messages in thread
From: Daniel Vetter @ 2015-06-18  7:11 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx

On Thu, Jun 18, 2015 at 08:03:26AM +0100, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 08:45:50AM +0200, Daniel Vetter wrote:
> > On Wed, Jun 17, 2015 at 06:37:03PM +0100, Chris Wilson wrote:
> > > On Wed, Jun 17, 2015 at 05:03:19PM +0200, Daniel Vetter wrote:
> > > > On Wed, Jun 17, 2015 at 01:53:17PM +0100, Chris Wilson wrote:
> > > > > On Wed, Jun 17, 2015 at 02:49:47PM +0200, Daniel Vetter wrote:
> > > > > > On Wed, Jun 10, 2015 at 07:09:03PM +0100, Chris Wilson wrote:
> > > > > > > On Wed, Jun 10, 2015 at 05:46:54PM +0100, Michel Thierry wrote:
> > > > > > > > There are some allocations that must be only referenced by 32bit
> > > > > > > > offsets. To limit the chances of having the first 4GB already full,
> > > > > > > > objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> > > > > > > > DRM_MM_CREATE_TOP flags
> > > > > > > > 
> > > > > > > > User must pass I915_EXEC_SUPPORTS_48BADDRESS flag to indicate it can
> > > > > > > > be allocated above the 32b address range.
> > > > > > > 
> > > > > > > This should be a per-object flag not per-execbuffer.
> > > > > > 
> > > > > > We need both. This one to opt into the large address space, the per-object
> > > > > > one to apply the w/a. Also libdrm/mesa patches for this are still missing.
> > > > > 
> > > > > Do we need the opt in on the context? The 48bit vm is lazily
> > > > > constructed, if no object asks to use the high range, it will never be
> > > > > populated. Or is there a cost with preparing a 48bit vm?
> > > > 
> > > > If we restrict to 4G we'll evict objects if we run out, and will stay
> > > > correct even when processing fairly large workloads. With just lazily
> > > > eating into 48b that won't be the case. A bit far-fetched, but if we go
> > > > to the trouble of implementing this might as well do it right.
> > > 
> > > i915_evict_something runs between the range requested for pinning. If we
> > > run out of 4G space and the desired pin does not opt into 48bit, we will
> > > evict from the lower 4G.
> > > 
> > > I obviously missed your concern. Care to elaborate?
> > 
> > Current situation: You always get an address below 4G for all objects,
> > even if you use more than 4G of textures - the evict code will make space.
> > 
> > New situation with 48b address space enabled but existing userspace and a
> > total BO set bigger than 4G: The kernel will eventually hand out ppgtt
> > addresses > 4G, which means if we get such an address potentially even for
> > an object where this wa needs to apply. This would be a regression. But if
> > we make 48b strictly opt-in the kernel will restrict _all_ objects to
> > below 4G, creating no regression.
> 
> How? The pin code requires PIN_48BIT to be set to hand out higher
> addresses. That is only set by execbuffer if execobject->flags is also set.

I've been dense, somehow I thought we need the execbuf opt-in with the
object opt-out. But opt-in at the object level is indeed all we need.
-Daniel
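
The conclusion above (opt-in at the object level only) can be sketched
as a standalone check. This is a simplified model in the spirit of the
eb_vma_misplaced() hunk in the v3 patch later in this thread, not the
driver code; SKETCH_OBJ_SUPPORTS_48B and both function names are
hypothetical, and the flag value (1<<3) is assumed from that patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-object opt-in bit; value assumed from the v3 patch. */
#define SKETCH_OBJ_SUPPORTS_48B (1u << 3)

/*
 * An object that did not opt in must live below 4 GiB. If it is
 * currently bound at or above that limit it has to be rebound.
 */
static bool sketch_needs_rebind(uint64_t node_start, unsigned int obj_flags)
{
	if (obj_flags & SKETCH_OBJ_SUPPORTS_48B)
		return false;

	return node_start >= (1ULL << 32);
}

/* Small self-check showing the cases discussed in the thread. */
static void sketch_selfcheck(void)
{
	/* Legacy userspace never sets the flag: low addresses are fine. */
	assert(!sketch_needs_rebind(0x1000, 0));
	/* Legacy object found above 4 GiB: must be moved. */
	assert(sketch_needs_rebind(1ULL << 32, 0));
	/* Opted-in object may stay anywhere. */
	assert(!sketch_needs_rebind(1ULL << 40, SKETCH_OBJ_SUPPORTS_48B));
}
```

Under this model existing userspace, which sets no new flag, keeps
getting addresses below 4G for every object, so no execbuf-level
switch is needed to avoid the regression described earlier.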
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-18  7:11                 ` Daniel Vetter
@ 2015-06-18  7:34                   ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-06-18  7:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 09:11:46AM +0200, Daniel Vetter wrote:
> I've been dense, somehow I thought we need the execbuf opt-in with the
> object opt-out. But opt-in at the object level is indeed all we need.

To be fair and recap our discussion on irc, the other side of the coin
is that at some point we want to use 48bit by default (gen9, gen10,
whenever it is robust!). Daniel's argument is that with a high-level
enable bit + opt-out, there is less work in userspace to do the right thing.

Imo, having changed userspace to opt-in when possible with gen8, having
userspace opt-in for all objects is then trivial (plus it is then easier
for userspace to disable it again). Having a flag at the execbuf level
would be nice, but given the changes we need in userspace today (to
support either the opt-in/opt-out model), I think a second flag is of no
practical value.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands
  2015-06-11 18:04   ` Mika Kuoppala
@ 2015-06-22  9:18     ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-06-22  9:18 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 6/11/2015 7:04 PM, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> A safer way to update the PDPx registers is sending lri commands, added
>> in the ring before the batchbuffer start. Otherwise, the ctx must be idle
>> before trying to change anything (but the ring-tail) in the ctx image. An
>> example where the ctx won't be idle is lite-restore.
>>
>> This patch depends on [1], and has the advantage that it doesn't require
>> to pre-allocate the top pdps like here [2].
>>
>> [1] http://mid.gmane.org/1432314314-23530-2-git-send-email-mika.kuoppala@intel.com
>> [2] http://mid.gmane.org/1432314314-23530-3-git-send-email-mika.kuoppala@intel.com
>>
>> v2: Combine lri writes (and save 8 commands). (Mika)
>>
>> Cc: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 43 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 43 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 626949a..51c0e06 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1116,13 +1116,56 @@ static int gen9_init_render_ring(struct intel_engine_cs *ring)
>>   	return init_workarounds_ring(ring);
>>   }
>>
>> +static int intel_logical_ring_emit_pdps(struct intel_engine_cs *ring,
>> +					struct intel_context *ctx)
>> +{
>> +	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
>> +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
>> +	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
>> +	int i, ret;
>> +
>> +	ret = intel_logical_ring_begin(ringbuf, ctx, num_lri_cmds * 2 + 2);
>> +	if (ret)
>> +		return ret;
>> +
>> +	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
>> +	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
>> +		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>> +
>> +		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_UDW(ring, i));
>> +		intel_logical_ring_emit(ringbuf, upper_32_bits(pd_daddr));
>> +		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_LDW(ring, i));
>> +		intel_logical_ring_emit(ringbuf, lower_32_bits(pd_daddr));
>> +	}
>> +
>> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
>> +	intel_logical_ring_advance(ringbuf);
>> +
>> +	return 0;
>> +}
>> +
>>   static int gen8_emit_bb_start(struct intel_ringbuffer *ringbuf,
>>   			      struct intel_context *ctx,
>>   			      u64 offset, unsigned dispatch_flags)
>>   {
>> +	struct intel_engine_cs *ring = ringbuf->ring;
>>   	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
>>   	int ret;
>>
>> +	/* Don't rely in hw updating PDPs, specially in lite-restore.
>> +	 * Ideally, we should set Force PD Restore in ctx descriptor,
>> +	 * but we can't. Force Restore would be a second option, but
>> +	 * it is unsafe in case of lite-restore (because the ctx is
>> +	 * not idle). */
>> +	if (ctx->ppgtt &&
>
> Is this superfluous? Can the ctx->ppgtt ever be null with
> execlists?

It's for execlists with aliasing ppgtt. In that case ctx->ppgtt is null 
(and we shouldn't need to update the pdps).

-Michel
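
A minimal standalone sketch of the gating logic, assuming the ring
flag is simply (1 << ring id) and GEN8_LEGACY_PDPES is 4. All sketch_*
names are hypothetical; the real emit step is replaced by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LEGACY_PDPES 4	/* assumption: 4 PDPs in 32-bit mode */

struct sketch_ppgtt {
	uint32_t pd_dirty_rings;	/* one bit per ring needing a PDP reload */
};

struct sketch_ctx {
	struct sketch_ppgtt *ppgtt;	/* NULL with aliasing ppgtt */
};

static uint32_t sketch_ring_flag(unsigned int ring_id)
{
	assert(ring_id < 32);
	return 1u << ring_id;
}

/* Dwords reserved for the reload: LRI header + 8 (reg, value) pairs + NOOP. */
static int sketch_pdp_lri_dwords(void)
{
	const int num_lri_cmds = SKETCH_LEGACY_PDPES * 2;

	return num_lri_cmds * 2 + 2;
}

/* Returns 1 if a PDP reload would be emitted for this ring, else 0. */
static int sketch_maybe_emit_pdps(struct sketch_ctx *ctx, unsigned int ring_id)
{
	const uint32_t flag = sketch_ring_flag(ring_id);

	if (!ctx->ppgtt)			/* aliasing ppgtt: nothing to do */
		return 0;
	if (!(flag & ctx->ppgtt->pd_dirty_rings))
		return 0;

	/* The driver would emit the MI_LOAD_REGISTER_IMM sequence here. */

	ctx->ppgtt->pd_dirty_rings &= ~flag;	/* this ring is clean again */
	return 1;
}
```

With the NULL check in place, a context on aliasing ppgtt falls
straight through to the batchbuffer start, and a full-ppgtt context
reloads its PDPs at most once per ring until the bit is set again.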

> -Mika
>
>
>> +	    (intel_ring_flag(ring) & ctx->ppgtt->pd_dirty_rings)) {
>> +		ret = intel_logical_ring_emit_pdps(ring, ctx);
>> +		if (ret)
>> +			return ret;
>> +
>> +		ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
>> +	}
>> +
>>   	ret = intel_logical_ring_begin(ringbuf, ctx, 4);
>>   	if (ret)
>>   		return ret;
>> --
>> 2.4.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-10 16:46 ` [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
  2015-06-10 18:09   ` Chris Wilson
@ 2015-06-23 12:21   ` Michel Thierry
  2015-06-23 13:22     ` Chris Wilson
  1 sibling, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-06-23 12:21 UTC (permalink / raw)
  To: intel-gfx

There are some allocations that must be only referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.

Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.

Objects must have EXEC_OBJECT_SUPPORTS_48BADDRESS flag to indicate if
they can be allocated above the 32-bit address range.

v2: Changed flag logic from needs_32b to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
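
For reference, a standalone sketch of the placement policy described
above, not the driver code. It assumes GEN8_PDPE_SHIFT is 30 (1 GiB
per PDP entry), so (4ULL << GEN8_PDPE_SHIFT) - 1 is 0xFFFFFFFF. The
sketch_* types and enum values are hypothetical stand-ins for the
drm_mm search/create flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIN_FULL_RANGE	(1u << 6)	/* value from this patch */
#define GEN8_PDPE_SHIFT	30		/* assumption: 1 GiB per PDP entry */

/* Hypothetical stand-ins for DRM_MM_SEARCH_* and DRM_MM_CREATE_*. */
enum sketch_search { SKETCH_SEARCH_DEFAULT, SKETCH_SEARCH_BELOW };
enum sketch_create { SKETCH_CREATE_DEFAULT, SKETCH_CREATE_TOP };

struct sketch_placement {
	uint64_t end;			/* last usable address for the search */
	enum sketch_search search;
	enum sketch_create create;
};

static struct sketch_placement
sketch_pick_placement(bool full_48bit_ppgtt, uint64_t vm_end,
		      unsigned int pin_flags)
{
	struct sketch_placement p = {
		vm_end, SKETCH_SEARCH_DEFAULT, SKETCH_CREATE_DEFAULT
	};

	assert(vm_end > 0);

	if (!full_48bit_ppgtt)
		return p;		/* flag is ignored in 32b PPGTT */

	if (pin_flags & PIN_FULL_RANGE) {
		/* Opted-in objects go to the top, keeping low space free. */
		p.search = SKETCH_SEARCH_BELOW;
		p.create = SKETCH_CREATE_TOP;
	} else {
		/* Everything else is limited to the first 4 PDPs. */
		p.end = (4ULL << GEN8_PDPE_SHIFT) - 1;
	}

	return p;
}
```

Objects that do not opt in behave as before: the search range is
clamped, and eviction then runs within that clamped range if the
allocation fails.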
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem.c            | 19 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  7 +++++++
 include/uapi/drm/i915_drm.h                |  3 ++-
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a6bc27a..57af235 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2739,6 +2739,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS	(1<<3)
 #define PIN_USER	(1<<4)
 #define PIN_UPDATE	(1<<5)
+#define PIN_FULL_RANGE	(1<<6)
 #define PIN_OFFSET_MASK (~4095)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f4ddf6e..db22559 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3669,6 +3669,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 fence_alignment, unfenced_alignment;
 	u64 size, fence_size;
+	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
+	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
 	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	u64 end =
@@ -3710,6 +3712,19 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 						   obj->tiling_mode,
 						   false);
 		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+		 * limit address to 4GB-1 for objects requiring this wa; for
+		 * others, set alloc flag to TOP.
+		 */
+		if (USES_FULL_48BIT_PPGTT(dev)) {
+			if (flags & PIN_FULL_RANGE) {
+				search_flag = DRM_MM_SEARCH_BELOW;
+				alloc_flag = DRM_MM_CREATE_TOP;
+			} else {
+				end = ((4ULL << GEN8_PDPE_SHIFT) - 1);
+			}
+		}
 	}
 
 	if (alignment == 0)
@@ -3752,8 +3767,8 @@ search_free:
 						  size, alignment,
 						  obj->cache_level,
 						  start, end,
-						  DRM_MM_SEARCH_DEFAULT,
-						  DRM_MM_CREATE_DEFAULT);
+						  search_flag,
+						  alloc_flag);
 	if (ret) {
 		ret = i915_gem_evict_something(dev, vm, size, alignment,
 					       obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3336e1c..ec8c72d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -588,6 +588,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 
+	if (entry->flags & EXEC_OBJECT_SUPPORTS_48BBADDRESS)
+		flags |= PIN_FULL_RANGE;
+
 	if (!drm_mm_node_allocated(&vma->node)) {
 		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
 			flags |= PIN_GLOBAL | PIN_MAPPABLE;
@@ -670,6 +673,10 @@ eb_vma_misplaced(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
 		return !only_mappable_for_reloc(entry->flags);
 
+	if (!(entry->flags & EXEC_OBJECT_SUPPORTS_48BBADDRESS) &&
+	    vma->node.start >= (1ULL << 32))
+		return true;
+
 	return false;
 }
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f88cc1c..55ba527 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -685,7 +685,8 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
 #define EXEC_OBJECT_NEEDS_GTT	(1<<1)
 #define EXEC_OBJECT_WRITE	(1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_SUPPORTS_48BBADDRESS (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48BBADDRESS<<1)
 	__u64 flags;
 
 	__u64 rsvd1;
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v3] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-06-23 12:21   ` [PATCH v3] " Michel Thierry
@ 2015-06-23 13:22     ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-06-23 13:22 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Jun 23, 2015 at 01:21:05PM +0100, Michel Thierry wrote:
> There are some allocations that must be only referenced by 32-bit
> offsets. To limit the chances of having the first 4GB already full,
> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> DRM_MM_CREATE_TOP flags
> 
> Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State
> Offset are limited to 32-bits.
> 
> Objects must have EXEC_OBJECT_SUPPORTS_48BADDRESS flag to indicate if
> they can be allocated above the 32-bit address range.
> 
> v2: Changed flag logic from needs_32b, to supports_48b.
> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  1 +
>  drivers/gpu/drm/i915/i915_gem.c            | 19 +++++++++++++++++--
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  7 +++++++
>  include/uapi/drm/i915_drm.h                |  3 ++-
>  4 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a6bc27a..57af235 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2739,6 +2739,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
>  #define PIN_OFFSET_BIAS	(1<<3)
>  #define PIN_USER	(1<<4)
>  #define PIN_UPDATE	(1<<5)
> +#define PIN_FULL_RANGE	(1<<6)
>  #define PIN_OFFSET_MASK (~4095)
>  int __must_check
>  i915_gem_object_pin(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f4ddf6e..db22559 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3669,6 +3669,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u32 fence_alignment, unfenced_alignment;
>  	u64 size, fence_size;
> +	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
> +	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
>  	u64 start =
>  		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
>  	u64 end =
> @@ -3710,6 +3712,19 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  						   obj->tiling_mode,
>  						   false);
>  		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> +
> +		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> +		 * limit address to 4GB-1 for objects requiring this wa; for
> +		 * others, set alloc flag to TOP.
> +		 */
> +		if (USES_FULL_48BIT_PPGTT(dev)) {
> +			if (flags & PIN_FULL_RANGE) {

I wanted this as a separate PIN_FLAG. (a) it is generally useful (for
example, I think anything that has PIN_GLOBAL but not PIN_MAPPABLE is a
candidate for this flag), but (b) there are bugs in the drm_mm
implementation for searching below...

> +				search_flag = DRM_MM_SEARCH_BELOW;
> +				alloc_flag = DRM_MM_CREATE_TOP;
> +			} else {

This would be better internally as a PIN_ZONE_4G flag. 

> +				end = ((4ULL << GEN8_PDPE_SHIFT) - 1);

end should not be -1 here (or if that is actually required by the
hardware -4096).
> +			}
> +		}
>  	}
>  
>  	if (alignment == 0)
> @@ -3752,8 +3767,8 @@ search_free:
>  						  size, alignment,
>  						  obj->cache_level,
>  						  start, end,
> -						  DRM_MM_SEARCH_DEFAULT,
> -						  DRM_MM_CREATE_DEFAULT);
> +						  search_flag,
> +						  alloc_flag);
>  	if (ret) {
>  		ret = i915_gem_evict_something(dev, vm, size, alignment,
>  					       obj->cache_level,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 3336e1c..ec8c72d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -588,6 +588,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
>  		flags |= PIN_GLOBAL;
>  
> +	if (entry->flags & EXEC_OBJECT_SUPPORTS_48BBADDRESS)
> +		flags |= PIN_FULL_RANGE;

flags |= PIN_ZONE_4G;
if (entry->flags & EXEC_OBJECT_SUPPORTS_48BBADDRESS)
	flags &= ~PIN_ZONE_4G;

>  	if (!drm_mm_node_allocated(&vma->node)) {
>  		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
>  			flags |= PIN_GLOBAL | PIN_MAPPABLE;

if ((flags & PIN_MAPPABLE) == 0)
	flags |= PIN_HIGH;

> @@ -670,6 +673,10 @@ eb_vma_misplaced(struct i915_vma *vma)
>  	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
>  		return !only_mappable_for_reloc(entry->flags);
>  
> +	if (!(entry->flags & EXEC_OBJECT_SUPPORTS_48BBADDRESS) &&
> +	    vma->node.start >= (1ULL << 32))

vma->node.start + vma->node.size > (1ULL << 32)
-Chris
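
The last review point can be illustrated with a small standalone sketch.
It is not driver code: the function names and the SKETCH_4G constant are
hypothetical, and it only assumes that an object without the 48-bit flag
must lie entirely below 4GB. Checking only the start address misses an
object that begins below 4GB but extends past it; checking start + size
catches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_4G (1ULL << 32)	/* hypothetical name for the 4GB boundary */

/* Check as posted in the patch: only looks at where the node starts. */
static bool sketch_misplaced_start_only(uint64_t start, uint64_t size,
					bool supports_48b)
{
	(void)size;
	return !supports_48b && start >= SKETCH_4G;
}

/* Check as suggested in review: the whole node must fit below 4GB. */
static bool sketch_misplaced_whole_node(uint64_t start, uint64_t size,
					bool supports_48b)
{
	/* Guard against wrap-around in start + size. */
	assert(start + size >= start);

	return !supports_48b && start + size > SKETCH_4G;
}
```

For a node at 4GB - 4096 with a size of 8192 and no 48-bit flag, the
start-only check reports "not misplaced" while the whole-node check
reports "misplaced".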

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3] drm/i915/lrc: Update PDPx registers with lri commands
  2015-06-10 16:46 ` [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands Michel Thierry
  2015-06-11 18:04   ` Mika Kuoppala
@ 2015-06-26 12:46   ` Michel Thierry
  2015-06-26 14:45     ` Mika Kuoppala
  1 sibling, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-06-26 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

A safer way to update the PDPx registers is sending lri commands, added
in the ring before the batchbuffer start. Otherwise, the ctx must be idle
before trying to change anything (but the ring-tail) in the ctx image. An
example where the ctx won't be idle is lite-restore.

This patch depends on 5b7e4c9ce ("drm/i915/gtt: Mark TLBS dirty for gen8+").

v2: Combine lri writes (and save 8 commands). (Mika)
v3: Rebase after ring/req changes, and removed references to deprecated patches.
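
For readers following the dword accounting, here is a minimal userspace
sketch of the command stream the patch builds: one LRI header, then for
each of the 4 legacy PDPs an (upper register, upper value, lower register,
lower value) group, then a NOOP. That gives 1 + 4 * 4 + 1 = 18 dwords,
which matches num_lri_cmds * 2 + 2 in the patch. The header and NOOP
encodings and the register offsets below are placeholders chosen for
illustration, not the hardware values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LEGACY_PDPES 4
/* Placeholder encodings, NOT the real MI_LOAD_REGISTER_IMM / MI_NOOP. */
#define SKETCH_LRI_HEADER(n) (0xA0000000u | (uint32_t)(n))
#define SKETCH_NOOP 0u

/* Hypothetical register layout: 8 bytes per PDP, lower dword first. */
static uint32_t sketch_pdp_ldw(uint32_t base, unsigned int i)
{
	return base + 8u * i;
}

static uint32_t sketch_pdp_udw(uint32_t base, unsigned int i)
{
	return base + 8u * i + 4u;
}

/* Fill 'out' with the LRI stream; returns the number of dwords written. */
static size_t sketch_emit_pdps(uint32_t *out, size_t cap, uint32_t reg_base,
			       const uint64_t pd_daddr[SKETCH_LEGACY_PDPES])
{
	const size_t num_lri_cmds = SKETCH_LEGACY_PDPES * 2;
	size_t n = 0;
	int i;

	assert(cap >= num_lri_cmds * 2 + 2);

	out[n++] = SKETCH_LRI_HEADER(num_lri_cmds);
	/* Same order as the patch: highest PDP first, upper dword first. */
	for (i = SKETCH_LEGACY_PDPES - 1; i >= 0; i--) {
		out[n++] = sketch_pdp_udw(reg_base, (unsigned int)i);
		out[n++] = (uint32_t)(pd_daddr[i] >> 32);
		out[n++] = sketch_pdp_ldw(reg_base, (unsigned int)i);
		out[n++] = (uint32_t)pd_daddr[i];
	}
	out[n++] = SKETCH_NOOP;

	return n;
}
```

Folding all eight register writes under a single header is what the v2
note refers to: separate LRI commands would each need their own header.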

Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d527b7b..e87d74c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1379,6 +1379,34 @@ static int gen9_init_render_ring(struct intel_engine_cs *ring)
 	return init_workarounds_ring(ring);
 }
 
+static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
+{
+	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
+	struct intel_engine_cs *ring = req->ring;
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
+	int i, ret;
+
+	ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
+
+		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_UDW(ring, i));
+		intel_logical_ring_emit(ringbuf, upper_32_bits(pd_daddr));
+		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_LDW(ring, i));
+		intel_logical_ring_emit(ringbuf, lower_32_bits(pd_daddr));
+	}
+
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 			      u64 offset, unsigned dispatch_flags)
 {
@@ -1386,6 +1414,20 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
 	int ret;
 
+	/* Don't rely on hw updating PDPs, especially in lite-restore.
+	 * Ideally, we should set Force PD Restore in ctx descriptor,
+	 * but we can't. Force Restore would be a second option, but
+	 * it is unsafe in case of lite-restore (because the ctx is
+	 * not idle). */
+	if (req->ctx->ppgtt &&
+	    (intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
+		ret = intel_logical_ring_emit_pdps(req);
+		if (ret)
+			return ret;
+
+		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
+	}
+
 	ret = intel_logical_ring_begin(req, 4);
 	if (ret)
 		return ret;
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v3] drm/i915/lrc: Update PDPx registers with lri commands
  2015-06-26 12:46   ` [PATCH v3] " Michel Thierry
@ 2015-06-26 14:45     ` Mika Kuoppala
  0 siblings, 0 replies; 74+ messages in thread
From: Mika Kuoppala @ 2015-06-26 14:45 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> A safer way to update the PDPx registers is sending lri commands, added
> in the ring before the batchbuffer start. Otherwise, the ctx must be idle
> before trying to change anything (but the ring-tail) in the ctx image. An
> example where the ctx won't be idle is lite-restore.
>
> This patch depends on 5b7e4c9ce ("drm/i915/gtt: Mark TLBS dirty for gen8+").
>
> v2: Combine lri writes (and save 8 commands). (Mika)
> v3: Rebase after ring/req changes, and removed references to deprecated patches.
>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 42 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d527b7b..e87d74c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1379,6 +1379,34 @@ static int gen9_init_render_ring(struct intel_engine_cs *ring)
>  	return init_workarounds_ring(ring);
>  }
>  
> +static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
> +{
> +	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
> +	struct intel_engine_cs *ring = req->ring;
> +	struct intel_ringbuffer *ringbuf = req->ringbuf;
> +	const int num_lri_cmds = GEN8_LEGACY_PDPES * 2;
> +	int i, ret;
> +
> +	ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
> +	if (ret)
> +		return ret;
> +
> +	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
> +	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
> +		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
> +
> +		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_UDW(ring, i));
> +		intel_logical_ring_emit(ringbuf, upper_32_bits(pd_daddr));
> +		intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_LDW(ring, i));
> +		intel_logical_ring_emit(ringbuf, lower_32_bits(pd_daddr));
> +	}
> +
> +	intel_logical_ring_emit(ringbuf, MI_NOOP);
> +	intel_logical_ring_advance(ringbuf);
> +
> +	return 0;
> +}
> +
>  static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
>  			      u64 offset, unsigned dispatch_flags)
>  {
> @@ -1386,6 +1414,20 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
>  	bool ppgtt = !(dispatch_flags & I915_DISPATCH_SECURE);
>  	int ret;
>  
> +	/* Don't rely on hw updating PDPs, especially in lite-restore.
> +	 * Ideally, we should set Force PD Restore in ctx descriptor,
> +	 * but we can't. Force Restore would be a second option, but
> +	 * it is unsafe in case of lite-restore (because the ctx is
> +	 * not idle). */
> +	if (req->ctx->ppgtt &&
> +	    (intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
> +		ret = intel_logical_ring_emit_pdps(req);
> +		if (ret)
> +			return ret;
> +
> +		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
> +	}
> +
>  	ret = intel_logical_ring_begin(req, 4);
>  	if (ret)
>  		return ret;
> -- 
> 2.4.5

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params
  2015-06-11 18:05   ` Mika Kuoppala
@ 2015-06-26 16:38     ` Daniel Vetter
  0 siblings, 0 replies; 74+ messages in thread
From: Daniel Vetter @ 2015-06-26 16:38 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Thu, Jun 11, 2015 at 09:05:38PM +0300, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
> 
> > After Mika's ppgtt cleanup series, all the other free functions have
> > drm_device as the first parameter, except this one.
> >
> > No functional changes.
> >
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

First two patches merged, thanks.
-Daniel

> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 8f79125..8314e59 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -766,7 +766,8 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> >  		kunmap_px(ppgtt, pt_vaddr);
> >  }
> >  
> > -static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
> > +static void gen8_free_page_tables(struct drm_device *dev,
> > +				  struct i915_page_directory *pd)
> >  {
> >  	int i;
> >  
> > @@ -792,7 +793,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> >  		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> >  			continue;
> >  
> > -		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
> > +		gen8_free_page_tables(ppgtt->base.dev,
> > +				      ppgtt->pdp.page_directory[i]);
> >  		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
> >  	}
> >  
> > -- 
> > 2.4.0
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3 00/17] 48-bit PPGTT
  2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
                   ` (18 preceding siblings ...)
  2015-06-10 16:46 ` [PATCH v2] tests/gem_ppgtt: Check Wa32bitOffsets workarounds Michel Thierry
@ 2015-07-01 15:27 ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 01/17] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
                     ` (17 more replies)
  19 siblings, 18 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

These are the rebased patches, after Mika's final ppgtt clean-up series landed
(they rely on the macros added). New functions also follow these changes.

In order to expand the GPU address space, a 4th level translation is added, the
Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
used, only adding the 4th level changes. I also updated some remaining
variables that were 32b only.
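
As a rough illustration of the 4-level walk, the sketch below splits a
48-bit GPU virtual address into its per-level indices. The PDPE shift of
30 is consistent with the 4ULL << GEN8_PDPE_SHIFT == 4GB usage in this
series; the other shifts and the 9-bit index width are assumptions (they
are what 4KB pages and a 48-bit address leave room for), and every
SKETCH_* name is hypothetical. The macros in i915_gem_gtt.h remain the
authoritative definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4KB pages, 9 index bits per level. */
#define SKETCH_PTE_SHIFT	12
#define SKETCH_PDE_SHIFT	21
#define SKETCH_PDPE_SHIFT	30
#define SKETCH_PML4E_SHIFT	39
#define SKETCH_INDEX_MASK	0x1ffu

struct sketch_va_indices {
	uint32_t pml4e;	/* selects a PDP */
	uint32_t pdpe;	/* selects a page directory */
	uint32_t pde;	/* selects a page table */
	uint32_t pte;	/* selects a page */
};

static struct sketch_va_indices sketch_split_va(uint64_t addr)
{
	struct sketch_va_indices idx;

	/* Only 48 address bits take part in the translation. */
	assert((addr >> 48) == 0);

	idx.pml4e = (uint32_t)(addr >> SKETCH_PML4E_SHIFT) & SKETCH_INDEX_MASK;
	idx.pdpe = (uint32_t)(addr >> SKETCH_PDPE_SHIFT) & SKETCH_INDEX_MASK;
	idx.pde = (uint32_t)(addr >> SKETCH_PDE_SHIFT) & SKETCH_INDEX_MASK;
	idx.pte = (uint32_t)(addr >> SKETCH_PTE_SHIFT) & SKETCH_INDEX_MASK;

	return idx;
}
```

Under these assumptions, every address below 4GB has pml4e == 0 and
pdpe in 0..3, which is the "first 4 PDPs" range the workarounds refer to.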

There are 2 hardware workarounds needed to allow correct operation with 48b
addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). This
new patchset version includes the comments and suggestions from Chris Wilson.
A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can be
allocated outside the first 4 PDPs; if not, the end range is forced to 4GB. Also,
more objects now use the DRM_MM_CREATE_TOP flag. To maintain compatibility, in
libdrm I added a new drm_intel_bo_emit_reloc_48bit function that will flag
these objects, while the existing drm_intel_bo_emit_reloc clears it.
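
The placement policy described above can be sketched as follows. This is
a simplified model, not the driver's allocator: sketch_pick_range and its
struct are hypothetical, the end of the range is treated as exclusive, and
"top_down" stands in for requesting DRM_MM_CREATE_TOP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_4G (1ULL << 32)	/* hypothetical name for the 4GB limit */

struct sketch_range {
	uint64_t start;
	uint64_t end;		/* exclusive */
	bool top_down;		/* prefer the highest free hole */
};

/* Pick the search range for one object in an address space of vm_total bytes. */
static struct sketch_range sketch_pick_range(uint64_t vm_total,
					     bool supports_48b)
{
	struct sketch_range r = { 0, vm_total, false };

	assert(vm_total > 0);

	if (supports_48b) {
		/* Keep the low 4GB free for objects that need it. */
		r.top_down = true;
	} else if (r.end > SKETCH_4G) {
		/* Workaround objects must stay within 32-bit offsets. */
		r.end = SKETCH_4G;
	}

	return r;
}
```

Placing flagged objects from the top down is what limits the chance of
the first 4GB filling up before a workaround-bound object needs it.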

Finally, this feature is only available in BDW and Gen9, requires LRC submission
mode (execlists) and it can be detected by i915.enable_ppgtt=3.

Also note that this expanded address space is only available for full PPGTT,
aliasing PPGTT and Global GTT remain 32-bit.

Michel Thierry (17):
  drm/i915: Remove unnecessary gen8_clamp_pd
  drm/i915/gen8: Make pdp allocation more dynamic
  drm/i915/gen8: Abstract PDP usage
  drm/i915/gen8: Add dynamic page trace events
  drm/i915/gen8: implement alloc/free for 4lvl
  drm/i915/gen8: Add 4 level switching infrastructure and lrc support
  drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
  drm/i915/gen8: Pass sg_iter through pte inserts
  drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  drm/i915/gen8: Initialize PDPs
  drm/i915: Expand error state's address width to 64b
  drm/i915/gen8: Add ppgtt info and debug_dump
  drm/i915: object size needs to be u64
  drm/i915: batch_obj vm offset must be u64
  drm/i915/userptr: Kill user_size limit check
  drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  drm/i915/gen8: Flip the 48b switch

 drivers/gpu/drm/i915/i915_debugfs.c        |  18 +-
 drivers/gpu/drm/i915/i915_drv.h            |  17 +-
 drivers/gpu/drm/i915/i915_gem.c            |  22 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 649 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  66 ++-
 drivers/gpu/drm/i915/i915_gem_userptr.c    |   4 -
 drivers/gpu/drm/i915/i915_gpu_error.c      |  17 +-
 drivers/gpu/drm/i915/i915_params.c         |   2 +-
 drivers/gpu/drm/i915/i915_reg.h            |   1 +
 drivers/gpu/drm/i915/i915_trace.h          |  16 +
 drivers/gpu/drm/i915/intel_lrc.c           |  65 ++-
 include/uapi/drm/i915_drm.h                |   3 +-
 13 files changed, 725 insertions(+), 165 deletions(-)

-- 
2.4.5


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v3 01/17] drm/i915: Remove unnecessary gen8_clamp_pd
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory boundary.

Furthermore, i915_pte_count also restricts to the next page table
boundary.
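
For reference, the arithmetic of the removed helper can be reproduced as a
standalone sketch. The SKETCH_* names are hypothetical and the 1GB page
directory span (shift of 30) is assumed from the 4ULL << GEN8_PDPE_SHIFT
== 4GB usage elsewhere in the series. Walking a range in clamped steps
visits one chunk per page directory, which is the same boundary the
iteration macro is said to stop at.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PDPE_SHIFT 30	/* assumed: each page directory spans 1GB */
#define SKETCH_ALIGN(x, a) (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))

/* Clamp length so that [start, start + length) ends at the next PD boundary. */
static uint64_t sketch_clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = SKETCH_ALIGN(start + 1, 1ULL << SKETCH_PDPE_SHIFT);

	/* Guard against wrap-around in start + length. */
	assert(start + length >= start);

	if (next_pd > start + length)
		return length;

	return next_pd - start;
}

/* Count how many page directories a range touches by walking it in clamped steps. */
static unsigned int sketch_count_pd_chunks(uint64_t start, uint64_t length)
{
	unsigned int chunks = 0;

	while (length) {
		uint64_t step = sketch_clamp_pd(start, length);

		start += step;
		length -= step;
		chunks++;
	}

	return chunks;
}
```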

v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h | 11 -----------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b29b73f..712ca34 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -955,7 +955,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		gen8_pde_t *const page_directory = kmap_px(pd);
 		struct i915_page_table *pt;
-		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_len = length;
 		uint64_t pd_start = start;
 		uint32_t pde;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e1cfa29..d5bf953 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -444,17 +444,6 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-/* Clamp length to the next page_directory boundary */
-static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
-{
-	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
-
-	if (next_pd > (start + length))
-		return length;
-
-	return next_pd - start;
-}
-
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 01/17] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-07 12:36     ` Goel, Akash
  2015-07-01 15:27   ` [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage Michel Thierry
                     ` (15 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier. The patch also introduces the PML4, i.e. the new top level structure
of the page tables.

v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and follow
his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
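
The allocation pattern this patch introduces (a usage bitmap plus a
pointer array, both sized at run time instead of by GEN8_LEGACY_PDPES) can
be sketched in userspace as below. All sketch_* names are hypothetical,
calloc/free stand in for kcalloc/kfree, and unlike the patch's __pdp_fini
the cleanup here also resets the bitmap pointer.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_pdp {
	unsigned long *used_pdpes;	/* one bit per entry */
	void **page_directory;		/* one pointer per entry */
	size_t entries;
};

/* Allocate both arrays for 'entries' slots; undo the first on failure. */
static int sketch_pdp_init(struct sketch_pdp *pdp, size_t entries)
{
	assert(entries > 0);

	pdp->used_pdpes = calloc(SKETCH_BITS_TO_LONGS(entries),
				 sizeof(unsigned long));
	if (!pdp->used_pdpes)
		return -1;

	pdp->page_directory = calloc(entries, sizeof(*pdp->page_directory));
	if (!pdp->page_directory) {
		free(pdp->used_pdpes);
		pdp->used_pdpes = NULL;
		return -1;
	}

	pdp->entries = entries;
	return 0;
}

static void sketch_pdp_fini(struct sketch_pdp *pdp)
{
	free(pdp->used_pdpes);
	free(pdp->page_directory);
	pdp->used_pdpes = NULL;
	pdp->page_directory = NULL;
	pdp->entries = 0;
}
```

The same init path then serves both the 4-entry legacy case and a larger
per-PML4-entry PDP; only the entry count passed in differs.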

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_drv.h     |   7 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 116 ++++++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  41 ++++++++++---
 3 files changed, 128 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1dbd957..7bccfd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2490,7 +2490,12 @@ struct drm_i915_cmd_table {
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define USES_PPGTT(dev)		(i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt >= 2)
+#ifdef CONFIG_X86_64
+# define USES_FULL_48BIT_PPGTT(dev)	(i915.enable_ppgtt == 3)
+#else
+# define USES_FULL_48BIT_PPGTT(dev)	false
+#endif
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 712ca34..cdcc778 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,13 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 {
 	bool has_aliasing_ppgtt;
 	bool has_full_ppgtt;
+	bool has_full_64bit_ppgtt;
 
 	has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+	has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
+			       (IS_BROADWELL(dev) ||
+				INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
 
 	if (intel_vgpu_active(dev))
 		has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +129,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	if (enable_ppgtt == 2 && has_full_ppgtt)
 		return 2;
 
+	if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+		return 3;
+
 #ifdef CONFIG_INTEL_IOMMU
 	/* Disable ppgtt on SNB if VT-d is on. */
 	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -522,6 +529,45 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 	fill_px(vm->dev, pd, scratch_pde);
 }
 
+static int __pdp_init(struct drm_device *dev,
+		      struct i915_page_directory_pointer *pdp)
+{
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+	pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+				  sizeof(unsigned long),
+				  GFP_KERNEL);
+	if (!pdp->used_pdpes)
+		return -ENOMEM;
+
+	pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
+				      GFP_KERNEL);
+	if (!pdp->page_directory) {
+		kfree(pdp->used_pdpes);
+		/* the PDP might be the statically allocated top level. Keep it
+		 * as clean as possible */
+		pdp->used_pdpes = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void __pdp_fini(struct i915_page_directory_pointer *pdp)
+{
+	kfree(pdp->used_pdpes);
+	kfree(pdp->page_directory);
+	pdp->page_directory = NULL;
+}
+
+static void free_pdp(struct drm_device *dev,
+		     struct i915_page_directory_pointer *pdp)
+{
+	__pdp_fini(pdp);
+	if (USES_FULL_48BIT_PPGTT(dev))
+		kfree(pdp);
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct drm_i915_gem_request *req,
 			  unsigned entry,
@@ -634,9 +680,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-			break;
-
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
@@ -720,7 +763,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -729,6 +773,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
+	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
 	gen8_free_scratch(vm);
 }
 
@@ -820,8 +865,9 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	struct i915_page_directory *pd;
 	uint64_t temp;
 	uint32_t pdpe;
+	uint32_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
 
-	WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+	WARN_ON(!bitmap_empty(new_pds, pdpes));
 
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		if (pd)
@@ -839,18 +885,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_pds, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
+		       uint32_t pdpes)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+	for (i = 0; i < pdpes; i++)
 		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
@@ -861,23 +908,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
  */
 static
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts)
+					 unsigned long ***new_pts,
+					 uint32_t pdpes)
 {
 	int i;
 	unsigned long *pds;
 	unsigned long **pts;
 
-	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
 	if (!pds)
 		return -ENOMEM;
 
-	pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
+	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
 	if (!pts) {
 		kfree(pds);
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for (i = 0; i < pdpes; i++) {
 		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
 				 sizeof(unsigned long), GFP_KERNEL);
 		if (!pts[i])
@@ -890,7 +938,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts);
+	free_gen8_temp_bitmaps(pds, pts, pdpes);
 	return -ENOMEM;
 }
 
@@ -916,6 +964,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
+	uint32_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
 	/* Wrap is never okay since we can only represent 48b, and we don't
@@ -927,7 +976,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	if (WARN_ON(start + length > ppgtt->base.total))
 		return -ENODEV;
 
-	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
 	if (ret)
 		return ret;
 
@@ -935,7 +984,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
 					new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
@@ -989,7 +1038,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
@@ -999,10 +1048,10 @@ err_out:
 			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
 	}
 
-	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
@@ -1023,14 +1072,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 
 	ppgtt->base.start = 0;
-	ppgtt->base.total = 1ULL << 32;
-	if (IS_ENABLED(CONFIG_X86_32))
-		/* While we have a proliferation of size_t variables
-		 * we cannot represent the full ppgtt size on 32bit,
-		 * so limit it to the same size as the GGTT (currently
-		 * 2GiB).
-		 */
-		ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
@@ -1040,7 +1081,30 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		ret = __pdp_init(false, &ppgtt->pdp);
+
+		if (ret)
+			goto free_scratch;
+
+		ppgtt->base.total = 1ULL << 32;
+		if (IS_ENABLED(CONFIG_X86_32))
+			/* While we have a proliferation of size_t variables
+			 * we cannot represent the full ppgtt size on 32bit,
+			 * so limit it to the same size as the GGTT (currently
+			 * 2GiB).
+			 */
+			ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+	} else {
+		ppgtt->base.total = 1ULL << 48;
+		return -EPERM; /* Not yet implemented */
+	}
+
 	return 0;
+
+free_scratch:
+	gen8_free_scratch(&ppgtt->base);
+	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d5bf953..e2b684e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
  * PDPE  |  PDE  |  PTE  | offset
  * The difference as compared to normal x86 3 level page table is the PDPEs are
  * programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 |  11:0
+ * PML4E | PDPE  |  PDE  |  PTE  | offset
  */
+#define GEN8_PML4ES_PER_PML4		512
+#define GEN8_PML4E_SHIFT		39
 #define GEN8_PDPE_SHIFT			30
-#define GEN8_PDPE_MASK			0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK			0x1ff
 #define GEN8_PDE_SHIFT			21
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
@@ -98,6 +106,9 @@ typedef uint64_t gen8_pde_t;
 #define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES			I915_PTES(sizeof(gen8_pte_t))
 
+#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+				GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
+
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
 #define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
@@ -241,9 +252,17 @@ struct i915_page_directory {
 };
 
 struct i915_page_directory_pointer {
-	/* struct page *page; */
-	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
-	struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
+	struct i915_page_dma base;
+
+	unsigned long *used_pdpes;
+	struct i915_page_directory **page_directory;
+};
+
+struct i915_pml4 {
+	struct i915_page_dma base;
+
+	DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+	struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
 };
 
 struct i915_address_space {
@@ -341,8 +360,9 @@ struct i915_hw_ppgtt {
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
 	union {
-		struct i915_page_directory_pointer pdp;
-		struct i915_page_directory pd;
+		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
+		struct i915_page_directory_pointer pdp;	/* GEN8+ */
+		struct i915_page_directory pd;		/* GEN6-7 */
 	};
 
 	struct drm_i915_file_private *file_priv;
@@ -436,14 +456,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
-	for (iter = gen8_pdpe_index(start);	\
-	     pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES;	\
+#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b)	\
+	for (iter = gen8_pdpe_index(start); \
+	     pd = (pdp)->page_directory[iter], length > 0 && (iter < b);	\
 	     iter++,				\
 	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 01/17] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-07 12:43     ` Goel, Akash
  2015-07-01 15:27   ` [PATCH v3 04/17] drm/i915/gen8: Add dynamic page trace events Michel Thierry
                     ` (14 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addressing behaves as if it had 1 PDP with 4 PDPEs.

In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.
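
To illustrate the intent, here is a minimal user-space sketch, not the driver
code: the root is chosen in one place, and everything below it only works on
the pdp it is handed. The trimmed structs, the uses_48b field and root_pdp()
are hypothetical stand-ins for the i915 structures and for
USES_FULL_48BIT_PPGTT(); only the shift and entry count are taken from the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PML4ES_PER_PML4	512	/* same value as GEN8_PML4ES_PER_PML4 */
#define PML4E_SHIFT	39	/* same value as GEN8_PML4E_SHIFT */

/* Hypothetical, trimmed-down stand-ins for the i915 structures. */
struct pdp { int id; };
struct pml4 { struct pdp *pdps[PML4ES_PER_PML4]; };

struct ppgtt {
	int uses_48b;			/* stand-in for USES_FULL_48BIT_PPGTT() */
	union {
		struct pml4 pml4;	/* root in 48b mode */
		struct pdp pdp;		/* root in legacy 32b mode */
	};
};

/*
 * Return the pdp covering @addr. In legacy mode the single pdp is the root;
 * in 48b mode it is whichever pdp the pml4e points to (NULL if not allocated).
 */
static struct pdp *root_pdp(struct ppgtt *ppgtt, uint64_t addr)
{
	assert(ppgtt != NULL);

	if (!ppgtt->uses_48b)
		return &ppgtt->pdp;

	return ppgtt->pml4.pdps[(addr >> PML4E_SHIFT) & (PML4ES_PER_PML4 - 1)];
}
```

With a helper of this shape, callers that used to dereference ppgtt->pdp
directly take a pdp argument instead, which is roughly what the _3lvl
functions in this patch do.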

v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 136 +++++++++++++++++++++++-------------
 1 file changed, 88 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cdcc778..41a18ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -529,6 +529,25 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 	fill_px(vm->dev, pd, scratch_pde);
 }
 
+/* It's likely we'll map more than one page table at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to pde_encode. The ppgtt is only needed to reuse the kunmap macro. */
+static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory *pd,
+				     uint64_t start,
+				     uint64_t length)
+{
+	gen8_pde_t * const page_directory = kmap_px(pd);
+	struct i915_page_table *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		page_directory[pde] = gen8_pde_encode(px_dma(pt),
+						      I915_CACHE_LLC);
+
+	kunmap_px(ppgtt, page_directory);
+}
+
 static int __pdp_init(struct drm_device *dev,
 		      struct i915_page_directory_pointer *pdp)
 {
@@ -616,6 +635,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_pte_t *pt_vaddr, scratch_pte;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -630,10 +650,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		struct i915_page_directory *pd;
 		struct i915_page_table *pt;
 
-		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+		if (WARN_ON(!pdp->page_directory[pdpe]))
 			break;
 
-		pd = ppgtt->pdp.page_directory[pdpe];
+		pd = pdp->page_directory[pdpe];
 
 		if (WARN_ON(!pd->page_table[pde]))
 			break;
@@ -671,6 +691,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -681,7 +702,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_directory *pd = pdp->page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
 			pt_vaddr = kmap_px(pt);
 		}
@@ -763,23 +784,28 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+				continue;
 
-		gen8_free_page_tables(ppgtt->base.dev,
-				      ppgtt->pdp.page_directory[i]);
-		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
+			gen8_free_page_tables(ppgtt->base.dev,
+					      ppgtt->pdp.page_directory[i]);
+			free_pd(ppgtt->base.dev,
+				ppgtt->pdp.page_directory[i]);
+		}
+		free_pdp(ppgtt->base.dev, &ppgtt->pdp);
+	} else {
+		WARN_ON(1); /* to be implemented later */
 	}
 
-	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
 	gen8_free_scratch(vm);
 }
 
 /**
  * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pd:		Page directory for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -795,13 +821,15 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 				     struct i915_page_directory *pd,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pts)
 {
-	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_hw_ppgtt *ppgtt =
+	    container_of(vm, struct i915_hw_ppgtt, base);
+	struct drm_device *dev = vm->dev;
 	struct i915_page_table *pt;
 	uint64_t temp;
 	uint32_t pde;
@@ -818,7 +846,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 		if (IS_ERR(pt))
 			goto unwind_out;
 
-		gen8_initialize_pt(&ppgtt->base, pt);
+		gen8_initialize_pt(vm, pt);
 		pd->page_table[pde] = pt;
 		__set_bit(pde, new_pts);
 	}
@@ -834,7 +862,7 @@ unwind_out:
 
 /**
  * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pdp:	Page directory pointer for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -855,17 +883,18 @@ unwind_out:
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
-				     struct i915_page_directory_pointer *pdp,
-				     uint64_t start,
-				     uint64_t length,
-				     unsigned long *new_pds)
+static int
+gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
+				  struct i915_page_directory_pointer *pdp,
+				  uint64_t start,
+				  uint64_t length,
+				  unsigned long *new_pds)
 {
-	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	uint64_t temp;
 	uint32_t pdpe;
-	uint32_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
+	uint32_t pdpes =  I915_PDPES_PER_PDP(vm->dev);
 
 	WARN_ON(!bitmap_empty(new_pds, pdpes));
 
@@ -877,7 +906,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 		if (IS_ERR(pd))
 			goto unwind_out;
 
-		gen8_initialize_pd(&ppgtt->base, pd);
+		gen8_initialize_pd(vm, pd);
 		pdp->page_directory[pdpe] = pd;
 		__set_bit(pdpe, new_pds);
 	}
@@ -952,13 +981,15 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start,
-			       uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+				    struct i915_page_directory_pointer *pdp,
+				    uint64_t start,
+				    uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	unsigned long *new_page_dirs, **new_page_tables;
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -981,16 +1012,15 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return ret;
 
 	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
-					new_page_dirs);
+	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
+						new_page_dirs);
 	if (ret) {
 		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
-	/* For every page directory referenced, allocate page tables */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
 						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
@@ -999,10 +1029,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	start = orig_start;
 	length = orig_length;
 
-	/* Allocations have completed successfully, so set the bitmaps, and do
-	 * the mappings. */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_pde_t *const page_directory = kmap_px(pd);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		struct i915_page_table *pt;
 		uint64_t pd_len = length;
 		uint64_t pd_start = start;
@@ -1024,18 +1051,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 
 			/* Our pde is now pointing to the pagetable, pt */
 			__set_bit(pde, pd->used_pdes);
-
-			/* Map the PDE to the page table */
-			page_directory[pde] = gen8_pde_encode(px_dma(pt),
-							      I915_CACHE_LLC);
-
-			/* NB: We haven't yet mapped ptes to pages. At this
-			 * point we're still relying on insert_entries() */
 		}
 
-		kunmap_px(ppgtt, page_directory);
-
-		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		__set_bit(pdpe, pdp->used_pdpes);
+		gen8_map_pagetable_range(ppgtt, pd, start, length);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1045,17 +1064,38 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
-			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
+			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
-		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
+		free_pd(dev, pdp->page_directory[pdpe]);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
 
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+				    struct i915_pml4 *pml4,
+				    uint64_t start,
+				    uint64_t length)
+{
+	WARN_ON(1); /* to be implemented later */
+	return 0;
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev))
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+	else
+		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
 /*
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
-- 
2.4.5


* [PATCH v3 04/17] drm/i915/gen8: Add dynamic page trace events
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (2 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

The dynamic page allocation patch series added trace events for GEN6; this
patch adds them for GEN8.

v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  9 ++++++++-
 drivers/gpu/drm/i915/i915_trace.h   | 16 ++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 41a18ff..1327e41 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -541,9 +541,14 @@ static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
 	struct i915_page_table *pt;
 	uint64_t temp, pde;
 
-	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
 		page_directory[pde] = gen8_pde_encode(px_dma(pt),
 						      I915_CACHE_LLC);
+		trace_i915_page_table_entry_map(&ppgtt->base, pde, pt,
+						gen8_pte_index(start),
+						gen8_pte_count(start, length),
+						GEN8_PTES);
+	}
 
 	kunmap_px(ppgtt, page_directory);
 }
@@ -849,6 +854,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 		gen8_initialize_pt(vm, pt);
 		pd->page_table[pde] = pt;
 		__set_bit(pde, new_pts);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
 	}
 
 	return 0;
@@ -909,6 +915,7 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 		gen8_initialize_pd(vm, pd);
 		pdp->page_directory[pdpe] = pd;
 		__set_bit(pdpe, new_pds);
+		trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 63328b6..15cf1af 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -213,6 +213,22 @@ DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
 	     TP_ARGS(vm, pde, start, pde_shift)
 );
 
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+		   TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+		   TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_pointer_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+		   TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+		   TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
2.4.5


* [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (3 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 04/17] drm/i915/gen8: Add dynamic page trace events Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-07 12:48     ` Goel, Akash
  2015-07-01 15:27   ` [PATCH v3 06/17] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
                     ` (12 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.

The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
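
As a worked example of the address split this relies on, the sketch below
decomposes a 48b address into its four 9-bit indices and counts how many
PML4 entries a range touches, i.e. how many times a 4 level allocator would
hand a sub-range to the 3 level code. It is a stand-alone illustration under
the shifts documented in i915_gem_gtt.h (39/30/21/12); the helper names and
pml4es_spanned() are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PML4E_SHIFT	39
#define PDPE_SHIFT	30
#define PDE_SHIFT	21
#define PTE_SHIFT	12
#define INDEX_MASK	0x1ffu	/* 9 bits: 512 entries per level */

static uint32_t pml4e_index(uint64_t addr)
{
	return (addr >> PML4E_SHIFT) & INDEX_MASK;
}

static uint32_t pdpe_index(uint64_t addr)
{
	return (addr >> PDPE_SHIFT) & INDEX_MASK;
}

static uint32_t pde_index(uint64_t addr)
{
	return (addr >> PDE_SHIFT) & INDEX_MASK;
}

static uint32_t pte_index(uint64_t addr)
{
	return (addr >> PTE_SHIFT) & INDEX_MASK;
}

/*
 * Count the pml4 entries covered by [start, start + length). Each step
 * advances to the next 512GiB boundary, clamped to the remaining length.
 */
static unsigned int pml4es_spanned(uint64_t start, uint64_t length)
{
	unsigned int count = 0;

	/* The range must fit in the 48b address space and must not wrap. */
	assert(start + length >= start);
	assert(start + length <= (1ULL << 48));

	while (length > 0) {
		uint64_t next = ((start >> PML4E_SHIFT) + 1) << PML4E_SHIFT;
		uint64_t chunk = next - start;

		if (chunk > length)
			chunk = length;

		start += chunk;
		length -= chunk;
		count++;
	}

	return count;
}
```

The boundary computation is the same arithmetic as
ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start in gen8_for_each_pml4e.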

v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch patches.

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 162 ++++++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
 2 files changed, 146 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1327e41..d23b0a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -584,12 +584,44 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
 	pdp->page_directory = NULL;
 }
 
+static struct
+i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
+{
+	struct i915_page_directory_pointer *pdp;
+	int ret = -ENOMEM;
+
+	WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+	pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
+	if (!pdp)
+		return ERR_PTR(-ENOMEM);
+
+	ret = __pdp_init(dev, pdp);
+	if (ret)
+		goto fail_bitmap;
+
+	ret = setup_px(dev, pdp);
+	if (ret)
+		goto fail_page_m;
+
+	return pdp;
+
+fail_page_m:
+	__pdp_fini(pdp);
+fail_bitmap:
+	kfree(pdp);
+
+	return ERR_PTR(ret);
+}
+
 static void free_pdp(struct drm_device *dev,
 		     struct i915_page_directory_pointer *pdp)
 {
 	__pdp_fini(pdp);
-	if (USES_FULL_48BIT_PPGTT(dev))
+	if (USES_FULL_48BIT_PPGTT(dev)) {
+		cleanup_px(dev, pdp);
 		kfree(pdp);
+	}
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -783,28 +815,46 @@ static void gen8_free_scratch(struct i915_address_space *vm)
 	free_scratch_page(dev, vm->scratch_page);
 }
 
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
+				    struct i915_page_directory_pointer *pdp)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-				continue;
+	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+		if (WARN_ON(!pdp->page_directory[i]))
+			continue;
 
-			gen8_free_page_tables(ppgtt->base.dev,
-					      ppgtt->pdp.page_directory[i]);
-			free_pd(ppgtt->base.dev,
-				ppgtt->pdp.page_directory[i]);
-		}
-		free_pdp(ppgtt->base.dev, &ppgtt->pdp);
-	} else {
-		WARN_ON(1); /* to be implemented later */
+		gen8_free_page_tables(dev, pdp->page_directory[i]);
+		free_pd(dev, pdp->page_directory[i]);
 	}
 
+	free_pdp(dev, pdp);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+		if (WARN_ON(!ppgtt->pml4.pdps[i]))
+			continue;
+
+		gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
+	}
+
+	cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
+	else
+		gen8_ppgtt_cleanup_4lvl(ppgtt);
+
 	gen8_free_scratch(vm);
 }
 
@@ -1087,8 +1137,62 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 				    uint64_t start,
 				    uint64_t length)
 {
-	WARN_ON(1); /* to be implemented later */
+	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
+	uint64_t temp, pml4e;
+	int ret = 0;
+
+	/* Do the pml4 allocations first, so we don't need to track the newly
+	 * allocated tables below the pdp */
+	bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+	/* The pagedirectory and pagetable allocations are done in the shared 3
+	 * and 4 level code. Just allocate the pdps.
+	 */
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		if (!pdp) {
+			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+			pdp = alloc_pdp(vm->dev);
+			if (IS_ERR(pdp))
+				goto err_out;
+
+			pml4->pdps[pml4e] = pdp;
+			__set_bit(pml4e, new_pdps);
+			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
+						   pml4e << GEN8_PML4E_SHIFT,
+						   GEN8_PML4E_SHIFT);
+		}
+	}
+
+	WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+	     "The allocation has spanned more than 512GB. "
+	     "It is highly likely this is incorrect.");
+
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		WARN_ON(!pdp);
+
+		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		if (ret)
+			goto err_out;
+	}
+
+	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+		  GEN8_PML4ES_PER_PML4);
+
 	return 0;
+
+err_out:
+	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+		gen8_ppgtt_cleanup_3lvl(vm->dev, pml4->pdps[pml4e]);
+
+	return ret;
 }
 
 static int gen8_alloc_va_range(struct i915_address_space *vm,
@@ -1097,10 +1201,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	if (!USES_FULL_48BIT_PPGTT(vm->dev))
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-	else
+	if (USES_FULL_48BIT_PPGTT(vm->dev))
 		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	else
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 /*
@@ -1128,9 +1232,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		ret = __pdp_init(false, &ppgtt->pdp);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
+		if (ret)
+			goto free_scratch;
 
+		ppgtt->base.total = 1ULL << 48;
+	} else {
+		ret = __pdp_init(false, &ppgtt->pdp);
 		if (ret)
 			goto free_scratch;
 
@@ -1142,9 +1251,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 			 * 2GiB).
 			 */
 			ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
-	} else {
-		ppgtt->base.total = 1ULL << 48;
-		return -EPERM; /* Not yet implemented */
+
+		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
+							      0, 0,
+							      GEN8_PML4E_SHIFT);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e2b684e..c8ac0b5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -95,6 +95,7 @@ typedef uint64_t gen8_pde_t;
  */
 #define GEN8_PML4ES_PER_PML4		512
 #define GEN8_PML4E_SHIFT		39
+#define GEN8_PML4E_MASK			(GEN8_PML4ES_PER_PML4 - 1)
 #define GEN8_PDPE_SHIFT			30
 /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
  * tables */
@@ -464,6 +465,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter)	\
+	for (iter = gen8_pml4e_index(start);	\
+	     pdp = (pml4)->pdps[iter], length > 0 && iter < GEN8_PML4ES_PER_PML4;	\
+	     iter++,				\
+	     temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
@@ -484,8 +493,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
 
 static inline uint32_t gen8_pml4e_index(uint64_t address)
 {
-	WARN_ON(1); /* For 64B */
-	return 0;
+	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
 }
 
 static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
-- 
2.4.5


* [PATCH v3 06/17] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (4 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 07/17] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

In 64b (48-bit canonical) PPGTT addressing, the PDP0 register contains
the base address of the PML4, while the other PDP registers are ignored.

In LRC, the addressing mode must be specified in every context descriptor.
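
A minimal standalone sketch of the two points above, assuming the PML4
page is 4 KiB aligned. struct pdp0_regs and both helper names are
hypothetical; only the enum values and the shift mirror this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Addressing-mode values, mirroring the enum in intel_lrc.c. */
enum ctx_addressing_mode {
	ADVANCED_CONTEXT = 0,
	LEGACY_32B_CONTEXT,
	ADVANCED_AD_CONTEXT,
	LEGACY_64B_CONTEXT
};

/* Same value as GEN8_CTX_ADDRESSING_MODE_SHIFT in this patch. */
#define CTX_ADDRESSING_MODE_SHIFT 3

/* Hypothetical container for the two dwords written to PDP0. */
struct pdp0_regs {
	uint32_t udw;	/* upper 32 bits of the PML4 address */
	uint32_t ldw;	/* lower 32 bits of the PML4 address */
};

/* Split a PML4 DMA address into the PDP0 upper/lower dwords. */
static struct pdp0_regs split_pml4_address(uint64_t pml4_dma)
{
	struct pdp0_regs regs;

	/* Assumption: the PML4 occupies one 4 KiB aligned page. */
	assert((pml4_dma & 0xfff) == 0);

	regs.udw = (uint32_t)(pml4_dma >> 32);
	regs.ldw = (uint32_t)(pml4_dma & 0xffffffffu);
	return regs;
}

/* Addressing-mode field of a context descriptor (other bits omitted). */
static uint64_t addressing_mode_bits(int uses_48bit)
{
	enum ctx_addressing_mode mode =
		uses_48bit ? LEGACY_64B_CONTEXT : LEGACY_32B_CONTEXT;

	return (uint64_t)mode << CTX_ADDRESSING_MODE_SHIFT;
}
```

In 48b mode only PDP0 is filled this way; in 32b mode each of PDP0-3
gets its own page directory address.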

v2: PML4 update in legacy context switch is left for historic reasons;
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 52 ++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
 drivers/gpu/drm/i915/i915_reg.h     |  1 +
 drivers/gpu/drm/i915/intel_lrc.c    | 65 +++++++++++++++++++++++++++----------
 4 files changed, 97 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d23b0a8..fcb8c4b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,9 @@ static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
 	return pde;
 }
 
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
 static gen6_pte_t snb_pte_encode(dma_addr_t addr,
 				 enum i915_cache_level level,
 				 bool valid, u32 unused)
@@ -624,6 +627,35 @@ static void free_pdp(struct drm_device *dev,
 	}
 }
 
+static void
+gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
+			  struct i915_page_directory_pointer *pdp,
+			  struct i915_page_directory *pd,
+			  int index)
+{
+	gen8_ppgtt_pdpe_t *page_directorypo;
+
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		return;
+
+	page_directorypo = kmap_px(pdp);
+	page_directorypo[index] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
+	kunmap_px(ppgtt, page_directorypo);
+}
+
+static void
+gen8_setup_page_directory_pointer(struct i915_hw_ppgtt *ppgtt,
+				  struct i915_pml4 *pml4,
+				  struct i915_page_directory_pointer *pdp,
+				  int index)
+{
+	gen8_ppgtt_pml4e_t *pagemap = kmap_px(pml4);
+
+	WARN_ON(!USES_FULL_48BIT_PPGTT(ppgtt->base.dev));
+	pagemap[index] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
+	kunmap_px(ppgtt, pagemap);
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct drm_i915_gem_request *req,
 			  unsigned entry,
@@ -649,8 +681,8 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct drm_i915_gem_request *req)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+				 struct drm_i915_gem_request *req)
 {
 	int i, ret;
 
@@ -665,6 +697,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+			      struct drm_i915_gem_request *req)
+{
+	return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t start,
 				   uint64_t length,
@@ -1112,6 +1150,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 
 		__set_bit(pdpe, pdp->used_pdpes);
 		gen8_map_pagetable_range(ppgtt, pd, start, length);
+		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1181,6 +1220,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
 		if (ret)
 			goto err_out;
+
+		gen8_setup_page_directory_pointer(ppgtt, pml4, pdp, pml4e);
 	}
 
 	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
@@ -1230,14 +1271,13 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.unbind_vma = ppgtt_unbind_vma;
 	ppgtt->base.bind_vma = ppgtt_bind_vma;
 
-	ppgtt->switch_mm = gen8_mm_switch;
-
 	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
 		if (ret)
 			goto free_scratch;
 
 		ppgtt->base.total = 1ULL << 48;
+		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
 		ret = __pdp_init(false, &ppgtt->pdp);
 		if (ret)
@@ -1252,6 +1292,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 			 */
 			ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
 
+		ppgtt->switch_mm = gen8_legacy_mm_switch;
 		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
 							      0, 0,
 							      GEN8_PML4E_SHIFT);
@@ -1449,8 +1490,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
 	int j;
 
 	for_each_ring(ring, dev_priv, j) {
+		u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_48B : 0;
 		I915_WRITE(RING_MODE_GEN7(ring),
-			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c8ac0b5..fb939fb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -39,6 +39,8 @@ struct drm_i915_file_private;
 typedef uint32_t gen6_pte_t;
 typedef uint64_t gen8_pte_t;
 typedef uint64_t gen8_pde_t;
+typedef uint64_t gen8_ppgtt_pdpe_t;
+typedef uint64_t gen8_ppgtt_pml4e_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 313b1f9..d99125b 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1664,6 +1664,7 @@ enum skl_disp_power_wells {
 #define   GFX_REPLAY_MODE		(1<<11)
 #define   GFX_PSMI_GRANULARITY		(1<<10)
 #define   GFX_PPGTT_ENABLE		(1<<9)
+#define   GEN8_GFX_PPGTT_48B		(1<<7)
 
 #define VLV_DISPLAY_BASE 0x180000
 #define VLV_MIPI_BASE VLV_DISPLAY_BASE
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e87d74c..719434e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -195,13 +195,21 @@
 	reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
 }
 
+#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
+	reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(px_dma(&ppgtt->pml4)); \
+	reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
+}
+
 enum {
 	ADVANCED_CONTEXT = 0,
-	LEGACY_CONTEXT,
+	LEGACY_32B_CONTEXT,
 	ADVANCED_AD_CONTEXT,
 	LEGACY_64B_CONTEXT
 };
-#define GEN8_CTX_MODE_SHIFT 3
+#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
+#define GEN8_CTX_ADDRESSING_MODE(dev)  (USES_FULL_48BIT_PPGTT(dev) ?\
+		LEGACY_64B_CONTEXT :\
+		LEGACY_32B_CONTEXT)
 enum {
 	FAULT_AND_HANG = 0,
 	FAULT_AND_HALT, /* Debug only */
@@ -272,7 +280,7 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
 	if (IS_GEN8(ctx_obj->base.dev))
 		desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
@@ -343,10 +351,16 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
-	/* True PPGTT with dynamic page allocation: update PDP registers and
-	 * point the unallocated PDPs to the scratch page
-	 */
-	if (ppgtt) {
+	if (ppgtt && USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* True 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored
+		 */
+		ASSIGN_CTX_PML4(ppgtt, reg_state);
+	} else if (ppgtt) {
+		/* True 32b PPGTT with dynamic page allocation: update PDP
+		 * registers and point the unallocated PDPs to the scratch page
+		 */
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
@@ -1418,12 +1432,16 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	 * Ideally, we should set Force PD Restore in ctx descriptor,
 	 * but we can't. Force Restore would be a second option, but
 	 * it is unsafe in case of lite-restore (because the ctx is
-	 * not idle). */
+	 * not idle). PML4 is allocated during ppgtt init so this is
+	 * not needed in 48-bit.*/
 	if (req->ctx->ppgtt &&
 	    (intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
-		ret = intel_logical_ring_emit_pdps(req);
-		if (ret)
-			return ret;
+		if (GEN8_CTX_ADDRESSING_MODE(req->i915) == LEGACY_32B_CONTEXT){
+			ret = intel_logical_ring_emit_pdps(req);
+
+			if (ret)
+				return ret;
+		}
 
 		req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
 	}
@@ -2086,13 +2104,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
 
-	/* With dynamic page allocation, PDPs may not be allocated at this point,
-	 * Point the unallocated PDPs to the scratch page
-	 */
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
-	ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored.
+		 */
+		ASSIGN_CTX_PML4(ppgtt, reg_state);
+	} else {
+		/* 32b PPGTT
+		 * PDP*_DESCRIPTOR contains the base address of space supported.
+		 * With dynamic page allocation, PDPs may not be allocated at
+		 * this point. Point the unallocated PDPs to the scratch page
+		 */
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
+		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
-- 
2.4.5

* [PATCH v3 07/17] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (5 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 06/17] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 08/17] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

insert_entries was the function used to write PTEs. For the PPGTT it was
"hardcoded" to only understand two level page tables, which was the case
for GEN7. We can reuse it for 4 level page tables and remove the concept
of insert_entries, which was never viable past 2 level page tables
anyway, but this requires some rework to make the function more generic.

This patch begins the generalization work, which will be used heavily
once the 48b code is complete. The patch series attempts to make each
function that touches the page tables specific to one page table level,
and this patch is no exception.
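
A rough sketch of why a helper that takes the pdp as a parameter can be
shared: the indices it computes are relative to whichever pdp it is
given. PDPE_SHIFT matches GEN8_PDPE_SHIFT in i915_gem_gtt.h; the other
shifts, the common mask, struct pt_indices and decompose() are
assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define PTE_SHIFT	12	/* assumed value of GEN8_PTE_SHIFT */
#define PDE_SHIFT	21	/* assumed value of GEN8_PDE_SHIFT */
#define PDPE_SHIFT	30	/* GEN8_PDPE_SHIFT */
#define INDEX_MASK	0x1ff	/* assumed: 512 entries at every level */

/* Hypothetical bundle of the three per-pdp indices. */
struct pt_indices {
	unsigned pdpe;	/* index into the page directory pointer */
	unsigned pde;	/* index into the page directory */
	unsigned pte;	/* index into the page table */
};

/* Decompose an address into indices relative to its pdp. */
static struct pt_indices decompose(uint64_t start)
{
	struct pt_indices idx;

	/* The sketch only models a 48-bit address space. */
	assert(start < (1ULL << 48));

	idx.pdpe = (start >> PDPE_SHIFT) & INDEX_MASK;
	idx.pde = (start >> PDE_SHIFT) & INDEX_MASK;
	idx.pte = (start >> PTE_SHIFT) & INDEX_MASK;
	return idx;
}
```

Bits 39 and above are masked off, so an address in a second pdp yields
the same kind of indices as one in the first.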

v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v3: Rebase after final merged version of Mika's ppgtt/scratch patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fcb8c4b..bd31cbc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -703,24 +703,21 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
-				   bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
+				       struct i915_page_directory_pointer *pdp,
+				       uint64_t start,
+				       uint64_t length,
+				       gen8_pte_t scratch_pte)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-	gen8_pte_t *pt_vaddr, scratch_pte;
+	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
-	scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
-				      I915_CACHE_LLC, use_scratch);
-
 	while (num_entries) {
 		struct i915_page_directory *pd;
 		struct i915_page_table *pt;
@@ -759,14 +756,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
-				      struct sg_table *pages,
-				      uint64_t start,
-				      enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+				   uint64_t start,
+				   uint64_t length,
+				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+						 I915_CACHE_LLC, use_scratch);
+
+	gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+}
+
+static void
+gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
+			      struct i915_page_directory_pointer *pdp,
+			      struct sg_table *pages,
+			      uint64_t start,
+			      enum i915_cache_level cache_level)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -800,6 +813,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_px(ppgtt, pt_vaddr);
 }
 
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+				      struct sg_table *pages,
+				      uint64_t start,
+				      enum i915_cache_level cache_level,
+				      u32 unused)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+}
+
 static void gen8_free_page_tables(struct drm_device *dev,
 				  struct i915_page_directory *pd)
 {
-- 
2.4.5

* [PATCH v3 08/17] drm/i915/gen8: Pass sg_iter through pte inserts
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (6 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 07/17] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function only understands page directory granularity.
An object's pages may span page directories, so using the iter
directly as we write the PTEs keeps the iterator coherent through
a VMA insert operation spanning multiple page table levels.
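
The effect of passing the iterator by pointer can be sketched with a
simplified stand-in for sg_page_iter. struct page_iter, page_iter_next()
and fill_table() are hypothetical and only model the idea that the
caller owns the iterator state:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sg_page_iter. */
struct page_iter {
	const uint64_t *pages;	/* DMA addresses of the object's pages */
	size_t count;		/* number of pages */
	size_t next;		/* position of the next page to hand out */
};

/* Fetch the next page address; returns 0 once the pages are exhausted. */
static int page_iter_next(struct page_iter *it, uint64_t *addr)
{
	if (it->next >= it->count)
		return 0;

	*addr = it->pages[it->next++];
	return 1;
}

/*
 * Write at most 'capacity' entries into 'table'. The iterator belongs
 * to the caller, so a second call resumes where the first one stopped.
 */
static size_t fill_table(struct page_iter *it, uint64_t *table,
			 size_t capacity)
{
	size_t n = 0;
	uint64_t addr;

	assert(it && table);

	while (n < capacity && page_iter_next(it, &addr))
		table[n++] = addr;

	return n;
}
```

Calling fill_table() twice with the same iterator writes consecutive
pages into two different tables without restarting from the first page.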

v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd31cbc..67d02b9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -774,7 +774,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 static void
 gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 			      struct i915_page_directory_pointer *pdp,
-			      struct sg_table *pages,
+			      struct sg_page_iter *sg_iter,
 			      uint64_t start,
 			      enum i915_cache_level cache_level)
 {
@@ -784,11 +784,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
-	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 
-	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+	while (__sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = pdp->page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
@@ -796,7 +795,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 		}
 
 		pt_vaddr[pte] =
-			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+			gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES) {
 			kunmap_px(ppgtt, pt_vaddr);
@@ -822,8 +821,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct sg_page_iter sg_iter;
 
-	gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+	gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
 }
 
 static void gen8_free_page_tables(struct drm_device *dev,
-- 
2.4.5

* [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (7 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 08/17] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-07 12:51     ` Goel, Akash
  2015-07-01 15:27   ` [PATCH v3 10/17] drm/i915/gen8: Initialize PDPs Michel Thierry
                     ` (8 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.

Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
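
As a sketch of the walk, a range can be split at PML4E (512GB)
boundaries as below. pml4e_index() and clamp_pdp() follow
gen8_pml4e_index() and gen8_clamp_pdp() from this series; struct chunk
and split_by_pdp() are hypothetical helpers used only to show the
resulting pieces:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PML4E_SHIFT	39	/* GEN8_PML4E_SHIFT */
#define PML4E_MASK	0x1ff	/* GEN8_PML4ES_PER_PML4 - 1 */
#define PDP_SPAN	(1ULL << PML4E_SHIFT)

static uint32_t pml4e_index(uint64_t address)
{
	return (address >> PML4E_SHIFT) & PML4E_MASK;
}

/* Clamp length to the next PML4E boundary, like gen8_clamp_pdp(). */
static uint64_t clamp_pdp(uint64_t start, uint64_t length)
{
	/* Equivalent to ALIGN(start + 1, PDP_SPAN). */
	uint64_t next_pdp = (start + PDP_SPAN) & ~(PDP_SPAN - 1);

	if (next_pdp > start + length)
		return length;

	return next_pdp - start;
}

/* Hypothetical record of one per-pdp piece of the range. */
struct chunk {
	uint32_t pml4e;
	uint64_t start;
	uint64_t length;
};

/* Split [start, start + length) into pieces that each fit in one pdp. */
static size_t split_by_pdp(uint64_t start, uint64_t length,
			   struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(out);

	while (length > 0 && n < max) {
		uint64_t len = clamp_pdp(start, length);

		out[n].pml4e = pml4e_index(start);
		out[n].start = start;
		out[n].length = len;
		start += len;
		length -= len;
		n++;
	}

	return n;
}
```

Each piece can then be handed to the existing 3-level helper together
with the pdp selected by its pml4e index.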

This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".

v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.
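
The overflow mentioned in v6 can be reproduced in isolation. This sketch
assumes 4 KiB pages and uses uint32_t to model a 32-bit page count; the
function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Widen before shifting, as the patch does with (uint64_t)sg_nents(). */
static uint64_t range_bytes(uint32_t nents)
{
	return (uint64_t)nents << PAGE_SHIFT;
}

/* Shift in 32 bits first: the high bits are lost before widening. */
static uint64_t range_bytes_truncated(uint32_t nents)
{
	return (uint64_t)(uint32_t)(nents << PAGE_SHIFT);
}

/* 4GB worth of 4 KiB pages is exactly 1 << 20 entries. */
static void check_4gb_case(void)
{
	uint32_t nents = 1u << 20;

	assert(range_bytes(nents) == (1ULL << 32));
	assert(range_bytes_truncated(nents) == 0);
}
```

With the 32-bit shift the computed length becomes 0, so a loop bounded
by that length would never run.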

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 51 ++++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 11 ++++++++
 2 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 67d02b9..d16fbce 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -712,9 +712,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pte_t *pt_vaddr;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
@@ -763,12 +763,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
 	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
 						 I915_CACHE_LLC, use_scratch);
 
-	gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
+					   scratch_pte);
+	} else {
+		uint64_t templ4, pml4e;
+		struct i915_page_directory_pointer *pdp;
+
+		gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+			uint64_t pdp_len = gen8_clamp_pdp(start, length);
+			uint64_t pdp_start = start;
+
+			gen8_ppgtt_clear_pte_range(vm, pdp, pdp_start, pdp_len,
+						   scratch_pte);
+		}
+	}
 }
 
 static void
@@ -781,9 +793,9 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pte_t *pt_vaddr;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 
 	pt_vaddr = NULL;
 
@@ -801,7 +813,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
 			kunmap_px(ppgtt, pt_vaddr);
 			pt_vaddr = NULL;
 			if (++pde == I915_PDES) {
-				pdpe++;
+				if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+					break;
 				pde = 0;
 			}
 			pte = 0;
@@ -820,11 +833,25 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	struct sg_page_iter sg_iter;
 
 	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
-	gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
+					      cache_level);
+	} else {
+		struct i915_page_directory_pointer *pdp;
+		uint64_t templ4, pml4e;
+		uint64_t length = (uint64_t)sg_nents(pages->sgl) << PAGE_SHIFT;
+
+		gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+			uint64_t pdp_start = start;
+
+			gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
+						      pdp_start, cache_level);
+		}
+	}
 }
 
 static void gen8_free_page_tables(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fb939fb..fd61325 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -478,6 +478,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
+/* Clamp length to the next page_directory pointer boundary */
+static inline uint64_t gen8_clamp_pdp(uint64_t start, uint64_t length)
+{
+	uint64_t next_pdp = ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT);
+
+	if (next_pdp > (start + length))
+		return length;
+
+	return next_pdp - start;
+}
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.5

* [PATCH v3 10/17] drm/i915/gen8: Initialize PDPs
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (8 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b Michel Thierry
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pdp before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out-of-bounds
access or proactive prefetch.

Although the ggtt is always 32-bit, the scratch_pdp will be initialized/destroyed
at the same time as the other scratch pages, to keep it consistent.
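
The initialization order can be sketched as follows: fill every entry
with the scratch value first, then overwrite individual entries as real
lower-level tables are allocated. The entry encoding here (address plus
present and read/write bits) is a simplification, and all names are
hypothetical rather than the driver's fill_px()/gen8_pdpe_encode():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_PAGE 512	/* 4096 bytes / 8-byte entries */

#define ENTRY_PRESENT	(1ULL << 0)	/* simplified flag bits */
#define ENTRY_RW	(1ULL << 1)

/* Simplified encoding of a table entry pointing at a lower level. */
static uint64_t encode_entry(uint64_t dma_addr)
{
	return dma_addr | ENTRY_PRESENT | ENTRY_RW;
}

/* Point every entry of a new table at the scratch structure. */
static void fill_with_scratch(uint64_t *table, uint64_t scratch_entry)
{
	size_t i;

	for (i = 0; i < ENTRIES_PER_PAGE; i++)
		table[i] = scratch_entry;
}

/* Later, replace one scratch entry with a real lower-level table. */
static void set_entry(uint64_t *table, unsigned index, uint64_t child_dma)
{
	assert(index < ENTRIES_PER_PAGE);
	table[index] = encode_entry(child_dma);
}
```

Entries that are never mapped keep pointing at scratch, so a stray
access or prefetch resolves to a harmless page.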

v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.
v4: Rebase after final merged version of Mika's ppgtt/scratch patches.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 41 +++++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d16fbce..c6fc0d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -627,6 +627,27 @@ static void free_pdp(struct drm_device *dev,
 	}
 }
 
+static void gen8_initialize_pdp(struct i915_address_space *vm,
+				struct i915_page_directory_pointer *pdp)
+{
+	gen8_ppgtt_pdpe_t scratch_pdpe;
+
+	scratch_pdpe = gen8_pdpe_encode(px_dma(vm->scratch_pd), I915_CACHE_LLC);
+
+	fill_px(vm->dev, pdp, scratch_pdpe);
+}
+
+static void gen8_initialize_pml4(struct i915_address_space *vm,
+				 struct i915_pml4 *pml4)
+{
+	gen8_ppgtt_pml4e_t scratch_pml4e;
+
+	scratch_pml4e = gen8_pml4e_encode(px_dma(vm->scratch_pdp),
+					  I915_CACHE_LLC);
+
+	fill_px(vm->dev, pml4, scratch_pml4e);
+}
+
 static void
 gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
 			  struct i915_page_directory_pointer *pdp,
@@ -892,8 +913,20 @@ static int gen8_init_scratch(struct i915_address_space *vm)
 		return PTR_ERR(vm->scratch_pd);
 	}
 
+	if (USES_FULL_48BIT_PPGTT(dev)) {
+		vm->scratch_pdp = alloc_pdp(dev);
+		if (IS_ERR(vm->scratch_pdp)) {
+			free_pd(dev, vm->scratch_pd);
+			free_pt(dev, vm->scratch_pt);
+			free_scratch_page(dev, vm->scratch_page);
+			return PTR_ERR(vm->scratch_pdp);
+		}
+	}
+
 	gen8_initialize_pt(vm, vm->scratch_pt);
 	gen8_initialize_pd(vm, vm->scratch_pd);
+	if (USES_FULL_48BIT_PPGTT(dev))
+		gen8_initialize_pdp(vm, vm->scratch_pdp);
 
 	return 0;
 }
@@ -902,6 +935,8 @@ static void gen8_free_scratch(struct i915_address_space *vm)
 {
 	struct drm_device *dev = vm->dev;
 
+	if (USES_FULL_48BIT_PPGTT(dev))
+		free_pdp(dev, vm->scratch_pdp);
 	free_pd(dev, vm->scratch_pd);
 	free_pt(dev, vm->scratch_pt);
 	free_scratch_page(dev, vm->scratch_page);
@@ -1247,12 +1282,12 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 	 * and 4 level code. Just allocate the pdps.
 	 */
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-		if (!pdp) {
-			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+		if (!test_bit(pml4e, pml4->used_pml4es)) {
 			pdp = alloc_pdp(vm->dev);
 			if (IS_ERR(pdp))
 				goto err_out;
 
+			gen8_initialize_pdp(vm, pdp);
 			pml4->pdps[pml4e] = pdp;
 			__set_bit(pml4e, new_pdps);
 			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
@@ -1330,6 +1365,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		if (ret)
 			goto free_scratch;
 
+		gen8_initialize_pml4(&ppgtt->base, &ppgtt->pml4);
+
 		ppgtt->base.total = 1ULL << 48;
 		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fd61325..2b2505a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -278,6 +278,7 @@ struct i915_address_space {
 	struct i915_page_scratch *scratch_page;
 	struct i915_page_table *scratch_pt;
 	struct i915_page_directory *scratch_pd;
+	struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
 
 	/**
 	 * List of objects currently involved in rendering.
-- 
2.4.5

* [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (9 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 10/17] drm/i915/gen8: Initialize PDPs Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-07 12:53     ` Goel, Akash
  2015-07-01 15:27   ` [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
                     ` (6 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 17 +++++++++--------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7bccfd5..d245c82 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -546,7 +546,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_object {
 			int page_count;
-			u32 gtt_offset;
+			u64 gtt_offset;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -572,7 +572,7 @@ struct drm_i915_error_state {
 		u32 size;
 		u32 name;
 		u32 rseqno[I915_NUM_RINGS], wseqno;
-		u32 gtt_offset;
+		u64 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6f42569..cdbd4c2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -197,7 +197,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x [ ",
+		err_printf(m, "    %016llx %8u %02x %02x [ ",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
@@ -426,7 +426,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				err_printf(m, " (submitted by %s [%d])",
 					   error->ring[i].comm,
 					   error->ring[i].pid);
-			err_printf(m, " --- gtt_offset = 0x%08x\n",
+			err_printf(m, " --- gtt_offset = 0x%016llx\n",
 				   obj->gtt_offset);
 			print_error_obj(m, obj);
 		}
@@ -434,7 +434,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		obj = error->ring[i].wa_batchbuffer;
 		if (obj) {
 			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-				   dev_priv->ring[i].name, obj->gtt_offset);
+				   dev_priv->ring[i].name,
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
@@ -453,14 +454,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
 		if ((obj = error->ring[i].hws_page)) {
 			err_printf(m, "%s --- HW Status = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			offset = 0;
 			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -476,13 +477,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ctx)) {
 			err_printf(m, "%s --- HW Context = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 	}
 
 	if ((obj = error->semaphore_obj)) {
-		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
 		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
 				   elt * 4,
@@ -590,7 +591,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
-	u32 reloc_offset;
+	u64 reloc_offset;
 
 	if (src == NULL || src->pages == NULL)
 		return NULL;
-- 
2.4.5
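
For readers following the format-string changes above, here is a minimal
userspace sketch of the two printing styles the patch ends up with: full
64-bit output where the conversion was widened to %016llx, and low-half
output where the call site keeps %08x and wraps the value in
lower_32_bits(). The helpers lo32()/hi32() and both format_* functions are
hypothetical stand-ins written for this note, not kernel API.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's lower_32_bits()/upper_32_bits(). */
static inline uint32_t lo32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Full-width output, matching the call sites converted to "%016llx". */
static int format_gtt_offset(char *buf, size_t len, uint64_t gtt_offset)
{
	return snprintf(buf, len, "0x%016" PRIx64, gtt_offset);
}

/* Low half only, matching the call sites that keep "%08x". */
static int format_gtt_offset_lo(char *buf, size_t len, uint64_t gtt_offset)
{
	return snprintf(buf, len, "0x%08" PRIx32, lo32(gtt_offset));
}
```

The low-half form is only lossless while the object lives below 4GB, which
is presumably why it is used for the ringbuffer, HWS and context objects.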


* [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (10 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-07 12:56     ` Goel, Akash
  2015-07-01 15:27   ` [PATCH v3 13/17] drm/i915: object size needs to be u64 Michel Thierry
                     ` (5 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c | 92 +++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ad9a737..8c3dcc9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2223,7 +2223,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	struct drm_file *file;
 	int i;
 
 	if (INTEL_INFO(dev)->gen == 6)
@@ -2246,13 +2245,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		ppgtt->debug_dump(ppgtt, m);
 	}
 
-	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-		struct drm_i915_file_private *file_priv = file->driver_priv;
-
-		seq_printf(m, "proc: %s\n",
-			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
-	}
 	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
@@ -2261,6 +2253,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_file *file;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -2272,6 +2265,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	else if (INTEL_INFO(dev)->gen >= 6)
 		gen6_ppgtt_info(m, dev);
 
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+
+		seq_printf(m, "\nproc: %s\n",
+			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)(unsigned long)m);
+	}
+
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c6fc0d3..0c41e5d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1337,6 +1337,97 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
+static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
+			  uint64_t start, uint64_t length,
+			  gen8_pte_t scratch_pte,
+			  struct seq_file *m)
+{
+	struct i915_page_directory *pd;
+	uint64_t temp;
+	uint32_t pdpe;
+
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		struct i915_page_table *pt;
+		uint64_t pd_len = length;
+		uint64_t pd_start = start;
+		uint32_t pde;
+
+		if (!pd)
+			continue;
+
+		if (!test_bit(pdpe, pdp->used_pdpes))
+			continue;
+
+		seq_printf(m, "\tPDPE #%d\n", pdpe);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			uint32_t  pte;
+			gen8_pte_t *pt_vaddr;
+
+			if (!pt)
+				continue;
+
+			pt_vaddr = kmap_px(pt);
+			for (pte = 0; pte < GEN8_PTES; pte+=4) {
+				uint64_t va =
+					(pdpe << GEN8_PDPE_SHIFT) |
+					(pde << GEN8_PDE_SHIFT) |
+					(pte << GEN8_PTE_SHIFT);
+				int i;
+				bool found = false;
+				for (i = 0; i < 4; i++)
+					if (pt_vaddr[pte + i] != scratch_pte)
+						found = true;
+				if (!found)
+					continue;
+
+				seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
+				for (i = 0; i < 4; i++) {
+					if (pt_vaddr[pte + i] != scratch_pte)
+						seq_printf(m, " %llx", pt_vaddr[pte + i]);
+					else
+						seq_puts(m, "  SCRATCH ");
+				}
+				seq_puts(m, "\n");
+			}
+			/* don't use kunmap_px, it could trigger
+			 * an unnecessary flush.
+			 */
+			kunmap_atomic(pt_vaddr);
+		}
+	}
+}
+
+static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
+{
+	struct i915_address_space *vm = &ppgtt->base;
+	uint64_t start = ppgtt->base.start;
+	uint64_t length = ppgtt->base.total;
+	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+						 I915_CACHE_LLC, true);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
+	} else {
+		uint64_t templ4, pml4e;
+		struct i915_pml4 *pml4 = &ppgtt->pml4;
+		struct i915_page_directory_pointer *pdp;
+
+		gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
+			uint64_t pdp_len = length;
+			uint64_t pdp_start = start;
+
+			if (!pdp)
+				continue;
+
+			if (!test_bit(pml4e, pml4->used_pml4es))
+				continue;
+
+			seq_printf(m, "    PML4E #%llu\n", pml4e);
+			gen8_dump_pdp(pdp, pdp_start, pdp_len, scratch_pte, m);
+		}
+	}
+}
+
 /*
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1359,6 +1450,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.unbind_vma = ppgtt_unbind_vma;
 	ppgtt->base.bind_vma = ppgtt_bind_vma;
+	ppgtt->debug_dump = gen8_dump_ppgtt;
 
 	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
-- 
2.4.5
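
As an illustration of the address arithmetic behind the dump output, the
sketch below composes a 48-bit virtual address from its four table indices
and splits it back. The shift values (12/21/30/39) and the 9-bit index mask
are assumptions based on 4KB pages with 512 entries per level; the patch
only shows the GEN8_*_SHIFT names, not their values. The function names are
hypothetical. Unlike gen8_dump_pdp above, the indices are widened to 64 bits
before shifting; that is a choice made for this sketch, not something the
patch does.

```c
#include <stdint.h>

/* Assumed layout: 4KB pages, 9 index bits per translation level. */
#define VA_PTE_SHIFT   12
#define VA_PDE_SHIFT   21
#define VA_PDPE_SHIFT  30
#define VA_PML4E_SHIFT 39
#define VA_INDEX_MASK  0x1ffULL

/* Build a virtual address from per-level indices, widening before shifting. */
static uint64_t va_from_indices(uint32_t pml4e, uint32_t pdpe,
				uint32_t pde, uint32_t pte)
{
	return ((uint64_t)pml4e << VA_PML4E_SHIFT) |
	       ((uint64_t)pdpe << VA_PDPE_SHIFT) |
	       ((uint64_t)pde << VA_PDE_SHIFT) |
	       ((uint64_t)pte << VA_PTE_SHIFT);
}

/* Extract the index for one level, given that level's shift. */
static uint32_t va_index(uint64_t va, unsigned int shift)
{
	return (uint32_t)((va >> shift) & VA_INDEX_MASK);
}
```

With 32-bit arithmetic, a PDPE index of 4 or more shifted left by 30 wraps
around, so the widening only matters once more than four PDPEs are in use.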


* [PATCH v3 13/17] drm/i915: object size needs to be u64
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (11 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 14/17] drm/i915: batch_obj vm offset must " Michel Thierry
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.

Also added a warning for illegal bind with size = 0.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2a4a27..eeea748 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3716,7 +3716,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 size, fence_size, fence_alignment, unfenced_alignment;
+	u32 fence_alignment, unfenced_alignment;
+	u64 size, fence_size;
 	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	u64 end =
@@ -3775,7 +3776,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	 * attempt to find space.
 	 */
 	if (size > end) {
-		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
+		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
 			  ggtt_view ? ggtt_view->type : 0,
 			  size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0c41e5d..7712b10 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3336,6 +3336,9 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 	if (WARN_ON(flags == 0))
 		return -EINVAL;
 
+	if (WARN_ON(vma->node.size == 0))
+		return -EINVAL;
+
 	bind_flags = 0;
 	if (flags & PIN_GLOBAL)
 		bind_flags |= GLOBAL_BIND;
-- 
2.4.5
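
To make the motivation concrete, here is a small sketch of how a 32-bit
size variable can defeat the "size > end" aperture check. Both functions
are hypothetical simplifications written for this note; they only model
the width of the local variable, not the rest of
i915_gem_object_bind_to_vm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Aperture check with the size held in 32 bits, as before this patch. */
static bool fits_with_u32_size(uint64_t obj_size, uint64_t end)
{
	uint32_t size = (uint32_t)obj_size;	/* drops bits 32 and up */

	return size <= end;
}

/* The same check with a 64-bit size, as after this patch. */
static bool fits_with_u64_size(uint64_t obj_size, uint64_t end)
{
	uint64_t size = obj_size;

	return size <= end;
}
```

A 5GB object checked against a 4GB range passes the first variant, because
the truncated size is 1GB, and is correctly rejected by the second.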


* [PATCH v3 14/17] drm/i915: batch_obj vm offset must be u64
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (12 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 13/17] drm/i915: object size needs to be u64 Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 16:07     ` John Harrison
  2015-07-01 15:27   ` [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check Michel Thierry
                     ` (3 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

Otherwise it can overflow in 48-bit mode, and cause an incorrect
exec_start.

Before commit 5f19e2bff ("drm/i915: Merged the many do_execbuf()
parameters into a structure"), it was already a u64, so this could be
seen as a regression (or as an optimization that looked good at the time).

Cc: John Harrison <john.c.harrison@Intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d245c82..c720a18 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1664,7 +1664,7 @@ struct i915_execbuffer_params {
 	struct drm_file                 *file;
 	uint32_t                        dispatch_flags;
 	uint32_t                        args_batch_start_offset;
-	uint32_t                        batch_obj_vm_offset;
+	uint64_t                        batch_obj_vm_offset;
 	struct intel_engine_cs          *ring;
 	struct drm_i915_gem_object      *batch_obj;
 	struct intel_context            *ctx;
-- 
2.4.5
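
A rough sketch of the overflow described in the commit message, assuming
exec_start is computed as the batch object's VM offset plus the
user-supplied batch start offset (that formula is an assumption here; the
patch itself only changes the field width). The function names are
hypothetical.

```c
#include <stdint.h>

/* exec_start with the VM offset stored in a 32-bit field (old layout). */
static uint64_t exec_start_u32(uint64_t vm_offset, uint32_t batch_start_offset)
{
	uint32_t batch_obj_vm_offset = (uint32_t)vm_offset;	/* truncated */

	return (uint64_t)batch_obj_vm_offset + batch_start_offset;
}

/* exec_start with the VM offset stored in a 64-bit field (new layout). */
static uint64_t exec_start_u64(uint64_t vm_offset, uint32_t batch_start_offset)
{
	uint64_t batch_obj_vm_offset = vm_offset;

	return batch_obj_vm_offset + batch_start_offset;
}
```

For a batch bound above 4GB the first variant yields an address in the low
4GB, so the GPU would start executing from the wrong place.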


* [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (13 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 14/17] drm/i915: batch_obj vm offset must " Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:31     ` Chris Wilson
  2015-07-01 15:27   ` [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
                     ` (2 subsequent siblings)
  17 siblings, 1 reply; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

The GTT was only 32b, so its max size was 4GB. In order to allow objects
bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl could check against
the max 48b range (1ULL << 48) instead.

But since the check no longer applies, just kill the limit.

v2: Use the default ctx to infer the ppgtt max size (Akash).
v3: Just kill the limit, it was only there for early detection of an
error when used for execbuffer (Chris).

Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1f4e5a3..1b66e39 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -788,7 +788,6 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
 int
 i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_userptr *args = data;
 	struct drm_i915_gem_object *obj;
 	int ret;
@@ -801,9 +800,6 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 	if (offset_in_page(args->user_ptr | args->user_size))
 		return -EINVAL;
 
-	if (args->user_size > dev_priv->gtt.base.total)
-		return -E2BIG;
-
 	if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
 		       (char __user *)(unsigned long)args->user_ptr, args->user_size))
 		return -EFAULT;
-- 
2.4.5


* [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (14 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:43     ` Chris Wilson
  2015-07-01 16:02     ` [PATCH v5] " Michel Thierry
  2015-07-01 15:27   ` [PATCH v3 17/17] drm/i915/gen8: Flip the 48b switch Michel Thierry
  2015-07-01 15:38   ` [PATCH v3 00/17] 48-bit PPGTT Daniel Vetter
  17 siblings, 2 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel, Ben Widawsky

Some allocations must only be referenced by 32-bit offsets. Specifically,
any resource used with a flat/heapless (0x00000000-0xfffff000) General
State Heap (GSH) or Instruction State Heap (ISH) must be in a 32-bit
range, because the General State Offset and Instruction State Offset are
limited to 32 bits.

Objects must have the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag set to indicate
that they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects not requiring this
workaround use the DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when
possible.

v2: Changed flag logic from needs_32b to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c            | 17 +++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 ++++++++++
 include/uapi/drm/i915_drm.h                |  3 ++-
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c720a18..aac51fb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2765,7 +2765,9 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS	(1<<3)
 #define PIN_USER	(1<<4)
 #define PIN_UPDATE	(1<<5)
-#define PIN_OFFSET_MASK (~4095)
+#define PIN_ZONE_4G	(1<<6)
+#define PIN_HIGH	(1<<7)
+#define PIN_OFFSET_MASK -(PIN_HIGH<<1)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eeea748..8aa0189 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3718,6 +3718,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 fence_alignment, unfenced_alignment;
 	u64 size, fence_size;
+	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
+	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
 	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	u64 end =
@@ -3759,6 +3761,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 						   obj->tiling_mode,
 						   false);
 		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+		if (flags & PIN_HIGH) {
+			search_flag = DRM_MM_SEARCH_BELOW;
+			alloc_flag = DRM_MM_CREATE_TOP;
+		}
+
+		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+		 * limit address to the first 4GBs for flagged objects.
+		 */
+		if (flags & PIN_ZONE_4G)
+			end = (1ULL << 32);
 	}
 
 	if (alignment == 0)
@@ -3801,8 +3814,8 @@ search_free:
 						  size, alignment,
 						  obj->cache_level,
 						  start, end,
-						  DRM_MM_SEARCH_DEFAULT,
-						  DRM_MM_CREATE_DEFAULT);
+						  search_flag,
+						  alloc_flag);
 	if (ret) {
 		ret = i915_gem_evict_something(dev, vm, size, alignment,
 					       obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 600db74..f52b736 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -588,11 +588,17 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 
+	flags |= PIN_ZONE_4G;
+	if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
+		flags &= ~PIN_ZONE_4G;
+
 	if (!drm_mm_node_allocated(&vma->node)) {
 		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
 			flags |= PIN_GLOBAL | PIN_MAPPABLE;
 		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
 			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+		if ((flags & PIN_MAPPABLE) == 0)
+			flags |= PIN_HIGH;
 	}
 
 	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
@@ -670,6 +676,10 @@ eb_vma_misplaced(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
 		return !only_mappable_for_reloc(entry->flags);
 
+	if (!(entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
+	    (vma->node.start + vma->node.size) >= (1ULL << 32))
+		return true;
+
 	return false;
 }
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f88cc1c..b91cf45 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -685,7 +685,8 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
 #define EXEC_OBJECT_NEEDS_GTT	(1<<1)
 #define EXEC_OBJECT_WRITE	(1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
 	__u64 flags;
 
 	__u64 rsvd1;
-- 
2.4.5
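
The flag handling in this patch can be summarised with the following
standalone sketch. It mirrors the logic of
i915_gem_execbuffer_reserve_vma and eb_vma_misplaced as posted above, but
is heavily simplified: the function names are hypothetical, the value of
PIN_MAPPABLE is an assumption (it is not visible in this diff), and the
drm_mm_node_allocated() condition around PIN_HIGH is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1ULL << 3)	/* from the uapi hunk */
#define PIN_MAPPABLE	(1ULL << 0)	/* assumed value, not shown in the diff */
#define PIN_ZONE_4G	(1ULL << 6)
#define PIN_HIGH	(1ULL << 7)

/* Derive pin flags: 4GB zone by default, top-down unless mappable. */
static uint64_t reserve_pin_flags(uint64_t entry_flags, bool needs_map)
{
	uint64_t flags = PIN_ZONE_4G;

	if (entry_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
		flags &= ~PIN_ZONE_4G;
	if (needs_map)
		flags |= PIN_MAPPABLE;
	if ((flags & PIN_MAPPABLE) == 0)
		flags |= PIN_HIGH;

	return flags;
}

/* True if an object without 48b support reaches the 4GB boundary. */
static bool vma_misplaced_4g(uint64_t start, uint64_t size,
			     uint64_t entry_flags)
{
	return !(entry_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
	       (start + size) >= (1ULL << 32);
}
```

Note that with the ">=" comparison, as in the hunk above, a node that ends
exactly at the 4GB boundary is also treated as misplaced.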


* [PATCH v3 17/17] drm/i915/gen8: Flip the 48b switch
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (15 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-07-01 15:27   ` Michel Thierry
  2015-07-01 15:38   ` [PATCH v3 00/17] 48-bit PPGTT Daniel Vetter
  17 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:27 UTC (permalink / raw)
  To: intel-gfx; +Cc: akash.goel

Use 48b addresses if hw supports it (i915.enable_ppgtt=3).

Note, aliasing PPGTT remains 32b only.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++--
 drivers/gpu/drm/i915/i915_params.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7712b10..27dc28c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -110,7 +110,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
 	has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
 			       (IS_BROADWELL(dev) ||
-				INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+				INTEL_INFO(dev)->gen >= 9);
 
 	if (intel_vgpu_active(dev))
 		has_full_ppgtt = false; /* emulation is too hard */
@@ -148,7 +148,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	}
 
 	if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
-		return 2;
+		return has_full_64bit_ppgtt ? 3 : 2;
 	else
 		return has_aliasing_ppgtt ? 1 : 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 7983fe4..ccf3eb2 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -110,7 +110,7 @@ MODULE_PARM_DESC(enable_hangcheck,
 module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
 MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
-	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
 
 module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
 MODULE_PARM_DESC(enable_execlists,
-- 
2.4.5
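
For reference, the resulting auto-selection can be sketched as below. This
covers only the final fallthrough of sanitize_enable_ppgtt shown in the
hunks above; the vgpu and explicit-parameter paths are omitted, and
struct hw_caps with its fields is a hypothetical stand-in for the
INTEL_INFO()/IS_BROADWELL()/IS_ENABLED() checks.

```c
#include <stdbool.h>

/* Hypothetical capability summary standing in for the driver's checks. */
struct hw_caps {
	int gen;			/* INTEL_INFO(dev)->gen */
	bool is_broadwell;		/* IS_BROADWELL(dev) */
	bool x86_64;			/* IS_ENABLED(CONFIG_X86_64) */
	bool execlists;			/* i915.enable_execlists */
	bool has_aliasing_ppgtt;
};

/* Returns 0=disabled, 1=aliasing, 2=full, 3=full_64b. */
static int auto_ppgtt_mode(const struct hw_caps *hw)
{
	bool has_full_64bit_ppgtt = hw->x86_64 &&
				    (hw->is_broadwell || hw->gen >= 9);

	if (hw->gen >= 8 && hw->execlists)
		return has_full_64bit_ppgtt ? 3 : 2;

	return hw->has_aliasing_ppgtt ? 1 : 0;
}
```

In this sketch a gen8 part that is not Broadwell stays at full 32b PPGTT
(mode 2) even with execlists enabled.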


* Re: [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check
  2015-07-01 15:27   ` [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check Michel Thierry
@ 2015-07-01 15:31     ` Chris Wilson
  0 siblings, 0 replies; 74+ messages in thread
From: Chris Wilson @ 2015-07-01 15:31 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx, akash.goel

On Wed, Jul 01, 2015 at 04:27:31PM +0100, Michel Thierry wrote:
> The GTT was only 32b, so its max size was 4GB. In order to allow objects
> bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl could check against
> the max 48b range (1ULL << 48) instead.
> 
> But since the check no longer applies, just kill the limit.
> 
> v2: Use the default ctx to infer the ppgtt max size (Akash).
> v3: Just kill the limit, it was only there for early detection of an
> error when used for execbuffer (Chris).
> 
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH v3 00/17] 48-bit PPGTT
  2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
                     ` (16 preceding siblings ...)
  2015-07-01 15:27   ` [PATCH v3 17/17] drm/i915/gen8: Flip the 48b switch Michel Thierry
@ 2015-07-01 15:38   ` Daniel Vetter
  17 siblings, 0 replies; 74+ messages in thread
From: Daniel Vetter @ 2015-07-01 15:38 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx, akash.goel

On Wed, Jul 01, 2015 at 04:27:16PM +0100, Michel Thierry wrote:
> These are the rebased patches, after Mika's final ppgtt clean-up series landed
> (it relies on the macros added). New functions also follow these changes.
> 
> In order to expand the GPU address space, a 4th level translation is added, the
> Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
> each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
> used, only adding the 4th level changes. I also updated some remaining
> variables that were 32b only.
> 
> There are 2 hardware workarounds needed to allow correct operation with 48b
> addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). This
> new patchset version includes the comments and suggestions from Chris Wilson.
> A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can be
> allocated outside the first 4 PDPs; if not, the end range is forced to 4GB. Also,
> more objects now use the DRM_MM_CREATE_TOP flag. To maintain compatibility, in
> libdrm I added a new drm_intel_bo_emit_reloc_48bit function that will flag
> these objects, while the existing drm_intel_bo_emit_reloc clears it.
> 
> Finally, this feature is only available in BDW and Gen9, requires LRC submission
> mode (execlists) and it can be detected by i915.enable_ppgtt=3.
> 
> Also note that this expanded address space is only available for full PPGTT,
> aliasing PPGTT and Global GTT remain 32-bit.
> 
> Michel Thierry (17):
>   drm/i915: Remove unnecessary gen8_clamp_pd
>   drm/i915/gen8: Make pdp allocation more dynamic
>   drm/i915/gen8: Abstract PDP usage
>   drm/i915/gen8: Add dynamic page trace events
>   drm/i915/gen8: implement alloc/free for 4lvl
>   drm/i915/gen8: Add 4 level switching infrastructure and lrc support
>   drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
>   drm/i915/gen8: Pass sg_iter through pte inserts
>   drm/i915/gen8: Add 4 level support in insert_entries and clear_range
>   drm/i915/gen8: Initialize PDPs
>   drm/i915: Expand error state's address width to 64b
>   drm/i915/gen8: Add ppgtt info and debug_dump
>   drm/i915: object size needs to be u64
>   drm/i915: batch_obj vm offset must be u64
>   drm/i915/userptr: Kill user_size limit check
>   drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
>   drm/i915/gen8: Flip the 48b switch

Please start a new thread when resending the entire patch series. Only
use in-reply-to for parts of a series. It's harder to piece the series
together this way, and hence it doesn't really improve things compared to
just sending all the patches in-reply-to individually. The point of a full
resend is to restart/consolidate the review discussions.

But don't resend this one now, since that would guarantee a split
discussion.
-Daniel

> 
>  drivers/gpu/drm/i915/i915_debugfs.c        |  18 +-
>  drivers/gpu/drm/i915/i915_drv.h            |  17 +-
>  drivers/gpu/drm/i915/i915_gem.c            |  22 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 649 ++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_gtt.h        |  66 ++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c    |   4 -
>  drivers/gpu/drm/i915/i915_gpu_error.c      |  17 +-
>  drivers/gpu/drm/i915/i915_params.c         |   2 +-
>  drivers/gpu/drm/i915/i915_reg.h            |   1 +
>  drivers/gpu/drm/i915/i915_trace.h          |  16 +
>  drivers/gpu/drm/i915/intel_lrc.c           |  65 ++-
>  include/uapi/drm/i915_drm.h                |   3 +-
>  13 files changed, 725 insertions(+), 165 deletions(-)
> 
> -- 
> 2.4.5
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-07-01 15:27   ` [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-07-01 15:43     ` Chris Wilson
  2015-07-01 15:54       ` Michel Thierry
  2015-07-01 16:02     ` [PATCH v5] " Michel Thierry
  1 sibling, 1 reply; 74+ messages in thread
From: Chris Wilson @ 2015-07-01 15:43 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx, akash.goel, Ben Widawsky

On Wed, Jul 01, 2015 at 04:27:32PM +0100, Michel Thierry wrote:
> There are some allocations that must be only referenced by 32-bit
> offsets. To limit the chances of having the first 4GB already full,
> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> DRM_MM_CREATE_TOP flags
> 
> In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
> General State Heap (GSH) or Intructions State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State
> Offset are limited to 32-bits.
> 
> Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
> they can be allocated above the 32-bit address range. To limit the
> chances of having the first 4GB already full, objects will use
> DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
> 
> v2: Changed flag logic from neeeds_32b, to supports_48b.
> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
> to use last PIN_ defined instead of hard-coded value; use correct limit
> check in eb_vma_misplaced. (Chris)
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  4 +++-
>  drivers/gpu/drm/i915/i915_gem.c            | 17 +++++++++++++++--
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 ++++++++++
>  include/uapi/drm/i915_drm.h                |  3 ++-
>  4 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c720a18..aac51fb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2765,7 +2765,9 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
>  #define PIN_OFFSET_BIAS	(1<<3)
>  #define PIN_USER	(1<<4)
>  #define PIN_UPDATE	(1<<5)
> -#define PIN_OFFSET_MASK (~4095)
> +#define PIN_ZONE_4G	(1<<6)
> +#define PIN_HIGH	(1<<7)
> +#define PIN_OFFSET_MASK -(PIN_HIGH<<1)

The offset has to be 4096-aligned, which imposes an upper limit on how
many low bits we can use for flags. When we exceed it, it is probably past
time for a params struct!
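
To make the point about alignment concrete, here is a minimal userspace
sketch of packing a page-aligned offset and flag bits into one 64-bit word.
The bit positions are copied from the quoted hunk; the EX_ prefix and the
helper names are hypothetical and only serve this illustration. With the
mask fixed at ~4095, the twelve low bits are the whole flag budget.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the quoted hunk; EX_ names are illustration only. */
#define EX_PIN_OFFSET_BIAS (1ULL << 3)
#define EX_PIN_ZONE_4G     (1ULL << 6)
#define EX_PIN_HIGH        (1ULL << 7)
#define EX_PAGE_SIZE       4096ULL
#define EX_PIN_OFFSET_MASK (~(EX_PAGE_SIZE - 1)) /* same value as (~4095) */

/* Pack a page-aligned offset and flag bits into a single word. */
uint64_t ex_pack(uint64_t offset, uint64_t flag_bits)
{
	assert((offset & ~EX_PIN_OFFSET_MASK) == 0);   /* offset is 4096-aligned */
	assert((flag_bits & EX_PIN_OFFSET_MASK) == 0); /* flags stay in bits 0..11 */
	return offset | flag_bits;
}

/* Recover the offset part (bits 12 and up). */
uint64_t ex_offset(uint64_t packed)
{
	return packed & EX_PIN_OFFSET_MASK;
}

/* Recover the flag part (bits 0..11). */
uint64_t ex_flags(uint64_t packed)
{
	return packed & ~EX_PIN_OFFSET_MASK;
}

/* Value of a mask derived from the last flag, i.e. -(PIN_HIGH << 1). */
uint64_t ex_derived_mask(void)
{
	return -(EX_PIN_HIGH << 1);
}
```

With PIN_HIGH at bit 7, the derived mask covers bits 8 and up, so a later
flag placed at bit 8 would be read back as part of the offset unless the
mask were updated at the same time. The fixed mask keeps bits 0..11 for
flags regardless of how many are defined.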

>  int __must_check
>  i915_gem_object_pin(struct drm_i915_gem_object *obj,
>  		    struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index eeea748..8aa0189 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3718,6 +3718,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u32 fence_alignment, unfenced_alignment;
>  	u64 size, fence_size;
> +	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
> +	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
>  	u64 start =
>  		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
>  	u64 end =
> @@ -3759,6 +3761,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  						   obj->tiling_mode,
>  						   false);
>  		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> +
> +		if (flags & PIN_HIGH) {
> +			search_flag = DRM_MM_SEARCH_BELOW;
> +			alloc_flag = DRM_MM_CREATE_TOP;
> +		}
> +
> +		/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> +		 * limit address to the first 4GBs for flagged objects.
> +		 */

This note is best next to where we set PIN_ZONE_4G in execbuffer.

> +		if (flags & PIN_ZONE_4G)
> +			end = (1ULL << 32);
>  	}
>  
>  	if (alignment == 0)

> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 600db74..f52b736 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -588,11 +588,17 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
>  		flags |= PIN_GLOBAL;
>  
> +	flags |= PIN_ZONE_4G;
> +	if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
> +		flags &= ~PIN_ZONE_4G;
> +
>  	if (!drm_mm_node_allocated(&vma->node)) {
>  		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
>  			flags |= PIN_GLOBAL | PIN_MAPPABLE;
>  		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
>  			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> +		if ((flags & PIN_MAPPABLE) == 0)
> +			flags |= PIN_HIGH;

I'm still debating the right semantics to use, but I'm happy with this
until I can find something better. (The biggest issue is that drm_mm is
not indexed for fast top-down searching. The current search code I put
into drm_mm is unfortunately broken; the idea I have in mind to fix it is
to add a hole_list to drm_mm/drm_mm_node, so that we can just walk
holes in up/down, recent/old order. With that, allocating top-down
will not be any more expensive than the current reuse of the most recent
hole, though given a fragmented drm_mm the hole stack probably requires
fewer steps to find a large hole.)
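
As a rough illustration of bottom-up versus top-down placement, the sketch
below walks an address-sorted array of holes in either direction and takes
the first fit. This is not drm_mm code: struct ex_hole, ex_place() and
ex_search() are hypothetical names, and a plain array stands in for the
hole list described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A free range [start, end); hypothetical stand-in for a drm_mm hole. */
struct ex_hole {
	uint64_t start;
	uint64_t end;
};

/* Try to place `size` bytes in `hole`, aligned to `align` (a power of two).
 * Bottom-up picks the lowest fitting address, top-down the highest. */
bool ex_place(const struct ex_hole *hole, uint64_t size, uint64_t align,
	      bool top_down, uint64_t *out)
{
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);
	if (size == 0 || hole->end - hole->start < size)
		return false;

	if (top_down)
		addr = (hole->end - size) & ~(align - 1);        /* round down */
	else
		addr = (hole->start + align - 1) & ~(align - 1); /* round up */

	if (addr < hole->start || addr > hole->end - size)
		return false;

	*out = addr;
	return true;
}

/* Walk the sorted holes from the top or the bottom; first fit wins. */
bool ex_search(const struct ex_hole *holes, int n, uint64_t size,
	       uint64_t align, bool top_down, uint64_t *out)
{
	for (int k = 0; k < n; k++) {
		int i = top_down ? n - 1 - k : k;

		if (ex_place(&holes[i], size, align, top_down, out))
			return true;
	}
	return false;
}
```

With the holes sorted by address, both directions cost the same number of
steps per hole visited, which is the property the hole_list idea is after.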

Other than don't touch PIN_OFFSET_MASK and move the w/a note next to
where we tweak the PIN_ZONE_4G, it lgtm, so with those changes,

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-07-01 15:43     ` Chris Wilson
@ 2015-07-01 15:54       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 15:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, akash.goel, Ben Widawsky

On 7/1/2015 4:43 PM, Chris Wilson wrote:
> On Wed, Jul 01, 2015 at 04:27:32PM +0100, Michel Thierry wrote:
>>
>> +	flags |= PIN_ZONE_4G;
>> +	if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
>> +		flags &= ~PIN_ZONE_4G;
>> +
>>   	if (!drm_mm_node_allocated(&vma->node)) {
>>   		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
>>   			flags |= PIN_GLOBAL | PIN_MAPPABLE;
>>   		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
>>   			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
>> +		if ((flags & PIN_MAPPABLE) == 0)
>> +			flags |= PIN_HIGH;
>
> I'm still debating the right semantics to use, but I'm happy with this
> until I can find something better. (The biggest issue is that drm_mm is
> not indexed for fast top-down searching. The current search code I put
> into drm_mm is unfortunately broken, the idea I have in mind to fix it is
> to add a hole_list into drm_mm/drm_mm_node, so that we can just walk
> holes in up/down, recent/old order. And with that allocating top-down
> will not be any more expensive than the current reuse recent hole -
> though perhaps given a fragment drm_mm the hole stack probably requires
> fewer steps to find a large hole.)
>
> Other than don't touch PIN_OFFSET_MASK and move the w/a note next to
> where we tweak the PIN_ZONE_4G, it lgtm, so with those changes,

Also the comment should be "limit address to the first 4GBs for 
_unflagged_ objects". I didn't update it after changing the logic.
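
The "unflagged" wording follows from the opt-in logic, which can be sketched
in isolation as below. This is a userspace illustration, not the driver
code: the EX_ names and both helpers are hypothetical, the bit positions
are the ones used in this series, and the clamp in ex_range_end() is a
small variation on the patch, which assigns the 4GB limit directly.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as used in this series; EX_ names are illustration only. */
#define EX_OBJECT_SUPPORTS_48B_ADDRESS (1u << 3)
#define EX_PIN_ZONE_4G                 (1u << 6)

/* Every object is confined to the low 4GB unless it opts in to 48b. */
uint32_t ex_pin_flags(uint64_t entry_flags)
{
	uint32_t flags = EX_PIN_ZONE_4G;

	if (entry_flags & EX_OBJECT_SUPPORTS_48B_ADDRESS)
		flags &= ~EX_PIN_ZONE_4G;

	return flags;
}

/* Upper bound of the range searched when binding the object. */
uint64_t ex_range_end(uint32_t pin_flags, uint64_t vm_total)
{
	uint64_t end = vm_total;

	assert(vm_total > 0);
	if ((pin_flags & EX_PIN_ZONE_4G) && end > (1ULL << 32))
		end = 1ULL << 32;

	return end;
}
```

An object submitted without the flag gets PIN_ZONE_4G and a 4GB upper
bound; only objects carrying the flag may be placed anywhere in the 48b
range.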

I'll resend with these changes.

Thanks,
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH v5] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  2015-07-01 15:27   ` [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
  2015-07-01 15:43     ` Chris Wilson
@ 2015-07-01 16:02     ` Michel Thierry
  1 sibling, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-01 16:02 UTC (permalink / raw)
  To: intel-gfx

There are some allocations that must be referenced only by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use the DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.

Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.

Objects must have the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate
that they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.

v2: Changed flag logic from needs_32b to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 ++
 drivers/gpu/drm/i915/i915_gem.c            | 14 ++++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++++++++++
 include/uapi/drm/i915_drm.h                |  3 ++-
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3fbfce5..cda6366 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2768,6 +2768,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS	(1<<3)
 #define PIN_USER	(1<<4)
 #define PIN_UPDATE	(1<<5)
+#define PIN_ZONE_4G	(1<<6)
+#define PIN_HIGH	(1<<7)
 #define PIN_OFFSET_MASK (~4095)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 43719b8..1372259 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3722,6 +3722,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 fence_alignment, unfenced_alignment;
 	u64 size, fence_size;
+	u32 search_flag = DRM_MM_SEARCH_DEFAULT;
+	u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
 	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
 	u64 end =
@@ -3763,6 +3765,14 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 						   obj->tiling_mode,
 						   false);
 		size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+		if (flags & PIN_HIGH) {
+			search_flag = DRM_MM_SEARCH_BELOW;
+			alloc_flag = DRM_MM_CREATE_TOP;
+		}
+
+		if (flags & PIN_ZONE_4G)
+			end = (1ULL << 32);
 	}
 
 	if (alignment == 0)
@@ -3805,8 +3815,8 @@ search_free:
 						  size, alignment,
 						  obj->cache_level,
 						  start, end,
-						  DRM_MM_SEARCH_DEFAULT,
-						  DRM_MM_CREATE_DEFAULT);
+						  search_flag,
+						  alloc_flag);
 	if (ret) {
 		ret = i915_gem_evict_something(dev, vm, size, alignment,
 					       obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 600db74..ff50619 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -588,11 +588,20 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
 		flags |= PIN_GLOBAL;
 
+	/* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+	 * limit address to the first 4GBs for unflagged objects.
+	 */
+	flags |= PIN_ZONE_4G;
+	if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
+		flags &= ~PIN_ZONE_4G;
+
 	if (!drm_mm_node_allocated(&vma->node)) {
 		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
 			flags |= PIN_GLOBAL | PIN_MAPPABLE;
 		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
 			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+		if ((flags & PIN_MAPPABLE) == 0)
+			flags |= PIN_HIGH;
 	}
 
 	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
@@ -670,6 +679,10 @@ eb_vma_misplaced(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
 		return !only_mappable_for_reloc(entry->flags);
 
+	if (!(entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
+	    (vma->node.start + vma->node.size) >= (1ULL << 32))
+		return true;
+
 	return false;
 }
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f88cc1c..b91cf45 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -685,7 +685,8 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
 #define EXEC_OBJECT_NEEDS_GTT	(1<<1)
 #define EXEC_OBJECT_WRITE	(1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
 	__u64 flags;
 
 	__u64 rsvd1;
-- 
2.4.5


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 14/17] drm/i915: batch_obj vm offset must be u64
  2015-07-01 15:27   ` [PATCH v3 14/17] drm/i915: batch_obj vm offset must " Michel Thierry
@ 2015-07-01 16:07     ` John Harrison
  0 siblings, 0 replies; 74+ messages in thread
From: John Harrison @ 2015-07-01 16:07 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx; +Cc: akash.goel

On 01/07/2015 16:27, Michel Thierry wrote:
> Otherwise it can overflow in 48-bit mode, and cause an incorrect
> exec_start.
>
> Before commit 5f19e2bff ("drm/i915: Merged the many do_execbuf()
> parameters into a structure"), it was already an u64, so it could be
> seen as a regression (or as an optimization that looked good at that time).
Almost certainly a merge failure. The above patch moved the variable
when it was a uint32_t, but by the time it got merged, another patch had
updated it to uint64_t. Unfortunately, the merge either didn't conflict
or the conflict didn't get resolved correctly. Either way, the downgrade
was certainly not intentional.
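
For illustration, the effect of the narrower field can be reproduced in a
few lines of userspace C. The struct and function names below are
hypothetical; the sketch assumes exec_start is formed by adding the batch
start offset to the batch object's offset in the address space, and only
shows what a 32-bit field does to an offset above 4GB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the params field before and after the fix. */
struct ex_params_narrow {
	uint32_t batch_obj_vm_offset;
};

struct ex_params_wide {
	uint64_t batch_obj_vm_offset;
};

/* Start address computed through the 32-bit field: upper bits are lost. */
uint64_t ex_exec_start_narrow(uint64_t vm_offset, uint32_t args_start)
{
	struct ex_params_narrow p = {
		.batch_obj_vm_offset = (uint32_t)vm_offset, /* truncates */
	};

	assert(vm_offset < (1ULL << 48));
	return (uint64_t)p.batch_obj_vm_offset + args_start;
}

/* Start address computed through the 64-bit field: value is preserved. */
uint64_t ex_exec_start_wide(uint64_t vm_offset, uint32_t args_start)
{
	struct ex_params_wide p = {
		.batch_obj_vm_offset = vm_offset,
	};

	assert(vm_offset < (1ULL << 48));
	return p.batch_obj_vm_offset + args_start;
}
```

For a batch bound at 0x800000001000 with a start offset of 0x40, the narrow
variant yields 0x1040 while the wide one yields 0x800000001040.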


> Cc: John Harrison <john.c.harrison@Intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d245c82..c720a18 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1664,7 +1664,7 @@ struct i915_execbuffer_params {
>   	struct drm_file                 *file;
>   	uint32_t                        dispatch_flags;
>   	uint32_t                        args_batch_start_offset;
> -	uint32_t                        batch_obj_vm_offset;
> +	uint64_t                        batch_obj_vm_offset;
>   	struct intel_engine_cs          *ring;
>   	struct drm_i915_gem_object      *batch_obj;
>   	struct intel_context            *ctx;


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic
  2015-07-01 15:27   ` [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
@ 2015-07-07 12:36     ` Goel, Akash
  2015-07-07 12:56       ` Michel Thierry
  0 siblings, 1 reply; 74+ messages in thread
From: Goel, Akash @ 2015-07-07 12:36 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx



On 7/1/2015 8:57 PM, Michel Thierry wrote:
> This transitional patch doesn't do much for the existing code. However,
> it should make upcoming patches to use the full 48b address space a bit
> easier. The patch also introduces the PML4, ie. the new top level structure
> of the page tables.
>

It would be better to move the introduction of PML4 to a separate patch
and keep this patch only for the dynamic pdp allocation changes.
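
For reference, the dynamic-allocation part on its own amounts to sizing the
"used" bitmap and the pointer array at run time instead of at compile time.
The sketch below shows that pattern in userspace C with calloc/free in
place of the kernel allocators; all ex_ names are hypothetical and the
code is not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define EX_BITS_PER_LONG    (CHAR_BIT * sizeof(unsigned long))
#define EX_BITS_TO_LONGS(n) (((n) + EX_BITS_PER_LONG - 1) / EX_BITS_PER_LONG)

struct ex_pd; /* opaque stand-in for a page directory */

struct ex_pdp {
	unsigned long *used;    /* one bit per entry */
	struct ex_pd **entries; /* one pointer per entry */
	size_t count;           /* number of entries, chosen at run time */
};

/* Allocate both arrays for `count` entries; undo the first on failure. */
int ex_pdp_init(struct ex_pdp *pdp, size_t count)
{
	assert(count > 0);

	pdp->used = calloc(EX_BITS_TO_LONGS(count), sizeof(unsigned long));
	if (!pdp->used)
		return -1;

	pdp->entries = calloc(count, sizeof(*pdp->entries));
	if (!pdp->entries) {
		free(pdp->used);
		pdp->used = NULL; /* leave the struct clean for the caller */
		return -1;
	}

	pdp->count = count;
	return 0;
}

/* Release both arrays and reset the struct. */
void ex_pdp_fini(struct ex_pdp *pdp)
{
	free(pdp->used);
	free(pdp->entries);
	pdp->used = NULL;
	pdp->entries = NULL;
	pdp->count = 0;
}
```

The same init routine then serves both a 4-entry and a 512-entry table,
with the entry count passed in by the caller.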

> v2: Renamed  pdp_free to be similar to  pd/pt (unmap_and_free_pdp).
> v3: To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
> v4: Rebase after s/page_tables/page_table/, added extra information
> about 4-level page table formats and use IS_ENABLED macro.
> v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
> v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and follow
> his nomenclature in pdp functions (there is no alloc_pdp yet).
> v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
> v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>   drivers/gpu/drm/i915/i915_drv.h     |   7 ++-
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 116 ++++++++++++++++++++++++++++--------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  41 ++++++++++---
>   3 files changed, 128 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1dbd957..7bccfd5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2490,7 +2490,12 @@ struct drm_i915_cmd_table {
>   #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
>   #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
>   #define USES_PPGTT(dev)		(i915.enable_ppgtt)
> -#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt == 2)
> +#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt >= 2)
> +#ifdef CONFIG_X86_64
> +# define USES_FULL_48BIT_PPGTT(dev)	(i915.enable_ppgtt == 3)
> +#else
> +# define USES_FULL_48BIT_PPGTT(dev)	false
> +#endif
>
>   #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
>   #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 712ca34..cdcc778 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -104,9 +104,13 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
>   {
>   	bool has_aliasing_ppgtt;
>   	bool has_full_ppgtt;
> +	bool has_full_64bit_ppgtt;
>
>   	has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
>   	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
> +	has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
> +			       (IS_BROADWELL(dev) ||
> +				INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
>
>   	if (intel_vgpu_active(dev))
>   		has_full_ppgtt = false; /* emulation is too hard */
> @@ -125,6 +129,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
>   	if (enable_ppgtt == 2 && has_full_ppgtt)
>   		return 2;
>
> +	if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
> +		return 3;
> +
>   #ifdef CONFIG_INTEL_IOMMU
>   	/* Disable ppgtt on SNB if VT-d is on. */
>   	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
> @@ -522,6 +529,45 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>   	fill_px(vm->dev, pd, scratch_pde);
>   }
>
> +static int __pdp_init(struct drm_device *dev,
> +		      struct i915_page_directory_pointer *pdp)
> +{
> +	size_t pdpes = I915_PDPES_PER_PDP(dev);
> +
> +	pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
> +				  sizeof(unsigned long),
> +				  GFP_KERNEL);
> +	if (!pdp->used_pdpes)
> +		return -ENOMEM;
> +
> +	pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
> +				      GFP_KERNEL);
> +	if (!pdp->page_directory) {
> +		kfree(pdp->used_pdpes);
> +		/* the PDP might be the statically allocated top level. Keep it
> +		 * as clean as possible */
> +		pdp->used_pdpes = NULL;
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static void __pdp_fini(struct i915_page_directory_pointer *pdp)
> +{
> +	kfree(pdp->used_pdpes);
> +	kfree(pdp->page_directory);
> +	pdp->page_directory = NULL;
> +}
> +
> +static void free_pdp(struct drm_device *dev,
> +		     struct i915_page_directory_pointer *pdp)
> +{
> +	__pdp_fini(pdp);
> +	if (USES_FULL_48BIT_PPGTT(dev))
> +		kfree(pdp);
> +}
> +
>   /* Broadwell Page Directory Pointer Descriptors */
>   static int gen8_write_pdp(struct drm_i915_gem_request *req,
>   			  unsigned entry,
> @@ -634,9 +680,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   	pt_vaddr = NULL;
>
>   	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> -		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
> -			break;
> -
>   		if (pt_vaddr == NULL) {
>   			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
>   			struct i915_page_table *pt = pd->page_table[pde];
> @@ -720,7 +763,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>   		container_of(vm, struct i915_hw_ppgtt, base);
>   	int i;
>
> -	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
> +	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> +				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
>   		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
>   			continue;
>
> @@ -729,6 +773,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>   		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
>   	}
>
> +	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
>   	gen8_free_scratch(vm);
>   }
>
> @@ -820,8 +865,9 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>   	struct i915_page_directory *pd;
>   	uint64_t temp;
>   	uint32_t pdpe;
> +	uint32_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
>
> -	WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
> +	WARN_ON(!bitmap_empty(new_pds, pdpes));
>
>   	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
>   		if (pd)
> @@ -839,18 +885,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>   	return 0;
>
>   unwind_out:
> -	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
> +	for_each_set_bit(pdpe, new_pds, pdpes)
>   		free_pd(dev, pdp->page_directory[pdpe]);
>
>   	return -ENOMEM;
>   }
>
>   static void
> -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
> +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
> +		       uint32_t pdpes)
>   {
>   	int i;
>
> -	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
> +	for (i = 0; i < pdpes; i++)
>   		kfree(new_pts[i]);
>   	kfree(new_pts);
>   	kfree(new_pds);
> @@ -861,23 +908,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
>    */
>   static
>   int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> -					 unsigned long ***new_pts)
> +					 unsigned long ***new_pts,
> +					 uint32_t pdpes)
>   {
>   	int i;
>   	unsigned long *pds;
>   	unsigned long **pts;
>
> -	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
> +	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
>   	if (!pds)
>   		return -ENOMEM;
>
> -	pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
> +	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
>   	if (!pts) {
>   		kfree(pds);
>   		return -ENOMEM;
>   	}
>
> -	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
> +	for (i = 0; i < pdpes; i++) {
>   		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
>   				 sizeof(unsigned long), GFP_KERNEL);
>   		if (!pts[i])
> @@ -890,7 +938,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
>   	return 0;
>
>   err_out:
> -	free_gen8_temp_bitmaps(pds, pts);
> +	free_gen8_temp_bitmaps(pds, pts, pdpes);
>   	return -ENOMEM;
>   }
>
> @@ -916,6 +964,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	const uint64_t orig_length = length;
>   	uint64_t temp;
>   	uint32_t pdpe;
> +	uint32_t pdpes = I915_PDPES_PER_PDP(dev);
>   	int ret;
>
>   	/* Wrap is never okay since we can only represent 48b, and we don't
> @@ -927,7 +976,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	if (WARN_ON(start + length > ppgtt->base.total))
>   		return -ENODEV;
>
> -	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
> +	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
>   	if (ret)
>   		return ret;
>
> @@ -935,7 +984,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
>   					new_page_dirs);
>   	if (ret) {
> -		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
>   		return ret;
>   	}
>
> @@ -989,7 +1038,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
>   	}
>
> -	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
>   	mark_tlbs_dirty(ppgtt);
>   	return 0;
>
> @@ -999,10 +1048,10 @@ err_out:
>   			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
>   	}
>
> -	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
> +	for_each_set_bit(pdpe, new_page_dirs, pdpes)
>   		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
>
> -	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
>   	mark_tlbs_dirty(ppgtt);
>   	return ret;
>   }
> @@ -1023,14 +1072,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>   		return ret;
>
>   	ppgtt->base.start = 0;
> -	ppgtt->base.total = 1ULL << 32;
> -	if (IS_ENABLED(CONFIG_X86_32))
> -		/* While we have a proliferation of size_t variables
> -		 * we cannot represent the full ppgtt size on 32bit,
> -		 * so limit it to the same size as the GGTT (currently
> -		 * 2GiB).
> -		 */
> -		ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
>   	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
>   	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
>   	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> @@ -1040,7 +1081,30 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
>   	ppgtt->switch_mm = gen8_mm_switch;
>
> +	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> +		ret = __pdp_init(false, &ppgtt->pdp);
> +
> +		if (ret)
> +			goto free_scratch;
> +
> +		ppgtt->base.total = 1ULL << 32;
> +		if (IS_ENABLED(CONFIG_X86_32))
> +			/* While we have a proliferation of size_t variables
> +			 * we cannot represent the full ppgtt size on 32bit,
> +			 * so limit it to the same size as the GGTT (currently
> +			 * 2GiB).
> +			 */
> +			ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> +	} else {
> +		ppgtt->base.total = 1ULL << 48;
> +		return -EPERM; /* Not yet implemented */
> +	}
> +
>   	return 0;
> +
> +free_scratch:
> +	gen8_free_scratch(&ppgtt->base);
> +	return ret;
>   }
>
>   static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d5bf953..e2b684e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
>    * PDPE  |  PDE  |  PTE  | offset
>    * The difference as compared to normal x86 3 level page table is the PDPEs are
>    * programmed via register.
> + *
> + * GEN8 48b legacy style address is defined as a 4 level page table:
> + * 47:39 | 38:30 | 29:21 | 20:12 |  11:0
> + * PML4E | PDPE  |  PDE  |  PTE  | offset
>    */
> +#define GEN8_PML4ES_PER_PML4		512
> +#define GEN8_PML4E_SHIFT		39
>   #define GEN8_PDPE_SHIFT			30
> -#define GEN8_PDPE_MASK			0x3
> +/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
> + * tables */
> +#define GEN8_PDPE_MASK			0x1ff
>   #define GEN8_PDE_SHIFT			21
>   #define GEN8_PDE_MASK			0x1ff
>   #define GEN8_PTE_SHIFT			12
> @@ -98,6 +106,9 @@ typedef uint64_t gen8_pde_t;
>   #define GEN8_LEGACY_PDPES		4
>   #define GEN8_PTES			I915_PTES(sizeof(gen8_pte_t))
>
> +#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> +				GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
> +
>   #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
>   #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
>   #define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
> @@ -241,9 +252,17 @@ struct i915_page_directory {
>   };
>
>   struct i915_page_directory_pointer {
> -	/* struct page *page; */
> -	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
> -	struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
> +	struct i915_page_dma base;
> +
> +	unsigned long *used_pdpes;
> +	struct i915_page_directory **page_directory;
> +};
> +
> +struct i915_pml4 {
> +	struct i915_page_dma base;
> +
> +	DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
> +	struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
>   };
>
>   struct i915_address_space {
> @@ -341,8 +360,9 @@ struct i915_hw_ppgtt {
>   	struct drm_mm_node node;
>   	unsigned long pd_dirty_rings;
>   	union {
> -		struct i915_page_directory_pointer pdp;
> -		struct i915_page_directory pd;
> +		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
> +		struct i915_page_directory_pointer pdp;	/* GEN8+ */
> +		struct i915_page_directory pd;		/* GEN6-7 */
>   	};
>
>   	struct drm_i915_file_private *file_priv;
> @@ -436,14 +456,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
>   	     temp = min(temp, length),					\
>   	     start += temp, length -= temp)
>
> -#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
> -	for (iter = gen8_pdpe_index(start);	\
> -	     pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES;	\
> +#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b)	\
> +	for (iter = gen8_pdpe_index(start); \
> +	     pd = (pdp)->page_directory[iter], length > 0 && (iter < b);	\
>   	     iter++,				\
>   	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
>   	     temp = min(temp, length),					\
>   	     start += temp, length -= temp)
>
> +#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
> +	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
> +
>   static inline uint32_t gen8_pte_index(uint64_t address)
>   {
>   	return i915_pte_index(address, GEN8_PDE_SHIFT);
>
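
The 4-level format documented in the i915_gem_gtt.h hunk above
(47:39 | 38:30 | 29:21 | 20:12 | 11:0) can be checked with a small
userspace decoder. The shift values come from that hunk; struct ex_indices
and ex_decode() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts as in the quoted header; each level indexes 9 bits (512 entries). */
#define EX_PML4E_SHIFT 39
#define EX_PDPE_SHIFT  30
#define EX_PDE_SHIFT   21
#define EX_PTE_SHIFT   12
#define EX_INDEX_MASK  0x1ffULL
#define EX_OFFSET_MASK 0xfffULL

struct ex_indices {
	unsigned pml4e;
	unsigned pdpe;
	unsigned pde;
	unsigned pte;
	unsigned offset;
};

/* Split a 48-bit address into its per-level indices and page offset. */
struct ex_indices ex_decode(uint64_t addr)
{
	struct ex_indices ix;

	assert(addr < (1ULL << 48));
	ix.pml4e  = (unsigned)((addr >> EX_PML4E_SHIFT) & EX_INDEX_MASK);
	ix.pdpe   = (unsigned)((addr >> EX_PDPE_SHIFT) & EX_INDEX_MASK);
	ix.pde    = (unsigned)((addr >> EX_PDE_SHIFT) & EX_INDEX_MASK);
	ix.pte    = (unsigned)((addr >> EX_PTE_SHIFT) & EX_INDEX_MASK);
	ix.offset = (unsigned)(addr & EX_OFFSET_MASK);
	return ix;
}
```

Decoding 1ULL << 32 gives pml4e 0 and pdpe 4, the first entry past the four
legacy PDPEs, which is why the PDPE mask has to grow from 0x3 to 0x1ff for
48b addresses.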

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage
  2015-07-01 15:27   ` [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage Michel Thierry
@ 2015-07-07 12:43     ` Goel, Akash
  2015-07-07 13:35       ` Michel Thierry
  0 siblings, 1 reply; 74+ messages in thread
From: Goel, Akash @ 2015-07-07 12:43 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx



On 7/1/2015 8:57 PM, Michel Thierry wrote:
> Up until now, ppgtt->pdp has always been the root of our page tables.
> Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
>
> In preparation for 4 level page tables, we need to stop use ppgtt->pdp
> directly unless we know it's what we want. The future structure will use
> ppgtt->pml4 for the top level, and the pdp is just one of the entries
> being pointed to by a pml4e.
>
> v2: Updated after dynamic page allocation changes.
> v3: Rebase after s/page_tables/page_table/.
> v4: Rebase after changes in "Dynamic page table allocations" patch.
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 136 +++++++++++++++++++++++-------------
>   1 file changed, 88 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index cdcc778..41a18ff 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -529,6 +529,25 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>   	fill_px(vm->dev, pd, scratch_pde);
>   }
>
> +/* It's likely we'll map more than one page table at a time. This function will
> + * save us unnecessary kmap calls, but do no more functionally than multiple
> + * calls to pde_encode. The ppgtt is only needed to reuse the kunmap macro. */
> +static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
> +				     struct i915_page_directory *pd,
> +				     uint64_t start,
> +				     uint64_t length)
> +{
> +	gen8_pde_t * const page_directory = kmap_px(pd);
> +	struct i915_page_table *pt;
> +	uint64_t temp, pde;
> +
> +	gen8_for_each_pde(pt, pd, start, length, temp, pde)
> +		page_directory[pde] = gen8_pde_encode(px_dma(pt),
> +						      I915_CACHE_LLC);
> +
> +	kunmap_px(ppgtt, page_directory);
> +}
> +
>   static int __pdp_init(struct drm_device *dev,
>   		      struct i915_page_directory_pointer *pdp)
>   {
> @@ -616,6 +635,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>   {
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
>   	gen8_pte_t *pt_vaddr, scratch_pte;
>   	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
>   	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> @@ -630,10 +650,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>   		struct i915_page_directory *pd;
>   		struct i915_page_table *pt;
>
> -		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
> +		if (WARN_ON(!pdp->page_directory[pdpe]))
>   			break;
>
> -		pd = ppgtt->pdp.page_directory[pdpe];
> +		pd = pdp->page_directory[pdpe];
>
>   		if (WARN_ON(!pd->page_table[pde]))
>   			break;
> @@ -671,6 +691,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   {
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
>   	gen8_pte_t *pt_vaddr;
>   	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
>   	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> @@ -681,7 +702,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>
>   	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>   		if (pt_vaddr == NULL) {
> -			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
> +			struct i915_page_directory *pd = pdp->page_directory[pdpe];
>   			struct i915_page_table *pt = pd->page_table[pde];
>   			pt_vaddr = kmap_px(pt);
>   		}
> @@ -763,23 +784,28 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>   		container_of(vm, struct i915_hw_ppgtt, base);
>   	int i;
>
> -	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> -				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> -		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> -			continue;
> +	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> +		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> +				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> +			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> +				continue;
>
> -		gen8_free_page_tables(ppgtt->base.dev,
> -				      ppgtt->pdp.page_directory[i]);
> -		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
> +			gen8_free_page_tables(ppgtt->base.dev,
> +					      ppgtt->pdp.page_directory[i]);
> +			free_pd(ppgtt->base.dev,
> +				ppgtt->pdp.page_directory[i]);
> +		}
> +		free_pdp(ppgtt->base.dev, &ppgtt->pdp);
> +	} else {
> +		WARN_ON(1); /* to be implemented later */
>   	}
>
> -	free_pdp(ppgtt->base.dev, &ppgtt->pdp);
>   	gen8_free_scratch(vm);
>   }
>
>   /**
>    * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
> - * @ppgtt:	Master ppgtt structure.
> + * @vm:		Master vm structure.
>    * @pd:		Page directory for this address range.
>    * @start:	Starting virtual address to begin allocations.
>    * @length	Size of the allocations.
> @@ -795,13 +821,15 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>    *
>    * Return: 0 if success; negative error code otherwise.
>    */
> -static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> +static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
>   				     struct i915_page_directory *pd,
>   				     uint64_t start,
>   				     uint64_t length,
>   				     unsigned long *new_pts)
>   {
> -	struct drm_device *dev = ppgtt->base.dev;
> +	struct i915_hw_ppgtt *ppgtt =
> +	    container_of(vm, struct i915_hw_ppgtt, base);
> +	struct drm_device *dev = vm->dev;

The 'ppgtt' pointer can be dropped entirely here; the 'vm' pointer is
sufficient.

>   	struct i915_page_table *pt;
>   	uint64_t temp;
>   	uint32_t pde;
> @@ -818,7 +846,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>   		if (IS_ERR(pt))
>   			goto unwind_out;
>
> -		gen8_initialize_pt(&ppgtt->base, pt);
> +		gen8_initialize_pt(vm, pt);
>   		pd->page_table[pde] = pt;
>   		__set_bit(pde, new_pts);
>   	}
> @@ -834,7 +862,7 @@ unwind_out:
>
>   /**
>    * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
> - * @ppgtt:	Master ppgtt structure.
> + * @vm:		Master vm structure.
>    * @pdp:	Page directory pointer for this address range.
>    * @start:	Starting virtual address to begin allocations.
>    * @length	Size of the allocations.
> @@ -855,17 +883,18 @@ unwind_out:
>    *
>    * Return: 0 if success; negative error code otherwise.
>    */
> -static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> -				     struct i915_page_directory_pointer *pdp,
> -				     uint64_t start,
> -				     uint64_t length,
> -				     unsigned long *new_pds)
> +static int
> +gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
> +				  struct i915_page_directory_pointer *pdp,
> +				  uint64_t start,
> +				  uint64_t length,
> +				  unsigned long *new_pds)
>   {
> -	struct drm_device *dev = ppgtt->base.dev;
> +	struct drm_device *dev = vm->dev;
>   	struct i915_page_directory *pd;
>   	uint64_t temp;
>   	uint32_t pdpe;
> -	uint32_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
> +	uint32_t pdpes =  I915_PDPES_PER_PDP(vm->dev);
>
>   	WARN_ON(!bitmap_empty(new_pds, pdpes));
>
> @@ -877,7 +906,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>   		if (IS_ERR(pd))
>   			goto unwind_out;
>
> -		gen8_initialize_pd(&ppgtt->base, pd);
> +		gen8_initialize_pd(vm, pd);
>   		pdp->page_directory[pdpe] = pd;
>   		__set_bit(pdpe, new_pds);
>   	}
> @@ -952,13 +981,15 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
>   	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
>   }
>
> -static int gen8_alloc_va_range(struct i915_address_space *vm,
> -			       uint64_t start,
> -			       uint64_t length)
> +static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
> +				    struct i915_page_directory_pointer *pdp,
> +				    uint64_t start,
> +				    uint64_t length)
>   {
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
>   	unsigned long *new_page_dirs, **new_page_tables;
> +	struct drm_device *dev = vm->dev;
>   	struct i915_page_directory *pd;
>   	const uint64_t orig_start = start;
>   	const uint64_t orig_length = length;
> @@ -981,16 +1012,15 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   		return ret;
>
>   	/* Do the allocations first so we can easily bail out */
> -	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
> -					new_page_dirs);
> +	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
> +						new_page_dirs);
>   	if (ret) {
>   		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
>   		return ret;
>   	}
>
> -	/* For every page directory referenced, allocate page tables */
> -	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
> +	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> +		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
>   						new_page_tables[pdpe]);
>   		if (ret)
>   			goto err_out;
> @@ -999,10 +1029,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	start = orig_start;
>   	length = orig_length;
>
> -	/* Allocations have completed successfully, so set the bitmaps, and do
> -	 * the mappings. */
> -	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -		gen8_pde_t *const page_directory = kmap_px(pd);
> +	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
>   		struct i915_page_table *pt;
>   		uint64_t pd_len = length;
>   		uint64_t pd_start = start;
> @@ -1024,18 +1051,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>
>   			/* Our pde is now pointing to the pagetable, pt */
>   			__set_bit(pde, pd->used_pdes);
> -
> -			/* Map the PDE to the page table */
> -			page_directory[pde] = gen8_pde_encode(px_dma(pt),
> -							      I915_CACHE_LLC);
> -
> -			/* NB: We haven't yet mapped ptes to pages. At this
> -			 * point we're still relying on insert_entries() */
>   		}
>
> -		kunmap_px(ppgtt, page_directory);
> -
> -		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
> +		__set_bit(pdpe, pdp->used_pdpes);
> +		gen8_map_pagetable_range(ppgtt, pd, start, length);

There is no apparent benefit in using "gen8_map_pagetable_range" here.
Previously the Page Directory page was mapped only once, at the start of
the outer loop (PDPEs), and the inner loop (PDEs) wrote the PDEs for the
page tables inline.
'gen8_map_pagetable_range' just repeats the same inner loop over the PDEs.

>   	}
>
>   	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> @@ -1045,17 +1064,38 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   err_out:
>   	while (pdpe--) {
>   		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
> -			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
> +			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
>   	}
>
>   	for_each_set_bit(pdpe, new_page_dirs, pdpes)
> -		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
> +		free_pd(dev, pdp->page_directory[pdpe]);
>
>   	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
>   	mark_tlbs_dirty(ppgtt);
>   	return ret;
>   }
>
> +static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
> +				    struct i915_pml4 *pml4,
> +				    uint64_t start,
> +				    uint64_t length)
> +{
> +	WARN_ON(1); /* to be implemented later */
> +	return 0;
> +}
> +
> +static int gen8_alloc_va_range(struct i915_address_space *vm,
> +			       uint64_t start, uint64_t length)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(vm, struct i915_hw_ppgtt, base);
> +
> +	if (!USES_FULL_48BIT_PPGTT(vm->dev))
> +		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
> +	else
> +		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
> +}
> +
>   /*
>    * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
>    * with a net effect resembling a 2-level page table in normal x86 terms. Each
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl
  2015-07-01 15:27   ` [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
@ 2015-07-07 12:48     ` Goel, Akash
  2015-07-07 13:40       ` Michel Thierry
  0 siblings, 1 reply; 74+ messages in thread
From: Goel, Akash @ 2015-07-07 12:48 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx



On 7/1/2015 8:57 PM, Michel Thierry wrote:
> PML4 has no special attributes, and there will always be a PML4.
> So simply initialize it at creation, and destroy it at the end.
>
> The code for 4lvl is able to call into the existing 3lvl page table code
> to handle all of the lower levels.
>
> v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
> compiler happy. And define ret only in one place.
> Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
> v3: Use i915_dma_unmap_single instead of pci API. Fix a
> couple of incorrect checks when unmapping pdp and pd pages (Akash).
> v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
> v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
> v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
> paths. (Akash)
> v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
> v8: Change location of pml4_init/fini. It will make next patches
> cleaner.
> v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
> trying to reuse as much as possible for pdp alloc. pml4_init/fini
> replaced by setup/cleanup_px macros.
> v10: Rebase after Mika's merged ppgtt cleanup patch series.
> v11: Rebase after final merged version of Mika's ppgtt/scratch patches.
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 162 ++++++++++++++++++++++++++++++------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
>   2 files changed, 146 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 1327e41..d23b0a8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -584,12 +584,44 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
>   	pdp->page_directory = NULL;
>   }
>
> +static struct
> +i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
> +{
> +	struct i915_page_directory_pointer *pdp;
> +	int ret = -ENOMEM;
> +
> +	WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
> +
> +	pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
> +	if (!pdp)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = __pdp_init(dev, pdp);
> +	if (ret)
> +		goto fail_bitmap;
> +
> +	ret = setup_px(dev, pdp);
> +	if (ret)
> +		goto fail_page_m;
> +
> +	return pdp;
> +
> +fail_page_m:
> +	__pdp_fini(pdp);
> +fail_bitmap:
> +	kfree(pdp);
> +
> +	return ERR_PTR(ret);
> +}
> +
>   static void free_pdp(struct drm_device *dev,
>   		     struct i915_page_directory_pointer *pdp)
>   {
>   	__pdp_fini(pdp);
> -	if (USES_FULL_48BIT_PPGTT(dev))
> +	if (USES_FULL_48BIT_PPGTT(dev)) {
> +		cleanup_px(dev, pdp);
>   		kfree(pdp);
> +	}
>   }
>
>   /* Broadwell Page Directory Pointer Descriptors */
> @@ -783,28 +815,46 @@ static void gen8_free_scratch(struct i915_address_space *vm)
>   	free_scratch_page(dev, vm->scratch_page);
>   }
>
> -static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> +static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
> +				    struct i915_page_directory_pointer *pdp)
>   {
> -	struct i915_hw_ppgtt *ppgtt =
> -		container_of(vm, struct i915_hw_ppgtt, base);
>   	int i;
>
> -	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> -		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> -				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> -			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> -				continue;
> +	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
> +		if (WARN_ON(!pdp->page_directory[i]))
> +			continue;
>
> -			gen8_free_page_tables(ppgtt->base.dev,
> -					      ppgtt->pdp.page_directory[i]);
> -			free_pd(ppgtt->base.dev,
> -				ppgtt->pdp.page_directory[i]);
> -		}
> -		free_pdp(ppgtt->base.dev, &ppgtt->pdp);
> -	} else {
> -		WARN_ON(1); /* to be implemented later */
> +		gen8_free_page_tables(dev, pdp->page_directory[i]);
> +		free_pd(dev, pdp->page_directory[i]);
>   	}
>
> +	free_pdp(dev, pdp);
> +}
> +
> +static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
> +{
> +	int i;
> +
> +	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
> +		if (WARN_ON(!ppgtt->pml4.pdps[i]))
> +			continue;
> +
> +		gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
> +	}
> +
> +	cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
> +}
> +
> +static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(vm, struct i915_hw_ppgtt, base);
> +
> +	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
> +		gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
> +	else
> +		gen8_ppgtt_cleanup_4lvl(ppgtt);
> +
>   	gen8_free_scratch(vm);
>   }
>
> @@ -1087,8 +1137,62 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
>   				    uint64_t start,
>   				    uint64_t length)
>   {
> -	WARN_ON(1); /* to be implemented later */
> +	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_page_directory_pointer *pdp;
> +	const uint64_t orig_start = start;
> +	const uint64_t orig_length = length;
> +	uint64_t temp, pml4e;
> +	int ret = 0;
> +
> +	/* Do the pml4 allocations first, so we don't need to track the newly
> +	 * allocated tables below the pdp */
> +	bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
> +
> +	/* The pagedirectory and pagetable allocations are done in the shared 3
> +	 * and 4 level code. Just allocate the pdps.
> +	 */
> +	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
> +		if (!pdp) {
> +			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
> +			pdp = alloc_pdp(vm->dev);
> +			if (IS_ERR(pdp))
> +				goto err_out;
> +
> +			pml4->pdps[pml4e] = pdp;
> +			__set_bit(pml4e, new_pdps);
> +			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
> +						   pml4e << GEN8_PML4E_SHIFT,

Should the 'start' variable be used here in place of
'pml4e << GEN8_PML4E_SHIFT'?

> +						   GEN8_PML4E_SHIFT);
> +		}
> +	}
> +
> +	WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
> +	     "The allocation has spanned more than 512GB. "
> +	     "It is highly likely this is incorrect.");
> +
> +	start = orig_start;
> +	length = orig_length;
> +
> +	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
> +		WARN_ON(!pdp);
> +
> +		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
> +		if (ret)
> +			goto err_out;
> +	}
> +
> +	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
> +		  GEN8_PML4ES_PER_PML4);
> +
>   	return 0;
> +
> +err_out:
> +	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
> +		gen8_ppgtt_cleanup_3lvl(vm->dev, pml4->pdps[pml4e]);
> +
> +	return ret;
>   }
>
>   static int gen8_alloc_va_range(struct i915_address_space *vm,
> @@ -1097,10 +1201,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
>
> -	if (!USES_FULL_48BIT_PPGTT(vm->dev))
> -		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
> -	else
> +	if (USES_FULL_48BIT_PPGTT(vm->dev))
>   		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
> +	else
> +		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
>   }
>
>   /*
> @@ -1128,9 +1232,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
>   	ppgtt->switch_mm = gen8_mm_switch;
>
> -	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> -		ret = __pdp_init(false, &ppgtt->pdp);
> +	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> +		ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
> +		if (ret)
> +			goto free_scratch;
>
> +		ppgtt->base.total = 1ULL << 48;
> +	} else {
> +		ret = __pdp_init(false, &ppgtt->pdp);
>   		if (ret)
>   			goto free_scratch;
>
> @@ -1142,9 +1251,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>   			 * 2GiB).
>   			 */
>   			ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> -	} else {
> -		ppgtt->base.total = 1ULL << 48;
> -		return -EPERM; /* Not yet implemented */
> +
> +		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
> +							      0, 0,
> +							      GEN8_PML4E_SHIFT);
>   	}
>
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index e2b684e..c8ac0b5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -95,6 +95,7 @@ typedef uint64_t gen8_pde_t;
>    */
>   #define GEN8_PML4ES_PER_PML4		512
>   #define GEN8_PML4E_SHIFT		39
> +#define GEN8_PML4E_MASK			(GEN8_PML4ES_PER_PML4 - 1)
>   #define GEN8_PDPE_SHIFT			30
>   /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
>    * tables */
> @@ -464,6 +465,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
>   	     temp = min(temp, length),					\
>   	     start += temp, length -= temp)
>
> +#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter)	\
> +	for (iter = gen8_pml4e_index(start);	\
> +	     pdp = (pml4)->pdps[iter], length > 0 && iter < GEN8_PML4ES_PER_PML4;	\
> +	     iter++,				\
> +	     temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start,	\
> +	     temp = min(temp, length),					\
> +	     start += temp, length -= temp)
> +
>   #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
>   	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
>
> @@ -484,8 +493,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
>
>   static inline uint32_t gen8_pml4e_index(uint64_t address)
>   {
> -	WARN_ON(1); /* For 64B */
> -	return 0;
> +	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
>   }
>
>   static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
>
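
On the tracepoint question above, a small userspace sketch may help show when 'start' and 'pml4e << GEN8_PML4E_SHIFT' differ. The constants and gen8_pml4e_index() are copied from the quoted patch; pml4e_base() is a hypothetical helper added only for illustration. Since the loop increment aligns 'start' to the next 512GB boundary, the two values can differ only on the first iteration, when the requested range begins inside a slot.

```c
#include <assert.h>
#include <stdint.h>

#define GEN8_PML4ES_PER_PML4	512
#define GEN8_PML4E_SHIFT	39
#define GEN8_PML4E_MASK		(GEN8_PML4ES_PER_PML4 - 1)

/* As in the patch: which PML4 entry covers this address */
static inline uint32_t gen8_pml4e_index(uint64_t address)
{
	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
}

/* Hypothetical helper: base address of the 512GB slot containing 'address',
 * i.e. the value the tracepoint receives as 'pml4e << GEN8_PML4E_SHIFT'. */
static inline uint64_t pml4e_base(uint64_t address)
{
	assert(address < (1ULL << 48));	/* 48b address space */
	return (uint64_t)gen8_pml4e_index(address) << GEN8_PML4E_SHIFT;
}
```

For a range starting 4KB into the second slot, the index is 1 and the slot base is 1ULL << 39, which is 4KB below 'start'.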

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  2015-07-01 15:27   ` [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-07-07 12:51     ` Goel, Akash
  2015-07-07 13:42       ` Michel Thierry
  0 siblings, 1 reply; 74+ messages in thread
From: Goel, Akash @ 2015-07-07 12:51 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx



On 7/1/2015 8:57 PM, Michel Thierry wrote:
> When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
> Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
> it will write to.
>
> Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
>
> This patch was inspired by Ben's "Depend exclusively on map and
> unmap_vma".
>
> v2: Rebase after s/page_tables/page_table/.
> v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
> clamp_pdp in gen8_ppgtt_insert_entries (Akash).
> v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
> maintain symmetry with gen8_ppgtt_insert_entries (Akash).
> v5: Do not mix pages and bytes in insert_entries (Akash).
> v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
> once.
> v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> Use gen8_px_index functions, and remove unnecessary number of pages
> parameter in insert_pte_entries.
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 51 ++++++++++++++++++++++++++++---------
>   drivers/gpu/drm/i915/i915_gem_gtt.h | 11 ++++++++
>   2 files changed, 50 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 67d02b9..d16fbce 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -712,9 +712,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
>   	gen8_pte_t *pt_vaddr;
> -	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> -	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> -	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> +	unsigned pdpe = gen8_pdpe_index(start);
> +	unsigned pde = gen8_pde_index(start);
> +	unsigned pte = gen8_pte_index(start);
>   	unsigned num_entries = length >> PAGE_SHIFT;
>   	unsigned last_pte, i;
>
> @@ -763,12 +763,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>   {
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
> -	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> -
>   	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>   						 I915_CACHE_LLC, use_scratch);
>
> -	gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
> +	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> +		gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
> +					   scratch_pte);
> +	} else {
> +		uint64_t templ4, pml4e;
> +		struct i915_page_directory_pointer *pdp;
> +
> +		gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> +			uint64_t pdp_len = gen8_clamp_pdp(start, length);
> +			uint64_t pdp_start = start;
> +
> +			gen8_ppgtt_clear_pte_range(vm, pdp, pdp_start, pdp_len,
> +						   scratch_pte);
> +		}
> +	}
>   }
>
>   static void
> @@ -781,9 +793,9 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
>   	gen8_pte_t *pt_vaddr;
> -	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> -	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> -	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> +	unsigned pdpe = gen8_pdpe_index(start);
> +	unsigned pde = gen8_pde_index(start);
> +	unsigned pte = gen8_pte_index(start);
>
>   	pt_vaddr = NULL;
>
> @@ -801,7 +813,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
>   			kunmap_px(ppgtt, pt_vaddr);
>   			pt_vaddr = NULL;
>   			if (++pde == I915_PDES) {
> -				pdpe++;
> +				if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> +					break;

Can the same pdpe check (for the Page Directory Pointer boundary) also be
added to the gen8_ppgtt_clear_pte_range function, to make it consistent
with gen8_ppgtt_insert_pte_entries? That would also obviate the need for
the gen8_clamp_pdp helper.

>   				pde = 0;
>   			}
>   			pte = 0;
> @@ -820,11 +833,25 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   {
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
> -	struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
>   	struct sg_page_iter sg_iter;
>
>   	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
> -	gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
> +
> +	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> +		gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
> +					      cache_level);
> +	} else {
> +		struct i915_page_directory_pointer *pdp;
> +		uint64_t templ4, pml4e;
> +		uint64_t length = (uint64_t)sg_nents(pages->sgl) << PAGE_SHIFT;
> +
> +		gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> +			uint64_t pdp_start = start;
> +

Isn't 'pdp_start' dispensable here? 'start' can be used directly.

> +			gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
> +						      pdp_start, cache_level);
> +		}
> +	}
>   }
>
>   static void gen8_free_page_tables(struct drm_device *dev,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index fb939fb..fd61325 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -478,6 +478,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
>   #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
>   	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
>
> +/* Clamp length to the next page_directory pointer boundary */
> +static inline uint64_t gen8_clamp_pdp(uint64_t start, uint64_t length)
> +{
> +	uint64_t next_pdp = ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT);
> +
> +	if (next_pdp > (start + length))
> +		return length;
> +
> +	return next_pdp - start;
> +}
> +
>   static inline uint32_t gen8_pte_index(uint64_t address)
>   {
>   	return i915_pte_index(address, GEN8_PDE_SHIFT);
>
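
To make the role of gen8_clamp_pdp concrete, here is a userspace sketch of the same arithmetic. It assumes one PDP covers 1ULL << GEN8_PML4E_SHIFT bytes (512GB), as in the quoted hunk; clamp_pdp() and ALIGN_UP() are stand-in names rather than the kernel's own. The helper returns how much of a range falls inside the PDP that contains 'start'.

```c
#include <assert.h>
#include <stdint.h>

#define GEN8_PML4E_SHIFT 39	/* one PDP spans 512GB */
/* Stand-in for the kernel's ALIGN(); 'a' must be a power of two */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Clamp length to the next page directory pointer boundary */
static uint64_t clamp_pdp(uint64_t start, uint64_t length)
{
	uint64_t next_pdp = ALIGN_UP(start + 1, 1ULL << GEN8_PML4E_SHIFT);

	assert(next_pdp > start);	/* start + 1 always rounds up past start */
	if (next_pdp > start + length)
		return length;		/* range ends inside this PDP */

	return next_pdp - start;	/* range continues into the next PDP */
}
```

A range that begins 4KB below a 512GB boundary and is 8KB long is clamped to 4KB; the remainder belongs to the next PDP.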

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b
  2015-07-01 15:27   ` [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-07-07 12:53     ` Goel, Akash
  2015-07-07 13:50       ` Michel Thierry
  0 siblings, 1 reply; 74+ messages in thread
From: Goel, Akash @ 2015-07-07 12:53 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx



On 7/1/2015 8:57 PM, Michel Thierry wrote:
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h       |  4 ++--
>   drivers/gpu/drm/i915/i915_gpu_error.c | 17 +++++++++--------
>   2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7bccfd5..d245c82 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -546,7 +546,7 @@ struct drm_i915_error_state {
>
>   		struct drm_i915_error_object {
>   			int page_count;
> -			u32 gtt_offset;
> +			u64 gtt_offset;
>   			u32 *pages[0];
>   		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
>
> @@ -572,7 +572,7 @@ struct drm_i915_error_state {
>   		u32 size;
>   		u32 name;
>   		u32 rseqno[I915_NUM_RINGS], wseqno;
> -		u32 gtt_offset;
> +		u64 gtt_offset;
>   		u32 read_domains;
>   		u32 write_domain;
>   		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 6f42569..cdbd4c2 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -197,7 +197,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
>   	err_printf(m, "  %s [%d]:\n", name, count);
>
>   	while (count--) {
> -		err_printf(m, "    %08x %8u %02x %02x [ ",
> +		err_printf(m, "    %016llx %8u %02x %02x [ ",
>   			   err->gtt_offset,
>   			   err->size,
>   			   err->read_domains,
> @@ -426,7 +426,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   				err_printf(m, " (submitted by %s [%d])",
>   					   error->ring[i].comm,
>   					   error->ring[i].pid);
> -			err_printf(m, " --- gtt_offset = 0x%08x\n",
> +			err_printf(m, " --- gtt_offset = 0x%016llx\n",
>   				   obj->gtt_offset);
>   			print_error_obj(m, obj);
>   		}
> @@ -434,7 +434,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   		obj = error->ring[i].wa_batchbuffer;
>   		if (obj) {
>   			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
> -				   dev_priv->ring[i].name, obj->gtt_offset);
> +				   dev_priv->ring[i].name,
> +				   lower_32_bits(obj->gtt_offset));
>   			print_error_obj(m, obj);
>   		}
>
> @@ -453,14 +454,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   		if ((obj = error->ring[i].ringbuffer)) {
>   			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
>   				   dev_priv->ring[i].name,
> -				   obj->gtt_offset);
> +				   lower_32_bits(obj->gtt_offset));
>   			print_error_obj(m, obj);
>   		}
>
>   		if ((obj = error->ring[i].hws_page)) {
>   			err_printf(m, "%s --- HW Status = 0x%08x\n",
>   				   dev_priv->ring[i].name,
> -				   obj->gtt_offset);
> +				   lower_32_bits(obj->gtt_offset));
>   			offset = 0;
>   			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
>   				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
> @@ -476,13 +477,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   		if ((obj = error->ring[i].ctx)) {
>   			err_printf(m, "%s --- HW Context = 0x%08x\n",
>   				   dev_priv->ring[i].name,
> -				   obj->gtt_offset);
> +				   lower_32_bits(obj->gtt_offset));
>   			print_error_obj(m, obj);
>   		}
>   	}
>
>   	if ((obj = error->semaphore_obj)) {
> -		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
> +		err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);

Can 'lower_32_bits' be used for the semaphore object as well? It's
mapped into the GGTT during ring init, so it should not have an offset > 4GB.

>   		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
>   			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
>   				   elt * 4,
> @@ -590,7 +591,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
>   	int num_pages;
>   	bool use_ggtt;
>   	int i = 0;
> -	u32 reloc_offset;
> +	u64 reloc_offset;
>
>   	if (src == NULL || src->pages == NULL)
>   		return NULL;
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump
  2015-07-01 15:27   ` [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
@ 2015-07-07 12:56     ` Goel, Akash
  2015-07-07 13:51       ` Michel Thierry
  0 siblings, 1 reply; 74+ messages in thread
From: Goel, Akash @ 2015-07-07 12:56 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx



On 7/1/2015 8:57 PM, Michel Thierry wrote:
> v2: Clean up patch after rebases.
> v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
> v4: Use used_pml4es/pdpes (Akash).
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 92 +++++++++++++++++++++++++++++++++++++
>   2 files changed, 102 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ad9a737..8c3dcc9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2223,7 +2223,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_engine_cs *ring;
> -	struct drm_file *file;
>   	int i;
>
>   	if (INTEL_INFO(dev)->gen == 6)
> @@ -2246,13 +2245,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   		ppgtt->debug_dump(ppgtt, m);
>   	}
>
> -	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> -		struct drm_i915_file_private *file_priv = file->driver_priv;
> -
> -		seq_printf(m, "proc: %s\n",
> -			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
> -		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
> -	}
>   	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
>   }
>
> @@ -2261,6 +2253,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
>   	struct drm_info_node *node = m->private;
>   	struct drm_device *dev = node->minor->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_file *file;
>
>   	int ret = mutex_lock_interruptible(&dev->struct_mutex);
>   	if (ret)
> @@ -2272,6 +2265,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
>   	else if (INTEL_INFO(dev)->gen >= 6)
>   		gen6_ppgtt_info(m, dev);
>
> +	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> +		struct drm_i915_file_private *file_priv = file->driver_priv;
> +
> +		seq_printf(m, "\nproc: %s\n",
> +			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
> +		idr_for_each(&file_priv->context_idr, per_file_ctx,
> +			     (void *)(unsigned long)m);
> +	}
> +
>   	intel_runtime_pm_put(dev_priv);
>   	mutex_unlock(&dev->struct_mutex);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index c6fc0d3..0c41e5d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1337,6 +1337,97 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
>   }
>
> +static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
> +			  uint64_t start, uint64_t length,
> +			  gen8_pte_t scratch_pte,
> +			  struct seq_file *m)
> +{
> +	struct i915_page_directory *pd;
> +	uint64_t temp;
> +	uint32_t pdpe;
> +
> +	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> +		struct i915_page_table *pt;
> +		uint64_t pd_len = length;
> +		uint64_t pd_start = start;
> +		uint32_t pde;
> +
> +		if (!pd)
> +			continue;
> +
> +		if(!test_bit(pdpe, pdp->used_pdpes))
> +			continue;
> +
> +		seq_printf(m, "\tPDPE #%d\n", pdpe);
> +		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
> +			uint32_t  pte;
> +			gen8_pte_t *pt_vaddr;
> +
> +			if (!pt)
> +				continue;
> +
> +			pt_vaddr = kmap_px(pt);
> +			for (pte = 0; pte < GEN8_PTES; pte+=4) {
> +				uint64_t va =
> +					(pdpe << GEN8_PDPE_SHIFT) |
> +					(pde << GEN8_PDE_SHIFT) |
> +					(pte << GEN8_PTE_SHIFT);
> +				int i;
> +				bool found = false;
> +				for (i = 0; i < 4; i++)
> +					if (pt_vaddr[pte + i] != scratch_pte)
> +						found = true;
> +				if (!found)
> +					continue;
> +
> +				seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
> +				for (i = 0; i < 4; i++) {
> +					if (pt_vaddr[pte + i] != scratch_pte)
> +						seq_printf(m, " %llx", pt_vaddr[pte + i]);
> +					else
> +						seq_puts(m, "  SCRATCH ");
> +				}
> +				seq_puts(m, "\n");
> +			}
> +			/* don't use kunmap_px, it could trigger
> +			 * an unnecessary flush.
> +			 */
> +			kunmap_atomic(pt_vaddr);
> +		}
> +	}
> +}
> +
> +static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
> +{
> +	struct i915_address_space *vm = &ppgtt->base;
> +	uint64_t start = ppgtt->base.start;
> +	uint64_t length = ppgtt->base.total;
> +	gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> +						 I915_CACHE_LLC, true);
> +
> +	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> +		gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
> +	} else {
> +		uint64_t templ4, pml4e;
> +		struct i915_pml4 *pml4 = &ppgtt->pml4;
> +		struct i915_page_directory_pointer *pdp;
> +
> +		gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
> +			uint64_t pdp_len = length;
> +			uint64_t pdp_start = start;
> +
> +			if (!pdp)
> +				continue;
> +
I think the "if (!test_bit(pml4e, pml4->used_pml4es))" check is
foolproof & should suffice.
There is no real need for the extra 'if (!pdp)' check.
The same applies to the pdpe & pde loops in the gen8_dump_pdp function.

> +			if (!test_bit(pml4e, pml4->used_pml4es))
> +				continue;
> +
> +			seq_printf(m, "    PML4E #%llu\n", pml4e);
> +			gen8_dump_pdp(pdp, pdp_start, pdp_len, scratch_pte, m);
> +		}
> +	}
> +}
> +
>   /*
>    * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
>    * with a net effect resembling a 2-level page table in normal x86 terms. Each
> @@ -1359,6 +1450,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>   	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
>   	ppgtt->base.unbind_vma = ppgtt_unbind_vma;
>   	ppgtt->base.bind_vma = ppgtt_bind_vma;
> +	ppgtt->debug_dump = gen8_dump_ppgtt;
>
>   	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
>   		ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic
  2015-07-07 12:36     ` Goel, Akash
@ 2015-07-07 12:56       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-07 12:56 UTC (permalink / raw)
  To: Goel, Akash, intel-gfx

On 7/7/2015 1:36 PM, Goel, Akash wrote:
>
>
> On 7/1/2015 8:57 PM, Michel Thierry wrote:
>> This transitional patch doesn't do much for the existing code. However,
>> it should make upcoming patches to use the full 48b address space a bit
>> easier. The patch also introduces the PML4, ie. the new top level
>> structure
>> of the page tables.
>>
>
> Would be better to move the introduction of PML4 to a separate patch &
> keep this patch only for the dynamic allocation of pdp changes.
>
I'll move the PML4 declaration to a subsequent patch.

Thanks


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage
  2015-07-07 12:43     ` Goel, Akash
@ 2015-07-07 13:35       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-07 13:35 UTC (permalink / raw)
  To: Goel, Akash, intel-gfx

On 7/7/2015 1:43 PM, Goel, Akash wrote:
>
>
> On 7/1/2015 8:57 PM, Michel Thierry wrote:
>> @@ -795,13 +821,15 @@ static void gen8_ppgtt_cleanup(struct
>> i915_address_space *vm)
>>    *
>>    * Return: 0 if success; negative error code otherwise.
>>    */
>> -static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>> +static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
>>                        struct i915_page_directory *pd,
>>                        uint64_t start,
>>                        uint64_t length,
>>                        unsigned long *new_pts)
>>   {
>> -    struct drm_device *dev = ppgtt->base.dev;
>> +    struct i915_hw_ppgtt *ppgtt =
>> +        container_of(vm, struct i915_hw_ppgtt, base);
>> +    struct drm_device *dev = vm->dev;
>
> The 'ppgtt' pointer can be completely dispensed with, by the use of 'vm'
> pointer.
>
Leftovers from old rebases, I'll clean these functions up.

>>       struct i915_page_table *pt;
>>       uint64_t temp;
>>       uint32_t pde;
>> @@ -1024,18 +1051,10 @@ static int gen8_alloc_va_range(struct
>> i915_address_space *vm,
>>
>>               /* Our pde is now pointing to the pagetable, pt */
>>               __set_bit(pde, pd->used_pdes);
>> -
>> -            /* Map the PDE to the page table */
>> -            page_directory[pde] = gen8_pde_encode(px_dma(pt),
>> -                                  I915_CACHE_LLC);
>> -
>> -            /* NB: We haven't yet mapped ptes to pages. At this
>> -             * point we're still relying on insert_entries() */
>>           }
>>
>> -        kunmap_px(ppgtt, page_directory);
>> -
>> -        __set_bit(pdpe, ppgtt->pdp.used_pdpes);
>> +        __set_bit(pdpe, pdp->used_pdpes);
>> +        gen8_map_pagetable_range(ppgtt, pd, start, length);
>
> No apparent benefit in use of "gen8_map_pagetable_range", considering
> that the Page Directory page is mapped at the start of outer loop
> (PDPEs) only and the inner loop (PDEs) maps the PDEs to the page tables,
> in an inline manner.
> The 'gen8_map_pagetable_range', will repeat the same inner loop of PDEs.
>

And I'll discard these changes.

-Michel
>>       }
>>
>>       free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl
  2015-07-07 12:48     ` Goel, Akash
@ 2015-07-07 13:40       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-07 13:40 UTC (permalink / raw)
  To: Goel, Akash, intel-gfx

On 7/7/2015 1:48 PM, Goel, Akash wrote:
>
>
> On 7/1/2015 8:57 PM, Michel Thierry wrote:
>> @@ -1087,8 +1137,62 @@ static int gen8_alloc_va_range_4lvl(struct
>> i915_address_space *vm,
>>                       uint64_t start,
>>                       uint64_t length)
>>   {
>> -    WARN_ON(1); /* to be implemented later */
>> +    DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
>> +    struct i915_hw_ppgtt *ppgtt =
>> +        container_of(vm, struct i915_hw_ppgtt, base);
>> +    struct i915_page_directory_pointer *pdp;
>> +    const uint64_t orig_start = start;
>> +    const uint64_t orig_length = length;
>> +    uint64_t temp, pml4e;
>> +    int ret = 0;
>> +
>> +    /* Do the pml4 allocations first, so we don't need to track the
>> newly
>> +     * allocated tables below the pdp */
>> +    bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
>> +
>> +    /* The pagedirectory and pagetable allocations are done in the
>> shared 3
>> +     * and 4 level code. Just allocate the pdps.
>> +     */
>> +    gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
>> +        if (!pdp) {
>> +            WARN_ON(test_bit(pml4e, pml4->used_pml4es));
>> +            pdp = alloc_pdp(vm->dev);
>> +            if (IS_ERR(pdp))
>> +                goto err_out;
>> +
>> +            pml4->pdps[pml4e] = pdp;
>> +            __set_bit(pml4e, new_pdps);
>> +
>> trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
>> +                           pml4e << GEN8_PML4E_SHIFT,
> Shouldn't the ‘start’ variable be used here in place of ‘pml4e <<
> GEN8_PML4E_SHIFT’?

Correct, should be ‘start’.
Thanks
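
To illustrate why the two values are not interchangeable, here is a minimal user-space sketch. The `sketch_*` names and the constants (a shift of 39 and a 9-bit index) are assumptions made for this illustration, not taken from the patch. The entry base `pml4e << shift` is always aligned to the start of the PML4 entry, while `start` may point anywhere inside it, so tracing the base can report an address the caller never asked for.

```c
#include <stdint.h>

/* Assumed values for illustration only: each PML4 entry covers 2^39 bytes. */
#define SKETCH_PML4E_SHIFT 39
#define SKETCH_PML4E_MASK  0x1ffu

/* Index of the PML4 entry that covers a given address. */
static inline uint32_t sketch_pml4e_index(uint64_t addr)
{
	return (uint32_t)((addr >> SKETCH_PML4E_SHIFT) & SKETCH_PML4E_MASK);
}

/* Aligned base address of a PML4 entry. */
static inline uint64_t sketch_pml4e_base(uint32_t pml4e)
{
	/* Widen before shifting so the result is not truncated to 32 bits. */
	return (uint64_t)pml4e << SKETCH_PML4E_SHIFT;
}
```

With a `start` of `(1ULL << 39) + 0x200000`, the index is 1 and the entry base is `1ULL << 39`, which differs from `start` by the 2 MiB offset into the entry.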

>> +                           GEN8_PML4E_SHIFT);
>> +        }
>> +    }
>> +


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  2015-07-07 12:51     ` Goel, Akash
@ 2015-07-07 13:42       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-07 13:42 UTC (permalink / raw)
  To: Goel, Akash, intel-gfx

On 7/7/2015 1:51 PM, Goel, Akash wrote:
> On 7/1/2015 8:57 PM, Michel Thierry wrote:
>>   static void
>> @@ -781,9 +793,9 @@ gen8_ppgtt_insert_pte_entries(struct
>> i915_address_space *vm,
>>       struct i915_hw_ppgtt *ppgtt =
>>           container_of(vm, struct i915_hw_ppgtt, base);
>>       gen8_pte_t *pt_vaddr;
>> -    unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
>> -    unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
>> -    unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>> +    unsigned pdpe = gen8_pdpe_index(start);
>> +    unsigned pde = gen8_pde_index(start);
>> +    unsigned pte = gen8_pte_index(start);
>>
>>       pt_vaddr = NULL;
>>
>> @@ -801,7 +813,8 @@ gen8_ppgtt_insert_pte_entries(struct
>> i915_address_space *vm,
>>               kunmap_px(ppgtt, pt_vaddr);
>>               pt_vaddr = NULL;
>>               if (++pde == I915_PDES) {
>> -                pdpe++;
>> +                if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
>> +                    break;
>
> Can the same pdpe check (for the page directory pointer boundary) be added
> to the gen8_ppgtt_clear_pte_range function as well? That would make it
> consistent with gen8_ppgtt_insert_pte_entries and would also obviate the
> need for the gen8_clamp_pdp macro.
>
I will change gen8_ppgtt_clear_pte_range to stop at the PDP boundary as you
suggest (and clamp_pdp will go away).
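
The boundary rule discussed here can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct sketch_cursor`, `sketch_advance()` and the 512-entry constants are hypothetical names and values, not the driver's. The idea is that the walk carries from pte to pde to pdpe, and reports when the next step would leave the current PDP so that the caller can stop instead of indexing past it.

```c
#include <stdbool.h>

/* Assumed entry counts per level, for illustration only. */
#define SKETCH_PTES 512
#define SKETCH_PDES 512

struct sketch_cursor {
	unsigned pdpe;
	unsigned pde;
	unsigned pte;
};

/*
 * Advance the cursor by one PTE.
 * Returns false when the step crosses the end of the PDP, in which case the
 * caller should stop walking; returns true otherwise.
 */
static bool sketch_advance(struct sketch_cursor *c, unsigned pdpes_per_pdp)
{
	if (++c->pte < SKETCH_PTES)
		return true;
	c->pte = 0;

	if (++c->pde < SKETCH_PDES)
		return true;
	c->pde = 0;

	if (++c->pdpe == pdpes_per_pdp)
		return false; /* PDP boundary reached */

	return true;
}
```

Applying the same rule to both the insert and the clear paths means neither needs a separate clamp on the range length.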

>>                   pde = 0;
>>               }
>>               pte = 0;
>> @@ -820,11 +833,25 @@ static void gen8_ppgtt_insert_entries(struct
>> i915_address_space *vm,
>>   {
>>       struct i915_hw_ppgtt *ppgtt =
>>           container_of(vm, struct i915_hw_ppgtt, base);
>> -    struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME:
>> 48b */
>>       struct sg_page_iter sg_iter;
>>
>>       __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl),
>> 0);
>> -    gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start,
>> cache_level);
>> +
>> +    if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
>> +        gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
>> +                          cache_level);
>> +    } else {
>> +        struct i915_page_directory_pointer *pdp;
>> +        uint64_t templ4, pml4e;
>> +        uint64_t length = (uint64_t)sg_nents(pages->sgl) << PAGE_SHIFT;
>> +
>> +        gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4,
>> pml4e) {
>> +            uint64_t pdp_start = start;
>> +
>
> Isn't the 'pdp_start' dispensable here ? ‘start’ can be used directly.
>
Yes, and the same applies in gen8_ppgtt_clear_range: pdp_len and
pdp_start are redundant there.

>> +            gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
>> +                              pdp_start, cache_level);
>> +        }
>> +    }
>>   }
>>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b
  2015-07-07 12:53     ` Goel, Akash
@ 2015-07-07 13:50       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-07 13:50 UTC (permalink / raw)
  To: Goel, Akash, intel-gfx

On 7/7/2015 1:53 PM, Goel, Akash wrote:
> On 7/1/2015 8:57 PM, Michel Thierry wrote:
>> @@ -476,13 +477,13 @@ int i915_error_state_to_str(struct
>> drm_i915_error_state_buf *m,
>>           if ((obj = error->ring[i].ctx)) {
>>               err_printf(m, "%s --- HW Context = 0x%08x\n",
>>                      dev_priv->ring[i].name,
>> -                   obj->gtt_offset);
>> +                   lower_32_bits(obj->gtt_offset));
>>               print_error_obj(m, obj);
>>           }
>>       }
>>
>>       if ((obj = error->semaphore_obj)) {
>> -        err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
>> +        err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
>
> Can 'lower_32_bits' be used for the semaphore object as well? It's
> mapped into the GGTT during ring init, so it should not have an offset > 4GB.
>
Makes sense, will change to:
	if ((obj = error->semaphore_obj)) {
		err_printf(m, "Semaphore page = 0x%08x\n",
			   lower_32_bits(obj->gtt_offset));
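
As a minimal user-space sketch of what the explicit narrowing does: `sketch_lower_32_bits()` and `sketch_format_offset()` are hypothetical stand-ins written for this illustration (the kernel provides its own `lower_32_bits()` helper). Once `gtt_offset` is 64 bits wide, it either has to be printed with a 64-bit conversion or be narrowed explicitly before it is handed to a 32-bit one.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel helper: keep only bits 31:0. */
static inline uint32_t sketch_lower_32_bits(uint64_t n)
{
	return (uint32_t)(n & 0xffffffffu);
}

/*
 * Format an offset as eight hex digits after narrowing it explicitly.
 * Returns the number of characters snprintf() would have written.
 */
static int sketch_format_offset(char *buf, size_t len, uint64_t gtt_offset)
{
	return snprintf(buf, len, "0x%08" PRIx32,
			sketch_lower_32_bits(gtt_offset));
}
```

Any bits above bit 31 are silently dropped, which is acceptable only for objects that are known to live below 4GB, such as the GGTT-mapped ones discussed here.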

>>           for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
>>               err_printf(m, "[%04x] %08x %08x %08x %08x\n",
>>                      elt * 4,
>> @@ -590,7 +591,7 @@ i915_error_object_create(struct drm_i915_private
>> *dev_priv,
>>       int num_pages;
>>       bool use_ggtt;
>>       int i = 0;
>> -    u32 reloc_offset;
>> +    u64 reloc_offset;
>>
>>       if (src == NULL || src->pages == NULL)
>>           return NULL;
>>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump
  2015-07-07 12:56     ` Goel, Akash
@ 2015-07-07 13:51       ` Michel Thierry
  0 siblings, 0 replies; 74+ messages in thread
From: Michel Thierry @ 2015-07-07 13:51 UTC (permalink / raw)
  To: Goel, Akash, intel-gfx

On 7/7/2015 1:56 PM, Goel, Akash wrote:
> On 7/1/2015 8:57 PM, Michel Thierry wrote:
>> v2: Clean up patch after rebases.
>> v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
>> v4: Use used_pml4es/pdpes (Akash).
>> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
>>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
>> ---
>>   drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 92
>> +++++++++++++++++++++++++++++++++++++
>>   2 files changed, 102 insertions(+), 8 deletions(-)
>> +static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct
>> seq_file *m)
>> +{
>> +    struct i915_address_space *vm = &ppgtt->base;
>> +    uint64_t start = ppgtt->base.start;
>> +    uint64_t length = ppgtt->base.total;
>> +    gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>> +                         I915_CACHE_LLC, true);
>> +
>> +    if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
>> +        gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
>> +    } else {
>> +        uint64_t templ4, pml4e;
>> +        struct i915_pml4 *pml4 = &ppgtt->pml4;
>> +        struct i915_page_directory_pointer *pdp;
>> +
>> +        gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
>> +            uint64_t pdp_len = length;
>> +            uint64_t pdp_start = start;
>> +
>> +            if (!pdp)
>> +                continue;
>> +
> I think the "if (!test_bit(pml4e, pml4->used_pml4es))" check is
> foolproof & should suffice.
> There is no real need for the extra 'if (!pdp)' check.
> The same applies to the pdpe & pde loops in the gen8_dump_pdp function.
>
Right, I'll change it to use test_bit across the board and also remove the
unnecessary pdp_len/pdp_start variables:
-        if (!pd)
-            continue;
-
          if(!test_bit(pdpe, pdp->used_pdpes))
              continue;
-----------
-            if (!pt)
+            if(!test_bit(pde, pd->used_pdes))
                  continue;
-----------
          gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
-            uint64_t pdp_len = length;
-            uint64_t pdp_start = start;
-
-            if (!pdp)
-                continue;
-
              if (!test_bit(pml4e, pml4->used_pml4es))
                  continue;

              seq_printf(m, "    PML4E #%llu\n", pml4e);
-            gen8_dump_pdp(pdp, pdp_start, pdp_len, scratch_pte, m);
+            gen8_dump_pdp(pdp, start, length, scratch_pte, m);
-----------
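
A small user-space sketch of why the bitmap is the safer key for the dump loops. All names below are hypothetical, and it assumes that unused slots may hold a non-NULL placeholder (for example a shared scratch structure), in which case a NULL test on the pointer would never skip anything while the used bitmap still would.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ENTRIES 512 /* assumed entries per level, for illustration */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
	((SKETCH_ENTRIES + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

struct sketch_level {
	unsigned long used[SKETCH_WORDS]; /* one bit per populated entry */
	void *entry[SKETCH_ENTRIES];      /* may hold scratch placeholders */
};

static bool sketch_test_bit(size_t nr, const unsigned long *map)
{
	return (map[nr / SKETCH_WORD_BITS] >> (nr % SKETCH_WORD_BITS)) & 1UL;
}

static void sketch_set_bit(size_t nr, unsigned long *map)
{
	map[nr / SKETCH_WORD_BITS] |= 1UL << (nr % SKETCH_WORD_BITS);
}

/* Count the entries a dump would visit when keyed only on the bitmap. */
static size_t sketch_count_used(const struct sketch_level *lvl)
{
	size_t i, n = 0;

	for (i = 0; i < SKETCH_ENTRIES; i++) {
		if (!sketch_test_bit(i, lvl->used))
			continue; /* skip, even if entry[i] is non-NULL */
		n++;
	}
	return n;
}
```

If every slot points at a placeholder and only two bits are set, the walk visits exactly two entries.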

>> +            if (!test_bit(pml4e, pml4->used_pml4es))
>> +                continue;
>> +
>> +            seq_printf(m, "    PML4E #%llu\n", pml4e);
>> +            gen8_dump_pdp(pdp, pdp_start, pdp_len, scratch_pte, m);
>> +        }
>> +    }
>> +}
>> +
>>   /*
>>    * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP
>> registers
>>    * with a net effect resembling a 2-level page table in normal x86
>> terms. Each
>> @@ -1359,6 +1450,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt
>> *ppgtt)
>>       ppgtt->base.clear_range = gen8_ppgtt_clear_range;
>>       ppgtt->base.unbind_vma = ppgtt_unbind_vma;
>>       ppgtt->base.bind_vma = ppgtt_bind_vma;
>> +    ppgtt->debug_dump = gen8_dump_ppgtt;
>>
>>       if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
>>           ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
>>

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2015-07-07 13:51 UTC | newest]

Thread overview: 74+ messages
2015-06-10 16:46 [PATCH v2 00/18] 48-bit PPGTT Michel Thierry
2015-06-10 16:46 ` [PATCH v2 01/18] drm/i915/lrc: Update PDPx registers with lri commands Michel Thierry
2015-06-11 18:04   ` Mika Kuoppala
2015-06-22  9:18     ` Michel Thierry
2015-06-26 12:46   ` [PATCH v3] " Michel Thierry
2015-06-26 14:45     ` Mika Kuoppala
2015-06-10 16:46 ` [PATCH v2 02/18] drm/i915/gtt: Switch gen8_free_page_tables params Michel Thierry
2015-06-11 18:05   ` Mika Kuoppala
2015-06-26 16:38     ` Daniel Vetter
2015-06-10 16:46 ` [PATCH v2 03/18] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
2015-06-10 16:46 ` [PATCH v2 04/18] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
2015-06-10 16:46 ` [PATCH v2 05/18] drm/i915/gen8: Abstract PDP usage Michel Thierry
2015-06-10 16:46 ` [PATCH v2 06/18] drm/i915/gen8: Add dynamic page trace events Michel Thierry
2015-06-10 16:46 ` [PATCH v2 07/18] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
2015-06-10 16:46 ` [PATCH v2 08/18] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
2015-06-10 16:46 ` [PATCH v2 09/18] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-06-10 16:46 ` [PATCH v2 10/18] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
2015-06-10 16:46 ` [PATCH v2 11/18] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-06-10 16:46 ` [PATCH v2 12/18] drm/i915/gen8: Initialize PDPs Michel Thierry
2015-06-10 16:46 ` [PATCH v2 13/18] drm/i915: Expand error state's address width to 64b Michel Thierry
2015-06-10 16:46 ` [PATCH v2 14/18] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
2015-06-10 16:46 ` [PATCH v2 15/18] drm/i915: object size needs to be u64 Michel Thierry
2015-06-10 16:46 ` [PATCH v2 16/18] drm/i915: Check against correct user_size limit in 48b ppgtt mode Michel Thierry
2015-06-10 17:57   ` Chris Wilson
2015-06-10 16:46 ` [PATCH v2 17/18] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
2015-06-10 18:09   ` Chris Wilson
2015-06-17 12:49     ` Daniel Vetter
2015-06-17 12:53       ` Chris Wilson
2015-06-17 15:03         ` Daniel Vetter
2015-06-17 17:37           ` Chris Wilson
2015-06-18  6:45             ` Daniel Vetter
2015-06-18  7:03               ` Chris Wilson
2015-06-18  7:11                 ` Daniel Vetter
2015-06-18  7:34                   ` Chris Wilson
2015-06-23 12:21   ` [PATCH v3] " Michel Thierry
2015-06-23 13:22     ` Chris Wilson
2015-06-10 16:46 ` [PATCH v2 18/18] drm/i915/gen8: Flip the 48b switch Michel Thierry
2015-06-10 16:46 ` [PATCH v2] tests/gem_ppgtt: Check Wa32bitOffsets workarounds Michel Thierry
2015-07-01 15:27 ` [PATCH v3 00/17] 48-bit PPGTT Michel Thierry
2015-07-01 15:27   ` [PATCH v3 01/17] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
2015-07-01 15:27   ` [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
2015-07-07 12:36     ` Goel, Akash
2015-07-07 12:56       ` Michel Thierry
2015-07-01 15:27   ` [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage Michel Thierry
2015-07-07 12:43     ` Goel, Akash
2015-07-07 13:35       ` Michel Thierry
2015-07-01 15:27   ` [PATCH v3 04/17] drm/i915/gen8: Add dynamic page trace events Michel Thierry
2015-07-01 15:27   ` [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
2015-07-07 12:48     ` Goel, Akash
2015-07-07 13:40       ` Michel Thierry
2015-07-01 15:27   ` [PATCH v3 06/17] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
2015-07-01 15:27   ` [PATCH v3 07/17] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-07-01 15:27   ` [PATCH v3 08/17] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
2015-07-01 15:27   ` [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-07-07 12:51     ` Goel, Akash
2015-07-07 13:42       ` Michel Thierry
2015-07-01 15:27   ` [PATCH v3 10/17] drm/i915/gen8: Initialize PDPs Michel Thierry
2015-07-01 15:27   ` [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b Michel Thierry
2015-07-07 12:53     ` Goel, Akash
2015-07-07 13:50       ` Michel Thierry
2015-07-01 15:27   ` [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
2015-07-07 12:56     ` Goel, Akash
2015-07-07 13:51       ` Michel Thierry
2015-07-01 15:27   ` [PATCH v3 13/17] drm/i915: object size needs to be u64 Michel Thierry
2015-07-01 15:27   ` [PATCH v3 14/17] drm/i915: batch_obj vm offset must " Michel Thierry
2015-07-01 16:07     ` John Harrison
2015-07-01 15:27   ` [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check Michel Thierry
2015-07-01 15:31     ` Chris Wilson
2015-07-01 15:27   ` [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
2015-07-01 15:43     ` Chris Wilson
2015-07-01 15:54       ` Michel Thierry
2015-07-01 16:02     ` [PATCH v5] " Michel Thierry
2015-07-01 15:27   ` [PATCH v3 17/17] drm/i915/gen8: Flip the 48b switch Michel Thierry
2015-07-01 15:38   ` [PATCH v3 00/17] 48-bit PPGTT Daniel Vetter
