* [PATCH 00/21] ppgtt cleanups / scratch merge (V2)
@ 2015-05-22 17:04 Mika Kuoppala
  2015-05-22 17:04 ` [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+ Mika Kuoppala
                   ` (20 more replies)
  0 siblings, 21 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Hi,

I have replaced patch 2 from the v1 series with a version
that preallocates the top level pdp structure for 32bit addressing
on hardware that has problems with pdp tlb flushes.

All issues raised with v1 should be addressed by this series.
Ville also noticed that copying the scratch structures causes an
unnecessary read, as we can fill them directly. This change, among
others, triggered a lot of rebasing, hence the new series.

I also included one patch that makes our bitops nonatomic.
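
For illustration, a minimal sketch of what that change looks like (not
a hunk from the series; pde and pd->used_pdes stand in for the real
call sites):

	/* Atomic: locked read-modify-write, safe against concurrent
	 * updates of the same bitmap word. */
	set_bit(pde, pd->used_pdes);

	/* Nonatomic: plain read-modify-write. Cheaper, and sufficient
	 * as long as the bitmap is only touched under a higher-level
	 * lock, which is the assumption that patch relies on. */
	__set_bit(pde, pd->used_pdes);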

Thanks,
-Mika

Mika Kuoppala (21):
  drm/i915/gtt: Mark TLBS dirty for gen8+
  drm/i915/gtt: Workaround for HW preload not flushing pdps
  drm/i915/gtt: Check va range against vm size
  drm/i915/gtt: Allow >= 4GB sizes for vm.
  drm/i915/gtt: Don't leak scratch page on mapping error
  drm/i915/gtt: Remove _single from page table allocator
  drm/i915/gtt: Introduce i915_page_dir_dma_addr
  drm/i915/gtt: Introduce struct i915_page_dma
  drm/i915/gtt: Rename unmap_and_free_px to free_px
  drm/i915/gtt: Remove superfluous free_pd with gen6/7
  drm/i915/gtt: Introduce fill_page_dma()
  drm/i915/gtt: Introduce kmap|kunmap for dma page
  drm/i915/gtt: Use macros to access dma mapped pages
  drm/i915/gtt: Make scratch page i915_page_dma compatible
  drm/i915/gtt: Fill scratch page
  drm/i915/gtt: Pin vma during virtual address allocation
  drm/i915/gtt: Cleanup page directory encoding
  drm/i915/gtt: Move scratch_pd and scratch_pt into vm area
  drm/i915/gtt: One instance of scratch page table/directory
  drm/i915/gtt: Use nonatomic bitmap ops
  drm/i915/gtt: Reorder page alloc/free/init functions

 drivers/char/agp/intel-gtt.c        |   4 +-
 drivers/gpu/drm/i915/i915_debugfs.c |  44 +--
 drivers/gpu/drm/i915/i915_gem.c     |   6 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 709 +++++++++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  55 ++-
 drivers/gpu/drm/i915/i915_reg.h     |  17 +
 drivers/gpu/drm/i915/intel_lrc.c    |  19 +-
 include/drm/intel-gtt.h             |   4 +-
 8 files changed, 499 insertions(+), 359 deletions(-)

-- 
1.9.1


* [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
@ 2015-05-22 17:04 ` Mika Kuoppala
  2015-06-01 14:51   ` Joonas Lahtinen
  2015-06-01 15:52   ` Michel Thierry
  2015-05-22 17:04 ` [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps Mika Kuoppala
                   ` (19 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

When we touch gen8+ page maps, mark them dirty like we
do with previous gens.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 17b7df0..0ffd459 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -830,6 +830,15 @@ err_out:
 	return -ENOMEM;
 }
 
+/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
+{
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
+}
+
 static int gen8_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start,
 			       uint64_t length)
@@ -915,6 +924,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	mark_tlbs_dirty(ppgtt);
 	return 0;
 
 err_out:
@@ -927,6 +937,7 @@ err_out:
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
 
@@ -1260,16 +1271,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
- * are switching between contexts with the same LRCA, we also must do a force
- * restore.
- */
-static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
-{
-	/* If current vm != vm, */
-	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
-}
-
 static void gen6_initialize_pt(struct i915_address_space *vm,
 		struct i915_page_table *pt)
 {
-- 
1.9.1


* [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
  2015-05-22 17:04 ` [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+ Mika Kuoppala
@ 2015-05-22 17:04 ` Mika Kuoppala
  2015-05-29 11:05   ` Michel Thierry
  2015-05-22 17:04 ` [PATCH 03/21] drm/i915/gtt: Check va range against vm size Mika Kuoppala
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

With BDW/SKL and 32bit addressing mode only, the hardware preloads
the pdps. However, the TLB invalidation only affects the levels below
the pdps. This means that if the pdps change, hw might access memory
through a stale pdp entry.

To combat this problem, preallocate the top pdps so that hw sees
them as immutable for each context.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 50 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
 3 files changed, 68 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0ffd459..1a5ad4c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -941,6 +941,48 @@ err_out:
 	return ret;
 }
 
+/* With some architectures and 32bit legacy mode, hardware pre-loads the
+ * top level pdps but the tlb invalidation only invalidates the lower levels.
+ * This might lead to hw fetching with stale pdp entries if top level
+ * structure changes, ie va space grows with dynamic page tables.
+ */
+static bool hw_wont_flush_pdp_tlbs(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+
+	if (GEN8_CTX_ADDRESSING_MODE != LEGACY_32B_CONTEXT)
+		return false;
+
+	if (IS_BROADWELL(dev) || IS_SKYLAKE(dev))
+		return true;
+
+	return false;
+}
+
+static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
+{
+	unsigned long *new_page_dirs, **new_page_tables;
+	int ret;
+
+	/* We allocate temp bitmap for page tables for no gain
+	 * but as this is for init only, lets keep the things simple
+	 */
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	if (ret)
+		return ret;
+
+	/* Allocate for all pdps regardless of how the ppgtt
+	 * was defined.
+	 */
+	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp,
+						0, 1ULL << 32,
+						new_page_dirs);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+
+	return ret;
+}
+
 /*
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -972,6 +1014,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
+	if (hw_wont_flush_pdp_tlbs(ppgtt)) {
+		/* Avoid the tlb flush bug by preallocating
+		 * whole top level pdp structure so it stays
+		 * static even if our va space grows.
+		 */
+		return gen8_preallocate_top_level_pdps(ppgtt);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6eeba63..334324b 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2777,6 +2777,23 @@ enum skl_disp_power_wells {
 #define VLV_CLK_CTL2			0x101104
 #define   CLK_CTL2_CZCOUNT_30NS_SHIFT	28
 
+/* Context descriptor format bits */
+#define GEN8_CTX_VALID			(1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE	(1<<1)
+#define GEN8_CTX_FORCE_RESTORE		(1<<2)
+#define GEN8_CTX_L3LLC_COHERENT		(1<<5)
+#define GEN8_CTX_PRIVILEGE		(1<<8)
+
+enum {
+	ADVANCED_CONTEXT = 0,
+	LEGACY_32B_CONTEXT,
+	ADVANCED_AD_CONTEXT,
+	LEGACY_64B_CONTEXT
+};
+
+#define GEN8_CTX_ADDRESSING_MODE_SHIFT	3
+#define GEN8_CTX_ADDRESSING_MODE	LEGACY_32B_CONTEXT
+
 /*
  * Overlay regs
  */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 96ae90a..d793d4e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -183,12 +183,6 @@
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
 
-#define GEN8_CTX_VALID (1<<0)
-#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
-#define GEN8_CTX_FORCE_RESTORE (1<<2)
-#define GEN8_CTX_L3LLC_COHERENT (1<<5)
-#define GEN8_CTX_PRIVILEGE (1<<8)
-
 #define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
 	const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
 		ppgtt->pdp.page_directory[n]->daddr : \
@@ -198,13 +192,6 @@
 }
 
 enum {
-	ADVANCED_CONTEXT = 0,
-	LEGACY_CONTEXT,
-	ADVANCED_AD_CONTEXT,
-	LEGACY_64B_CONTEXT
-};
-#define GEN8_CTX_MODE_SHIFT 3
-enum {
 	FAULT_AND_HANG = 0,
 	FAULT_AND_HALT, /* Debug only */
 	FAULT_AND_STREAM,
@@ -273,7 +260,7 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	desc |= GEN8_CTX_ADDRESSING_MODE << GEN8_CTX_ADDRESSING_MODE_SHIFT;
 	if (IS_GEN8(ctx_obj->base.dev))
 		desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
-- 
1.9.1


* [PATCH 03/21] drm/i915/gtt: Check va range against vm size
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
  2015-05-22 17:04 ` [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+ Mika Kuoppala
  2015-05-22 17:04 ` [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps Mika Kuoppala
@ 2015-05-22 17:04 ` Mika Kuoppala
  2015-06-01 15:33   ` Joonas Lahtinen
  2015-05-22 17:04 ` [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm Mika Kuoppala
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Check the allocation area against the known end
of the address space instead of against a fixed value.

v2: Return ENODEV on internal bugs (Chris)
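
As a worked example of the existing wraparound check (illustrative
values only):

	u64 start = ~0ULL - 0xfff;	/* 0xfffffffffffff000 */
	u64 length = 0x2000;

	/* start + length wraps around to 0xfff, which is less than
	 * start, so WARN_ON(start + length < start) catches the
	 * overflow before the new vm size comparison is reached. */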

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1a5ad4c..76de781 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -756,9 +756,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 	WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
 
-	/* FIXME: upper bound must not overflow 32 bits  */
-	WARN_ON((start + length) > (1ULL << 32));
-
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		if (pd)
 			continue;
@@ -857,7 +854,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	 * actually use the other side of the canonical address space.
 	 */
 	if (WARN_ON(start + length < start))
-		return -ERANGE;
+		return -ENODEV;
+
+	if (WARN_ON(start + length > ppgtt->base.total))
+		return -ENODEV;
 
 	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
@@ -1341,7 +1341,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 }
 
 static int gen6_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start, uint64_t length)
+			       uint64_t start_in, uint64_t length_in)
 {
 	DECLARE_BITMAP(new_page_tables, I915_PDES);
 	struct drm_device *dev = vm->dev;
@@ -1349,11 +1349,15 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_table *pt;
-	const uint32_t start_save = start, length_save = length;
+	uint32_t start, length, start_save, length_save;
 	uint32_t pde, temp;
 	int ret;
 
-	WARN_ON(upper_32_bits(start));
+	if (WARN_ON(start_in + length_in > ppgtt->base.total))
+		return -ENODEV;
+
+	start = start_save = start_in;
+	length = length_save = length_in;
 
 	bitmap_zero(new_page_tables, I915_PDES);
 
-- 
1.9.1


* [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm.
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (2 preceding siblings ...)
  2015-05-22 17:04 ` [PATCH 03/21] drm/i915/gtt: Check va range against vm size Mika Kuoppala
@ 2015-05-22 17:04 ` Mika Kuoppala
  2015-05-26  7:15   ` Daniel Vetter
  2015-05-22 17:04 ` [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error Mika Kuoppala
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

With a 32bit system we can have a ppgtt that is exactly 4GB in size;
size_t is inadequate to represent this.
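
A minimal sketch of the problem, assuming a 32bit kernel where size_t
is 32 bits wide:

	size_t total32 = 1ULL << 32;	/* truncates to 0 */
	u64    total64 = 1ULL << 32;	/* 4294967296, as intended */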

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/char/agp/intel-gtt.c        |  4 ++--
 drivers/gpu/drm/i915/i915_debugfs.c | 42 ++++++++++++++++++-------------------
 drivers/gpu/drm/i915/i915_gem.c     |  6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 12 +++++------
 include/drm/intel-gtt.h             |  4 ++--
 6 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 0b4188b..4734d02 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -1408,8 +1408,8 @@ int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
 }
 EXPORT_SYMBOL(intel_gmch_probe);
 
-void intel_gtt_get(size_t *gtt_total, size_t *stolen_size,
-		   phys_addr_t *mappable_base, unsigned long *mappable_end)
+void intel_gtt_get(u64 *gtt_total, size_t *stolen_size,
+		   phys_addr_t *mappable_base, u64 *mappable_end)
 {
 	*gtt_total = intel_private.gtt_total_entries << PAGE_SHIFT;
 	*stolen_size = intel_private.stolen_size;
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fece922..c7a840b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -198,7 +198,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_address_space *vm = &dev_priv->gtt.base;
 	struct i915_vma *vma;
-	size_t total_obj_size, total_gtt_size;
+	u64 total_obj_size, total_gtt_size;
 	int count, ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -231,7 +231,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
+	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
 		   count, total_obj_size, total_gtt_size);
 	return 0;
 }
@@ -253,7 +253,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	size_t total_obj_size, total_gtt_size;
+	u64 total_obj_size, total_gtt_size;
 	LIST_HEAD(stolen);
 	int count, ret;
 
@@ -292,7 +292,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
+	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
 		   count, total_obj_size, total_gtt_size);
 	return 0;
 }
@@ -310,10 +310,10 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 
 struct file_stats {
 	struct drm_i915_file_private *file_priv;
-	int count;
-	size_t total, unbound;
-	size_t global, shared;
-	size_t active, inactive;
+	unsigned long count;
+	u64 total, unbound;
+	u64 global, shared;
+	u64 active, inactive;
 };
 
 static int per_file_stats(int id, void *ptr, void *data)
@@ -370,7 +370,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 
 #define print_file_stats(m, name, stats) do { \
 	if (stats.count) \
-		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
+		seq_printf(m, "%s: %lu objects, %llu bytes (%llu active, %llu inactive, %llu global, %llu shared, %llu unbound)\n", \
 			   name, \
 			   stats.count, \
 			   stats.total, \
@@ -420,7 +420,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 count, mappable_count, purgeable_count;
-	size_t size, mappable_size, purgeable_size;
+	u64 size, mappable_size, purgeable_size;
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm = &dev_priv->gtt.base;
 	struct drm_file *file;
@@ -437,17 +437,17 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 
 	size = count = mappable_size = mappable_count = 0;
 	count_objects(&dev_priv->mm.bound_list, global_list);
-	seq_printf(m, "%u [%u] objects, %zu [%zu] bytes in gtt\n",
+	seq_printf(m, "%u [%u] objects, %llu [%llu] bytes in gtt\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = mappable_size = mappable_count = 0;
 	count_vmas(&vm->active_list, mm_list);
-	seq_printf(m, "  %u [%u] active objects, %zu [%zu] bytes\n",
+	seq_printf(m, "  %u [%u] active objects, %llu [%llu] bytes\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = mappable_size = mappable_count = 0;
 	count_vmas(&vm->inactive_list, mm_list);
-	seq_printf(m, "  %u [%u] inactive objects, %zu [%zu] bytes\n",
+	seq_printf(m, "  %u [%u] inactive objects, %llu [%llu] bytes\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = purgeable_size = purgeable_count = 0;
@@ -456,7 +456,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		if (obj->madv == I915_MADV_DONTNEED)
 			purgeable_size += obj->base.size, ++purgeable_count;
 	}
-	seq_printf(m, "%u unbound objects, %zu bytes\n", count, size);
+	seq_printf(m, "%u unbound objects, %llu bytes\n", count, size);
 
 	size = count = mappable_size = mappable_count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
@@ -473,16 +473,16 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 			++purgeable_count;
 		}
 	}
-	seq_printf(m, "%u purgeable objects, %zu bytes\n",
+	seq_printf(m, "%u purgeable objects, %llu bytes\n",
 		   purgeable_count, purgeable_size);
-	seq_printf(m, "%u pinned mappable objects, %zu bytes\n",
+	seq_printf(m, "%u pinned mappable objects, %llu bytes\n",
 		   mappable_count, mappable_size);
-	seq_printf(m, "%u fault mappable objects, %zu bytes\n",
+	seq_printf(m, "%u fault mappable objects, %llu bytes\n",
 		   count, size);
 
-	seq_printf(m, "%zu [%lu] gtt total\n",
+	seq_printf(m, "%llu [%llu] gtt total\n",
 		   dev_priv->gtt.base.total,
-		   dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
+		   (u64)dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
 
 	seq_putc(m, '\n');
 	print_batch_pool_stats(m, dev_priv);
@@ -519,7 +519,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 	uintptr_t list = (uintptr_t) node->info_ent->data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	size_t total_obj_size, total_gtt_size;
+	u64 total_obj_size, total_gtt_size;
 	int count, ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -541,7 +541,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
+	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
 		   count, total_obj_size, total_gtt_size);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cc206f1..25e375c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3671,9 +3671,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 size, fence_size, fence_alignment, unfenced_alignment;
-	unsigned long start =
+	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
-	unsigned long end =
+	u64 end =
 		flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
 	struct i915_vma *vma;
 	int ret;
@@ -3729,7 +3729,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	 * attempt to find space.
 	 */
 	if (size > end) {
-		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%lu\n",
+		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
 			  ggtt_view ? ggtt_view->type : 0,
 			  size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 76de781..c61de4a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2147,7 +2147,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 void i915_gem_init_global_gtt(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long gtt_size, mappable_size;
+	u64 gtt_size, mappable_size;
 
 	gtt_size = dev_priv->gtt.base.total;
 	mappable_size = dev_priv->gtt.mappable_end;
@@ -2402,13 +2402,13 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
 }
 
 static int gen8_gmch_probe(struct drm_device *dev,
-			   size_t *gtt_total,
+			   u64 *gtt_total,
 			   size_t *stolen,
 			   phys_addr_t *mappable_base,
-			   unsigned long *mappable_end)
+			   u64 *mappable_end)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned int gtt_size;
+	u64 gtt_size;
 	u16 snb_gmch_ctl;
 	int ret;
 
@@ -2450,10 +2450,10 @@ static int gen8_gmch_probe(struct drm_device *dev,
 }
 
 static int gen6_gmch_probe(struct drm_device *dev,
-			   size_t *gtt_total,
+			   u64 *gtt_total,
 			   size_t *stolen,
 			   phys_addr_t *mappable_base,
-			   unsigned long *mappable_end)
+			   u64 *mappable_end)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned int gtt_size;
@@ -2467,7 +2467,7 @@ static int gen6_gmch_probe(struct drm_device *dev,
 	 * a coarse sanity check.
 	 */
 	if ((*mappable_end < (64<<20) || (*mappable_end > (512<<20)))) {
-		DRM_ERROR("Unknown GMADR size (%lx)\n",
+		DRM_ERROR("Unknown GMADR size (%llx)\n",
 			  dev_priv->gtt.mappable_end);
 		return -ENXIO;
 	}
@@ -2501,10 +2501,10 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 }
 
 static int i915_gmch_probe(struct drm_device *dev,
-			   size_t *gtt_total,
+			   u64 *gtt_total,
 			   size_t *stolen,
 			   phys_addr_t *mappable_base,
-			   unsigned long *mappable_end)
+			   u64 *mappable_end)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
@@ -2569,9 +2569,9 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	gtt->base.dev = dev;
 
 	/* GMADR is the PCI mmio aperture into the global GTT. */
-	DRM_INFO("Memory usable by graphics device = %zdM\n",
+	DRM_INFO("Memory usable by graphics device = %lluM\n",
 		 gtt->base.total >> 20);
-	DRM_DEBUG_DRIVER("GMADR size = %ldM\n", gtt->mappable_end >> 20);
+	DRM_DEBUG_DRIVER("GMADR size = %lldM\n", gtt->mappable_end >> 20);
 	DRM_DEBUG_DRIVER("GTT stolen size = %zdM\n", gtt->stolen_size >> 20);
 #ifdef CONFIG_INTEL_IOMMU
 	if (intel_iommu_gfx_mapped)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0d46dd2..c343161 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -233,8 +233,8 @@ struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
 	struct list_head global_link;
-	unsigned long start;		/* Start offset always 0 for dri2 */
-	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
+	u64 start;		/* Start offset always 0 for dri2 */
+	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
 	struct {
 		dma_addr_t addr;
@@ -300,9 +300,9 @@ struct i915_address_space {
  */
 struct i915_gtt {
 	struct i915_address_space base;
-	size_t stolen_size;		/* Total size of stolen memory */
 
-	unsigned long mappable_end;	/* End offset that we can CPU map */
+	size_t stolen_size;		/* Total size of stolen memory */
+	u64 mappable_end;		/* End offset that we can CPU map */
 	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
 	phys_addr_t mappable_base;	/* PA of our GMADR */
 
@@ -314,9 +314,9 @@ struct i915_gtt {
 	int mtrr;
 
 	/* global gtt ops */
-	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
+	int (*gtt_probe)(struct drm_device *dev, u64 *gtt_total,
 			  size_t *stolen, phys_addr_t *mappable_base,
-			  unsigned long *mappable_end);
+			  u64 *mappable_end);
 };
 
 struct i915_hw_ppgtt {
diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
index b08bdad..9e9bddaa5 100644
--- a/include/drm/intel-gtt.h
+++ b/include/drm/intel-gtt.h
@@ -3,8 +3,8 @@
 #ifndef _DRM_INTEL_GTT_H
 #define	_DRM_INTEL_GTT_H
 
-void intel_gtt_get(size_t *gtt_total, size_t *stolen_size,
-		   phys_addr_t *mappable_base, unsigned long *mappable_end);
+void intel_gtt_get(u64 *gtt_total, size_t *stolen_size,
+		   phys_addr_t *mappable_base, u64 *mappable_end);
 
 int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
 		     struct agp_bridge_data *bridge);
-- 
1.9.1


* [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (3 preceding siblings ...)
  2015-05-22 17:04 ` [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm Mika Kuoppala
@ 2015-05-22 17:04 ` Mika Kuoppala
  2015-06-01 15:02   ` Joonas Lahtinen
  2015-05-22 17:04 ` [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator Mika Kuoppala
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Free the scratch page if dma mapping fails.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c61de4a..a608b1b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2191,8 +2191,10 @@ static int setup_scratch_page(struct drm_device *dev)
 #ifdef CONFIG_INTEL_IOMMU
 	dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
 				PCI_DMA_BIDIRECTIONAL);
-	if (pci_dma_mapping_error(dev->pdev, dma_addr))
+	if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
+		__free_page(page);
 		return -EINVAL;
+	}
 #else
 	dma_addr = page_to_phys(page);
 #endif
-- 
1.9.1


* [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (4 preceding siblings ...)
  2015-05-22 17:04 ` [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error Mika Kuoppala
@ 2015-05-22 17:04 ` Mika Kuoppala
  2015-06-02  9:53   ` Joonas Lahtinen
  2015-06-02  9:56   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 07/21] drm/i915/gtt: Introduce i915_page_dir_dma_addr Mika Kuoppala
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:04 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

We always allocate a single page, so there is no need to be
verbose; remove the suffix.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a608b1b..4cf47f9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -369,7 +369,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 	kunmap_atomic(pt_vaddr);
 }
 
-static struct i915_page_table *alloc_pt_single(struct drm_device *dev)
+static struct i915_page_table *alloc_pt(struct drm_device *dev)
 {
 	struct i915_page_table *pt;
 	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
@@ -417,7 +417,7 @@ static void unmap_and_free_pd(struct i915_page_directory *pd,
 	}
 }
 
-static struct i915_page_directory *alloc_pd_single(struct drm_device *dev)
+static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 {
 	struct i915_page_directory *pd;
 	int ret = -ENOMEM;
@@ -702,7 +702,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 			continue;
 		}
 
-		pt = alloc_pt_single(dev);
+		pt = alloc_pt(dev);
 		if (IS_ERR(pt))
 			goto unwind_out;
 
@@ -760,7 +760,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 		if (pd)
 			continue;
 
-		pd = alloc_pd_single(dev);
+		pd = alloc_pd(dev);
 		if (IS_ERR(pd))
 			goto unwind_out;
 
@@ -992,11 +992,11 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
-	ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
+	ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pt))
 		return PTR_ERR(ppgtt->scratch_pt);
 
-	ppgtt->scratch_pd = alloc_pd_single(ppgtt->base.dev);
+	ppgtt->scratch_pd = alloc_pd(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
@@ -1375,7 +1375,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		/* We've already allocated a page table */
 		WARN_ON(!bitmap_empty(pt->used_ptes, GEN6_PTES));
 
-		pt = alloc_pt_single(dev);
+		pt = alloc_pt(dev);
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
 			goto unwind_out;
@@ -1461,7 +1461,7 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
-	ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
+	ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pt))
 		return PTR_ERR(ppgtt->scratch_pt);
 
-- 
1.9.1


* [PATCH 07/21] drm/i915/gtt: Introduce i915_page_dir_dma_addr
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (5 preceding siblings ...)
  2015-05-22 17:04 ` [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-02 10:11   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma Mika Kuoppala
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

The legacy mode mm switch and the execlist context assignment
need the dma addresses of the page directories.

Introduce a function that encapsulates the fallback to the
scratch_pd dma address when no pd is found.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 drivers/gpu/drm/i915/i915_gem_gtt.h | 8 ++++++++
 drivers/gpu/drm/i915/intel_lrc.c    | 4 +---
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4cf47f9..18989f7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -481,10 +481,8 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int i, ret;
 
 	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
-		struct i915_page_directory *pd = ppgtt->pdp.page_directory[i];
-		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
-		/* The page directory might be NULL, but we need to clear out
-		 * whatever the previous context might have used. */
+		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
+
 		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c343161..da67542 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -468,6 +468,14 @@ static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
 	return i915_pte_count(address, length, GEN8_PDE_SHIFT);
 }
 
+static inline dma_addr_t
+i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
+{
+	return test_bit(n, ppgtt->pdp.used_pdpes) ?
+		ppgtt->pdp.page_directory[n]->daddr :
+		ppgtt->scratch_pd->daddr;
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d793d4e..626949a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -184,9 +184,7 @@
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
 
 #define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
-	const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
-		ppgtt->pdp.page_directory[n]->daddr : \
-		ppgtt->scratch_pd->daddr; \
+	const u64 _addr = i915_page_dir_dma_addr((ppgtt), (n));	\
 	reg_state[CTX_PDP ## n ## _UDW+1] = upper_32_bits(_addr); \
 	reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
 }
-- 
1.9.1


* [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (6 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 07/21] drm/i915/gtt: Introduce i915_page_dir_dma_addr Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-02 12:39   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px Mika Kuoppala
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

All our paging structures have a struct page and a dma address
for that page.

Add a struct for the page/dma address pair and use it to make
the setup and teardown of the different paging structures
identical.

Also include the page directory offset in the struct for legacy
gens, and rename it to make clear that it is an offset into the
ggtt.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 120 ++++++++++++++----------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  21 ++++---
 3 files changed, 60 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c7a840b..22770aa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2245,7 +2245,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.base.ggtt_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 18989f7..1e1a7a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -301,52 +301,39 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-#define i915_dma_unmap_single(px, dev) \
-	__i915_dma_unmap_single((px)->daddr, dev)
-
-static void __i915_dma_unmap_single(dma_addr_t daddr,
-				    struct drm_device *dev)
+static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 {
 	struct device *device = &dev->pdev->dev;
 
-	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
-}
-
-/**
- * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
- * @px:	Page table/dir/etc to get a DMA map for
- * @dev:	drm device
- *
- * Page table allocations are unified across all gens. They always require a
- * single 4k allocation, as well as a DMA mapping. If we keep the structs
- * symmetric here, the simple macro covers us for every page table type.
- *
- * Return: 0 if success.
- */
-#define i915_dma_map_single(px, dev) \
-	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
+	p->page = alloc_page(GFP_KERNEL);
+	if (!p->page)
+		return -ENOMEM;
 
-static int i915_dma_map_page_single(struct page *page,
-				    struct drm_device *dev,
-				    dma_addr_t *daddr)
-{
-	struct device *device = &dev->pdev->dev;
+	p->daddr = dma_map_page(device,
+				p->page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
 
-	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
-	if (dma_mapping_error(device, *daddr))
-		return -ENOMEM;
+	if (dma_mapping_error(device, p->daddr)) {
+		__free_page(p->page);
+		return -EINVAL;
+	}
 
 	return 0;
 }
 
-static void unmap_and_free_pt(struct i915_page_table *pt,
-			       struct drm_device *dev)
+static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 {
-	if (WARN_ON(!pt->page))
+	if (WARN_ON(!p->page))
 		return;
 
-	i915_dma_unmap_single(pt, dev);
-	__free_page(pt->page);
+	dma_unmap_page(&dev->pdev->dev, p->daddr, 4096, PCI_DMA_BIDIRECTIONAL);
+	__free_page(p->page);
+	memset(p, 0, sizeof(*p));
+}
+
+static void unmap_and_free_pt(struct i915_page_table *pt,
+			       struct drm_device *dev)
+{
+	cleanup_page_dma(dev, &pt->base);
 	kfree(pt->used_ptes);
 	kfree(pt);
 }
@@ -357,7 +344,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 	gen8_pte_t *pt_vaddr, scratch_pte;
 	int i;
 
-	pt_vaddr = kmap_atomic(pt->page);
+	pt_vaddr = kmap_atomic(pt->base.page);
 	scratch_pte = gen8_pte_encode(vm->scratch.addr,
 				      I915_CACHE_LLC, true);
 
@@ -386,19 +373,13 @@ static struct i915_page_table *alloc_pt(struct drm_device *dev)
 	if (!pt->used_ptes)
 		goto fail_bitmap;
 
-	pt->page = alloc_page(GFP_KERNEL);
-	if (!pt->page)
-		goto fail_page;
-
-	ret = i915_dma_map_single(pt, dev);
+	ret = setup_page_dma(dev, &pt->base);
 	if (ret)
-		goto fail_dma;
+		goto fail_page_m;
 
 	return pt;
 
-fail_dma:
-	__free_page(pt->page);
-fail_page:
+fail_page_m:
 	kfree(pt->used_ptes);
 fail_bitmap:
 	kfree(pt);
@@ -409,9 +390,8 @@ fail_bitmap:
 static void unmap_and_free_pd(struct i915_page_directory *pd,
 			      struct drm_device *dev)
 {
-	if (pd->page) {
-		i915_dma_unmap_single(pd, dev);
-		__free_page(pd->page);
+	if (pd->base.page) {
+		cleanup_page_dma(dev, &pd->base);
 		kfree(pd->used_pdes);
 		kfree(pd);
 	}
@@ -431,18 +411,12 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 	if (!pd->used_pdes)
 		goto free_pd;
 
-	pd->page = alloc_page(GFP_KERNEL);
-	if (!pd->page)
-		goto free_bitmap;
-
-	ret = i915_dma_map_single(pd, dev);
+	ret = setup_page_dma(dev, &pd->base);
 	if (ret)
-		goto free_page;
+		goto free_bitmap;
 
 	return pd;
 
-free_page:
-	__free_page(pd->page);
 free_bitmap:
 	kfree(pd->used_pdes);
 free_pd:
@@ -523,10 +497,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 
 		pt = pd->page_table[pde];
 
-		if (WARN_ON(!pt->page))
+		if (WARN_ON(!pt->base.page))
 			continue;
 
-		page_table = pt->page;
+		page_table = pt->base.page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES)
@@ -573,7 +547,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
-			struct page *page_table = pt->page;
+			struct page *page_table = pt->base.page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -605,7 +579,7 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
 			     struct drm_device *dev)
 {
 	gen8_pde_t entry =
-		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+		gen8_pde_encode(dev, pt->base.daddr, I915_CACHE_LLC);
 	*pde = entry;
 }
 
@@ -618,7 +592,7 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 	struct i915_page_table *pt;
 	int i;
 
-	page_directory = kmap_atomic(pd->page);
+	page_directory = kmap_atomic(pd->base.page);
 	pt = ppgtt->scratch_pt;
 	for (i = 0; i < I915_PDES; i++)
 		/* Map the PDE to the page table */
@@ -633,7 +607,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
 {
 	int i;
 
-	if (!pd->page)
+	if (!pd->base.page)
 		return;
 
 	for_each_set_bit(i, pd->used_pdes, I915_PDES) {
@@ -883,7 +857,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	/* Allocations have completed successfully, so set the bitmaps, and do
 	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_pde_t *const page_directory = kmap_atomic(pd->page);
+		gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
 		struct i915_page_table *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -1037,7 +1011,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->base.daddr;
 		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -1048,7 +1022,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
 		for (pte = 0; pte < GEN6_PTES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES) +
@@ -1083,7 +1057,7 @@ static void gen6_write_pde(struct i915_page_directory *pd,
 		container_of(pd, struct i915_hw_ppgtt, pd);
 	u32 pd_entry;
 
-	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->base.daddr);
 	pd_entry |= GEN6_PDE_VALID;
 
 	writel(pd_entry, ppgtt->pd_addr + pde);
@@ -1108,9 +1082,9 @@ static void gen6_write_page_range(struct drm_i915_private *dev_priv,
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
 
-	return (ppgtt->pd.pd_offset / 64) << 16;
+	return (ppgtt->pd.base.ggtt_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -1273,7 +1247,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES)
 			last_pte = GEN6_PTES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -1302,7 +1276,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -1330,7 +1304,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 	scratch_pte = vm->pte_encode(vm->scratch.addr,
 			I915_CACHE_LLC, true, 0);
 
-	pt_vaddr = kmap_atomic(pt->page);
+	pt_vaddr = kmap_atomic(pt->base.page);
 
 	for (i = 0; i < GEN6_PTES; i++)
 		pt_vaddr[i] = scratch_pte;
@@ -1546,11 +1520,11 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd.pd_offset =
+	ppgtt->pd.base.ggtt_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
 
 	ppgtt->pd_addr = (gen6_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_pte_t);
+		ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
 
 	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
@@ -1561,7 +1535,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 			 ppgtt->node.start / PAGE_SIZE);
 
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd.pd_offset << 10);
+		  ppgtt->pd.base.ggtt_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index da67542..666decc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,19 +205,22 @@ struct i915_vma {
 #define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
 };
 
-struct i915_page_table {
+struct i915_page_dma {
 	struct page *page;
-	dma_addr_t daddr;
+	union {
+		dma_addr_t daddr;
+		uint32_t ggtt_offset;
+	};
+};
+
+struct i915_page_table {
+	struct i915_page_dma base;
 
 	unsigned long *used_ptes;
 };
 
 struct i915_page_directory {
-	struct page *page; /* NULL for GEN6-GEN7 */
-	union {
-		uint32_t pd_offset;
-		dma_addr_t daddr;
-	};
+	struct i915_page_dma base;
 
 	unsigned long *used_pdes;
 	struct i915_page_table *page_table[I915_PDES]; /* PDEs */
@@ -472,8 +475,8 @@ static inline dma_addr_t
 i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
 {
 	return test_bit(n, ppgtt->pdp.used_pdpes) ?
-		ppgtt->pdp.page_directory[n]->daddr :
-		ppgtt->scratch_pd->daddr;
+		ppgtt->pdp.page_directory[n]->base.daddr :
+		ppgtt->scratch_pd->base.daddr;
 }
 
 int i915_gem_gtt_init(struct drm_device *dev);
-- 
1.9.1


* [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (7 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-02 13:08   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 10/21] drm/i915/gtt: Remove superfluous free_pd with gen6/7 Mika Kuoppala
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

All the paging structures are now similar and mapped for
dma. The unmapping is taken care of by common accessors, so
the function names need not burden the reader with such details.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1e1a7a1..f58aa63 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -330,8 +330,7 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	memset(p, 0, sizeof(*p));
 }
 
-static void unmap_and_free_pt(struct i915_page_table *pt,
-			       struct drm_device *dev)
+static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 {
 	cleanup_page_dma(dev, &pt->base);
 	kfree(pt->used_ptes);
@@ -387,8 +386,7 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
-static void unmap_and_free_pd(struct i915_page_directory *pd,
-			      struct drm_device *dev)
+static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
 {
 	if (pd->base.page) {
 		cleanup_page_dma(dev, &pd->base);
@@ -614,7 +612,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
 		if (WARN_ON(!pd->page_table[i]))
 			continue;
 
-		unmap_and_free_pt(pd->page_table[i], dev);
+		free_pt(dev, pd->page_table[i]);
 		pd->page_table[i] = NULL;
 	}
 }
@@ -630,11 +628,11 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
-	unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
-	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+	free_pd(ppgtt->base.dev, ppgtt->scratch_pd);
+	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
 }
 
 /**
@@ -687,7 +685,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pde, new_pts, I915_PDES)
-		unmap_and_free_pt(pd->page_table[pde], dev);
+		free_pt(dev, pd->page_table[pde]);
 
 	return -ENOMEM;
 }
@@ -745,7 +743,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
-		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
+		free_pd(dev, pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
@@ -902,11 +900,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
-			unmap_and_free_pt(ppgtt->pdp.page_directory[pdpe]->page_table[temp], vm->dev);
+			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
@@ -1395,7 +1393,7 @@ unwind_out:
 		struct i915_page_table *pt = ppgtt->pd.page_table[pde];
 
 		ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
-		unmap_and_free_pt(pt, vm->dev);
+		free_pt(vm->dev, pt);
 	}
 
 	mark_tlbs_dirty(ppgtt);
@@ -1414,11 +1412,11 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 
 	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			unmap_and_free_pt(pt, ppgtt->base.dev);
+			free_pt(ppgtt->base.dev, pt);
 	}
 
-	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
-	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
+	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
+	free_pd(ppgtt->base.dev, &ppgtt->pd);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
@@ -1468,7 +1466,7 @@ alloc:
 	return 0;
 
 err_out:
-	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
 	return ret;
 }
 
-- 
1.9.1


* [PATCH 10/21] drm/i915/gtt: Remove superfluous free_pd with gen6/7
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (8 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-02 14:07   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma() Mika Kuoppala
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

This call has slipped in somewhere, but it was harmless,
as we check the page pointer before teardown.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f58aa63..f747bd3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1416,7 +1416,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	}
 
 	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
-	free_pd(ppgtt->base.dev, &ppgtt->pd);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
-- 
1.9.1


* [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma()
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (9 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 10/21] drm/i915/gtt: Remove superfluous free_pd with gen6/7 Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-02 14:51   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page Mika Kuoppala
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

When we set up page directories and tables, we point the entries
at the next level scratch structure. Make this generic
by introducing fill_page_dma(), which maps and flushes. We also
need a 32 bit variant for legacy gens.

v2: Fix flushes and handle valleyview (Ville)
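
The 32 bit variant replicates the value into both halves of a 64 bit
word, so each 64 bit store fills two consecutive 32 bit entries (a
sketch with an illustrative value):

	u32 val32 = 0xdeadbeef;
	u64 v = (u64)val32 << 32 | val32;	/* 0xdeadbeefdeadbeef */
	/* one store of v writes two adjacent gen6 PTEs as val32 */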

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 71 +++++++++++++++++++------------------
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f747bd3..d020b5e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -330,6 +330,31 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	memset(p, 0, sizeof(*p));
 }
 
+static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
+			  const uint64_t val)
+{
+	int i;
+	uint64_t * const vaddr = kmap_atomic(p->page);
+
+	for (i = 0; i < 512; i++)
+		vaddr[i] = val;
+
+	if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
+		drm_clflush_virt_range(vaddr, PAGE_SIZE);
+
+	kunmap_atomic(vaddr);
+}
+
+static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
+			     const uint32_t val32)
+{
+	uint64_t v = val32;
+
+	v = v << 32 | val32;
+
+	fill_page_dma(dev, p, v);
+}
+
 static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 {
 	cleanup_page_dma(dev, &pt->base);
@@ -340,19 +365,11 @@ static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 static void gen8_initialize_pt(struct i915_address_space *vm,
 			       struct i915_page_table *pt)
 {
-	gen8_pte_t *pt_vaddr, scratch_pte;
-	int i;
-
-	pt_vaddr = kmap_atomic(pt->base.page);
-	scratch_pte = gen8_pte_encode(vm->scratch.addr,
-				      I915_CACHE_LLC, true);
+	gen8_pte_t scratch_pte;
 
-	for (i = 0; i < GEN8_PTES; i++)
-		pt_vaddr[i] = scratch_pte;
+	scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-	if (!HAS_LLC(vm->dev))
-		drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-	kunmap_atomic(pt_vaddr);
+	fill_page_dma(vm->dev, &pt->base, scratch_pte);
 }
 
 static struct i915_page_table *alloc_pt(struct drm_device *dev)
@@ -585,20 +602,13 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 			       struct i915_page_directory *pd)
 {
 	struct i915_hw_ppgtt *ppgtt =
-			container_of(vm, struct i915_hw_ppgtt, base);
-	gen8_pde_t *page_directory;
-	struct i915_page_table *pt;
-	int i;
+		container_of(vm, struct i915_hw_ppgtt, base);
+	gen8_pde_t scratch_pde;
 
-	page_directory = kmap_atomic(pd->base.page);
-	pt = ppgtt->scratch_pt;
-	for (i = 0; i < I915_PDES; i++)
-		/* Map the PDE to the page table */
-		__gen8_do_map_pt(page_directory + i, pt, vm->dev);
+	scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
+				      I915_CACHE_LLC);
 
-	if (!HAS_LLC(vm->dev))
-		drm_clflush_virt_range(page_directory, PAGE_SIZE);
-	kunmap_atomic(page_directory);
+	fill_page_dma(vm->dev, &pd->base, scratch_pde);
 }
 
 static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
@@ -1292,22 +1302,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 }
 
 static void gen6_initialize_pt(struct i915_address_space *vm,
-		struct i915_page_table *pt)
+			       struct i915_page_table *pt)
 {
-	gen6_pte_t *pt_vaddr, scratch_pte;
-	int i;
+	gen6_pte_t scratch_pte;
 
 	WARN_ON(vm->scratch.addr == 0);
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr,
-			I915_CACHE_LLC, true, 0);
-
-	pt_vaddr = kmap_atomic(pt->base.page);
-
-	for (i = 0; i < GEN6_PTES; i++)
-		pt_vaddr[i] = scratch_pte;
+	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	kunmap_atomic(pt_vaddr);
+	fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
 }
 
 static int gen6_alloc_va_range(struct i915_address_space *vm,
-- 
1.9.1


* [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (10 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma() Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 10:55   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 13/21] drm/i915/gtt: Use macros to access dma mapped pages Mika Kuoppala
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

As there is flushing involved after we have done the cpu
write, make dedicated functions for mapping pages into cpu
space, and macros to map any type of paging structure.

v2: Make it clear that the flushing kunmap is only for ppgtt (Ville)
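
The usage pattern this gives, sketched with the macros from the
patch (a fragment, not a standalone unit; pt, ppgtt, pte_index and
scratch_pte stand in for the driver's own variables):

    /* Map the page table, write entries through the CPU, and
     * let the unmap clflush when the platform lacks a coherent
     * LLC (and is not Valleyview).
     */
    gen8_pte_t *vaddr = kmap_px(pt);

    vaddr[pte_index] = scratch_pte;

    kunmap_px(ppgtt, vaddr);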

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 73 +++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d020b5e..072295f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -330,19 +330,35 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	memset(p, 0, sizeof(*p));
 }
 
+static void *kmap_page_dma(struct i915_page_dma *p)
+{
+	return kmap_atomic(p->page);
+}
+
+/* We use the flushing unmap only with ppgtt structures:
+ * page directories, page tables and scratch pages.
+ */
+static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
+{
+	if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
+		drm_clflush_virt_range(vaddr, PAGE_SIZE);
+
+	kunmap_atomic(vaddr);
+}
+
+#define kmap_px(px) kmap_page_dma(&(px)->base)
+#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr));
+
 static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
 			  const uint64_t val)
 {
 	int i;
-	uint64_t * const vaddr = kmap_atomic(p->page);
+	uint64_t * const vaddr = kmap_page_dma(p);
 
 	for (i = 0; i < 512; i++)
 		vaddr[i] = val;
 
-	if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
-		drm_clflush_virt_range(vaddr, PAGE_SIZE);
-
-	kunmap_atomic(vaddr);
+	kunmap_page_dma(dev, vaddr);
 }
 
 static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
@@ -500,7 +516,6 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	while (num_entries) {
 		struct i915_page_directory *pd;
 		struct i915_page_table *pt;
-		struct page *page_table;
 
 		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
 			continue;
@@ -515,22 +530,18 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		if (WARN_ON(!pt->base.page))
 			continue;
 
-		page_table = pt->base.page;
-
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES)
 			last_pte = GEN8_PTES;
 
-		pt_vaddr = kmap_atomic(page_table);
+		pt_vaddr = kmap_px(pt);
 
 		for (i = pte; i < last_pte; i++) {
 			pt_vaddr[i] = scratch_pte;
 			num_entries--;
 		}
 
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt);
 
 		pte = 0;
 		if (++pde == I915_PDES) {
@@ -562,18 +573,14 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
-			struct page *page_table = pt->base.page;
-
-			pt_vaddr = kmap_atomic(page_table);
+			pt_vaddr = kmap_px(pt);
 		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES) {
-			if (!HAS_LLC(ppgtt->base.dev))
-				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-			kunmap_atomic(pt_vaddr);
+			kunmap_px(ppgtt, pt_vaddr);
 			pt_vaddr = NULL;
 			if (++pde == I915_PDES) {
 				pdpe++;
@@ -582,11 +589,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			pte = 0;
 		}
 	}
-	if (pt_vaddr) {
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-		kunmap_atomic(pt_vaddr);
-	}
+
+	if (pt_vaddr)
+		kunmap_px(ppgtt, pt_vaddr);
 }
 
 static void __gen8_do_map_pt(gen8_pde_t * const pde,
@@ -865,7 +870,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	/* Allocations have completed successfully, so set the bitmaps, and do
 	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
+		gen8_pde_t *const page_directory = kmap_px(pd);
 		struct i915_page_table *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -895,10 +900,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			 * point we're still relying on insert_entries() */
 		}
 
-		if (!HAS_LLC(vm->dev))
-			drm_clflush_virt_range(page_directory, PAGE_SIZE);
-
-		kunmap_atomic(page_directory);
+		kunmap_px(ppgtt, page_directory);
 
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
@@ -1030,7 +1032,8 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
+		pt_vaddr = kmap_px(ppgtt->pd.page_table[pde]);
+
 		for (pte = 0; pte < GEN6_PTES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES) +
@@ -1052,7 +1055,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt_vaddr);
 	}
 }
 
@@ -1255,12 +1258,12 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES)
 			last_pte = GEN6_PTES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
+		pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
 
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt_vaddr);
 
 		num_entries -= last_pte - first_pte;
 		first_pte = 0;
@@ -1284,21 +1287,21 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
+			pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
 				       cache_level, true, flags);
 
 		if (++act_pte == GEN6_PTES) {
-			kunmap_atomic(pt_vaddr);
+			kunmap_px(ppgtt, pt_vaddr);
 			pt_vaddr = NULL;
 			act_pt++;
 			act_pte = 0;
 		}
 	}
 	if (pt_vaddr)
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt_vaddr);
 }
 
 static void gen6_initialize_pt(struct i915_address_space *vm,
-- 
1.9.1


* [PATCH 13/21] drm/i915/gtt: Use macros to access dma mapped pages
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (11 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 10:57   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible Mika Kuoppala
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Add paging structure type agnostic *_px macros to access the
page dma struct, the backing page and the dma address.

This makes the code less cluttered with the internals of
i915_page_dma.

v2: Superfluous const -> nonconst removed
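
For reference, the accessors make call sites independent of the
embedded struct i915_page_dma field; a fragment using the macros
as defined in the patch (pt stands in for any paging structure):

    struct i915_page_table *pt = alloc_pt(dev);

    /* Each of these expands via px_base(pt), i.e. &pt->base: */
    struct page *page = px_page(pt); /* the backing struct page */
    dma_addr_t daddr = px_dma(pt);   /* dma address of that page */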

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 37 +++++++++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  8 ++++++--
 2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 072295f..4f9a000 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -346,8 +346,13 @@ static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
 	kunmap_atomic(vaddr);
 }
 
-#define kmap_px(px) kmap_page_dma(&(px)->base)
-#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr));
+#define kmap_px(px) kmap_page_dma(px_base(px))
+#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr))
+
+#define setup_px(dev, px) setup_page_dma((dev), px_base(px))
+#define cleanup_px(dev, px) cleanup_page_dma((dev), px_base(px))
+#define fill_px(dev, px, v) fill_page_dma((dev), px_base(px), (v))
+#define fill32_px(dev, px, v) fill_page_dma_32((dev), px_base(px), (v))
 
 static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
 			  const uint64_t val)
@@ -373,7 +378,7 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
 
 static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 {
-	cleanup_page_dma(dev, &pt->base);
+	cleanup_px(dev, pt);
 	kfree(pt->used_ptes);
 	kfree(pt);
 }
@@ -385,7 +390,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 
 	scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-	fill_page_dma(vm->dev, &pt->base, scratch_pte);
+	fill_px(vm->dev, pt, scratch_pte);
 }
 
 static struct i915_page_table *alloc_pt(struct drm_device *dev)
@@ -405,7 +410,7 @@ static struct i915_page_table *alloc_pt(struct drm_device *dev)
 	if (!pt->used_ptes)
 		goto fail_bitmap;
 
-	ret = setup_page_dma(dev, &pt->base);
+	ret = setup_px(dev, pt);
 	if (ret)
 		goto fail_page_m;
 
@@ -421,8 +426,8 @@ fail_bitmap:
 
 static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
 {
-	if (pd->base.page) {
-		cleanup_page_dma(dev, &pd->base);
+	if (px_page(pd)) {
+		cleanup_px(dev, pd);
 		kfree(pd->used_pdes);
 		kfree(pd);
 	}
@@ -442,7 +447,7 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 	if (!pd->used_pdes)
 		goto free_pd;
 
-	ret = setup_page_dma(dev, &pd->base);
+	ret = setup_px(dev, pd);
 	if (ret)
 		goto free_bitmap;
 
@@ -527,7 +532,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 
 		pt = pd->page_table[pde];
 
-		if (WARN_ON(!pt->base.page))
+		if (WARN_ON(!px_page(pt)))
 			continue;
 
 		last_pte = pte + num_entries;
@@ -599,7 +604,7 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
 			     struct drm_device *dev)
 {
 	gen8_pde_t entry =
-		gen8_pde_encode(dev, pt->base.daddr, I915_CACHE_LLC);
+		gen8_pde_encode(dev, px_dma(pt), I915_CACHE_LLC);
 	*pde = entry;
 }
 
@@ -610,17 +615,17 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pde_t scratch_pde;
 
-	scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
+	scratch_pde = gen8_pde_encode(vm->dev, px_dma(ppgtt->scratch_pt),
 				      I915_CACHE_LLC);
 
-	fill_page_dma(vm->dev, &pd->base, scratch_pde);
+	fill_px(vm->dev, pd, scratch_pde);
 }
 
 static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
 {
 	int i;
 
-	if (!pd->base.page)
+	if (!px_page(pd))
 		return;
 
 	for_each_set_bit(i, pd->used_pdes, I915_PDES) {
@@ -1021,7 +1026,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->base.daddr;
+		const dma_addr_t pt_addr = px_dma(ppgtt->pd.page_table[pde]);
 		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -1068,7 +1073,7 @@ static void gen6_write_pde(struct i915_page_directory *pd,
 		container_of(pd, struct i915_hw_ppgtt, pd);
 	u32 pd_entry;
 
-	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->base.daddr);
+	pd_entry = GEN6_PDE_ADDR_ENCODE(px_dma(pt));
 	pd_entry |= GEN6_PDE_VALID;
 
 	writel(pd_entry, ppgtt->pd_addr + pde);
@@ -1313,7 +1318,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
+	fill32_px(vm->dev, pt, scratch_pte);
 }
 
 static int gen6_alloc_va_range(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 666decc..006b839 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -213,6 +213,10 @@ struct i915_page_dma {
 	};
 };
 
+#define px_base(px) (&(px)->base)
+#define px_page(px) (px_base(px)->page)
+#define px_dma(px) (px_base(px)->daddr)
+
 struct i915_page_table {
 	struct i915_page_dma base;
 
@@ -475,8 +479,8 @@ static inline dma_addr_t
 i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
 {
 	return test_bit(n, ppgtt->pdp.used_pdpes) ?
-		ppgtt->pdp.page_directory[n]->base.daddr :
-		ppgtt->scratch_pd->base.daddr;
+		px_dma(ppgtt->pdp.page_directory[n]) :
+		px_dma(ppgtt->scratch_pd);
 }
 
 int i915_gem_gtt_init(struct drm_device *dev);
-- 
1.9.1


* [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (12 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 13/21] drm/i915/gtt: Use macros to access dma mapped pages Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 13:44   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 15/21] drm/i915/gtt: Fill scratch page Mika Kuoppala
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Lay out the scratch page structure in a similar manner to the
other paging structures. This allows us to use the same tools
for setup and teardown.
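
The payoff is that the scratch page now flows through the same
generic page dma helpers as page tables and directories; roughly,
condensed from alloc_scratch_page() below:

    struct i915_page_scratch *sp = kzalloc(sizeof(*sp), GFP_KERNEL);

    if (sp) {
            /* Same dma setup path as pt/pd, just with scratch
             * specific allocation flags.
             */
            if (__setup_page_dma(vm->dev, px_base(sp),
                                 GFP_DMA32 | __GFP_ZERO))
                    kfree(sp);
    }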

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 89 ++++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  9 ++--
 2 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4f9a000..43fa543 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -301,11 +301,12 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
+static int __setup_page_dma(struct drm_device *dev,
+			    struct i915_page_dma *p, gfp_t flags)
 {
 	struct device *device = &dev->pdev->dev;
 
-	p->page = alloc_page(GFP_KERNEL);
+	p->page = alloc_page(flags);
 	if (!p->page)
 		return -ENOMEM;
 
@@ -320,6 +321,11 @@ static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	return 0;
 }
 
+static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
+{
+	return __setup_page_dma(dev, p, GFP_KERNEL);
+}
+
 static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 {
 	if (WARN_ON(!p->page))
@@ -388,7 +394,8 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 {
 	gen8_pte_t scratch_pte;
 
-	scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
+	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+				      I915_CACHE_LLC, true);
 
 	fill_px(vm->dev, pt, scratch_pte);
 }
@@ -515,7 +522,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
-	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
+	scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
@@ -1021,7 +1028,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	uint32_t  pte, pde, temp;
 	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
+	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page), I915_CACHE_LLC, true, 0);
 
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
@@ -1256,7 +1263,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned first_pte = first_entry % GEN6_PTES;
 	unsigned last_pte, i;
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
+	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+				     I915_CACHE_LLC, true, 0);
 
 	while (num_entries) {
 		last_pte = first_pte + num_entries;
@@ -1314,9 +1322,10 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 {
 	gen6_pte_t scratch_pte;
 
-	WARN_ON(vm->scratch.addr == 0);
+	WARN_ON(px_dma(vm->scratch_page) == 0);
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
+	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+				     I915_CACHE_LLC, true, 0);
 
 	fill32_px(vm->dev, pt, scratch_pte);
 }
@@ -1553,13 +1562,14 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	ppgtt->base.dev = dev;
-	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
+	ppgtt->base.scratch_page = dev_priv->gtt.base.scratch_page;
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt);
 	else
 		return gen8_ppgtt_init(ppgtt);
 }
+
 int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1874,7 +1884,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = gen8_pte_encode(vm->scratch.addr,
+	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
 				      I915_CACHE_LLC,
 				      use_scratch);
 	for (i = 0; i < num_entries; i++)
@@ -1900,7 +1910,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 		 first_entry, num_entries, max_entries))
 		num_entries = max_entries;
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, use_scratch, 0);
+	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+				     I915_CACHE_LLC, use_scratch, 0);
 
 	for (i = 0; i < num_entries; i++)
 		iowrite32(scratch_pte, &gtt_base[i]);
@@ -2157,42 +2168,40 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
 	vm->cleanup(vm);
 }
 
-static int setup_scratch_page(struct drm_device *dev)
+static int alloc_scratch_page(struct i915_address_space *vm)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct page *page;
-	dma_addr_t dma_addr;
+	struct i915_page_scratch *sp;
+	int ret;
+
+	WARN_ON(vm->scratch_page);
 
-	page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
-	if (page == NULL)
+	sp = kzalloc(sizeof(*sp), GFP_KERNEL);
+	if (sp == NULL)
 		return -ENOMEM;
-	set_pages_uc(page, 1);
 
-#ifdef CONFIG_INTEL_IOMMU
-	dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
-				PCI_DMA_BIDIRECTIONAL);
-	if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
-		__free_page(page);
-		return -EINVAL;
+	ret = __setup_page_dma(vm->dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
+	if (ret) {
+		kfree(sp);
+		return ret;
 	}
-#else
-	dma_addr = page_to_phys(page);
-#endif
-	dev_priv->gtt.base.scratch.page = page;
-	dev_priv->gtt.base.scratch.addr = dma_addr;
+
+	set_pages_uc(px_page(sp), 1);
+
+	vm->scratch_page = sp;
 
 	return 0;
 }
 
-static void teardown_scratch_page(struct drm_device *dev)
+static void free_scratch_page(struct i915_address_space *vm)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct page *page = dev_priv->gtt.base.scratch.page;
+	struct i915_page_scratch *sp = vm->scratch_page;
 
-	set_pages_wb(page, 1);
-	pci_unmap_page(dev->pdev, dev_priv->gtt.base.scratch.addr,
-		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	__free_page(page);
+	set_pages_wb(px_page(sp), 1);
+
+	cleanup_px(vm->dev, sp);
+	kfree(sp);
+
+	vm->scratch_page = NULL;
 }
 
 static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
@@ -2300,7 +2309,7 @@ static int ggtt_probe_common(struct drm_device *dev,
 		return -ENOMEM;
 	}
 
-	ret = setup_scratch_page(dev);
+	ret = alloc_scratch_page(&dev_priv->gtt.base);
 	if (ret) {
 		DRM_ERROR("Scratch setup failed\n");
 		/* iounmap will also get called at remove, but meh */
@@ -2479,7 +2488,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 	struct i915_gtt *gtt = container_of(vm, struct i915_gtt, base);
 
 	iounmap(gtt->gsm);
-	teardown_scratch_page(vm->dev);
+	free_scratch_page(vm);
 }
 
 static int i915_gmch_probe(struct drm_device *dev,
@@ -2543,13 +2552,13 @@ int i915_gem_gtt_init(struct drm_device *dev)
 		dev_priv->gtt.base.cleanup = gen6_gmch_remove;
 	}
 
+	gtt->base.dev = dev;
+
 	ret = gtt->gtt_probe(dev, &gtt->base.total, &gtt->stolen_size,
 			     &gtt->mappable_base, &gtt->mappable_end);
 	if (ret)
 		return ret;
 
-	gtt->base.dev = dev;
-
 	/* GMADR is the PCI mmio aperture into the global GTT. */
 	DRM_INFO("Memory usable by graphics device = %lluM\n",
 		 gtt->base.total >> 20);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 006b839..1fd4041 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -217,6 +217,10 @@ struct i915_page_dma {
 #define px_page(px) (px_base(px)->page)
 #define px_dma(px) (px_base(px)->daddr)
 
+struct i915_page_scratch {
+	struct i915_page_dma base;
+};
+
 struct i915_page_table {
 	struct i915_page_dma base;
 
@@ -243,10 +247,7 @@ struct i915_address_space {
 	u64 start;		/* Start offset always 0 for dri2 */
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
-	struct {
-		dma_addr_t addr;
-		struct page *page;
-	} scratch;
+	struct i915_page_scratch *scratch_page;
 
 	/**
 	 * List of objects currently involved in rendering.
-- 
1.9.1


* [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (13 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-05-27 18:12   ` Tomas Elf
  2015-06-03 14:03   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 16/21] drm/i915/gtt: Pin vma during virtual address allocation Mika Kuoppala
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

During review of the dynamic page tables series, I was able
to hit a lite restore bug with execlists. I assume that,
due to an incorrect pd, the batch ran out of legit address space
and into the scratch page area. The ACTHD kept increasing
because the scratch was all zeroes (MI_NOOPs), and as the gen8
address space is quite large, hangcheck happily waited
for a long, long time, keeping the process effectively stuck.

According to Chris Wilson, any modern gpu will grind to a halt
if it encounters commands of all ones. This seemed to do the
trick and a hang was declared promptly when the gpu wandered
into the scratch land.

v2: Use 0xffff00ff pattern (Chris)
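
For reference, the magic below is simply the v2 pattern 0xffff00ff
replicated into a 64 bit fill value; a quick standalone check (not
driver code):

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
            const uint32_t pattern = 0xffff00ff;
            const uint64_t magic = (uint64_t)pattern << 32 | pattern;

            assert(magic == 0xffff00ffffff00ffULL);
            return 0;
    }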

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 43fa543..a2a0c88 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2168,6 +2168,8 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
 	vm->cleanup(vm);
 }
 
+#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
+
 static int alloc_scratch_page(struct i915_address_space *vm)
 {
 	struct i915_page_scratch *sp;
@@ -2185,6 +2187,7 @@ static int alloc_scratch_page(struct i915_address_space *vm)
 		return ret;
 	}
 
+	fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
 	set_pages_uc(px_page(sp), 1);
 
 	vm->scratch_page = sp;
-- 
1.9.1


* [PATCH 16/21] drm/i915/gtt: Pin vma during virtual address allocation
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (14 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 15/21] drm/i915/gtt: Fill scratch page Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 14:27   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 17/21] drm/i915/gtt: Cleanup page directory encoding Mika Kuoppala
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Dynamic page table allocation might wake the shrinker
when memory is requested for page table structures.
As this happens when we try to allocate the virtual address
range during binding, our vma might be among the targets
for eviction.

Shield our vma from the shrinker by incrementing its pin count
before the virtual address range is allocated.

The proper place to fix this would be in gem, inside of
i915_vma_pin(), and to pin early in there like Chris suggests.
But we don't have that yet, so take the short cut as an interim
solution.
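
The guard is just a temporary pin around the allocation; a
fragment mirroring the hunk below (the shrinker will not evict a
vma with a nonzero pin_count, so the eviction scan cannot pick
ours while its page tables are being allocated):

    vma->pin_count++;   /* shield from shrinker/eviction */
    ret = vma->vm->allocate_va_range(vma->vm, vma->node.start,
                                     vma->node.size);
    vma->pin_count--;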

Testcase: igt/gem_ctx_thrash
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a2a0c88..b938964 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2916,9 +2916,12 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 				    vma->node.size,
 				    VM_TO_TRACE_NAME(vma->vm));
 
+		/* XXX: i915_vma_pin() will fix this +- hack */
+		vma->pin_count++;
 		ret = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start,
 						 vma->node.size);
+		vma->pin_count--;
 		if (ret)
 			return ret;
 	}
-- 
1.9.1


* [PATCH 17/21] drm/i915/gtt: Cleanup page directory encoding
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (15 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 16/21] drm/i915/gtt: Pin vma during virtual address allocation Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 14:58   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 18/21] drm/i915/gtt: Move scratch_pd and scratch_pt into vm area Mika Kuoppala
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Write the page directory entry without going through a
superfluous indirect function. Also remove the unused device
parameter from the encode function.
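
The call site shape before and after, side by side (both forms
taken from the hunks below):

    /* before: indirect helper writing through a pointer */
    __gen8_do_map_pt(page_directory + pde, pt, vm->dev);

    /* after: encode + plain store, no helper, no dev parameter */
    page_directory[pde] = gen8_pde_encode(px_dma(pt), I915_CACHE_LLC);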

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b938964..a1d6d7a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -192,9 +192,8 @@ static gen8_pte_t gen8_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static gen8_pde_t gen8_pde_encode(struct drm_device *dev,
-				  dma_addr_t addr,
-				  enum i915_cache_level level)
+static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
+				  const enum i915_cache_level level)
 {
 	gen8_pde_t pde = _PAGE_PRESENT | _PAGE_RW;
 	pde |= addr;
@@ -606,15 +605,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_px(ppgtt, pt_vaddr);
 }
 
-static void __gen8_do_map_pt(gen8_pde_t * const pde,
-			     struct i915_page_table *pt,
-			     struct drm_device *dev)
-{
-	gen8_pde_t entry =
-		gen8_pde_encode(dev, px_dma(pt), I915_CACHE_LLC);
-	*pde = entry;
-}
-
 static void gen8_initialize_pd(struct i915_address_space *vm,
 			       struct i915_page_directory *pd)
 {
@@ -622,7 +612,7 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pde_t scratch_pde;
 
-	scratch_pde = gen8_pde_encode(vm->dev, px_dma(ppgtt->scratch_pt),
+	scratch_pde = gen8_pde_encode(px_dma(ppgtt->scratch_pt),
 				      I915_CACHE_LLC);
 
 	fill_px(vm->dev, pd, scratch_pde);
@@ -906,7 +896,8 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			set_bit(pde, pd->used_pdes);
 
 			/* Map the PDE to the page table */
-			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+			page_directory[pde] = gen8_pde_encode(px_dma(pt),
+							      I915_CACHE_LLC);
 
 			/* NB: We haven't yet mapped ptes to pages. At this
 			 * point we're still relying on insert_entries() */
-- 
1.9.1


* [PATCH 18/21] drm/i915/gtt: Move scratch_pd and scratch_pt into vm area
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (16 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 17/21] drm/i915/gtt: Cleanup page directory encoding Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 16:46   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 19/21] drm/i915/gtt: One instance of scratch page table/directory Mika Kuoppala
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

The scratch page is part of i915_address_space because we
have only one of it. Move the other scratch entities into
the same struct. This is a preparatory patch for having
only one instance of each scratch_pt/pd.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 51 +++++++++++++++++--------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 +++--
 2 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a1d6d7a..61f4da0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -608,12 +608,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 static void gen8_initialize_pd(struct i915_address_space *vm,
 			       struct i915_page_directory *pd)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_pde_t scratch_pde;
 
-	scratch_pde = gen8_pde_encode(px_dma(ppgtt->scratch_pt),
-				      I915_CACHE_LLC);
+	scratch_pde = gen8_pde_encode(px_dma(vm->scratch_pt), I915_CACHE_LLC);
 
 	fill_px(vm->dev, pd, scratch_pde);
 }
@@ -648,8 +645,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
-	free_pd(ppgtt->base.dev, ppgtt->scratch_pd);
-	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
+	free_pd(vm->dev, vm->scratch_pd);
+	free_pt(vm->dev, vm->scratch_pt);
 }
 
 /**
@@ -685,7 +682,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 		/* Don't reallocate page tables */
 		if (pt) {
 			/* Scratch is never allocated this way */
-			WARN_ON(pt == ppgtt->scratch_pt);
+			WARN_ON(pt == ppgtt->base.scratch_pt);
 			continue;
 		}
 
@@ -977,16 +974,16 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
-	ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pt))
-		return PTR_ERR(ppgtt->scratch_pt);
+	ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->base.scratch_pt))
+		return PTR_ERR(ppgtt->base.scratch_pt);
 
-	ppgtt->scratch_pd = alloc_pd(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	ppgtt->base.scratch_pd = alloc_pd(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->base.scratch_pd))
+		return PTR_ERR(ppgtt->base.scratch_pd);
 
-	gen8_initialize_pt(&ppgtt->base, ppgtt->scratch_pt);
-	gen8_initialize_pd(&ppgtt->base, ppgtt->scratch_pd);
+	gen8_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
+	gen8_initialize_pd(&ppgtt->base, ppgtt->base.scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = 1ULL << 32;
@@ -1019,7 +1016,8 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	uint32_t  pte, pde, temp;
 	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
-	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page), I915_CACHE_LLC, true, 0);
+	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+				     I915_CACHE_LLC, true, 0);
 
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
@@ -1348,7 +1346,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	 * tables.
 	 */
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
-		if (pt != ppgtt->scratch_pt) {
+		if (pt != vm->scratch_pt) {
 			WARN_ON(bitmap_empty(pt->used_ptes, GEN6_PTES));
 			continue;
 		}
@@ -1403,7 +1401,7 @@ unwind_out:
 	for_each_set_bit(pde, new_page_tables, I915_PDES) {
 		struct i915_page_table *pt = ppgtt->pd.page_table[pde];
 
-		ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
+		ppgtt->pd.page_table[pde] = vm->scratch_pt;
 		free_pt(vm->dev, pt);
 	}
 
@@ -1418,15 +1416,14 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	struct i915_page_table *pt;
 	uint32_t pde;
 
-
 	drm_mm_remove_node(&ppgtt->node);
 
 	gen6_for_all_pdes(pt, ppgtt, pde) {
-		if (pt != ppgtt->scratch_pt)
+		if (pt != vm->scratch_pt)
 			free_pt(ppgtt->base.dev, pt);
 	}
 
-	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
+	free_pt(vm->dev, vm->scratch_pt);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
@@ -1441,11 +1438,11 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
-	ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pt))
-		return PTR_ERR(ppgtt->scratch_pt);
+	ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->base.scratch_pt))
+		return PTR_ERR(ppgtt->base.scratch_pt);
 
-	gen6_initialize_pt(&ppgtt->base, ppgtt->scratch_pt);
+	gen6_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
 
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
@@ -1476,7 +1473,7 @@ alloc:
 	return 0;
 
 err_out:
-	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
+	free_pt(ppgtt->base.dev, ppgtt->base.scratch_pt);
 	return ret;
 }
 
@@ -1492,7 +1489,7 @@ static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
 	uint32_t pde, temp;
 
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
-		ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
+		ppgtt->pd.page_table[pde] = ppgtt->base.scratch_pt;
 }
 
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1fd4041..ba46374 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -248,6 +248,8 @@ struct i915_address_space {
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
 	struct i915_page_scratch *scratch_page;
+	struct i915_page_table *scratch_pt;
+	struct i915_page_directory *scratch_pd;
 
 	/**
 	 * List of objects currently involved in rendering.
@@ -337,9 +339,6 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory pd;
 	};
 
-	struct i915_page_table *scratch_pt;
-	struct i915_page_directory *scratch_pd;
-
 	struct drm_i915_file_private *file_priv;
 
 	gen6_pte_t __iomem *pd_addr;
@@ -481,7 +480,7 @@ i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
 {
 	return test_bit(n, ppgtt->pdp.used_pdpes) ?
 		px_dma(ppgtt->pdp.page_directory[n]) :
-		px_dma(ppgtt->scratch_pd);
+		px_dma(ppgtt->base.scratch_pd);
 }
 
 int i915_gem_gtt_init(struct drm_device *dev);
-- 
1.9.1


* [PATCH 19/21] drm/i915/gtt: One instance of scratch page table/directory
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (17 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 18/21] drm/i915/gtt: Move scratch_pd and scratch_pt into vm area Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 16:57   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 20/21] drm/i915/gtt: Use nonatomic bitmap ops Mika Kuoppala
  2015-05-22 17:05 ` [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions Mika Kuoppala
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

As we use one scratch page for all ppgtt instances, we can
likewise share one scratch page table and one scratch page
directory across all ppgtt instances, saving 2 pages + structs
per ppgtt.

v2: Rebase
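
The sharing scheme in brief, condensed from setup_scratch()
below: the ggtt allocates the real scratch structures once, and
every ppgtt merely borrows the pointers:

    if (i915_is_ggtt(vm))
            return setup_scratch_ggtt(vm); /* allocate the one instance */

    /* ppgtt: reuse the single set owned by the ggtt */
    vm->scratch_page = ggtt_vm->scratch_page;
    vm->scratch_pt = ggtt_vm->scratch_pt;
    vm->scratch_pd = ggtt_vm->scratch_pd;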

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 273 +++++++++++++++++++++++-------------
 1 file changed, 178 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 61f4da0..ab113ce 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -430,6 +430,17 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static void gen6_initialize_pt(struct i915_address_space *vm,
+			       struct i915_page_table *pt)
+{
+	gen6_pte_t scratch_pte;
+
+	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
+				     I915_CACHE_LLC, true, 0);
+
+	fill32_px(vm->dev, pt, scratch_pte);
+}
+
 static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
 {
 	if (px_page(pd)) {
@@ -467,6 +478,156 @@ free_pd:
 	return ERR_PTR(ret);
 }
 
+static void gen8_initialize_pd(struct i915_address_space *vm,
+			       struct i915_page_directory *pd)
+{
+	gen8_pde_t scratch_pde;
+
+	scratch_pde = gen8_pde_encode(px_dma(vm->scratch_pt), I915_CACHE_LLC);
+
+	fill_px(vm->dev, pd, scratch_pde);
+}
+
+#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
+
+static int alloc_scratch_page(struct i915_address_space *vm)
+{
+	struct i915_page_scratch *sp;
+	int ret;
+
+	WARN_ON(vm->scratch_page);
+
+	sp = kzalloc(sizeof(*sp), GFP_KERNEL);
+	if (sp == NULL)
+		return -ENOMEM;
+
+	ret = __setup_page_dma(vm->dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
+	if (ret) {
+		kfree(sp);
+		return ret;
+	}
+
+	fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
+	set_pages_uc(px_page(sp), 1);
+
+	vm->scratch_page = sp;
+
+	return 0;
+}
+
+static void free_scratch_page(struct i915_address_space *vm)
+{
+	struct i915_page_scratch *sp = vm->scratch_page;
+
+	set_pages_wb(px_page(sp), 1);
+
+	cleanup_px(vm->dev, sp);
+	kfree(sp);
+
+	vm->scratch_page = NULL;
+}
+
+static int setup_scratch_ggtt(struct i915_address_space *vm)
+{
+	int ret;
+
+	ret = alloc_scratch_page(vm);
+	if (ret)
+		return ret;
+
+	WARN_ON(vm->scratch_pt);
+
+	if (INTEL_INFO(vm->dev)->gen < 6)
+		return 0;
+
+	vm->scratch_pt = alloc_pt(vm->dev);
+	if (IS_ERR(vm->scratch_pt))
+		return PTR_ERR(vm->scratch_pt);
+
+	WARN_ON(px_dma(vm->scratch_page) == 0);
+
+	if (INTEL_INFO(vm->dev)->gen >= 8) {
+		gen8_initialize_pt(vm, vm->scratch_pt);
+
+		WARN_ON(vm->scratch_pd);
+
+		vm->scratch_pd = alloc_pd(vm->dev);
+		if (IS_ERR(vm->scratch_pd)) {
+			ret = PTR_ERR(vm->scratch_pd);
+			goto err_pd;
+		}
+
+		WARN_ON(px_dma(vm->scratch_pt) == 0);
+		gen8_initialize_pd(vm, vm->scratch_pd);
+	} else {
+		gen6_initialize_pt(vm, vm->scratch_pt);
+	}
+
+	return 0;
+
+err_pd:
+	free_pt(vm->dev, vm->scratch_pt);
+	return ret;
+}
+
+static int setup_scratch(struct i915_address_space *vm)
+{
+	struct i915_address_space *ggtt_vm = &to_i915(vm->dev)->gtt.base;
+
+	if (i915_is_ggtt(vm))
+		return setup_scratch_ggtt(vm);
+
+	vm->scratch_page = ggtt_vm->scratch_page;
+	vm->scratch_pt = ggtt_vm->scratch_pt;
+	vm->scratch_pd = ggtt_vm->scratch_pd;
+
+	return 0;
+}
+
+static void check_scratch_page(struct i915_address_space *vm)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	int i;
+	u64 *vaddr;
+
+	vaddr = kmap_px(vm->scratch_page);
+
+	for (i = 0; i < PAGE_SIZE / sizeof(u64); i++) {
+		if (vaddr[i] == SCRATCH_PAGE_MAGIC)
+			continue;
+
+		DRM_ERROR("%p scratch[%d] = 0x%08llx\n", vm, i, vaddr[i]);
+		break;
+	}
+
+	kunmap_px(ppgtt, vaddr);
+}
+
+static void cleanup_scratch_ggtt(struct i915_address_space *vm)
+{
+	check_scratch_page(vm);
+	free_scratch_page(vm);
+
+	if (INTEL_INFO(vm->dev)->gen < 6)
+		return;
+
+	free_pt(vm->dev, vm->scratch_pt);
+
+	if (INTEL_INFO(vm->dev)->gen >= 8)
+		free_pd(vm->dev, vm->scratch_pd);
+}
+
+static void cleanup_scratch(struct i915_address_space *vm)
+{
+	if (i915_is_ggtt(vm))
+		cleanup_scratch_ggtt(vm);
+
+	vm->scratch_page = NULL;
+	vm->scratch_pt = NULL;
+	vm->scratch_pd = NULL;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring,
 			  unsigned entry,
@@ -521,7 +682,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
-	scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
+	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
@@ -605,16 +766,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_px(ppgtt, pt_vaddr);
 }
 
-static void gen8_initialize_pd(struct i915_address_space *vm,
-			       struct i915_page_directory *pd)
-{
-	gen8_pde_t scratch_pde;
-
-	scratch_pde = gen8_pde_encode(px_dma(vm->scratch_pt), I915_CACHE_LLC);
-
-	fill_px(vm->dev, pd, scratch_pde);
-}
-
 static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
 {
 	int i;
@@ -645,8 +796,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
-	free_pd(vm->dev, vm->scratch_pd);
-	free_pt(vm->dev, vm->scratch_pt);
+	cleanup_scratch(vm);
 }
 
 /**
@@ -974,16 +1124,7 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
-	ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->base.scratch_pt))
-		return PTR_ERR(ppgtt->base.scratch_pt);
-
-	ppgtt->base.scratch_pd = alloc_pd(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->base.scratch_pd))
-		return PTR_ERR(ppgtt->base.scratch_pd);
-
-	gen8_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
-	gen8_initialize_pd(&ppgtt->base, ppgtt->base.scratch_pd);
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = 1ULL << 32;
@@ -996,12 +1137,18 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
+	ret = setup_scratch(&ppgtt->base);
+	if (ret)
+		return ret;
+
 	if (hw_wont_flush_pdp_tlbs(ppgtt)) {
 		/* Avoid the tlb flush bug by preallocating
 		 * whole top level pdp structure so it stays
 		 * static even if our va space grows.
 		 */
-		return gen8_preallocate_top_level_pdps(ppgtt);
+		ret = gen8_preallocate_top_level_pdps(ppgtt);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -1306,19 +1453,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_px(ppgtt, pt_vaddr);
 }
 
-static void gen6_initialize_pt(struct i915_address_space *vm,
-			       struct i915_page_table *pt)
-{
-	gen6_pte_t scratch_pte;
-
-	WARN_ON(px_dma(vm->scratch_page) == 0);
-
-	scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
-				     I915_CACHE_LLC, true, 0);
-
-	fill32_px(vm->dev, pt, scratch_pte);
-}
-
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start_in, uint64_t length_in)
 {
@@ -1423,7 +1557,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 			free_pt(ppgtt->base.dev, pt);
 	}
 
-	free_pt(vm->dev, vm->scratch_pt);
+	cleanup_scratch(vm);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
@@ -1438,11 +1572,10 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
-	ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->base.scratch_pt))
-		return PTR_ERR(ppgtt->base.scratch_pt);
 
-	gen6_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
+	ret = setup_scratch(&ppgtt->base);
+	if (ret)
+		return ret;
 
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
@@ -1473,7 +1606,7 @@ alloc:
 	return 0;
 
 err_out:
-	free_pt(ppgtt->base.dev, ppgtt->base.scratch_pt);
+	cleanup_scratch(&ppgtt->base);
 	return ret;
 }
 
@@ -1547,10 +1680,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-
 	ppgtt->base.dev = dev;
-	ppgtt->base.scratch_page = dev_priv->gtt.base.scratch_page;
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt);
@@ -2156,45 +2286,6 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
 	vm->cleanup(vm);
 }
 
-#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
-
-static int alloc_scratch_page(struct i915_address_space *vm)
-{
-	struct i915_page_scratch *sp;
-	int ret;
-
-	WARN_ON(vm->scratch_page);
-
-	sp = kzalloc(sizeof(*sp), GFP_KERNEL);
-	if (sp == NULL)
-		return -ENOMEM;
-
-	ret = __setup_page_dma(vm->dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
-	if (ret) {
-		kfree(sp);
-		return ret;
-	}
-
-	fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
-	set_pages_uc(px_page(sp), 1);
-
-	vm->scratch_page = sp;
-
-	return 0;
-}
-
-static void free_scratch_page(struct i915_address_space *vm)
-{
-	struct i915_page_scratch *sp = vm->scratch_page;
-
-	set_pages_wb(px_page(sp), 1);
-
-	cleanup_px(vm->dev, sp);
-	kfree(sp);
-
-	vm->scratch_page = NULL;
-}
-
 static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
 {
 	snb_gmch_ctl >>= SNB_GMCH_GGMS_SHIFT;
@@ -2278,7 +2369,6 @@ static int ggtt_probe_common(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	phys_addr_t gtt_phys_addr;
-	int ret;
 
 	/* For Modern GENs the PTEs and register space are split in the BAR */
 	gtt_phys_addr = pci_resource_start(dev->pdev, 0) +
@@ -2300,14 +2390,7 @@ static int ggtt_probe_common(struct drm_device *dev,
 		return -ENOMEM;
 	}
 
-	ret = alloc_scratch_page(&dev_priv->gtt.base);
-	if (ret) {
-		DRM_ERROR("Scratch setup failed\n");
-		/* iounmap will also get called at remove, but meh */
-		iounmap(dev_priv->gtt.gsm);
-	}
-
-	return ret;
+	return setup_scratch(&dev_priv->gtt.base);
 }
 
 /* The GGTT and PPGTT need a private PPAT setup in order to handle cacheability
@@ -2479,7 +2562,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 	struct i915_gtt *gtt = container_of(vm, struct i915_gtt, base);
 
 	iounmap(gtt->gsm);
-	free_scratch_page(vm);
+	cleanup_scratch(vm);
 }
 
 static int i915_gmch_probe(struct drm_device *dev,
-- 
1.9.1


* [PATCH 20/21] drm/i915/gtt: Use nonatomic bitmap ops
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (18 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 19/21] drm/i915/gtt: One instance of scratch page table/directory Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 17:07   ` Michel Thierry
  2015-05-22 17:05 ` [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions Mika Kuoppala
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

There is no need for atomicity here. Convert all bitmap
operations to nonatomic variants.
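
For background: set_bit() and friends are atomic read-modify-write
operations (a lock-prefixed instruction on x86), while the
double-underscore variants are plain loads and stores that are
only safe when updates to the bitmap are already serialised, as is
the case for these bitmaps. A fragment showing the converted calls
(kernel bitmap API, as used in the hunks below):

    DECLARE_BITMAP(new_pts, I915_PDES);

    bitmap_zero(new_pts, I915_PDES);

    __set_bit(pde, new_pts);  /* no lock prefix, no barrier */

    if (__test_and_clear_bit(pde, new_pts))
            gen6_write_pde(&ppgtt->pd, pde, pt);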

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ab113ce..95c39e5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -842,7 +842,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 
 		gen8_initialize_pt(&ppgtt->base, pt);
 		pd->page_table[pde] = pt;
-		set_bit(pde, new_pts);
+		__set_bit(pde, new_pts);
 	}
 
 	return 0;
@@ -900,7 +900,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 		gen8_initialize_pd(&ppgtt->base, pd);
 		pdp->page_directory[pdpe] = pd;
-		set_bit(pdpe, new_pds);
+		__set_bit(pdpe, new_pds);
 	}
 
 	return 0;
@@ -1040,7 +1040,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 				   gen8_pte_count(pd_start, pd_len));
 
 			/* Our pde is now pointing to the pagetable, pt */
-			set_bit(pde, pd->used_pdes);
+			__set_bit(pde, pd->used_pdes);
 
 			/* Map the PDE to the page table */
 			page_directory[pde] = gen8_pde_encode(px_dma(pt),
@@ -1052,7 +1052,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 
 		kunmap_px(ppgtt, page_directory);
 
-		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		__set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
@@ -1497,7 +1497,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		gen6_initialize_pt(vm, pt);
 
 		ppgtt->pd.page_table[pde] = pt;
-		set_bit(pde, new_page_tables);
+		__set_bit(pde, new_page_tables);
 		trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
 	}
 
@@ -1511,7 +1511,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		bitmap_set(tmp_bitmap, gen6_pte_index(start),
 			   gen6_pte_count(start, length));
 
-		if (test_and_clear_bit(pde, new_page_tables))
+		if (__test_and_clear_bit(pde, new_page_tables))
 			gen6_write_pde(&ppgtt->pd, pde, pt);
 
 		trace_i915_page_table_entry_map(vm, pde, pt,
-- 
1.9.1


* [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions
  2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
                   ` (19 preceding siblings ...)
  2015-05-22 17:05 ` [PATCH 20/21] drm/i915/gtt: Use nonatomic bitmap ops Mika Kuoppala
@ 2015-05-22 17:05 ` Mika Kuoppala
  2015-06-03 17:14   ` Michel Thierry
  20 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-05-22 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

Reorder the base page handling functions so that they appear
in alloc, free, init order. No functional changes.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 95c39e5..24f31ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -381,24 +381,6 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
 	fill_page_dma(dev, p, v);
 }
 
-static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
-{
-	cleanup_px(dev, pt);
-	kfree(pt->used_ptes);
-	kfree(pt);
-}
-
-static void gen8_initialize_pt(struct i915_address_space *vm,
-			       struct i915_page_table *pt)
-{
-	gen8_pte_t scratch_pte;
-
-	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
-				      I915_CACHE_LLC, true);
-
-	fill_px(vm->dev, pt, scratch_pte);
-}
-
 static struct i915_page_table *alloc_pt(struct drm_device *dev)
 {
 	struct i915_page_table *pt;
@@ -430,6 +412,24 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
+{
+	cleanup_px(dev, pt);
+	kfree(pt->used_ptes);
+	kfree(pt);
+}
+
+static void gen8_initialize_pt(struct i915_address_space *vm,
+			       struct i915_page_table *pt)
+{
+	gen8_pte_t scratch_pte;
+
+	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+				      I915_CACHE_LLC, true);
+
+	fill_px(vm->dev, pt, scratch_pte);
+}
+
 static void gen6_initialize_pt(struct i915_address_space *vm,
 			       struct i915_page_table *pt)
 {
@@ -441,15 +441,6 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 	fill32_px(vm->dev, pt, scratch_pte);
 }
 
-static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
-{
-	if (px_page(pd)) {
-		cleanup_px(dev, pd);
-		kfree(pd->used_pdes);
-		kfree(pd);
-	}
-}
-
 static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 {
 	struct i915_page_directory *pd;
@@ -478,6 +469,15 @@ free_pd:
 	return ERR_PTR(ret);
 }
 
+static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
+{
+	if (px_page(pd)) {
+		cleanup_px(dev, pd);
+		kfree(pd->used_pdes);
+		kfree(pd);
+	}
+}
+
 static void gen8_initialize_pd(struct i915_address_space *vm,
 			       struct i915_page_directory *pd)
 {
-- 
1.9.1


* Re: [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm.
  2015-05-22 17:04 ` [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm Mika Kuoppala
@ 2015-05-26  7:15   ` Daniel Vetter
  2015-06-11 17:38     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Daniel Vetter @ 2015-05-26  7:15 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On Fri, May 22, 2015 at 08:04:57PM +0300, Mika Kuoppala wrote:
> We can have an exactly 4GB sized ppgtt on a 32bit system.
> size_t is inadequate for this.
> 

Is there a

v2: Convert a lot more places (Daniel)

missing here? The patch looks a lot bigger, but not sure ...
-Daniel
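
(As an aside: on a 32bit kernel both size_t and unsigned long are only
32 bits wide, so an exactly 4GB vm size cannot be represented. A
minimal illustration, not from the patch itself:

	u64 total = 1ULL << 32;		/* exactly 4GB of ppgtt */
	size_t truncated = total;	/* becomes 0 on a 32bit build */

hence the wholesale size_t/unsigned long -> u64 conversion below.)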

> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/char/agp/intel-gtt.c        |  4 ++--
>  drivers/gpu/drm/i915/i915_debugfs.c | 42 ++++++++++++++++++-------------------
>  drivers/gpu/drm/i915/i915_gem.c     |  6 +++---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++----------
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 12 +++++------
>  include/drm/intel-gtt.h             |  4 ++--
>  6 files changed, 45 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 0b4188b..4734d02 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -1408,8 +1408,8 @@ int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
>  }
>  EXPORT_SYMBOL(intel_gmch_probe);
>  
> -void intel_gtt_get(size_t *gtt_total, size_t *stolen_size,
> -		   phys_addr_t *mappable_base, unsigned long *mappable_end)
> +void intel_gtt_get(u64 *gtt_total, size_t *stolen_size,
> +		   phys_addr_t *mappable_base, u64 *mappable_end)
>  {
>  	*gtt_total = intel_private.gtt_total_entries << PAGE_SHIFT;
>  	*stolen_size = intel_private.stolen_size;
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index fece922..c7a840b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -198,7 +198,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct i915_address_space *vm = &dev_priv->gtt.base;
>  	struct i915_vma *vma;
> -	size_t total_obj_size, total_gtt_size;
> +	u64 total_obj_size, total_gtt_size;
>  	int count, ret;
>  
>  	ret = mutex_lock_interruptible(&dev->struct_mutex);
> @@ -231,7 +231,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
>  	}
>  	mutex_unlock(&dev->struct_mutex);
>  
> -	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
> +	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
>  		   count, total_obj_size, total_gtt_size);
>  	return 0;
>  }
> @@ -253,7 +253,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
>  	struct drm_device *dev = node->minor->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj;
> -	size_t total_obj_size, total_gtt_size;
> +	u64 total_obj_size, total_gtt_size;
>  	LIST_HEAD(stolen);
>  	int count, ret;
>  
> @@ -292,7 +292,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
>  	}
>  	mutex_unlock(&dev->struct_mutex);
>  
> -	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
> +	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
>  		   count, total_obj_size, total_gtt_size);
>  	return 0;
>  }
> @@ -310,10 +310,10 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
>  
>  struct file_stats {
>  	struct drm_i915_file_private *file_priv;
> -	int count;
> -	size_t total, unbound;
> -	size_t global, shared;
> -	size_t active, inactive;
> +	unsigned long count;
> +	u64 total, unbound;
> +	u64 global, shared;
> +	u64 active, inactive;
>  };
>  
>  static int per_file_stats(int id, void *ptr, void *data)
> @@ -370,7 +370,7 @@ static int per_file_stats(int id, void *ptr, void *data)
>  
>  #define print_file_stats(m, name, stats) do { \
>  	if (stats.count) \
> -		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
> +		seq_printf(m, "%s: %lu objects, %llu bytes (%llu active, %llu inactive, %llu global, %llu shared, %llu unbound)\n", \
>  			   name, \
>  			   stats.count, \
>  			   stats.total, \
> @@ -420,7 +420,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  	struct drm_device *dev = node->minor->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u32 count, mappable_count, purgeable_count;
> -	size_t size, mappable_size, purgeable_size;
> +	u64 size, mappable_size, purgeable_size;
>  	struct drm_i915_gem_object *obj;
>  	struct i915_address_space *vm = &dev_priv->gtt.base;
>  	struct drm_file *file;
> @@ -437,17 +437,17 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  
>  	size = count = mappable_size = mappable_count = 0;
>  	count_objects(&dev_priv->mm.bound_list, global_list);
> -	seq_printf(m, "%u [%u] objects, %zu [%zu] bytes in gtt\n",
> +	seq_printf(m, "%u [%u] objects, %llu [%llu] bytes in gtt\n",
>  		   count, mappable_count, size, mappable_size);
>  
>  	size = count = mappable_size = mappable_count = 0;
>  	count_vmas(&vm->active_list, mm_list);
> -	seq_printf(m, "  %u [%u] active objects, %zu [%zu] bytes\n",
> +	seq_printf(m, "  %u [%u] active objects, %llu [%llu] bytes\n",
>  		   count, mappable_count, size, mappable_size);
>  
>  	size = count = mappable_size = mappable_count = 0;
>  	count_vmas(&vm->inactive_list, mm_list);
> -	seq_printf(m, "  %u [%u] inactive objects, %zu [%zu] bytes\n",
> +	seq_printf(m, "  %u [%u] inactive objects, %llu [%llu] bytes\n",
>  		   count, mappable_count, size, mappable_size);
>  
>  	size = count = purgeable_size = purgeable_count = 0;
> @@ -456,7 +456,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  		if (obj->madv == I915_MADV_DONTNEED)
>  			purgeable_size += obj->base.size, ++purgeable_count;
>  	}
> -	seq_printf(m, "%u unbound objects, %zu bytes\n", count, size);
> +	seq_printf(m, "%u unbound objects, %llu bytes\n", count, size);
>  
>  	size = count = mappable_size = mappable_count = 0;
>  	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> @@ -473,16 +473,16 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  			++purgeable_count;
>  		}
>  	}
> -	seq_printf(m, "%u purgeable objects, %zu bytes\n",
> +	seq_printf(m, "%u purgeable objects, %llu bytes\n",
>  		   purgeable_count, purgeable_size);
> -	seq_printf(m, "%u pinned mappable objects, %zu bytes\n",
> +	seq_printf(m, "%u pinned mappable objects, %llu bytes\n",
>  		   mappable_count, mappable_size);
> -	seq_printf(m, "%u fault mappable objects, %zu bytes\n",
> +	seq_printf(m, "%u fault mappable objects, %llu bytes\n",
>  		   count, size);
>  
> -	seq_printf(m, "%zu [%lu] gtt total\n",
> +	seq_printf(m, "%llu [%llu] gtt total\n",
>  		   dev_priv->gtt.base.total,
> -		   dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
> +		   (u64)dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
>  
>  	seq_putc(m, '\n');
>  	print_batch_pool_stats(m, dev_priv);
> @@ -519,7 +519,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
>  	uintptr_t list = (uintptr_t) node->info_ent->data;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj;
> -	size_t total_obj_size, total_gtt_size;
> +	u64 total_obj_size, total_gtt_size;
>  	int count, ret;
>  
>  	ret = mutex_lock_interruptible(&dev->struct_mutex);
> @@ -541,7 +541,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
>  
>  	mutex_unlock(&dev->struct_mutex);
>  
> -	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
> +	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
>  		   count, total_obj_size, total_gtt_size);
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index cc206f1..25e375c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3671,9 +3671,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	struct drm_device *dev = obj->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u32 size, fence_size, fence_alignment, unfenced_alignment;
> -	unsigned long start =
> +	u64 start =
>  		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> -	unsigned long end =
> +	u64 end =
>  		flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
>  	struct i915_vma *vma;
>  	int ret;
> @@ -3729,7 +3729,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	 * attempt to find space.
>  	 */
>  	if (size > end) {
> -		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%lu\n",
> +		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
>  			  ggtt_view ? ggtt_view->type : 0,
>  			  size,
>  			  flags & PIN_MAPPABLE ? "mappable" : "total",
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 76de781..c61de4a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2147,7 +2147,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
>  void i915_gem_init_global_gtt(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	unsigned long gtt_size, mappable_size;
> +	u64 gtt_size, mappable_size;
>  
>  	gtt_size = dev_priv->gtt.base.total;
>  	mappable_size = dev_priv->gtt.mappable_end;
> @@ -2402,13 +2402,13 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
>  }
>  
>  static int gen8_gmch_probe(struct drm_device *dev,
> -			   size_t *gtt_total,
> +			   u64 *gtt_total,
>  			   size_t *stolen,
>  			   phys_addr_t *mappable_base,
> -			   unsigned long *mappable_end)
> +			   u64 *mappable_end)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	unsigned int gtt_size;
> +	u64 gtt_size;
>  	u16 snb_gmch_ctl;
>  	int ret;
>  
> @@ -2450,10 +2450,10 @@ static int gen8_gmch_probe(struct drm_device *dev,
>  }
>  
>  static int gen6_gmch_probe(struct drm_device *dev,
> -			   size_t *gtt_total,
> +			   u64 *gtt_total,
>  			   size_t *stolen,
>  			   phys_addr_t *mappable_base,
> -			   unsigned long *mappable_end)
> +			   u64 *mappable_end)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	unsigned int gtt_size;
> @@ -2467,7 +2467,7 @@ static int gen6_gmch_probe(struct drm_device *dev,
>  	 * a coarse sanity check.
>  	 */
>  	if ((*mappable_end < (64<<20) || (*mappable_end > (512<<20)))) {
> -		DRM_ERROR("Unknown GMADR size (%lx)\n",
> +		DRM_ERROR("Unknown GMADR size (%llx)\n",
>  			  dev_priv->gtt.mappable_end);
>  		return -ENXIO;
>  	}
> @@ -2501,10 +2501,10 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>  }
>  
>  static int i915_gmch_probe(struct drm_device *dev,
> -			   size_t *gtt_total,
> +			   u64 *gtt_total,
>  			   size_t *stolen,
>  			   phys_addr_t *mappable_base,
> -			   unsigned long *mappable_end)
> +			   u64 *mappable_end)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret;
> @@ -2569,9 +2569,9 @@ int i915_gem_gtt_init(struct drm_device *dev)
>  	gtt->base.dev = dev;
>  
>  	/* GMADR is the PCI mmio aperture into the global GTT. */
> -	DRM_INFO("Memory usable by graphics device = %zdM\n",
> +	DRM_INFO("Memory usable by graphics device = %lluM\n",
>  		 gtt->base.total >> 20);
> -	DRM_DEBUG_DRIVER("GMADR size = %ldM\n", gtt->mappable_end >> 20);
> +	DRM_DEBUG_DRIVER("GMADR size = %lldM\n", gtt->mappable_end >> 20);
>  	DRM_DEBUG_DRIVER("GTT stolen size = %zdM\n", gtt->stolen_size >> 20);
>  #ifdef CONFIG_INTEL_IOMMU
>  	if (intel_iommu_gfx_mapped)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 0d46dd2..c343161 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -233,8 +233,8 @@ struct i915_address_space {
>  	struct drm_mm mm;
>  	struct drm_device *dev;
>  	struct list_head global_link;
> -	unsigned long start;		/* Start offset always 0 for dri2 */
> -	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
> +	u64 start;		/* Start offset always 0 for dri2 */
> +	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>  
>  	struct {
>  		dma_addr_t addr;
> @@ -300,9 +300,9 @@ struct i915_address_space {
>   */
>  struct i915_gtt {
>  	struct i915_address_space base;
> -	size_t stolen_size;		/* Total size of stolen memory */
>  
> -	unsigned long mappable_end;	/* End offset that we can CPU map */
> +	size_t stolen_size;		/* Total size of stolen memory */
> +	u64 mappable_end;		/* End offset that we can CPU map */
>  	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
>  	phys_addr_t mappable_base;	/* PA of our GMADR */
>  
> @@ -314,9 +314,9 @@ struct i915_gtt {
>  	int mtrr;
>  
>  	/* global gtt ops */
> -	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
> +	int (*gtt_probe)(struct drm_device *dev, u64 *gtt_total,
>  			  size_t *stolen, phys_addr_t *mappable_base,
> -			  unsigned long *mappable_end);
> +			  u64 *mappable_end);
>  };
>  
>  struct i915_hw_ppgtt {
> diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
> index b08bdad..9e9bddaa5 100644
> --- a/include/drm/intel-gtt.h
> +++ b/include/drm/intel-gtt.h
> @@ -3,8 +3,8 @@
>  #ifndef _DRM_INTEL_GTT_H
>  #define	_DRM_INTEL_GTT_H
>  
> -void intel_gtt_get(size_t *gtt_total, size_t *stolen_size,
> -		   phys_addr_t *mappable_base, unsigned long *mappable_end);
> +void intel_gtt_get(u64 *gtt_total, size_t *stolen_size,
> +		   phys_addr_t *mappable_base, u64 *mappable_end);
>  
>  int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
>  		     struct agp_bridge_data *bridge);
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-05-22 17:05 ` [PATCH 15/21] drm/i915/gtt: Fill scratch page Mika Kuoppala
@ 2015-05-27 18:12   ` Tomas Elf
  2015-06-01 15:53     ` Chris Wilson
  2015-06-11 16:37     ` Mika Kuoppala
  2015-06-03 14:03   ` Michel Thierry
  1 sibling, 2 replies; 86+ messages in thread
From: Tomas Elf @ 2015-05-27 18:12 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: miku

On 22/05/2015 18:05, Mika Kuoppala wrote:
> During review of the dynamic page tables series, I was able
> to hit a lite restore bug with execlists. I assume that
> due to an incorrect pd, the batch ran out of legit address space
> and into the scratch page area. The ACTHD was increasing
> due to scratch being all zeroes (MI_NOOPs). And as the gen8
> address space is quite large, the hangcheck happily waited
> for a long long time, keeping the process effectively stuck.
>
> According to Chris Wilson any modern gpu will grind to a halt
> if it encounters commands of all ones. This seemed to do the
> trick, and a hang was declared promptly when the gpu wandered into
> the scratch land.
>
> v2: Use 0xffff00ff pattern (Chris)

Just for my own benefit:

1. Is there any particular reason for this pattern rather than 0xffffffff?

2. Someone please correct me if I'm wrong here, but at least based on my 
own experience with gen9, submitting batch buffers filled with bad 
instructions (0xffffffff) to the GPU does not hang it. I'm guessing that 
is because there's allegedly a hardware security parser that MI_NOOPs 
out invalid instructions during execution. If that's the case here, then 
I guess we might have to come up with something else for gen9+ if we 
want to induce engine hangs once execution reaches the scratch page?

On the other hand, page faulting is supposedly no longer broken on 
gen9+, so maybe we don't need the scratch page there to begin with, and 
it's all moot at that point? Again, if I'm making no sense here, feel 
free to set things straight; I'm very curious about how all of this is 
supposed to work.

Thanks,
Tomas

>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 43fa543..a2a0c88 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2168,6 +2168,8 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>   	vm->cleanup(vm);
>   }
>
> +#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
> +
>   static int alloc_scratch_page(struct i915_address_space *vm)
>   {
>   	struct i915_page_scratch *sp;
> @@ -2185,6 +2187,7 @@ static int alloc_scratch_page(struct i915_address_space *vm)
>   		return ret;
>   	}
>
> +	fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
>   	set_pages_uc(px_page(sp), 1);
>
>   	vm->scratch_page = sp;
>


* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-05-22 17:04 ` [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps Mika Kuoppala
@ 2015-05-29 11:05   ` Michel Thierry
  2015-05-29 12:53     ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-05-29 11:05 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
> With BDW/SKL and 32bit addressing mode only, the hardware preloads
> pdps. However, the TLB invalidation only has effect on the levels below
> the pdps. This means that if the pdps change, hw might access with a
> stale pdp entry.
>
> To combat this problem, preallocate the top pdps so that hw sees
> them as immutable for each context.
>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 50 +++++++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
>   3 files changed, 68 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0ffd459..1a5ad4c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -941,6 +941,48 @@ err_out:
>          return ret;
>   }
>
> +/* With some architectures and 32bit legacy mode, hardware pre-loads the
> + * top level pdps but the tlb invalidation only invalidates the lower levels.
> + * This might lead to hw fetching with stale pdp entries if top level
> + * structure changes, ie va space grows with dynamic page tables.
> + */
> +static bool hw_wont_flush_pdp_tlbs(struct i915_hw_ppgtt *ppgtt)
> +{
> +       struct drm_device *dev = ppgtt->base.dev;
> +
> +       if (GEN8_CTX_ADDRESSING_MODE != LEGACY_32B_CONTEXT)
> +               return false;
> +
> +       if (IS_BROADWELL(dev) || IS_SKYLAKE(dev))
> +               return true;
The pd load restriction is also true for chv and bxt.
And to be safe, we can set reg 0x4030 bit14 to '1' (PD load disable). 
Since this register is not part of the context state, it can be added 
with the other platform workarounds in intel_pm.c.
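
Roughly something like this next to the other clock gating workarounds
(a sketch only: the offset is the 0x4030 register mentioned above, but
the define names here are made up and would need the proper bspec
names):

#define GEN8_PD_LOAD_DISABLE_REG	0x4030
#define   PD_LOAD_DISABLE		(1 << 14)

	I915_WRITE(GEN8_PD_LOAD_DISABLE_REG,
		   I915_READ(GEN8_PD_LOAD_DISABLE_REG) | PD_LOAD_DISABLE);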

> +
> +       return false;
> +}
> +
> +static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
> +{
> +       unsigned long *new_page_dirs, **new_page_tables;
> +       int ret;
> +
> +       /* We allocate temp bitmap for page tables for no gain
> +        * but as this is for init only, lets keep the things simple
> +        */
> +       ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
> +       if (ret)
> +               return ret;
> +
> +       /* Allocate for all pdps regardless of how the ppgtt
> +        * was defined.
> +        */
> +       ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp,
> +                                               0, 1ULL << 32,
> +                                               new_page_dirs);
> +
> +       free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +
> +       return ret;
> +}
> +
>   /*
>    * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
>    * with a net effect resembling a 2-level page table in normal x86 terms. Each
> @@ -972,6 +1014,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
>          ppgtt->switch_mm = gen8_mm_switch;
>
> +       if (hw_wont_flush_pdp_tlbs(ppgtt)) {
> +               /* Avoid the tlb flush bug by preallocating
> +                * whole top level pdp structure so it stays
> +                * static even if our va space grows.
> +                */
> +               return gen8_preallocate_top_level_pdps(ppgtt);
> +       }
> +
>          return 0;
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 6eeba63..334324b 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -2777,6 +2777,23 @@ enum skl_disp_power_wells {
>   #define VLV_CLK_CTL2                   0x101104
>   #define   CLK_CTL2_CZCOUNT_30NS_SHIFT  28
>
> +/* Context descriptor format bits */
> +#define GEN8_CTX_VALID                 (1<<0)
> +#define GEN8_CTX_FORCE_PD_RESTORE      (1<<1)
> +#define GEN8_CTX_FORCE_RESTORE         (1<<2)
> +#define GEN8_CTX_L3LLC_COHERENT                (1<<5)
> +#define GEN8_CTX_PRIVILEGE             (1<<8)
> +
> +enum {
> +       ADVANCED_CONTEXT = 0,
> +       LEGACY_32B_CONTEXT,
> +       ADVANCED_AD_CONTEXT,
> +       LEGACY_64B_CONTEXT
> +};
> +
> +#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
> +#define GEN8_CTX_ADDRESSING_MODE       LEGACY_32B_CONTEXT
> +
>   /*
>    * Overlay regs
>    */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 96ae90a..d793d4e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -183,12 +183,6 @@
>   #define CTX_R_PWR_CLK_STATE            0x42
>   #define CTX_GPGPU_CSR_BASE_ADDRESS     0x44
>
> -#define GEN8_CTX_VALID (1<<0)
> -#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
> -#define GEN8_CTX_FORCE_RESTORE (1<<2)
> -#define GEN8_CTX_L3LLC_COHERENT (1<<5)
> -#define GEN8_CTX_PRIVILEGE (1<<8)
> -
>   #define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
>          const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
>                  ppgtt->pdp.page_directory[n]->daddr : \
> @@ -198,13 +192,6 @@
>   }
>
>   enum {
> -       ADVANCED_CONTEXT = 0,
> -       LEGACY_CONTEXT,
> -       ADVANCED_AD_CONTEXT,
> -       LEGACY_64B_CONTEXT
> -};
> -#define GEN8_CTX_MODE_SHIFT 3
> -enum {
>          FAULT_AND_HANG = 0,
>          FAULT_AND_HALT, /* Debug only */
>          FAULT_AND_STREAM,
> @@ -273,7 +260,7 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
>          WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>
>          desc = GEN8_CTX_VALID;
> -       desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
> +       desc |= GEN8_CTX_ADDRESSING_MODE << GEN8_CTX_ADDRESSING_MODE_SHIFT;
>          if (IS_GEN8(ctx_obj->base.dev))
>                  desc |= GEN8_CTX_L3LLC_COHERENT;
>          desc |= GEN8_CTX_PRIVILEGE;
> --
> 1.9.1
>

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-05-29 11:05   ` Michel Thierry
@ 2015-05-29 12:53     ` Michel Thierry
  2015-06-10 11:42       ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-05-29 12:53 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/29/2015 12:05 PM, Michel Thierry wrote:
> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
>> With BDW/SKL and 32bit addressing mode only, the hardware preloads
>> pdps. However, the TLB invalidation only has effect on the levels below
>> the pdps. This means that if the pdps change, hw might access with a
>> stale pdp entry.
>>
>> To combat this problem, preallocate the top pdps so that hw sees
>> them as immutable for each context.
>>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 50 +++++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
>>   3 files changed, 68 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 0ffd459..1a5ad4c 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -941,6 +941,48 @@ err_out:
>>          return ret;
>>   }
>>
>> +/* With some architectures and 32bit legacy mode, hardware pre-loads the
>> + * top level pdps but the tlb invalidation only invalidates the lower levels.
>> + * This might lead to hw fetching with stale pdp entries if top level
>> + * structure changes, ie va space grows with dynamic page tables.
>> + */
>> +static bool hw_wont_flush_pdp_tlbs(struct i915_hw_ppgtt *ppgtt)
>> +{
>> +       struct drm_device *dev = ppgtt->base.dev;
>> +
>> +       if (GEN8_CTX_ADDRESSING_MODE != LEGACY_32B_CONTEXT)
>> +               return false;
>> +
>> +       if (IS_BROADWELL(dev) || IS_SKYLAKE(dev))
>> +               return true;
> The pd load restriction is also true for chv and bxt.
> And to be safe, we can set reg 0x4030 bit14 to '1' (PD load disable). 
> Since this register is not part of the context state, it can be added 
> with the other platform workarounds in intel_pm.c.
>
>> +
>> +       return false;
>> +}
>> +
>> +static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
>> +{
>> +       unsigned long *new_page_dirs, **new_page_tables;
>> +       int ret;
>> +
>> +       /* We allocate temp bitmap for page tables for no gain
>> +        * but as this is for init only, lets keep the things simple
>> +        */
>> +       ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
>> +       if (ret)
>> +               return ret;
>> +
>> +       /* Allocate for all pdps regardless of how the ppgtt
>> +        * was defined.
>> +        */
>> +       ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp,
>> +                                               0, 1ULL << 32,
>> +                                               new_page_dirs);
>> +
>> +       free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>> +
>> +       return ret;
>> +}
>> +
>>   /*
>>    * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
>>    * with a net effect resembling a 2-level page table in normal x86 terms. Each
>> @@ -972,6 +1014,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>
>>          ppgtt->switch_mm = gen8_mm_switch;
>>
>> +       if (hw_wont_flush_pdp_tlbs(ppgtt)) {
>> +               /* Avoid the tlb flush bug by preallocating
>> +                * whole top level pdp structure so it stays
>> +                * static even if our va space grows.
>> +                */
>> +               return gen8_preallocate_top_level_pdps(ppgtt);
>> +       }
>> +
Also, we will need the same hw_wont_flush check in the cleanup function, 
and to iterate over each pdpe (pd) from 0 to 4GiB (otherwise we will leak 
some of the preallocated page dirs).
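
Something like this in the gen8 cleanup path would cover it (just a
sketch, using the pdpe iterator from this series; free_pd() being the
name after the rename later in the series):

	struct i915_page_directory *pd;
	uint64_t temp, start = 0, length = 1ULL << 32;
	uint32_t pdpe;

	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
		if (pd)
			free_pd(ppgtt->base.dev, pd);
	}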

>>          return 0;
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h 
>> b/drivers/gpu/drm/i915/i915_reg.h
>> index 6eeba63..334324b 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -2777,6 +2777,23 @@ enum skl_disp_power_wells {
>>   #define VLV_CLK_CTL2                   0x101104
>>   #define   CLK_CTL2_CZCOUNT_30NS_SHIFT  28
>>
>> +/* Context descriptor format bits */
>> +#define GEN8_CTX_VALID                 (1<<0)
>> +#define GEN8_CTX_FORCE_PD_RESTORE      (1<<1)
>> +#define GEN8_CTX_FORCE_RESTORE         (1<<2)
>> +#define GEN8_CTX_L3LLC_COHERENT                (1<<5)
>> +#define GEN8_CTX_PRIVILEGE             (1<<8)
>> +
>> +enum {
>> +       ADVANCED_CONTEXT = 0,
>> +       LEGACY_32B_CONTEXT,
>> +       ADVANCED_AD_CONTEXT,
>> +       LEGACY_64B_CONTEXT
>> +};
>> +
>> +#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
>> +#define GEN8_CTX_ADDRESSING_MODE       LEGACY_32B_CONTEXT
>> +
>>   /*
>>    * Overlay regs
>>    */
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 96ae90a..d793d4e 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -183,12 +183,6 @@
>>   #define CTX_R_PWR_CLK_STATE            0x42
>>   #define CTX_GPGPU_CSR_BASE_ADDRESS     0x44
>>
>> -#define GEN8_CTX_VALID (1<<0)
>> -#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
>> -#define GEN8_CTX_FORCE_RESTORE (1<<2)
>> -#define GEN8_CTX_L3LLC_COHERENT (1<<5)
>> -#define GEN8_CTX_PRIVILEGE (1<<8)
>> -
>>   #define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
>>          const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
>>                  ppgtt->pdp.page_directory[n]->daddr : \
>> @@ -198,13 +192,6 @@
>>   }
>>
>>   enum {
>> -       ADVANCED_CONTEXT = 0,
>> -       LEGACY_CONTEXT,
>> -       ADVANCED_AD_CONTEXT,
>> -       LEGACY_64B_CONTEXT
>> -};
>> -#define GEN8_CTX_MODE_SHIFT 3
>> -enum {
>>          FAULT_AND_HANG = 0,
>>          FAULT_AND_HALT, /* Debug only */
>>          FAULT_AND_STREAM,
>> @@ -273,7 +260,7 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
>>          WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>>
>>          desc = GEN8_CTX_VALID;
>> -       desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
>> +       desc |= GEN8_CTX_ADDRESSING_MODE << GEN8_CTX_ADDRESSING_MODE_SHIFT;
>>          if (IS_GEN8(ctx_obj->base.dev))
>>                  desc |= GEN8_CTX_L3LLC_COHERENT;
>>          desc |= GEN8_CTX_PRIVILEGE;
>> -- 
>> 1.9.1
>>

* Re: [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+
  2015-05-22 17:04 ` [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+ Mika Kuoppala
@ 2015-06-01 14:51   ` Joonas Lahtinen
  2015-06-11 17:37     ` Mika Kuoppala
  2015-06-01 15:52   ` Michel Thierry
  1 sibling, 1 reply; 86+ messages in thread
From: Joonas Lahtinen @ 2015-06-01 14:51 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
> When we touch gen8+ page maps, mark them dirty like we
> do with previous gens.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 17b7df0..0ffd459 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -830,6 +830,15 @@ err_out:
>  	return -ENOMEM;
>  }
>  
> +/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
> + * are switching between contexts with the same LRCA, we also must do a force
> + * restore.
> + */

I think the comment could be updated now that it is used with GEN8 too.
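
E.g. something along these lines (wording just a suggestion):

/* PDE TLBs are a pain to invalidate prior to GEN8; it requires a
 * context reload. If we are switching between contexts with the same
 * LRCA, we also must do a force restore. With this patch the dirty
 * marking covers the GEN8+ page maps as well.
 */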

Regards, joonas

> +static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> +{
> +	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> +}
> +
>  static int gen8_alloc_va_range(struct i915_address_space *vm,
>  			       uint64_t start,
>  			       uint64_t length)
> @@ -915,6 +924,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>  	}
>  
>  	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +	mark_tlbs_dirty(ppgtt);
>  	return 0;
>  
>  err_out:
> @@ -927,6 +937,7 @@ err_out:
>  		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
>  
>  	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +	mark_tlbs_dirty(ppgtt);
>  	return ret;
>  }
>  
> @@ -1260,16 +1271,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  		kunmap_atomic(pt_vaddr);
>  }
>  
> -/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
> - * are switching between contexts with the same LRCA, we also must do a force
> - * restore.
> - */
> -static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> -{
> -	/* If current vm != vm, */
> -	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> -}
> -
>  static void gen6_initialize_pt(struct i915_address_space *vm,
>  		struct i915_page_table *pt)
>  {



* Re: [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error
  2015-05-22 17:04 ` [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error Mika Kuoppala
@ 2015-06-01 15:02   ` Joonas Lahtinen
  2015-06-15 10:13     ` Daniel Vetter
  0 siblings, 1 reply; 86+ messages in thread
From: Joonas Lahtinen @ 2015-06-01 15:02 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
> Free the scratch page if dma mapping fails.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index c61de4a..a608b1b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2191,8 +2191,10 @@ static int setup_scratch_page(struct drm_device *dev)
>  #ifdef CONFIG_INTEL_IOMMU
>  	dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
>  				PCI_DMA_BIDIRECTIONAL);
> -	if (pci_dma_mapping_error(dev->pdev, dma_addr))
> +	if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
> +		__free_page(page);
>  		return -EINVAL;
> +	}
>  #else
>  	dma_addr = page_to_phys(page);
>  #endif



* Re: [PATCH 03/21] drm/i915/gtt: Check va range against vm size
  2015-05-22 17:04 ` [PATCH 03/21] drm/i915/gtt: Check va range against vm size Mika Kuoppala
@ 2015-06-01 15:33   ` Joonas Lahtinen
  2015-06-11 14:23     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Joonas Lahtinen @ 2015-06-01 15:33 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
> Check the allocation area against the known end
> of address space instead of against fixed value.
> 
> v2: Return ENODEV on internal bugs (Chris)
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 1a5ad4c..76de781 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -756,9 +756,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>  
>  	WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
>  
> -	/* FIXME: upper bound must not overflow 32 bits  */
> -	WARN_ON((start + length) > (1ULL << 32));
> -
>  	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
>  		if (pd)
>  			continue;
> @@ -857,7 +854,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>  	 * actually use the other side of the canonical address space.
>  	 */
>  	if (WARN_ON(start + length < start))
> -		return -ERANGE;
> +		return -ENODEV;
> +
> +	if (WARN_ON(start + length > ppgtt->base.total))
> +		return -ENODEV;
>  
>  	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
>  	if (ret)
> @@ -1341,7 +1341,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>  }
>  
>  static int gen6_alloc_va_range(struct i915_address_space *vm,
> -			       uint64_t start, uint64_t length)
> +			       uint64_t start_in, uint64_t length_in)
>  {
>  	DECLARE_BITMAP(new_page_tables, I915_PDES);
>  	struct drm_device *dev = vm->dev;
> @@ -1349,11 +1349,15 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  				container_of(vm, struct i915_hw_ppgtt, base);
>  	struct i915_page_table *pt;
> -	const uint32_t start_save = start, length_save = length;
> +	uint32_t start, length, start_save, length_save;
>  	uint32_t pde, temp;
>  	int ret;
>  
> -	WARN_ON(upper_32_bits(start));
> +	if (WARN_ON(start_in + length_in > ppgtt->base.total))
> +		return -ENODEV;
> +
> +	start = start_save = start_in;
> +	length = length_save = length_in;

Why is it not enough just to change the WARN_ON test?

>  
>  	bitmap_zero(new_page_tables, I915_PDES);
>  



* Re: [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+
  2015-05-22 17:04 ` [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+ Mika Kuoppala
  2015-06-01 14:51   ` Joonas Lahtinen
@ 2015-06-01 15:52   ` Michel Thierry
  1 sibling, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-01 15:52 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
> When we touch gen8+ page maps, mark them dirty like we
> do with previous gens.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 17b7df0..0ffd459 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -830,6 +830,15 @@ err_out:
>          return -ENOMEM;
>   }
>
> +/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
> + * are switching between contexts with the same LRCA, we also must do a force
> + * restore.
> + */
> +static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> +{
> +       ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> +}
> +
>   static int gen8_alloc_va_range(struct i915_address_space *vm,
>                                 uint64_t start,
>                                 uint64_t length)
> @@ -915,6 +924,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>          }
>
>          free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +       mark_tlbs_dirty(ppgtt);
>          return 0;
>
>   err_out:
> @@ -927,6 +937,7 @@ err_out:
>                  unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
>
>          free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +       mark_tlbs_dirty(ppgtt);
>          return ret;
>   }
This seems to be for legacy submission only.
In execlists, it's true we cannot use Force PD Restore, but we could use 
the more expensive (but functional) Force Restore.
It could be handy (not for PDP updates, which your next patch 
pre-allocates), in case there's something odd in the ctx pd/pt update 
that we haven't seen yet.
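
I.e. as a sketch (reusing the descriptor bits this series moves into
i915_reg.h; untested, just to show the idea), the descriptor setup
could do something like:

	if (ppgtt && (intel_ring_flag(ring) & ppgtt->pd_dirty_rings))
		desc |= GEN8_CTX_FORCE_RESTORE;

in execlists_ctx_descriptor(), instead of relying on
GEN8_CTX_FORCE_PD_RESTORE.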

>
> @@ -1260,16 +1271,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>                  kunmap_atomic(pt_vaddr);
>   }
>
> -/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
> - * are switching between contexts with the same LRCA, we also must do a force
> - * restore.
> - */
> -static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> -{
> -       /* If current vm != vm, */
> -       ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> -}
> -
>   static void gen6_initialize_pt(struct i915_address_space *vm,
>                  struct i915_page_table *pt)
>   {
> --
> 1.9.1
>

* Re: [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-05-27 18:12   ` Tomas Elf
@ 2015-06-01 15:53     ` Chris Wilson
  2015-06-04 11:08       ` Tomas Elf
  2015-06-11 16:37     ` Mika Kuoppala
  1 sibling, 1 reply; 86+ messages in thread
From: Chris Wilson @ 2015-06-01 15:53 UTC (permalink / raw)
  To: Tomas Elf; +Cc: intel-gfx, miku

On Wed, May 27, 2015 at 07:12:02PM +0100, Tomas Elf wrote:
> On 22/05/2015 18:05, Mika Kuoppala wrote:
> >During review of dynamic page tables series, I was able
> >to hit a lite restore bug with execlists. I assume that
> >due to incorrect pd, the batch run out of legit address space
> >and into the scratch page area. The ACTHD was increasing
> >due to scratch being all zeroes (MI_NOOPs). And as gen8
> >address space is quite large, the hangcheck happily waited
> >for a long long time, keeping the process effectively stuck.
> >
> >According to Chris Wilson any modern gpu will grind to halt
> >if it encounters commands of all ones. This seemed to do the
> >trick and hang was declared promptly when the gpu wandered into
> >the scratch land.
> >
> >v2: Use 0xffff00ff pattern (Chris)
> 
> Just for my own benefit:
> 
> 1. Is there any particular reason for this pattern rather than 0xffffffff?

It is more obvious when userspace reads from the page and copies it into
its own data structures or surfaces. See below; if this does impact
userspace, we should probably revert this patch anyway.
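
(For reference, the 64bit magic in the patch is just that 32bit pattern
replicated: (0xffff00ffULL << 32) | 0xffff00ff == 0xffff00ffffff00ffULL,
which fill_px() then spreads across the whole scratch page.)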
 
> 2. Someone please correct me if I'm wrong here, but at least based on
> my own experience with gen9, submitting batch buffers filled with
> bad instructions (0xffffffff) to the GPU does not hang it. I'm
> guessing that is because there's allegedly a hardware security
> parser that MI_NOOPs out invalid instructions during execution. If
> that's the case here, then I guess we might have to come up with
> something else for gen9+ if we want to induce engine hangs once
> execution reaches the scratch page?

It's not a problem; there will be a GPU hang eventually (in theory at
least). Mika is just trying to short-circuit that by causing an immediate
hang.
 
> On the other hand, page faulting is supposedly no longer broken on
> gen9+, so maybe we don't need the scratch page there to begin with,
> and it's all moot at that point? Again, if I'm making no sense here,
> feel free to set things straight; I'm very curious about how all of
> this is supposed to work.

Generating a pagefault for invalid access is an ABI change and requires
opt-in (we have discussed context flags in the past). The most obvious
example is the CS prefetch, which we have to prevent from generating
faults by providing guard pages (on older chipsets at least). But as we
have been historically lax in allowing userspace to access invalid pages,
we have to assume that userspace has been taking advantage of that.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator
  2015-05-22 17:04 ` [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator Mika Kuoppala
@ 2015-06-02  9:53   ` Joonas Lahtinen
  2015-06-02  9:56   ` Michel Thierry
  1 sibling, 0 replies; 86+ messages in thread
From: Joonas Lahtinen @ 2015-06-02  9:53 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
> We are always allocating a single page. No need to be verbose so
> remove the suffix.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index a608b1b..4cf47f9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -369,7 +369,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>  	kunmap_atomic(pt_vaddr);
>  }
>  
> -static struct i915_page_table *alloc_pt_single(struct drm_device *dev)
> +static struct i915_page_table *alloc_pt(struct drm_device *dev)
>  {
>  	struct i915_page_table *pt;
>  	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
> @@ -417,7 +417,7 @@ static void unmap_and_free_pd(struct i915_page_directory *pd,
>  	}
>  }
>  
> -static struct i915_page_directory *alloc_pd_single(struct drm_device *dev)
> +static struct i915_page_directory *alloc_pd(struct drm_device *dev)
>  {
>  	struct i915_page_directory *pd;
>  	int ret = -ENOMEM;
> @@ -702,7 +702,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>  			continue;
>  		}
>  
> -		pt = alloc_pt_single(dev);
> +		pt = alloc_pt(dev);
>  		if (IS_ERR(pt))
>  			goto unwind_out;
>  
> @@ -760,7 +760,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>  		if (pd)
>  			continue;
>  
> -		pd = alloc_pd_single(dev);
> +		pd = alloc_pd(dev);
>  		if (IS_ERR(pd))
>  			goto unwind_out;
>  
> @@ -992,11 +992,11 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
>   */
>  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  {
> -	ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
> +	ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
>  	if (IS_ERR(ppgtt->scratch_pt))
>  		return PTR_ERR(ppgtt->scratch_pt);
>  
> -	ppgtt->scratch_pd = alloc_pd_single(ppgtt->base.dev);
> +	ppgtt->scratch_pd = alloc_pd(ppgtt->base.dev);
>  	if (IS_ERR(ppgtt->scratch_pd))
>  		return PTR_ERR(ppgtt->scratch_pd);
>  
> @@ -1375,7 +1375,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>  		/* We've already allocated a page table */
>  		WARN_ON(!bitmap_empty(pt->used_ptes, GEN6_PTES));
>  
> -		pt = alloc_pt_single(dev);
> +		pt = alloc_pt(dev);
>  		if (IS_ERR(pt)) {
>  			ret = PTR_ERR(pt);
>  			goto unwind_out;
> @@ -1461,7 +1461,7 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
>  	 * size. We allocate at the top of the GTT to avoid fragmentation.
>  	 */
>  	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
> -	ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
> +	ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
>  	if (IS_ERR(ppgtt->scratch_pt))
>  		return PTR_ERR(ppgtt->scratch_pt);
>  



* Re: [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator
  2015-05-22 17:04 ` [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator Mika Kuoppala
  2015-06-02  9:53   ` Joonas Lahtinen
@ 2015-06-02  9:56   ` Michel Thierry
  2015-06-15 10:14     ` Daniel Vetter
  1 sibling, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-02  9:56 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
> We are always allocating a single page. No need to be verbose so
> remove the suffix.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
I saw that another of your patches will take care of 
i915_dma_map_single/i915_dma_unmap_single...

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

* Re: [PATCH 07/21] drm/i915/gtt: Introduce i915_page_dir_dma_addr
  2015-05-22 17:05 ` [PATCH 07/21] drm/i915/gtt: Introduce i915_page_dir_dma_addr Mika Kuoppala
@ 2015-06-02 10:11   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-02 10:11 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> The legacy mode mm switch and the execlist context assignment
> need a dma address for the page directories.
>
> Introduce a function that encapsulates the scratch_pd dma
> fallback if no pd is found.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
>   drivers/gpu/drm/i915/i915_gem_gtt.h | 8 ++++++++
>   drivers/gpu/drm/i915/intel_lrc.c    | 4 +---
>   3 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 4cf47f9..18989f7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -481,10 +481,8 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>          int i, ret;
>
>          for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
> -               struct i915_page_directory *pd = ppgtt->pdp.page_directory[i];
> -               dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
> -               /* The page directory might be NULL, but we need to clear out
> -                * whatever the previous context might have used. */
> +               const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
> +
>                  ret = gen8_write_pdp(ring, i, pd_daddr);
>                  if (ret)
>                          return ret;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index c343161..da67542 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -468,6 +468,14 @@ static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
>          return i915_pte_count(address, length, GEN8_PDE_SHIFT);
>   }
>
> +static inline dma_addr_t
> +i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
> +{
> +       return test_bit(n, ppgtt->pdp.used_pdpes) ?
> +               ppgtt->pdp.page_directory[n]->daddr :
> +               ppgtt->scratch_pd->daddr;
> +}
> +
>   int i915_gem_gtt_init(struct drm_device *dev);
>   void i915_gem_init_global_gtt(struct drm_device *dev);
>   void i915_global_gtt_cleanup(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d793d4e..626949a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -184,9 +184,7 @@
>   #define CTX_GPGPU_CSR_BASE_ADDRESS     0x44
>
>   #define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
> -       const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
> -               ppgtt->pdp.page_directory[n]->daddr : \
> -               ppgtt->scratch_pd->daddr; \
> +       const u64 _addr = i915_page_dir_dma_addr((ppgtt), (n)); \
>          reg_state[CTX_PDP ## n ## _UDW+1] = upper_32_bits(_addr); \
>          reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
>   }
> --
> 1.9.1
>

* Re: [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma
  2015-05-22 17:05 ` [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma Mika Kuoppala
@ 2015-06-02 12:39   ` Michel Thierry
  2015-06-11 17:48     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-02 12:39 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> All our paging structures have a struct page and a dma address
> for that page.
>
> Add a struct for page/dma address pairs and use it to make
> the setup and teardown of the different paging structures
> identical.
>
> Also include the page directory offset in the struct for legacy
> gens, and rename it to clearly point out that it is an offset into
> the ggtt.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c |   2 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 120 ++++++++++++++----------------------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  21 ++++---
>   3 files changed, 60 insertions(+), 83 deletions(-)
>
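[ For reference while reading the diff: the new pair boils down to
  something like this (a sketch; the actual definition lands in
  i915_gem_gtt.h further down the patch):

	struct i915_page_dma {
		struct page *page;
		union {
			dma_addr_t daddr;	/* dma mapped page */
			uint32_t ggtt_offset;	/* legacy gens only */
		};
	};

  which struct i915_page_table/_directory then embed as "base". ]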
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index c7a840b..22770aa 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2245,7 +2245,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>                  struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>
>                  seq_puts(m, "aliasing PPGTT:\n");
> -               seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
> +               seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.base.ggtt_offset);
>
>                  ppgtt->debug_dump(ppgtt, m);
>          }
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 18989f7..1e1a7a1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -301,52 +301,39 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
>          return pte;
>   }
>
> -#define i915_dma_unmap_single(px, dev) \
> -       __i915_dma_unmap_single((px)->daddr, dev)
> -
> -static void __i915_dma_unmap_single(dma_addr_t daddr,
> -                                   struct drm_device *dev)
> +static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   {
>          struct device *device = &dev->pdev->dev;
>
> -       dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
> -}
> -
> -/**
> - * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
> - * @px:        Page table/dir/etc to get a DMA map for
> - * @dev:       drm device
> - *
> - * Page table allocations are unified across all gens. They always require a
> - * single 4k allocation, as well as a DMA mapping. If we keep the structs
> - * symmetric here, the simple macro covers us for every page table type.
> - *
> - * Return: 0 if success.
> - */
> -#define i915_dma_map_single(px, dev) \
> -       i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
> +       p->page = alloc_page(GFP_KERNEL);
> +       if (!p->page)
> +               return -ENOMEM;
>
> -static int i915_dma_map_page_single(struct page *page,
> -                                   struct drm_device *dev,
> -                                   dma_addr_t *daddr)
> -{
> -       struct device *device = &dev->pdev->dev;
> +       p->daddr = dma_map_page(device,
> +                               p->page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
>
> -       *daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
> -       if (dma_mapping_error(device, *daddr))
> -               return -ENOMEM;
> +       if (dma_mapping_error(device, p->daddr)) {
> +               __free_page(p->page);
> +               return -EINVAL;
> +       }
>
>          return 0;
>   }
>
> -static void unmap_and_free_pt(struct i915_page_table *pt,
> -                              struct drm_device *dev)
> +static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   {
> -       if (WARN_ON(!pt->page))
> +       if (WARN_ON(!p->page))
>                  return;
>
> -       i915_dma_unmap_single(pt, dev);
> -       __free_page(pt->page);
> +       dma_unmap_page(&dev->pdev->dev, p->daddr, 4096, PCI_DMA_BIDIRECTIONAL);
> +       __free_page(p->page);
> +       memset(p, 0, sizeof(*p));
> +}
> +
> +static void unmap_and_free_pt(struct i915_page_table *pt,
> +                              struct drm_device *dev)
> +{
> +       cleanup_page_dma(dev, &pt->base);
>          kfree(pt->used_ptes);
>          kfree(pt);
>   }
> @@ -357,7 +344,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>          gen8_pte_t *pt_vaddr, scratch_pte;
>          int i;
>
> -       pt_vaddr = kmap_atomic(pt->page);
> +       pt_vaddr = kmap_atomic(pt->base.page);
>          scratch_pte = gen8_pte_encode(vm->scratch.addr,
>                                        I915_CACHE_LLC, true);
>
> @@ -386,19 +373,13 @@ static struct i915_page_table *alloc_pt(struct drm_device *dev)
>          if (!pt->used_ptes)
>                  goto fail_bitmap;
>
> -       pt->page = alloc_page(GFP_KERNEL);
> -       if (!pt->page)
> -               goto fail_page;
> -
> -       ret = i915_dma_map_single(pt, dev);
> +       ret = setup_page_dma(dev, &pt->base);
>          if (ret)
> -               goto fail_dma;
> +               goto fail_page_m;
>
>          return pt;
>
> -fail_dma:
> -       __free_page(pt->page);
> -fail_page:
> +fail_page_m:
>          kfree(pt->used_ptes);
>   fail_bitmap:
>          kfree(pt);
> @@ -409,9 +390,8 @@ fail_bitmap:
>   static void unmap_and_free_pd(struct i915_page_directory *pd,
>                                struct drm_device *dev)
>   {
> -       if (pd->page) {
> -               i915_dma_unmap_single(pd, dev);
> -               __free_page(pd->page);
> +       if (pd->base.page) {
> +               cleanup_page_dma(dev, &pd->base);
>                  kfree(pd->used_pdes);
>                  kfree(pd);
>          }
> @@ -431,18 +411,12 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
>          if (!pd->used_pdes)
>                  goto free_pd;
>
> -       pd->page = alloc_page(GFP_KERNEL);
> -       if (!pd->page)
> -               goto free_bitmap;
> -
> -       ret = i915_dma_map_single(pd, dev);
> +       ret = setup_page_dma(dev, &pd->base);
>          if (ret)
> -               goto free_page;
> +               goto free_bitmap;
>
>          return pd;
>
> -free_page:
> -       __free_page(pd->page);
>   free_bitmap:
>          kfree(pd->used_pdes);
>   free_pd:
> @@ -523,10 +497,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>
>                  pt = pd->page_table[pde];
>
> -               if (WARN_ON(!pt->page))
> +               if (WARN_ON(!pt->base.page))
>                          continue;
>
> -               page_table = pt->page;
> +               page_table = pt->base.page;
>
>                  last_pte = pte + num_entries;
>                  if (last_pte > GEN8_PTES)
> @@ -573,7 +547,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>                  if (pt_vaddr == NULL) {
>                          struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
>                          struct i915_page_table *pt = pd->page_table[pde];
> -                       struct page *page_table = pt->page;
> +                       struct page *page_table = pt->base.page;
>
>                          pt_vaddr = kmap_atomic(page_table);
>                  }
> @@ -605,7 +579,7 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
>                               struct drm_device *dev)
>   {
>          gen8_pde_t entry =
> -               gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
> +               gen8_pde_encode(dev, pt->base.daddr, I915_CACHE_LLC);
>          *pde = entry;
>   }
>
> @@ -618,7 +592,7 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>          struct i915_page_table *pt;
>          int i;
>
> -       page_directory = kmap_atomic(pd->page);
> +       page_directory = kmap_atomic(pd->base.page);
>          pt = ppgtt->scratch_pt;
>          for (i = 0; i < I915_PDES; i++)
>                  /* Map the PDE to the page table */
> @@ -633,7 +607,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
>   {
>          int i;
>
> -       if (!pd->page)
> +       if (!pd->base.page)
>                  return;
>
>          for_each_set_bit(i, pd->used_pdes, I915_PDES) {
> @@ -883,7 +857,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>          /* Allocations have completed successfully, so set the bitmaps, and do
>           * the mappings. */
>          gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -               gen8_pde_t *const page_directory = kmap_atomic(pd->page);
> +               gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
>                  struct i915_page_table *pt;
>                  uint64_t pd_len = gen8_clamp_pd(start, length);
>                  uint64_t pd_start = start;
> @@ -1037,7 +1011,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>          gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>                  u32 expected;
>                  gen6_pte_t *pt_vaddr;
> -               dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
> +               dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->base.daddr;
>                  pd_entry = readl(ppgtt->pd_addr + pde);
>                  expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>
> @@ -1048,7 +1022,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>                                     expected);
>                  seq_printf(m, "\tPDE: %x\n", pd_entry);
>
> -               pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
> +               pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
>                  for (pte = 0; pte < GEN6_PTES; pte+=4) {
>                          unsigned long va =
>                                  (pde * PAGE_SIZE * GEN6_PTES) +
> @@ -1083,7 +1057,7 @@ static void gen6_write_pde(struct i915_page_directory *pd,
>                  container_of(pd, struct i915_hw_ppgtt, pd);
>          u32 pd_entry;
>
> -       pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
> +       pd_entry = GEN6_PDE_ADDR_ENCODE(pt->base.daddr);
>          pd_entry |= GEN6_PDE_VALID;
>
>          writel(pd_entry, ppgtt->pd_addr + pde);
> @@ -1108,9 +1082,9 @@ static void gen6_write_page_range(struct drm_i915_private *dev_priv,
>
>   static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>   {
> -       BUG_ON(ppgtt->pd.pd_offset & 0x3f);
> +       BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
>
> -       return (ppgtt->pd.pd_offset / 64) << 16;
> +       return (ppgtt->pd.base.ggtt_offset / 64) << 16;
>   }
>
>   static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
> @@ -1273,7 +1247,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>                  if (last_pte > GEN6_PTES)
>                          last_pte = GEN6_PTES;
>
> -               pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
> +               pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
>
>                  for (i = first_pte; i < last_pte; i++)
>                          pt_vaddr[i] = scratch_pte;
> @@ -1302,7 +1276,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>          pt_vaddr = NULL;
>          for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>                  if (pt_vaddr == NULL)
> -                       pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
> +                       pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
>
>                  pt_vaddr[act_pte] =
>                          vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -1330,7 +1304,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>          scratch_pte = vm->pte_encode(vm->scratch.addr,
>                          I915_CACHE_LLC, true, 0);
>
> -       pt_vaddr = kmap_atomic(pt->page);
> +       pt_vaddr = kmap_atomic(pt->base.page);
>
>          for (i = 0; i < GEN6_PTES; i++)
>                  pt_vaddr[i] = scratch_pte;
> @@ -1546,11 +1520,11 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>          ppgtt->base.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
>          ppgtt->debug_dump = gen6_dump_ppgtt;
>
> -       ppgtt->pd.pd_offset =
> +       ppgtt->pd.base.ggtt_offset =
>                  ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
>
>          ppgtt->pd_addr = (gen6_pte_t __iomem *)dev_priv->gtt.gsm +
> -               ppgtt->pd.pd_offset / sizeof(gen6_pte_t);
> +               ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
>
>          gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
>
> @@ -1561,7 +1535,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>                           ppgtt->node.start / PAGE_SIZE);
>
>          DRM_DEBUG("Adding PPGTT at offset %x\n",
> -                 ppgtt->pd.pd_offset << 10);
> +                 ppgtt->pd.base.ggtt_offset << 10);
>
>          return 0;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index da67542..666decc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -205,19 +205,22 @@ struct i915_vma {
>   #define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
>   };
>
> -struct i915_page_table {
> +struct i915_page_dma {
>          struct page *page;
> -       dma_addr_t daddr;
> +       union {
> +               dma_addr_t daddr;
> +               uint32_t ggtt_offset;

Maybe also add a comment saying "gen6/7 - this is the offset into the
GGTT where the (current context's) PPGTT page directory begins", so
there's no doubt about what ggtt_offset is.
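
Purely as a sketch, the suggested annotation could look something like
(comment wording is only a proposal):

	union {
		dma_addr_t daddr;

		/* For gen6/7 only. This is the offset in the GGTT
		 * where the page directory entries for the current
		 * context's PPGTT begin.
		 */
		uint32_t ggtt_offset;
	};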

> +       };
> +};
> +
> +struct i915_page_table {
> +       struct i915_page_dma base;
>
>          unsigned long *used_ptes;
>   };
>
>   struct i915_page_directory {
> -       struct page *page; /* NULL for GEN6-GEN7 */
> -       union {
> -               uint32_t pd_offset;
> -               dma_addr_t daddr;
> -       };
> +       struct i915_page_dma base;
>
>          unsigned long *used_pdes;
>          struct i915_page_table *page_table[I915_PDES]; /* PDEs */
> @@ -472,8 +475,8 @@ static inline dma_addr_t
>   i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
>   {
>          return test_bit(n, ppgtt->pdp.used_pdpes) ?
> -               ppgtt->pdp.page_directory[n]->daddr :
> -               ppgtt->scratch_pd->daddr;
> +               ppgtt->pdp.page_directory[n]->base.daddr :
> +               ppgtt->scratch_pd->base.daddr;
>   }
>
>   int i915_gem_gtt_init(struct drm_device *dev);
> --
> 1.9.1
>

* Re: [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px
  2015-05-22 17:05 ` [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px Mika Kuoppala
@ 2015-06-02 13:08   ` Michel Thierry
  2015-06-11 17:48     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-02 13:08 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> All the paging structures are now similar and mapped for
> dma. The unmapping is taken care of by common accessors, so
> don't overload the reader with such details.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 32 +++++++++++++++-----------------
>   1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 1e1a7a1..f58aa63 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -330,8 +330,7 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>          memset(p, 0, sizeof(*p));
>   }
>
> -static void unmap_and_free_pt(struct i915_page_table *pt,
> -                              struct drm_device *dev)
> +static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   {
>          cleanup_page_dma(dev, &pt->base);
>          kfree(pt->used_ptes);
> @@ -387,8 +386,7 @@ fail_bitmap:
>          return ERR_PTR(ret);
>   }
>
> -static void unmap_and_free_pd(struct i915_page_directory *pd,
> -                             struct drm_device *dev)
> +static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)

alloc_pd has a goto label with this same name.
Can you change the labels from free_bitmap & free_pd to fail_page_m &
fail_bitmap in this patch? (alloc_pt uses these names.) Something like:

@@ -407,17 +407,17 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
      pd->used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES),
                  sizeof(*pd->used_pdes), GFP_KERNEL);
      if (!pd->used_pdes)
-        goto free_pd;
+        goto fail_bitmap;

      ret = setup_page_dma(dev, &pd->base);
      if (ret)
-        goto free_bitmap;
+        goto fail_page_m;

      return pd;

-free_bitmap:
+fail_page_m:
      kfree(pd->used_pdes);
-free_pd:
+fail_bitmap:
      kfree(pd);

      return ERR_PTR(ret);


* Re: [PATCH 10/21] drm/i915/gtt: Remove superfluous free_pd with gen6/7
  2015-05-22 17:05 ` [PATCH 10/21] drm/i915/gtt: Remove superfluous free_pd with gen6/7 Mika Kuoppala
@ 2015-06-02 14:07   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-02 14:07 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> This has slipped in somewhere but it was harmless
> as we check the page pointer before teardown.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Right, free_pd is only for gen8+.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index f58aa63..f747bd3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1416,7 +1416,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>          }
>
>          free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
> -       free_pd(ppgtt->base.dev, &ppgtt->pd);
>   }
>
>   static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
> --
> 1.9.1
>

* Re: [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma()
  2015-05-22 17:05 ` [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma() Mika Kuoppala
@ 2015-06-02 14:51   ` Michel Thierry
  2015-06-02 15:01     ` Ville Syrjälä
  2015-06-11 17:50     ` Mika Kuoppala
  0 siblings, 2 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-02 14:51 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> When we set up page directories and tables, we point the entries
> to the next level scratch structure. Make this generic
> by introducing a fill_page_dma which maps and flushes. We also
> need a 32 bit variant for legacy gens.
>
> v2: Fix flushes and handle valleyview (Ville)
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 71 +++++++++++++++++++------------------
>   1 file changed, 37 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index f747bd3..d020b5e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -330,6 +330,31 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>          memset(p, 0, sizeof(*p));
>   }
>
> +static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
> +                         const uint64_t val)
> +{
> +       int i;
> +       uint64_t * const vaddr = kmap_atomic(p->page);
> +
> +       for (i = 0; i < 512; i++)
> +               vaddr[i] = val;
> +
> +       if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
> +               drm_clflush_virt_range(vaddr, PAGE_SIZE);

Cherryview returns true for IS_VALLEYVIEW().

You could use (!HAS_LLC && IS_CHERRYVIEW) instead, to flush on chv but
not on vlv... but to make it bxt-proof, (!HAS_LLC &&
INTEL_INFO(dev)->gen >= 8) is probably better.
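
E.g. a minimal sketch of the suggested check (untested):

	/* CHV (and presumably BXT) lack LLC but, unlike VLV, still
	 * need the CPU writes flushed out manually.
	 */
	if (!HAS_LLC(dev) && INTEL_INFO(dev)->gen >= 8)
		drm_clflush_virt_range(vaddr, PAGE_SIZE);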

> +
> +       kunmap_atomic(vaddr);
> +}
> +
> +static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
> +                            const uint32_t val32)
> +{
> +       uint64_t v = val32;
> +
> +       v = v << 32 | val32;
> +
> +       fill_page_dma(dev, p, v);
> +}
> +
>   static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   {
>          cleanup_page_dma(dev, &pt->base);
> @@ -340,19 +365,11 @@ static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   static void gen8_initialize_pt(struct i915_address_space *vm,
>                                 struct i915_page_table *pt)
>   {
> -       gen8_pte_t *pt_vaddr, scratch_pte;
> -       int i;
> -
> -       pt_vaddr = kmap_atomic(pt->base.page);
> -       scratch_pte = gen8_pte_encode(vm->scratch.addr,
> -                                     I915_CACHE_LLC, true);
> +       gen8_pte_t scratch_pte;
>
> -       for (i = 0; i < GEN8_PTES; i++)
> -               pt_vaddr[i] = scratch_pte;
> +       scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
>
> -       if (!HAS_LLC(vm->dev))
> -               drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -       kunmap_atomic(pt_vaddr);
> +       fill_page_dma(vm->dev, &pt->base, scratch_pte);
>   }
>
>   static struct i915_page_table *alloc_pt(struct drm_device *dev)
> @@ -585,20 +602,13 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>                                 struct i915_page_directory *pd)
>   {
>          struct i915_hw_ppgtt *ppgtt =
> -                       container_of(vm, struct i915_hw_ppgtt, base);
> -       gen8_pde_t *page_directory;
> -       struct i915_page_table *pt;
> -       int i;
> +               container_of(vm, struct i915_hw_ppgtt, base);
> +       gen8_pde_t scratch_pde;
>
> -       page_directory = kmap_atomic(pd->base.page);
> -       pt = ppgtt->scratch_pt;
> -       for (i = 0; i < I915_PDES; i++)
> -               /* Map the PDE to the page table */
> -               __gen8_do_map_pt(page_directory + i, pt, vm->dev);
> +       scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
> +                                     I915_CACHE_LLC);
>
> -       if (!HAS_LLC(vm->dev))
> -               drm_clflush_virt_range(page_directory, PAGE_SIZE);
> -       kunmap_atomic(page_directory);
> +       fill_page_dma(vm->dev, &pd->base, scratch_pde);
>   }
>
>   static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
> @@ -1292,22 +1302,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   }
>
>   static void gen6_initialize_pt(struct i915_address_space *vm,
> -               struct i915_page_table *pt)
> +                              struct i915_page_table *pt)
>   {
> -       gen6_pte_t *pt_vaddr, scratch_pte;
> -       int i;
> +       gen6_pte_t scratch_pte;
>
>          WARN_ON(vm->scratch.addr == 0);
>
> -       scratch_pte = vm->pte_encode(vm->scratch.addr,
> -                       I915_CACHE_LLC, true, 0);
> -
> -       pt_vaddr = kmap_atomic(pt->base.page);
> -
> -       for (i = 0; i < GEN6_PTES; i++)
> -               pt_vaddr[i] = scratch_pte;
> +       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>
> -       kunmap_atomic(pt_vaddr);
> +       fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
>   }
>
>   static int gen6_alloc_va_range(struct i915_address_space *vm,
> --
> 1.9.1
>

* Re: [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma()
  2015-06-02 14:51   ` Michel Thierry
@ 2015-06-02 15:01     ` Ville Syrjälä
  2015-06-15 10:16       ` Daniel Vetter
  2015-06-11 17:50     ` Mika Kuoppala
  1 sibling, 1 reply; 86+ messages in thread
From: Ville Syrjälä @ 2015-06-02 15:01 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Jun 02, 2015 at 03:51:26PM +0100, Michel Thierry wrote:
> On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> > When we set up page directories and tables, we point the entries
> > to the next level scratch structure. Make this generic
> > by introducing a fill_page_dma which maps and flushes. We also
> > need a 32 bit variant for legacy gens.
> >
> > v2: Fix flushes and handle valleyview (Ville)
> >
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_gtt.c | 71 +++++++++++++++++++------------------
> >   1 file changed, 37 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index f747bd3..d020b5e 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -330,6 +330,31 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
> >          memset(p, 0, sizeof(*p));
> >   }
> >
> > +static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
> > +                         const uint64_t val)
> > +{
> > +       int i;
> > +       uint64_t * const vaddr = kmap_atomic(p->page);
> > +
> > +       for (i = 0; i < 512; i++)
> > +               vaddr[i] = val;
> > +
> > +       if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
> > +               drm_clflush_virt_range(vaddr, PAGE_SIZE);
> 
> Cherryview returns true for IS_VALLEYVIEW().
> 
> You could use (!HAS_LLC && IS_CHERRYVIEW) instead, to flush on chv but
> not on vlv... but to make it bxt-proof, (!HAS_LLC &&
> INTEL_INFO(dev)->gen >= 8) is probably better.

Has someone actually confirmed that BXT needs the clflush?

> 
> > +
> > +       kunmap_atomic(vaddr);
> > +}
> > +
> > +static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
> > +                            const uint32_t val32)
> > +{
> > +       uint64_t v = val32;
> > +
> > +       v = v << 32 | val32;
> > +
> > +       fill_page_dma(dev, p, v);
> > +}
> > +
> >   static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
> >   {
> >          cleanup_page_dma(dev, &pt->base);
> > @@ -340,19 +365,11 @@ static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
> >   static void gen8_initialize_pt(struct i915_address_space *vm,
> >                                 struct i915_page_table *pt)
> >   {
> > -       gen8_pte_t *pt_vaddr, scratch_pte;
> > -       int i;
> > -
> > -       pt_vaddr = kmap_atomic(pt->base.page);
> > -       scratch_pte = gen8_pte_encode(vm->scratch.addr,
> > -                                     I915_CACHE_LLC, true);
> > +       gen8_pte_t scratch_pte;
> >
> > -       for (i = 0; i < GEN8_PTES; i++)
> > -               pt_vaddr[i] = scratch_pte;
> > +       scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
> >
> > -       if (!HAS_LLC(vm->dev))
> > -               drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> > -       kunmap_atomic(pt_vaddr);
> > +       fill_page_dma(vm->dev, &pt->base, scratch_pte);
> >   }
> >
> >   static struct i915_page_table *alloc_pt(struct drm_device *dev)
> > @@ -585,20 +602,13 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
> >                                 struct i915_page_directory *pd)
> >   {
> >          struct i915_hw_ppgtt *ppgtt =
> > -                       container_of(vm, struct i915_hw_ppgtt, base);
> > -       gen8_pde_t *page_directory;
> > -       struct i915_page_table *pt;
> > -       int i;
> > +               container_of(vm, struct i915_hw_ppgtt, base);
> > +       gen8_pde_t scratch_pde;
> >
> > -       page_directory = kmap_atomic(pd->base.page);
> > -       pt = ppgtt->scratch_pt;
> > -       for (i = 0; i < I915_PDES; i++)
> > -               /* Map the PDE to the page table */
> > -               __gen8_do_map_pt(page_directory + i, pt, vm->dev);
> > +       scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
> > +                                     I915_CACHE_LLC);
> >
> > -       if (!HAS_LLC(vm->dev))
> > -               drm_clflush_virt_range(page_directory, PAGE_SIZE);
> > -       kunmap_atomic(page_directory);
> > +       fill_page_dma(vm->dev, &pd->base, scratch_pde);
> >   }
> >
> >   static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
> > @@ -1292,22 +1302,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
> >   }
> >
> >   static void gen6_initialize_pt(struct i915_address_space *vm,
> > -               struct i915_page_table *pt)
> > +                              struct i915_page_table *pt)
> >   {
> > -       gen6_pte_t *pt_vaddr, scratch_pte;
> > -       int i;
> > +       gen6_pte_t scratch_pte;
> >
> >          WARN_ON(vm->scratch.addr == 0);
> >
> > -       scratch_pte = vm->pte_encode(vm->scratch.addr,
> > -                       I915_CACHE_LLC, true, 0);
> > -
> > -       pt_vaddr = kmap_atomic(pt->base.page);
> > -
> > -       for (i = 0; i < GEN6_PTES; i++)
> > -               pt_vaddr[i] = scratch_pte;
> > +       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
> >
> > -       kunmap_atomic(pt_vaddr);
> > +       fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
> >   }
> >
> >   static int gen6_alloc_va_range(struct i915_address_space *vm,
> > --
> > 1.9.1
> >

-- 
Ville Syrjälä
Intel OTC

* Re: [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page
  2015-05-22 17:05 ` [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page Mika Kuoppala
@ 2015-06-03 10:55   ` Michel Thierry
  2015-06-11 17:50     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 10:55 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> As there is flushing involved after we have done the cpu
> write, make functions for mapping into cpu space. Make macros
> to map any type of paging structure.
>
> v2: Make it clear that flushing kunmap is only for ppgtt (Ville)
>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 73 +++++++++++++++++++------------------
>   1 file changed, 38 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index d020b5e..072295f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -330,19 +330,35 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>          memset(p, 0, sizeof(*p));
>   }
>
> +static void *kmap_page_dma(struct i915_page_dma *p)
> +{
> +       return kmap_atomic(p->page);
> +}
> +
> +/* We use the flushing unmap only with ppgtt structures:
> + * page directories, page tables and scratch pages.
> + */
> +static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
> +{
> +       if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
> +               drm_clflush_virt_range(vaddr, PAGE_SIZE);

Same comment as in the previous patch (at least IS_CHERRYVIEW, until we
have more insight into bxt).

> +
> +       kunmap_atomic(vaddr);
> +}
> +
> +#define kmap_px(px) kmap_page_dma(&(px)->base)
> +#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr));
> +
>   static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
>                            const uint64_t val)
>   {
>          int i;
> -       uint64_t * const vaddr = kmap_atomic(p->page);
> +       uint64_t * const vaddr = kmap_page_dma(p);
>
>          for (i = 0; i < 512; i++)
>                  vaddr[i] = val;
>
> -       if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
> -               drm_clflush_virt_range(vaddr, PAGE_SIZE);
> -
> -       kunmap_atomic(vaddr);
> +       kunmap_page_dma(dev, vaddr);
>   }
>
>   static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
> @@ -500,7 +516,6 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>          while (num_entries) {
>                  struct i915_page_directory *pd;
>                  struct i915_page_table *pt;
> -               struct page *page_table;
>
>                  if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
>                          continue;
> @@ -515,22 +530,18 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>                  if (WARN_ON(!pt->base.page))
>                          continue;
>
> -               page_table = pt->base.page;
> -
>                  last_pte = pte + num_entries;
>                  if (last_pte > GEN8_PTES)
>                          last_pte = GEN8_PTES;
>
> -               pt_vaddr = kmap_atomic(page_table);
> +               pt_vaddr = kmap_px(pt);
>
>                  for (i = pte; i < last_pte; i++) {
>                          pt_vaddr[i] = scratch_pte;
>                          num_entries--;
>                  }
>
> -               if (!HAS_LLC(ppgtt->base.dev))
> -                       drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -               kunmap_atomic(pt_vaddr);
> +               kunmap_px(ppgtt, pt);
>
>                  pte = 0;
>                  if (++pde == I915_PDES) {
> @@ -562,18 +573,14 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>                  if (pt_vaddr == NULL) {
>                          struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
>                          struct i915_page_table *pt = pd->page_table[pde];
> -                       struct page *page_table = pt->base.page;
> -
> -                       pt_vaddr = kmap_atomic(page_table);
> +                       pt_vaddr = kmap_px(pt);
>                  }
>
>                  pt_vaddr[pte] =
>                          gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>                                          cache_level, true);
>                  if (++pte == GEN8_PTES) {
> -                       if (!HAS_LLC(ppgtt->base.dev))
> -                               drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -                       kunmap_atomic(pt_vaddr);
> +                       kunmap_px(ppgtt, pt_vaddr);
>                          pt_vaddr = NULL;
>                          if (++pde == I915_PDES) {
>                                  pdpe++;
> @@ -582,11 +589,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>                          pte = 0;
>                  }
>          }
> -       if (pt_vaddr) {
> -               if (!HAS_LLC(ppgtt->base.dev))
> -                       drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -               kunmap_atomic(pt_vaddr);
> -       }
> +
> +       if (pt_vaddr)
> +               kunmap_px(ppgtt, pt_vaddr);
>   }
>
>   static void __gen8_do_map_pt(gen8_pde_t * const pde,
> @@ -865,7 +870,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>          /* Allocations have completed successfully, so set the bitmaps, and do
>           * the mappings. */
>          gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -               gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
> +               gen8_pde_t *const page_directory = kmap_px(pd);
>                  struct i915_page_table *pt;
>                  uint64_t pd_len = gen8_clamp_pd(start, length);
>                  uint64_t pd_start = start;
> @@ -895,10 +900,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>                           * point we're still relying on insert_entries() */
>                  }
>
> -               if (!HAS_LLC(vm->dev))
> -                       drm_clflush_virt_range(page_directory, PAGE_SIZE);
> -
> -               kunmap_atomic(page_directory);
> +               kunmap_px(ppgtt, page_directory);
>
>                  set_bit(pdpe, ppgtt->pdp.used_pdpes);
>          }
> @@ -1030,7 +1032,8 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>                                     expected);
>                  seq_printf(m, "\tPDE: %x\n", pd_entry);
>
> -               pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
> +               pt_vaddr = kmap_px(ppgtt->pd.page_table[pde]);
> +
>                  for (pte = 0; pte < GEN6_PTES; pte+=4) {
>                          unsigned long va =
>                                  (pde * PAGE_SIZE * GEN6_PTES) +
> @@ -1052,7 +1055,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>                          }
>                          seq_puts(m, "\n");
>                  }
> -               kunmap_atomic(pt_vaddr);
> +               kunmap_px(ppgtt, pt_vaddr);
>          }
>   }
>
> @@ -1255,12 +1258,12 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>                  if (last_pte > GEN6_PTES)
>                          last_pte = GEN6_PTES;
>
> -               pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
> +               pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
>
>                  for (i = first_pte; i < last_pte; i++)
>                          pt_vaddr[i] = scratch_pte;
>
> -               kunmap_atomic(pt_vaddr);
> +               kunmap_px(ppgtt, pt_vaddr);
>
>                  num_entries -= last_pte - first_pte;
>                  first_pte = 0;
> @@ -1284,21 +1287,21 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>          pt_vaddr = NULL;
>          for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>                  if (pt_vaddr == NULL)
> -                       pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
> +                       pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
>
>                  pt_vaddr[act_pte] =
>                          vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
>                                         cache_level, true, flags);
>
>                  if (++act_pte == GEN6_PTES) {
> -                       kunmap_atomic(pt_vaddr);
> +                       kunmap_px(ppgtt, pt_vaddr);
>                          pt_vaddr = NULL;
>                          act_pt++;
>                          act_pte = 0;
>                  }
>          }
>          if (pt_vaddr)
> -               kunmap_atomic(pt_vaddr);
> +               kunmap_px(ppgtt, pt_vaddr);
>   }
>
>   static void gen6_initialize_pt(struct i915_address_space *vm,
> --
> 1.9.1
>

* Re: [PATCH 13/21] drm/i915/gtt: Use macros to access dma mapped pages
  2015-05-22 17:05 ` [PATCH 13/21] drm/i915/gtt: Use macros to access dma mapped pages Mika Kuoppala
@ 2015-06-03 10:57   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 10:57 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> Make paging structure type agnostic *_px macros to access
> the page dma struct, the backing page and the dma address.
>
> This makes the code less cluttered with the internals of
> i915_page_dma.
>
> v2: Superfluous const -> nonconst removed
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Since we're ending up with macros for a lot of things, should we add one
for ppgtt->base.dev? Looks like we could use it in ~20 places.
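
Something like this, purely as a sketch (the name is made up):

#define px_dev(ppgtt) ((ppgtt)->base.dev)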

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 37 +++++++++++++++++++++----------------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  8 ++++++--
>   2 files changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 072295f..4f9a000 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -346,8 +346,13 @@ static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
>          kunmap_atomic(vaddr);
>   }
>
> -#define kmap_px(px) kmap_page_dma(&(px)->base)
> -#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr));
> +#define kmap_px(px) kmap_page_dma(px_base(px))
> +#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr))
> +
> +#define setup_px(dev, px) setup_page_dma((dev), px_base(px))
> +#define cleanup_px(dev, px) cleanup_page_dma((dev), px_base(px))
> +#define fill_px(dev, px, v) fill_page_dma((dev), px_base(px), (v))
> +#define fill32_px(dev, px, v) fill_page_dma_32((dev), px_base(px), (v))
>
>   static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
>                            const uint64_t val)
> @@ -373,7 +378,7 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
>
>   static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   {
> -       cleanup_page_dma(dev, &pt->base);
> +       cleanup_px(dev, pt);
>          kfree(pt->used_ptes);
>          kfree(pt);
>   }
> @@ -385,7 +390,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>
>          scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
>
> -       fill_page_dma(vm->dev, &pt->base, scratch_pte);
> +       fill_px(vm->dev, pt, scratch_pte);
>   }
>
>   static struct i915_page_table *alloc_pt(struct drm_device *dev)
> @@ -405,7 +410,7 @@ static struct i915_page_table *alloc_pt(struct drm_device *dev)
>          if (!pt->used_ptes)
>                  goto fail_bitmap;
>
> -       ret = setup_page_dma(dev, &pt->base);
> +       ret = setup_px(dev, pt);
>          if (ret)
>                  goto fail_page_m;
>
> @@ -421,8 +426,8 @@ fail_bitmap:
>
>   static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
>   {
> -       if (pd->base.page) {
> -               cleanup_page_dma(dev, &pd->base);
> +       if (px_page(pd)) {
> +               cleanup_px(dev, pd);
>                  kfree(pd->used_pdes);
>                  kfree(pd);
>          }
> @@ -442,7 +447,7 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
>          if (!pd->used_pdes)
>                  goto free_pd;
>
> -       ret = setup_page_dma(dev, &pd->base);
> +       ret = setup_px(dev, pd);
>          if (ret)
>                  goto free_bitmap;
>
> @@ -527,7 +532,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>
>                  pt = pd->page_table[pde];
>
> -               if (WARN_ON(!pt->base.page))
> +               if (WARN_ON(!px_page(pt)))
>                          continue;
>
>                  last_pte = pte + num_entries;
> @@ -599,7 +604,7 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
>                               struct drm_device *dev)
>   {
>          gen8_pde_t entry =
> -               gen8_pde_encode(dev, pt->base.daddr, I915_CACHE_LLC);
> +               gen8_pde_encode(dev, px_dma(pt), I915_CACHE_LLC);
>          *pde = entry;
>   }
>
> @@ -610,17 +615,17 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>                  container_of(vm, struct i915_hw_ppgtt, base);
>          gen8_pde_t scratch_pde;
>
> -       scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
> +       scratch_pde = gen8_pde_encode(vm->dev, px_dma(ppgtt->scratch_pt),
>                                        I915_CACHE_LLC);
>
> -       fill_page_dma(vm->dev, &pd->base, scratch_pde);
> +       fill_px(vm->dev, pd, scratch_pde);
>   }
>
>   static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
>   {
>          int i;
>
> -       if (!pd->base.page)
> +       if (!px_page(pd))
>                  return;
>
>          for_each_set_bit(i, pd->used_pdes, I915_PDES) {
> @@ -1021,7 +1026,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>          gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>                  u32 expected;
>                  gen6_pte_t *pt_vaddr;
> -               dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->base.daddr;
> +               const dma_addr_t pt_addr = px_dma(ppgtt->pd.page_table[pde]);
>                  pd_entry = readl(ppgtt->pd_addr + pde);
>                  expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>
> @@ -1068,7 +1073,7 @@ static void gen6_write_pde(struct i915_page_directory *pd,
>                  container_of(pd, struct i915_hw_ppgtt, pd);
>          u32 pd_entry;
>
> -       pd_entry = GEN6_PDE_ADDR_ENCODE(pt->base.daddr);
> +       pd_entry = GEN6_PDE_ADDR_ENCODE(px_dma(pt));
>          pd_entry |= GEN6_PDE_VALID;
>
>          writel(pd_entry, ppgtt->pd_addr + pde);
> @@ -1313,7 +1318,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>
>          scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>
> -       fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
> +       fill32_px(vm->dev, pt, scratch_pte);
>   }
>
>   static int gen6_alloc_va_range(struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 666decc..006b839 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -213,6 +213,10 @@ struct i915_page_dma {
>          };
>   };
>
> +#define px_base(px) (&(px)->base)
> +#define px_page(px) (px_base(px)->page)
> +#define px_dma(px) (px_base(px)->daddr)
> +
>   struct i915_page_table {
>          struct i915_page_dma base;
>
> @@ -475,8 +479,8 @@ static inline dma_addr_t
>   i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
>   {
>          return test_bit(n, ppgtt->pdp.used_pdpes) ?
> -               ppgtt->pdp.page_directory[n]->base.daddr :
> -               ppgtt->scratch_pd->base.daddr;
> +               px_dma(ppgtt->pdp.page_directory[n]) :
> +               px_dma(ppgtt->scratch_pd);
>   }
>
>   int i915_gem_gtt_init(struct drm_device *dev);
> --
> 1.9.1
>

* Re: [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible
  2015-05-22 17:05 ` [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible Mika Kuoppala
@ 2015-06-03 13:44   ` Michel Thierry
  2015-06-11 16:30     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 13:44 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> Lay out the scratch page structure in a similar manner to the other
> paging structures. This allows us to use the same tools for
> setup and teardown.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 89 ++++++++++++++++++++-----------------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  9 ++--
>   2 files changed, 54 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 4f9a000..43fa543 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -301,11 +301,12 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
>          return pte;
>   }
>
> -static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
> +static int __setup_page_dma(struct drm_device *dev,
> +                           struct i915_page_dma *p, gfp_t flags)
>   {
>          struct device *device = &dev->pdev->dev;
>
> -       p->page = alloc_page(GFP_KERNEL);
> +       p->page = alloc_page(flags);
>          if (!p->page)
>                  return -ENOMEM;
>
> @@ -320,6 +321,11 @@ static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>          return 0;
>   }
>
> +static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
> +{
> +       return __setup_page_dma(dev, p, GFP_KERNEL);
> +}
> +
>   static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   {
>          if (WARN_ON(!p->page))
> @@ -388,7 +394,8 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>   {
>          gen8_pte_t scratch_pte;
>
> -       scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> +                                     I915_CACHE_LLC, true);
>
>          fill_px(vm->dev, pt, scratch_pte);
>   }
> @@ -515,7 +522,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>          unsigned num_entries = length >> PAGE_SHIFT;
>          unsigned last_pte, i;
>
> -       scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
> +       scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
>                                        I915_CACHE_LLC, use_scratch);
>
>          while (num_entries) {
> @@ -1021,7 +1028,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>          uint32_t  pte, pde, temp;
>          uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
>
> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page), I915_CACHE_LLC, true, 0);
>
>          gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>                  u32 expected;
> @@ -1256,7 +1263,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>          unsigned first_pte = first_entry % GEN6_PTES;
>          unsigned last_pte, i;
>
> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
> +                                    I915_CACHE_LLC, true, 0);
>
>          while (num_entries) {
>                  last_pte = first_pte + num_entries;
> @@ -1314,9 +1322,10 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>   {
>          gen6_pte_t scratch_pte;
>
> -       WARN_ON(vm->scratch.addr == 0);
> +       WARN_ON(px_dma(vm->scratch_page) == 0);
>
> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
> +                                    I915_CACHE_LLC, true, 0);
>
>          fill32_px(vm->dev, pt, scratch_pte);
>   }
> @@ -1553,13 +1562,14 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>          struct drm_i915_private *dev_priv = dev->dev_private;
>
>          ppgtt->base.dev = dev;
> -       ppgtt->base.scratch = dev_priv->gtt.base.scratch;
> +       ppgtt->base.scratch_page = dev_priv->gtt.base.scratch_page;
>
>          if (INTEL_INFO(dev)->gen < 8)
>                  return gen6_ppgtt_init(ppgtt);
>          else
>                  return gen8_ppgtt_init(ppgtt);
>   }
> +
>   int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>   {
>          struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -1874,7 +1884,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>                   first_entry, num_entries, max_entries))
>                  num_entries = max_entries;
>
> -       scratch_pte = gen8_pte_encode(vm->scratch.addr,
> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>                                        I915_CACHE_LLC,
>                                        use_scratch);
>          for (i = 0; i < num_entries; i++)
> @@ -1900,7 +1910,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>                   first_entry, num_entries, max_entries))
>                  num_entries = max_entries;
>
> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, use_scratch, 0);
> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
> +                                    I915_CACHE_LLC, use_scratch, 0);
>
>          for (i = 0; i < num_entries; i++)
>                  iowrite32(scratch_pte, &gtt_base[i]);
> @@ -2157,42 +2168,40 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>          vm->cleanup(vm);
>   }
>
> -static int setup_scratch_page(struct drm_device *dev)
> +static int alloc_scratch_page(struct i915_address_space *vm)
>   {
> -       struct drm_i915_private *dev_priv = dev->dev_private;
> -       struct page *page;
> -       dma_addr_t dma_addr;
> +       struct i915_page_scratch *sp;
> +       int ret;
> +
> +       WARN_ON(vm->scratch_page);
>
> -       page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
> -       if (page == NULL)
> +       sp = kzalloc(sizeof(*sp), GFP_KERNEL);
> +       if (sp == NULL)
>                  return -ENOMEM;
> -       set_pages_uc(page, 1);
>
> -#ifdef CONFIG_INTEL_IOMMU
> -       dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
> -                               PCI_DMA_BIDIRECTIONAL);
> -       if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
> -               __free_page(page);
> -               return -EINVAL;
> +       ret = __setup_page_dma(vm->dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
> +       if (ret) {
> +               kfree(sp);
> +               return ret;
>          }
> -#else
> -       dma_addr = page_to_phys(page);
> -#endif

Should we keep a no-iommu option?
This seems to have been added for gen6 (gtt).
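
If we do want to keep it, a rough sketch of carrying the old #ifdef
over into __setup_page_dma (untested):

	p->page = alloc_page(flags);
	if (!p->page)
		return -ENOMEM;

#ifdef CONFIG_INTEL_IOMMU
	p->daddr = dma_map_page(device, p->page, 0, 4096,
				PCI_DMA_BIDIRECTIONAL);
	if (dma_mapping_error(device, p->daddr)) {
		__free_page(p->page);
		return -EINVAL;
	}
#else
	/* Without an IOMMU the physical address is usable directly. */
	p->daddr = page_to_phys(p->page);
#endif

	return 0;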

> -       dev_priv->gtt.base.scratch.page = page;
> -       dev_priv->gtt.base.scratch.addr = dma_addr;
> +
> +       set_pages_uc(px_page(sp), 1);
> +
> +       vm->scratch_page = sp;
>
>          return 0;
>   }
>
> -static void teardown_scratch_page(struct drm_device *dev)
> +static void free_scratch_page(struct i915_address_space *vm)
>   {
> -       struct drm_i915_private *dev_priv = dev->dev_private;
> -       struct page *page = dev_priv->gtt.base.scratch.page;
> +       struct i915_page_scratch *sp = vm->scratch_page;
>
> -       set_pages_wb(page, 1);
> -       pci_unmap_page(dev->pdev, dev_priv->gtt.base.scratch.addr,
> -                      PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -       __free_page(page);
> +       set_pages_wb(px_page(sp), 1);
> +
> +       cleanup_px(vm->dev, sp);
> +       kfree(sp);
> +
> +       vm->scratch_page = NULL;
>   }
>
>   static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
> @@ -2300,7 +2309,7 @@ static int ggtt_probe_common(struct drm_device *dev,
>                  return -ENOMEM;
>          }
>
> -       ret = setup_scratch_page(dev);
> +       ret = alloc_scratch_page(&dev_priv->gtt.base);
>          if (ret) {
>                  DRM_ERROR("Scratch setup failed\n");
>                  /* iounmap will also get called at remove, but meh */
> @@ -2479,7 +2488,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>          struct i915_gtt *gtt = container_of(vm, struct i915_gtt, base);
>
>          iounmap(gtt->gsm);
> -       teardown_scratch_page(vm->dev);
> +       free_scratch_page(vm);
>   }
>
>   static int i915_gmch_probe(struct drm_device *dev,
> @@ -2543,13 +2552,13 @@ int i915_gem_gtt_init(struct drm_device *dev)
>                  dev_priv->gtt.base.cleanup = gen6_gmch_remove;
>          }
>
> +       gtt->base.dev = dev;
> +
>          ret = gtt->gtt_probe(dev, &gtt->base.total, &gtt->stolen_size,
>                               &gtt->mappable_base, &gtt->mappable_end);
>          if (ret)
>                  return ret;
>
> -       gtt->base.dev = dev;
> -
>          /* GMADR is the PCI mmio aperture into the global GTT. */
>          DRM_INFO("Memory usable by graphics device = %lluM\n",
>                   gtt->base.total >> 20);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 006b839..1fd4041 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -217,6 +217,10 @@ struct i915_page_dma {
>   #define px_page(px) (px_base(px)->page)
>   #define px_dma(px) (px_base(px)->daddr)
>
> +struct i915_page_scratch {
> +       struct i915_page_dma base;
> +};
> +
>   struct i915_page_table {
>          struct i915_page_dma base;
>
> @@ -243,10 +247,7 @@ struct i915_address_space {
>          u64 start;              /* Start offset always 0 for dri2 */
>          u64 total;              /* size addr space maps (ex. 2GB for ggtt) */
>
> -       struct {
> -               dma_addr_t addr;
> -               struct page *page;
> -       } scratch;
> +       struct i915_page_scratch *scratch_page;
>
>          /**
>           * List of objects currently involved in rendering.
> --
> 1.9.1
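
As an aside: with the scratch page wrapped in i915_page_dma, the
generic px_*() accessors above now apply to it as well. For example
(usage sketch; both forms appear in the hunks above):

        /* Bus address of the scratch page, for encoding scratch PTEs. */
        const dma_addr_t addr = px_dma(vm->scratch_page);

        /* Backing struct page, e.g. for set_pages_uc()/set_pages_wb(). */
        struct page *page = px_page(vm->scratch_page);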

* Re: [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-05-22 17:05 ` [PATCH 15/21] drm/i915/gtt: Fill scratch page Mika Kuoppala
  2015-05-27 18:12   ` Tomas Elf
@ 2015-06-03 14:03   ` Michel Thierry
  1 sibling, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 14:03 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> During review of the dynamic page tables series, I was able
> to hit a lite restore bug with execlists. I assume that
> due to an incorrect pd, the batch ran out of legit address space
> and into the scratch page area. The ACTHD was increasing
> due to scratch being all zeroes (MI_NOOPs). And as gen8
> address space is quite large, the hangcheck happily waited
> for a long long time, keeping the process effectively stuck.

FYI, it is probably safe to assume that the only thing updated in a 
lite-restore is the ring tail. I didn't realize that until recently.

This issue is more frequent in GuC submission mode, and we are currently
evaluating two alternatives to trigger the re-reading of the page tables.


>
> According to Chris Wilson any modern gpu will grind to halt
> if it encounters commands of all ones. This seemed to do the
> trick and hang was declared promptly when the gpu wandered into
> the scratch land.
>
> v2: Use 0xffff00ff pattern (Chris)
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 43fa543..a2a0c88 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2168,6 +2168,8 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>          vm->cleanup(vm);
>   }
>
> +#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
> +
>   static int alloc_scratch_page(struct i915_address_space *vm)
>   {
>          struct i915_page_scratch *sp;
> @@ -2185,6 +2187,7 @@ static int alloc_scratch_page(struct i915_address_space *vm)
>                  return ret;
>          }
>
> +       fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
>          set_pages_uc(px_page(sp), 1);
>
>          vm->scratch_page = sp;
> --
> 1.9.1
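
As background, fill_px() resolves to the fill_page_dma() helper
introduced earlier in the series; roughly (a sketch, assuming the
kmap_page_dma()/kunmap_page_dma() helpers added earlier in the series):

static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
                          const uint64_t val)
{
        int i;
        uint64_t * const vaddr = kmap_page_dma(p);

        /* Replicate the 64-bit pattern across the whole 4K page. */
        for (i = 0; i < 512; i++)
                vaddr[i] = val;

        kunmap_page_dma(dev, vaddr);
}

so SCRATCH_PAGE_MAGIC ends up in every qword of the scratch page.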

* Re: [PATCH 16/21] drm/i915/gtt: Pin vma during virtual address allocation
  2015-05-22 17:05 ` [PATCH 16/21] drm/i915/gtt: Pin vma during virtual address allocation Mika Kuoppala
@ 2015-06-03 14:27   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 14:27 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> Dynamic page table allocation might wake the shrinker
> when memory is requested for page table structures.
> As this happens when we try to allocate the virtual address
> during binding, our vma might be among the targets for eviction.
> We should do i915_vma_pin() and pin early in there, like Chris
> suggests, but this is an interim solution.
>
> Shield our vma from the shrinker by incrementing the pin count
> before the virtual address is allocated.
>
> The proper place to fix this would be in gem, inside of
> i915_vma_pin(). But we don't have that yet, so take the short
> cut as an intermediate solution.
>
> Testcase: igt/gem_ctx_thrash
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index a2a0c88..b938964 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2916,9 +2916,12 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>                                      vma->node.size,
>                                      VM_TO_TRACE_NAME(vma->vm));
>
> +               /* XXX: i915_vma_pin() will fix this +- hack */
> +               vma->pin_count++;
>                  ret = vma->vm->allocate_va_range(vma->vm,
>                                                   vma->node.start,
>                                                   vma->node.size);
> +               vma->pin_count--;
>                  if (ret)
>                          return ret;
>          }
> --
> 1.9.1
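
Why the hack works: the eviction code refuses to touch any vma with a
nonzero pin_count, so bumping it around allocate_va_range() makes a
reentrant shrink skip exactly this vma. A sketch of the guard on the
eviction side (assuming the check sits where candidate vmas are
scanned):

        if (vma->pin_count)
                continue;       /* pinned, not eligible for eviction */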

* Re: [PATCH 17/21] drm/i915/gtt: Cleanup page directory encoding
  2015-05-22 17:05 ` [PATCH 17/21] drm/i915/gtt: Cleanup page directory encoding Mika Kuoppala
@ 2015-06-03 14:58   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 14:58 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> Write page directory entry without using superfluous
> indirect function. Also remove unused device parameter
> from the encode function.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 19 +++++--------------
>   1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b938964..a1d6d7a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -192,9 +192,8 @@ static gen8_pte_t gen8_pte_encode(dma_addr_t addr,
>          return pte;
>   }
>
> -static gen8_pde_t gen8_pde_encode(struct drm_device *dev,
> -                                 dma_addr_t addr,
> -                                 enum i915_cache_level level)
> +static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
> +                                 const enum i915_cache_level level)
>   {
>          gen8_pde_t pde = _PAGE_PRESENT | _PAGE_RW;
>          pde |= addr;
> @@ -606,15 +605,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>                  kunmap_px(ppgtt, pt_vaddr);
>   }
>
> -static void __gen8_do_map_pt(gen8_pde_t * const pde,
> -                            struct i915_page_table *pt,
> -                            struct drm_device *dev)
> -{
> -       gen8_pde_t entry =
> -               gen8_pde_encode(dev, px_dma(pt), I915_CACHE_LLC);
> -       *pde = entry;
> -}
> -
>   static void gen8_initialize_pd(struct i915_address_space *vm,
>                                 struct i915_page_directory *pd)
>   {
> @@ -622,7 +612,7 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>                  container_of(vm, struct i915_hw_ppgtt, base);
>          gen8_pde_t scratch_pde;
>
> -       scratch_pde = gen8_pde_encode(vm->dev, px_dma(ppgtt->scratch_pt),
> +       scratch_pde = gen8_pde_encode(px_dma(ppgtt->scratch_pt),
>                                        I915_CACHE_LLC);
>
>          fill_px(vm->dev, pd, scratch_pde);
> @@ -906,7 +896,8 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>                          set_bit(pde, pd->used_pdes);
>
>                          /* Map the PDE to the page table */
> -                       __gen8_do_map_pt(page_directory + pde, pt, vm->dev);
> +                       page_directory[pde] = gen8_pde_encode(px_dma(pt),
> +                                                             I915_CACHE_LLC);
>
>                          /* NB: We haven't yet mapped ptes to pages. At this
>                           * point we're still relying on insert_entries() */
> --
> 1.9.1

* Re: [PATCH 18/21] drm/i915/gtt: Move scratch_pd and scratch_pt into vm area
  2015-05-22 17:05 ` [PATCH 18/21] drm/i915/gtt: Move scratch_pd and scratch_pt into vm area Mika Kuoppala
@ 2015-06-03 16:46   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 16:46 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> The scratch page is part of i915_address_space because we
> have only one of it. Move the other scratch entities into
> the same struct. This is a preparatory patch for having
> only one instance of each scratch_pt/pd.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 51 +++++++++++++++++--------------------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  7 +++--
>   2 files changed, 27 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index a1d6d7a..61f4da0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -608,12 +608,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   static void gen8_initialize_pd(struct i915_address_space *vm,
>                                 struct i915_page_directory *pd)
>   {
> -       struct i915_hw_ppgtt *ppgtt =
> -               container_of(vm, struct i915_hw_ppgtt, base);
>          gen8_pde_t scratch_pde;
>
> -       scratch_pde = gen8_pde_encode(px_dma(ppgtt->scratch_pt),
> -                                     I915_CACHE_LLC);
> +       scratch_pde = gen8_pde_encode(px_dma(vm->scratch_pt), I915_CACHE_LLC);
>
>          fill_px(vm->dev, pd, scratch_pde);
>   }
> @@ -648,8 +645,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>                  free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
>          }
>
> -       free_pd(ppgtt->base.dev, ppgtt->scratch_pd);
> -       free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
> +       free_pd(vm->dev, vm->scratch_pd);
> +       free_pt(vm->dev, vm->scratch_pt);
>   }
>
>   /**
> @@ -685,7 +682,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>                  /* Don't reallocate page tables */
>                  if (pt) {
>                          /* Scratch is never allocated this way */
> -                       WARN_ON(pt == ppgtt->scratch_pt);
> +                       WARN_ON(pt == ppgtt->base.scratch_pt);
>                          continue;
>                  }
>
> @@ -977,16 +974,16 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
>    */
>   static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>   {
> -       ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
> -       if (IS_ERR(ppgtt->scratch_pt))
> -               return PTR_ERR(ppgtt->scratch_pt);
> +       ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
> +       if (IS_ERR(ppgtt->base.scratch_pt))
> +               return PTR_ERR(ppgtt->base.scratch_pt);
>
> -       ppgtt->scratch_pd = alloc_pd(ppgtt->base.dev);
> -       if (IS_ERR(ppgtt->scratch_pd))
> -               return PTR_ERR(ppgtt->scratch_pd);
> +       ppgtt->base.scratch_pd = alloc_pd(ppgtt->base.dev);
> +       if (IS_ERR(ppgtt->base.scratch_pd))
> +               return PTR_ERR(ppgtt->base.scratch_pd);
>
> -       gen8_initialize_pt(&ppgtt->base, ppgtt->scratch_pt);
> -       gen8_initialize_pd(&ppgtt->base, ppgtt->scratch_pd);
> +       gen8_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
> +       gen8_initialize_pd(&ppgtt->base, ppgtt->base.scratch_pd);
>
>          ppgtt->base.start = 0;
>          ppgtt->base.total = 1ULL << 32;
> @@ -1019,7 +1016,8 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>          uint32_t  pte, pde, temp;
>          uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
>
> -       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page), I915_CACHE_LLC, true, 0);
> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
> +                                    I915_CACHE_LLC, true, 0);
>
>          gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>                  u32 expected;
> @@ -1348,7 +1346,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>           * tables.
>           */
>          gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> -               if (pt != ppgtt->scratch_pt) {
> +               if (pt != vm->scratch_pt) {
>                          WARN_ON(bitmap_empty(pt->used_ptes, GEN6_PTES));
>                          continue;
>                  }
> @@ -1403,7 +1401,7 @@ unwind_out:
>          for_each_set_bit(pde, new_page_tables, I915_PDES) {
>                  struct i915_page_table *pt = ppgtt->pd.page_table[pde];
>
> -               ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
> +               ppgtt->pd.page_table[pde] = vm->scratch_pt;
>                  free_pt(vm->dev, pt);
>          }
>
> @@ -1418,15 +1416,14 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>          struct i915_page_table *pt;
>          uint32_t pde;
>
> -
>          drm_mm_remove_node(&ppgtt->node);
>
>          gen6_for_all_pdes(pt, ppgtt, pde) {
> -               if (pt != ppgtt->scratch_pt)
> +               if (pt != vm->scratch_pt)
>                          free_pt(ppgtt->base.dev, pt);
>          }
>
> -       free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
> +       free_pt(vm->dev, vm->scratch_pt);
>   }
>
>   static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
> @@ -1441,11 +1438,11 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
>           * size. We allocate at the top of the GTT to avoid fragmentation.
>           */
>          BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
> -       ppgtt->scratch_pt = alloc_pt(ppgtt->base.dev);
> -       if (IS_ERR(ppgtt->scratch_pt))
> -               return PTR_ERR(ppgtt->scratch_pt);
> +       ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
> +       if (IS_ERR(ppgtt->base.scratch_pt))
> +               return PTR_ERR(ppgtt->base.scratch_pt);
>
> -       gen6_initialize_pt(&ppgtt->base, ppgtt->scratch_pt);
> +       gen6_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
>
>   alloc:
>          ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
> @@ -1476,7 +1473,7 @@ alloc:
>          return 0;
>
>   err_out:
> -       free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
> +       free_pt(ppgtt->base.dev, ppgtt->base.scratch_pt);
>          return ret;
>   }
>
> @@ -1492,7 +1489,7 @@ static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
>          uint32_t pde, temp;
>
>          gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
> -               ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
> +               ppgtt->pd.page_table[pde] = ppgtt->base.scratch_pt;
>   }
>
>   static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 1fd4041..ba46374 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -248,6 +248,8 @@ struct i915_address_space {
>          u64 total;              /* size addr space maps (ex. 2GB for ggtt) */
>
>          struct i915_page_scratch *scratch_page;
> +       struct i915_page_table *scratch_pt;
> +       struct i915_page_directory *scratch_pd;
>
>          /**
>           * List of objects currently involved in rendering.
> @@ -337,9 +339,6 @@ struct i915_hw_ppgtt {
>                  struct i915_page_directory pd;
>          };
>
> -       struct i915_page_table *scratch_pt;
> -       struct i915_page_directory *scratch_pd;
> -
>          struct drm_i915_file_private *file_priv;
>
>          gen6_pte_t __iomem *pd_addr;
> @@ -481,7 +480,7 @@ i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
>   {
>          return test_bit(n, ppgtt->pdp.used_pdpes) ?
>                  px_dma(ppgtt->pdp.page_directory[n]) :
> -               px_dma(ppgtt->scratch_pd);
> +               px_dma(ppgtt->base.scratch_pd);
>   }
>
>   int i915_gem_gtt_init(struct drm_device *dev);
> --
> 1.9.1

* Re: [PATCH 19/21] drm/i915/gtt: One instance of scratch page table/directory
  2015-05-22 17:05 ` [PATCH 19/21] drm/i915/gtt: One instance of scratch page table/directory Mika Kuoppala
@ 2015-06-03 16:57   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 16:57 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> As we use one scratch page for all ppgtt instances, we can
> use one scratch page table and scratch directory across
> all ppgtt instances, saving 2 pages + structs per ppgtt.
>
> v2: Rebase
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 273 +++++++++++++++++++++++-------------
>   1 file changed, 178 insertions(+), 95 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 61f4da0..ab113ce 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -430,6 +430,17 @@ fail_bitmap:
>          return ERR_PTR(ret);
>   }
>
> +
> +static int setup_scratch(struct i915_address_space *vm)
> +{
> +       struct i915_address_space *ggtt_vm = &to_i915(vm->dev)->gtt.base;
> +
> +       if (i915_is_ggtt(vm))
> +               return setup_scratch_ggtt(vm);
> +
> +       vm->scratch_page = ggtt_vm->scratch_page;
> +       vm->scratch_pt = ggtt_vm->scratch_pt;
> +       vm->scratch_pd = ggtt_vm->scratch_pd;

I'll need to change this a bit for 48b, so it doesn't happen inside 
setup_scratch_ggtt (scratch_pdp wouldn't make sense in ggtt); I'll still 
keep only 1 instance.

> +
> +       return 0;
> +}
> +

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
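
For context, the ggtt-owning path that setup_scratch() dispatches to
could look roughly like this (a simplified sketch; the real function
also has to pick the gen6 vs gen8 initializers and unwind on error):

static int setup_scratch_ggtt(struct i915_address_space *vm)
{
        int ret;

        ret = alloc_scratch_page(vm);
        if (ret)
                return ret;

        vm->scratch_pt = alloc_pt(vm->dev);
        if (IS_ERR(vm->scratch_pt))
                return PTR_ERR(vm->scratch_pt);

        gen8_initialize_pt(vm, vm->scratch_pt);

        vm->scratch_pd = alloc_pd(vm->dev);
        if (IS_ERR(vm->scratch_pd))
                return PTR_ERR(vm->scratch_pd);

        gen8_initialize_pd(vm, vm->scratch_pd);

        return 0;
}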

* Re: [PATCH 20/21] drm/i915/gtt: Use nonatomic bitmap ops
  2015-05-22 17:05 ` [PATCH 20/21] drm/i915/gtt: Use nonatomic bitmap ops Mika Kuoppala
@ 2015-06-03 17:07   ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 17:07 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> There is no need for atomicity here. Convert all bitmap
> operations to nonatomic variants.
>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ab113ce..95c39e5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -842,7 +842,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>
>                  gen8_initialize_pt(&ppgtt->base, pt);
>                  pd->page_table[pde] = pt;
> -               set_bit(pde, new_pts);
> +               __set_bit(pde, new_pts);
>          }
>
>          return 0;
> @@ -900,7 +900,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>
>                  gen8_initialize_pd(&ppgtt->base, pd);
>                  pdp->page_directory[pdpe] = pd;
> -               set_bit(pdpe, new_pds);
> +               __set_bit(pdpe, new_pds);
>          }
>
>          return 0;
> @@ -1040,7 +1040,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>                                     gen8_pte_count(pd_start, pd_len));
>
>                          /* Our pde is now pointing to the pagetable, pt */
> -                       set_bit(pde, pd->used_pdes);
> +                       __set_bit(pde, pd->used_pdes);
>
>                          /* Map the PDE to the page table */
>                          page_directory[pde] = gen8_pde_encode(px_dma(pt),
> @@ -1052,7 +1052,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>
>                  kunmap_px(ppgtt, page_directory);
>
> -               set_bit(pdpe, ppgtt->pdp.used_pdpes);
> +               __set_bit(pdpe, ppgtt->pdp.used_pdpes);
>          }
>
>          free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> @@ -1497,7 +1497,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>                  gen6_initialize_pt(vm, pt);
>
>                  ppgtt->pd.page_table[pde] = pt;
> -               set_bit(pde, new_page_tables);
> +               __set_bit(pde, new_page_tables);
>                  trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
>          }
>
> @@ -1511,7 +1511,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>                  bitmap_set(tmp_bitmap, gen6_pte_index(start),
>                             gen6_pte_count(start, length));
>
> -               if (test_and_clear_bit(pde, new_page_tables))
> +               if (__test_and_clear_bit(pde, new_page_tables))
>                          gen6_write_pde(&ppgtt->pd, pde, pt);
>
>                  trace_i915_page_table_entry_map(vm, pde, pt,
> --
> 1.9.1
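
The distinction being relied on here: set_bit() is an atomic
read-modify-write (a locked operation on x86), while __set_bit() is a
plain load/store that is only safe when the bitmap is serialized by
other means, which is the premise of this patch. A minimal
illustration (hypothetical bitmap, not driver code):

        DECLARE_BITMAP(used_pdes, I915_PDES);

        set_bit(pde, used_pdes);        /* atomic, locked RMW */
        __set_bit(pde, used_pdes);      /* non-atomic, caller serializes */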

* Re: [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions
  2015-05-22 17:05 ` [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions Mika Kuoppala
@ 2015-06-03 17:14   ` Michel Thierry
  2015-06-11 17:52     ` Mika Kuoppala
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-03 17:14 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> Introduce base page handling functions in order of
> alloc, free, init. No functional changes.

Can you change this sentence like this?

   _Keep/Maintain_ base page handling functions in order of
   alloc, free and init. No functional changes.

_Introduce_ made me think there was something new in this patch.


>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++-------------------
>   1 file changed, 27 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 95c39e5..24f31ad 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -381,24 +381,6 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
>          fill_page_dma(dev, p, v);
>   }
>
> -static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
> -{
> -       cleanup_px(dev, pt);
> -       kfree(pt->used_ptes);
> -       kfree(pt);
> -}
> -
> -static void gen8_initialize_pt(struct i915_address_space *vm,
> -                              struct i915_page_table *pt)
> -{
> -       gen8_pte_t scratch_pte;
> -
> -       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> -                                     I915_CACHE_LLC, true);
> -
> -       fill_px(vm->dev, pt, scratch_pte);
> -}
> -
>   static struct i915_page_table *alloc_pt(struct drm_device *dev)
>   {
>          struct i915_page_table *pt;
> @@ -430,6 +412,24 @@ fail_bitmap:
>          return ERR_PTR(ret);
>   }
>
> +static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
> +{
> +       cleanup_px(dev, pt);
> +       kfree(pt->used_ptes);
> +       kfree(pt);
> +}
> +
> +static void gen8_initialize_pt(struct i915_address_space *vm,
> +                              struct i915_page_table *pt)
> +{
> +       gen8_pte_t scratch_pte;
> +
> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> +                                     I915_CACHE_LLC, true);
> +
> +       fill_px(vm->dev, pt, scratch_pte);
> +}
> +
>   static void gen6_initialize_pt(struct i915_address_space *vm,
>                                 struct i915_page_table *pt)
>   {
> @@ -441,15 +441,6 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>          fill32_px(vm->dev, pt, scratch_pte);
>   }
>
> -static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
> -{
> -       if (px_page(pd)) {
> -               cleanup_px(dev, pd);
> -               kfree(pd->used_pdes);
> -               kfree(pd);
> -       }
> -}
> -
>   static struct i915_page_directory *alloc_pd(struct drm_device *dev)
>   {
>          struct i915_page_directory *pd;
> @@ -478,6 +469,15 @@ free_pd:
>          return ERR_PTR(ret);
>   }
>
> +static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
> +{
> +       if (px_page(pd)) {
> +               cleanup_px(dev, pd);
> +               kfree(pd->used_pdes);
> +               kfree(pd);
> +       }
> +}
> +
>   static void gen8_initialize_pd(struct i915_address_space *vm,
>                                 struct i915_page_directory *pd)
>   {
> --
> 1.9.1

* Re: [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-06-01 15:53     ` Chris Wilson
@ 2015-06-04 11:08       ` Tomas Elf
  2015-06-04 11:24         ` Chris Wilson
  0 siblings, 1 reply; 86+ messages in thread
From: Tomas Elf @ 2015-06-04 11:08 UTC (permalink / raw)
  To: Chris Wilson, Mika Kuoppala, intel-gfx, miku

On 01/06/2015 16:53, Chris Wilson wrote:
> On Wed, May 27, 2015 at 07:12:02PM +0100, Tomas Elf wrote:
>> On 22/05/2015 18:05, Mika Kuoppala wrote:
>>> During review of the dynamic page tables series, I was able
>>> to hit a lite restore bug with execlists. I assume that
>>> due to an incorrect pd, the batch ran out of legit address space
>>> and into the scratch page area. The ACTHD was increasing
>>> due to scratch being all zeroes (MI_NOOPs). And as gen8
>>> address space is quite large, the hangcheck happily waited
>>> for a long long time, keeping the process effectively stuck.
>>>
>>> According to Chris Wilson any modern gpu will grind to halt
>>> if it encounters commands of all ones. This seemed to do the
>>> trick and hang was declared promptly when the gpu wandered into
>>> the scratch land.
>>>
>>> v2: Use 0xffff00ff pattern (Chris)
>>
>> Just for my own benefit:
>>
>> 1. Is there any particular reason for this pattern rather than 0xffffffff?
>
> It is more obvious when userspace reads from the page and copies it into
> its own data structures or surfaces. See below, if this does impact
> userspace we should probably revert this patch anyway.
>
>> 2. Someone please correct me if I'm wrong here but at least based on
>> my own experiences with gen9 submitting batch buffers filled with
>> bad instructions (0xffffffff) to the GPU does not hang it. I'm
>> guessing that is because there's allegedly a hardware security
>> parser that MI_NOOPs out invalid instructions during execution. If
>> that's the case here then I guess we might have to come up with
>> something else for gen9+ if we want to induce engine hangs once the
>> execution reaches the scratch page?
>
> It's not a problem, there will be a GPU hang eventually (in theory at
> least). Mika is just trying to shortcircuit that by causing an immediate
> hang.

Interesting! Why do you think the execution would hang eventually? 
Simply because it would end up at an invalid opcode somewhere in memory 
at some point, which would trigger a hang? If we're talking gen9 then I'm 
assuming that the security parser would MI_NOOP out all invalid opcodes 
before they get a chance to hang the GPU (that would certainly be the 
case with 0xffffffff and 0xffff00ff, based on my own experiments). Or 
are we assuming that at some point the execution would end up at a valid 
opcode that would still put the GPU in an arbitrary execution state that 
would take a practically infinite amount of time to complete? Such as 
executing a semaphore instruction that no thread would ever think of 
signalling?

Thanks,
Tomas

>
>> On the other hand, on gen9+ page faulting is supposedly not broken
>> anymore so maybe we don't need the scratch page to begin with there
>> so maybe it's all moot at that point? Again, if I'm making no sense
>> here feel free to set things straight, I'm very curious about how
>> all of this is supposed to work.
>
> Generating a pagefault for invalid access is an ABI change and requires
> opt-in (we have discussed context flags in the past). The most obvious
> example is the CS prefetch, which we have to prevent generating faults by
> providing guard pages (on older chipsets at least). But as we have been
> historically lax on allowing userspace to access invalid pages, we have
> to assume that userspace has been taking advantage of that.
> -Chris
>


* Re: [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-06-04 11:08       ` Tomas Elf
@ 2015-06-04 11:24         ` Chris Wilson
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Wilson @ 2015-06-04 11:24 UTC (permalink / raw)
  To: Tomas Elf; +Cc: intel-gfx, miku

On Thu, Jun 04, 2015 at 12:08:17PM +0100, Tomas Elf wrote:
> On 01/06/2015 16:53, Chris Wilson wrote:
> >On Wed, May 27, 2015 at 07:12:02PM +0100, Tomas Elf wrote:
> >>On 22/05/2015 18:05, Mika Kuoppala wrote:
> >>>During review of the dynamic page tables series, I was able
> >>>to hit a lite restore bug with execlists. I assume that
> >>>due to an incorrect pd, the batch ran out of legit address space
> >>>and into the scratch page area. The ACTHD was increasing
> >>>due to scratch being all zeroes (MI_NOOPs). And as gen8
> >>>address space is quite large, the hangcheck happily waited
> >>>for a long long time, keeping the process effectively stuck.
> >>>
> >>>According to Chris Wilson any modern gpu will grind to halt
> >>>if it encounters commands of all ones. This seemed to do the
> >>>trick and hang was declared promptly when the gpu wandered into
> >>>the scratch land.
> >>>
> >>>v2: Use 0xffff00ff pattern (Chris)
> >>
> >>Just for my own benefit:
> >>
> >>1. Is there any particular reason for this pattern rather than 0xffffffff?
> >
> >It is more obvious when userspace reads from the page and copies it into
> >its own data structures or surfaces. See below, if this does impact
> >userspace we should probably revert this patch anyway.
> >
> >>2. Someone please correct me if I'm wrong here but at least based on
> >>my own experiences with gen9 submitting batch buffers filled with
> >>bad instructions (0xffffffff) to the GPU does not hang it. I'm
> >>guessing that is because there's allegedly a hardware security
> >>parser that MI_NOOPs out invalid instructions during execution. If
> >>that's the case here then I guess we might have to come up with
> >>something else for gen9+ if we want to induce engine hangs once the
> >>execution reaches the scratch page?
> >
> >It's not a problem, there will be a GPU hang eventually (in theory at
> >least). Mika is just trying to shortcircuit that by causing an immediate
> >hang.
> 
> Interesting! Why do you think the execution would hang eventually?
> Simply because it would end up at an invalid opcode somewhere in
> memory at some point, which would trig a hang? If we're talking gen9
> then I'm assuming that the security parser would MI_NOOP out all
> invalid opcodes before they get a chance to hang the GPU (that would
> certainly be the case with 0xffffffff and 0xffff00ff, based on my
> own experiments). Or are we assuming that at some point the
> execution would end up at a valid opcode that would still put the
> GPU in an arbitrary execution state that would take a practically
> infinite amount of time to complete? Such as executing a semaphore
> instruction that no thread would ever think of signalling?

Because even if it has to execute the full 48bit address space, we will
eventually realise that no progress is being made (due to looping) and
declare a hung GPU. A real problem is when we wrap and find a valid
batch and so end up advancing even though the GPU's view of memory is
incoherent.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-05-29 12:53     ` Michel Thierry
@ 2015-06-10 11:42       ` Michel Thierry
  2015-06-11  7:31         ` Dave Gordon
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-10 11:42 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 5/29/2015 1:53 PM, Michel Thierry wrote:
> On 5/29/2015 12:05 PM, Michel Thierry wrote:
>> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
>>> With BDW/SKL and 32bit addressing mode only, the hardware preloads
>>> pdps. However the TLB invalidation only has effect on levels below
>>> the pdps. This means that if pdps change, hw might access with
>>> stale pdp entry.
>>>
>>> To combat this problem, preallocate the top pdps so that hw sees
>>> them as immutable for each context.
>>>
>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 50
>>> +++++++++++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
>>>   drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
>>>   3 files changed, 68 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> index 0ffd459..1a5ad4c 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> @@ -941,6 +941,48 @@ err_out:
>>>          return ret;
>>>   }
>>>
>>> +/* With some architectures and 32bit legacy mode, hardware pre-loads
>>> the
>>> + * top level pdps but the tlb invalidation only invalidates the
>>> lower levels.
>>> + * This might lead to hw fetching with stale pdp entries if top level
>>> + * structure changes, ie va space grows with dynamic page tables.
>>> + */
>>> +static bool hw_wont_flush_pdp_tlbs(struct i915_hw_ppgtt *ppgtt)
>>> +{
>>> +       struct drm_device *dev = ppgtt->base.dev;
>>> +
>>> +       if (GEN8_CTX_ADDRESSING_MODE != LEGACY_32B_CONTEXT)
>>> +               return false;
>>> +
>>> +       if (IS_BROADWELL(dev) || IS_SKYLAKE(dev))
>>> +               return true;
>> The pd load restriction is also true for chv and bxt.
>> And to be safe, we can set reg 0x4030 bit14 to '1' (PD load disable).
>> Since this register is not part of the context state, it can be added
>> with the other platform workarounds in intel_pm.c.
>>
>>> +
>>> +       return false;
>>> +}
>>> +
>>> +static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
>>> +{
>>> +       unsigned long *new_page_dirs, **new_page_tables;
>>> +       int ret;
>>> +
>>> +       /* We allocate temp bitmap for page tables for no gain
>>> +        * but as this is for init only, lets keep the things simple
>>> +        */
>>> +       ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
>>> +       if (ret)
>>> +               return ret;
>>> +
>>> +       /* Allocate for all pdps regardless of how the ppgtt
>>> +        * was defined.
>>> +        */
>>> +       ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp,
>>> +                                               0, 1ULL << 32,
>>> +                                               new_page_dirs);
>>> +
>>> +       free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);

On second thought: just set the used_pdpes bits, and then the cleanup
function will free these pdps correctly:

+    /* mark all pdps as used, otherwise won't clean them correctly */
+    bitmap_fill(ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES);

>>> +
>>> +       return ret;
>>> +}
>>> +
>>>   /*
>>>    * GEN8 legacy ppgtt programming is accomplished through a max 4
>>> PDP registers
>>>    * with a net effect resembling a 2-level page table in normal x86
>>> terms. Each
>>> @@ -972,6 +1014,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt
>>> *ppgtt)
>>>
>>>          ppgtt->switch_mm = gen8_mm_switch;
>>>
>>> +       if (hw_wont_flush_pdp_tlbs(ppgtt)) {
>>> +               /* Avoid the tlb flush bug by preallocating
>>> +                * whole top level pdp structure so it stays
>>> +                * static even if our va space grows.
>>> +                */
>>> +               return gen8_preallocate_top_level_pdps(ppgtt);
>>> +       }
>>> +
> Also, we will need the same hw_wont_flush check in the cleanup function,
> and iterate each_pdpe (pd) from 0 to 4GiB (otherwise we will leak some
> of the preallocated page dirs).
>
>>>          return 0;
>>>   }
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index 6eeba63..334324b 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> @@ -2777,6 +2777,23 @@ enum skl_disp_power_wells {
>>>   #define VLV_CLK_CTL2                   0x101104
>>>   #define   CLK_CTL2_CZCOUNT_30NS_SHIFT  28
>>>
>>> +/* Context descriptor format bits */
>>> +#define GEN8_CTX_VALID                 (1<<0)
>>> +#define GEN8_CTX_FORCE_PD_RESTORE      (1<<1)
>>> +#define GEN8_CTX_FORCE_RESTORE         (1<<2)
>>> +#define GEN8_CTX_L3LLC_COHERENT                (1<<5)
>>> +#define GEN8_CTX_PRIVILEGE             (1<<8)
>>> +
>>> +enum {
>>> +       ADVANCED_CONTEXT = 0,
>>> +       LEGACY_32B_CONTEXT,
>>> +       ADVANCED_AD_CONTEXT,
>>> +       LEGACY_64B_CONTEXT
>>> +};
>>> +
>>> +#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
>>> +#define GEN8_CTX_ADDRESSING_MODE       LEGACY_32B_CONTEXT
>>> +
>>>   /*
>>>    * Overlay regs
>>>    */
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>>> b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 96ae90a..d793d4e 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -183,12 +183,6 @@
>>>   #define CTX_R_PWR_CLK_STATE            0x42
>>>   #define CTX_GPGPU_CSR_BASE_ADDRESS     0x44
>>>
>>> -#define GEN8_CTX_VALID (1<<0)
>>> -#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
>>> -#define GEN8_CTX_FORCE_RESTORE (1<<2)
>>> -#define GEN8_CTX_L3LLC_COHERENT (1<<5)
>>> -#define GEN8_CTX_PRIVILEGE (1<<8)
>>> -
>>>   #define ASSIGN_CTX_PDP(ppgtt, reg_state, n) { \
>>>          const u64 _addr = test_bit(n, ppgtt->pdp.used_pdpes) ? \
>>>                  ppgtt->pdp.page_directory[n]->daddr : \
>>> @@ -198,13 +192,6 @@
>>>   }
>>>
>>>   enum {
>>> -       ADVANCED_CONTEXT = 0,
>>> -       LEGACY_CONTEXT,
>>> -       ADVANCED_AD_CONTEXT,
>>> -       LEGACY_64B_CONTEXT
>>> -};
>>> -#define GEN8_CTX_MODE_SHIFT 3
>>> -enum {
>>>          FAULT_AND_HANG = 0,
>>>          FAULT_AND_HALT, /* Debug only */
>>>          FAULT_AND_STREAM,
>>> @@ -273,7 +260,7 @@ static uint64_t execlists_ctx_descriptor(struct
>>> intel_engine_cs *ring,
>>>          WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>>>
>>>          desc = GEN8_CTX_VALID;
>>> -       desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
>>> +       desc |= GEN8_CTX_ADDRESSING_MODE <<
>>> GEN8_CTX_ADDRESSING_MODE_SHIFT;
>>>          if (IS_GEN8(ctx_obj->base.dev))
>>>                  desc |= GEN8_CTX_L3LLC_COHERENT;
>>>          desc |= GEN8_CTX_PRIVILEGE;
>>> --
>>> 1.9.1
>>>

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-06-10 11:42       ` Michel Thierry
@ 2015-06-11  7:31         ` Dave Gordon
  2015-06-11 10:46           ` Michel Thierry
  2015-06-11 13:57           ` Mika Kuoppala
  0 siblings, 2 replies; 86+ messages in thread
From: Dave Gordon @ 2015-06-11  7:31 UTC (permalink / raw)
  To: Michel Thierry, Mika Kuoppala, intel-gfx

On 10/06/15 12:42, Michel Thierry wrote:
> On 5/29/2015 1:53 PM, Michel Thierry wrote:
>> On 5/29/2015 12:05 PM, Michel Thierry wrote:
>>> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
>>>> With BDW/SKL and 32bit addressing mode only, the hardware preloads
>>>> pdps. However the TLB invalidation only has effect on levels below
>>>> the pdps. This means that if pdps change, hw might access with
>>>> stale pdp entry.
>>>>
>>>> To combat this problem, preallocate the top pdps so that hw sees
>>>> them as immutable for each context.
>>>>
>>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 50
>>>> +++++++++++++++++++++++++++++++++++++
>>>>   drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
>>>>   drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
>>>>   3 files changed, 68 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> index 0ffd459..1a5ad4c 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> @@ -941,6 +941,48 @@ err_out:
>>>>          return ret;
>>>>   }
>>>>
>>>> +/* With some architectures and 32bit legacy mode, hardware pre-loads
>>>> the
>>>> + * top level pdps but the tlb invalidation only invalidates the
>>>> lower levels.
>>>> + * This might lead to hw fetching with stale pdp entries if top level
>>>> + * structure changes, ie va space grows with dynamic page tables.
>>>> + */

Is this still necessary if we reload PDPs via LRI instructions whenever
the address map has changed? That always (AFAICT) causes sufficient
invalidation, so then we might not need to preallocate at all :)

.Dave.
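
For illustration, the LRI-based reload amounts to emitting register
writes for all four PDP entries (upper and lower dword each) into the
ring ahead of the batch. A rough sketch, assuming the
GEN8_RING_PDP_{UDW,LDW} macros and the i915_page_dir_dma_addr()
helper from this series (ring-space reservation and error handling
elided):

static void emit_pdps(struct intel_ringbuffer *ringbuf,
                      struct intel_engine_cs *ring,
                      struct i915_hw_ppgtt *ppgtt)
{
        int i;

        intel_logical_ring_emit(ringbuf,
                                MI_LOAD_REGISTER_IMM(2 * GEN8_LEGACY_PDPES));
        for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
                const dma_addr_t addr = i915_page_dir_dma_addr(ppgtt, i);

                intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_UDW(ring, i));
                intel_logical_ring_emit(ringbuf, upper_32_bits(addr));
                intel_logical_ring_emit(ringbuf, GEN8_RING_PDP_LDW(ring, i));
                intel_logical_ring_emit(ringbuf, lower_32_bits(addr));
        }
        intel_logical_ring_emit(ringbuf, MI_NOOP);
        intel_logical_ring_advance(ringbuf);
}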

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-06-11  7:31         ` Dave Gordon
@ 2015-06-11 10:46           ` Michel Thierry
  2015-06-11 13:57           ` Mika Kuoppala
  1 sibling, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-11 10:46 UTC (permalink / raw)
  To: Dave Gordon, Mika Kuoppala, intel-gfx

On 6/11/2015 8:31 AM, Dave Gordon wrote:
> On 10/06/15 12:42, Michel Thierry wrote:
>> On 5/29/2015 1:53 PM, Michel Thierry wrote:
>>> On 5/29/2015 12:05 PM, Michel Thierry wrote:
>>>> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
>>>>> With BDW/SKL and 32bit addressing mode only, the hardware preloads
>>>>> pdps. However the TLB invalidation only has effect on levels below
>>>>> the pdps. This means that if pdps change, hw might access with
>>>>> stale pdp entry.
>>>>>
>>>>> To combat this problem, preallocate the top pdps so that hw sees
>>>>> them as immutable for each context.
>>>>>
>>>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>>> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_gem_gtt.c | 50
>>>>> +++++++++++++++++++++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
>>>>>    3 files changed, 68 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> index 0ffd459..1a5ad4c 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> @@ -941,6 +941,48 @@ err_out:
>>>>>           return ret;
>>>>>    }
>>>>>
>>>>> +/* With some architectures and 32bit legacy mode, hardware pre-loads
>>>>> the
>>>>> + * top level pdps but the tlb invalidation only invalidates the
>>>>> lower levels.
>>>>> + * This might lead to hw fetching with stale pdp entries if top level
>>>>> + * structure changes, ie va space grows with dynamic page tables.
>>>>> + */
>
> Is this still necessary if we reload PDPs via LRI instructions whenever
> the address map has changed? That always (AFAICT) causes sufficient
> invalidation, so then we might not need to preallocate at all :)

Correct, if we reload PDPs via LRI [1], the preallocation of top pdps is 
not needed.

[1] 1433954816-13787-2-git-send-email-michel.thierry@intel.com

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-06-11  7:31         ` Dave Gordon
  2015-06-11 10:46           ` Michel Thierry
@ 2015-06-11 13:57           ` Mika Kuoppala
  2015-08-11  5:05             ` Zhiyuan Lv
  1 sibling, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 13:57 UTC (permalink / raw)
  To: Dave Gordon, Michel Thierry, intel-gfx

Dave Gordon <david.s.gordon@intel.com> writes:

> On 10/06/15 12:42, Michel Thierry wrote:
>> On 5/29/2015 1:53 PM, Michel Thierry wrote:
>>> On 5/29/2015 12:05 PM, Michel Thierry wrote:
>>>> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
>>>>> With BDW/SKL and 32bit addressing mode only, the hardware preloads
>>>>> pdps. However the TLB invalidation only has effect on levels below
>>>>> the pdps. This means that if pdps change, hw might access with
>>>>> stale pdp entry.
>>>>>
>>>>> To combat this problem, preallocate the top pdps so that hw sees
>>>>> them as immutable for each context.
>>>>>
>>>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>>> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 50
>>>>> +++++++++++++++++++++++++++++++++++++
>>>>>   drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
>>>>>   drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
>>>>>   3 files changed, 68 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> index 0ffd459..1a5ad4c 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>>> @@ -941,6 +941,48 @@ err_out:
>>>>>          return ret;
>>>>>   }
>>>>>
>>>>> +/* With some architectures and 32bit legacy mode, hardware pre-loads
>>>>> the
>>>>> + * top level pdps but the tlb invalidation only invalidates the
>>>>> lower levels.
>>>>> + * This might lead to hw fetching with stale pdp entries if top level
>>>>> + * structure changes, ie va space grows with dynamic page tables.
>>>>> + */
>
> Is this still necessary if we reload PDPs via LRI instructions whenever
> the address map has changed? That always (AFAICT) causes sufficient
> invalidation, so then we might not need to preallocate at all :)
>

LRI reload gets my vote. Please ignore this patch.
-Mika

> .Dave.

* Re: [PATCH 03/21] drm/i915/gtt: Check va range against vm size
  2015-06-01 15:33   ` Joonas Lahtinen
@ 2015-06-11 14:23     ` Mika Kuoppala
  2015-06-24 14:48       ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 14:23 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx, miku

Joonas Lahtinen <joonas.lahtinen@linux.intel.com> writes:

> On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
>> Check the allocation area against the known end
>> of address space instead of against fixed value.
>> 
>> v2: Return ENODEV on internal bugs (Chris)
>> 
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_gem_gtt.c | 18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 1a5ad4c..76de781 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -756,9 +756,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>>  
>>  	WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
>>  
>> -	/* FIXME: upper bound must not overflow 32 bits  */
>> -	WARN_ON((start + length) > (1ULL << 32));
>> -
>>  	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
>>  		if (pd)
>>  			continue;
>> @@ -857,7 +854,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>>  	 * actually use the other side of the canonical address space.
>>  	 */
>>  	if (WARN_ON(start + length < start))
>> -		return -ERANGE;
>> +		return -ENODEV;
>> +
>> +	if (WARN_ON(start + length > ppgtt->base.total))
>> +		return -ENODEV;
>>  
>>  	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
>>  	if (ret)
>> @@ -1341,7 +1341,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>>  }
>>  
>>  static int gen6_alloc_va_range(struct i915_address_space *vm,
>> -			       uint64_t start, uint64_t length)
>> +			       uint64_t start_in, uint64_t length_in)
>>  {
>>  	DECLARE_BITMAP(new_page_tables, I915_PDES);
>>  	struct drm_device *dev = vm->dev;
>> @@ -1349,11 +1349,15 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>>  	struct i915_hw_ppgtt *ppgtt =
>>  				container_of(vm, struct i915_hw_ppgtt, base);
>>  	struct i915_page_table *pt;
>> -	const uint32_t start_save = start, length_save = length;
>> +	uint32_t start, length, start_save, length_save;
>>  	uint32_t pde, temp;
>>  	int ret;
>>  
>> -	WARN_ON(upper_32_bits(start));
>> +	if (WARN_ON(start_in + length_in > ppgtt->base.total))
>> +		return -ENODEV;
>> +
>> +	start = start_save = start_in;
>> +	length = length_save = length_in;
>
> Why is it not enough just to change the WARN_ON test?
>

Might have been cleaner, yes. I just wanted to keep the pde iteration
loop using 32bit arithmetic like it used to.
-Mika

>>  
>>  	bitmap_zero(new_page_tables, I915_PDES);
>>  
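
(For context on the 32bit point: the gen6/7 ppgtt is at most 2GiB
(512 PDEs x 1024 PTEs x 4K pages), so once start_in/length_in have
been validated against vm->total, truncating them to u32 is lossless
and gen6_for_each_pde() can keep doing its pde/pte math in 32 bits.)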

* Re: [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible
  2015-06-03 13:44   ` Michel Thierry
@ 2015-06-11 16:30     ` Mika Kuoppala
  2015-06-24 14:59       ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 16:30 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
>> Lay out scratch page structure in similar manner than other
>> paging structures. This allows us to use the same tools for
>> setup and teardown.
>>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 89 ++++++++++++++++++++-----------------
>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  9 ++--
>>   2 files changed, 54 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 4f9a000..43fa543 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -301,11 +301,12 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
>>          return pte;
>>   }
>>
>> -static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>> +static int __setup_page_dma(struct drm_device *dev,
>> +                           struct i915_page_dma *p, gfp_t flags)
>>   {
>>          struct device *device = &dev->pdev->dev;
>>
>> -       p->page = alloc_page(GFP_KERNEL);
>> +       p->page = alloc_page(flags);
>>          if (!p->page)
>>                  return -ENOMEM;
>>
>> @@ -320,6 +321,11 @@ static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>>          return 0;
>>   }
>>
>> +static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>> +{
>> +       return __setup_page_dma(dev, p, GFP_KERNEL);
>> +}
>> +
>>   static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>>   {
>>          if (WARN_ON(!p->page))
>> @@ -388,7 +394,8 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>>   {
>>          gen8_pte_t scratch_pte;
>>
>> -       scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
>> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>> +                                     I915_CACHE_LLC, true);
>>
>>          fill_px(vm->dev, pt, scratch_pte);
>>   }
>> @@ -515,7 +522,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>>          unsigned num_entries = length >> PAGE_SHIFT;
>>          unsigned last_pte, i;
>>
>> -       scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
>> +       scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
>>                                        I915_CACHE_LLC, use_scratch);
>>
>>          while (num_entries) {
>> @@ -1021,7 +1028,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>          uint32_t  pte, pde, temp;
>>          uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
>>
>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page), I915_CACHE_LLC, true, 0);
>>
>>          gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>>                  u32 expected;
>> @@ -1256,7 +1263,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>>          unsigned first_pte = first_entry % GEN6_PTES;
>>          unsigned last_pte, i;
>>
>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
>> +                                    I915_CACHE_LLC, true, 0);
>>
>>          while (num_entries) {
>>                  last_pte = first_pte + num_entries;
>> @@ -1314,9 +1322,10 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>>   {
>>          gen6_pte_t scratch_pte;
>>
>> -       WARN_ON(vm->scratch.addr == 0);
>> +       WARN_ON(px_dma(vm->scratch_page) == 0);
>>
>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
>> +                                    I915_CACHE_LLC, true, 0);
>>
>>          fill32_px(vm->dev, pt, scratch_pte);
>>   }
>> @@ -1553,13 +1562,14 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>>          struct drm_i915_private *dev_priv = dev->dev_private;
>>
>>          ppgtt->base.dev = dev;
>> -       ppgtt->base.scratch = dev_priv->gtt.base.scratch;
>> +       ppgtt->base.scratch_page = dev_priv->gtt.base.scratch_page;
>>
>>          if (INTEL_INFO(dev)->gen < 8)
>>                  return gen6_ppgtt_init(ppgtt);
>>          else
>>                  return gen8_ppgtt_init(ppgtt);
>>   }
>> +
>>   int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>>   {
>>          struct drm_i915_private *dev_priv = dev->dev_private;
>> @@ -1874,7 +1884,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>>                   first_entry, num_entries, max_entries))
>>                  num_entries = max_entries;
>>
>> -       scratch_pte = gen8_pte_encode(vm->scratch.addr,
>> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>>                                        I915_CACHE_LLC,
>>                                        use_scratch);
>>          for (i = 0; i < num_entries; i++)
>> @@ -1900,7 +1910,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>>                   first_entry, num_entries, max_entries))
>>                  num_entries = max_entries;
>>
>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, use_scratch, 0);
>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
>> +                                    I915_CACHE_LLC, use_scratch, 0);
>>
>>          for (i = 0; i < num_entries; i++)
>>                  iowrite32(scratch_pte, &gtt_base[i]);
>> @@ -2157,42 +2168,40 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>>          vm->cleanup(vm);
>>   }
>>
>> -static int setup_scratch_page(struct drm_device *dev)
>> +static int alloc_scratch_page(struct i915_address_space *vm)
>>   {
>> -       struct drm_i915_private *dev_priv = dev->dev_private;
>> -       struct page *page;
>> -       dma_addr_t dma_addr;
>> +       struct i915_page_scratch *sp;
>> +       int ret;
>> +
>> +       WARN_ON(vm->scratch_page);
>>
>> -       page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
>> -       if (page == NULL)
>> +       sp = kzalloc(sizeof(*sp), GFP_KERNEL);
>> +       if (sp == NULL)
>>                  return -ENOMEM;
>> -       set_pages_uc(page, 1);
>>
>> -#ifdef CONFIG_INTEL_IOMMU
>> -       dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
>> -                               PCI_DMA_BIDIRECTIONAL);
>> -       if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
>> -               __free_page(page);
>> -               return -EINVAL;
>> +       ret = __setup_page_dma(vm->dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
>> +       if (ret) {
>> +               kfree(sp);
>> +               return ret;
>>          }
>> -#else
>> -       dma_addr = page_to_phys(page);
>> -#endif
>
> Should we keep a no-iommu option?
> This seems to have been added for gen6 (gtt).
>

The dma_map_page should do the right thing with and
without iommu. I really don't understand why we
would need the no-iommu option.

If there is no iommu, we get nommu_map_page()
which is effectively page_to_phys().
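
For reference, a sketch of what that nommu path boils down to
(simplified from the x86 nommu dma ops; error checking trimmed):

	static dma_addr_t nommu_map_page_sketch(struct page *page,
						unsigned long offset)
	{
		/* without an iommu the "mapping" is just the page's
		 * physical address plus the offset into it */
		return page_to_phys(page) + offset;
	}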

-Mika


>> -       dev_priv->gtt.base.scratch.page = page;
>> -       dev_priv->gtt.base.scratch.addr = dma_addr;
>> +
>> +       set_pages_uc(px_page(sp), 1);
>> +
>> +       vm->scratch_page = sp;
>>
>>          return 0;
>>   }
>>
>> -static void teardown_scratch_page(struct drm_device *dev)
>> +static void free_scratch_page(struct i915_address_space *vm)
>>   {
>> -       struct drm_i915_private *dev_priv = dev->dev_private;
>> -       struct page *page = dev_priv->gtt.base.scratch.page;
>> +       struct i915_page_scratch *sp = vm->scratch_page;
>>
>> -       set_pages_wb(page, 1);
>> -       pci_unmap_page(dev->pdev, dev_priv->gtt.base.scratch.addr,
>> -                      PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>> -       __free_page(page);
>> +       set_pages_wb(px_page(sp), 1);
>> +
>> +       cleanup_px(vm->dev, sp);
>> +       kfree(sp);
>> +
>> +       vm->scratch_page = NULL;
>>   }
>>
>>   static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
>> @@ -2300,7 +2309,7 @@ static int ggtt_probe_common(struct drm_device *dev,
>>                  return -ENOMEM;
>>          }
>>
>> -       ret = setup_scratch_page(dev);
>> +       ret = alloc_scratch_page(&dev_priv->gtt.base);
>>          if (ret) {
>>                  DRM_ERROR("Scratch setup failed\n");
>>                  /* iounmap will also get called at remove, but meh */
>> @@ -2479,7 +2488,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>>          struct i915_gtt *gtt = container_of(vm, struct i915_gtt, base);
>>
>>          iounmap(gtt->gsm);
>> -       teardown_scratch_page(vm->dev);
>> +       free_scratch_page(vm);
>>   }
>>
>>   static int i915_gmch_probe(struct drm_device *dev,
>> @@ -2543,13 +2552,13 @@ int i915_gem_gtt_init(struct drm_device *dev)
>>                  dev_priv->gtt.base.cleanup = gen6_gmch_remove;
>>          }
>>
>> +       gtt->base.dev = dev;
>> +
>>          ret = gtt->gtt_probe(dev, &gtt->base.total, &gtt->stolen_size,
>>                               &gtt->mappable_base, &gtt->mappable_end);
>>          if (ret)
>>                  return ret;
>>
>> -       gtt->base.dev = dev;
>> -
>>          /* GMADR is the PCI mmio aperture into the global GTT. */
>>          DRM_INFO("Memory usable by graphics device = %lluM\n",
>>                   gtt->base.total >> 20);
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 006b839..1fd4041 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -217,6 +217,10 @@ struct i915_page_dma {
>>   #define px_page(px) (px_base(px)->page)
>>   #define px_dma(px) (px_base(px)->daddr)
>>
>> +struct i915_page_scratch {
>> +       struct i915_page_dma base;
>> +};
>> +
>>   struct i915_page_table {
>>          struct i915_page_dma base;
>>
>> @@ -243,10 +247,7 @@ struct i915_address_space {
>>          u64 start;              /* Start offset always 0 for dri2 */
>>          u64 total;              /* size addr space maps (ex. 2GB for ggtt) */
>>
>> -       struct {
>> -               dma_addr_t addr;
>> -               struct page *page;
>> -       } scratch;
>> +       struct i915_page_scratch *scratch_page;
>>
>>          /**
>>           * List of objects currently involved in rendering.
>> --
>> 1.9.1
>>

* Re: [PATCH 15/21] drm/i915/gtt: Fill scratch page
  2015-05-27 18:12   ` Tomas Elf
  2015-06-01 15:53     ` Chris Wilson
@ 2015-06-11 16:37     ` Mika Kuoppala
  1 sibling, 0 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 16:37 UTC (permalink / raw)
  To: Tomas Elf, intel-gfx; +Cc: miku

Tomas Elf <tomas.elf@intel.com> writes:

> On 22/05/2015 18:05, Mika Kuoppala wrote:
>> During review of the dynamic page tables series, I was able
>> to hit a lite restore bug with execlists. I assume that
>> due to an incorrect pd, the batch ran out of legit address space
>> and into the scratch page area. The ACTHD was increasing
>> due to scratch being all zeroes (MI_NOOPs). And as the gen8
>> address space is quite large, the hangcheck happily waited
>> for a long, long time, keeping the process effectively stuck.
>>
>> According to Chris Wilson any modern gpu will grind to a halt
>> if it encounters commands of all ones. This seemed to do the
>> trick, and a hang was declared promptly when the gpu wandered into
>> the scratch land.
>>
>> v2: Use 0xffff00ff pattern (Chris)
>
> Just for my own benefit:
>
> 1. Is there any particular reason for this pattern rather than 0xffffffff?
>
> 2. Someone please correct me if I'm wrong here, but at least based on my
> own experience with gen9, submitting batch buffers filled with bad
> instructions (0xffffffff) to the GPU does not hang it. I'm guessing that
> is because there's allegedly a hardware security parser that MI_NOOPs
> out invalid instructions during execution. If that's the case here, then
> I guess we might have to come up with something else for gen9+ if we
> want to induce engine hangs once execution reaches the scratch page?
>

If that is the case with gen9, then we need more duct tape. For
example, we could always increase busyness in hangcheck (a little) so
that a hang is eventually declared even though no loops are detected.
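
A rough sketch of that idea (purely hypothetical, not the actual
hangcheck code; hc_score and HC_HUNG_THRESHOLD are made up names):

	#define HC_HUNG_THRESHOLD 1000	/* made up */

	static bool hangcheck_tick(unsigned int *hc_score, bool acthd_advancing)
	{
		/* progress is cheap but never free, so a batch spinning
		 * through scratch space still trips the threshold */
		*hc_score += acthd_advancing ? 1 : 10;

		return *hc_score > HC_HUNG_THRESHOLD;
	}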

But with this patch and gen < 9, the execution grinds to a halt and
I get a hang within a 5 second window.

-Mika

> On the other hand, on gen9+ page faulting is supposedly not broken
> anymore, so maybe we don't need the scratch page there to begin with,
> and maybe it's all moot at that point? Again, if I'm making no sense
> here feel free to set things straight; I'm very curious about how all
> of this is supposed to work.
>
> Thanks,
> Tomas
>
>>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 43fa543..a2a0c88 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -2168,6 +2168,8 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>>   	vm->cleanup(vm);
>>   }
>>
>> +#define SCRATCH_PAGE_MAGIC 0xffff00ffffff00ffULL
>> +
>>   static int alloc_scratch_page(struct i915_address_space *vm)
>>   {
>>   	struct i915_page_scratch *sp;
>> @@ -2185,6 +2187,7 @@ static int alloc_scratch_page(struct i915_address_space *vm)
>>   		return ret;
>>   	}
>>
>> +	fill_px(vm->dev, sp, SCRATCH_PAGE_MAGIC);
>>   	set_pages_uc(px_page(sp), 1);
>>
>>   	vm->scratch_page = sp;
>>

* [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+
  2015-06-01 14:51   ` Joonas Lahtinen
@ 2015-06-11 17:37     ` Mika Kuoppala
  2015-06-23 11:10       ` Joonas Lahtinen
  0 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:37 UTC (permalink / raw)
  To: intel-gfx

When we touch gen8+ page maps, mark them dirty like we
do with previous gens.

v2: Update comment (Joonas)
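
For context, roughly how such a dirty mask gets consumed at context
switch time (approximate pattern, not verbatim i915 code; switch_mm
reloads the page directory for the given ring):

	if (ppgtt && (ppgtt->pd_dirty_rings & intel_ring_flag(ring))) {
		ret = ppgtt->switch_mm(ppgtt, ring);
		if (ret)
			return ret;

		ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
	}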

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 619dad1..0a906e4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -830,6 +830,16 @@ err_out:
 	return -ENOMEM;
 }
 
+/* PDE TLBs are a pain to invalidate on GEN8+. When we modify
+ * the page table structures, we mark them dirty so that
+ * context switching/execlist queuing code takes extra steps
+ * to ensure that tlbs are flushed.
+ */
+static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
+{
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
+}
+
 static int gen8_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start,
 			       uint64_t length)
@@ -915,6 +925,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	mark_tlbs_dirty(ppgtt);
 	return 0;
 
 err_out:
@@ -927,6 +938,7 @@ err_out:
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
 
@@ -1267,16 +1279,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
- * are switching between contexts with the same LRCA, we also must do a force
- * restore.
- */
-static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
-{
-	/* If current vm != vm, */
-	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
-}
-
 static void gen6_initialize_pt(struct i915_address_space *vm,
 		struct i915_page_table *pt)
 {
-- 
1.9.1


* [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm.
  2015-05-26  7:15   ` Daniel Vetter
@ 2015-06-11 17:38     ` Mika Kuoppala
  0 siblings, 0 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:38 UTC (permalink / raw)
  To: intel-gfx

On a 32 bit system we can have a ppgtt of exactly 4GB in size.
size_t is inadequate for this.
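
A two liner shows the problem (illustrative only):

	u64 total = 1ULL << 32;			/* exactly 4GB */
	size_t truncated = (size_t)total;	/* 0 on a 32 bit kernel */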

v2: Convert a lot more places (Daniel)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/char/agp/intel-gtt.c        |  4 ++--
 drivers/gpu/drm/i915/i915_debugfs.c | 42 ++++++++++++++++++-------------------
 drivers/gpu/drm/i915/i915_gem.c     |  6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 12 +++++------
 include/drm/intel-gtt.h             |  4 ++--
 6 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 0b4188b..4734d02 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -1408,8 +1408,8 @@ int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
 }
 EXPORT_SYMBOL(intel_gmch_probe);
 
-void intel_gtt_get(size_t *gtt_total, size_t *stolen_size,
-		   phys_addr_t *mappable_base, unsigned long *mappable_end)
+void intel_gtt_get(u64 *gtt_total, size_t *stolen_size,
+		   phys_addr_t *mappable_base, u64 *mappable_end)
 {
 	*gtt_total = intel_private.gtt_total_entries << PAGE_SHIFT;
 	*stolen_size = intel_private.stolen_size;
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 92cf273..14f5d16 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -198,7 +198,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_address_space *vm = &dev_priv->gtt.base;
 	struct i915_vma *vma;
-	size_t total_obj_size, total_gtt_size;
+	u64 total_obj_size, total_gtt_size;
 	int count, ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -231,7 +231,7 @@ static int i915_gem_object_list_info(struct seq_file *m, void *data)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
+	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
 		   count, total_obj_size, total_gtt_size);
 	return 0;
 }
@@ -253,7 +253,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	size_t total_obj_size, total_gtt_size;
+	u64 total_obj_size, total_gtt_size;
 	LIST_HEAD(stolen);
 	int count, ret;
 
@@ -292,7 +292,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
+	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
 		   count, total_obj_size, total_gtt_size);
 	return 0;
 }
@@ -310,10 +310,10 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 
 struct file_stats {
 	struct drm_i915_file_private *file_priv;
-	int count;
-	size_t total, unbound;
-	size_t global, shared;
-	size_t active, inactive;
+	unsigned long count;
+	u64 total, unbound;
+	u64 global, shared;
+	u64 active, inactive;
 };
 
 static int per_file_stats(int id, void *ptr, void *data)
@@ -370,7 +370,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 
 #define print_file_stats(m, name, stats) do { \
 	if (stats.count) \
-		seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, %zu global, %zu shared, %zu unbound)\n", \
+		seq_printf(m, "%s: %lu objects, %llu bytes (%llu active, %llu inactive, %llu global, %llu shared, %llu unbound)\n", \
 			   name, \
 			   stats.count, \
 			   stats.total, \
@@ -420,7 +420,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 count, mappable_count, purgeable_count;
-	size_t size, mappable_size, purgeable_size;
+	u64 size, mappable_size, purgeable_size;
 	struct drm_i915_gem_object *obj;
 	struct i915_address_space *vm = &dev_priv->gtt.base;
 	struct drm_file *file;
@@ -437,17 +437,17 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 
 	size = count = mappable_size = mappable_count = 0;
 	count_objects(&dev_priv->mm.bound_list, global_list);
-	seq_printf(m, "%u [%u] objects, %zu [%zu] bytes in gtt\n",
+	seq_printf(m, "%u [%u] objects, %llu [%llu] bytes in gtt\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = mappable_size = mappable_count = 0;
 	count_vmas(&vm->active_list, mm_list);
-	seq_printf(m, "  %u [%u] active objects, %zu [%zu] bytes\n",
+	seq_printf(m, "  %u [%u] active objects, %llu [%llu] bytes\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = mappable_size = mappable_count = 0;
 	count_vmas(&vm->inactive_list, mm_list);
-	seq_printf(m, "  %u [%u] inactive objects, %zu [%zu] bytes\n",
+	seq_printf(m, "  %u [%u] inactive objects, %llu [%llu] bytes\n",
 		   count, mappable_count, size, mappable_size);
 
 	size = count = purgeable_size = purgeable_count = 0;
@@ -456,7 +456,7 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		if (obj->madv == I915_MADV_DONTNEED)
 			purgeable_size += obj->base.size, ++purgeable_count;
 	}
-	seq_printf(m, "%u unbound objects, %zu bytes\n", count, size);
+	seq_printf(m, "%u unbound objects, %llu bytes\n", count, size);
 
 	size = count = mappable_size = mappable_count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
@@ -473,16 +473,16 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 			++purgeable_count;
 		}
 	}
-	seq_printf(m, "%u purgeable objects, %zu bytes\n",
+	seq_printf(m, "%u purgeable objects, %llu bytes\n",
 		   purgeable_count, purgeable_size);
-	seq_printf(m, "%u pinned mappable objects, %zu bytes\n",
+	seq_printf(m, "%u pinned mappable objects, %llu bytes\n",
 		   mappable_count, mappable_size);
-	seq_printf(m, "%u fault mappable objects, %zu bytes\n",
+	seq_printf(m, "%u fault mappable objects, %llu bytes\n",
 		   count, size);
 
-	seq_printf(m, "%zu [%lu] gtt total\n",
+	seq_printf(m, "%llu [%llu] gtt total\n",
 		   dev_priv->gtt.base.total,
-		   dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
+		   (u64)dev_priv->gtt.mappable_end - dev_priv->gtt.base.start);
 
 	seq_putc(m, '\n');
 	print_batch_pool_stats(m, dev_priv);
@@ -519,7 +519,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 	uintptr_t list = (uintptr_t) node->info_ent->data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	size_t total_obj_size, total_gtt_size;
+	u64 total_obj_size, total_gtt_size;
 	int count, ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -541,7 +541,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 
 	mutex_unlock(&dev->struct_mutex);
 
-	seq_printf(m, "Total %d objects, %zu bytes, %zu GTT size\n",
+	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
 		   count, total_obj_size, total_gtt_size);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4446cb2..5073d49 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3671,9 +3671,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 size, fence_size, fence_alignment, unfenced_alignment;
-	unsigned long start =
+	u64 start =
 		flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
-	unsigned long end =
+	u64 end =
 		flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
 	struct i915_vma *vma;
 	int ret;
@@ -3729,7 +3729,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	 * attempt to find space.
 	 */
 	if (size > end) {
-		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%lu\n",
+		DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
 			  ggtt_view ? ggtt_view->type : 0,
 			  size,
 			  flags & PIN_MAPPABLE ? "mappable" : "total",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e43eb83..33b82da 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2105,7 +2105,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 void i915_gem_init_global_gtt(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned long gtt_size, mappable_size;
+	u64 gtt_size, mappable_size;
 
 	gtt_size = dev_priv->gtt.base.total;
 	mappable_size = dev_priv->gtt.mappable_end;
@@ -2360,13 +2360,13 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
 }
 
 static int gen8_gmch_probe(struct drm_device *dev,
-			   size_t *gtt_total,
+			   u64 *gtt_total,
 			   size_t *stolen,
 			   phys_addr_t *mappable_base,
-			   unsigned long *mappable_end)
+			   u64 *mappable_end)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	unsigned int gtt_size;
+	u64 gtt_size;
 	u16 snb_gmch_ctl;
 	int ret;
 
@@ -2408,10 +2408,10 @@ static int gen8_gmch_probe(struct drm_device *dev,
 }
 
 static int gen6_gmch_probe(struct drm_device *dev,
-			   size_t *gtt_total,
+			   u64 *gtt_total,
 			   size_t *stolen,
 			   phys_addr_t *mappable_base,
-			   unsigned long *mappable_end)
+			   u64 *mappable_end)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned int gtt_size;
@@ -2425,7 +2425,7 @@ static int gen6_gmch_probe(struct drm_device *dev,
 	 * a coarse sanity check.
 	 */
 	if ((*mappable_end < (64<<20) || (*mappable_end > (512<<20)))) {
-		DRM_ERROR("Unknown GMADR size (%lx)\n",
+		DRM_ERROR("Unknown GMADR size (%llx)\n",
 			  dev_priv->gtt.mappable_end);
 		return -ENXIO;
 	}
@@ -2459,10 +2459,10 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 }
 
 static int i915_gmch_probe(struct drm_device *dev,
-			   size_t *gtt_total,
+			   u64 *gtt_total,
 			   size_t *stolen,
 			   phys_addr_t *mappable_base,
-			   unsigned long *mappable_end)
+			   u64 *mappable_end)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret;
@@ -2527,9 +2527,9 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	gtt->base.dev = dev;
 
 	/* GMADR is the PCI mmio aperture into the global GTT. */
-	DRM_INFO("Memory usable by graphics device = %zdM\n",
+	DRM_INFO("Memory usable by graphics device = %lluM\n",
 		 gtt->base.total >> 20);
-	DRM_DEBUG_DRIVER("GMADR size = %ldM\n", gtt->mappable_end >> 20);
+	DRM_DEBUG_DRIVER("GMADR size = %lldM\n", gtt->mappable_end >> 20);
 	DRM_DEBUG_DRIVER("GTT stolen size = %zdM\n", gtt->stolen_size >> 20);
 #ifdef CONFIG_INTEL_IOMMU
 	if (intel_iommu_gfx_mapped)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0d46dd2..c343161 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -233,8 +233,8 @@ struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
 	struct list_head global_link;
-	unsigned long start;		/* Start offset always 0 for dri2 */
-	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
+	u64 start;		/* Start offset always 0 for dri2 */
+	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 
 	struct {
 		dma_addr_t addr;
@@ -300,9 +300,9 @@ struct i915_address_space {
  */
 struct i915_gtt {
 	struct i915_address_space base;
-	size_t stolen_size;		/* Total size of stolen memory */
 
-	unsigned long mappable_end;	/* End offset that we can CPU map */
+	size_t stolen_size;		/* Total size of stolen memory */
+	u64 mappable_end;		/* End offset that we can CPU map */
 	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
 	phys_addr_t mappable_base;	/* PA of our GMADR */
 
@@ -314,9 +314,9 @@ struct i915_gtt {
 	int mtrr;
 
 	/* global gtt ops */
-	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
+	int (*gtt_probe)(struct drm_device *dev, u64 *gtt_total,
 			  size_t *stolen, phys_addr_t *mappable_base,
-			  unsigned long *mappable_end);
+			  u64 *mappable_end);
 };
 
 struct i915_hw_ppgtt {
diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
index b08bdad..9e9bddaa5 100644
--- a/include/drm/intel-gtt.h
+++ b/include/drm/intel-gtt.h
@@ -3,8 +3,8 @@
 #ifndef _DRM_INTEL_GTT_H
 #define	_DRM_INTEL_GTT_H
 
-void intel_gtt_get(size_t *gtt_total, size_t *stolen_size,
-		   phys_addr_t *mappable_base, unsigned long *mappable_end);
+void intel_gtt_get(u64 *gtt_total, size_t *stolen_size,
+		   phys_addr_t *mappable_base, u64 *mappable_end);
 
 int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
 		     struct agp_bridge_data *bridge);
-- 
1.9.1


* [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma
  2015-06-02 12:39   ` Michel Thierry
@ 2015-06-11 17:48     ` Mika Kuoppala
  2015-06-22 14:05       ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:48 UTC (permalink / raw)
  To: intel-gfx

All our paging structures have a struct page and a dma address
for that page.

Add a struct for page/dma address pairs and use it to make
the setup and teardown of the different paging structures
identical.

Also include the page directory offset in the struct for legacy
gens, and rename it to clearly point out that it is an offset into
the ggtt.
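
As a quick illustration of the union this introduces (fields as in
the diff below; which member is meaningful depends on the gen):

	dma_addr_t daddr = pd->base.daddr;	  /* gen8+: dma address */
	uint32_t offset = pd->base.ggtt_offset;	  /* gen6/7: ggtt offset */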

v2: Add comment about ggtt_offset (Michel)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 120 ++++++++++++++----------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  25 +++++---
 3 files changed, 64 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 14f5d16..5a7a20a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2248,7 +2248,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.base.ggtt_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5832f53..65ee92f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -301,52 +301,39 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-#define i915_dma_unmap_single(px, dev) \
-	__i915_dma_unmap_single((px)->daddr, dev)
-
-static void __i915_dma_unmap_single(dma_addr_t daddr,
-				    struct drm_device *dev)
+static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 {
 	struct device *device = &dev->pdev->dev;
 
-	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
-}
-
-/**
- * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
- * @px:	Page table/dir/etc to get a DMA map for
- * @dev:	drm device
- *
- * Page table allocations are unified across all gens. They always require a
- * single 4k allocation, as well as a DMA mapping. If we keep the structs
- * symmetric here, the simple macro covers us for every page table type.
- *
- * Return: 0 if success.
- */
-#define i915_dma_map_single(px, dev) \
-	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
+	p->page = alloc_page(GFP_KERNEL);
+	if (!p->page)
+		return -ENOMEM;
 
-static int i915_dma_map_page_single(struct page *page,
-				    struct drm_device *dev,
-				    dma_addr_t *daddr)
-{
-	struct device *device = &dev->pdev->dev;
+	p->daddr = dma_map_page(device,
+				p->page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
 
-	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
-	if (dma_mapping_error(device, *daddr))
-		return -ENOMEM;
+	if (dma_mapping_error(device, p->daddr)) {
+		__free_page(p->page);
+		return -EINVAL;
+	}
 
 	return 0;
 }
 
-static void unmap_and_free_pt(struct i915_page_table *pt,
-			       struct drm_device *dev)
+static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 {
-	if (WARN_ON(!pt->page))
+	if (WARN_ON(!p->page))
 		return;
 
-	i915_dma_unmap_single(pt, dev);
-	__free_page(pt->page);
+	dma_unmap_page(&dev->pdev->dev, p->daddr, 4096, PCI_DMA_BIDIRECTIONAL);
+	__free_page(p->page);
+	memset(p, 0, sizeof(*p));
+}
+
+static void unmap_and_free_pt(struct i915_page_table *pt,
+			       struct drm_device *dev)
+{
+	cleanup_page_dma(dev, &pt->base);
 	kfree(pt->used_ptes);
 	kfree(pt);
 }
@@ -357,7 +344,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 	gen8_pte_t *pt_vaddr, scratch_pte;
 	int i;
 
-	pt_vaddr = kmap_atomic(pt->page);
+	pt_vaddr = kmap_atomic(pt->base.page);
 	scratch_pte = gen8_pte_encode(vm->scratch.addr,
 				      I915_CACHE_LLC, true);
 
@@ -386,19 +373,13 @@ static struct i915_page_table *alloc_pt(struct drm_device *dev)
 	if (!pt->used_ptes)
 		goto fail_bitmap;
 
-	pt->page = alloc_page(GFP_KERNEL);
-	if (!pt->page)
-		goto fail_page;
-
-	ret = i915_dma_map_single(pt, dev);
+	ret = setup_page_dma(dev, &pt->base);
 	if (ret)
-		goto fail_dma;
+		goto fail_page_m;
 
 	return pt;
 
-fail_dma:
-	__free_page(pt->page);
-fail_page:
+fail_page_m:
 	kfree(pt->used_ptes);
 fail_bitmap:
 	kfree(pt);
@@ -409,9 +390,8 @@ fail_bitmap:
 static void unmap_and_free_pd(struct i915_page_directory *pd,
 			      struct drm_device *dev)
 {
-	if (pd->page) {
-		i915_dma_unmap_single(pd, dev);
-		__free_page(pd->page);
+	if (pd->base.page) {
+		cleanup_page_dma(dev, &pd->base);
 		kfree(pd->used_pdes);
 		kfree(pd);
 	}
@@ -431,18 +411,12 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 	if (!pd->used_pdes)
 		goto free_pd;
 
-	pd->page = alloc_page(GFP_KERNEL);
-	if (!pd->page)
-		goto free_bitmap;
-
-	ret = i915_dma_map_single(pd, dev);
+	ret = setup_page_dma(dev, &pd->base);
 	if (ret)
-		goto free_page;
+		goto free_bitmap;
 
 	return pd;
 
-free_page:
-	__free_page(pd->page);
 free_bitmap:
 	kfree(pd->used_pdes);
 free_pd:
@@ -523,10 +497,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 
 		pt = pd->page_table[pde];
 
-		if (WARN_ON(!pt->page))
+		if (WARN_ON(!pt->base.page))
 			continue;
 
-		page_table = pt->page;
+		page_table = pt->base.page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES)
@@ -573,7 +547,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
-			struct page *page_table = pt->page;
+			struct page *page_table = pt->base.page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -605,7 +579,7 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
 			     struct drm_device *dev)
 {
 	gen8_pde_t entry =
-		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+		gen8_pde_encode(dev, pt->base.daddr, I915_CACHE_LLC);
 	*pde = entry;
 }
 
@@ -618,7 +592,7 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 	struct i915_page_table *pt;
 	int i;
 
-	page_directory = kmap_atomic(pd->page);
+	page_directory = kmap_atomic(pd->base.page);
 	pt = ppgtt->scratch_pt;
 	for (i = 0; i < I915_PDES; i++)
 		/* Map the PDE to the page table */
@@ -633,7 +607,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
 {
 	int i;
 
-	if (!pd->page)
+	if (!pd->base.page)
 		return;
 
 	for_each_set_bit(i, pd->used_pdes, I915_PDES) {
@@ -884,7 +858,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	/* Allocations have completed successfully, so set the bitmaps, and do
 	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_pde_t *const page_directory = kmap_atomic(pd->page);
+		gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
 		struct i915_page_table *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -995,7 +969,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->base.daddr;
 		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -1006,7 +980,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
 		for (pte = 0; pte < GEN6_PTES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES) +
@@ -1041,7 +1015,7 @@ static void gen6_write_pde(struct i915_page_directory *pd,
 		container_of(pd, struct i915_hw_ppgtt, pd);
 	u32 pd_entry;
 
-	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->base.daddr);
 	pd_entry |= GEN6_PDE_VALID;
 
 	writel(pd_entry, ppgtt->pd_addr + pde);
@@ -1066,9 +1040,9 @@ static void gen6_write_page_range(struct drm_i915_private *dev_priv,
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
 
-	return (ppgtt->pd.pd_offset / 64) << 16;
+	return (ppgtt->pd.base.ggtt_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -1231,7 +1205,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES)
 			last_pte = GEN6_PTES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -1260,7 +1234,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -1288,7 +1262,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 	scratch_pte = vm->pte_encode(vm->scratch.addr,
 			I915_CACHE_LLC, true, 0);
 
-	pt_vaddr = kmap_atomic(pt->page);
+	pt_vaddr = kmap_atomic(pt->base.page);
 
 	for (i = 0; i < GEN6_PTES; i++)
 		pt_vaddr[i] = scratch_pte;
@@ -1504,11 +1478,11 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd.pd_offset =
+	ppgtt->pd.base.ggtt_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
 
 	ppgtt->pd_addr = (gen6_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_pte_t);
+		ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
 
 	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
@@ -1519,7 +1493,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 			 ppgtt->node.start / PAGE_SIZE);
 
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd.pd_offset << 10);
+		  ppgtt->pd.base.ggtt_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index da67542..0ccdf54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,19 +205,26 @@ struct i915_vma {
 #define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
 };
 
-struct i915_page_table {
+struct i915_page_dma {
 	struct page *page;
-	dma_addr_t daddr;
+	union {
+		dma_addr_t daddr;
+
+		/* For gen6/gen7 only. This is the offset in the GGTT
+		 * where the page directory entries for PPGTT begin
+		 */
+		uint32_t ggtt_offset;
+	};
+};
+
+struct i915_page_table {
+	struct i915_page_dma base;
 
 	unsigned long *used_ptes;
 };
 
 struct i915_page_directory {
-	struct page *page; /* NULL for GEN6-GEN7 */
-	union {
-		uint32_t pd_offset;
-		dma_addr_t daddr;
-	};
+	struct i915_page_dma base;
 
 	unsigned long *used_pdes;
 	struct i915_page_table *page_table[I915_PDES]; /* PDEs */
@@ -472,8 +479,8 @@ static inline dma_addr_t
 i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
 {
 	return test_bit(n, ppgtt->pdp.used_pdpes) ?
-		ppgtt->pdp.page_directory[n]->daddr :
-		ppgtt->scratch_pd->daddr;
+		ppgtt->pdp.page_directory[n]->base.daddr :
+		ppgtt->scratch_pd->base.daddr;
 }
 
 int i915_gem_gtt_init(struct drm_device *dev);
-- 
1.9.1


* [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px
  2015-06-02 13:08   ` Michel Thierry
@ 2015-06-11 17:48     ` Mika Kuoppala
  2015-06-22 14:09       ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:48 UTC (permalink / raw)
  To: intel-gfx

All the paging structures are now similar and mapped for
dma. The unmapping is taken care of by common accessors, so
don't overload the reader with such details.

v2: Be consistent with goto labels (Michel)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 40 ++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 65ee92f..048c701 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -330,8 +330,7 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	memset(p, 0, sizeof(*p));
 }
 
-static void unmap_and_free_pt(struct i915_page_table *pt,
-			       struct drm_device *dev)
+static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 {
 	cleanup_page_dma(dev, &pt->base);
 	kfree(pt->used_ptes);
@@ -387,8 +386,7 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
-static void unmap_and_free_pd(struct i915_page_directory *pd,
-			      struct drm_device *dev)
+static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
 {
 	if (pd->base.page) {
 		cleanup_page_dma(dev, &pd->base);
@@ -409,17 +407,17 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 	pd->used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES),
 				sizeof(*pd->used_pdes), GFP_KERNEL);
 	if (!pd->used_pdes)
-		goto free_pd;
+		goto fail_bitmap;
 
 	ret = setup_page_dma(dev, &pd->base);
 	if (ret)
-		goto free_bitmap;
+		goto fail_page_m;
 
 	return pd;
 
-free_bitmap:
+fail_page_m:
 	kfree(pd->used_pdes);
-free_pd:
+fail_bitmap:
 	kfree(pd);
 
 	return ERR_PTR(ret);
@@ -614,7 +612,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
 		if (WARN_ON(!pd->page_table[i]))
 			continue;
 
-		unmap_and_free_pt(pd->page_table[i], dev);
+		free_pt(dev, pd->page_table[i]);
 		pd->page_table[i] = NULL;
 	}
 }
@@ -630,11 +628,11 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
 	}
 
-	unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
-	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+	free_pd(ppgtt->base.dev, ppgtt->scratch_pd);
+	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
 }
 
 /**
@@ -687,7 +685,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pde, new_pts, I915_PDES)
-		unmap_and_free_pt(pd->page_table[pde], dev);
+		free_pt(dev, pd->page_table[pde]);
 
 	return -ENOMEM;
 }
@@ -745,7 +743,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
-		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
+		free_pd(dev, pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
@@ -903,11 +901,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
-			unmap_and_free_pt(ppgtt->pdp.page_directory[pdpe]->page_table[temp], vm->dev);
+			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
@@ -1353,7 +1351,7 @@ unwind_out:
 		struct i915_page_table *pt = ppgtt->pd.page_table[pde];
 
 		ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
-		unmap_and_free_pt(pt, vm->dev);
+		free_pt(vm->dev, pt);
 	}
 
 	mark_tlbs_dirty(ppgtt);
@@ -1372,11 +1370,11 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 
 	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			unmap_and_free_pt(pt, ppgtt->base.dev);
+			free_pt(ppgtt->base.dev, pt);
 	}
 
-	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
-	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
+	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
+	free_pd(ppgtt->base.dev, &ppgtt->pd);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
@@ -1426,7 +1424,7 @@ alloc:
 	return 0;
 
 err_out:
-	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
 	return ret;
 }
 
-- 
1.9.1


* [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma()
  2015-06-02 14:51   ` Michel Thierry
  2015-06-02 15:01     ` Ville Syrjälä
@ 2015-06-11 17:50     ` Mika Kuoppala
  2015-06-24 15:05       ` Michel Thierry
  1 sibling, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:50 UTC (permalink / raw)
  To: intel-gfx

When we set up page directories and tables, we point the entries
at the next level scratch structure. Make this generic
by introducing a fill_page_dma() which maps and flushes. We also
need a 32 bit variant for legacy gens.
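
The 32 bit variant just replicates the entry into both halves of the
64 bit fill value (this mirrors fill_page_dma_32() in the diff below):

	uint32_t pte = 0xdeadbeef;			/* example pte */
	uint64_t v = ((uint64_t)pte << 32) | pte;	/* 0xdeadbeefdeadbeef */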

v2: Fix flushes and handle valleyview (Ville)
v3: Now really fix flushes (Michel, Ville)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 74 ++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 698423b..60796b7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -330,6 +330,34 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	memset(p, 0, sizeof(*p));
 }
 
+static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
+			  const uint64_t val)
+{
+	int i;
+	uint64_t * const vaddr = kmap_atomic(p->page);
+
+	for (i = 0; i < 512; i++)
+		vaddr[i] = val;
+
+	/* There are only few exceptions for gen >=6. chv and bxt.
+	 * And we are not sure about the latter so play safe for now.
+	 */
+	if (IS_CHERRYVIEW(dev) || IS_BROXTON(dev))
+		drm_clflush_virt_range(vaddr, PAGE_SIZE);
+
+	kunmap_atomic(vaddr);
+}
+
+static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
+			     const uint32_t val32)
+{
+	uint64_t v = val32;
+
+	v = v << 32 | val32;
+
+	fill_page_dma(dev, p, v);
+}
+
 static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 {
 	cleanup_page_dma(dev, &pt->base);
@@ -340,19 +368,11 @@ static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
 static void gen8_initialize_pt(struct i915_address_space *vm,
 			       struct i915_page_table *pt)
 {
-	gen8_pte_t *pt_vaddr, scratch_pte;
-	int i;
-
-	pt_vaddr = kmap_atomic(pt->base.page);
-	scratch_pte = gen8_pte_encode(vm->scratch.addr,
-				      I915_CACHE_LLC, true);
+	gen8_pte_t scratch_pte;
 
-	for (i = 0; i < GEN8_PTES; i++)
-		pt_vaddr[i] = scratch_pte;
+	scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-	if (!HAS_LLC(vm->dev))
-		drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-	kunmap_atomic(pt_vaddr);
+	fill_page_dma(vm->dev, &pt->base, scratch_pte);
 }
 
 static struct i915_page_table *alloc_pt(struct drm_device *dev)
@@ -585,20 +605,13 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
 			       struct i915_page_directory *pd)
 {
 	struct i915_hw_ppgtt *ppgtt =
-			container_of(vm, struct i915_hw_ppgtt, base);
-	gen8_pde_t *page_directory;
-	struct i915_page_table *pt;
-	int i;
+		container_of(vm, struct i915_hw_ppgtt, base);
+	gen8_pde_t scratch_pde;
 
-	page_directory = kmap_atomic(pd->base.page);
-	pt = ppgtt->scratch_pt;
-	for (i = 0; i < I915_PDES; i++)
-		/* Map the PDE to the page table */
-		__gen8_do_map_pt(page_directory + i, pt, vm->dev);
+	scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
+				      I915_CACHE_LLC);
 
-	if (!HAS_LLC(vm->dev))
-		drm_clflush_virt_range(page_directory, PAGE_SIZE);
-	kunmap_atomic(page_directory);
+	fill_page_dma(vm->dev, &pd->base, scratch_pde);
 }
 
 static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
@@ -1250,22 +1263,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 }
 
 static void gen6_initialize_pt(struct i915_address_space *vm,
-		struct i915_page_table *pt)
+			       struct i915_page_table *pt)
 {
-	gen6_pte_t *pt_vaddr, scratch_pte;
-	int i;
+	gen6_pte_t scratch_pte;
 
 	WARN_ON(vm->scratch.addr == 0);
 
-	scratch_pte = vm->pte_encode(vm->scratch.addr,
-			I915_CACHE_LLC, true, 0);
-
-	pt_vaddr = kmap_atomic(pt->base.page);
-
-	for (i = 0; i < GEN6_PTES; i++)
-		pt_vaddr[i] = scratch_pte;
+	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	kunmap_atomic(pt_vaddr);
+	fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
 }
 
 static int gen6_alloc_va_range(struct i915_address_space *vm,
-- 
1.9.1


* [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page
  2015-06-03 10:55   ` Michel Thierry
@ 2015-06-11 17:50     ` Mika Kuoppala
  2015-06-24 15:06       ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:50 UTC (permalink / raw)
  To: intel-gfx

As there is flushing involved once we have done the cpu
write, add functions for mapping into cpu space, and macros
to map any type of paging structure.
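
Typical usage then looks like this (same pattern as in the diff
below):

	gen6_pte_t *vaddr = kmap_px(pt);  /* kmap_atomic on the backing page */

	vaddr[i] = scratch_pte;
	kunmap_px(ppgtt, vaddr);	  /* clflush on chv/bxt, then kunmap */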

v2: Make it clear that flushing kunmap is only for ppgtt (Ville)
v3: Flushing fixed (Ville, Michel). Removed superfluous semicolon

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 77 +++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 60796b7..3ac8671 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -330,15 +330,16 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
 	memset(p, 0, sizeof(*p));
 }
 
-static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
-			  const uint64_t val)
+static void *kmap_page_dma(struct i915_page_dma *p)
 {
-	int i;
-	uint64_t * const vaddr = kmap_atomic(p->page);
-
-	for (i = 0; i < 512; i++)
-		vaddr[i] = val;
+	return kmap_atomic(p->page);
+}
 
+/* We use the flushing unmap only with ppgtt structures:
+ * page directories, page tables and scratch pages.
+ */
+static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
+{
 	/* There are only few exceptions for gen >=6. chv and bxt.
 	 * And we are not sure about the latter so play safe for now.
 	 */
@@ -348,6 +349,21 @@ static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
 	kunmap_atomic(vaddr);
 }
 
+#define kmap_px(px) kmap_page_dma(&(px)->base)
+#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr))
+
+static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
+			  const uint64_t val)
+{
+	int i;
+	uint64_t * const vaddr = kmap_page_dma(p);
+
+	for (i = 0; i < 512; i++)
+		vaddr[i] = val;
+
+	kunmap_page_dma(dev, vaddr);
+}
+
 static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
 			     const uint32_t val32)
 {
@@ -503,7 +519,6 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	while (num_entries) {
 		struct i915_page_directory *pd;
 		struct i915_page_table *pt;
-		struct page *page_table;
 
 		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
 			continue;
@@ -518,22 +533,18 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		if (WARN_ON(!pt->base.page))
 			continue;
 
-		page_table = pt->base.page;
-
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES)
 			last_pte = GEN8_PTES;
 
-		pt_vaddr = kmap_atomic(page_table);
+		pt_vaddr = kmap_px(pt);
 
 		for (i = pte; i < last_pte; i++) {
 			pt_vaddr[i] = scratch_pte;
 			num_entries--;
 		}
 
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt);
 
 		pte = 0;
 		if (++pde == I915_PDES) {
@@ -565,18 +576,14 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table *pt = pd->page_table[pde];
-			struct page *page_table = pt->base.page;
-
-			pt_vaddr = kmap_atomic(page_table);
+			pt_vaddr = kmap_px(pt);
 		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES) {
-			if (!HAS_LLC(ppgtt->base.dev))
-				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-			kunmap_atomic(pt_vaddr);
+			kunmap_px(ppgtt, pt_vaddr);
 			pt_vaddr = NULL;
 			if (++pde == I915_PDES) {
 				pdpe++;
@@ -585,11 +592,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			pte = 0;
 		}
 	}
-	if (pt_vaddr) {
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
-		kunmap_atomic(pt_vaddr);
-	}
+
+	if (pt_vaddr)
+		kunmap_px(ppgtt, pt_vaddr);
 }
 
 static void __gen8_do_map_pt(gen8_pde_t * const pde,
@@ -869,7 +874,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	/* Allocations have completed successfully, so set the bitmaps, and do
 	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
+		gen8_pde_t *const page_directory = kmap_px(pd);
 		struct i915_page_table *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -899,10 +904,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			 * point we're still relying on insert_entries() */
 		}
 
-		if (!HAS_LLC(vm->dev))
-			drm_clflush_virt_range(page_directory, PAGE_SIZE);
-
-		kunmap_atomic(page_directory);
+		kunmap_px(ppgtt, page_directory);
 
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
@@ -991,7 +993,8 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
+		pt_vaddr = kmap_px(ppgtt->pd.page_table[pde]);
+
 		for (pte = 0; pte < GEN6_PTES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES) +
@@ -1013,7 +1016,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt_vaddr);
 	}
 }
 
@@ -1216,12 +1219,12 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES)
 			last_pte = GEN6_PTES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
+		pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
 
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt_vaddr);
 
 		num_entries -= last_pte - first_pte;
 		first_pte = 0;
@@ -1245,21 +1248,21 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
+			pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
 				       cache_level, true, flags);
 
 		if (++act_pte == GEN6_PTES) {
-			kunmap_atomic(pt_vaddr);
+			kunmap_px(ppgtt, pt_vaddr);
 			pt_vaddr = NULL;
 			act_pt++;
 			act_pte = 0;
 		}
 	}
 	if (pt_vaddr)
-		kunmap_atomic(pt_vaddr);
+		kunmap_px(ppgtt, pt_vaddr);
 }
 
 static void gen6_initialize_pt(struct i915_address_space *vm,
-- 
1.9.1
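
In short, the idiom this diff converges on (a sketch; the variable names
are illustrative, the flush behaviour is as in the hunks above):

	/* map: plain kmap_atomic() on the page backing the structure */
	gen6_pte_t *pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);

	pt_vaddr[act_pte] = scratch_pte;

	/* unmap: clflush the page first on platforms that need it, then
	 * kunmap_atomic(), so callers no longer open-code HAS_LLC() checks */
	kunmap_px(ppgtt, pt_vaddr);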


* [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions
  2015-06-03 17:14   ` Michel Thierry
@ 2015-06-11 17:52     ` Mika Kuoppala
  0 siblings, 0 replies; 86+ messages in thread
From: Mika Kuoppala @ 2015-06-11 17:52 UTC (permalink / raw)
  To: intel-gfx

Maintain base page handling functions in order of
alloc, free, init. No functional changes.

v2: s/Introduce/Maintain (Michel)
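
For illustration, the ordering this settles on, shown as bare declarations
(the signatures are taken from the diff below):

	static struct i915_page_table *alloc_pt(struct drm_device *dev);
	static void free_pt(struct drm_device *dev, struct i915_page_table *pt);
	static void gen8_initialize_pt(struct i915_address_space *vm,
				       struct i915_page_table *pt);
	static void gen6_initialize_pt(struct i915_address_space *vm,
				       struct i915_page_table *pt);

	static struct i915_page_directory *alloc_pd(struct drm_device *dev);
	static void free_pd(struct drm_device *dev, struct i915_page_directory *pd);
	static void gen8_initialize_pd(struct i915_address_space *vm,
				       struct i915_page_directory *pd);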

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0bb4504..3f994fe 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -384,24 +384,6 @@ static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
 	fill_page_dma(dev, p, v);
 }
 
-static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
-{
-	cleanup_px(dev, pt);
-	kfree(pt->used_ptes);
-	kfree(pt);
-}
-
-static void gen8_initialize_pt(struct i915_address_space *vm,
-			       struct i915_page_table *pt)
-{
-	gen8_pte_t scratch_pte;
-
-	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
-				      I915_CACHE_LLC, true);
-
-	fill_px(vm->dev, pt, scratch_pte);
-}
-
 static struct i915_page_table *alloc_pt(struct drm_device *dev)
 {
 	struct i915_page_table *pt;
@@ -433,6 +415,24 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
+{
+	cleanup_px(dev, pt);
+	kfree(pt->used_ptes);
+	kfree(pt);
+}
+
+static void gen8_initialize_pt(struct i915_address_space *vm,
+			       struct i915_page_table *pt)
+{
+	gen8_pte_t scratch_pte;
+
+	scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+				      I915_CACHE_LLC, true);
+
+	fill_px(vm->dev, pt, scratch_pte);
+}
+
 static void gen6_initialize_pt(struct i915_address_space *vm,
 			       struct i915_page_table *pt)
 {
@@ -444,15 +444,6 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
 	fill32_px(vm->dev, pt, scratch_pte);
 }
 
-static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
-{
-	if (px_page(pd)) {
-		cleanup_px(dev, pd);
-		kfree(pd->used_pdes);
-		kfree(pd);
-	}
-}
-
 static struct i915_page_directory *alloc_pd(struct drm_device *dev)
 {
 	struct i915_page_directory *pd;
@@ -481,6 +472,15 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
+{
+	if (px_page(pd)) {
+		cleanup_px(dev, pd);
+		kfree(pd->used_pdes);
+		kfree(pd);
+	}
+}
+
 static void gen8_initialize_pd(struct i915_address_space *vm,
 			       struct i915_page_directory *pd)
 {
-- 
1.9.1


* Re: [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error
  2015-06-01 15:02   ` Joonas Lahtinen
@ 2015-06-15 10:13     ` Daniel Vetter
  0 siblings, 0 replies; 86+ messages in thread
From: Daniel Vetter @ 2015-06-15 10:13 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx, miku

On Mon, Jun 01, 2015 at 06:02:52PM +0300, Joonas Lahtinen wrote:
> On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
> > Free the scratch page if dma mapping fails.
> > 
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Queued for -next, thanks for the patch.
-Daniel

> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index c61de4a..a608b1b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -2191,8 +2191,10 @@ static int setup_scratch_page(struct drm_device *dev)
> >  #ifdef CONFIG_INTEL_IOMMU
> >  	dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
> >  				PCI_DMA_BIDIRECTIONAL);
> > -	if (pci_dma_mapping_error(dev->pdev, dma_addr))
> > +	if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
> > +		__free_page(page);
> >  		return -EINVAL;
> > +	}
> >  #else
> >  	dma_addr = page_to_phys(page);
> >  #endif
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator
  2015-06-02  9:56   ` Michel Thierry
@ 2015-06-15 10:14     ` Daniel Vetter
  0 siblings, 0 replies; 86+ messages in thread
From: Daniel Vetter @ 2015-06-15 10:14 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Jun 02, 2015 at 10:56:55AM +0100, Michel Thierry wrote:
> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
> >We are always allocating a single page. No need to be verbose, so
> >remove the suffix.
> >
> >Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> I saw another of your patches will take care of
> i915_dma_map_single/i915_dma_unmap_single...
> 
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma()
  2015-06-02 15:01     ` Ville Syrjälä
@ 2015-06-15 10:16       ` Daniel Vetter
  0 siblings, 0 replies; 86+ messages in thread
From: Daniel Vetter @ 2015-06-15 10:16 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Tue, Jun 02, 2015 at 06:01:27PM +0300, Ville Syrjälä wrote:
> On Tue, Jun 02, 2015 at 03:51:26PM +0100, Michel Thierry wrote:
> > On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
> > > When we set up page directories and tables, we point the entries
> > > to the next level scratch structure. Make this generic
> > > by introducing a fill_page_dma which maps and flushes. We also
> > > need a 32 bit variant for legacy gens.
> > >
> > > v2: Fix flushes and handle valleyview (Ville)
> > >
> > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/i915_gem_gtt.c | 71 +++++++++++++++++++------------------
> > >   1 file changed, 37 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > index f747bd3..d020b5e 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > @@ -330,6 +330,31 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
> > >          memset(p, 0, sizeof(*p));
> > >   }
> > >
> > > +static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
> > > +                         const uint64_t val)
> > > +{
> > > +       int i;
> > > +       uint64_t * const vaddr = kmap_atomic(p->page);
> > > +
> > > +       for (i = 0; i < 512; i++)
> > > +               vaddr[i] = val;
> > > +
> > > +       if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
> > > +               drm_clflush_virt_range(vaddr, PAGE_SIZE);
> > 
> > Cherryview returns true to IS_VALLEYVIEW().
> > 
> > You can use (!HAS_LLC && IS_CHERRYVIEW) instead to flush in chv, but not
> > in vlv... But to make it bxt-proof, (!HAS_LLC && INTEL_INFO(dev)->gen >=
> > 8) is probably better.
> 
> Has someone actually confirmed that BXT needs the clflush?

Ping on this one ... I'd like to know the answer ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
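
For reference, the two forms of the check discussed above, side by side
(a sketch; the v3 form appears in the PATCH 11 repost further down this
thread):

	/* v2 as quoted above: also skips the flush on chv, because
	 * IS_VALLEYVIEW() covers chv as well */
	if (!HAS_LLC(dev) && !IS_VALLEYVIEW(dev))
		drm_clflush_virt_range(vaddr, PAGE_SIZE);

	/* v3: flush on chv, and on bxt until someone confirms it is safe */
	if (IS_CHERRYVIEW(dev) || IS_BROXTON(dev))
		drm_clflush_virt_range(vaddr, PAGE_SIZE);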

* Re: [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma
  2015-06-11 17:48     ` Mika Kuoppala
@ 2015-06-22 14:05       ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-22 14:05 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 6/11/2015 6:48 PM, Mika Kuoppala wrote:
> All our paging structures have struct page and dma address
> for that page.
>
> Add struct for page/dma address pairs and use it to make
> the setup and teardown for different paging structures
> identical.
>
> Also include the page directory offset in the struct for legacy
> gens. Rename it to clearly point out that it is an offset into the
> ggtt.
>
> v2: Add comment about ggtt_offset (Michel)
>
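
A sketch of the pairing the new struct enables (the helper names and the
error label are from the hunks below):

	ret = setup_page_dma(dev, &pt->base);	/* alloc_page() + dma_map_page() */
	if (ret)
		goto fail_page_m;
	/* ... use the structure ... */
	cleanup_page_dma(dev, &pt->base);	/* dma_unmap_page() + __free_page() */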

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c |   2 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 120 ++++++++++++++----------------------
>   drivers/gpu/drm/i915/i915_gem_gtt.h |  25 +++++---
>   3 files changed, 64 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 14f5d16..5a7a20a 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2248,7 +2248,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>   		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>
>   		seq_puts(m, "aliasing PPGTT:\n");
> -		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
> +		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.base.ggtt_offset);
>
>   		ppgtt->debug_dump(ppgtt, m);
>   	}
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5832f53..65ee92f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -301,52 +301,39 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
>   	return pte;
>   }
>
> -#define i915_dma_unmap_single(px, dev) \
> -	__i915_dma_unmap_single((px)->daddr, dev)
> -
> -static void __i915_dma_unmap_single(dma_addr_t daddr,
> -				    struct drm_device *dev)
> +static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   {
>   	struct device *device = &dev->pdev->dev;
>
> -	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
> -}
> -
> -/**
> - * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
> - * @px:	Page table/dir/etc to get a DMA map for
> - * @dev:	drm device
> - *
> - * Page table allocations are unified across all gens. They always require a
> - * single 4k allocation, as well as a DMA mapping. If we keep the structs
> - * symmetric here, the simple macro covers us for every page table type.
> - *
> - * Return: 0 if success.
> - */
> -#define i915_dma_map_single(px, dev) \
> -	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
> +	p->page = alloc_page(GFP_KERNEL);
> +	if (!p->page)
> +		return -ENOMEM;
>
> -static int i915_dma_map_page_single(struct page *page,
> -				    struct drm_device *dev,
> -				    dma_addr_t *daddr)
> -{
> -	struct device *device = &dev->pdev->dev;
> +	p->daddr = dma_map_page(device,
> +				p->page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
>
> -	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
> -	if (dma_mapping_error(device, *daddr))
> -		return -ENOMEM;
> +	if (dma_mapping_error(device, p->daddr)) {
> +		__free_page(p->page);
> +		return -EINVAL;
> +	}
>
>   	return 0;
>   }
>
> -static void unmap_and_free_pt(struct i915_page_table *pt,
> -			       struct drm_device *dev)
> +static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   {
> -	if (WARN_ON(!pt->page))
> +	if (WARN_ON(!p->page))
>   		return;
>
> -	i915_dma_unmap_single(pt, dev);
> -	__free_page(pt->page);
> +	dma_unmap_page(&dev->pdev->dev, p->daddr, 4096, PCI_DMA_BIDIRECTIONAL);
> +	__free_page(p->page);
> +	memset(p, 0, sizeof(*p));
> +}
> +
> +static void unmap_and_free_pt(struct i915_page_table *pt,
> +			       struct drm_device *dev)
> +{
> +	cleanup_page_dma(dev, &pt->base);
>   	kfree(pt->used_ptes);
>   	kfree(pt);
>   }
> @@ -357,7 +344,7 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>   	gen8_pte_t *pt_vaddr, scratch_pte;
>   	int i;
>
> -	pt_vaddr = kmap_atomic(pt->page);
> +	pt_vaddr = kmap_atomic(pt->base.page);
>   	scratch_pte = gen8_pte_encode(vm->scratch.addr,
>   				      I915_CACHE_LLC, true);
>
> @@ -386,19 +373,13 @@ static struct i915_page_table *alloc_pt(struct drm_device *dev)
>   	if (!pt->used_ptes)
>   		goto fail_bitmap;
>
> -	pt->page = alloc_page(GFP_KERNEL);
> -	if (!pt->page)
> -		goto fail_page;
> -
> -	ret = i915_dma_map_single(pt, dev);
> +	ret = setup_page_dma(dev, &pt->base);
>   	if (ret)
> -		goto fail_dma;
> +		goto fail_page_m;
>
>   	return pt;
>
> -fail_dma:
> -	__free_page(pt->page);
> -fail_page:
> +fail_page_m:
>   	kfree(pt->used_ptes);
>   fail_bitmap:
>   	kfree(pt);
> @@ -409,9 +390,8 @@ fail_bitmap:
>   static void unmap_and_free_pd(struct i915_page_directory *pd,
>   			      struct drm_device *dev)
>   {
> -	if (pd->page) {
> -		i915_dma_unmap_single(pd, dev);
> -		__free_page(pd->page);
> +	if (pd->base.page) {
> +		cleanup_page_dma(dev, &pd->base);
>   		kfree(pd->used_pdes);
>   		kfree(pd);
>   	}
> @@ -431,18 +411,12 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
>   	if (!pd->used_pdes)
>   		goto free_pd;
>
> -	pd->page = alloc_page(GFP_KERNEL);
> -	if (!pd->page)
> -		goto free_bitmap;
> -
> -	ret = i915_dma_map_single(pd, dev);
> +	ret = setup_page_dma(dev, &pd->base);
>   	if (ret)
> -		goto free_page;
> +		goto free_bitmap;
>
>   	return pd;
>
> -free_page:
> -	__free_page(pd->page);
>   free_bitmap:
>   	kfree(pd->used_pdes);
>   free_pd:
> @@ -523,10 +497,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>
>   		pt = pd->page_table[pde];
>
> -		if (WARN_ON(!pt->page))
> +		if (WARN_ON(!pt->base.page))
>   			continue;
>
> -		page_table = pt->page;
> +		page_table = pt->base.page;
>
>   		last_pte = pte + num_entries;
>   		if (last_pte > GEN8_PTES)
> @@ -573,7 +547,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   		if (pt_vaddr == NULL) {
>   			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
>   			struct i915_page_table *pt = pd->page_table[pde];
> -			struct page *page_table = pt->page;
> +			struct page *page_table = pt->base.page;
>
>   			pt_vaddr = kmap_atomic(page_table);
>   		}
> @@ -605,7 +579,7 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
>   			     struct drm_device *dev)
>   {
>   	gen8_pde_t entry =
> -		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
> +		gen8_pde_encode(dev, pt->base.daddr, I915_CACHE_LLC);
>   	*pde = entry;
>   }
>
> @@ -618,7 +592,7 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>   	struct i915_page_table *pt;
>   	int i;
>
> -	page_directory = kmap_atomic(pd->page);
> +	page_directory = kmap_atomic(pd->base.page);
>   	pt = ppgtt->scratch_pt;
>   	for (i = 0; i < I915_PDES; i++)
>   		/* Map the PDE to the page table */
> @@ -633,7 +607,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
>   {
>   	int i;
>
> -	if (!pd->page)
> +	if (!pd->base.page)
>   		return;
>
>   	for_each_set_bit(i, pd->used_pdes, I915_PDES) {
> @@ -884,7 +858,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	/* Allocations have completed successfully, so set the bitmaps, and do
>   	 * the mappings. */
>   	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -		gen8_pde_t *const page_directory = kmap_atomic(pd->page);
> +		gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
>   		struct i915_page_table *pt;
>   		uint64_t pd_len = gen8_clamp_pd(start, length);
>   		uint64_t pd_start = start;
> @@ -995,7 +969,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>   	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>   		u32 expected;
>   		gen6_pte_t *pt_vaddr;
> -		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
> +		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->base.daddr;
>   		pd_entry = readl(ppgtt->pd_addr + pde);
>   		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>
> @@ -1006,7 +980,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>   				   expected);
>   		seq_printf(m, "\tPDE: %x\n", pd_entry);
>
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
>   		for (pte = 0; pte < GEN6_PTES; pte+=4) {
>   			unsigned long va =
>   				(pde * PAGE_SIZE * GEN6_PTES) +
> @@ -1041,7 +1015,7 @@ static void gen6_write_pde(struct i915_page_directory *pd,
>   		container_of(pd, struct i915_hw_ppgtt, pd);
>   	u32 pd_entry;
>
> -	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->base.daddr);
>   	pd_entry |= GEN6_PDE_VALID;
>
>   	writel(pd_entry, ppgtt->pd_addr + pde);
> @@ -1066,9 +1040,9 @@ static void gen6_write_page_range(struct drm_i915_private *dev_priv,
>
>   static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>   {
> -	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
> +	BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
>
> -	return (ppgtt->pd.pd_offset / 64) << 16;
> +	return (ppgtt->pd.base.ggtt_offset / 64) << 16;
>   }
>
>   static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
> @@ -1231,7 +1205,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   		if (last_pte > GEN6_PTES)
>   			last_pte = GEN6_PTES;
>
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
>
>   		for (i = first_pte; i < last_pte; i++)
>   			pt_vaddr[i] = scratch_pte;
> @@ -1260,7 +1234,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   	pt_vaddr = NULL;
>   	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>   		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
>
>   		pt_vaddr[act_pte] =
>   			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -1288,7 +1262,7 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>   	scratch_pte = vm->pte_encode(vm->scratch.addr,
>   			I915_CACHE_LLC, true, 0);
>
> -	pt_vaddr = kmap_atomic(pt->page);
> +	pt_vaddr = kmap_atomic(pt->base.page);
>
>   	for (i = 0; i < GEN6_PTES; i++)
>   		pt_vaddr[i] = scratch_pte;
> @@ -1504,11 +1478,11 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>   	ppgtt->base.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
>   	ppgtt->debug_dump = gen6_dump_ppgtt;
>
> -	ppgtt->pd.pd_offset =
> +	ppgtt->pd.base.ggtt_offset =
>   		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
>
>   	ppgtt->pd_addr = (gen6_pte_t __iomem *)dev_priv->gtt.gsm +
> -		ppgtt->pd.pd_offset / sizeof(gen6_pte_t);
> +		ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
>
>   	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
>
> @@ -1519,7 +1493,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>   			 ppgtt->node.start / PAGE_SIZE);
>
>   	DRM_DEBUG("Adding PPGTT at offset %x\n",
> -		  ppgtt->pd.pd_offset << 10);
> +		  ppgtt->pd.base.ggtt_offset << 10);
>
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index da67542..0ccdf54 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -205,19 +205,26 @@ struct i915_vma {
>   #define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
>   };
>
> -struct i915_page_table {
> +struct i915_page_dma {
>   	struct page *page;
> -	dma_addr_t daddr;
> +	union {
> +		dma_addr_t daddr;
> +
> +		/* For gen6/gen7 only. This is the offset in the GGTT
> +		 * where the page directory entries for PPGTT begin
> +		 */
> +		uint32_t ggtt_offset;
> +	};
> +};
> +
> +struct i915_page_table {
> +	struct i915_page_dma base;
>
>   	unsigned long *used_ptes;
>   };
>
>   struct i915_page_directory {
> -	struct page *page; /* NULL for GEN6-GEN7 */
> -	union {
> -		uint32_t pd_offset;
> -		dma_addr_t daddr;
> -	};
> +	struct i915_page_dma base;
>
>   	unsigned long *used_pdes;
>   	struct i915_page_table *page_table[I915_PDES]; /* PDEs */
> @@ -472,8 +479,8 @@ static inline dma_addr_t
>   i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n)
>   {
>   	return test_bit(n, ppgtt->pdp.used_pdpes) ?
> -		ppgtt->pdp.page_directory[n]->daddr :
> -		ppgtt->scratch_pd->daddr;
> +		ppgtt->pdp.page_directory[n]->base.daddr :
> +		ppgtt->scratch_pd->base.daddr;
>   }
>
>   int i915_gem_gtt_init(struct drm_device *dev);
>

* Re: [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px
  2015-06-11 17:48     ` Mika Kuoppala
@ 2015-06-22 14:09       ` Michel Thierry
  2015-06-22 14:43         ` Daniel Vetter
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-06-22 14:09 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 6/11/2015 6:48 PM, Mika Kuoppala wrote:
> All the paging structures are now similar and mapped for
> dma. The unmapping is taken care of by common accessors, so
> don't overload the reader with such details.
>
> v2: Be consistent with goto labels (Michel)

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 40 ++++++++++++++++++-------------------
>   1 file changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 65ee92f..048c701 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -330,8 +330,7 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   	memset(p, 0, sizeof(*p));
>   }
>
> -static void unmap_and_free_pt(struct i915_page_table *pt,
> -			       struct drm_device *dev)
> +static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   {
>   	cleanup_page_dma(dev, &pt->base);
>   	kfree(pt->used_ptes);
> @@ -387,8 +386,7 @@ fail_bitmap:
>   	return ERR_PTR(ret);
>   }
>
> -static void unmap_and_free_pd(struct i915_page_directory *pd,
> -			      struct drm_device *dev)
> +static void free_pd(struct drm_device *dev, struct i915_page_directory *pd)
>   {
>   	if (pd->base.page) {
>   		cleanup_page_dma(dev, &pd->base);
> @@ -409,17 +407,17 @@ static struct i915_page_directory *alloc_pd(struct drm_device *dev)
>   	pd->used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES),
>   				sizeof(*pd->used_pdes), GFP_KERNEL);
>   	if (!pd->used_pdes)
> -		goto free_pd;
> +		goto fail_bitmap;
>
>   	ret = setup_page_dma(dev, &pd->base);
>   	if (ret)
> -		goto free_bitmap;
> +		goto fail_page_m;
>
>   	return pd;
>
> -free_bitmap:
> +fail_page_m:
>   	kfree(pd->used_pdes);
> -free_pd:
> +fail_bitmap:
>   	kfree(pd);
>
>   	return ERR_PTR(ret);
> @@ -614,7 +612,7 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
>   		if (WARN_ON(!pd->page_table[i]))
>   			continue;
>
> -		unmap_and_free_pt(pd->page_table[i], dev);
> +		free_pt(dev, pd->page_table[i]);
>   		pd->page_table[i] = NULL;
>   	}
>   }
> @@ -630,11 +628,11 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>   			continue;
>
>   		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
> -		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
> +		free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
>   	}
>
> -	unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
> -	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
> +	free_pd(ppgtt->base.dev, ppgtt->scratch_pd);
> +	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
>   }
>
>   /**
> @@ -687,7 +685,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>
>   unwind_out:
>   	for_each_set_bit(pde, new_pts, I915_PDES)
> -		unmap_and_free_pt(pd->page_table[pde], dev);
> +		free_pt(dev, pd->page_table[pde]);
>
>   	return -ENOMEM;
>   }
> @@ -745,7 +743,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
>
>   unwind_out:
>   	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
> -		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
> +		free_pd(dev, pdp->page_directory[pdpe]);
>
>   	return -ENOMEM;
>   }
> @@ -903,11 +901,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   err_out:
>   	while (pdpe--) {
>   		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
> -			unmap_and_free_pt(ppgtt->pdp.page_directory[pdpe]->page_table[temp], vm->dev);
> +			free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
>   	}
>
>   	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
> -		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
> +		free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
>
>   	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>   	mark_tlbs_dirty(ppgtt);
> @@ -1353,7 +1351,7 @@ unwind_out:
>   		struct i915_page_table *pt = ppgtt->pd.page_table[pde];
>
>   		ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
> -		unmap_and_free_pt(pt, vm->dev);
> +		free_pt(vm->dev, pt);
>   	}
>
>   	mark_tlbs_dirty(ppgtt);
> @@ -1372,11 +1370,11 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>
>   	gen6_for_all_pdes(pt, ppgtt, pde) {
>   		if (pt != ppgtt->scratch_pt)
> -			unmap_and_free_pt(pt, ppgtt->base.dev);
> +			free_pt(ppgtt->base.dev, pt);
>   	}
>
> -	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
> -	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
> +	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
> +	free_pd(ppgtt->base.dev, &ppgtt->pd);
>   }
>
>   static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
> @@ -1426,7 +1424,7 @@ alloc:
>   	return 0;
>
>   err_out:
> -	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
> +	free_pt(ppgtt->base.dev, ppgtt->scratch_pt);
>   	return ret;
>   }
>
>

* Re: [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px
  2015-06-22 14:09       ` Michel Thierry
@ 2015-06-22 14:43         ` Daniel Vetter
  0 siblings, 0 replies; 86+ messages in thread
From: Daniel Vetter @ 2015-06-22 14:43 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Mon, Jun 22, 2015 at 03:09:27PM +0100, Michel Thierry wrote:
> On 6/11/2015 6:48 PM, Mika Kuoppala wrote:
> >All the paging structures are now similar and mapped for
> >dma. The unmapping is taken care of by common accessors, so
> >don't overload the reader with such details.
> >
> >v2: Be consistent with goto labels (Michel)
> 
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>

Just to make sure we don't have merge fail going on here: Some of the
earlier patches don't have an r-b yet, which means I can't pick up the
later ones either.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+
  2015-06-11 17:37     ` Mika Kuoppala
@ 2015-06-23 11:10       ` Joonas Lahtinen
  0 siblings, 0 replies; 86+ messages in thread
From: Joonas Lahtinen @ 2015-06-23 11:10 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On to, 2015-06-11 at 20:37 +0300, Mika Kuoppala wrote:
> When we touch gen8+ page maps, mark them dirty like we
> do with previous gens.
> 
> v2: Update comment (Joonas)
> 
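
For context, a sketch of the consumer side (my assumption of roughly how
the submission path uses the mask; only mark_tlbs_dirty() itself is in
this patch):

	/* at submission: a set bit forces the pd reload for that ring */
	if (ppgtt && (ppgtt->pd_dirty_rings & intel_ring_flag(ring))) {
		/* reload the page directory pointers for this ring */
		ppgtt->pd_dirty_rings &= ~intel_ring_flag(ring);
	}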

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 619dad1..0a906e4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -830,6 +830,16 @@ err_out:
>  	return -ENOMEM;
>  }
>  
> +/* PDE TLBs are a pain to invalidate on GEN8+. When we modify
> + * the page table structures, we mark them dirty so that
> + * context switching/execlist queuing code takes extra steps
> + * to ensure that tlbs are flushed.
> + */
> +static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> +{
> +	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> +}
> +
>  static int gen8_alloc_va_range(struct i915_address_space *vm,
>  			       uint64_t start,
>  			       uint64_t length)
> @@ -915,6 +925,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>  	}
>  
>  	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +	mark_tlbs_dirty(ppgtt);
>  	return 0;
>  
>  err_out:
> @@ -927,6 +938,7 @@ err_out:
>  		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
>  
>  	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +	mark_tlbs_dirty(ppgtt);
>  	return ret;
>  }
>  
> @@ -1267,16 +1279,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  		kunmap_atomic(pt_vaddr);
>  }
>  
> -/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
> - * are switching between contexts with the same LRCA, we also must do a force
> - * restore.
> - */
> -static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> -{
> -	/* If current vm != vm, */
> -	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> -}
> -
>  static void gen6_initialize_pt(struct i915_address_space *vm,
>  		struct i915_page_table *pt)
>  {



* Re: [PATCH 03/21] drm/i915/gtt: Check va range against vm size
  2015-06-11 14:23     ` Mika Kuoppala
@ 2015-06-24 14:48       ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-24 14:48 UTC (permalink / raw)
  To: Mika Kuoppala, joonas.lahtinen; +Cc: intel-gfx

On 6/11/2015 3:23 PM, Mika Kuoppala wrote:
> Joonas Lahtinen <joonas.lahtinen@linux.intel.com> writes:
>
>> On pe, 2015-05-22 at 20:04 +0300, Mika Kuoppala wrote:
>>> Check the allocation area against the known end
>>> of address space instead of against fixed value.
>>>
>>> v2: Return ENODEV on internal bugs (Chris)
>>>
>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 18 +++++++++++-------
>>>   1 file changed, 11 insertions(+), 7 deletions(-)
>>>
>>
>> Why is it not enough just to change the WARN_ON test?
>>
>
> Might have been cleaner, yes. I just wanted to keep the pde iteration
> loop using 32bit arithmetic like it used to.
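
Presumably the v2 shape is along these lines (a sketch, with -ENODEV per
Chris's comment; the exact check is in the patch itself):

	if (WARN_ON(start + length > vm->total))
		return -ENODEV;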

Unless Joonas has any more comments,

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> -Mika
>
>>>
>>>       bitmap_zero(new_page_tables, I915_PDES);
>>>

* Re: [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible
  2015-06-11 16:30     ` Mika Kuoppala
@ 2015-06-24 14:59       ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-24 14:59 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 6/11/2015 5:30 PM, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> On 5/22/2015 6:05 PM, Mika Kuoppala wrote:
>>> Lay out the scratch page structure in a similar manner to the other
>>> paging structures. This allows us to use the same tools for
>>> setup and teardown.
>>>
>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_gtt.c | 89 ++++++++++++++++++++-----------------
>>>    drivers/gpu/drm/i915/i915_gem_gtt.h |  9 ++--
>>>    2 files changed, 54 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> index 4f9a000..43fa543 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> @@ -301,11 +301,12 @@ static gen6_pte_t iris_pte_encode(dma_addr_t addr,
>>>           return pte;
>>>    }
>>>
>>> -static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>>> +static int __setup_page_dma(struct drm_device *dev,
>>> +                           struct i915_page_dma *p, gfp_t flags)
>>>    {
>>>           struct device *device = &dev->pdev->dev;
>>>
>>> -       p->page = alloc_page(GFP_KERNEL);
>>> +       p->page = alloc_page(flags);
>>>           if (!p->page)
>>>                   return -ENOMEM;
>>>
>>> @@ -320,6 +321,11 @@ static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>>>           return 0;
>>>    }
>>>
>>> +static int setup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>>> +{
>>> +       return __setup_page_dma(dev, p, GFP_KERNEL);
>>> +}
>>> +
>>>    static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>>>    {
>>>           if (WARN_ON(!p->page))
>>> @@ -388,7 +394,8 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
>>>    {
>>>           gen8_pte_t scratch_pte;
>>>
>>> -       scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
>>> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>>> +                                     I915_CACHE_LLC, true);
>>>
>>>           fill_px(vm->dev, pt, scratch_pte);
>>>    }
>>> @@ -515,7 +522,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>>>           unsigned num_entries = length >> PAGE_SHIFT;
>>>           unsigned last_pte, i;
>>>
>>> -       scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
>>> +       scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
>>>                                         I915_CACHE_LLC, use_scratch);
>>>
>>>           while (num_entries) {
>>> @@ -1021,7 +1028,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>>           uint32_t  pte, pde, temp;
>>>           uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
>>>
>>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page), I915_CACHE_LLC, true, 0);
>>>
>>>           gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
>>>                   u32 expected;
>>> @@ -1256,7 +1263,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>>>           unsigned first_pte = first_entry % GEN6_PTES;
>>>           unsigned last_pte, i;
>>>
>>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
>>> +                                    I915_CACHE_LLC, true, 0);
>>>
>>>           while (num_entries) {
>>>                   last_pte = first_pte + num_entries;
>>> @@ -1314,9 +1322,10 @@ static void gen6_initialize_pt(struct i915_address_space *vm,
>>>    {
>>>           gen6_pte_t scratch_pte;
>>>
>>> -       WARN_ON(vm->scratch.addr == 0);
>>> +       WARN_ON(px_dma(vm->scratch_page) == 0);
>>>
>>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
>>> +                                    I915_CACHE_LLC, true, 0);
>>>
>>>           fill32_px(vm->dev, pt, scratch_pte);
>>>    }
>>> @@ -1553,13 +1562,14 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>>>           struct drm_i915_private *dev_priv = dev->dev_private;
>>>
>>>           ppgtt->base.dev = dev;
>>> -       ppgtt->base.scratch = dev_priv->gtt.base.scratch;
>>> +       ppgtt->base.scratch_page = dev_priv->gtt.base.scratch_page;
>>>
>>>           if (INTEL_INFO(dev)->gen < 8)
>>>                   return gen6_ppgtt_init(ppgtt);
>>>           else
>>>                   return gen8_ppgtt_init(ppgtt);
>>>    }
>>> +
>>>    int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>>>    {
>>>           struct drm_i915_private *dev_priv = dev->dev_private;
>>> @@ -1874,7 +1884,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>>>                    first_entry, num_entries, max_entries))
>>>                   num_entries = max_entries;
>>>
>>> -       scratch_pte = gen8_pte_encode(vm->scratch.addr,
>>> +       scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>>>                                         I915_CACHE_LLC,
>>>                                         use_scratch);
>>>           for (i = 0; i < num_entries; i++)
>>> @@ -1900,7 +1910,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>>>                    first_entry, num_entries, max_entries))
>>>                   num_entries = max_entries;
>>>
>>> -       scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, use_scratch, 0);
>>> +       scratch_pte = vm->pte_encode(px_dma(vm->scratch_page),
>>> +                                    I915_CACHE_LLC, use_scratch, 0);
>>>
>>>           for (i = 0; i < num_entries; i++)
>>>                   iowrite32(scratch_pte, &gtt_base[i]);
>>> @@ -2157,42 +2168,40 @@ void i915_global_gtt_cleanup(struct drm_device *dev)
>>>           vm->cleanup(vm);
>>>    }
>>>
>>> -static int setup_scratch_page(struct drm_device *dev)
>>> +static int alloc_scratch_page(struct i915_address_space *vm)
>>>    {
>>> -       struct drm_i915_private *dev_priv = dev->dev_private;
>>> -       struct page *page;
>>> -       dma_addr_t dma_addr;
>>> +       struct i915_page_scratch *sp;
>>> +       int ret;
>>> +
>>> +       WARN_ON(vm->scratch_page);
>>>
>>> -       page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
>>> -       if (page == NULL)
>>> +       sp = kzalloc(sizeof(*sp), GFP_KERNEL);
>>> +       if (sp == NULL)
>>>                   return -ENOMEM;
>>> -       set_pages_uc(page, 1);
>>>
>>> -#ifdef CONFIG_INTEL_IOMMU
>>> -       dma_addr = pci_map_page(dev->pdev, page, 0, PAGE_SIZE,
>>> -                               PCI_DMA_BIDIRECTIONAL);
>>> -       if (pci_dma_mapping_error(dev->pdev, dma_addr)) {
>>> -               __free_page(page);
>>> -               return -EINVAL;
>>> +       ret = __setup_page_dma(vm->dev, px_base(sp), GFP_DMA32 | __GFP_ZERO);
>>> +       if (ret) {
>>> +               kfree(sp);
>>> +               return ret;
>>>           }
>>> -#else
>>> -       dma_addr = page_to_phys(page);
>>> -#endif
>>
>> Should we keep a no-iommu option?
>> This seems to have been added for gen6 (gtt).
>>
>
> The dma_map_page should do the right thing with and
> without iommu. I really don't understand why we
> would need the no-iommu option.
>
> If there is no iommu, we get nommu_map_page()
> which is effectively page_to_phys().
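
That is, the unified path boils down to (a sketch matching the quoted
hunks; PAGE_SIZE here stands for the literal 4096 in the patch):

	p->daddr = dma_map_page(&dev->pdev->dev, p->page, 0, PAGE_SIZE,
				PCI_DMA_BIDIRECTIONAL);
	if (dma_mapping_error(&dev->pdev->dev, p->daddr)) {
		__free_page(p->page);
		return -EINVAL;
	}
	/* without an IOMMU the nommu dma ops reduce this to page_to_phys() */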

Thanks for clearing this up.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

>
> -Mika
>
>
>>> -       dev_priv->gtt.base.scratch.page = page;
>>> -       dev_priv->gtt.base.scratch.addr = dma_addr;
>>> +
>>> +       set_pages_uc(px_page(sp), 1);
>>> +
>>> +       vm->scratch_page = sp;
>>>
>>>           return 0;
>>>    }
>>>
>>> -static void teardown_scratch_page(struct drm_device *dev)
>>> +static void free_scratch_page(struct i915_address_space *vm)
>>>    {
>>> -       struct drm_i915_private *dev_priv = dev->dev_private;
>>> -       struct page *page = dev_priv->gtt.base.scratch.page;
>>> +       struct i915_page_scratch *sp = vm->scratch_page;
>>>
>>> -       set_pages_wb(page, 1);
>>> -       pci_unmap_page(dev->pdev, dev_priv->gtt.base.scratch.addr,
>>> -                      PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>>> -       __free_page(page);
>>> +       set_pages_wb(px_page(sp), 1);
>>> +
>>> +       cleanup_px(vm->dev, sp);
>>> +       kfree(sp);
>>> +
>>> +       vm->scratch_page = NULL;
>>>    }
>>>
>>>    static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
>>> @@ -2300,7 +2309,7 @@ static int ggtt_probe_common(struct drm_device *dev,
>>>                   return -ENOMEM;
>>>           }
>>>
>>> -       ret = setup_scratch_page(dev);
>>> +       ret = alloc_scratch_page(&dev_priv->gtt.base);
>>>           if (ret) {
>>>                   DRM_ERROR("Scratch setup failed\n");
>>>                   /* iounmap will also get called at remove, but meh */
>>> @@ -2479,7 +2488,7 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>>>           struct i915_gtt *gtt = container_of(vm, struct i915_gtt, base);
>>>
>>>           iounmap(gtt->gsm);
>>> -       teardown_scratch_page(vm->dev);
>>> +       free_scratch_page(vm);
>>>    }
>>>
>>>    static int i915_gmch_probe(struct drm_device *dev,
>>> @@ -2543,13 +2552,13 @@ int i915_gem_gtt_init(struct drm_device *dev)
>>>                   dev_priv->gtt.base.cleanup = gen6_gmch_remove;
>>>           }
>>>
>>> +       gtt->base.dev = dev;
>>> +
>>>           ret = gtt->gtt_probe(dev, &gtt->base.total, &gtt->stolen_size,
>>>                                &gtt->mappable_base, &gtt->mappable_end);
>>>           if (ret)
>>>                   return ret;
>>>
>>> -       gtt->base.dev = dev;
>>> -
>>>           /* GMADR is the PCI mmio aperture into the global GTT. */
>>>           DRM_INFO("Memory usable by graphics device = %lluM\n",
>>>                    gtt->base.total >> 20);
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> index 006b839..1fd4041 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> @@ -217,6 +217,10 @@ struct i915_page_dma {
>>>    #define px_page(px) (px_base(px)->page)
>>>    #define px_dma(px) (px_base(px)->daddr)
>>>
>>> +struct i915_page_scratch {
>>> +       struct i915_page_dma base;
>>> +};
>>> +
>>>    struct i915_page_table {
>>>           struct i915_page_dma base;
>>>
>>> @@ -243,10 +247,7 @@ struct i915_address_space {
>>>           u64 start;              /* Start offset always 0 for dri2 */
>>>           u64 total;              /* size addr space maps (ex. 2GB for ggtt) */
>>>
>>> -       struct {
>>> -               dma_addr_t addr;
>>> -               struct page *page;
>>> -       } scratch;
>>> +       struct i915_page_scratch *scratch_page;
>>>
>>>           /**
>>>            * List of objects currently involved in rendering.
>>> --
>>> 1.9.1
>>>

* Re: [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma()
  2015-06-11 17:50     ` Mika Kuoppala
@ 2015-06-24 15:05       ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-24 15:05 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 6/11/2015 6:50 PM, Mika Kuoppala wrote:
> When we set up page directories and tables, we point the entries
> to the next level scratch structure. Make this generic
> by introducing a fill_page_dma which maps and flushes. We also
> need a 32 bit variant for legacy gens.
>
> v2: Fix flushes and handle valleyview (Ville)
> v3: Now really fix flushes (Michel, Ville)
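
A worked note on the 32 bit variant (sketch; the code is in the diff
below): a gen6 PTE is 32 bits wide, so the value is replicated into both
halves of a 64-bit word and the same 512-qword loop fills the whole 4K
page, since 512 * 8 bytes = 4096:

	uint64_t v = val32;

	v = v << 32 | val32;
	fill_page_dma(dev, p, v);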

Daniel, I'll do some testing in bxt this week.

Having said that, before this patch the code was doing the flush in bxt, 
so it doesn't change the current behavior...

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 74 ++++++++++++++++++++-----------------
>   1 file changed, 40 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 698423b..60796b7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -330,6 +330,34 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   	memset(p, 0, sizeof(*p));
>   }
>
> +static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
> +			  const uint64_t val)
> +{
> +	int i;
> +	uint64_t * const vaddr = kmap_atomic(p->page);
> +
> +	for (i = 0; i < 512; i++)
> +		vaddr[i] = val;
> +
> +	/* There are only few exceptions for gen >=6. chv and bxt.
> +	 * And we are not sure about the latter so play safe for now.
> +	 */
> +	if (IS_CHERRYVIEW(dev) || IS_BROXTON(dev))
> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
> +
> +	kunmap_atomic(vaddr);
> +}
> +
> +static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
> +			     const uint32_t val32)
> +{
> +	uint64_t v = val32;
> +
> +	v = v << 32 | val32;
> +
> +	fill_page_dma(dev, p, v);
> +}
> +
>   static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   {
>   	cleanup_page_dma(dev, &pt->base);
> @@ -340,19 +368,11 @@ static void free_pt(struct drm_device *dev, struct i915_page_table *pt)
>   static void gen8_initialize_pt(struct i915_address_space *vm,
>   			       struct i915_page_table *pt)
>   {
> -	gen8_pte_t *pt_vaddr, scratch_pte;
> -	int i;
> -
> -	pt_vaddr = kmap_atomic(pt->base.page);
> -	scratch_pte = gen8_pte_encode(vm->scratch.addr,
> -				      I915_CACHE_LLC, true);
> +	gen8_pte_t scratch_pte;
>
> -	for (i = 0; i < GEN8_PTES; i++)
> -		pt_vaddr[i] = scratch_pte;
> +	scratch_pte = gen8_pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
>
> -	if (!HAS_LLC(vm->dev))
> -		drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -	kunmap_atomic(pt_vaddr);
> +	fill_page_dma(vm->dev, &pt->base, scratch_pte);
>   }
>
>   static struct i915_page_table *alloc_pt(struct drm_device *dev)
> @@ -585,20 +605,13 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
>   			       struct i915_page_directory *pd)
>   {
>   	struct i915_hw_ppgtt *ppgtt =
> -			container_of(vm, struct i915_hw_ppgtt, base);
> -	gen8_pde_t *page_directory;
> -	struct i915_page_table *pt;
> -	int i;
> +		container_of(vm, struct i915_hw_ppgtt, base);
> +	gen8_pde_t scratch_pde;
>
> -	page_directory = kmap_atomic(pd->base.page);
> -	pt = ppgtt->scratch_pt;
> -	for (i = 0; i < I915_PDES; i++)
> -		/* Map the PDE to the page table */
> -		__gen8_do_map_pt(page_directory + i, pt, vm->dev);
> +	scratch_pde = gen8_pde_encode(vm->dev, ppgtt->scratch_pt->base.daddr,
> +				      I915_CACHE_LLC);
>
> -	if (!HAS_LLC(vm->dev))
> -		drm_clflush_virt_range(page_directory, PAGE_SIZE);
> -	kunmap_atomic(page_directory);
> +	fill_page_dma(vm->dev, &pd->base, scratch_pde);
>   }
>
>   static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
> @@ -1250,22 +1263,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   }
>
>   static void gen6_initialize_pt(struct i915_address_space *vm,
> -		struct i915_page_table *pt)
> +			       struct i915_page_table *pt)
>   {
> -	gen6_pte_t *pt_vaddr, scratch_pte;
> -	int i;
> +	gen6_pte_t scratch_pte;
>
>   	WARN_ON(vm->scratch.addr == 0);
>
> -	scratch_pte = vm->pte_encode(vm->scratch.addr,
> -			I915_CACHE_LLC, true, 0);
> -
> -	pt_vaddr = kmap_atomic(pt->base.page);
> -
> -	for (i = 0; i < GEN6_PTES; i++)
> -		pt_vaddr[i] = scratch_pte;
> +	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>
> -	kunmap_atomic(pt_vaddr);
> +	fill_page_dma_32(vm->dev, &pt->base, scratch_pte);
>   }
>
>   static int gen6_alloc_va_range(struct i915_address_space *vm,
>

* Re: [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page
  2015-06-11 17:50     ` Mika Kuoppala
@ 2015-06-24 15:06       ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-06-24 15:06 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

On 6/11/2015 6:50 PM, Mika Kuoppala wrote:
> As there is flushing involved after we have done the cpu
> write, make functions for mapping into cpu space. Make macros
> to map any type of paging structure.
>
> v2: Make it clear that the flushing kunmap is only for ppgtt (Ville)
> v3: Flushing fixed (Ville, Michel). Removed superfluous semicolon
>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 77 +++++++++++++++++++------------------
>   1 file changed, 40 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 60796b7..3ac8671 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -330,15 +330,16 @@ static void cleanup_page_dma(struct drm_device *dev, struct i915_page_dma *p)
>   	memset(p, 0, sizeof(*p));
>   }
>
> -static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
> -			  const uint64_t val)
> +static void *kmap_page_dma(struct i915_page_dma *p)
>   {
> -	int i;
> -	uint64_t * const vaddr = kmap_atomic(p->page);
> -
> -	for (i = 0; i < 512; i++)
> -		vaddr[i] = val;
> +	return kmap_atomic(p->page);
> +}
>
> +/* We use the flushing unmap only with ppgtt structures:
> + * page directories, page tables and scratch pages.
> + */
> +static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
> +{
>   	/* There are only few exceptions for gen >=6. chv and bxt.
>   	 * And we are not sure about the latter so play safe for now.
>   	 */
> @@ -348,6 +349,21 @@ static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
>   	kunmap_atomic(vaddr);
>   }
>
> +#define kmap_px(px) kmap_page_dma(&(px)->base)
> +#define kunmap_px(ppgtt, vaddr) kunmap_page_dma((ppgtt)->base.dev, (vaddr))
> +
> +static void fill_page_dma(struct drm_device *dev, struct i915_page_dma *p,
> +			  const uint64_t val)
> +{
> +	int i;
> +	uint64_t * const vaddr = kmap_page_dma(p);
> +
> +	for (i = 0; i < 512; i++)
> +		vaddr[i] = val;
> +
> +	kunmap_page_dma(dev, vaddr);
> +}
> +
>   static void fill_page_dma_32(struct drm_device *dev, struct i915_page_dma *p,
>   			     const uint32_t val32)
>   {
> @@ -503,7 +519,6 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>   	while (num_entries) {
>   		struct i915_page_directory *pd;
>   		struct i915_page_table *pt;
> -		struct page *page_table;
>
>   		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
>   			continue;
> @@ -518,22 +533,18 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>   		if (WARN_ON(!pt->base.page))
>   			continue;
>
> -		page_table = pt->base.page;
> -
>   		last_pte = pte + num_entries;
>   		if (last_pte > GEN8_PTES)
>   			last_pte = GEN8_PTES;
>
> -		pt_vaddr = kmap_atomic(page_table);
> +		pt_vaddr = kmap_px(pt);
>
>   		for (i = pte; i < last_pte; i++) {
>   			pt_vaddr[i] = scratch_pte;
>   			num_entries--;
>   		}
>
> -		if (!HAS_LLC(ppgtt->base.dev))
> -			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -		kunmap_atomic(pt_vaddr);
> +		kunmap_px(ppgtt, pt);
>
>   		pte = 0;
>   		if (++pde == I915_PDES) {
> @@ -565,18 +576,14 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   		if (pt_vaddr == NULL) {
>   			struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
>   			struct i915_page_table *pt = pd->page_table[pde];
> -			struct page *page_table = pt->base.page;
> -
> -			pt_vaddr = kmap_atomic(page_table);
> +			pt_vaddr = kmap_px(pt);
>   		}
>
>   		pt_vaddr[pte] =
>   			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>   					cache_level, true);
>   		if (++pte == GEN8_PTES) {
> -			if (!HAS_LLC(ppgtt->base.dev))
> -				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -			kunmap_atomic(pt_vaddr);
> +			kunmap_px(ppgtt, pt_vaddr);
>   			pt_vaddr = NULL;
>   			if (++pde == I915_PDES) {
>   				pdpe++;
> @@ -585,11 +592,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>   			pte = 0;
>   		}
>   	}
> -	if (pt_vaddr) {
> -		if (!HAS_LLC(ppgtt->base.dev))
> -			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
> -		kunmap_atomic(pt_vaddr);
> -	}
> +
> +	if (pt_vaddr)
> +		kunmap_px(ppgtt, pt_vaddr);
>   }
>
>   static void __gen8_do_map_pt(gen8_pde_t * const pde,
> @@ -869,7 +874,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   	/* Allocations have completed successfully, so set the bitmaps, and do
>   	 * the mappings. */
>   	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -		gen8_pde_t *const page_directory = kmap_atomic(pd->base.page);
> +		gen8_pde_t *const page_directory = kmap_px(pd);
>   		struct i915_page_table *pt;
>   		uint64_t pd_len = gen8_clamp_pd(start, length);
>   		uint64_t pd_start = start;
> @@ -899,10 +904,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>   			 * point we're still relying on insert_entries() */
>   		}
>
> -		if (!HAS_LLC(vm->dev))
> -			drm_clflush_virt_range(page_directory, PAGE_SIZE);
> -
> -		kunmap_atomic(page_directory);
> +		kunmap_px(ppgtt, page_directory);
>
>   		set_bit(pdpe, ppgtt->pdp.used_pdpes);
>   	}
> @@ -991,7 +993,8 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>   				   expected);
>   		seq_printf(m, "\tPDE: %x\n", pd_entry);
>
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->base.page);
> +		pt_vaddr = kmap_px(ppgtt->pd.page_table[pde]);
> +
>   		for (pte = 0; pte < GEN6_PTES; pte+=4) {
>   			unsigned long va =
>   				(pde * PAGE_SIZE * GEN6_PTES) +
> @@ -1013,7 +1016,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>   			}
>   			seq_puts(m, "\n");
>   		}
> -		kunmap_atomic(pt_vaddr);
> +		kunmap_px(ppgtt, pt_vaddr);
>   	}
>   }
>
> @@ -1216,12 +1219,12 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>   		if (last_pte > GEN6_PTES)
>   			last_pte = GEN6_PTES;
>
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
> +		pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
>
>   		for (i = first_pte; i < last_pte; i++)
>   			pt_vaddr[i] = scratch_pte;
>
> -		kunmap_atomic(pt_vaddr);
> +		kunmap_px(ppgtt, pt_vaddr);
>
>   		num_entries -= last_pte - first_pte;
>   		first_pte = 0;
> @@ -1245,21 +1248,21 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   	pt_vaddr = NULL;
>   	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>   		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->base.page);
> +			pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
>
>   		pt_vaddr[act_pte] =
>   			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
>   				       cache_level, true, flags);
>
>   		if (++act_pte == GEN6_PTES) {
> -			kunmap_atomic(pt_vaddr);
> +			kunmap_px(ppgtt, pt_vaddr);
>   			pt_vaddr = NULL;
>   			act_pt++;
>   			act_pte = 0;
>   		}
>   	}
>   	if (pt_vaddr)
> -		kunmap_atomic(pt_vaddr);
> +		kunmap_px(ppgtt, pt_vaddr);
>   }
>
>   static void gen6_initialize_pt(struct i915_address_space *vm,
>
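
(For reference: the kmap_px()/kunmap_px() helpers these hunks switch to
are introduced by patch 12/21 of this series. Roughly, they are expected
to behave like the sketch below -- the names match the call sites, but
the bodies and the px_base() helper here are my assumption, not the
patch's exact definitions:)

        #define kmap_px(px) kmap_page_dma(px_base(px))
        #define kunmap_px(ppgtt, vaddr) \
                kunmap_page_dma((ppgtt)->base.dev, (vaddr))

        static void *kmap_page_dma(struct i915_page_dma *p)
        {
                return kmap_atomic(p->page);
        }

        /* the clflush the old call sites did by hand moves in here, so
         * every unmap flushes on !HAS_LLC platforms */
        static void kunmap_page_dma(struct drm_device *dev, void *vaddr)
        {
                if (!HAS_LLC(dev))
                        drm_clflush_virt_range(vaddr, PAGE_SIZE);

                kunmap_atomic(vaddr);
        }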
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-06-11 13:57           ` Mika Kuoppala
@ 2015-08-11  5:05             ` Zhiyuan Lv
  2015-08-12  7:56               ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Zhiyuan Lv @ 2015-08-11  5:05 UTC (permalink / raw)
  To: Mika Kuoppala, Dave Gordon, Michel Thierry; +Cc: intel-gfx

Hi Mika/Dave/Michel,

I saw that the patch using LRI for the root pointer updates has been merged
to drm-intel. When the i915 driver runs inside a virtual machine, e.g. with
XenGT, we may still need Mika's patch here, i.e. something like:

"
        if (intel_vgpu_active(ppgtt->base.dev))
                gen8_preallocate_top_level_pdps(ppgtt);
"

Could you share with us your opinion? Thanks in advance!

The reason is that the LRI command makes a shadow PPGTT implementation
hard. In XenGT, we construct a shadow page table for each PPGTT in the
guest i915 driver, and then track every guest page table change in order
to update the shadow page table accordingly. The problem with page table
updates done via GPU commands is that they cannot be trapped by the
hypervisor to do the shadow page table update. In XenGT, the only chance
we have is the command scan at context submission, but that is not exactly
the right time to update the shadow page table.

Mika's patch addresses the problem nicely. With the preallocation, the root
pointers in the EXECLIST context always stay the same, so we can treat any
attempt to change the guest PPGTT with GPU commands as malicious behavior.
Thanks!
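
Roughly, the call site in gen8_ppgtt_init() could look like the sketch
below (the function signature and the surrounding error handling are my
assumption, not the exact patch):

        static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
        {
                int ret;

                /* ... usual gen8 ppgtt setup ... */

                /* Pin down the top level pdps at init time so the root
                 * pointers in the EXECLIST context image never change and
                 * the hypervisor only needs to trap CPU page table writes.
                 */
                if (intel_vgpu_active(ppgtt->base.dev)) {
                        ret = gen8_preallocate_top_level_pdps(ppgtt);
                        if (ret)
                                return ret;
                }

                return 0;
        }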

Regards,
-Zhiyuan

On Thu, Jun 11, 2015 at 04:57:42PM +0300, Mika Kuoppala wrote:
> Dave Gordon <david.s.gordon@intel.com> writes:
> 
> > On 10/06/15 12:42, Michel Thierry wrote:
> >> On 5/29/2015 1:53 PM, Michel Thierry wrote:
> >>> On 5/29/2015 12:05 PM, Michel Thierry wrote:
> >>>> On 5/22/2015 6:04 PM, Mika Kuoppala wrote:
> >>>>> With BDW/SKL and 32bit addressing mode only, the hardware preloads
> >>>>> pdps. However the TLB invalidation only has effect on levels below
> >>>>> the pdps. This means that if pdps change, hw might access with
> >>>>> stale pdp entry.
> >>>>>
> >>>>> To combat this problem, preallocate the top pdps so that hw sees
> >>>>> them as immutable for each context.
> >>>>>
> >>>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >>>>> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
> >>>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>> ---
> >>>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 50
> >>>>> +++++++++++++++++++++++++++++++++++++
> >>>>>   drivers/gpu/drm/i915/i915_reg.h     | 17 +++++++++++++
> >>>>>   drivers/gpu/drm/i915/intel_lrc.c    | 15 +----------
> >>>>>   3 files changed, 68 insertions(+), 14 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>>> index 0ffd459..1a5ad4c 100644
> >>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>>> @@ -941,6 +941,48 @@ err_out:
> >>>>>          return ret;
> >>>>>   }
> >>>>>
> >>>>> +/* With some architectures and 32bit legacy mode, hardware pre-loads
> >>>>> the
> >>>>> + * top level pdps but the tlb invalidation only invalidates the
> >>>>> lower levels.
> >>>>> + * This might lead to hw fetching with stale pdp entries if top level
> >>>>> + * structure changes, ie va space grows with dynamic page tables.
> >>>>> + */
> >
> > Is this still necessary if we reload PDPs via LRI instructions whenever
> > the address map has changed? That always (AFAICT) causes sufficient
> > invalidation, so then we might not need to preallocate at all :)
> >
> 
> LRI reload gets my vote. Please ignore this patch.
> -Mika
> 
> > .Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-11  5:05             ` Zhiyuan Lv
@ 2015-08-12  7:56               ` Michel Thierry
  2015-08-12 15:09                 ` Dave Gordon
  2015-08-13  9:08                 ` Zhiyuan Lv
  0 siblings, 2 replies; 86+ messages in thread
From: Michel Thierry @ 2015-08-12  7:56 UTC (permalink / raw)
  To: Mika Kuoppala, Dave Gordon, Wang, Zhi A, Tian, Kevin, intel-gfx

On 8/11/2015 1:05 PM, Zhiyuan Lv wrote:
> Hi Mika/Dave/Michel,
>
> I saw that the patch using LRI for the root pointer updates has been merged
> to drm-intel. When the i915 driver runs inside a virtual machine, e.g. with
> XenGT, we may still need Mika's patch here, i.e. something like:
>
> "
>          if (intel_vgpu_active(ppgtt->base.dev))
>                  gen8_preallocate_top_level_pdps(ppgtt);
> "
>
> Could you share with us your opinion? Thanks in advance!

Hi Zhiyuan,

The change looks ok to me. If you need to preallocate the PDPs, 
gen8_ppgtt_init is the right place to do it. Only add a similar 
vgpu_active check to disable the LRI updates (in gen8_emit_bb_start).
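
Something along these lines (only a sketch -- the exact condition and
helper names are assumptions based on my memory of the current code):

        static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
                                      u64 offset, unsigned dispatch_flags)
        {
                ...
                /* Skip the LRI-based PDP reload under a hypervisor: the
                 * pdps were preallocated at init and never change, and
                 * LRI writes can't be trapped for shadowing anyway.
                 */
                if (req->ctx->ppgtt && !intel_vgpu_active(req->ring->dev)) {
                        ret = intel_logical_ring_emit_pdps(req);
                        if (ret)
                                return ret;
                }
                ...
        }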

>
> The reason is that the LRI command makes a shadow PPGTT implementation
> hard. In XenGT, we construct a shadow page table for each PPGTT in the
> guest i915 driver, and then track every guest page table change in order
> to update the shadow page table accordingly. The problem with page table
> updates done via GPU commands is that they cannot be trapped by the
> hypervisor to do the shadow page table update. In XenGT, the only chance
> we have is the command scan at context submission, but that is not exactly
> the right time to update the shadow page table.
>
> Mika's patch addresses the problem nicely. With the preallocation, the root
> pointers in the EXECLIST context always stay the same, so we can treat any
> attempt to change the guest PPGTT with GPU commands as malicious behavior.
> Thanks!
>
> Regards,
> -Zhiyuan
>
> [snip]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-12  7:56               ` Michel Thierry
@ 2015-08-12 15:09                 ` Dave Gordon
  2015-08-13  9:36                   ` Zhiyuan Lv
  2015-08-13  9:08                 ` Zhiyuan Lv
  1 sibling, 1 reply; 86+ messages in thread
From: Dave Gordon @ 2015-08-12 15:09 UTC (permalink / raw)
  To: Thierry, Michel, Mika Kuoppala, Wang, Zhi A, Tian, Kevin; +Cc: intel-gfx

On 12/08/15 08:56, Thierry, Michel wrote:
> On 8/11/2015 1:05 PM, Zhiyuan Lv wrote:
>> Hi Mika/Dave/Michel,
>>
>> I saw the patch of using LRI for root pointer update has been merged to
>> drm-intel. When we consider i915 driver to run inside a virtual machine, e.g.
>> with XenGT, we may still need Mika's this patch like below:
>>
>> "
>>           if (intel_vgpu_active(ppgtt->base.dev))
>>                   gen8_preallocate_top_level_pdps(ppgtt);
>> "
>>
>> Could you share with us your opinion? Thanks in advance!
>
> Hi Zhiyuan,
>
> The change looks ok to me. If you need to preallocate the PDPs,
> gen8_ppgtt_init is the right place to do it. Only add a similar
> vgpu_active check to disable the LRI updates (in gen8_emit_bb_start).
>
>> The reason is that the LRI command makes a shadow PPGTT implementation
>> hard. In XenGT, we construct a shadow page table for each PPGTT in the
>> guest i915 driver, and then track every guest page table change in order
>> to update the shadow page table accordingly. The problem with page table
>> updates done via GPU commands is that they cannot be trapped by the
>> hypervisor to do the shadow page table update. In XenGT, the only chance
>> we have is the command scan at context submission, but that is not exactly
>> the right time to update the shadow page table.
>>
>> Mika's patch addresses the problem nicely. With the preallocation, the root
>> pointers in the EXECLIST context always stay the same, so we can treat any
>> attempt to change the guest PPGTT with GPU commands as malicious behavior.
>> Thanks!
>>
>> Regards,
>> -Zhiyuan

The bad thing that was happening if we didn't use LRIs was that the CPU 
would try to push the new mappings to the GPU by updating PDP registers 
in the saved context image. This is unsafe if the context is running, as 
switching away from it would result in the CPU-updated values being 
overwritten by the older values in the GPU h/w registers (if the context 
were known to be idle, then it would be safe).

Preallocating the top-level PDPs should mean that the values need never 
change, so there's then no need to update the context image, thus 
avoiding the write hazard :)

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-12  7:56               ` Michel Thierry
  2015-08-12 15:09                 ` Dave Gordon
@ 2015-08-13  9:08                 ` Zhiyuan Lv
  2015-08-13 10:12                   ` Michel Thierry
  1 sibling, 1 reply; 86+ messages in thread
From: Zhiyuan Lv @ 2015-08-13  9:08 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

Hi Michel,

Thanks for the reply!

I have yet another question: right now mark_tlbs_dirty() is called
whenever any level of the PPGTT tables is changed. But for EXECLIST
context submission, we only need the LRI commands if the L3 PDP root
pointers change, right? Thanks!

Regards,
-Zhiyuan

On Wed, Aug 12, 2015 at 03:56:49PM +0800, Michel Thierry wrote:
> On 8/11/2015 1:05 PM, Zhiyuan Lv wrote:
> >Hi Mika/Dave/Michel,
> >
> >I saw the patch of using LRI for root pointer update has been merged to
> >drm-intel. When we consider i915 driver to run inside a virtual machine, e.g.
> >with XenGT, we may still need Mika's this patch like below:
> >
> >"
> >         if (intel_vgpu_active(ppgtt->base.dev))
> >                 gen8_preallocate_top_level_pdps(ppgtt);
> >"
> >
> >Could you share with us your opinion? Thanks in advance!
> 
> Hi Zhiyuan,
> 
> The change looks ok to me. If you need to preallocate the PDPs,
> gen8_ppgtt_init is the right place to do it. Only add a similar
> vgpu_active check to disable the LRI updates (in
> gen8_emit_bb_start).
> 
> [snip]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-12 15:09                 ` Dave Gordon
@ 2015-08-13  9:36                   ` Zhiyuan Lv
  2015-08-13  9:54                     ` Michel Thierry
  0 siblings, 1 reply; 86+ messages in thread
From: Zhiyuan Lv @ 2015-08-13  9:36 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

Hi Dave,

On Wed, Aug 12, 2015 at 04:09:18PM +0100, Dave Gordon wrote:
> [snip]
> 
> The bad thing that was happening if we didn't use LRIs was that the
> CPU would try to push the new mappings to the GPU by updating PDP
> registers in the saved context image. This is unsafe if the context
> is running, as switching away from it would result in the
> CPU-updated values being overwritten by the older values in the GPU
> h/w registers (if the context were known to be idle, then it would
> be safe).

Thank you very much for the detailed explanation! I am curious: if the
root pointer update has no side effect on the currently running context,
for instance only changing a NULL entry to a PD without modifying
existing pdpes, can we use the "Force PD Restore" bit in the ctx
descriptor?

Regards,
-Zhiyuan

> 
> Preallocating the top-level PDPs should mean that the values need
> never change, so there's then no need to update the context image,
> thus avoiding the write hazard :)
> 
> .Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-13  9:36                   ` Zhiyuan Lv
@ 2015-08-13  9:54                     ` Michel Thierry
  0 siblings, 0 replies; 86+ messages in thread
From: Michel Thierry @ 2015-08-13  9:54 UTC (permalink / raw)
  To: Dave Gordon, Mika Kuoppala, Wang, Zhi A, Tian, Kevin, intel-gfx

On 8/13/2015 5:36 PM, Zhiyuan Lv wrote:
> Hi Dave,
>
> [snip]
>
> Thank you very much for the detailed explanation! I am curious: if the
> root pointer update has no side effect on the currently running context,
> for instance only changing a NULL entry to a PD without modifying
> existing pdpes, can we use the "Force PD Restore" bit in the ctx
> descriptor?

We've been explicitly asked not to use "Force PD Restore".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-13  9:08                 ` Zhiyuan Lv
@ 2015-08-13 10:12                   ` Michel Thierry
  2015-08-13 11:42                     ` Dave Gordon
  0 siblings, 1 reply; 86+ messages in thread
From: Michel Thierry @ 2015-08-13 10:12 UTC (permalink / raw)
  To: Mika Kuoppala, Dave Gordon, Wang, Zhi A, Tian, Kevin, intel-gfx

On 8/13/2015 5:08 PM, Zhiyuan Lv wrote:
> Hi Michel,
>
> Thanks for the reply!
>
> I have yet another question: right now mark_tlbs_dirty() is called
> whenever any level of the PPGTT tables is changed. But for EXECLIST
> context submission, we only need the LRI commands if the L3 PDP root
> pointers change, right? Thanks!
>

mark_tlbs_dirty is not only for execlists mode, we re-used it since it 
was already there.

The update is only required when a PDP is allocated.
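
For reference, mark_tlbs_dirty() itself just flags every ring as needing
the reload, and the submission path clears the flag once the LRIs have
been emitted -- roughly like this (field names as in the current code,
consumer side heavily simplified):

        static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
        {
                ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
        }

        /* consumer side, at submission time:
         *
         *      if (intel_ring_flag(req->ring) & ppgtt->pd_dirty_rings) {
         *              ... emit the PDP LRIs ...
         *              ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
         *      }
         */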

-Michel


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-13 10:12                   ` Michel Thierry
@ 2015-08-13 11:42                     ` Dave Gordon
  2015-08-13 12:03                       ` Dave Gordon
  0 siblings, 1 reply; 86+ messages in thread
From: Dave Gordon @ 2015-08-13 11:42 UTC (permalink / raw)
  To: Michel Thierry, Mika Kuoppala, Wang, Zhi A, Tian, Kevin, intel-gfx

On 13/08/15 11:12, Michel Thierry wrote:
> On 8/13/2015 5:08 PM, Zhiyuan Lv wrote:
>> Hi Michel,
>>
>> Thanks for the reply!
>>
>> I have yet another question: right now mark_tlbs_dirty() is called
>> whenever any level of the PPGTT tables is changed. But for EXECLIST
>> context submission, we only need the LRI commands if the L3 PDP root
>> pointers change, right? Thanks!
>
> mark_tlbs_dirty is not only for execlists mode, we re-used it since it
> was already there.
>
> The update is only required when a PDP is allocated.
>
> -Michel

Doesn't that depend on whether the context is running? The LRI reload 
has the side effect of flushing all current knowledge of mappings, so 
every level of PD gets refreshed from memory.

If we're not updating the top level PDPs, and we know the context isn't 
active, then we *assume* that lower-level PDs will be refreshed when the 
context is next loaded. (This hasn't been true on all hardware, some of 
which cached previously-retrieved PDs across ctx save-and-reload, and 
that's one reason why there's a "Force PD Restore" bit, but we've been 
told not to use it on current h/w). AFAICT, current chips don't cache 
previous PDs and don't need the "Force" bit for this case.

OTOH, if we don't know whether the context is running, then we can't be 
sure when (or whether) any PD updates will be seen. As long as the 
changes of interest are only ever *from* NULL *to* non-NULL, we *expect* 
it to work, because (we *assume*) the GPU won't cache negative results 
from PD lookups; so any lookup that previously hit an invalid mapping 
will be re-fetched next time it's required (and may now be good).

If we don't reload the PDPs with LRIs, then perhaps to be safe we need 
to inject some other instruction that will just force a re-fetch of the 
lower-level PDs from memory, without altering any top-level PDPs? In 
conjunction with preallocating the top-level entries, that ought to 
guarantee that the updates would be seen just before the point where 
they're about to be used?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-13 11:42                     ` Dave Gordon
@ 2015-08-13 12:03                       ` Dave Gordon
  2015-08-13 14:56                         ` Zhiyuan Lv
  0 siblings, 1 reply; 86+ messages in thread
From: Dave Gordon @ 2015-08-13 12:03 UTC (permalink / raw)
  To: intel-gfx

On 13/08/15 12:42, Dave Gordon wrote:
> [snip]

I found the following comment in the BSpec:

"Pre-loading of Page Directory Entries (PD load) for 32b legacy mode is 
not supported from Gen9 onwards.  PD entries are loaded on demand when 
there is a miss in the PDE cache of the corresponding page walker.  Any 
new page additions by the driver are transparent to the HW, and the new 
page translations will be fetched on demand.  However, any removal of 
the pages by the driver should initiate a TLB invalidation to remove the 
stale entries."

So, I think that confirms that we should inject some form of TLB 
invalidation into the ring before the next batch uses any updated PDs. 
Presumably an MI_FLUSH_DW with TLB_INVALIDATE would do?
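
Something like the sketch below, reusing the MI_FLUSH_DW flag bits we
already have. (Helper names and the exact dword layout are assumptions
from the current execlists code, so treat this as a sketch only.)

        /* force a TLB invalidation so any updated PDs are re-fetched */
        intel_logical_ring_emit(ringbuf, MI_FLUSH_DW | MI_INVALIDATE_TLB |
                                         MI_FLUSH_DW_OP_STOREDW);
        intel_logical_ring_emit(ringbuf, I915_GEM_HWS_SCRATCH_ADDR |
                                         MI_FLUSH_DW_USE_GTT);
        intel_logical_ring_emit(ringbuf, 0);    /* upper address dword */
        intel_logical_ring_emit(ringbuf, 0);    /* post-sync value */
        intel_logical_ring_advance(ringbuf);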

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps
  2015-08-13 12:03                       ` Dave Gordon
@ 2015-08-13 14:56                         ` Zhiyuan Lv
  0 siblings, 0 replies; 86+ messages in thread
From: Zhiyuan Lv @ 2015-08-13 14:56 UTC (permalink / raw)
  To: Dave Gordon, Michel Thierry; +Cc: intel-gfx

On Thu, Aug 13, 2015 at 01:03:30PM +0100, Dave Gordon wrote:
> [snip]
> 
> I found the following comment in the BSpec:
> 
> "Pre-loading of Page Directory Entries (PD load) for 32b legacy mode
> is not supported from Gen9 onwards.  PD entries are loaded on demand
> when there is a miss in the PDE cache of the corresponding page
> walker.  Any new page additions by the driver are transparent to the
> HW, and the new page translations will be fetched on demand.
> However, any removal of the pages by the driver should initiate a
> TLB invalidation to remove the stale entries."
> 
> So, I think that confirms that we should inject some form of TLB
> invalidation into the ring before the next batch uses any updated
> PDs. Presumably an MI_FLUSH_DW with TLB_INVALIDATE would do?

Hi Dave and Michel,

So the conclusion is still the same: for 32b legacy mode, emit_pdps() is
only needed for PDP changes, and changes at the other page table levels
can be handled with a TLB invalidation via ring buffer commands. Is that
correct? Thanks!

Regards,
-Zhiyuan

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2015-08-13 15:08 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-22 17:04 [PATCH 00/21] ppgtt cleanups / scratch merge (V2) Mika Kuoppala
2015-05-22 17:04 ` [PATCH 01/21] drm/i915/gtt: Mark TLBS dirty for gen8+ Mika Kuoppala
2015-06-01 14:51   ` Joonas Lahtinen
2015-06-11 17:37     ` Mika Kuoppala
2015-06-23 11:10       ` Joonas Lahtinen
2015-06-01 15:52   ` Michel Thierry
2015-05-22 17:04 ` [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps Mika Kuoppala
2015-05-29 11:05   ` Michel Thierry
2015-05-29 12:53     ` Michel Thierry
2015-06-10 11:42       ` Michel Thierry
2015-06-11  7:31         ` Dave Gordon
2015-06-11 10:46           ` Michel Thierry
2015-06-11 13:57           ` Mika Kuoppala
2015-08-11  5:05             ` Zhiyuan Lv
2015-08-12  7:56               ` Michel Thierry
2015-08-12 15:09                 ` Dave Gordon
2015-08-13  9:36                   ` Zhiyuan Lv
2015-08-13  9:54                     ` Michel Thierry
2015-08-13  9:08                 ` Zhiyuan Lv
2015-08-13 10:12                   ` Michel Thierry
2015-08-13 11:42                     ` Dave Gordon
2015-08-13 12:03                       ` Dave Gordon
2015-08-13 14:56                         ` Zhiyuan Lv
2015-05-22 17:04 ` [PATCH 03/21] drm/i915/gtt: Check va range against vm size Mika Kuoppala
2015-06-01 15:33   ` Joonas Lahtinen
2015-06-11 14:23     ` Mika Kuoppala
2015-06-24 14:48       ` Michel Thierry
2015-05-22 17:04 ` [PATCH 04/21] drm/i915/gtt: Allow >= 4GB sizes for vm Mika Kuoppala
2015-05-26  7:15   ` Daniel Vetter
2015-06-11 17:38     ` Mika Kuoppala
2015-05-22 17:04 ` [PATCH 05/21] drm/i915/gtt: Don't leak scratch page on mapping error Mika Kuoppala
2015-06-01 15:02   ` Joonas Lahtinen
2015-06-15 10:13     ` Daniel Vetter
2015-05-22 17:04 ` [PATCH 06/21] drm/i915/gtt: Remove _single from page table allocator Mika Kuoppala
2015-06-02  9:53   ` Joonas Lahtinen
2015-06-02  9:56   ` Michel Thierry
2015-06-15 10:14     ` Daniel Vetter
2015-05-22 17:05 ` [PATCH 07/21] drm/i915/gtt: Introduce i915_page_dir_dma_addr Mika Kuoppala
2015-06-02 10:11   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 08/21] drm/i915/gtt: Introduce struct i915_page_dma Mika Kuoppala
2015-06-02 12:39   ` Michel Thierry
2015-06-11 17:48     ` Mika Kuoppala
2015-06-22 14:05       ` Michel Thierry
2015-05-22 17:05 ` [PATCH 09/21] drm/i915/gtt: Rename unmap_and_free_px to free_px Mika Kuoppala
2015-06-02 13:08   ` Michel Thierry
2015-06-11 17:48     ` Mika Kuoppala
2015-06-22 14:09       ` Michel Thierry
2015-06-22 14:43         ` Daniel Vetter
2015-05-22 17:05 ` [PATCH 10/21] drm/i915/gtt: Remove superfluous free_pd with gen6/7 Mika Kuoppala
2015-06-02 14:07   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 11/21] drm/i915/gtt: Introduce fill_page_dma() Mika Kuoppala
2015-06-02 14:51   ` Michel Thierry
2015-06-02 15:01     ` Ville Syrjälä
2015-06-15 10:16       ` Daniel Vetter
2015-06-11 17:50     ` Mika Kuoppala
2015-06-24 15:05       ` Michel Thierry
2015-05-22 17:05 ` [PATCH 12/21] drm/i915/gtt: Introduce kmap|kunmap for dma page Mika Kuoppala
2015-06-03 10:55   ` Michel Thierry
2015-06-11 17:50     ` Mika Kuoppala
2015-06-24 15:06       ` Michel Thierry
2015-05-22 17:05 ` [PATCH 13/21] drm/i915/gtt: Use macros to access dma mapped pages Mika Kuoppala
2015-06-03 10:57   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 14/21] drm/i915/gtt: Make scratch page i915_page_dma compatible Mika Kuoppala
2015-06-03 13:44   ` Michel Thierry
2015-06-11 16:30     ` Mika Kuoppala
2015-06-24 14:59       ` Michel Thierry
2015-05-22 17:05 ` [PATCH 15/21] drm/i915/gtt: Fill scratch page Mika Kuoppala
2015-05-27 18:12   ` Tomas Elf
2015-06-01 15:53     ` Chris Wilson
2015-06-04 11:08       ` Tomas Elf
2015-06-04 11:24         ` Chris Wilson
2015-06-11 16:37     ` Mika Kuoppala
2015-06-03 14:03   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 16/21] drm/i915/gtt: Pin vma during virtual address allocation Mika Kuoppala
2015-06-03 14:27   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 17/21] drm/i915/gtt: Cleanup page directory encoding Mika Kuoppala
2015-06-03 14:58   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 18/21] drm/i915/gtt: Move scratch_pd and scratch_pt into vm area Mika Kuoppala
2015-06-03 16:46   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 19/21] drm/i915/gtt: One instance of scratch page table/directory Mika Kuoppala
2015-06-03 16:57   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 20/21] drm/i915/gtt: Use nonatomic bitmap ops Mika Kuoppala
2015-06-03 17:07   ` Michel Thierry
2015-05-22 17:05 ` [PATCH 21/21] drm/i915/gtt: Reorder page alloc/free/init functions Mika Kuoppala
2015-06-03 17:14   ` Michel Thierry
2015-06-11 17:52     ` Mika Kuoppala
