All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFCv6 1/2] drm/i915: Factor out setup_private_pat()
@ 2017-08-29 18:14 Zhi Wang
  2017-08-29 18:14 ` [RFCv6 2/2] drm/i915: Introduce private PAT management Zhi Wang
  2017-08-29 18:37 ` ✗ Fi.CI.BAT: failure for series starting with [RFCv6,1/2] drm/i915: Factor out setup_private_pat() Patchwork
  0 siblings, 2 replies; 14+ messages in thread
From: Zhi Wang @ 2017-08-29 18:14 UTC (permalink / raw)
  To: intel-gfx, intel-gvt-dev; +Cc: Rodrigo Vivi, Ben Widawsky

Factor out setup_private_pat() for introducing the following patches.

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 708b95c..b74fa9d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2915,6 +2915,16 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 	cleanup_scratch_page(vm);
 }
 
+static void setup_private_pat(struct drm_i915_private *dev_priv)
+{
+	if (INTEL_GEN(dev_priv) >= 10)
+		cnl_setup_private_ppat(dev_priv);
+	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
+		chv_setup_private_ppat(dev_priv);
+	else
+		bdw_setup_private_ppat(dev_priv);
+}
+
 static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 {
 	struct drm_i915_private *dev_priv = ggtt->base.i915;
@@ -2947,14 +2957,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	}
 
 	ggtt->base.total = (size / sizeof(gen8_pte_t)) << PAGE_SHIFT;
-
-	if (INTEL_GEN(dev_priv) >= 10)
-		cnl_setup_private_ppat(dev_priv);
-	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
-		chv_setup_private_ppat(dev_priv);
-	else
-		bdw_setup_private_ppat(dev_priv);
-
 	ggtt->base.cleanup = gen6_gmch_remove;
 	ggtt->base.bind_vma = ggtt_bind_vma;
 	ggtt->base.unbind_vma = ggtt_unbind_vma;
@@ -2975,6 +2977,8 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 
 	ggtt->invalidate = gen6_ggtt_invalidate;
 
+	setup_private_pat(dev_priv);
+
 	return ggtt_probe_common(ggtt, size);
 }
 
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-29 18:14 [RFCv6 1/2] drm/i915: Factor out setup_private_pat() Zhi Wang
@ 2017-08-29 18:14 ` Zhi Wang
  2017-08-30 22:00   ` Vivi, Rodrigo
  2017-08-29 18:37 ` ✗ Fi.CI.BAT: failure for series starting with [RFCv6,1/2] drm/i915: Factor out setup_private_pat() Patchwork
  1 sibling, 1 reply; 14+ messages in thread
From: Zhi Wang @ 2017-08-29 18:14 UTC (permalink / raw)
  To: intel-gfx, intel-gvt-dev; +Cc: Rodrigo Vivi, Ben Widawsky

The private PAT management is to support PPAT entry manipulation. Two
APIs are introduced for dynamically managing PPAT entries: intel_ppat_get
and intel_ppat_put.

intel_ppat_get will search for an existing PPAT entry which perfectly
matches the required PPAT value. If not, it will try to allocate or
return a partially matched PPAT entry if there is any available PPAT
indexes or not.

intel_ppat_put will put back the PPAT entry which comes from
intel_ppat_get. If it's dynamically allocated, the reference count will
be decreased. If the reference count turns into zero, the PPAT index is
freed again.

Besides, another two callbacks are introduced to support the private PAT
management framework. One is ppat->update_hw(), which writes the PPAT
configurations in ppat->entries into HW. Another one is ppat->match, which
will return a score to show how two PPAT values match with each other.

v6:

- Address all comments from Chris:
http://www.spinics.net/lists/intel-gfx/msg136850.html

- Address all comments from Joonas:
http://www.spinics.net/lists/intel-gfx/msg136845.html

v5:

- Add check and warnnings for those platforms which don't have PPAT.

v3:

- Introduce dirty bitmap for PPAT registers. (Chris)
- Change the name of the pointer "dev_priv" to "i915". (Chris)
- intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. (Chris)

v2:

- API re-design. (Chris)

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h     |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 273 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
 3 files changed, 262 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7587ef5..5ffde10 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2312,6 +2312,8 @@ struct drm_i915_private {
 	DECLARE_HASHTABLE(mm_structs, 7);
 	struct mutex mm_lock;
 
+	struct intel_ppat ppat;
+
 	/* Kernel Modesetting */
 
 	struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b74fa9d..3106142 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
 	return 0;
 }
 
-static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
+static struct intel_ppat_entry *
+__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8 value)
 {
+	struct intel_ppat_entry *entry = &ppat->entries[index];
+
+	GEM_BUG_ON(index >= ppat->max_entries);
+	GEM_BUG_ON(test_bit(index, ppat->used));
+
+	entry->ppat = ppat;
+	entry->value = value;
+	kref_init(&entry->ref);
+	set_bit(index, ppat->used);
+	set_bit(index, ppat->dirty);
+
+	return entry;
+}
+
+static void __free_ppat_entry(struct intel_ppat_entry *entry)
+{
+	struct intel_ppat *ppat = entry->ppat;
+	unsigned int index = entry - ppat->entries;
+
+	GEM_BUG_ON(index >= ppat->max_entries);
+	GEM_BUG_ON(!test_bit(index, ppat->used));
+
+	entry->value = ppat->clear_value;
+	clear_bit(index, ppat->used);
+	set_bit(index, ppat->dirty);
+}
+
+/**
+ * intel_ppat_get - get a usable PPAT entry
+ * @i915: i915 device instance
+ * @value: the PPAT value required by the caller
+ *
+ * The function tries to search if there is an existing PPAT entry which
+ * matches with the required value. If perfectly matched, the existing PPAT
+ * entry will be used. If only partially matched, it will try to check if
+ * there is any available PPAT index. If yes, it will allocate a new PPAT
+ * index for the required entry and update the HW. If not, the partially
+ * matched entry will be used.
+ */
+const struct intel_ppat_entry *
+intel_ppat_get(struct drm_i915_private *i915, u8 value)
+{
+	struct intel_ppat *ppat = &i915->ppat;
+	struct intel_ppat_entry *entry;
+	unsigned int scanned, best_score;
+	int i;
+
+	GEM_BUG_ON(!ppat->max_entries);
+
+	scanned = best_score = 0;
+
+	for_each_set_bit(i, ppat->used, ppat->max_entries) {
+		unsigned int score;
+
+		entry = &ppat->entries[i];
+		score = ppat->match(entry->value, value);
+		if (score > best_score) {
+			if (score == INTEL_PPAT_PERFECT_MATCH) {
+				kref_get(&entry->ref);
+				return entry;
+			}
+			best_score = score;
+		}
+		scanned++;
+	}
+
+	if (scanned == ppat->max_entries) {
+		if (!best_score)
+			return ERR_PTR(-ENOSPC);
+
+		kref_get(&entry->ref);
+		return entry;
+	}
+
+	i = find_first_zero_bit(ppat->used, ppat->max_entries);
+	entry = __alloc_ppat_entry(ppat, i, value);
+	ppat->update_hw(i915);
+	return entry;
+}
+
+static void release_ppat(struct kref *kref)
+{
+	struct intel_ppat_entry *entry =
+		container_of(kref, struct intel_ppat_entry, ref);
+	struct drm_i915_private *i915 = entry->ppat->i915;
+
+	__free_ppat_entry(entry);
+	entry->ppat->update_hw(i915);
+}
+
+/**
+ * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
+ * @entry: an intel PPAT entry
+ *
+ * Put back the PPAT entry got from intel_ppat_get(). If the PPAT index of the
+ * entry is dynamically allocated, its reference count will be decreased. Once
+ * the reference count becomes into zero, the PPAT index becomes free again.
+ */
+void intel_ppat_put(const struct intel_ppat_entry *entry)
+{
+	struct intel_ppat *ppat = entry->ppat;
+	unsigned int index = entry - ppat->entries;
+
+	GEM_BUG_ON(!ppat->max_entries);
+
+	kref_put(&ppat->entries[index].ref, release_ppat);
+}
+
+static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
+{
+	struct intel_ppat *ppat = &dev_priv->ppat;
+	int i;
+
+	for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
+		I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
+		clear_bit(i, ppat->dirty);
+	}
+}
+
+static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
+{
+	struct intel_ppat *ppat = &dev_priv->ppat;
+	u64 pat = 0;
+	int i;
+
+	for (i = 0; i < ppat->max_entries; i++)
+		pat |= GEN8_PPAT(i, ppat->entries[i].value);
+
+	bitmap_clear(ppat->dirty, 0, ppat->max_entries);
+
+	I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
+	I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
+}
+
+static unsigned int bdw_private_pat_match(u8 src, u8 dst)
+{
+	unsigned int score = 0;
+
+	/* Cache attribute has to be matched. */
+	if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
+		return 0;
+
+	if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
+		score += 2;
+
+	if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
+		score += 1;
+
+	if (score == 3)
+		return INTEL_PPAT_PERFECT_MATCH;
+
+	return score;
+}
+
+static unsigned int chv_private_pat_match(u8 src, u8 dst)
+{
+	return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
+		INTEL_PPAT_PERFECT_MATCH : 0;
+}
+
+static void cnl_setup_private_ppat(struct intel_ppat *ppat)
+{
+	ppat->max_entries = 8;
+	ppat->update_hw = cnl_private_pat_update_hw;
+	ppat->match = bdw_private_pat_match;
+	ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
+
 	/* XXX: spec is unclear if this is still needed for CNL+ */
-	if (!USES_PPGTT(dev_priv)) {
-		I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
+	if (!USES_PPGTT(ppat->i915)) {
+		__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_UC);
 		return;
 	}
 
-	I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
-	I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
-	I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
-	I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
-	I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
-	I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
-	I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
-	I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
+	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
+	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLC);
+	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
+	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
+	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
 }
 
 /* The GGTT and PPGTT need a private PPAT setup in order to handle cacheability
  * bits. When using advanced contexts each context stores its own PAT, but
  * writing this data shouldn't be harmful even in those cases. */
-static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
+static void bdw_setup_private_ppat(struct intel_ppat *ppat)
 {
-	u64 pat;
+	ppat->max_entries = 8;
+	ppat->update_hw = bdw_private_pat_update_hw;
+	ppat->match = bdw_private_pat_match;
+	ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
 
-	pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for normal objects, no eLLC */
-	      GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for something pointing to ptes? */
-	      GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for scanout with eLLC */
-	      GEN8_PPAT(3, GEN8_PPAT_UC)                     | /* Uncached objects, mostly for scanout */
-	      GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0)) |
-	      GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1)) |
-	      GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2)) |
-	      GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
-
-	if (!USES_PPGTT(dev_priv))
+	if (!USES_PPGTT(ppat->i915)) {
 		/* Spec: "For GGTT, there is NO pat_sel[2:0] from the entry,
 		 * so RTL will always use the value corresponding to
 		 * pat_sel = 000".
@@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
 		 * So we can still hold onto all our assumptions wrt cpu
 		 * clflushing on LLC machines.
 		 */
-		pat = GEN8_PPAT(0, GEN8_PPAT_UC);
+		__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_UC);
+		return;
+	}
 
-	/* XXX: spec defines this as 2 distinct registers. It's unclear if a 64b
-	 * write would work. */
-	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
-	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
+	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
+	/* for normal objects, no eLLC */
+	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLC);
+	/* for scanout with eLLC */
+	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
+	/* Uncached objects, mostly for scanout */
+	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
+	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
 }
 
-static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
+static void chv_setup_private_ppat(struct intel_ppat *ppat)
 {
-	u64 pat;
+	ppat->max_entries = 8;
+	ppat->update_hw = bdw_private_pat_update_hw;
+	ppat->match = chv_private_pat_match;
+	ppat->clear_value = CHV_PPAT_SNOOP;
 
 	/*
 	 * Map WB on BDW to snooped on CHV.
@@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
 	 * Which means we must set the snoop bit in PAT entry 0
 	 * in order to keep the global status page working.
 	 */
-	pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
-	      GEN8_PPAT(1, 0) |
-	      GEN8_PPAT(2, 0) |
-	      GEN8_PPAT(3, 0) |
-	      GEN8_PPAT(4, CHV_PPAT_SNOOP) |
-	      GEN8_PPAT(5, CHV_PPAT_SNOOP) |
-	      GEN8_PPAT(6, CHV_PPAT_SNOOP) |
-	      GEN8_PPAT(7, CHV_PPAT_SNOOP);
 
-	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
-	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
+	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
+	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
+	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
+	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
+	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
 }
 
 static void gen6_gmch_remove(struct i915_address_space *vm)
@@ -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
 
 static void setup_private_pat(struct drm_i915_private *dev_priv)
 {
+	struct intel_ppat *ppat = &dev_priv->ppat;
+	int i;
+
+	ppat->i915 = dev_priv;
+
 	if (INTEL_GEN(dev_priv) >= 10)
-		cnl_setup_private_ppat(dev_priv);
+		cnl_setup_private_ppat(ppat);
 	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
-		chv_setup_private_ppat(dev_priv);
+		chv_setup_private_ppat(ppat);
 	else
-		bdw_setup_private_ppat(dev_priv);
+		bdw_setup_private_ppat(ppat);
+
+	GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
+
+	for_each_clear_bit(i, ppat->used, ppat->max_entries) {
+		ppat->entries[i].value = ppat->clear_value;
+		ppat->entries[i].ppat = ppat;
+		set_bit(i, ppat->dirty);
+	}
+
+	ppat->update_hw(dev_priv);
 }
 
 static int gen8_gmch_probe(struct i915_ggtt *ggtt)
@@ -3236,13 +3414,10 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	ggtt->base.closed = false;
 
 	if (INTEL_GEN(dev_priv) >= 8) {
-		if (INTEL_GEN(dev_priv) >= 10)
-			cnl_setup_private_ppat(dev_priv);
-		else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
-			chv_setup_private_ppat(dev_priv);
-		else
-			bdw_setup_private_ppat(dev_priv);
+		struct intel_ppat *ppat = &dev_priv->ppat;
 
+		bitmap_set(ppat->dirty, 0, ppat->max_entries);
+		dev_priv->ppat.update_hw(dev_priv);
 		return;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b4e3aa7..e10ca89 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
 #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
 #define GEN8_PPAT(i, x)			((u64)(x) << ((i) * 8))
 
+#define GEN8_PPAT_GET_CA(x) ((x) & 3)
+#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2))
+#define GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4))
+#define CHV_PPAT_GET_SNOOP(x) ((x) & (1 << 6))
+
 struct sg_table;
 
 struct intel_rotation_info {
@@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
 	return container_of(vm, struct i915_ggtt, base);
 }
 
+#define INTEL_MAX_PPAT_ENTRIES 8
+#define INTEL_PPAT_PERFECT_MATCH (~0U)
+
+struct intel_ppat;
+
+struct intel_ppat_entry {
+	struct intel_ppat *ppat;
+	struct kref ref;
+	u8 value;
+};
+
+struct intel_ppat {
+	struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
+	DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
+	DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
+	unsigned int max_entries;
+	u8 clear_value;
+	/*
+	 * Return a score to show how two PPAT values match,
+	 * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
+	 */
+	unsigned int (*match)(u8 src, u8 dst);
+	void (*update_hw)(struct drm_i915_private *i915);
+
+	struct drm_i915_private *i915;
+};
+
+const struct intel_ppat_entry *
+intel_ppat_get(struct drm_i915_private *i915, u8 value);
+void intel_ppat_put(const struct intel_ppat_entry *entry);
+
 int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
 void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
 
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [RFCv6,1/2] drm/i915: Factor out setup_private_pat()
  2017-08-29 18:14 [RFCv6 1/2] drm/i915: Factor out setup_private_pat() Zhi Wang
  2017-08-29 18:14 ` [RFCv6 2/2] drm/i915: Introduce private PAT management Zhi Wang
@ 2017-08-29 18:37 ` Patchwork
  1 sibling, 0 replies; 14+ messages in thread
From: Patchwork @ 2017-08-29 18:37 UTC (permalink / raw)
  To: Wang, Zhi A; +Cc: intel-gfx

== Series Details ==

Series: series starting with [RFCv6,1/2] drm/i915: Factor out setup_private_pat()
URL   : https://patchwork.freedesktop.org/series/29493/
State : failure

== Summary ==

Series 29493v1 series starting with [RFCv6,1/2] drm/i915: Factor out setup_private_pat()
https://patchwork.freedesktop.org/api/1.0/series/29493/revisions/1/mbox/

Test core_auth:
        Subgroup basic-auth:
                pass       -> SKIP       (fi-bdw-gvtdvm) fdo#102331
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test core_prop_blob:
        Subgroup basic:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test debugfs_test:
        Subgroup read_all_entries:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test drv_getparams_basic:
        Subgroup basic-eu-total:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
        Subgroup basic-subslice-total:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test drv_hangman:
        Subgroup error-state-basic:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_basic:
        Subgroup bad-close:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
        Subgroup create-close:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
        Subgroup create-fd-close:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_busy:
        Subgroup basic-busy-default:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
        Subgroup basic-hang-default:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_close_race:
        Subgroup basic-process:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
        Subgroup basic-threads:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_cpu_reloc:
        Subgroup basic:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_cs_tlb:
        Subgroup basic-default:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_ctx_basic:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
Test gem_ctx_create:
        Subgroup basic:
                pass       -> SKIP       (fi-bdw-gvtdvm)
                pass       -> SKIP       (fi-skl-gvtdvm)
                pass       -> SKIP       (fi-bxt-j4205)
                pass       -> SKIP       (fi-glk-2a)
        Subgroup basic-files:
WARNING: Long output truncated
fi-kbl-r failed to connect after reboot
fi-skl-x1585l failed to connect after reboot

428ed27345fbf9be530d01ca6dc862eb5895db81 drm-tip: 2017y-08m-29d-17h-43m-11s UTC integration manifest
302c1ff0c16c drm/i915: Introduce private PAT management
20db5de47953 drm/i915: Factor out setup_private_pat()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5526/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-29 18:14 ` [RFCv6 2/2] drm/i915: Introduce private PAT management Zhi Wang
@ 2017-08-30 22:00   ` Vivi, Rodrigo
  2017-08-31  4:49     ` Zhi Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Vivi, Rodrigo @ 2017-08-30 22:00 UTC (permalink / raw)
  To: Wang, Zhi A; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev

On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
> The private PAT management is to support PPAT entry manipulation. Two
> APIs are introduced for dynamically managing PPAT entries: intel_ppat_get
> and intel_ppat_put.
> 
> intel_ppat_get will search for an existing PPAT entry which perfectly
> matches the required PPAT value. If not, it will try to allocate or
> return a partially matched PPAT entry if there is any available PPAT
> indexes or not.
> 
> intel_ppat_put will put back the PPAT entry which comes from
> intel_ppat_get. If it's dynamically allocated, the reference count will
> be decreased. If the reference count turns into zero, the PPAT index is
> freed again.
> 
> Besides, another two callbacks are introduced to support the private PAT
> management framework. One is ppat->update_hw(), which writes the PPAT
> configurations in ppat->entries into HW. Another one is ppat->match, which
> will return a score to show how two PPAT values match with each other.
> 
> v6:
> 
> - Address all comments from Chris:
> http://www.spinics.net/lists/intel-gfx/msg136850.html
> 
> - Address all comments from Joonas:
> http://www.spinics.net/lists/intel-gfx/msg136845.html
> 
> v5:
> 
> - Add check and warnnings for those platforms which don't have PPAT.
> 
> v3:
> 
> - Introduce dirty bitmap for PPAT registers. (Chris)
> - Change the name of the pointer "dev_priv" to "i915". (Chris)
> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. (Chris)
> 
> v2:
> 
> - API re-design. (Chris)
> 
> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |   2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 273 +++++++++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
>  3 files changed, 262 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7587ef5..5ffde10 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
>  	DECLARE_HASHTABLE(mm_structs, 7);
>  	struct mutex mm_lock;
>  
> +	struct intel_ppat ppat;
> +
>  	/* Kernel Modesetting */
>  
>  	struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b74fa9d..3106142 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>  	return 0;
>  }
>  
> -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
> +static struct intel_ppat_entry *
> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8 value)
>  {
> +	struct intel_ppat_entry *entry = &ppat->entries[index];
> +
> +	GEM_BUG_ON(index >= ppat->max_entries);
> +	GEM_BUG_ON(test_bit(index, ppat->used));
> +
> +	entry->ppat = ppat;
> +	entry->value = value;
> +	kref_init(&entry->ref);
> +	set_bit(index, ppat->used);
> +	set_bit(index, ppat->dirty);
> +
> +	return entry;
> +}
> +
> +static void __free_ppat_entry(struct intel_ppat_entry *entry)
> +{
> +	struct intel_ppat *ppat = entry->ppat;
> +	unsigned int index = entry - ppat->entries;
> +
> +	GEM_BUG_ON(index >= ppat->max_entries);
> +	GEM_BUG_ON(!test_bit(index, ppat->used));
> +
> +	entry->value = ppat->clear_value;
> +	clear_bit(index, ppat->used);
> +	set_bit(index, ppat->dirty);
> +}
> +
> +/**
> + * intel_ppat_get - get a usable PPAT entry
> + * @i915: i915 device instance
> + * @value: the PPAT value required by the caller
> + *
> + * The function tries to search if there is an existing PPAT entry which
> + * matches with the required value. If perfectly matched, the existing PPAT
> + * entry will be used. If only partially matched, it will try to check if
> + * there is any available PPAT index. If yes, it will allocate a new PPAT
> + * index for the required entry and update the HW. If not, the partially
> + * matched entry will be used.
> + */
> +const struct intel_ppat_entry *
> +intel_ppat_get(struct drm_i915_private *i915, u8 value)
> +{
> +	struct intel_ppat *ppat = &i915->ppat;
> +	struct intel_ppat_entry *entry;
> +	unsigned int scanned, best_score;
> +	int i;
> +
> +	GEM_BUG_ON(!ppat->max_entries);
> +
> +	scanned = best_score = 0;
> +
> +	for_each_set_bit(i, ppat->used, ppat->max_entries) {
> +		unsigned int score;
> +
> +		entry = &ppat->entries[i];
> +		score = ppat->match(entry->value, value);
> +		if (score > best_score) {
> +			if (score == INTEL_PPAT_PERFECT_MATCH) {
> +				kref_get(&entry->ref);
> +				return entry;
> +			}
> +			best_score = score;
> +		}
> +		scanned++;
> +	}
> +
> +	if (scanned == ppat->max_entries) {
> +		if (!best_score)
> +			return ERR_PTR(-ENOSPC);
> +
> +		kref_get(&entry->ref);
> +		return entry;
> +	}
> +
> +	i = find_first_zero_bit(ppat->used, ppat->max_entries);
> +	entry = __alloc_ppat_entry(ppat, i, value);
> +	ppat->update_hw(i915);
> +	return entry;
> +}
> +
> +static void release_ppat(struct kref *kref)
> +{
> +	struct intel_ppat_entry *entry =
> +		container_of(kref, struct intel_ppat_entry, ref);
> +	struct drm_i915_private *i915 = entry->ppat->i915;
> +
> +	__free_ppat_entry(entry);
> +	entry->ppat->update_hw(i915);
> +}
> +
> +/**
> + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
> + * @entry: an intel PPAT entry
> + *
> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT index of the
> + * entry is dynamically allocated, its reference count will be decreased. Once
> + * the reference count becomes into zero, the PPAT index becomes free again.
> + */
> +void intel_ppat_put(const struct intel_ppat_entry *entry)
> +{
> +	struct intel_ppat *ppat = entry->ppat;
> +	unsigned int index = entry - ppat->entries;
> +
> +	GEM_BUG_ON(!ppat->max_entries);
> +
> +	kref_put(&ppat->entries[index].ref, release_ppat);
> +}
> +
> +static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	int i;
> +
> +	for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> +		I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
> +		clear_bit(i, ppat->dirty);
> +	}
> +}
> +
> +static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
> +{
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	u64 pat = 0;
> +	int i;
> +
> +	for (i = 0; i < ppat->max_entries; i++)
> +		pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> +	bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> +	I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
> +	I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
> +}
> +
> +static unsigned int bdw_private_pat_match(u8 src, u8 dst)
> +{
> +	unsigned int score = 0;
> +
> +	/* Cache attribute has to be matched. */
> +	if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
> +		return 0;
> +
> +	if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
> +		score += 2;
> +
> +	if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
> +		score += 1;
> +
> +	if (score == 3)
> +		return INTEL_PPAT_PERFECT_MATCH;
> +
> +	return score;
> +}
> +
> +static unsigned int chv_private_pat_match(u8 src, u8 dst)
> +{
> +	return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
> +		INTEL_PPAT_PERFECT_MATCH : 0;
> +}
> +
> +static void cnl_setup_private_ppat(struct intel_ppat *ppat)
> +{
> +	ppat->max_entries = 8;
> +	ppat->update_hw = cnl_private_pat_update_hw;
> +	ppat->match = bdw_private_pat_match;
> +	ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
> +
>  	/* XXX: spec is unclear if this is still needed for CNL+ */
> -	if (!USES_PPGTT(dev_priv)) {
> -		I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
> +	if (!USES_PPGTT(ppat->i915)) {
> +		__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_UC);
>  		return;
>  	}
>  
> -	I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
> -	I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
> -	I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> -	I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
> -	I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
> -	I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
> -	I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
> -	I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));

Gen8 allocation style doesn't work on CNL.
I don't see where all these registers would be getting properly written.

> +	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
> +	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLC);
> +	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> +	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
> +	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>  }
>  
>  /* The GGTT and PPGTT need a private PPAT setup in order to handle cacheability
>   * bits. When using advanced contexts each context stores its own PAT, but
>   * writing this data shouldn't be harmful even in those cases. */
> -static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
>  {
> -	u64 pat;
> +	ppat->max_entries = 8;
> +	ppat->update_hw = bdw_private_pat_update_hw;
> +	ppat->match = bdw_private_pat_match;
> +	ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>  
> -	pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for normal objects, no eLLC */
> -	      GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for something pointing to ptes? */
> -	      GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for scanout with eLLC */
> -	      GEN8_PPAT(3, GEN8_PPAT_UC)                     | /* Uncached objects, mostly for scanout */
> -	      GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0)) |
> -	      GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1)) |
> -	      GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2)) |
> -	      GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
> -
> -	if (!USES_PPGTT(dev_priv))
> +	if (!USES_PPGTT(ppat->i915)) {
>  		/* Spec: "For GGTT, there is NO pat_sel[2:0] from the entry,
>  		 * so RTL will always use the value corresponding to
>  		 * pat_sel = 000".
> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
>  		 * So we can still hold onto all our assumptions wrt cpu
>  		 * clflushing on LLC machines.
>  		 */
> -		pat = GEN8_PPAT(0, GEN8_PPAT_UC);
> +		__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_UC);
> +		return;
> +	}
>  
> -	/* XXX: spec defines this as 2 distinct registers. It's unclear if a 64b
> -	 * write would work. */
> -	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> -	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
> +	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
> +	/* for normal objects, no eLLC */
> +	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLC);
> +	/* for scanout with eLLC */
> +	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> +	/* Uncached objects, mostly for scanout */
> +	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
> +	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>  }
>  
> -static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
>  {
> -	u64 pat;
> +	ppat->max_entries = 8;
> +	ppat->update_hw = bdw_private_pat_update_hw;
> +	ppat->match = chv_private_pat_match;
> +	ppat->clear_value = CHV_PPAT_SNOOP;
>  
>  	/*
>  	 * Map WB on BDW to snooped on CHV.
> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
>  	 * Which means we must set the snoop bit in PAT entry 0
>  	 * in order to keep the global status page working.
>  	 */
> -	pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
> -	      GEN8_PPAT(1, 0) |
> -	      GEN8_PPAT(2, 0) |
> -	      GEN8_PPAT(3, 0) |
> -	      GEN8_PPAT(4, CHV_PPAT_SNOOP) |
> -	      GEN8_PPAT(5, CHV_PPAT_SNOOP) |
> -	      GEN8_PPAT(6, CHV_PPAT_SNOOP) |
> -	      GEN8_PPAT(7, CHV_PPAT_SNOOP);
>  
> -	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> -	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
> +	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
> +	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
> +	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
> +	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
> +	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
>  }
>  
>  static void gen6_gmch_remove(struct i915_address_space *vm)
> @@ -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>  
>  static void setup_private_pat(struct drm_i915_private *dev_priv)
>  {
> +	struct intel_ppat *ppat = &dev_priv->ppat;
> +	int i;
> +
> +	ppat->i915 = dev_priv;
> +
>  	if (INTEL_GEN(dev_priv) >= 10)
> -		cnl_setup_private_ppat(dev_priv);
> +		cnl_setup_private_ppat(ppat);
>  	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> -		chv_setup_private_ppat(dev_priv);
> +		chv_setup_private_ppat(ppat);
>  	else
> -		bdw_setup_private_ppat(dev_priv);
> +		bdw_setup_private_ppat(ppat);
> +
> +	GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
> +
> +	for_each_clear_bit(i, ppat->used, ppat->max_entries) {
> +		ppat->entries[i].value = ppat->clear_value;
> +		ppat->entries[i].ppat = ppat;
> +		set_bit(i, ppat->dirty);
> +	}
> +
> +	ppat->update_hw(dev_priv);
>  }
>  
>  static int gen8_gmch_probe(struct i915_ggtt *ggtt)
> @@ -3236,13 +3414,10 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
>  	ggtt->base.closed = false;
>  
>  	if (INTEL_GEN(dev_priv) >= 8) {
> -		if (INTEL_GEN(dev_priv) >= 10)
> -			cnl_setup_private_ppat(dev_priv);
> -		else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> -			chv_setup_private_ppat(dev_priv);
> -		else
> -			bdw_setup_private_ppat(dev_priv);
> +		struct intel_ppat *ppat = &dev_priv->ppat;
>  
> +		bitmap_set(ppat->dirty, 0, ppat->max_entries);
> +		dev_priv->ppat.update_hw(dev_priv);
>  		return;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index b4e3aa7..e10ca89 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
>  #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
>  #define GEN8_PPAT(i, x)			((u64)(x) << ((i) * 8))
>  
> +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2))
> +#define GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4))
> +#define CHV_PPAT_GET_SNOOP(x) ((x) & (1 << 6))
> +
>  struct sg_table;
>  
>  struct intel_rotation_info {
> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>  	return container_of(vm, struct i915_ggtt, base);
>  }
>  
> +#define INTEL_MAX_PPAT_ENTRIES 8
> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
> +
> +struct intel_ppat;
> +
> +struct intel_ppat_entry {
> +	struct intel_ppat *ppat;
> +	struct kref ref;
> +	u8 value;
> +};
> +
> +struct intel_ppat {
> +	struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
> +	DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
> +	DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
> +	unsigned int max_entries;
> +	u8 clear_value;
> +	/*
> +	 * Return a score to show how two PPAT values match,
> +	 * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
> +	 */
> +	unsigned int (*match)(u8 src, u8 dst);
> +	void (*update_hw)(struct drm_i915_private *i915);
> +
> +	struct drm_i915_private *i915;
> +};
> +
> +const struct intel_ppat_entry *
> +intel_ppat_get(struct drm_i915_private *i915, u8 value);
> +void intel_ppat_put(const struct intel_ppat_entry *entry);
> +
>  int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
>  void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
>  

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-30 22:00   ` Vivi, Rodrigo
@ 2017-08-31  4:49     ` Zhi Wang
  2017-08-31  5:15       ` Rodrigo Vivi
  2017-08-31  8:25       ` Joonas Lahtinen
  0 siblings, 2 replies; 14+ messages in thread
From: Zhi Wang @ 2017-08-31  4:49 UTC (permalink / raw)
  To: Vivi, Rodrigo; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev

Hi Vivi:
     Thanks for the reply! The register are written in ppat->update_hw() 
now.

+static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
+{
+	struct intel_ppat *ppat = &dev_priv->ppat;
+	int i;
+
+	for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
+		I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
+		clear_bit(i, ppat->dirty);
+	}
+}
+
+static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
+{
+	struct intel_ppat *ppat = &dev_priv->ppat;
+	u64 pat = 0;
+	int i;
+
+	for (i = 0; i < ppat->max_entries; i++)
+		pat |= GEN8_PPAT(i, ppat->entries[i].value);
+
+	bitmap_clear(ppat->dirty, 0, ppat->max_entries);
+
+	I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
+	I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
+}

On 08/31/17 06:00, Vivi, Rodrigo wrote:
> On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
>> The private PAT management is to support PPAT entry manipulation. Two
>> APIs are introduced for dynamically managing PPAT entries: intel_ppat_get
>> and intel_ppat_put.
>>
>> intel_ppat_get will search for an existing PPAT entry which perfectly
>> matches the required PPAT value. If not, it will try to allocate or
>> return a partially matched PPAT entry if there is any available PPAT
>> indexes or not.
>>
>> intel_ppat_put will put back the PPAT entry which comes from
>> intel_ppat_get. If it's dynamically allocated, the reference count will
>> be decreased. If the reference count turns into zero, the PPAT index is
>> freed again.
>>
>> Besides, another two callbacks are introduced to support the private PAT
>> management framework. One is ppat->update_hw(), which writes the PPAT
>> configurations in ppat->entries into HW. Another one is ppat->match, which
>> will return a score to show how two PPAT values match with each other.
>>
>> v6:
>>
>> - Address all comments from Chris:
>> http://www.spinics.net/lists/intel-gfx/msg136850.html
>>
>> - Address all comments from Joonas:
>> http://www.spinics.net/lists/intel-gfx/msg136845.html
>>
>> v5:
>>
>> - Add check and warnnings for those platforms which don't have PPAT.
>>
>> v3:
>>
>> - Introduce dirty bitmap for PPAT registers. (Chris)
>> - Change the name of the pointer "dev_priv" to "i915". (Chris)
>> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. (Chris)
>>
>> v2:
>>
>> - API re-design. (Chris)
>>
>> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h     |   2 +
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 273 +++++++++++++++++++++++++++++-------
>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
>>   3 files changed, 262 insertions(+), 49 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 7587ef5..5ffde10 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
>>   	DECLARE_HASHTABLE(mm_structs, 7);
>>   	struct mutex mm_lock;
>>   
>> +	struct intel_ppat ppat;
>> +
>>   	/* Kernel Modesetting */
>>   
>>   	struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index b74fa9d..3106142 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>>   	return 0;
>>   }
>>   
>> -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
>> +static struct intel_ppat_entry *
>> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8 value)
>>   {
>> +	struct intel_ppat_entry *entry = &ppat->entries[index];
>> +
>> +	GEM_BUG_ON(index >= ppat->max_entries);
>> +	GEM_BUG_ON(test_bit(index, ppat->used));
>> +
>> +	entry->ppat = ppat;
>> +	entry->value = value;
>> +	kref_init(&entry->ref);
>> +	set_bit(index, ppat->used);
>> +	set_bit(index, ppat->dirty);
>> +
>> +	return entry;
>> +}
>> +
>> +static void __free_ppat_entry(struct intel_ppat_entry *entry)
>> +{
>> +	struct intel_ppat *ppat = entry->ppat;
>> +	unsigned int index = entry - ppat->entries;
>> +
>> +	GEM_BUG_ON(index >= ppat->max_entries);
>> +	GEM_BUG_ON(!test_bit(index, ppat->used));
>> +
>> +	entry->value = ppat->clear_value;
>> +	clear_bit(index, ppat->used);
>> +	set_bit(index, ppat->dirty);
>> +}
>> +
>> +/**
>> + * intel_ppat_get - get a usable PPAT entry
>> + * @i915: i915 device instance
>> + * @value: the PPAT value required by the caller
>> + *
>> + * The function tries to search if there is an existing PPAT entry which
>> + * matches with the required value. If perfectly matched, the existing PPAT
>> + * entry will be used. If only partially matched, it will try to check if
>> + * there is any available PPAT index. If yes, it will allocate a new PPAT
>> + * index for the required entry and update the HW. If not, the partially
>> + * matched entry will be used.
>> + */
>> +const struct intel_ppat_entry *
>> +intel_ppat_get(struct drm_i915_private *i915, u8 value)
>> +{
>> +	struct intel_ppat *ppat = &i915->ppat;
>> +	struct intel_ppat_entry *entry;
>> +	unsigned int scanned, best_score;
>> +	int i;
>> +
>> +	GEM_BUG_ON(!ppat->max_entries);
>> +
>> +	scanned = best_score = 0;
>> +
>> +	for_each_set_bit(i, ppat->used, ppat->max_entries) {
>> +		unsigned int score;
>> +
>> +		entry = &ppat->entries[i];
>> +		score = ppat->match(entry->value, value);
>> +		if (score > best_score) {
>> +			if (score == INTEL_PPAT_PERFECT_MATCH) {
>> +				kref_get(&entry->ref);
>> +				return entry;
>> +			}
>> +			best_score = score;
>> +		}
>> +		scanned++;
>> +	}
>> +
>> +	if (scanned == ppat->max_entries) {
>> +		if (!best_score)
>> +			return ERR_PTR(-ENOSPC);
>> +
>> +		kref_get(&entry->ref);
>> +		return entry;
>> +	}
>> +
>> +	i = find_first_zero_bit(ppat->used, ppat->max_entries);
>> +	entry = __alloc_ppat_entry(ppat, i, value);
>> +	ppat->update_hw(i915);
>> +	return entry;
>> +}
>> +
>> +static void release_ppat(struct kref *kref)
>> +{
>> +	struct intel_ppat_entry *entry =
>> +		container_of(kref, struct intel_ppat_entry, ref);
>> +	struct drm_i915_private *i915 = entry->ppat->i915;
>> +
>> +	__free_ppat_entry(entry);
>> +	entry->ppat->update_hw(i915);
>> +}
>> +
>> +/**
>> + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
>> + * @entry: an intel PPAT entry
>> + *
>> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT index of the
>> + * entry is dynamically allocated, its reference count will be decreased. Once
>> + * the reference count becomes into zero, the PPAT index becomes free again.
>> + */
>> +void intel_ppat_put(const struct intel_ppat_entry *entry)
>> +{
>> +	struct intel_ppat *ppat = entry->ppat;
>> +	unsigned int index = entry - ppat->entries;
>> +
>> +	GEM_BUG_ON(!ppat->max_entries);
>> +
>> +	kref_put(&ppat->entries[index].ref, release_ppat);
>> +}
>> +
>> +static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
>> +{
>> +	struct intel_ppat *ppat = &dev_priv->ppat;
>> +	int i;
>> +
>> +	for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
>> +		I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
>> +		clear_bit(i, ppat->dirty);
>> +	}
>> +}
>> +
>> +static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
>> +{
>> +	struct intel_ppat *ppat = &dev_priv->ppat;
>> +	u64 pat = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < ppat->max_entries; i++)
>> +		pat |= GEN8_PPAT(i, ppat->entries[i].value);
>> +
>> +	bitmap_clear(ppat->dirty, 0, ppat->max_entries);
>> +
>> +	I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
>> +	I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
>> +}
>> +
>> +static unsigned int bdw_private_pat_match(u8 src, u8 dst)
>> +{
>> +	unsigned int score = 0;
>> +
>> +	/* Cache attribute has to be matched. */
>> +	if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
>> +		return 0;
>> +
>> +	if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
>> +		score += 2;
>> +
>> +	if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
>> +		score += 1;
>> +
>> +	if (score == 3)
>> +		return INTEL_PPAT_PERFECT_MATCH;
>> +
>> +	return score;
>> +}
>> +
>> +static unsigned int chv_private_pat_match(u8 src, u8 dst)
>> +{
>> +	return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
>> +		INTEL_PPAT_PERFECT_MATCH : 0;
>> +}
>> +
>> +static void cnl_setup_private_ppat(struct intel_ppat *ppat)
>> +{
>> +	ppat->max_entries = 8;
>> +	ppat->update_hw = cnl_private_pat_update_hw;
>> +	ppat->match = bdw_private_pat_match;
>> +	ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>> +
>>   	/* XXX: spec is unclear if this is still needed for CNL+ */
>> -	if (!USES_PPGTT(dev_priv)) {
>> -		I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
>> +	if (!USES_PPGTT(ppat->i915)) {
>> +		__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_UC);
>>   		return;
>>   	}
>>   
>> -	I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
>> -	I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
>> -	I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>> -	I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
>> -	I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>> -	I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1));
>> -	I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2));
>> -	I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
> 
> Gen8 allocation style doesn't work on CNL.
> I don't see where all these registers would be getting properly written.
> 
>> +	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
>> +	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLC);
>> +	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>> +	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>> +	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>>   }
>>   
>>   /* The GGTT and PPGTT need a private PPAT setup in order to handle cacheability
>>    * bits. When using advanced contexts each context stores its own PAT, but
>>    * writing this data shouldn't be harmful even in those cases. */
>> -static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
>> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
>>   {
>> -	u64 pat;
>> +	ppat->max_entries = 8;
>> +	ppat->update_hw = bdw_private_pat_update_hw;
>> +	ppat->match = bdw_private_pat_match;
>> +	ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>>   
>> -	pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for normal objects, no eLLC */
>> -	      GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for something pointing to ptes? */
>> -	      GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for scanout with eLLC */
>> -	      GEN8_PPAT(3, GEN8_PPAT_UC)                     | /* Uncached objects, mostly for scanout */
>> -	      GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0)) |
>> -	      GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(1)) |
>> -	      GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(2)) |
>> -	      GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3));
>> -
>> -	if (!USES_PPGTT(dev_priv))
>> +	if (!USES_PPGTT(ppat->i915)) {
>>   		/* Spec: "For GGTT, there is NO pat_sel[2:0] from the entry,
>>   		 * so RTL will always use the value corresponding to
>>   		 * pat_sel = 000".
>> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
>>   		 * So we can still hold onto all our assumptions wrt cpu
>>   		 * clflushing on LLC machines.
>>   		 */
>> -		pat = GEN8_PPAT(0, GEN8_PPAT_UC);
>> +		__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_UC);
>> +		return;
>> +	}
>>   
>> -	/* XXX: spec defines this as 2 distinct registers. It's unclear if a 64b
>> -	 * write would work. */
>> -	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>> -	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>> +	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
>> +	/* for normal objects, no eLLC */
>> +	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLC);
>> +	/* for scanout with eLLC */
>> +	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>> +	/* Uncached objects, mostly for scanout */
>> +	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>> +	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>>   }
>>   
>> -static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
>> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
>>   {
>> -	u64 pat;
>> +	ppat->max_entries = 8;
>> +	ppat->update_hw = bdw_private_pat_update_hw;
>> +	ppat->match = chv_private_pat_match;
>> +	ppat->clear_value = CHV_PPAT_SNOOP;
>>   
>>   	/*
>>   	 * Map WB on BDW to snooped on CHV.
>> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
>>   	 * Which means we must set the snoop bit in PAT entry 0
>>   	 * in order to keep the global status page working.
>>   	 */
>> -	pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
>> -	      GEN8_PPAT(1, 0) |
>> -	      GEN8_PPAT(2, 0) |
>> -	      GEN8_PPAT(3, 0) |
>> -	      GEN8_PPAT(4, CHV_PPAT_SNOOP) |
>> -	      GEN8_PPAT(5, CHV_PPAT_SNOOP) |
>> -	      GEN8_PPAT(6, CHV_PPAT_SNOOP) |
>> -	      GEN8_PPAT(7, CHV_PPAT_SNOOP);
>>   
>> -	I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>> -	I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>> +	/* See gen8_pte_encode() for the mapping from cache-level to PPAT */
>> +	__alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
>> +	__alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
>> +	__alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
>> +	__alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
>>   }
>>   
>>   static void gen6_gmch_remove(struct i915_address_space *vm)
>> @@ -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct i915_address_space *vm)
>>   
>>   static void setup_private_pat(struct drm_i915_private *dev_priv)
>>   {
>> +	struct intel_ppat *ppat = &dev_priv->ppat;
>> +	int i;
>> +
>> +	ppat->i915 = dev_priv;
>> +
>>   	if (INTEL_GEN(dev_priv) >= 10)
>> -		cnl_setup_private_ppat(dev_priv);
>> +		cnl_setup_private_ppat(ppat);
>>   	else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>> -		chv_setup_private_ppat(dev_priv);
>> +		chv_setup_private_ppat(ppat);
>>   	else
>> -		bdw_setup_private_ppat(dev_priv);
>> +		bdw_setup_private_ppat(ppat);
>> +
>> +	GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
>> +
>> +	for_each_clear_bit(i, ppat->used, ppat->max_entries) {
>> +		ppat->entries[i].value = ppat->clear_value;
>> +		ppat->entries[i].ppat = ppat;
>> +		set_bit(i, ppat->dirty);
>> +	}
>> +
>> +	ppat->update_hw(dev_priv);
>>   }
>>   
>>   static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>> @@ -3236,13 +3414,10 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
>>   	ggtt->base.closed = false;
>>   
>>   	if (INTEL_GEN(dev_priv) >= 8) {
>> -		if (INTEL_GEN(dev_priv) >= 10)
>> -			cnl_setup_private_ppat(dev_priv);
>> -		else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>> -			chv_setup_private_ppat(dev_priv);
>> -		else
>> -			bdw_setup_private_ppat(dev_priv);
>> +		struct intel_ppat *ppat = &dev_priv->ppat;
>>   
>> +		bitmap_set(ppat->dirty, 0, ppat->max_entries);
>> +		dev_priv->ppat.update_hw(dev_priv);
>>   		return;
>>   	}
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index b4e3aa7..e10ca89 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
>>   #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
>>   #define GEN8_PPAT(i, x)			((u64)(x) << ((i) * 8))
>>   
>> +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
>> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2))
>> +#define GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4))
>> +#define CHV_PPAT_GET_SNOOP(x) ((x) & (1 << 6))
>> +
>>   struct sg_table;
>>   
>>   struct intel_rotation_info {
>> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>>   	return container_of(vm, struct i915_ggtt, base);
>>   }
>>   
>> +#define INTEL_MAX_PPAT_ENTRIES 8
>> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
>> +
>> +struct intel_ppat;
>> +
>> +struct intel_ppat_entry {
>> +	struct intel_ppat *ppat;
>> +	struct kref ref;
>> +	u8 value;
>> +};
>> +
>> +struct intel_ppat {
>> +	struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
>> +	DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
>> +	DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
>> +	unsigned int max_entries;
>> +	u8 clear_value;
>> +	/*
>> +	 * Return a score to show how two PPAT values match,
>> +	 * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
>> +	 */
>> +	unsigned int (*match)(u8 src, u8 dst);
>> +	void (*update_hw)(struct drm_i915_private *i915);
>> +
>> +	struct drm_i915_private *i915;
>> +};
>> +
>> +const struct intel_ppat_entry *
>> +intel_ppat_get(struct drm_i915_private *i915, u8 value);
>> +void intel_ppat_put(const struct intel_ppat_entry *entry);
>> +
>>   int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
>>   void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
>>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  4:49     ` Zhi Wang
@ 2017-08-31  5:15       ` Rodrigo Vivi
  2017-08-31  5:22         ` Wang, Zhi A
  2017-08-31 18:40         ` Wang, Zhi A
  2017-08-31  8:25       ` Joonas Lahtinen
  1 sibling, 2 replies; 14+ messages in thread
From: Rodrigo Vivi @ 2017-08-31  5:15 UTC (permalink / raw)
  To: Zhi Wang; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev, Vivi, Rodrigo

On Wed, Aug 30, 2017 at 9:49 PM, Zhi Wang <zhi.a.wang@intel.com> wrote:
> Hi Vivi:
>     Thanks for the reply! The register are written in ppat->update_hw() now.

oh, I saw now...
I hadden noticed that interation.

But something seems really odd yet...
My CNL with these 2 patches applied hangs on any execution...

>
>
> +static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
> +{
> +       struct intel_ppat *ppat = &dev_priv->ppat;
> +       int i;
> +
> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
> +               clear_bit(i, ppat->dirty);
> +       }
> +}
> +
> +static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
> +{
> +       struct intel_ppat *ppat = &dev_priv->ppat;
> +       u64 pat = 0;
> +       int i;
> +
> +       for (i = 0; i < ppat->max_entries; i++)
> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
> +}
>
> On 08/31/17 06:00, Vivi, Rodrigo wrote:
>>
>> On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
>>>
>>> The private PAT management is to support PPAT entry manipulation. Two
>>> APIs are introduced for dynamically managing PPAT entries: intel_ppat_get
>>> and intel_ppat_put.
>>>
>>> intel_ppat_get will search for an existing PPAT entry which perfectly
>>> matches the required PPAT value. If not, it will try to allocate or
>>> return a partially matched PPAT entry if there is any available PPAT
>>> indexes or not.
>>>
>>> intel_ppat_put will put back the PPAT entry which comes from
>>> intel_ppat_get. If it's dynamically allocated, the reference count will
>>> be decreased. If the reference count turns into zero, the PPAT index is
>>> freed again.
>>>
>>> Besides, another two callbacks are introduced to support the private PAT
>>> management framework. One is ppat->update_hw(), which writes the PPAT
>>> configurations in ppat->entries into HW. Another one is ppat->match,
>>> which
>>> will return a score to show how two PPAT values match with each other.
>>>
>>> v6:
>>>
>>> - Address all comments from Chris:
>>> http://www.spinics.net/lists/intel-gfx/msg136850.html
>>>
>>> - Address all comments from Joonas:
>>> http://www.spinics.net/lists/intel-gfx/msg136845.html
>>>
>>> v5:
>>>
>>> - Add check and warnnings for those platforms which don't have PPAT.
>>>
>>> v3:
>>>
>>> - Introduce dirty bitmap for PPAT registers. (Chris)
>>> - Change the name of the pointer "dev_priv" to "i915". (Chris)
>>> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. (Chris)
>>>
>>> v2:
>>>
>>> - API re-design. (Chris)
>>>
>>> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h     |   2 +
>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 273
>>> +++++++++++++++++++++++++++++-------
>>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
>>>   3 files changed, 262 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 7587ef5..5ffde10 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
>>>         DECLARE_HASHTABLE(mm_structs, 7);
>>>         struct mutex mm_lock;
>>>   +     struct intel_ppat ppat;
>>> +
>>>         /* Kernel Modesetting */
>>>         struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> index b74fa9d..3106142 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct i915_ggtt
>>> *ggtt, u64 size)
>>>         return 0;
>>>   }
>>>   -static void cnl_setup_private_ppat(struct drm_i915_private *dev_priv)
>>> +static struct intel_ppat_entry *
>>> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8
>>> value)
>>>   {
>>> +       struct intel_ppat_entry *entry = &ppat->entries[index];
>>> +
>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>> +       GEM_BUG_ON(test_bit(index, ppat->used));
>>> +
>>> +       entry->ppat = ppat;
>>> +       entry->value = value;
>>> +       kref_init(&entry->ref);
>>> +       set_bit(index, ppat->used);
>>> +       set_bit(index, ppat->dirty);
>>> +
>>> +       return entry;
>>> +}
>>> +
>>> +static void __free_ppat_entry(struct intel_ppat_entry *entry)
>>> +{
>>> +       struct intel_ppat *ppat = entry->ppat;
>>> +       unsigned int index = entry - ppat->entries;
>>> +
>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>> +       GEM_BUG_ON(!test_bit(index, ppat->used));
>>> +
>>> +       entry->value = ppat->clear_value;
>>> +       clear_bit(index, ppat->used);
>>> +       set_bit(index, ppat->dirty);
>>> +}
>>> +
>>> +/**
>>> + * intel_ppat_get - get a usable PPAT entry
>>> + * @i915: i915 device instance
>>> + * @value: the PPAT value required by the caller
>>> + *
>>> + * The function tries to search if there is an existing PPAT entry which
>>> + * matches with the required value. If perfectly matched, the existing
>>> PPAT
>>> + * entry will be used. If only partially matched, it will try to check
>>> if
>>> + * there is any available PPAT index. If yes, it will allocate a new
>>> PPAT
>>> + * index for the required entry and update the HW. If not, the partially
>>> + * matched entry will be used.
>>> + */
>>> +const struct intel_ppat_entry *
>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value)
>>> +{
>>> +       struct intel_ppat *ppat = &i915->ppat;
>>> +       struct intel_ppat_entry *entry;
>>> +       unsigned int scanned, best_score;
>>> +       int i;
>>> +
>>> +       GEM_BUG_ON(!ppat->max_entries);
>>> +
>>> +       scanned = best_score = 0;
>>> +
>>> +       for_each_set_bit(i, ppat->used, ppat->max_entries) {
>>> +               unsigned int score;
>>> +
>>> +               entry = &ppat->entries[i];
>>> +               score = ppat->match(entry->value, value);
>>> +               if (score > best_score) {
>>> +                       if (score == INTEL_PPAT_PERFECT_MATCH) {
>>> +                               kref_get(&entry->ref);
>>> +                               return entry;
>>> +                       }
>>> +                       best_score = score;
>>> +               }
>>> +               scanned++;
>>> +       }
>>> +
>>> +       if (scanned == ppat->max_entries) {
>>> +               if (!best_score)
>>> +                       return ERR_PTR(-ENOSPC);
>>> +
>>> +               kref_get(&entry->ref);
>>> +               return entry;
>>> +       }
>>> +
>>> +       i = find_first_zero_bit(ppat->used, ppat->max_entries);
>>> +       entry = __alloc_ppat_entry(ppat, i, value);
>>> +       ppat->update_hw(i915);
>>> +       return entry;
>>> +}
>>> +
>>> +static void release_ppat(struct kref *kref)
>>> +{
>>> +       struct intel_ppat_entry *entry =
>>> +               container_of(kref, struct intel_ppat_entry, ref);
>>> +       struct drm_i915_private *i915 = entry->ppat->i915;
>>> +
>>> +       __free_ppat_entry(entry);
>>> +       entry->ppat->update_hw(i915);
>>> +}
>>> +
>>> +/**
>>> + * intel_ppat_put - put back the PPAT entry got from intel_ppat_get()
>>> + * @entry: an intel PPAT entry
>>> + *
>>> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT index
>>> of the
>>> + * entry is dynamically allocated, its reference count will be
>>> decreased. Once
>>> + * the reference count becomes into zero, the PPAT index becomes free
>>> again.
>>> + */
>>> +void intel_ppat_put(const struct intel_ppat_entry *entry)
>>> +{
>>> +       struct intel_ppat *ppat = entry->ppat;
>>> +       unsigned int index = entry - ppat->entries;
>>> +
>>> +       GEM_BUG_ON(!ppat->max_entries);
>>> +
>>> +       kref_put(&ppat->entries[index].ref, release_ppat);
>>> +}
>>> +
>>> +static void cnl_private_pat_update_hw(struct drm_i915_private *dev_priv)
>>> +{
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       int i;
>>> +
>>> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
>>> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
>>> +               clear_bit(i, ppat->dirty);
>>> +       }
>>> +}
>>> +
>>> +static void bdw_private_pat_update_hw(struct drm_i915_private *dev_priv)
>>> +{
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       u64 pat = 0;
>>> +       int i;
>>> +
>>> +       for (i = 0; i < ppat->max_entries; i++)
>>> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
>>> +
>>> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
>>> +
>>> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
>>> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat));
>>> +}
>>> +
>>> +static unsigned int bdw_private_pat_match(u8 src, u8 dst)
>>> +{
>>> +       unsigned int score = 0;
>>> +
>>> +       /* Cache attribute has to be matched. */
>>> +       if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
>>> +               return 0;
>>> +
>>> +       if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
>>> +               score += 2;
>>> +
>>> +       if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
>>> +               score += 1;
>>> +
>>> +       if (score == 3)
>>> +               return INTEL_PPAT_PERFECT_MATCH;
>>> +
>>> +       return score;
>>> +}
>>> +
>>> +static unsigned int chv_private_pat_match(u8 src, u8 dst)
>>> +{
>>> +       return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
>>> +               INTEL_PPAT_PERFECT_MATCH : 0;
>>> +}
>>> +
>>> +static void cnl_setup_private_ppat(struct intel_ppat *ppat)
>>> +{
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = cnl_private_pat_update_hw;
>>> +       ppat->match = bdw_private_pat_match;
>>> +       ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>>> +
>>>         /* XXX: spec is unclear if this is still needed for CNL+ */
>>> -       if (!USES_PPGTT(dev_priv)) {
>>> -               I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
>>> +       if (!USES_PPGTT(ppat->i915)) {
>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>> GEN8_PPAT_UC);
>>>                 return;
>>>         }
>>>   -     I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0));
>>> -       I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(1));
>>> -       I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(2));
>>> -       I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3));
>>
>>
>> Gen8 allocation style doesn't work on CNL.
>> I don't see where all these registers would be getting properly written.
>>
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to PPAT
>>> */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB |
>>> GEN8_PPAT_LLC);
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT |
>>> GEN8_PPAT_LLCELLC);
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0));
>>>   }
>>>     /* The GGTT and PPGTT need a private PPAT setup in order to handle
>>> cacheability
>>>    * bits. When using advanced contexts each context stores its own PAT,
>>> but
>>>    * writing this data shouldn't be harmful even in those cases. */
>>> -static void bdw_setup_private_ppat(struct drm_i915_private *dev_priv)
>>> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
>>>   {
>>> -       u64 pat;
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>> +       ppat->match = bdw_private_pat_match;
>>> +       ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3);
>>>   -     pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for
>>> normal objects, no eLLC */
>>> -             GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for
>>> something pointing to ptes? */
>>> -             GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for
>>> scanout with eLLC */
>>> -             GEN8_PPAT(3, GEN8_PPAT_UC)                     | /*
>>> Uncached objects, mostly for scanout */
>>> -             GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0)) |
>>> -             GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(1)) |
>>> -             GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(2)) |
>>> -             GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3));
>>> -
>>> -       if (!USES_PPGTT(dev_priv))
>>> +       if (!USES_PPGTT(ppat->i915)) {
>>>                 /* Spec: "For GGTT, there is NO pat_sel[2:0] from the
>>> entry,
>>>                  * so RTL will always use the value corresponding to
>>>                  * pat_sel = 000".
>>> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct
>>> drm_i915_private *dev_priv)
>>>                  * So we can still hold onto all our assumptions wrt cpu
>>>                  * clflushing on LLC machines.
>>>                  */
>>> -               pat = GEN8_PPAT(0, GEN8_PPAT_UC);
>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>> GEN8_PPAT_UC);
>>> +               return;
>>> +       }
>>>   -     /* XXX: spec defines this as 2 distinct registers. It's unclear
>>> if a 64b
>>> -        * write would work. */
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to PPAT
>>> */
>>> +       /* for normal objects, no eLLC */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB |
>>> GEN8_PPAT_LLC);
>>> +       /* for scanout with eLLC */
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, GEN8_PPAT_WT |
>>> GEN8_PPAT_LLCELLC);
>>> +       /* Uncached objects, mostly for scanout */
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB |
>>> GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>>>   }
>>>   -static void chv_setup_private_ppat(struct drm_i915_private *dev_priv)
>>> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
>>>   {
>>> -       u64 pat;
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>> +       ppat->match = chv_private_pat_match;
>>> +       ppat->clear_value = CHV_PPAT_SNOOP;
>>>         /*
>>>          * Map WB on BDW to snooped on CHV.
>>> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct
>>> drm_i915_private *dev_priv)
>>>          * Which means we must set the snoop bit in PAT entry 0
>>>          * in order to keep the global status page working.
>>>          */
>>> -       pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(1, 0) |
>>> -             GEN8_PPAT(2, 0) |
>>> -             GEN8_PPAT(3, 0) |
>>> -             GEN8_PPAT(4, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(5, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(6, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(7, CHV_PPAT_SNOOP);
>>>   -     I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to PPAT
>>> */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
>>>   }
>>>     static void gen6_gmch_remove(struct i915_address_space *vm)
>>> @@ -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct
>>> i915_address_space *vm)
>>>     static void setup_private_pat(struct drm_i915_private *dev_priv)
>>>   {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       int i;
>>> +
>>> +       ppat->i915 = dev_priv;
>>> +
>>>         if (INTEL_GEN(dev_priv) >= 10)
>>> -               cnl_setup_private_ppat(dev_priv);
>>> +               cnl_setup_private_ppat(ppat);
>>>         else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>> -               chv_setup_private_ppat(dev_priv);
>>> +               chv_setup_private_ppat(ppat);
>>>         else
>>> -               bdw_setup_private_ppat(dev_priv);
>>> +               bdw_setup_private_ppat(ppat);
>>> +
>>> +       GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
>>> +
>>> +       for_each_clear_bit(i, ppat->used, ppat->max_entries) {
>>> +               ppat->entries[i].value = ppat->clear_value;
>>> +               ppat->entries[i].ppat = ppat;
>>> +               set_bit(i, ppat->dirty);
>>> +       }
>>> +
>>> +       ppat->update_hw(dev_priv);
>>>   }
>>>     static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>>> @@ -3236,13 +3414,10 @@ void i915_gem_restore_gtt_mappings(struct
>>> drm_i915_private *dev_priv)
>>>         ggtt->base.closed = false;
>>>         if (INTEL_GEN(dev_priv) >= 8) {
>>> -               if (INTEL_GEN(dev_priv) >= 10)
>>> -                       cnl_setup_private_ppat(dev_priv);
>>> -               else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>> -                       chv_setup_private_ppat(dev_priv);
>>> -               else
>>> -                       bdw_setup_private_ppat(dev_priv);
>>> +               struct intel_ppat *ppat = &dev_priv->ppat;
>>>   +             bitmap_set(ppat->dirty, 0, ppat->max_entries);
>>> +               dev_priv->ppat.update_hw(dev_priv);
>>>                 return;
>>>         }
>>>   diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> index b4e3aa7..e10ca89 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
>>>   #define GEN8_PPAT_ELLC_OVERRIDE               (0<<2)
>>>   #define GEN8_PPAT(i, x)                       ((u64)(x) << ((i) * 8))
>>>   +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
>>> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2))
>>> +#define GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4))
>>> +#define CHV_PPAT_GET_SNOOP(x) ((x) & (1 << 6))
>>> +
>>>   struct sg_table;
>>>     struct intel_rotation_info {
>>> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>>>         return container_of(vm, struct i915_ggtt, base);
>>>   }
>>>   +#define INTEL_MAX_PPAT_ENTRIES 8
>>> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
>>> +
>>> +struct intel_ppat;
>>> +
>>> +struct intel_ppat_entry {
>>> +       struct intel_ppat *ppat;
>>> +       struct kref ref;
>>> +       u8 value;
>>> +};
>>> +
>>> +struct intel_ppat {
>>> +       struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
>>> +       DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
>>> +       DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
>>> +       unsigned int max_entries;
>>> +       u8 clear_value;
>>> +       /*
>>> +        * Return a score to show how two PPAT values match,
>>> +        * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
>>> +        */
>>> +       unsigned int (*match)(u8 src, u8 dst);
>>> +       void (*update_hw)(struct drm_i915_private *i915);
>>> +
>>> +       struct drm_i915_private *i915;
>>> +};
>>> +
>>> +const struct intel_ppat_entry *
>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value);
>>> +void intel_ppat_put(const struct intel_ppat_entry *entry);
>>> +
>>>   int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
>>>   void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
>>>
>>
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  5:15       ` Rodrigo Vivi
@ 2017-08-31  5:22         ` Wang, Zhi A
  2017-08-31  5:24           ` Zhi Wang
  2017-08-31 18:40         ` Wang, Zhi A
  1 sibling, 1 reply; 14+ messages in thread
From: Wang, Zhi A @ 2017-08-31  5:22 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev, Vivi, Rodrigo

Thanks for the test! Sorry I didn't have an CNL on my hand.

This is only RFC now. after collecting all the comments, I will start full test. :)

Thanks,
Zhi.

-----Original Message-----
From: Rodrigo Vivi [mailto:rodrigo.vivi@gmail.com] 
Sent: Thursday, August 31, 2017 8:15 AM
To: Wang, Zhi A <zhi.a.wang@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>; intel-gfx@lists.freedesktop.org; Widawsky, Benjamin <benjamin.widawsky@intel.com>; intel-gvt-dev@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFCv6 2/2] drm/i915: Introduce private PAT management

On Wed, Aug 30, 2017 at 9:49 PM, Zhi Wang <zhi.a.wang@intel.com> wrote:
> Hi Vivi:
>     Thanks for the reply! The register are written in ppat->update_hw() now.

oh, I saw now...
I hadden noticed that interation.

But something seems really odd yet...
My CNL with these 2 patches applied hangs on any execution...

>
>
> +static void cnl_private_pat_update_hw(struct drm_i915_private 
> +*dev_priv) {
> +       struct intel_ppat *ppat = &dev_priv->ppat;
> +       int i;
> +
> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
> +               clear_bit(i, ppat->dirty);
> +       }
> +}
> +
> +static void bdw_private_pat_update_hw(struct drm_i915_private 
> +*dev_priv) {
> +       struct intel_ppat *ppat = &dev_priv->ppat;
> +       u64 pat = 0;
> +       int i;
> +
> +       for (i = 0; i < ppat->max_entries; i++)
> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
>
> On 08/31/17 06:00, Vivi, Rodrigo wrote:
>>
>> On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
>>>
>>> The private PAT management is to support PPAT entry manipulation. 
>>> Two APIs are introduced for dynamically managing PPAT entries: 
>>> intel_ppat_get and intel_ppat_put.
>>>
>>> intel_ppat_get will search for an existing PPAT entry which 
>>> perfectly matches the required PPAT value. If not, it will try to 
>>> allocate or return a partially matched PPAT entry if there is any 
>>> available PPAT indexes or not.
>>>
>>> intel_ppat_put will put back the PPAT entry which comes from 
>>> intel_ppat_get. If it's dynamically allocated, the reference count 
>>> will be decreased. If the reference count turns into zero, the PPAT 
>>> index is freed again.
>>>
>>> Besides, another two callbacks are introduced to support the private 
>>> PAT management framework. One is ppat->update_hw(), which writes the 
>>> PPAT configurations in ppat->entries into HW. Another one is 
>>> ppat->match, which will return a score to show how two PPAT values 
>>> match with each other.
>>>
>>> v6:
>>>
>>> - Address all comments from Chris:
>>> http://www.spinics.net/lists/intel-gfx/msg136850.html
>>>
>>> - Address all comments from Joonas:
>>> http://www.spinics.net/lists/intel-gfx/msg136845.html
>>>
>>> v5:
>>>
>>> - Add check and warnnings for those platforms which don't have PPAT.
>>>
>>> v3:
>>>
>>> - Introduce dirty bitmap for PPAT registers. (Chris)
>>> - Change the name of the pointer "dev_priv" to "i915". (Chris)
>>> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. 
>>> (Chris)
>>>
>>> v2:
>>>
>>> - API re-design. (Chris)
>>>
>>> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h     |   2 +
>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 273
>>> +++++++++++++++++++++++++++++-------
>>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
>>>   3 files changed, 262 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>> b/drivers/gpu/drm/i915/i915_drv.h index 7587ef5..5ffde10 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
>>>         DECLARE_HASHTABLE(mm_structs, 7);
>>>         struct mutex mm_lock;
>>>   +     struct intel_ppat ppat;
>>> +
>>>         /* Kernel Modesetting */
>>>         struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> index b74fa9d..3106142 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct 
>>> i915_ggtt *ggtt, u64 size)
>>>         return 0;
>>>   }
>>>   -static void cnl_setup_private_ppat(struct drm_i915_private 
>>> *dev_priv)
>>> +static struct intel_ppat_entry *
>>> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8
>>> value)
>>>   {
>>> +       struct intel_ppat_entry *entry = &ppat->entries[index];
>>> +
>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>> +       GEM_BUG_ON(test_bit(index, ppat->used));
>>> +
>>> +       entry->ppat = ppat;
>>> +       entry->value = value;
>>> +       kref_init(&entry->ref);
>>> +       set_bit(index, ppat->used);
>>> +       set_bit(index, ppat->dirty);
>>> +
>>> +       return entry;
>>> +}
>>> +
>>> +static void __free_ppat_entry(struct intel_ppat_entry *entry) {
>>> +       struct intel_ppat *ppat = entry->ppat;
>>> +       unsigned int index = entry - ppat->entries;
>>> +
>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>> +       GEM_BUG_ON(!test_bit(index, ppat->used));
>>> +
>>> +       entry->value = ppat->clear_value;
>>> +       clear_bit(index, ppat->used);
>>> +       set_bit(index, ppat->dirty); }
>>> +
>>> +/**
>>> + * intel_ppat_get - get a usable PPAT entry
>>> + * @i915: i915 device instance
>>> + * @value: the PPAT value required by the caller
>>> + *
>>> + * The function tries to search if there is an existing PPAT entry 
>>> +which
>>> + * matches with the required value. If perfectly matched, the 
>>> +existing
>>> PPAT
>>> + * entry will be used. If only partially matched, it will try to 
>>> + check
>>> if
>>> + * there is any available PPAT index. If yes, it will allocate a 
>>> + new
>>> PPAT
>>> + * index for the required entry and update the HW. If not, the 
>>> +partially
>>> + * matched entry will be used.
>>> + */
>>> +const struct intel_ppat_entry *
>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value) {
>>> +       struct intel_ppat *ppat = &i915->ppat;
>>> +       struct intel_ppat_entry *entry;
>>> +       unsigned int scanned, best_score;
>>> +       int i;
>>> +
>>> +       GEM_BUG_ON(!ppat->max_entries);
>>> +
>>> +       scanned = best_score = 0;
>>> +
>>> +       for_each_set_bit(i, ppat->used, ppat->max_entries) {
>>> +               unsigned int score;
>>> +
>>> +               entry = &ppat->entries[i];
>>> +               score = ppat->match(entry->value, value);
>>> +               if (score > best_score) {
>>> +                       if (score == INTEL_PPAT_PERFECT_MATCH) {
>>> +                               kref_get(&entry->ref);
>>> +                               return entry;
>>> +                       }
>>> +                       best_score = score;
>>> +               }
>>> +               scanned++;
>>> +       }
>>> +
>>> +       if (scanned == ppat->max_entries) {
>>> +               if (!best_score)
>>> +                       return ERR_PTR(-ENOSPC);
>>> +
>>> +               kref_get(&entry->ref);
>>> +               return entry;
>>> +       }
>>> +
>>> +       i = find_first_zero_bit(ppat->used, ppat->max_entries);
>>> +       entry = __alloc_ppat_entry(ppat, i, value);
>>> +       ppat->update_hw(i915);
>>> +       return entry;
>>> +}
>>> +
>>> +static void release_ppat(struct kref *kref) {
>>> +       struct intel_ppat_entry *entry =
>>> +               container_of(kref, struct intel_ppat_entry, ref);
>>> +       struct drm_i915_private *i915 = entry->ppat->i915;
>>> +
>>> +       __free_ppat_entry(entry);
>>> +       entry->ppat->update_hw(i915); }
>>> +
>>> +/**
>>> + * intel_ppat_put - put back the PPAT entry got from 
>>> +intel_ppat_get()
>>> + * @entry: an intel PPAT entry
>>> + *
>>> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT 
>>> +index
>>> of the
>>> + * entry is dynamically allocated, its reference count will be
>>> decreased. Once
>>> + * the reference count becomes into zero, the PPAT index becomes 
>>> + free
>>> again.
>>> + */
>>> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
>>> +       struct intel_ppat *ppat = entry->ppat;
>>> +       unsigned int index = entry - ppat->entries;
>>> +
>>> +       GEM_BUG_ON(!ppat->max_entries);
>>> +
>>> +       kref_put(&ppat->entries[index].ref, release_ppat); }
>>> +
>>> +static void cnl_private_pat_update_hw(struct drm_i915_private 
>>> +*dev_priv) {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       int i;
>>> +
>>> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
>>> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
>>> +               clear_bit(i, ppat->dirty);
>>> +       }
>>> +}
>>> +
>>> +static void bdw_private_pat_update_hw(struct drm_i915_private 
>>> +*dev_priv) {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       u64 pat = 0;
>>> +       int i;
>>> +
>>> +       for (i = 0; i < ppat->max_entries; i++)
>>> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
>>> +
>>> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
>>> +
>>> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
>>> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
>>> +
>>> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
>>> +       unsigned int score = 0;
>>> +
>>> +       /* Cache attribute has to be matched. */
>>> +       if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
>>> +               return 0;
>>> +
>>> +       if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
>>> +               score += 2;
>>> +
>>> +       if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
>>> +               score += 1;
>>> +
>>> +       if (score == 3)
>>> +               return INTEL_PPAT_PERFECT_MATCH;
>>> +
>>> +       return score;
>>> +}
>>> +
>>> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
>>> +       return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
>>> +               INTEL_PPAT_PERFECT_MATCH : 0; }
>>> +
>>> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = cnl_private_pat_update_hw;
>>> +       ppat->match = bdw_private_pat_match;
>>> +       ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>>> +
>>>         /* XXX: spec is unclear if this is still needed for CNL+ */
>>> -       if (!USES_PPGTT(dev_priv)) {
>>> -               I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
>>> +       if (!USES_PPGTT(ppat->i915)) {
>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>> GEN8_PPAT_UC);
>>>                 return;
>>>         }
>>>   -     I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0));
>>> -       I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(1));
>>> -       I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(2));
>>> -       I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3));
>>
>>
>> Gen8 allocation style doesn't work on CNL.
>> I don't see where all these registers would be getting properly written.
>>
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to 
>>> + PPAT
>>> */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB 
>>> + |
>>> GEN8_PPAT_LLC);
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 
>>> + GEN8_PPAT_WT |
>>> GEN8_PPAT_LLCELLC);
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, 
>>> + GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0));
>>>   }
>>>     /* The GGTT and PPGTT need a private PPAT setup in order to 
>>> handle cacheability
>>>    * bits. When using advanced contexts each context stores its own 
>>> PAT, but
>>>    * writing this data shouldn't be harmful even in those cases. */ 
>>> -static void bdw_setup_private_ppat(struct drm_i915_private 
>>> *dev_priv)
>>> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
>>>   {
>>> -       u64 pat;
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>> +       ppat->match = bdw_private_pat_match;
>>> +       ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3);
>>>   -     pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for
>>> normal objects, no eLLC */
>>> -             GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for
>>> something pointing to ptes? */
>>> -             GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for
>>> scanout with eLLC */
>>> -             GEN8_PPAT(3, GEN8_PPAT_UC)                     | /*
>>> Uncached objects, mostly for scanout */
>>> -             GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0)) |
>>> -             GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(1)) |
>>> -             GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(2)) |
>>> -             GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3));
>>> -
>>> -       if (!USES_PPGTT(dev_priv))
>>> +       if (!USES_PPGTT(ppat->i915)) {
>>>                 /* Spec: "For GGTT, there is NO pat_sel[2:0] from 
>>> the entry,
>>>                  * so RTL will always use the value corresponding to
>>>                  * pat_sel = 000".
>>> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct 
>>> drm_i915_private *dev_priv)
>>>                  * So we can still hold onto all our assumptions wrt cpu
>>>                  * clflushing on LLC machines.
>>>                  */
>>> -               pat = GEN8_PPAT(0, GEN8_PPAT_UC);
>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>> GEN8_PPAT_UC);
>>> +               return;
>>> +       }
>>>   -     /* XXX: spec defines this as 2 distinct registers. It's unclear
>>> if a 64b
>>> -        * write would work. */
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to 
>>> + PPAT
>>> */
>>> +       /* for normal objects, no eLLC */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB 
>>> + |
>>> GEN8_PPAT_LLC);
>>> +       /* for scanout with eLLC */
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 
>>> + GEN8_PPAT_WT |
>>> GEN8_PPAT_LLCELLC);
>>> +       /* Uncached objects, mostly for scanout */
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB |
>>> GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>>>   }
>>>   -static void chv_setup_private_ppat(struct drm_i915_private 
>>> *dev_priv)
>>> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
>>>   {
>>> -       u64 pat;
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>> +       ppat->match = chv_private_pat_match;
>>> +       ppat->clear_value = CHV_PPAT_SNOOP;
>>>         /*
>>>          * Map WB on BDW to snooped on CHV.
>>> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct 
>>> drm_i915_private *dev_priv)
>>>          * Which means we must set the snoop bit in PAT entry 0
>>>          * in order to keep the global status page working.
>>>          */
>>> -       pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(1, 0) |
>>> -             GEN8_PPAT(2, 0) |
>>> -             GEN8_PPAT(3, 0) |
>>> -             GEN8_PPAT(4, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(5, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(6, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(7, CHV_PPAT_SNOOP);
>>>   -     I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to 
>>> + PPAT
>>> */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
>>>   }
>>>     static void gen6_gmch_remove(struct i915_address_space *vm) @@ 
>>> -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct 
>>> i915_address_space *vm)
>>>     static void setup_private_pat(struct drm_i915_private *dev_priv)
>>>   {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       int i;
>>> +
>>> +       ppat->i915 = dev_priv;
>>> +
>>>         if (INTEL_GEN(dev_priv) >= 10)
>>> -               cnl_setup_private_ppat(dev_priv);
>>> +               cnl_setup_private_ppat(ppat);
>>>         else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>> -               chv_setup_private_ppat(dev_priv);
>>> +               chv_setup_private_ppat(ppat);
>>>         else
>>> -               bdw_setup_private_ppat(dev_priv);
>>> +               bdw_setup_private_ppat(ppat);
>>> +
>>> +       GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
>>> +
>>> +       for_each_clear_bit(i, ppat->used, ppat->max_entries) {
>>> +               ppat->entries[i].value = ppat->clear_value;
>>> +               ppat->entries[i].ppat = ppat;
>>> +               set_bit(i, ppat->dirty);
>>> +       }
>>> +
>>> +       ppat->update_hw(dev_priv);
>>>   }
>>>     static int gen8_gmch_probe(struct i915_ggtt *ggtt) @@ -3236,13 
>>> +3414,10 @@ void i915_gem_restore_gtt_mappings(struct
>>> drm_i915_private *dev_priv)
>>>         ggtt->base.closed = false;
>>>         if (INTEL_GEN(dev_priv) >= 8) {
>>> -               if (INTEL_GEN(dev_priv) >= 10)
>>> -                       cnl_setup_private_ppat(dev_priv);
>>> -               else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>> -                       chv_setup_private_ppat(dev_priv);
>>> -               else
>>> -                       bdw_setup_private_ppat(dev_priv);
>>> +               struct intel_ppat *ppat = &dev_priv->ppat;
>>>   +             bitmap_set(ppat->dirty, 0, ppat->max_entries);
>>> +               dev_priv->ppat.update_hw(dev_priv);
>>>                 return;
>>>         }
>>>   diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> index b4e3aa7..e10ca89 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
>>>   #define GEN8_PPAT_ELLC_OVERRIDE               (0<<2)
>>>   #define GEN8_PPAT(i, x)                       ((u64)(x) << ((i) * 8))
>>>   +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
>>> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2)) #define 
>>> +GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4)) #define CHV_PPAT_GET_SNOOP(x) 
>>> +((x) & (1 << 6))
>>> +
>>>   struct sg_table;
>>>     struct intel_rotation_info {
>>> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>>>         return container_of(vm, struct i915_ggtt, base);
>>>   }
>>>   +#define INTEL_MAX_PPAT_ENTRIES 8
>>> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
>>> +
>>> +struct intel_ppat;
>>> +
>>> +struct intel_ppat_entry {
>>> +       struct intel_ppat *ppat;
>>> +       struct kref ref;
>>> +       u8 value;
>>> +};
>>> +
>>> +struct intel_ppat {
>>> +       struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
>>> +       DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
>>> +       DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
>>> +       unsigned int max_entries;
>>> +       u8 clear_value;
>>> +       /*
>>> +        * Return a score to show how two PPAT values match,
>>> +        * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
>>> +        */
>>> +       unsigned int (*match)(u8 src, u8 dst);
>>> +       void (*update_hw)(struct drm_i915_private *i915);
>>> +
>>> +       struct drm_i915_private *i915; };
>>> +
>>> +const struct intel_ppat_entry *
>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value); void 
>>> +intel_ppat_put(const struct intel_ppat_entry *entry);
>>> +
>>>   int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
>>>   void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
>>>
>>
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx



--
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  5:22         ` Wang, Zhi A
@ 2017-08-31  5:24           ` Zhi Wang
  2017-08-31  5:47             ` Vivi, Rodrigo
  0 siblings, 1 reply; 14+ messages in thread
From: Zhi Wang @ 2017-08-31  5:24 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev, Vivi, Rodrigo

Also I should send a register diff at that time (RFC -> PATCH). Suppose 
these two patches should keep the used PPAT registers unchanged like before.


On 08/31/17 13:22, Wang, Zhi A wrote:
> Thanks for the test! Sorry I didn't have an CNL on my hand.
>
> This is only RFC now. after collecting all the comments, I will start full test. :)
>
> Thanks,
> Zhi.
>
> -----Original Message-----
> From: Rodrigo Vivi [mailto:rodrigo.vivi@gmail.com]
> Sent: Thursday, August 31, 2017 8:15 AM
> To: Wang, Zhi A <zhi.a.wang@intel.com>
> Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>; intel-gfx@lists.freedesktop.org; Widawsky, Benjamin <benjamin.widawsky@intel.com>; intel-gvt-dev@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [RFCv6 2/2] drm/i915: Introduce private PAT management
>
> On Wed, Aug 30, 2017 at 9:49 PM, Zhi Wang <zhi.a.wang@intel.com> wrote:
>> Hi Vivi:
>>      Thanks for the reply! The register are written in ppat->update_hw() now.
> oh, I saw now...
> I hadden noticed that interation.
>
> But something seems really odd yet...
> My CNL with these 2 patches applied hangs on any execution...
>
>>
>> +static void cnl_private_pat_update_hw(struct drm_i915_private
>> +*dev_priv) {
>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>> +       int i;
>> +
>> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
>> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
>> +               clear_bit(i, ppat->dirty);
>> +       }
>> +}
>> +
>> +static void bdw_private_pat_update_hw(struct drm_i915_private
>> +*dev_priv) {
>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>> +       u64 pat = 0;
>> +       int i;
>> +
>> +       for (i = 0; i < ppat->max_entries; i++)
>> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
>> +
>> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
>> +
>> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
>> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
>>
>> On 08/31/17 06:00, Vivi, Rodrigo wrote:
>>> On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
>>>> The private PAT management is to support PPAT entry manipulation.
>>>> Two APIs are introduced for dynamically managing PPAT entries:
>>>> intel_ppat_get and intel_ppat_put.
>>>>
>>>> intel_ppat_get will search for an existing PPAT entry which
>>>> perfectly matches the required PPAT value. If not, it will try to
>>>> allocate or return a partially matched PPAT entry if there is any
>>>> available PPAT indexes or not.
>>>>
>>>> intel_ppat_put will put back the PPAT entry which comes from
>>>> intel_ppat_get. If it's dynamically allocated, the reference count
>>>> will be decreased. If the reference count turns into zero, the PPAT
>>>> index is freed again.
>>>>
>>>> Besides, another two callbacks are introduced to support the private
>>>> PAT management framework. One is ppat->update_hw(), which writes the
>>>> PPAT configurations in ppat->entries into HW. Another one is
>>>> ppat->match, which will return a score to show how two PPAT values
>>>> match with each other.
>>>>
>>>> v6:
>>>>
>>>> - Address all comments from Chris:
>>>> http://www.spinics.net/lists/intel-gfx/msg136850.html
>>>>
>>>> - Address all comments from Joonas:
>>>> http://www.spinics.net/lists/intel-gfx/msg136845.html
>>>>
>>>> v5:
>>>>
>>>> - Add check and warnnings for those platforms which don't have PPAT.
>>>>
>>>> v3:
>>>>
>>>> - Introduce dirty bitmap for PPAT registers. (Chris)
>>>> - Change the name of the pointer "dev_priv" to "i915". (Chris)
>>>> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *.
>>>> (Chris)
>>>>
>>>> v2:
>>>>
>>>> - API re-design. (Chris)
>>>>
>>>> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_drv.h     |   2 +
>>>>    drivers/gpu/drm/i915/i915_gem_gtt.c | 273
>>>> +++++++++++++++++++++++++++++-------
>>>>    drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
>>>>    3 files changed, 262 insertions(+), 49 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>> b/drivers/gpu/drm/i915/i915_drv.h index 7587ef5..5ffde10 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
>>>>          DECLARE_HASHTABLE(mm_structs, 7);
>>>>          struct mutex mm_lock;
>>>>    +     struct intel_ppat ppat;
>>>> +
>>>>          /* Kernel Modesetting */
>>>>          struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> index b74fa9d..3106142 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>>> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct
>>>> i915_ggtt *ggtt, u64 size)
>>>>          return 0;
>>>>    }
>>>>    -static void cnl_setup_private_ppat(struct drm_i915_private
>>>> *dev_priv)
>>>> +static struct intel_ppat_entry *
>>>> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8
>>>> value)
>>>>    {
>>>> +       struct intel_ppat_entry *entry = &ppat->entries[index];
>>>> +
>>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>>> +       GEM_BUG_ON(test_bit(index, ppat->used));
>>>> +
>>>> +       entry->ppat = ppat;
>>>> +       entry->value = value;
>>>> +       kref_init(&entry->ref);
>>>> +       set_bit(index, ppat->used);
>>>> +       set_bit(index, ppat->dirty);
>>>> +
>>>> +       return entry;
>>>> +}
>>>> +
>>>> +static void __free_ppat_entry(struct intel_ppat_entry *entry) {
>>>> +       struct intel_ppat *ppat = entry->ppat;
>>>> +       unsigned int index = entry - ppat->entries;
>>>> +
>>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>>> +       GEM_BUG_ON(!test_bit(index, ppat->used));
>>>> +
>>>> +       entry->value = ppat->clear_value;
>>>> +       clear_bit(index, ppat->used);
>>>> +       set_bit(index, ppat->dirty); }
>>>> +
>>>> +/**
>>>> + * intel_ppat_get - get a usable PPAT entry
>>>> + * @i915: i915 device instance
>>>> + * @value: the PPAT value required by the caller
>>>> + *
>>>> + * The function tries to search if there is an existing PPAT entry
>>>> +which
>>>> + * matches with the required value. If perfectly matched, the
>>>> +existing
>>>> PPAT
>>>> + * entry will be used. If only partially matched, it will try to
>>>> + check
>>>> if
>>>> + * there is any available PPAT index. If yes, it will allocate a
>>>> + new
>>>> PPAT
>>>> + * index for the required entry and update the HW. If not, the
>>>> +partially
>>>> + * matched entry will be used.
>>>> + */
>>>> +const struct intel_ppat_entry *
>>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value) {
>>>> +       struct intel_ppat *ppat = &i915->ppat;
>>>> +       struct intel_ppat_entry *entry;
>>>> +       unsigned int scanned, best_score;
>>>> +       int i;
>>>> +
>>>> +       GEM_BUG_ON(!ppat->max_entries);
>>>> +
>>>> +       scanned = best_score = 0;
>>>> +
>>>> +       for_each_set_bit(i, ppat->used, ppat->max_entries) {
>>>> +               unsigned int score;
>>>> +
>>>> +               entry = &ppat->entries[i];
>>>> +               score = ppat->match(entry->value, value);
>>>> +               if (score > best_score) {
>>>> +                       if (score == INTEL_PPAT_PERFECT_MATCH) {
>>>> +                               kref_get(&entry->ref);
>>>> +                               return entry;
>>>> +                       }
>>>> +                       best_score = score;
>>>> +               }
>>>> +               scanned++;
>>>> +       }
>>>> +
>>>> +       if (scanned == ppat->max_entries) {
>>>> +               if (!best_score)
>>>> +                       return ERR_PTR(-ENOSPC);
>>>> +
>>>> +               kref_get(&entry->ref);
>>>> +               return entry;
>>>> +       }
>>>> +
>>>> +       i = find_first_zero_bit(ppat->used, ppat->max_entries);
>>>> +       entry = __alloc_ppat_entry(ppat, i, value);
>>>> +       ppat->update_hw(i915);
>>>> +       return entry;
>>>> +}
>>>> +
>>>> +static void release_ppat(struct kref *kref) {
>>>> +       struct intel_ppat_entry *entry =
>>>> +               container_of(kref, struct intel_ppat_entry, ref);
>>>> +       struct drm_i915_private *i915 = entry->ppat->i915;
>>>> +
>>>> +       __free_ppat_entry(entry);
>>>> +       entry->ppat->update_hw(i915); }
>>>> +
>>>> +/**
>>>> + * intel_ppat_put - put back the PPAT entry got from
>>>> +intel_ppat_get()
>>>> + * @entry: an intel PPAT entry
>>>> + *
>>>> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT
>>>> +index
>>>> of the
>>>> + * entry is dynamically allocated, its reference count will be
>>>> decreased. Once
>>>> + * the reference count becomes into zero, the PPAT index becomes
>>>> + free
>>>> again.
>>>> + */
>>>> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
>>>> +       struct intel_ppat *ppat = entry->ppat;
>>>> +       unsigned int index = entry - ppat->entries;
>>>> +
>>>> +       GEM_BUG_ON(!ppat->max_entries);
>>>> +
>>>> +       kref_put(&ppat->entries[index].ref, release_ppat); }
>>>> +
>>>> +static void cnl_private_pat_update_hw(struct drm_i915_private
>>>> +*dev_priv) {
>>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>>> +       int i;
>>>> +
>>>> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
>>>> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
>>>> +               clear_bit(i, ppat->dirty);
>>>> +       }
>>>> +}
>>>> +
>>>> +static void bdw_private_pat_update_hw(struct drm_i915_private
>>>> +*dev_priv) {
>>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>>> +       u64 pat = 0;
>>>> +       int i;
>>>> +
>>>> +       for (i = 0; i < ppat->max_entries; i++)
>>>> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
>>>> +
>>>> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
>>>> +
>>>> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
>>>> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
>>>> +
>>>> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
>>>> +       unsigned int score = 0;
>>>> +
>>>> +       /* Cache attribute has to be matched. */
>>>> +       if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
>>>> +               return 0;
>>>> +
>>>> +       if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
>>>> +               score += 2;
>>>> +
>>>> +       if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
>>>> +               score += 1;
>>>> +
>>>> +       if (score == 3)
>>>> +               return INTEL_PPAT_PERFECT_MATCH;
>>>> +
>>>> +       return score;
>>>> +}
>>>> +
>>>> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
>>>> +       return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
>>>> +               INTEL_PPAT_PERFECT_MATCH : 0; }
>>>> +
>>>> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
>>>> +       ppat->max_entries = 8;
>>>> +       ppat->update_hw = cnl_private_pat_update_hw;
>>>> +       ppat->match = bdw_private_pat_match;
>>>> +       ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>>>> +
>>>>          /* XXX: spec is unclear if this is still needed for CNL+ */
>>>> -       if (!USES_PPGTT(dev_priv)) {
>>>> -               I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
>>>> +       if (!USES_PPGTT(ppat->i915)) {
>>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>>> GEN8_PPAT_UC);
>>>>                  return;
>>>>          }
>>>>    -     I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
>>>> -       I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
>>>> -       I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>>>> -       I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
>>>> -       I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(0));
>>>> -       I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(1));
>>>> -       I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(2));
>>>> -       I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(3));
>>>
>>> Gen8 allocation style doesn't work on CNL.
>>> I don't see where all these registers would be getting properly written.
>>>
>>>> +       /* See gen8_pte_encode() for the mapping from cache-level to
>>>> + PPAT
>>>> */
>>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB
>>>> + |
>>>> GEN8_PPAT_LLC);
>>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX,
>>>> + GEN8_PPAT_WT |
>>>> GEN8_PPAT_LLCELLC);
>>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX,
>>>> + GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(0));
>>>>    }
>>>>      /* The GGTT and PPGTT need a private PPAT setup in order to
>>>> handle cacheability
>>>>     * bits. When using advanced contexts each context stores its own
>>>> PAT, but
>>>>     * writing this data shouldn't be harmful even in those cases. */
>>>> -static void bdw_setup_private_ppat(struct drm_i915_private
>>>> *dev_priv)
>>>> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
>>>>    {
>>>> -       u64 pat;
>>>> +       ppat->max_entries = 8;
>>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>>> +       ppat->match = bdw_private_pat_match;
>>>> +       ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(3);
>>>>    -     pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for
>>>> normal objects, no eLLC */
>>>> -             GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for
>>>> something pointing to ptes? */
>>>> -             GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for
>>>> scanout with eLLC */
>>>> -             GEN8_PPAT(3, GEN8_PPAT_UC)                     | /*
>>>> Uncached objects, mostly for scanout */
>>>> -             GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(0)) |
>>>> -             GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(1)) |
>>>> -             GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(2)) |
>>>> -             GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>>> GEN8_PPAT_AGE(3));
>>>> -
>>>> -       if (!USES_PPGTT(dev_priv))
>>>> +       if (!USES_PPGTT(ppat->i915)) {
>>>>                  /* Spec: "For GGTT, there is NO pat_sel[2:0] from
>>>> the entry,
>>>>                   * so RTL will always use the value corresponding to
>>>>                   * pat_sel = 000".
>>>> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct
>>>> drm_i915_private *dev_priv)
>>>>                   * So we can still hold onto all our assumptions wrt cpu
>>>>                   * clflushing on LLC machines.
>>>>                   */
>>>> -               pat = GEN8_PPAT(0, GEN8_PPAT_UC);
>>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>>> GEN8_PPAT_UC);
>>>> +               return;
>>>> +       }
>>>>    -     /* XXX: spec defines this as 2 distinct registers. It's unclear
>>>> if a 64b
>>>> -        * write would work. */
>>>> -       I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>>> +       /* See gen8_pte_encode() for the mapping from cache-level to
>>>> + PPAT
>>>> */
>>>> +       /* for normal objects, no eLLC */
>>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB
>>>> + |
>>>> GEN8_PPAT_LLC);
>>>> +       /* for scanout with eLLC */
>>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX,
>>>> + GEN8_PPAT_WT |
>>>> GEN8_PPAT_LLCELLC);
>>>> +       /* Uncached objects, mostly for scanout */
>>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB |
>>>> GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>>>>    }
>>>>    -static void chv_setup_private_ppat(struct drm_i915_private
>>>> *dev_priv)
>>>> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
>>>>    {
>>>> -       u64 pat;
>>>> +       ppat->max_entries = 8;
>>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>>> +       ppat->match = chv_private_pat_match;
>>>> +       ppat->clear_value = CHV_PPAT_SNOOP;
>>>>          /*
>>>>           * Map WB on BDW to snooped on CHV.
>>>> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct
>>>> drm_i915_private *dev_priv)
>>>>           * Which means we must set the snoop bit in PAT entry 0
>>>>           * in order to keep the global status page working.
>>>>           */
>>>> -       pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
>>>> -             GEN8_PPAT(1, 0) |
>>>> -             GEN8_PPAT(2, 0) |
>>>> -             GEN8_PPAT(3, 0) |
>>>> -             GEN8_PPAT(4, CHV_PPAT_SNOOP) |
>>>> -             GEN8_PPAT(5, CHV_PPAT_SNOOP) |
>>>> -             GEN8_PPAT(6, CHV_PPAT_SNOOP) |
>>>> -             GEN8_PPAT(7, CHV_PPAT_SNOOP);
>>>>    -     I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>>> +       /* See gen8_pte_encode() for the mapping from cache-level to
>>>> + PPAT
>>>> */
>>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
>>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
>>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
>>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
>>>>    }
>>>>      static void gen6_gmch_remove(struct i915_address_space *vm) @@
>>>> -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct
>>>> i915_address_space *vm)
>>>>      static void setup_private_pat(struct drm_i915_private *dev_priv)
>>>>    {
>>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>>> +       int i;
>>>> +
>>>> +       ppat->i915 = dev_priv;
>>>> +
>>>>          if (INTEL_GEN(dev_priv) >= 10)
>>>> -               cnl_setup_private_ppat(dev_priv);
>>>> +               cnl_setup_private_ppat(ppat);
>>>>          else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>>> -               chv_setup_private_ppat(dev_priv);
>>>> +               chv_setup_private_ppat(ppat);
>>>>          else
>>>> -               bdw_setup_private_ppat(dev_priv);
>>>> +               bdw_setup_private_ppat(ppat);
>>>> +
>>>> +       GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
>>>> +
>>>> +       for_each_clear_bit(i, ppat->used, ppat->max_entries) {
>>>> +               ppat->entries[i].value = ppat->clear_value;
>>>> +               ppat->entries[i].ppat = ppat;
>>>> +               set_bit(i, ppat->dirty);
>>>> +       }
>>>> +
>>>> +       ppat->update_hw(dev_priv);
>>>>    }
>>>>      static int gen8_gmch_probe(struct i915_ggtt *ggtt) @@ -3236,13
>>>> +3414,10 @@ void i915_gem_restore_gtt_mappings(struct
>>>> drm_i915_private *dev_priv)
>>>>          ggtt->base.closed = false;
>>>>          if (INTEL_GEN(dev_priv) >= 8) {
>>>> -               if (INTEL_GEN(dev_priv) >= 10)
>>>> -                       cnl_setup_private_ppat(dev_priv);
>>>> -               else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>>> -                       chv_setup_private_ppat(dev_priv);
>>>> -               else
>>>> -                       bdw_setup_private_ppat(dev_priv);
>>>> +               struct intel_ppat *ppat = &dev_priv->ppat;
>>>>    +             bitmap_set(ppat->dirty, 0, ppat->max_entries);
>>>> +               dev_priv->ppat.update_hw(dev_priv);
>>>>                  return;
>>>>          }
>>>>    diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>>> index b4e3aa7..e10ca89 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>>> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
>>>>    #define GEN8_PPAT_ELLC_OVERRIDE               (0<<2)
>>>>    #define GEN8_PPAT(i, x)                       ((u64)(x) << ((i) * 8))
>>>>    +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
>>>> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2)) #define
>>>> +GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4)) #define CHV_PPAT_GET_SNOOP(x)
>>>> +((x) & (1 << 6))
>>>> +
>>>>    struct sg_table;
>>>>      struct intel_rotation_info {
>>>> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>>>>          return container_of(vm, struct i915_ggtt, base);
>>>>    }
>>>>    +#define INTEL_MAX_PPAT_ENTRIES 8
>>>> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
>>>> +
>>>> +struct intel_ppat;
>>>> +
>>>> +struct intel_ppat_entry {
>>>> +       struct intel_ppat *ppat;
>>>> +       struct kref ref;
>>>> +       u8 value;
>>>> +};
>>>> +
>>>> +struct intel_ppat {
>>>> +       struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
>>>> +       DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
>>>> +       DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
>>>> +       unsigned int max_entries;
>>>> +       u8 clear_value;
>>>> +       /*
>>>> +        * Return a score to show how two PPAT values match,
>>>> +        * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
>>>> +        */
>>>> +       unsigned int (*match)(u8 src, u8 dst);
>>>> +       void (*update_hw)(struct drm_i915_private *i915);
>>>> +
>>>> +       struct drm_i915_private *i915; };
>>>> +
>>>> +const struct intel_ppat_entry *
>>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value); void
>>>> +intel_ppat_put(const struct intel_ppat_entry *entry);
>>>> +
>>>>    int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
>>>>    void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
>>>>
>>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
> --
> Rodrigo Vivi
> Blog: http://blog.vivi.eng.br

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  5:24           ` Zhi Wang
@ 2017-08-31  5:47             ` Vivi, Rodrigo
  0 siblings, 0 replies; 14+ messages in thread
From: Vivi, Rodrigo @ 2017-08-31  5:47 UTC (permalink / raw)
  To: Wang, Zhi A; +Cc: intel-gfx, intel-gvt-dev, Widawsky, Benjamin

On Thu, 2017-08-31 at 13:24 +0800, Zhi Wang wrote:
> Also I should send a register diff at that time (RFC -> PATCH). Suppose 
> these two patches should keep the used PPAT registers unchanged like before.

I knew you were going to ask this and logs :)
I will check tomorrow carefully since right now I don't have access to
the machine.

Sorry about that.

> 
> On 08/31/17 13:22, Wang, Zhi A wrote:
> > Thanks for the test! Sorry I didn't have an CNL on my hand.
> >
> > This is only RFC now. after collecting all the comments, I will start full test. :)
> >
> > Thanks,
> > Zhi.
> >
> > -----Original Message-----
> > From: Rodrigo Vivi [mailto:rodrigo.vivi@gmail.com]
> > Sent: Thursday, August 31, 2017 8:15 AM
> > To: Wang, Zhi A <zhi.a.wang@intel.com>
> > Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>; intel-gfx@lists.freedesktop.org; Widawsky, Benjamin <benjamin.widawsky@intel.com>; intel-gvt-dev@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [RFCv6 2/2] drm/i915: Introduce private PAT management
> >
> > On Wed, Aug 30, 2017 at 9:49 PM, Zhi Wang <zhi.a.wang@intel.com> wrote:
> >> Hi Vivi:
> >>      Thanks for the reply! The register are written in ppat->update_hw() now.
> > oh, I saw now...
> > I hadden noticed that interation.
> >
> > But something seems really odd yet...
> > My CNL with these 2 patches applied hangs on any execution...
> >
> >>
> >> +static void cnl_private_pat_update_hw(struct drm_i915_private
> >> +*dev_priv) {
> >> +       struct intel_ppat *ppat = &dev_priv->ppat;
> >> +       int i;
> >> +
> >> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> >> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
> >> +               clear_bit(i, ppat->dirty);
> >> +       }
> >> +}
> >> +
> >> +static void bdw_private_pat_update_hw(struct drm_i915_private
> >> +*dev_priv) {
> >> +       struct intel_ppat *ppat = &dev_priv->ppat;
> >> +       u64 pat = 0;
> >> +       int i;
> >> +
> >> +       for (i = 0; i < ppat->max_entries; i++)
> >> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
> >> +
> >> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> >> +
> >> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
> >> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
> >>
> >> On 08/31/17 06:00, Vivi, Rodrigo wrote:
> >>> On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
> >>>> The private PAT management is to support PPAT entry manipulation.
> >>>> Two APIs are introduced for dynamically managing PPAT entries:
> >>>> intel_ppat_get and intel_ppat_put.
> >>>>
> >>>> intel_ppat_get will search for an existing PPAT entry which
> >>>> perfectly matches the required PPAT value. If not, it will try to
> >>>> allocate or return a partially matched PPAT entry if there is any
> >>>> available PPAT indexes or not.
> >>>>
> >>>> intel_ppat_put will put back the PPAT entry which comes from
> >>>> intel_ppat_get. If it's dynamically allocated, the reference count
> >>>> will be decreased. If the reference count turns into zero, the PPAT
> >>>> index is freed again.
> >>>>
> >>>> Besides, another two callbacks are introduced to support the private
> >>>> PAT management framework. One is ppat->update_hw(), which writes the
> >>>> PPAT configurations in ppat->entries into HW. Another one is
> >>>> ppat->match, which will return a score to show how two PPAT values
> >>>> match with each other.
> >>>>
> >>>> v6:
> >>>>
> >>>> - Address all comments from Chris:
> >>>> http://www.spinics.net/lists/intel-gfx/msg136850.html
> >>>>
> >>>> - Address all comments from Joonas:
> >>>> http://www.spinics.net/lists/intel-gfx/msg136845.html
> >>>>
> >>>> v5:
> >>>>
> >>>> - Add check and warnnings for those platforms which don't have PPAT.
> >>>>
> >>>> v3:
> >>>>
> >>>> - Introduce dirty bitmap for PPAT registers. (Chris)
> >>>> - Change the name of the pointer "dev_priv" to "i915". (Chris)
> >>>> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *.
> >>>> (Chris)
> >>>>
> >>>> v2:
> >>>>
> >>>> - API re-design. (Chris)
> >>>>
> >>>> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> >>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/i915_drv.h     |   2 +
> >>>>    drivers/gpu/drm/i915/i915_gem_gtt.c | 273
> >>>> +++++++++++++++++++++++++++++-------
> >>>>    drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
> >>>>    3 files changed, 262 insertions(+), 49 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>>> b/drivers/gpu/drm/i915/i915_drv.h index 7587ef5..5ffde10 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_drv.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
> >>>>          DECLARE_HASHTABLE(mm_structs, 7);
> >>>>          struct mutex mm_lock;
> >>>>    +     struct intel_ppat ppat;
> >>>> +
> >>>>          /* Kernel Modesetting */
> >>>>          struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>> index b74fa9d..3106142 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >>>> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct
> >>>> i915_ggtt *ggtt, u64 size)
> >>>>          return 0;
> >>>>    }
> >>>>    -static void cnl_setup_private_ppat(struct drm_i915_private
> >>>> *dev_priv)
> >>>> +static struct intel_ppat_entry *
> >>>> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8
> >>>> value)
> >>>>    {
> >>>> +       struct intel_ppat_entry *entry = &ppat->entries[index];
> >>>> +
> >>>> +       GEM_BUG_ON(index >= ppat->max_entries);
> >>>> +       GEM_BUG_ON(test_bit(index, ppat->used));
> >>>> +
> >>>> +       entry->ppat = ppat;
> >>>> +       entry->value = value;
> >>>> +       kref_init(&entry->ref);
> >>>> +       set_bit(index, ppat->used);
> >>>> +       set_bit(index, ppat->dirty);
> >>>> +
> >>>> +       return entry;
> >>>> +}
> >>>> +
> >>>> +static void __free_ppat_entry(struct intel_ppat_entry *entry) {
> >>>> +       struct intel_ppat *ppat = entry->ppat;
> >>>> +       unsigned int index = entry - ppat->entries;
> >>>> +
> >>>> +       GEM_BUG_ON(index >= ppat->max_entries);
> >>>> +       GEM_BUG_ON(!test_bit(index, ppat->used));
> >>>> +
> >>>> +       entry->value = ppat->clear_value;
> >>>> +       clear_bit(index, ppat->used);
> >>>> +       set_bit(index, ppat->dirty); }
> >>>> +
> >>>> +/**
> >>>> + * intel_ppat_get - get a usable PPAT entry
> >>>> + * @i915: i915 device instance
> >>>> + * @value: the PPAT value required by the caller
> >>>> + *
> >>>> + * The function tries to search if there is an existing PPAT entry
> >>>> +which
> >>>> + * matches with the required value. If perfectly matched, the
> >>>> +existing
> >>>> PPAT
> >>>> + * entry will be used. If only partially matched, it will try to
> >>>> + check
> >>>> if
> >>>> + * there is any available PPAT index. If yes, it will allocate a
> >>>> + new
> >>>> PPAT
> >>>> + * index for the required entry and update the HW. If not, the
> >>>> +partially
> >>>> + * matched entry will be used.
> >>>> + */
> >>>> +const struct intel_ppat_entry *
> >>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value) {
> >>>> +       struct intel_ppat *ppat = &i915->ppat;
> >>>> +       struct intel_ppat_entry *entry;
> >>>> +       unsigned int scanned, best_score;
> >>>> +       int i;
> >>>> +
> >>>> +       GEM_BUG_ON(!ppat->max_entries);
> >>>> +
> >>>> +       scanned = best_score = 0;
> >>>> +
> >>>> +       for_each_set_bit(i, ppat->used, ppat->max_entries) {
> >>>> +               unsigned int score;
> >>>> +
> >>>> +               entry = &ppat->entries[i];
> >>>> +               score = ppat->match(entry->value, value);
> >>>> +               if (score > best_score) {
> >>>> +                       if (score == INTEL_PPAT_PERFECT_MATCH) {
> >>>> +                               kref_get(&entry->ref);
> >>>> +                               return entry;
> >>>> +                       }
> >>>> +                       best_score = score;
> >>>> +               }
> >>>> +               scanned++;
> >>>> +       }
> >>>> +
> >>>> +       if (scanned == ppat->max_entries) {
> >>>> +               if (!best_score)
> >>>> +                       return ERR_PTR(-ENOSPC);
> >>>> +
> >>>> +               kref_get(&entry->ref);
> >>>> +               return entry;
> >>>> +       }
> >>>> +
> >>>> +       i = find_first_zero_bit(ppat->used, ppat->max_entries);
> >>>> +       entry = __alloc_ppat_entry(ppat, i, value);
> >>>> +       ppat->update_hw(i915);
> >>>> +       return entry;
> >>>> +}
> >>>> +
> >>>> +static void release_ppat(struct kref *kref) {
> >>>> +       struct intel_ppat_entry *entry =
> >>>> +               container_of(kref, struct intel_ppat_entry, ref);
> >>>> +       struct drm_i915_private *i915 = entry->ppat->i915;
> >>>> +
> >>>> +       __free_ppat_entry(entry);
> >>>> +       entry->ppat->update_hw(i915); }
> >>>> +
> >>>> +/**
> >>>> + * intel_ppat_put - put back the PPAT entry got from
> >>>> +intel_ppat_get()
> >>>> + * @entry: an intel PPAT entry
> >>>> + *
> >>>> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT
> >>>> +index
> >>>> of the
> >>>> + * entry is dynamically allocated, its reference count will be
> >>>> decreased. Once
> >>>> + * the reference count becomes into zero, the PPAT index becomes
> >>>> + free
> >>>> again.
> >>>> + */
> >>>> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
> >>>> +       struct intel_ppat *ppat = entry->ppat;
> >>>> +       unsigned int index = entry - ppat->entries;
> >>>> +
> >>>> +       GEM_BUG_ON(!ppat->max_entries);
> >>>> +
> >>>> +       kref_put(&ppat->entries[index].ref, release_ppat); }
> >>>> +
> >>>> +static void cnl_private_pat_update_hw(struct drm_i915_private
> >>>> +*dev_priv) {
> >>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
> >>>> +       int i;
> >>>> +
> >>>> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> >>>> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
> >>>> +               clear_bit(i, ppat->dirty);
> >>>> +       }
> >>>> +}
> >>>> +
> >>>> +static void bdw_private_pat_update_hw(struct drm_i915_private
> >>>> +*dev_priv) {
> >>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
> >>>> +       u64 pat = 0;
> >>>> +       int i;
> >>>> +
> >>>> +       for (i = 0; i < ppat->max_entries; i++)
> >>>> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
> >>>> +
> >>>> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> >>>> +
> >>>> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
> >>>> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
> >>>> +
> >>>> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
> >>>> +       unsigned int score = 0;
> >>>> +
> >>>> +       /* Cache attribute has to be matched. */
> >>>> +       if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
> >>>> +               return 0;
> >>>> +
> >>>> +       if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
> >>>> +               score += 2;
> >>>> +
> >>>> +       if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
> >>>> +               score += 1;
> >>>> +
> >>>> +       if (score == 3)
> >>>> +               return INTEL_PPAT_PERFECT_MATCH;
> >>>> +
> >>>> +       return score;
> >>>> +}
> >>>> +
> >>>> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
> >>>> +       return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
> >>>> +               INTEL_PPAT_PERFECT_MATCH : 0; }
> >>>> +
> >>>> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
> >>>> +       ppat->max_entries = 8;
> >>>> +       ppat->update_hw = cnl_private_pat_update_hw;
> >>>> +       ppat->match = bdw_private_pat_match;
> >>>> +       ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
> >>>> +
> >>>>          /* XXX: spec is unclear if this is still needed for CNL+ */
> >>>> -       if (!USES_PPGTT(dev_priv)) {
> >>>> -               I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
> >>>> +       if (!USES_PPGTT(ppat->i915)) {
> >>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
> >>>> GEN8_PPAT_UC);
> >>>>                  return;
> >>>>          }
> >>>>    -     I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(0));
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(1));
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(2));
> >>>> -       I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(3));
> >>>
> >>> Gen8 allocation style doesn't work on CNL.
> >>> I don't see where all these registers would be getting properly written.
> >>>
> >>>> +       /* See gen8_pte_encode() for the mapping from cache-level to
> >>>> + PPAT
> >>>> */
> >>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB
> >>>> + |
> >>>> GEN8_PPAT_LLC);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX,
> >>>> + GEN8_PPAT_WT |
> >>>> GEN8_PPAT_LLCELLC);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX,
> >>>> + GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(0));
> >>>>    }
> >>>>      /* The GGTT and PPGTT need a private PPAT setup in order to
> >>>> handle cacheability
> >>>>     * bits. When using advanced contexts each context stores its own
> >>>> PAT, but
> >>>>     * writing this data shouldn't be harmful even in those cases. */
> >>>> -static void bdw_setup_private_ppat(struct drm_i915_private
> >>>> *dev_priv)
> >>>> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
> >>>>    {
> >>>> -       u64 pat;
> >>>> +       ppat->max_entries = 8;
> >>>> +       ppat->update_hw = bdw_private_pat_update_hw;
> >>>> +       ppat->match = bdw_private_pat_match;
> >>>> +       ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(3);
> >>>>    -     pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for
> >>>> normal objects, no eLLC */
> >>>> -             GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for
> >>>> something pointing to ptes? */
> >>>> -             GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for
> >>>> scanout with eLLC */
> >>>> -             GEN8_PPAT(3, GEN8_PPAT_UC)                     | /*
> >>>> Uncached objects, mostly for scanout */
> >>>> -             GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(0)) |
> >>>> -             GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(1)) |
> >>>> -             GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(2)) |
> >>>> -             GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
> >>>> GEN8_PPAT_AGE(3));
> >>>> -
> >>>> -       if (!USES_PPGTT(dev_priv))
> >>>> +       if (!USES_PPGTT(ppat->i915)) {
> >>>>                  /* Spec: "For GGTT, there is NO pat_sel[2:0] from
> >>>> the entry,
> >>>>                   * so RTL will always use the value corresponding to
> >>>>                   * pat_sel = 000".
> >>>> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct
> >>>> drm_i915_private *dev_priv)
> >>>>                   * So we can still hold onto all our assumptions wrt cpu
> >>>>                   * clflushing on LLC machines.
> >>>>                   */
> >>>> -               pat = GEN8_PPAT(0, GEN8_PPAT_UC);
> >>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
> >>>> GEN8_PPAT_UC);
> >>>> +               return;
> >>>> +       }
> >>>>    -     /* XXX: spec defines this as 2 distinct registers. It's unclear
> >>>> if a 64b
> >>>> -        * write would work. */
> >>>> -       I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> >>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
> >>>> +       /* See gen8_pte_encode() for the mapping from cache-level to
> >>>> + PPAT
> >>>> */
> >>>> +       /* for normal objects, no eLLC */
> >>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB
> >>>> + |
> >>>> GEN8_PPAT_LLC);
> >>>> +       /* for scanout with eLLC */
> >>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX,
> >>>> + GEN8_PPAT_WT |
> >>>> GEN8_PPAT_LLCELLC);
> >>>> +       /* Uncached objects, mostly for scanout */
> >>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB |
> >>>> GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
> >>>>    }
> >>>>    -static void chv_setup_private_ppat(struct drm_i915_private
> >>>> *dev_priv)
> >>>> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
> >>>>    {
> >>>> -       u64 pat;
> >>>> +       ppat->max_entries = 8;
> >>>> +       ppat->update_hw = bdw_private_pat_update_hw;
> >>>> +       ppat->match = chv_private_pat_match;
> >>>> +       ppat->clear_value = CHV_PPAT_SNOOP;
> >>>>          /*
> >>>>           * Map WB on BDW to snooped on CHV.
> >>>> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct
> >>>> drm_i915_private *dev_priv)
> >>>>           * Which means we must set the snoop bit in PAT entry 0
> >>>>           * in order to keep the global status page working.
> >>>>           */
> >>>> -       pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
> >>>> -             GEN8_PPAT(1, 0) |
> >>>> -             GEN8_PPAT(2, 0) |
> >>>> -             GEN8_PPAT(3, 0) |
> >>>> -             GEN8_PPAT(4, CHV_PPAT_SNOOP) |
> >>>> -             GEN8_PPAT(5, CHV_PPAT_SNOOP) |
> >>>> -             GEN8_PPAT(6, CHV_PPAT_SNOOP) |
> >>>> -             GEN8_PPAT(7, CHV_PPAT_SNOOP);
> >>>>    -     I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
> >>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
> >>>> +       /* See gen8_pte_encode() for the mapping from cache-level to
> >>>> + PPAT
> >>>> */
> >>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
> >>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
> >>>>    }
> >>>>      static void gen6_gmch_remove(struct i915_address_space *vm) @@
> >>>> -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct
> >>>> i915_address_space *vm)
> >>>>      static void setup_private_pat(struct drm_i915_private *dev_priv)
> >>>>    {
> >>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
> >>>> +       int i;
> >>>> +
> >>>> +       ppat->i915 = dev_priv;
> >>>> +
> >>>>          if (INTEL_GEN(dev_priv) >= 10)
> >>>> -               cnl_setup_private_ppat(dev_priv);
> >>>> +               cnl_setup_private_ppat(ppat);
> >>>>          else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> >>>> -               chv_setup_private_ppat(dev_priv);
> >>>> +               chv_setup_private_ppat(ppat);
> >>>>          else
> >>>> -               bdw_setup_private_ppat(dev_priv);
> >>>> +               bdw_setup_private_ppat(ppat);
> >>>> +
> >>>> +       GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
> >>>> +
> >>>> +       for_each_clear_bit(i, ppat->used, ppat->max_entries) {
> >>>> +               ppat->entries[i].value = ppat->clear_value;
> >>>> +               ppat->entries[i].ppat = ppat;
> >>>> +               set_bit(i, ppat->dirty);
> >>>> +       }
> >>>> +
> >>>> +       ppat->update_hw(dev_priv);
> >>>>    }
> >>>>      static int gen8_gmch_probe(struct i915_ggtt *ggtt) @@ -3236,13
> >>>> +3414,10 @@ void i915_gem_restore_gtt_mappings(struct
> >>>> drm_i915_private *dev_priv)
> >>>>          ggtt->base.closed = false;
> >>>>          if (INTEL_GEN(dev_priv) >= 8) {
> >>>> -               if (INTEL_GEN(dev_priv) >= 10)
> >>>> -                       cnl_setup_private_ppat(dev_priv);
> >>>> -               else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
> >>>> -                       chv_setup_private_ppat(dev_priv);
> >>>> -               else
> >>>> -                       bdw_setup_private_ppat(dev_priv);
> >>>> +               struct intel_ppat *ppat = &dev_priv->ppat;
> >>>>    +             bitmap_set(ppat->dirty, 0, ppat->max_entries);
> >>>> +               dev_priv->ppat.update_hw(dev_priv);
> >>>>                  return;
> >>>>          }
> >>>>    diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
> >>>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
> >>>> index b4e3aa7..e10ca89 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> >>>> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
> >>>>    #define GEN8_PPAT_ELLC_OVERRIDE               (0<<2)
> >>>>    #define GEN8_PPAT(i, x)                       ((u64)(x) << ((i) * 8))
> >>>>    +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
> >>>> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2)) #define
> >>>> +GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4)) #define CHV_PPAT_GET_SNOOP(x)
> >>>> +((x) & (1 << 6))
> >>>> +
> >>>>    struct sg_table;
> >>>>      struct intel_rotation_info {
> >>>> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
> >>>>          return container_of(vm, struct i915_ggtt, base);
> >>>>    }
> >>>>    +#define INTEL_MAX_PPAT_ENTRIES 8
> >>>> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
> >>>> +
> >>>> +struct intel_ppat;
> >>>> +
> >>>> +struct intel_ppat_entry {
> >>>> +       struct intel_ppat *ppat;
> >>>> +       struct kref ref;
> >>>> +       u8 value;
> >>>> +};
> >>>> +
> >>>> +struct intel_ppat {
> >>>> +       struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
> >>>> +       DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
> >>>> +       DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
> >>>> +       unsigned int max_entries;
> >>>> +       u8 clear_value;
> >>>> +       /*
> >>>> +        * Return a score to show how two PPAT values match,
> >>>> +        * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
> >>>> +        */
> >>>> +       unsigned int (*match)(u8 src, u8 dst);
> >>>> +       void (*update_hw)(struct drm_i915_private *i915);
> >>>> +
> >>>> +       struct drm_i915_private *i915; };
> >>>> +
> >>>> +const struct intel_ppat_entry *
> >>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value); void
> >>>> +intel_ppat_put(const struct intel_ppat_entry *entry);
> >>>> +
> >>>>    int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
> >>>>    void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
> >>>>
> >>>
> >> _______________________________________________
> >> Intel-gfx mailing list
> >> Intel-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >
> >
> > --
> > Rodrigo Vivi
> > Blog: http://blog.vivi.eng.br
> 

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  4:49     ` Zhi Wang
  2017-08-31  5:15       ` Rodrigo Vivi
@ 2017-08-31  8:25       ` Joonas Lahtinen
  2017-08-31  8:28         ` Wang, Zhi A
  1 sibling, 1 reply; 14+ messages in thread
From: Joonas Lahtinen @ 2017-08-31  8:25 UTC (permalink / raw)
  To: Zhi Wang, Vivi, Rodrigo; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev

On Thu, 2017-08-31 at 12:49 +0800, Zhi Wang wrote:
> Hi Vivi:
>      Thanks for the reply! The register are written in ppat->update_hw() 
> now.

Could we do like I already asked and not simultaneously change the code
organization and register values? Just have this patch write exactly
the same values to all registers than before, then a patch after this
which removes the supposedly unused variables from kernel.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  8:25       ` Joonas Lahtinen
@ 2017-08-31  8:28         ` Wang, Zhi A
  2017-08-31 14:20           ` Joonas Lahtinen
  0 siblings, 1 reply; 14+ messages in thread
From: Wang, Zhi A @ 2017-08-31  8:28 UTC (permalink / raw)
  To: Joonas Lahtinen, Vivi, Rodrigo
  Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev

Do you mean I still keep I915_WRITE(xxxxx) in xxxx_setup_private_pat() like before? Then changed them in a new patch? 

Thanks,
Zhi.

-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com] 
Sent: Thursday, August 31, 2017 11:26 AM
To: Wang, Zhi A <zhi.a.wang@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org; zhenyuw@linux.intel.com; intel-gvt-dev@lists.freedesktop.org; chris@chris-wilson.co.uk; Widawsky, Benjamin <benjamin.widawsky@intel.com>
Subject: Re: [RFCv6 2/2] drm/i915: Introduce private PAT management

On Thu, 2017-08-31 at 12:49 +0800, Zhi Wang wrote:
> Hi Vivi:
>      Thanks for the reply! The register are written in 
> ppat->update_hw() now.

Could we do like I already asked and not simultaneously change the code organization and register values? Just have this patch write exactly the same values to all registers than before, then a patch after this which removes the supposedly unused variables from kernel.

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  8:28         ` Wang, Zhi A
@ 2017-08-31 14:20           ` Joonas Lahtinen
  2017-08-31 15:37             ` Wang, Zhi A
  0 siblings, 1 reply; 14+ messages in thread
From: Joonas Lahtinen @ 2017-08-31 14:20 UTC (permalink / raw)
  To: Wang, Zhi A, Vivi, Rodrigo; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev

On Thu, 2017-08-31 at 08:28 +0000, Wang, Zhi A wrote:
> Do you mean I still keep I915_WRITE(xxxxx) in xxxx_setup_private_pat() like before? Then changed them in a new patch? 

No, I mean use the new code structure, but make sure all register
writes are equal to what they were before (and please send a separate
patch for the CNL caching issue before that).

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31 14:20           ` Joonas Lahtinen
@ 2017-08-31 15:37             ` Wang, Zhi A
  0 siblings, 0 replies; 14+ messages in thread
From: Wang, Zhi A @ 2017-08-31 15:37 UTC (permalink / raw)
  To: Joonas Lahtinen, Vivi, Rodrigo
  Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev

I see. Thanks for the explanation! :)

-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com] 
Sent: Thursday, August 31, 2017 5:20 PM
To: Wang, Zhi A <zhi.a.wang@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org; zhenyuw@linux.intel.com; intel-gvt-dev@lists.freedesktop.org; chris@chris-wilson.co.uk; Widawsky, Benjamin <benjamin.widawsky@intel.com>
Subject: Re: [RFCv6 2/2] drm/i915: Introduce private PAT management

On Thu, 2017-08-31 at 08:28 +0000, Wang, Zhi A wrote:
> Do you mean I still keep I915_WRITE(xxxxx) in xxxx_setup_private_pat() like before? Then changed them in a new patch? 

No, I mean use the new code structure, but make sure all register writes are equal to what they were before (and please send a separate patch for the CNL caching issue before that).

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFCv6 2/2] drm/i915: Introduce private PAT management
  2017-08-31  5:15       ` Rodrigo Vivi
  2017-08-31  5:22         ` Wang, Zhi A
@ 2017-08-31 18:40         ` Wang, Zhi A
  1 sibling, 0 replies; 14+ messages in thread
From: Wang, Zhi A @ 2017-08-31 18:40 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx, Widawsky, Benjamin, intel-gvt-dev, Vivi, Rodrigo

I see why that is going wrong. I take Chris' comments directly like:

__alloc_ppat_entry(ppat, PPAT_CACHE_PDE_INDEX,....), but 0 != PPAT_CACHE_PDE_INDEX actually. Wait my update.

Thanks,
Zhi.

-----Original Message-----
From: Rodrigo Vivi [mailto:rodrigo.vivi@gmail.com] 
Sent: Thursday, August 31, 2017 8:15 AM
To: Wang, Zhi A <zhi.a.wang@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>; intel-gfx@lists.freedesktop.org; Widawsky, Benjamin <benjamin.widawsky@intel.com>; intel-gvt-dev@lists.freedesktop.org
Subject: Re: [Intel-gfx] [RFCv6 2/2] drm/i915: Introduce private PAT management

On Wed, Aug 30, 2017 at 9:49 PM, Zhi Wang <zhi.a.wang@intel.com> wrote:
> Hi Vivi:
>     Thanks for the reply! The register are written in ppat->update_hw() now.

oh, I saw now...
I hadden noticed that interation.

But something seems really odd yet...
My CNL with these 2 patches applied hangs on any execution...

>
>
> +static void cnl_private_pat_update_hw(struct drm_i915_private 
> +*dev_priv) {
> +       struct intel_ppat *ppat = &dev_priv->ppat;
> +       int i;
> +
> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
> +               clear_bit(i, ppat->dirty);
> +       }
> +}
> +
> +static void bdw_private_pat_update_hw(struct drm_i915_private 
> +*dev_priv) {
> +       struct intel_ppat *ppat = &dev_priv->ppat;
> +       u64 pat = 0;
> +       int i;
> +
> +       for (i = 0; i < ppat->max_entries; i++)
> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
> +
> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
> +
> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
>
> On 08/31/17 06:00, Vivi, Rodrigo wrote:
>>
>> On Wed, 2017-08-30 at 02:14 +0800, Zhi Wang wrote:
>>>
>>> The private PAT management is to support PPAT entry manipulation. 
>>> Two APIs are introduced for dynamically managing PPAT entries: 
>>> intel_ppat_get and intel_ppat_put.
>>>
>>> intel_ppat_get will search for an existing PPAT entry which 
>>> perfectly matches the required PPAT value. If not, it will try to 
>>> allocate or return a partially matched PPAT entry if there is any 
>>> available PPAT indexes or not.
>>>
>>> intel_ppat_put will put back the PPAT entry which comes from 
>>> intel_ppat_get. If it's dynamically allocated, the reference count 
>>> will be decreased. If the reference count turns into zero, the PPAT 
>>> index is freed again.
>>>
>>> Besides, another two callbacks are introduced to support the private 
>>> PAT management framework. One is ppat->update_hw(), which writes the 
>>> PPAT configurations in ppat->entries into HW. Another one is 
>>> ppat->match, which will return a score to show how two PPAT values 
>>> match with each other.
>>>
>>> v6:
>>>
>>> - Address all comments from Chris:
>>> http://www.spinics.net/lists/intel-gfx/msg136850.html
>>>
>>> - Address all comments from Joonas:
>>> http://www.spinics.net/lists/intel-gfx/msg136845.html
>>>
>>> v5:
>>>
>>> - Add check and warnnings for those platforms which don't have PPAT.
>>>
>>> v3:
>>>
>>> - Introduce dirty bitmap for PPAT registers. (Chris)
>>> - Change the name of the pointer "dev_priv" to "i915". (Chris)
>>> - intel_ppat_{get, put} returns/takes a const intel_ppat_entry *. 
>>> (Chris)
>>>
>>> v2:
>>>
>>> - API re-design. (Chris)
>>>
>>> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h     |   2 +
>>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 273
>>> +++++++++++++++++++++++++++++-------
>>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  36 +++++
>>>   3 files changed, 262 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>>> b/drivers/gpu/drm/i915/i915_drv.h index 7587ef5..5ffde10 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2312,6 +2312,8 @@ struct drm_i915_private {
>>>         DECLARE_HASHTABLE(mm_structs, 7);
>>>         struct mutex mm_lock;
>>>   +     struct intel_ppat ppat;
>>> +
>>>         /* Kernel Modesetting */
>>>         struct intel_crtc *plane_to_crtc_mapping[I915_MAX_PIPES];
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> index b74fa9d..3106142 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>>> @@ -2816,41 +2816,200 @@ static int ggtt_probe_common(struct 
>>> i915_ggtt *ggtt, u64 size)
>>>         return 0;
>>>   }
>>>   -static void cnl_setup_private_ppat(struct drm_i915_private 
>>> *dev_priv)
>>> +static struct intel_ppat_entry *
>>> +__alloc_ppat_entry(struct intel_ppat *ppat, unsigned int index, u8
>>> value)
>>>   {
>>> +       struct intel_ppat_entry *entry = &ppat->entries[index];
>>> +
>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>> +       GEM_BUG_ON(test_bit(index, ppat->used));
>>> +
>>> +       entry->ppat = ppat;
>>> +       entry->value = value;
>>> +       kref_init(&entry->ref);
>>> +       set_bit(index, ppat->used);
>>> +       set_bit(index, ppat->dirty);
>>> +
>>> +       return entry;
>>> +}
>>> +
>>> +static void __free_ppat_entry(struct intel_ppat_entry *entry) {
>>> +       struct intel_ppat *ppat = entry->ppat;
>>> +       unsigned int index = entry - ppat->entries;
>>> +
>>> +       GEM_BUG_ON(index >= ppat->max_entries);
>>> +       GEM_BUG_ON(!test_bit(index, ppat->used));
>>> +
>>> +       entry->value = ppat->clear_value;
>>> +       clear_bit(index, ppat->used);
>>> +       set_bit(index, ppat->dirty); }
>>> +
>>> +/**
>>> + * intel_ppat_get - get a usable PPAT entry
>>> + * @i915: i915 device instance
>>> + * @value: the PPAT value required by the caller
>>> + *
>>> + * The function tries to search if there is an existing PPAT entry 
>>> +which
>>> + * matches with the required value. If perfectly matched, the 
>>> +existing
>>> PPAT
>>> + * entry will be used. If only partially matched, it will try to 
>>> + check
>>> if
>>> + * there is any available PPAT index. If yes, it will allocate a 
>>> + new
>>> PPAT
>>> + * index for the required entry and update the HW. If not, the 
>>> +partially
>>> + * matched entry will be used.
>>> + */
>>> +const struct intel_ppat_entry *
>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value) {
>>> +       struct intel_ppat *ppat = &i915->ppat;
>>> +       struct intel_ppat_entry *entry;
>>> +       unsigned int scanned, best_score;
>>> +       int i;
>>> +
>>> +       GEM_BUG_ON(!ppat->max_entries);
>>> +
>>> +       scanned = best_score = 0;
>>> +
>>> +       for_each_set_bit(i, ppat->used, ppat->max_entries) {
>>> +               unsigned int score;
>>> +
>>> +               entry = &ppat->entries[i];
>>> +               score = ppat->match(entry->value, value);
>>> +               if (score > best_score) {
>>> +                       if (score == INTEL_PPAT_PERFECT_MATCH) {
>>> +                               kref_get(&entry->ref);
>>> +                               return entry;
>>> +                       }
>>> +                       best_score = score;
>>> +               }
>>> +               scanned++;
>>> +       }
>>> +
>>> +       if (scanned == ppat->max_entries) {
>>> +               if (!best_score)
>>> +                       return ERR_PTR(-ENOSPC);
>>> +
>>> +               kref_get(&entry->ref);
>>> +               return entry;
>>> +       }
>>> +
>>> +       i = find_first_zero_bit(ppat->used, ppat->max_entries);
>>> +       entry = __alloc_ppat_entry(ppat, i, value);
>>> +       ppat->update_hw(i915);
>>> +       return entry;
>>> +}
>>> +
>>> +static void release_ppat(struct kref *kref) {
>>> +       struct intel_ppat_entry *entry =
>>> +               container_of(kref, struct intel_ppat_entry, ref);
>>> +       struct drm_i915_private *i915 = entry->ppat->i915;
>>> +
>>> +       __free_ppat_entry(entry);
>>> +       entry->ppat->update_hw(i915); }
>>> +
>>> +/**
>>> + * intel_ppat_put - put back the PPAT entry got from 
>>> +intel_ppat_get()
>>> + * @entry: an intel PPAT entry
>>> + *
>>> + * Put back the PPAT entry got from intel_ppat_get(). If the PPAT 
>>> +index
>>> of the
>>> + * entry is dynamically allocated, its reference count will be
>>> decreased. Once
>>> + * the reference count becomes into zero, the PPAT index becomes 
>>> + free
>>> again.
>>> + */
>>> +void intel_ppat_put(const struct intel_ppat_entry *entry) {
>>> +       struct intel_ppat *ppat = entry->ppat;
>>> +       unsigned int index = entry - ppat->entries;
>>> +
>>> +       GEM_BUG_ON(!ppat->max_entries);
>>> +
>>> +       kref_put(&ppat->entries[index].ref, release_ppat); }
>>> +
>>> +static void cnl_private_pat_update_hw(struct drm_i915_private 
>>> +*dev_priv) {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       int i;
>>> +
>>> +       for_each_set_bit(i, ppat->dirty, ppat->max_entries) {
>>> +               I915_WRITE(GEN10_PAT_INDEX(i), ppat->entries[i].value);
>>> +               clear_bit(i, ppat->dirty);
>>> +       }
>>> +}
>>> +
>>> +static void bdw_private_pat_update_hw(struct drm_i915_private 
>>> +*dev_priv) {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       u64 pat = 0;
>>> +       int i;
>>> +
>>> +       for (i = 0; i < ppat->max_entries; i++)
>>> +               pat |= GEN8_PPAT(i, ppat->entries[i].value);
>>> +
>>> +       bitmap_clear(ppat->dirty, 0, ppat->max_entries);
>>> +
>>> +       I915_WRITE(GEN8_PRIVATE_PAT_LO, lower_32_bits(pat));
>>> +       I915_WRITE(GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); }
>>> +
>>> +static unsigned int bdw_private_pat_match(u8 src, u8 dst) {
>>> +       unsigned int score = 0;
>>> +
>>> +       /* Cache attribute has to be matched. */
>>> +       if (GEN8_PPAT_GET_CA(src) != GEN8_PPAT_GET_CA(dst))
>>> +               return 0;
>>> +
>>> +       if (GEN8_PPAT_GET_TC(src) == GEN8_PPAT_GET_TC(dst))
>>> +               score += 2;
>>> +
>>> +       if(GEN8_PPAT_GET_AGE(src) == GEN8_PPAT_GET_AGE(dst))
>>> +               score += 1;
>>> +
>>> +       if (score == 3)
>>> +               return INTEL_PPAT_PERFECT_MATCH;
>>> +
>>> +       return score;
>>> +}
>>> +
>>> +static unsigned int chv_private_pat_match(u8 src, u8 dst) {
>>> +       return (CHV_PPAT_GET_SNOOP(src) == CHV_PPAT_GET_SNOOP(dst)) ?
>>> +               INTEL_PPAT_PERFECT_MATCH : 0; }
>>> +
>>> +static void cnl_setup_private_ppat(struct intel_ppat *ppat) {
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = cnl_private_pat_update_hw;
>>> +       ppat->match = bdw_private_pat_match;
>>> +       ppat->clear_value = GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(3);
>>> +
>>>         /* XXX: spec is unclear if this is still needed for CNL+ */
>>> -       if (!USES_PPGTT(dev_priv)) {
>>> -               I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_UC);
>>> +       if (!USES_PPGTT(ppat->i915)) {
>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>> GEN8_PPAT_UC);
>>>                 return;
>>>         }
>>>   -     I915_WRITE(GEN10_PAT_INDEX(0), GEN8_PPAT_WB | GEN8_PPAT_LLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(1), GEN8_PPAT_WC | GEN8_PPAT_LLCELLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(2), GEN8_PPAT_WT | GEN8_PPAT_LLCELLC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(3), GEN8_PPAT_UC);
>>> -       I915_WRITE(GEN10_PAT_INDEX(4), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0));
>>> -       I915_WRITE(GEN10_PAT_INDEX(5), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(1));
>>> -       I915_WRITE(GEN10_PAT_INDEX(6), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(2));
>>> -       I915_WRITE(GEN10_PAT_INDEX(7), GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3));
>>
>>
>> Gen8 allocation style doesn't work on CNL.
>> I don't see where all these registers would be getting properly written.
>>
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to 
>>> + PPAT
>>> */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB 
>>> + |
>>> GEN8_PPAT_LLC);
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 
>>> + GEN8_PPAT_WT |
>>> GEN8_PPAT_LLCELLC);
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, 
>>> + GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0));
>>>   }
>>>     /* The GGTT and PPGTT need a private PPAT setup in order to 
>>> handle cacheability
>>>    * bits. When using advanced contexts each context stores its own 
>>> PAT, but
>>>    * writing this data shouldn't be harmful even in those cases. */ 
>>> -static void bdw_setup_private_ppat(struct drm_i915_private 
>>> *dev_priv)
>>> +static void bdw_setup_private_ppat(struct intel_ppat *ppat)
>>>   {
>>> -       u64 pat;
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>> +       ppat->match = bdw_private_pat_match;
>>> +       ppat->clear_value = GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3);
>>>   -     pat = GEN8_PPAT(0, GEN8_PPAT_WB | GEN8_PPAT_LLC)     | /* for
>>> normal objects, no eLLC */
>>> -             GEN8_PPAT(1, GEN8_PPAT_WC | GEN8_PPAT_LLCELLC) | /* for
>>> something pointing to ptes? */
>>> -             GEN8_PPAT(2, GEN8_PPAT_WT | GEN8_PPAT_LLCELLC) | /* for
>>> scanout with eLLC */
>>> -             GEN8_PPAT(3, GEN8_PPAT_UC)                     | /*
>>> Uncached objects, mostly for scanout */
>>> -             GEN8_PPAT(4, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(0)) |
>>> -             GEN8_PPAT(5, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(1)) |
>>> -             GEN8_PPAT(6, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(2)) |
>>> -             GEN8_PPAT(7, GEN8_PPAT_WB | GEN8_PPAT_LLCELLC |
>>> GEN8_PPAT_AGE(3));
>>> -
>>> -       if (!USES_PPGTT(dev_priv))
>>> +       if (!USES_PPGTT(ppat->i915)) {
>>>                 /* Spec: "For GGTT, there is NO pat_sel[2:0] from 
>>> the entry,
>>>                  * so RTL will always use the value corresponding to
>>>                  * pat_sel = 000".
>>> @@ -2864,17 +3023,26 @@ static void bdw_setup_private_ppat(struct 
>>> drm_i915_private *dev_priv)
>>>                  * So we can still hold onto all our assumptions wrt cpu
>>>                  * clflushing on LLC machines.
>>>                  */
>>> -               pat = GEN8_PPAT(0, GEN8_PPAT_UC);
>>> +               __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX,
>>> GEN8_PPAT_UC);
>>> +               return;
>>> +       }
>>>   -     /* XXX: spec defines this as 2 distinct registers. It's unclear
>>> if a 64b
>>> -        * write would work. */
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to 
>>> + PPAT
>>> */
>>> +       /* for normal objects, no eLLC */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, GEN8_PPAT_WB 
>>> + |
>>> GEN8_PPAT_LLC);
>>> +       /* for scanout with eLLC */
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 
>>> + GEN8_PPAT_WT |
>>> GEN8_PPAT_LLCELLC);
>>> +       /* Uncached objects, mostly for scanout */
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, GEN8_PPAT_UC);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, GEN8_PPAT_WB |
>>> GEN8_PPAT_LLCELLC | GEN8_PPAT_AGE(0));
>>>   }
>>>   -static void chv_setup_private_ppat(struct drm_i915_private 
>>> *dev_priv)
>>> +static void chv_setup_private_ppat(struct intel_ppat *ppat)
>>>   {
>>> -       u64 pat;
>>> +       ppat->max_entries = 8;
>>> +       ppat->update_hw = bdw_private_pat_update_hw;
>>> +       ppat->match = chv_private_pat_match;
>>> +       ppat->clear_value = CHV_PPAT_SNOOP;
>>>         /*
>>>          * Map WB on BDW to snooped on CHV.
>>> @@ -2894,17 +3062,12 @@ static void chv_setup_private_ppat(struct 
>>> drm_i915_private *dev_priv)
>>>          * Which means we must set the snoop bit in PAT entry 0
>>>          * in order to keep the global status page working.
>>>          */
>>> -       pat = GEN8_PPAT(0, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(1, 0) |
>>> -             GEN8_PPAT(2, 0) |
>>> -             GEN8_PPAT(3, 0) |
>>> -             GEN8_PPAT(4, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(5, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(6, CHV_PPAT_SNOOP) |
>>> -             GEN8_PPAT(7, CHV_PPAT_SNOOP);
>>>   -     I915_WRITE(GEN8_PRIVATE_PAT_LO, pat);
>>> -       I915_WRITE(GEN8_PRIVATE_PAT_HI, pat >> 32);
>>> +       /* See gen8_pte_encode() for the mapping from cache-level to 
>>> + PPAT
>>> */
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_PDE_INDEX, CHV_PPAT_SNOOP);
>>> +       __alloc_ppat_entry(ppat, PPAT_DISPLAY_ELLC_INDEX, 0);
>>> +       __alloc_ppat_entry(ppat, PPAT_UNCACHED_INDEX, 0);
>>> +       __alloc_ppat_entry(ppat, PPAT_CACHED_INDEX, CHV_PPAT_SNOOP);
>>>   }
>>>     static void gen6_gmch_remove(struct i915_address_space *vm) @@ 
>>> -2917,12 +3080,27 @@ static void gen6_gmch_remove(struct 
>>> i915_address_space *vm)
>>>     static void setup_private_pat(struct drm_i915_private *dev_priv)
>>>   {
>>> +       struct intel_ppat *ppat = &dev_priv->ppat;
>>> +       int i;
>>> +
>>> +       ppat->i915 = dev_priv;
>>> +
>>>         if (INTEL_GEN(dev_priv) >= 10)
>>> -               cnl_setup_private_ppat(dev_priv);
>>> +               cnl_setup_private_ppat(ppat);
>>>         else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>> -               chv_setup_private_ppat(dev_priv);
>>> +               chv_setup_private_ppat(ppat);
>>>         else
>>> -               bdw_setup_private_ppat(dev_priv);
>>> +               bdw_setup_private_ppat(ppat);
>>> +
>>> +       GEM_BUG_ON(ppat->max_entries > INTEL_MAX_PPAT_ENTRIES);
>>> +
>>> +       for_each_clear_bit(i, ppat->used, ppat->max_entries) {
>>> +               ppat->entries[i].value = ppat->clear_value;
>>> +               ppat->entries[i].ppat = ppat;
>>> +               set_bit(i, ppat->dirty);
>>> +       }
>>> +
>>> +       ppat->update_hw(dev_priv);
>>>   }
>>>     static int gen8_gmch_probe(struct i915_ggtt *ggtt) @@ -3236,13 
>>> +3414,10 @@ void i915_gem_restore_gtt_mappings(struct
>>> drm_i915_private *dev_priv)
>>>         ggtt->base.closed = false;
>>>         if (INTEL_GEN(dev_priv) >= 8) {
>>> -               if (INTEL_GEN(dev_priv) >= 10)
>>> -                       cnl_setup_private_ppat(dev_priv);
>>> -               else if (IS_CHERRYVIEW(dev_priv) || IS_GEN9_LP(dev_priv))
>>> -                       chv_setup_private_ppat(dev_priv);
>>> -               else
>>> -                       bdw_setup_private_ppat(dev_priv);
>>> +               struct intel_ppat *ppat = &dev_priv->ppat;
>>>   +             bitmap_set(ppat->dirty, 0, ppat->max_entries);
>>> +               dev_priv->ppat.update_hw(dev_priv);
>>>                 return;
>>>         }
>>>   diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> index b4e3aa7..e10ca89 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>>> @@ -143,6 +143,11 @@ typedef u64 gen8_ppgtt_pml4e_t;
>>>   #define GEN8_PPAT_ELLC_OVERRIDE               (0<<2)
>>>   #define GEN8_PPAT(i, x)                       ((u64)(x) << ((i) * 8))
>>>   +#define GEN8_PPAT_GET_CA(x) ((x) & 3)
>>> +#define GEN8_PPAT_GET_TC(x) ((x) & (3 << 2)) #define 
>>> +GEN8_PPAT_GET_AGE(x) ((x) & (3 << 4)) #define CHV_PPAT_GET_SNOOP(x) 
>>> +((x) & (1 << 6))
>>> +
>>>   struct sg_table;
>>>     struct intel_rotation_info {
>>> @@ -536,6 +541,37 @@ i915_vm_to_ggtt(struct i915_address_space *vm)
>>>         return container_of(vm, struct i915_ggtt, base);
>>>   }
>>>   +#define INTEL_MAX_PPAT_ENTRIES 8
>>> +#define INTEL_PPAT_PERFECT_MATCH (~0U)
>>> +
>>> +struct intel_ppat;
>>> +
>>> +struct intel_ppat_entry {
>>> +       struct intel_ppat *ppat;
>>> +       struct kref ref;
>>> +       u8 value;
>>> +};
>>> +
>>> +struct intel_ppat {
>>> +       struct intel_ppat_entry entries[INTEL_MAX_PPAT_ENTRIES];
>>> +       DECLARE_BITMAP(used, INTEL_MAX_PPAT_ENTRIES);
>>> +       DECLARE_BITMAP(dirty, INTEL_MAX_PPAT_ENTRIES);
>>> +       unsigned int max_entries;
>>> +       u8 clear_value;
>>> +       /*
>>> +        * Return a score to show how two PPAT values match,
>>> +        * a INTEL_PPAT_PERFECT_MATCH indicates a perfect match
>>> +        */
>>> +       unsigned int (*match)(u8 src, u8 dst);
>>> +       void (*update_hw)(struct drm_i915_private *i915);
>>> +
>>> +       struct drm_i915_private *i915; };
>>> +
>>> +const struct intel_ppat_entry *
>>> +intel_ppat_get(struct drm_i915_private *i915, u8 value); void 
>>> +intel_ppat_put(const struct intel_ppat_entry *entry);
>>> +
>>>   int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915);
>>>   void i915_gem_fini_aliasing_ppgtt(struct drm_i915_private *i915);
>>>
>>
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx



--
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-08-31 18:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29 18:14 [RFCv6 1/2] drm/i915: Factor out setup_private_pat() Zhi Wang
2017-08-29 18:14 ` [RFCv6 2/2] drm/i915: Introduce private PAT management Zhi Wang
2017-08-30 22:00   ` Vivi, Rodrigo
2017-08-31  4:49     ` Zhi Wang
2017-08-31  5:15       ` Rodrigo Vivi
2017-08-31  5:22         ` Wang, Zhi A
2017-08-31  5:24           ` Zhi Wang
2017-08-31  5:47             ` Vivi, Rodrigo
2017-08-31 18:40         ` Wang, Zhi A
2017-08-31  8:25       ` Joonas Lahtinen
2017-08-31  8:28         ` Wang, Zhi A
2017-08-31 14:20           ` Joonas Lahtinen
2017-08-31 15:37             ` Wang, Zhi A
2017-08-29 18:37 ` ✗ Fi.CI.BAT: failure for series starting with [RFCv6,1/2] drm/i915: Factor out setup_private_pat() Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.