* Start tidying up execbuf relocations
@ 2016-06-09 11:29 Chris Wilson
  2016-06-09 11:29 ` [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

Just a small set of changes after the last 100 or so that start
preparing for an overhaul of execbuf relocation processing - though
immediately after these in my queue is partial VMA handling...
-Chris


* [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 12:29   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 2/8] drm/i915: Cache kmap between relocations Chris Wilson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

It is useful when looking at captured error states to compare the recorded
BBADDR register (the address of the last batchbuffer instruction loaded)
against the expected offset of the batch buffer, as a quick check that
(a) the capture is accurate and (b) HEAD hasn't wandered off into the
badlands.
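
As a rough standalone sketch of the check this enables when reading an
error dump (the helper below is made up for illustration, not part of
the patch):

	#include <stdbool.h>
	#include <stdint.h>

	/* Does the captured BBADDR fall inside the recorded batch range
	 * [start, start + size)?  If not, either the capture is suspect
	 * or HEAD has wandered outside the batch.
	 */
	static bool bbaddr_in_batch(uint64_t bbaddr, uint64_t start, uint64_t size)
	{
		return bbaddr >= start && bbaddr - start < size;
	}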

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 48cf9dfbe4ac..b333cf4923bc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -541,6 +541,7 @@ struct drm_i915_error_state {
 		struct drm_i915_error_object {
 			int page_count;
 			u64 gtt_offset;
+			u64 gtt_size;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 76d63e047fae..ab0824b1ce6d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -249,6 +249,13 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
 	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
 	err_printf(m, "  INSTDONE: 0x%08x\n", ring->instdone);
+	if (ring->batchbuffer) {
+		u64 start = ring->batchbuffer->gtt_offset;
+		u64 end = start + ring->batchbuffer->gtt_size;
+		err_printf(m, "  batch: [0x%08x %08x, 0x%08x %08x]\n",
+			   upper_32_bits(start), lower_32_bits(start),
+			   upper_32_bits(end), lower_32_bits(end));
+	}
 	if (INTEL_INFO(dev)->gen >= 4) {
 		err_printf(m, "  BBADDR: 0x%08x %08x\n", (u32)(ring->bbaddr>>32), (u32)ring->bbaddr);
 		err_printf(m, "  BB_STATE: 0x%08x\n", ring->bbstate);
@@ -655,7 +662,10 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	if (dst == NULL)
 		return NULL;
 
-	reloc_offset = dst->gtt_offset = vma->node.start;
+	dst->gtt_offset = vma->node.start;
+	dst->gtt_size = vma->node.size;
+
+	reloc_offset = dst->gtt_offset;
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
 		   (vma->bound & GLOBAL_BIND) &&
 		   reloc_offset + num_pages * PAGE_SIZE <= ggtt->mappable_end);
-- 
2.8.1


* [PATCH 2/8] drm/i915: Cache kmap between relocations
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
  2016-06-09 11:29 ` [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 12:25   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.

Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_entry_cache implies pagefaults disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
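
As a rough, self-contained sketch of the caching idea itself (map_page()
below is a made-up stand-in for kmap_atomic(), not the real call): the
mapping is keyed by page index and only replaced when a relocation
crosses a page boundary, so a run of neighbouring relocations pays for a
single map.

	#include <stdio.h>

	#define PAGE_SHIFT 12
	#define NPAGES 8

	static char backing[NPAGES << PAGE_SHIFT];
	static int map_count; /* number of (expensive) map operations */

	struct cache { char *vaddr; long page; };

	/* Made-up stand-in for kmap_atomic(): pretend mapping is expensive. */
	static char *map_page(long page)
	{
		map_count++;
		return backing + (page << PAGE_SHIFT);
	}

	static char *cached_map(struct cache *c, long page)
	{
		if (c->page != page) {	/* only remap on a page boundary */
			c->vaddr = map_page(page);
			c->page = page;
		}
		return c->vaddr;
	}

	int main(void)
	{
		struct cache c = { 0, -1 };
		long off;

		for (off = 0; off < (long)sizeof(backing); off += 8)
			cached_map(&c, off >> PAGE_SHIFT)[off & 4095] = 1;

		printf("%d maps for %ld writes\n",
		       map_count, (long)sizeof(backing) / 8);
		return 0;
	}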

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 160 +++++++++++++++++++----------
 1 file changed, 106 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a29c4b6fea28..318c71b663f4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -302,9 +302,50 @@ relocation_target(struct drm_i915_gem_relocation_entry *reloc,
 	return gen8_canonical_addr((int)reloc->delta + target_offset);
 }
 
+struct reloc_cache {
+	void *vaddr;
+	unsigned page;
+	enum { KMAP, IOMAP } type;
+};
+
+static void reloc_cache_init(struct reloc_cache *cache)
+{
+	cache->page = -1;
+	cache->vaddr = NULL;
+}
+
+static void reloc_cache_fini(struct reloc_cache *cache)
+{
+	if (cache->vaddr == NULL)
+		return;
+
+	switch (cache->type) {
+	case KMAP: kunmap_atomic(cache->vaddr); break;
+	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
+	}
+}
+
+static void *reloc_kmap(struct drm_i915_gem_object *obj,
+			struct reloc_cache *cache,
+			int page)
+{
+	if (cache->page == page)
+		return cache->vaddr;
+
+	if (cache->vaddr)
+		kunmap_atomic(cache->vaddr);
+
+	cache->page = page;
+	cache->vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
+	cache->type = KMAP;
+
+	return cache->vaddr;
+}
+
 static int
 relocate_entry_cpu(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
+		   struct reloc_cache *cache,
 		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -317,34 +358,47 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-				reloc->offset >> PAGE_SHIFT));
+	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
 	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
 
-	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
-
-		if (page_offset == 0) {
-			kunmap_atomic(vaddr);
-			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+	if (INTEL_GEN(dev) >= 8) {
+		page_offset += sizeof(uint32_t);
+		if (page_offset == PAGE_SIZE) {
+			vaddr = reloc_kmap(obj, cache, cache->page + 1);
+			page_offset = 0;
 		}
-
 		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
 	}
 
-	kunmap_atomic(vaddr);
-
 	return 0;
 }
 
+static void *reloc_iomap(struct drm_i915_private *i915,
+			 struct reloc_cache *cache,
+			 uint64_t offset)
+{
+	if (cache->page == offset >> PAGE_SHIFT)
+		return cache->vaddr;
+
+	if (cache->vaddr)
+		io_mapping_unmap_atomic(cache->vaddr);
+
+	cache->page = offset >> PAGE_SHIFT;
+	cache->vaddr =
+		io_mapping_map_atomic_wc(i915->ggtt.mappable,
+					 offset & PAGE_MASK);
+	cache->type = IOMAP;
+
+	return cache->vaddr;
+}
+
 static int
 relocate_entry_gtt(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
+		   struct reloc_cache *cache,
 		   uint64_t target_offset)
 {
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	struct i915_vma *vma;
 	uint64_t delta = relocation_target(reloc, target_offset);
 	uint64_t offset;
@@ -366,28 +420,19 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 	/* Map the page containing the relocation we're going to perform.  */
 	offset = vma->node.start;
 	offset += reloc->offset;
-	reloc_page = io_mapping_map_atomic_wc(ggtt->mappable,
-					      offset & PAGE_MASK);
+	reloc_page = reloc_iomap(dev_priv, cache, offset);
 	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		offset += sizeof(uint32_t);
-
-		if (offset_in_page(offset) == 0) {
-			io_mapping_unmap_atomic(reloc_page);
-			reloc_page =
-				io_mapping_map_atomic_wc(ggtt->mappable,
-							 offset);
-		}
-
+		if (offset_in_page(offset) == 0)
+			reloc_page = reloc_iomap(dev_priv, cache, offset);
 		iowrite32(upper_32_bits(delta),
 			  reloc_page + offset_in_page(offset));
 	}
 
-	io_mapping_unmap_atomic(reloc_page);
-
 unpin:
-	i915_vma_unpin(vma);
+	__i915_vma_unpin(vma);
 	return ret;
 }
 
@@ -403,6 +448,7 @@ clflush_write32(void *addr, uint32_t value)
 static int
 relocate_entry_clflush(struct drm_i915_gem_object *obj,
 		       struct drm_i915_gem_relocation_entry *reloc,
+		       struct reloc_cache *cache,
 		       uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -415,24 +461,18 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-				reloc->offset >> PAGE_SHIFT));
+	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
 	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
 
-	if (INTEL_INFO(dev)->gen >= 8) {
-		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
-
-		if (page_offset == 0) {
-			kunmap_atomic(vaddr);
-			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
-			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+	if (INTEL_GEN(dev) >= 8) {
+		page_offset += sizeof(uint32_t);
+		if (page_offset == PAGE_SIZE) {
+			vaddr = reloc_kmap(obj, cache, cache->page + 1);
+			page_offset = 0;
 		}
-
 		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
 	}
 
-	kunmap_atomic(vaddr);
-
 	return 0;
 }
 
@@ -453,7 +493,8 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
-				   struct drm_i915_gem_relocation_entry *reloc)
+				   struct drm_i915_gem_relocation_entry *reloc,
+				   struct reloc_cache *cache)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
@@ -537,11 +578,11 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		return -EFAULT;
 
 	if (use_cpu_reloc(obj))
-		ret = relocate_entry_cpu(obj, reloc, target_offset);
+		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
 	else if (obj->map_and_fenceable)
-		ret = relocate_entry_gtt(obj, reloc, target_offset);
+		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
 	else if (static_cpu_has(X86_FEATURE_CLFLUSH))
-		ret = relocate_entry_clflush(obj, reloc, target_offset);
+		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
 	else {
 		WARN_ONCE(1, "Impossible case in relocation handling\n");
 		ret = -ENODEV;
@@ -564,9 +605,11 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
 	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int remain, ret;
+	struct reloc_cache cache;
+	int remain, ret = 0;
 
 	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
+	reloc_cache_init(&cache);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -576,19 +619,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 			count = ARRAY_SIZE(stack_reloc);
 		remain -= count;
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0])))
-			return -EFAULT;
+		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+			ret = -EFAULT;
+			goto out;
+		}
 
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r);
+			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
 			if (ret)
-				return ret;
+				goto out;
 
 			if (r->presumed_offset != offset &&
-			    __put_user(r->presumed_offset, &user_relocs->presumed_offset)) {
-				return -EFAULT;
+			    __put_user(r->presumed_offset,
+				       &user_relocs->presumed_offset)) {
+				ret = -EFAULT;
+				goto out;
 			}
 
 			user_relocs++;
@@ -596,7 +643,9 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 		} while (--count);
 	}
 
-	return 0;
+out:
+	reloc_cache_fini(&cache);
+	return ret;
 #undef N_RELOC
 }
 
@@ -606,15 +655,18 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
 				      struct drm_i915_gem_relocation_entry *relocs)
 {
 	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
-	int i, ret;
+	struct reloc_cache cache;
+	int i, ret = 0;
 
+	reloc_cache_init(&cache);
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i]);
+		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
 		if (ret)
-			return ret;
+			break;
 	}
+	reloc_cache_fini(&cache);
 
-	return 0;
+	return ret;
 }
 
 static int
-- 
2.8.1


* [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write()
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
  2016-06-09 11:29 ` [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
  2016-06-09 11:29 ` [PATCH 2/8] drm/i915: Cache kmap between relocations Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 11:39   ` Chris Wilson
  2016-06-09 12:47   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 4/8] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clflushes are required in
order for the writes to be coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
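
As a standalone sketch of how a caller consumes the two flag bits (the
values match the header change below; flush_line() is a made-up stand-in
for the real clflush):

	#include <stdint.h>

	#define CLFLUSH_BEFORE 0x1
	#define CLFLUSH_AFTER  0x2

	/* Made-up stand-in for flushing one cacheline. */
	static void flush_line(void *addr) { (void)addr; }

	static void write_u32(void *vaddr, unsigned int offset,
			      uint32_t value, unsigned int flushes)
	{
		void *addr = (char *)vaddr + offset;

		/* invalidate stale, partially written cachelines first */
		if (flushes & CLFLUSH_BEFORE)
			flush_line(addr);

		*(uint32_t *)addr = value;

		/* push the write out of the cache so the GPU sees it */
		if (flushes & CLFLUSH_AFTER)
			flush_line(addr);
	}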

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c |   2 +-
 drivers/gpu/drm/i915/i915_drv.h        |   6 +-
 drivers/gpu/drm/i915/i915_gem.c        | 119 +++++++++++++++++++++------------
 3 files changed, 83 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index b0fd6a7b0603..de038da6c01b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -970,7 +970,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		       u32 batch_start_offset,
 		       u32 batch_len)
 {
-	int needs_clflush = 0;
+	unsigned needs_clflush;
 	void *src_base, *src;
 	void *dst = NULL;
 	int ret;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b333cf4923bc..ef321f84e5fc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3026,7 +3026,11 @@ void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
 void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
 
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
-				    int *needs_clflush);
+				    unsigned *needs_clflush);
+int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
+				     unsigned *needs_clflush);
+#define CLFLUSH_BEFORE 0x1
+#define CLFLUSH_AFTER 0x2
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a41fa01eb024..c6d06cb21191 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -563,34 +563,95 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
  * flush the object from the CPU cache.
  */
 int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
-				    int *needs_clflush)
+				    unsigned *needs_clflush)
 {
 	int ret;
 
 	*needs_clflush = 0;
-
-	if (WARN_ON((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0))
+	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
+		ret = i915_gem_object_wait_rendering(obj, true);
+		if (ret)
+			return ret;
+
 		/* If we're not in the cpu read domain, set ourself into the gtt
 		 * read domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will dirty the data
 		 * anyway again before the next pread happens. */
 		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
 							obj->cache_level);
-		ret = i915_gem_object_wait_rendering(obj, true);
+	}
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, false);
+		if (ret) {
+			i915_gem_object_unpin_pages(obj);
+			return ret;
+		}
+		*needs_clflush = 0;
+	}
+
+	return 0;
+}
+
+int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
+				    unsigned *needs_clflush)
+{
+	int ret;
+
+	*needs_clflush = 0;
+	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
+		return -EINVAL;
+
+	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
 			return ret;
+
+		/* If we're not in the cpu write domain, set ourself into the
+		 * gtt write domain and manually flush cachelines (as required).
+		 * This optimizes for the case when the gpu will use the data
+		 * right away and we therefore have to clflush anyway.
+		 */
+		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
 	}
 
+	/* Same trick applies to invalidate partially written cachelines read
+	 * before writing.
+	 */
+	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
+		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
+							 obj->cache_level);
+
 	ret = i915_gem_object_get_pages(obj);
 	if (ret)
 		return ret;
 
 	i915_gem_object_pin_pages(obj);
 
-	return ret;
+	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+		if (ret) {
+			i915_gem_object_unpin_pages(obj);
+			return ret;
+		}
+		*needs_clflush = 0;
+	}
+
+	if ((*needs_clflush & CLFLUSH_AFTER) == 0)
+		obj->cache_dirty = true;
+
+	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
+	obj->dirty = 1;
+	return 0;
 }
 
 /* Per-page copy function for the shmem pread fastpath.
@@ -999,41 +1060,17 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 	int shmem_page_offset, page_length, ret = 0;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
-	int needs_clflush_after = 0;
-	int needs_clflush_before = 0;
+	unsigned needs_clflush;
 	struct sg_page_iter sg_iter;
 
-	user_data = u64_to_user_ptr(args->data_ptr);
-	remain = args->size;
-
-	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
-
-	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		/* If we're not in the cpu write domain, set ourself into the gtt
-		 * write domain and manually flush cachelines (if required). This
-		 * optimizes for the case when the gpu will use the data
-		 * right away and we therefore have to clflush anyway. */
-		needs_clflush_after = cpu_write_needs_clflush(obj);
-		ret = i915_gem_object_wait_rendering(obj, false);
-		if (ret)
-			return ret;
-	}
-	/* Same trick applies to invalidate partially written cachelines read
-	 * before writing. */
-	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
-		needs_clflush_before =
-			!cpu_cache_is_coherent(dev, obj->cache_level);
-
-	ret = i915_gem_object_get_pages(obj);
+	ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
 	if (ret)
 		return ret;
 
-	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
-
-	i915_gem_object_pin_pages(obj);
-
+	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
+	user_data = u64_to_user_ptr(args->data_ptr);
 	offset = args->offset;
-	obj->dirty = 1;
+	remain = args->size;
 
 	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
 			 offset >> PAGE_SHIFT) {
@@ -1057,7 +1094,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		/* If we don't overwrite a cacheline completely we need to be
 		 * careful to have up-to-date data by first clflushing. Don't
 		 * overcomplicate things and flush the entire patch. */
-		partial_cacheline_write = needs_clflush_before &&
+		partial_cacheline_write = needs_clflush & CLFLUSH_BEFORE &&
 			((shmem_page_offset | page_length)
 				& (boot_cpu_data.x86_clflush_size - 1));
 
@@ -1067,7 +1104,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = shmem_pwrite_fast(page, shmem_page_offset, page_length,
 					user_data, page_do_bit17_swizzling,
 					partial_cacheline_write,
-					needs_clflush_after);
+					needs_clflush & CLFLUSH_AFTER);
 		if (ret == 0)
 			goto next_page;
 
@@ -1076,7 +1113,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		ret = shmem_pwrite_slow(page, shmem_page_offset, page_length,
 					user_data, page_do_bit17_swizzling,
 					partial_cacheline_write,
-					needs_clflush_after);
+					needs_clflush & CLFLUSH_AFTER);
 
 		mutex_lock(&dev->struct_mutex);
 
@@ -1098,17 +1135,15 @@ out:
 		 * cachelines in-line while writing and the object moved
 		 * out of the cpu write domain while we've dropped the lock.
 		 */
-		if (!needs_clflush_after &&
+		if (!(needs_clflush & CLFLUSH_AFTER) &&
 		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 			if (i915_gem_clflush_object(obj, obj->pin_display))
-				needs_clflush_after = true;
+				needs_clflush |= CLFLUSH_AFTER;
 		}
 	}
 
-	if (needs_clflush_after)
+	if (needs_clflush & CLFLUSH_AFTER)
 		i915_gem_chipset_flush(to_i915(dev));
-	else
-		obj->cache_dirty = true;
 
 	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
 	return ret;
-- 
2.8.1


* [PATCH 4/8] drm/i915: Before accessing an object via the cpu, flush GTT writes
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
                   ` (2 preceding siblings ...)
  2016-06-09 11:29 ` [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 12:50   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

If we want to read the pages directly via the CPU, we have to be sure
that we first flush the writes made via the GTT (as the CPU cannot see
the address aliasing).
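
The fix itself is just an ordering rule in the two prepare helpers; a
minimal sketch of the sequence (function name as in the diff below, the
rest elided):

	/* not a complete function, ordering only */
	i915_gem_object_flush_gtt_write_domain(obj);	/* drain GTT writes first */

	/* ... only then inspect the read/write domains and decide what
	 * clflushing the CPU access needs ...
	 */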

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c6d06cb21191..18b4a684ddde 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -571,6 +571,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
+	i915_gem_object_flush_gtt_write_domain(obj);
+
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
 		ret = i915_gem_object_wait_rendering(obj, true);
 		if (ret)
@@ -611,6 +613,8 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
+	i915_gem_object_flush_gtt_write_domain(obj);
+
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 		ret = i915_gem_object_wait_rendering(obj, false);
 		if (ret)
-- 
2.8.1


* [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
                   ` (3 preceding siblings ...)
  2016-06-09 11:29 ` [PATCH 4/8] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 11:36   ` Chris Wilson
  2016-06-09 12:54   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete its write before
we start our read from the CPU.

The issue can be illustrated in userspace with:

	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);

	for (i = 0; i < OBJECT_SIZE / 64; i++) {
		int x = 16*i + (i%16);
		gtt[x] = i;
		clflush(&cpu[x], sizeof(cpu[x]));
		assert(cpu[x] == i);
	}

Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 18b4a684ddde..ffe3d3e9d69d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2898,20 +2898,30 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 static void
 i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 {
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
 		return;
 
 	/* No actual flushing is required for the GTT write domain.  Writes
-	 * to it immediately go to main memory as far as we know, so there's
+	 * to it "immediately" go to main memory as far as we know, so there's
 	 * no chipset flush.  It also doesn't land in render cache.
 	 *
 	 * However, we do have to enforce the order so that all writes through
 	 * the GTT land before any writes to the device, such as updates to
 	 * the GATT itself.
+	 *
+	 * We also have to wait a bit for the writes to land from the GTT.
+	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
+	 * timing. This issue has only been observed when switching quickly
+	 * between GTT writes and CPU reads from inside the kernel on recent hw,
+	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
+	 * system agents we cannot reproduce this behaviour).
 	 */
 	wmb();
+	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))
+		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
 
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
-- 
2.8.1


* [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
                   ` (4 preceding siblings ...)
  2016-06-09 11:29 ` [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 13:06   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 7/8] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

There is an improbable, but not impossible, case that if we leave the
pages unpinned while we operate on the object, then somebody may steal
the lock and change the cache domains after we have already inspected them.

(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)
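
The resulting shape of both helpers, as a made-up standalone sketch
(acquire(), release(), needs_slow_path() and slow_checks() stand in for
get+pin_pages, unpin_pages, the domain test and wait_rendering
respectively):

	struct obj { int busy; int pinned; };

	static int acquire(struct obj *o)         { o->pinned = 1; return 0; }
	static void release(struct obj *o)        { o->pinned = 0; }
	static int needs_slow_path(struct obj *o) { return o->busy; }
	static int slow_checks(struct obj *o)     { (void)o; return 0; }

	static int prepare(struct obj *o)
	{
		int ret;

		ret = acquire(o);	/* pin the backing store before any checks */
		if (ret)
			return ret;

		if (!needs_slow_path(o))
			goto out;

		ret = slow_checks(o);	/* may sleep; pages stay pinned throughout */
		if (ret)
			goto err_unpin;

	out:
		return 0;		/* success: return with the pages pinned */

	err_unpin:
		release(o);
		return ret;
	}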

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 88 ++++++++++++++++++++++++-----------------
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ffe3d3e9d69d..8c90b6a12d45 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -571,13 +571,22 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
+		goto out;
+
+	ret = i915_gem_object_wait_rendering(obj, true);
+	if (ret)
+		goto err_unpin;
+
 	i915_gem_object_flush_gtt_write_domain(obj);
 
 	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
-		ret = i915_gem_object_wait_rendering(obj, true);
-		if (ret)
-			return ret;
-
 		/* If we're not in the cpu read domain, set ourself into the gtt
 		 * read domain and manually flush cachelines (if required). This
 		 * optimizes for the case when the gpu will dirty the data
@@ -586,26 +595,25 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 							obj->cache_level);
 	}
 
-	ret = i915_gem_object_get_pages(obj);
-	if (ret)
-		return ret;
-
-	i915_gem_object_pin_pages(obj);
-
 	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, false);
-		if (ret) {
-			i915_gem_object_unpin_pages(obj);
-			return ret;
-		}
+		if (ret)
+			goto err_unpin;
+
 		*needs_clflush = 0;
 	}
 
+out:
+	/* return with the pages pinned */
 	return 0;
+
+err_unpin:
+	i915_gem_object_unpin_pages(obj);
+	return ret;
 }
 
 int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
-				    unsigned *needs_clflush)
+				     unsigned *needs_clflush)
 {
 	int ret;
 
@@ -613,20 +621,27 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
 		return -EINVAL;
 
-	i915_gem_object_flush_gtt_write_domain(obj);
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
 
-	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		ret = i915_gem_object_wait_rendering(obj, false);
-		if (ret)
-			return ret;
+	i915_gem_object_pin_pages(obj);
 
-		/* If we're not in the cpu write domain, set ourself into the
-		 * gtt write domain and manually flush cachelines (as required).
-		 * This optimizes for the case when the gpu will use the data
-		 * right away and we therefore have to clflush anyway.
-		 */
-		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
-	}
+	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
+		goto out;
+
+	ret = i915_gem_object_wait_rendering(obj, false);
+	if (ret)
+		goto err_unpin;
+
+	i915_gem_object_flush_gtt_write_domain(obj);
+
+	/* If we're not in the cpu write domain, set ourself into the
+	 * gtt write domain and manually flush cachelines (as required).
+	 * This optimizes for the case when the gpu will use the data
+	 * right away and we therefore have to clflush anyway.
+	 */
+	*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
 
 	/* Same trick applies to invalidate partially written cachelines read
 	 * before writing.
@@ -635,27 +650,26 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
 							 obj->cache_level);
 
-	ret = i915_gem_object_get_pages(obj);
-	if (ret)
-		return ret;
-
-	i915_gem_object_pin_pages(obj);
-
 	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
 		ret = i915_gem_object_set_to_cpu_domain(obj, true);
-		if (ret) {
-			i915_gem_object_unpin_pages(obj);
-			return ret;
-		}
+		if (ret)
+			goto err_unpin;
+
 		*needs_clflush = 0;
 	}
 
 	if ((*needs_clflush & CLFLUSH_AFTER) == 0)
 		obj->cache_dirty = true;
 
+out:
 	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
 	obj->dirty = 1;
+	/* return with the pages pinned */
 	return 0;
+
+err_unpin:
+	i915_gem_object_unpin_pages(obj);
+	return ret;
 }
 
 /* Per-page copy function for the shmem pread fastpath.
-- 
2.8.1


* [PATCH 7/8] drm/i915: Tidy up flush cpu/gtt write domains
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
                   ` (5 preceding siblings ...)
  2016-06-09 11:29 ` [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 13:12   ` Joonas Lahtinen
  2016-06-09 11:29 ` [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing Chris Wilson
  2016-06-09 11:40 ` ✗ Ro.CI.BAT: failure for series starting with [1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Patchwork
  8 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

Since we know the write domain, we can drop the local variable and make
the code look a tiny bit simpler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8c90b6a12d45..45f878350d66 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2913,7 +2913,6 @@ static void
 i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-	uint32_t old_write_domain;
 
 	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
 		return;
@@ -2937,36 +2936,30 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))
 		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
 
-	old_write_domain = obj->base.write_domain;
-	obj->base.write_domain = 0;
-
 	intel_fb_obj_flush(obj, false, ORIGIN_GTT);
 
+	obj->base.write_domain = 0;
 	trace_i915_gem_object_change_domain(obj,
 					    obj->base.read_domains,
-					    old_write_domain);
+					    I915_GEM_DOMAIN_GTT);
 }
 
 /** Flushes the CPU write domain for the object if it's dirty. */
 static void
 i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
 {
-	uint32_t old_write_domain;
-
 	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
 		return;
 
 	if (i915_gem_clflush_object(obj, obj->pin_display))
 		i915_gem_chipset_flush(to_i915(obj->base.dev));
 
-	old_write_domain = obj->base.write_domain;
-	obj->base.write_domain = 0;
-
 	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
 
+	obj->base.write_domain = 0;
 	trace_i915_gem_object_change_domain(obj,
 					    obj->base.read_domains,
-					    old_write_domain);
+					    I915_GEM_DOMAIN_CPU);
 }
 
 /**
-- 
2.8.1


* [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
                   ` (6 preceding siblings ...)
  2016-06-09 11:29 ` [PATCH 7/8] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
@ 2016-06-09 11:29 ` Chris Wilson
  2016-06-09 13:31   ` Joonas Lahtinen
  2016-06-09 11:40 ` ✗ Ro.CI.BAT: failure for series starting with [1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Patchwork
  8 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:29 UTC (permalink / raw)
  To: intel-gfx

With the introduction of the reloc page cache, we are just one step
away from refactoring the relocation write functions into one. Not only
does it tidy the code (slightly), but it greatly simplifies the control
logic much to gcc's satisfaction.
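
The three writers collapse into one loop-once pattern; a standalone
sketch of just the control flow (write_u32() is a made-up stand-in for
the kmap/iomap write in the diff):

	#include <stdint.h>

	/* Made-up stand-in for writing one dword through whichever mapping
	 * the cache handed back.
	 */
	static void write_u32(uint64_t offset, uint32_t value)
	{
		(void)offset; (void)value;
	}

	static void relocate(uint64_t offset, uint64_t target, int use_64bit_reloc)
	{
		int wide = use_64bit_reloc;
	repeat:
		write_u32(offset, (uint32_t)target);
		if (wide) {		/* second pass writes the upper dword */
			offset += sizeof(uint32_t);
			target >>= 32;
			wide = 0;
			goto repeat;
		}
	}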

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 289 +++++++++++++++--------------
 1 file changed, 150 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 318c71b663f4..bda8ec8b02e6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -34,6 +34,8 @@
 #include <linux/dma_remapping.h>
 #include <linux/uaccess.h>
 
+#define DBG_USE_CPU_RELOC 0 /* force relocations to use the CPU write path */
+
 #define  __EXEC_OBJECT_HAS_PIN (1U<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1U<<30)
 #define  __EXEC_OBJECT_NEEDS_MAP (1U<<29)
@@ -53,6 +55,7 @@ struct i915_execbuffer_params {
 };
 
 struct eb_vmas {
+	struct drm_i915_private *i915;
 	struct list_head vmas;
 	int and;
 	union {
@@ -62,7 +65,8 @@ struct eb_vmas {
 };
 
 static struct eb_vmas *
-eb_create(struct drm_i915_gem_execbuffer2 *args)
+eb_create(struct drm_i915_private *i915,
+	  struct drm_i915_gem_execbuffer2 *args)
 {
 	struct eb_vmas *eb = NULL;
 
@@ -89,6 +93,7 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
 	} else
 		eb->and = -args->buffer_count;
 
+	eb->i915 = i915;
 	INIT_LIST_HEAD(&eb->vmas);
 	return eb;
 }
@@ -272,7 +277,8 @@ static void eb_destroy(struct eb_vmas *eb)
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 {
-	return (HAS_LLC(obj->base.dev) ||
+	return (DBG_USE_CPU_RELOC ||
+		HAS_LLC(obj->base.dev) ||
 		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
 		obj->cache_level != I915_CACHE_NONE);
 }
@@ -296,32 +302,58 @@ static inline uint64_t gen8_noncanonical_addr(uint64_t address)
 }
 
 static inline uint64_t
-relocation_target(struct drm_i915_gem_relocation_entry *reloc,
+relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
 		  uint64_t target_offset)
 {
 	return gen8_canonical_addr((int)reloc->delta + target_offset);
 }
 
 struct reloc_cache {
-	void *vaddr;
+	struct drm_i915_private *i915;
+	unsigned long vaddr;
 	unsigned page;
-	enum { KMAP, IOMAP } type;
+	struct drm_mm_node node;
+	bool use_64bit_reloc;
 };
 
-static void reloc_cache_init(struct reloc_cache *cache)
+static void reloc_cache_init(struct reloc_cache *cache,
+			     struct drm_i915_private *i915)
 {
 	cache->page = -1;
-	cache->vaddr = NULL;
+	cache->vaddr = 0;
+	cache->i915 = i915;
+	cache->use_64bit_reloc = INTEL_GEN(cache->i915) >= 8;
+}
+
+static inline void *unmask_page(unsigned long p)
+{
+	return (void *)(uintptr_t)(p & PAGE_MASK);
 }
 
+static inline unsigned unmask_flags(unsigned long p)
+{
+	return p & ~PAGE_MASK;
+}
+
+#define KMAP 0x4
+
 static void reloc_cache_fini(struct reloc_cache *cache)
 {
-	if (cache->vaddr == NULL)
+	void *vaddr;
+
+	if (cache->vaddr == 0)
 		return;
 
-	switch (cache->type) {
-	case KMAP: kunmap_atomic(cache->vaddr); break;
-	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
+	vaddr = unmask_page(cache->vaddr);
+	if (cache->vaddr & KMAP) {
+		if (cache->vaddr & CLFLUSH_AFTER)
+			mb();
+
+		kunmap_atomic(vaddr);
+		i915_gem_object_unpin_pages((struct drm_i915_gem_object *)cache->node.mm);
+	} else {
+		io_mapping_unmap_atomic(vaddr);
+		i915_vma_unpin((struct i915_vma *)cache->node.mm);
 	}
 }
 
@@ -329,148 +361,138 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj,
 			struct reloc_cache *cache,
 			int page)
 {
-	if (cache->page == page)
-		return cache->vaddr;
+	void *vaddr;
 
-	if (cache->vaddr)
-		kunmap_atomic(cache->vaddr);
+	if (cache->vaddr) {
+		kunmap_atomic(unmask_page(cache->vaddr));
+	} else {
+		unsigned flushes;
+		int ret;
+
+		ret = i915_gem_obj_prepare_shmem_write(obj, &flushes);
+		if (ret)
+			return ERR_PTR(ret);
 
+		cache->vaddr = flushes | KMAP;
+		cache->node.mm = (void *)obj;
+		if (flushes)
+			mb();
+	}
+
+	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
+	cache->vaddr = unmask_flags(cache->vaddr) | (unsigned long)vaddr;
 	cache->page = page;
-	cache->vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
-	cache->type = KMAP;
 
-	return cache->vaddr;
+	return vaddr;
 }
 
-static int
-relocate_entry_cpu(struct drm_i915_gem_object *obj,
-		   struct drm_i915_gem_relocation_entry *reloc,
-		   struct reloc_cache *cache,
-		   uint64_t target_offset)
+static void *reloc_iomap(struct drm_i915_gem_object *obj,
+			 struct reloc_cache *cache,
+			 int page)
 {
-	struct drm_device *dev = obj->base.dev;
-	uint32_t page_offset = offset_in_page(reloc->offset);
-	uint64_t delta = relocation_target(reloc, target_offset);
-	char *vaddr;
-	int ret;
+	void *vaddr;
 
-	ret = i915_gem_object_set_to_cpu_domain(obj, true);
-	if (ret)
-		return ret;
+	if (cache->vaddr) {
+		io_mapping_unmap_atomic(unmask_page(cache->vaddr));
+	} else {
+		struct i915_vma *vma;
+		int ret;
 
-	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
-	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
+		if (use_cpu_reloc(obj))
+			return NULL;
 
-	if (INTEL_GEN(dev) >= 8) {
-		page_offset += sizeof(uint32_t);
-		if (page_offset == PAGE_SIZE) {
-			vaddr = reloc_kmap(obj, cache, cache->page + 1);
-			page_offset = 0;
-		}
-		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
-	}
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
+					       PIN_MAPPABLE | PIN_NONBLOCK);
+		if (IS_ERR(vma))
+			return NULL;
 
-	return 0;
-}
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+		if (ret)
+			return ERR_PTR(ret);
 
-static void *reloc_iomap(struct drm_i915_private *i915,
-			 struct reloc_cache *cache,
-			 uint64_t offset)
-{
-	if (cache->page == offset >> PAGE_SHIFT)
-		return cache->vaddr;
+		ret = i915_gem_object_put_fence(obj);
+		if (ret)
+			return ERR_PTR(ret);
 
-	if (cache->vaddr)
-		io_mapping_unmap_atomic(cache->vaddr);
+		cache->node.start = vma->node.start;
+		cache->node.mm = (void *)vma;
+	}
 
-	cache->page = offset >> PAGE_SHIFT;
-	cache->vaddr =
-		io_mapping_map_atomic_wc(i915->ggtt.mappable,
-					 offset & PAGE_MASK);
-	cache->type = IOMAP;
+	vaddr = io_mapping_map_atomic_wc(cache->i915->ggtt.mappable,
+					 cache->node.start + (page << PAGE_SHIFT));
+	cache->page = page;
+	cache->vaddr = (unsigned long)vaddr;
 
-	return cache->vaddr;
+	return vaddr;
 }
 
-static int
-relocate_entry_gtt(struct drm_i915_gem_object *obj,
-		   struct drm_i915_gem_relocation_entry *reloc,
-		   struct reloc_cache *cache,
-		   uint64_t target_offset)
+static void *reloc_vaddr(struct drm_i915_gem_object *obj,
+			 struct reloc_cache *cache,
+			 int page)
 {
-	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-	struct i915_vma *vma;
-	uint64_t delta = relocation_target(reloc, target_offset);
-	uint64_t offset;
-	void __iomem *reloc_page;
-	int ret;
-
-	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		goto unpin;
-
-	ret = i915_gem_object_put_fence(obj);
-	if (ret)
-		goto unpin;
-
-	/* Map the page containing the relocation we're going to perform.  */
-	offset = vma->node.start;
-	offset += reloc->offset;
-	reloc_page = reloc_iomap(dev_priv, cache, offset);
-	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
+	void *vaddr;
 
-	if (INTEL_GEN(dev_priv) >= 8) {
-		offset += sizeof(uint32_t);
-		if (offset_in_page(offset) == 0)
-			reloc_page = reloc_iomap(dev_priv, cache, offset);
-		iowrite32(upper_32_bits(delta),
-			  reloc_page + offset_in_page(offset));
+	if (cache->page == page) {
+		vaddr = unmask_page(cache->vaddr);
+	} else {
+		vaddr = NULL;
+		if ((cache->vaddr & KMAP) == 0)
+			vaddr = reloc_iomap(obj, cache, page);
+		if (vaddr == NULL)
+			vaddr = reloc_kmap(obj, cache, page);
 	}
 
-unpin:
-	__i915_vma_unpin(vma);
-	return ret;
+	return vaddr;
 }
 
 static void
-clflush_write32(void *addr, uint32_t value)
+clflush_write32(uint32_t *addr, uint32_t value, unsigned flushes)
 {
-	/* This is not a fast path, so KISS. */
-	drm_clflush_virt_range(addr, sizeof(uint32_t));
-	*(uint32_t *)addr = value;
-	drm_clflush_virt_range(addr, sizeof(uint32_t));
+	if (unlikely(flushes & (CLFLUSH_BEFORE | CLFLUSH_AFTER))) {
+		if (flushes & CLFLUSH_BEFORE) {
+			clflushopt(addr);
+			mb();
+		}
+
+		*addr = value;
+
+		/* Writes to the same cacheline are serialised by the CPU
+		 * (including clflush). On the write path, we only require
+		 * that it hits memory in an orderly fashion and place
+		 * mb barriers at the start and end of the relocation phase
+		 * to ensure ordering of clflush wrt to the system.
+		 */
+		if (flushes & CLFLUSH_AFTER)
+			clflushopt(addr);
+	} else
+		*addr = value;
 }
 
 static int
-relocate_entry_clflush(struct drm_i915_gem_object *obj,
-		       struct drm_i915_gem_relocation_entry *reloc,
-		       struct reloc_cache *cache,
-		       uint64_t target_offset)
+relocate_entry(struct drm_i915_gem_object *obj,
+	       const struct drm_i915_gem_relocation_entry *reloc,
+	       struct reloc_cache *cache,
+	       uint64_t target_offset)
 {
-	struct drm_device *dev = obj->base.dev;
-	uint32_t page_offset = offset_in_page(reloc->offset);
-	uint64_t delta = relocation_target(reloc, target_offset);
-	char *vaddr;
-	int ret;
+	uint64_t offset = reloc->offset;
+	bool wide = cache->use_64bit_reloc;
+	void *vaddr;
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		return ret;
+	target_offset = relocation_target(reloc, target_offset);
+repeat:
+	vaddr = reloc_vaddr(obj, cache, offset >> PAGE_SHIFT);
+	if (IS_ERR(vaddr))
+		return PTR_ERR(vaddr);
 
-	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
-	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
+	clflush_write32(vaddr + offset_in_page(offset),
+			lower_32_bits(target_offset),
+			cache->vaddr);
 
-	if (INTEL_GEN(dev) >= 8) {
-		page_offset += sizeof(uint32_t);
-		if (page_offset == PAGE_SIZE) {
-			vaddr = reloc_kmap(obj, cache, cache->page + 1);
-			page_offset = 0;
-		}
-		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
+	if (wide) {
+		offset += sizeof(uint32_t);
+		target_offset >>= 32;
+		wide = false;
+		goto repeat;
 	}
 
 	return 0;
@@ -557,7 +579,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 
 	/* Check that the relocation address is valid... */
 	if (unlikely(reloc->offset >
-		obj->base.size - (INTEL_INFO(dev)->gen >= 8 ? 8 : 4))) {
+		     obj->base.size - (cache->use_64bit_reloc ? 8 : 4))) {
 		DRM_DEBUG("Relocation beyond object bounds: "
 			  "obj %p target %d offset %d size %d.\n",
 			  obj, reloc->target_handle,
@@ -577,23 +599,12 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	if (pagefault_disabled() && !object_is_idle(obj))
 		return -EFAULT;
 
-	if (use_cpu_reloc(obj))
-		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
-	else if (obj->map_and_fenceable)
-		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
-	else if (static_cpu_has(X86_FEATURE_CLFLUSH))
-		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
-	else {
-		WARN_ONCE(1, "Impossible case in relocation handling\n");
-		ret = -ENODEV;
-	}
-
+	ret = relocate_entry(obj, reloc, cache, target_offset);
 	if (ret)
 		return ret;
 
 	/* and update the user's relocation entry */
 	reloc->presumed_offset = target_offset;
-
 	return 0;
 }
 
@@ -609,7 +620,7 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	int remain, ret = 0;
 
 	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
-	reloc_cache_init(&cache);
+	reloc_cache_init(&cache, eb->i915);
 
 	remain = entry->relocation_count;
 	while (remain) {
@@ -658,7 +669,7 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
 	struct reloc_cache cache;
 	int i, ret = 0;
 
-	reloc_cache_init(&cache);
+	reloc_cache_init(&cache, eb->i915);
 	for (i = 0; i < entry->relocation_count; i++) {
 		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
 		if (ret)
@@ -1073,8 +1084,8 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	if (flush_chipset)
 		i915_gem_chipset_flush(req->engine->i915);
 
-	if (flush_domains & I915_GEM_DOMAIN_GTT)
-		wmb();
+	/* Make sure (untracked) CPU relocs/parsing are flushed */
+	wmb();
 
 	/* Unconditionally invalidate gpu caches and TLBs. */
 	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
@@ -1606,7 +1617,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	memset(&params_master, 0x00, sizeof(params_master));
 
-	eb = eb_create(args);
+	eb = eb_create(dev_priv, args);
 	if (eb == NULL) {
 		i915_gem_context_put(ctx);
 		mutex_unlock(&dev->struct_mutex);
-- 
2.8.1


* Re: [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back
  2016-06-09 11:29 ` [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
@ 2016-06-09 11:36   ` Chris Wilson
  2016-06-09 12:54   ` Joonas Lahtinen
  1 sibling, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:36 UTC (permalink / raw)
  To: intel-gfx

On Thu, Jun 09, 2016 at 12:29:36PM +0100, Chris Wilson wrote:
> If we quickly switch from writing through the GTT to a read of the
> physical page directly with the CPU (e.g. performing relocations through
> the GTT and then running the command parser), we can observe that the
> writes are not visible to the CPU. It is not a coherency problem, as
> extensive investigations with clflush have demonstrated, but a mere
> timing issue - we have to wait for the GTT to complete it's write before
> we start our read from the CPU.
> 
> The issue can be illustrated in userspace with:
> 
> 	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> 
> 	for (i = 0; i < OBJECT_SIZE / 64; i++) {
> 		int x = 16*i + (i%16);
> 		gtt[x] = i;
> 		clflush(&cpu[x], sizeof(cpu[x]));
> 		assert(cpu[x] == i);
> 	}
> 
> Experimenting with that shows that this behaviour is indeed limited to
> recent Atom-class hardware.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Note this is associated with

Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt

I wrote the testcase based on this bug, but it may not be the only thing
that causes that test to fail (since byt has a major issue with
coherency).

Also note that the gem_mmap_gtt/coherency and gem_mmap_wc/coherency are
the userspace demonstrators (and that this is only a GTT issue).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write()
  2016-06-09 11:29 ` [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
@ 2016-06-09 11:39   ` Chris Wilson
  2016-06-09 12:47   ` Joonas Lahtinen
  1 sibling, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 11:39 UTC (permalink / raw)
  To: intel-gfx

On Thu, Jun 09, 2016 at 12:29:34PM +0100, Chris Wilson wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a41fa01eb024..c6d06cb21191 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -563,34 +563,95 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
>   * flush the object from the CPU cache.
>   */
>  int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> -				    int *needs_clflush)
> +				    unsigned *needs_clflush)
>  {
>  	int ret;
>  
>  	*needs_clflush = 0;
> -
> -	if (WARN_ON((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0))
> +	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
>  		return -EINVAL;
>  
>  	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
> +		ret = i915_gem_object_wait_rendering(obj, true);
> +		if (ret)
> +			return ret;
> +
>  		/* If we're not in the cpu read domain, set ourself into the gtt
>  		 * read domain and manually flush cachelines (if required). This
>  		 * optimizes for the case when the gpu will dirty the data
>  		 * anyway again before the next pread happens. */
>  		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
>  							obj->cache_level);
> -		ret = i915_gem_object_wait_rendering(obj, true);
> +	}
> +
> +	ret = i915_gem_object_get_pages(obj);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_object_pin_pages(obj);
> +
> +	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {

This should now be static_cpu_has().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* ✗ Ro.CI.BAT: failure for series starting with [1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state
  2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
                   ` (7 preceding siblings ...)
  2016-06-09 11:29 ` [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing Chris Wilson
@ 2016-06-09 11:40 ` Patchwork
  8 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2016-06-09 11:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state
URL   : https://patchwork.freedesktop.org/series/8491/
State : failure

== Summary ==

Applying: drm/i915: Print the batchbuffer offset next to BBADDR in error state
fatal: sha1 information is lacking or useless (drivers/gpu/drm/i915/i915_drv.h).
error: could not build fake ancestor
Patch failed at 0001 drm/i915: Print the batchbuffer offset next to BBADDR in error state
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [PATCH 2/8] drm/i915: Cache kmap between relocations
  2016-06-09 11:29 ` [PATCH 2/8] drm/i915: Cache kmap between relocations Chris Wilson
@ 2016-06-09 12:25   ` Joonas Lahtinen
  0 siblings, 0 replies; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 12:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> When doing relocations, we have to obtain a mapping to the page
> containing the target address. This is either a kmap or iomap depending
> on GPU and its cache coherency. Neighbouring relocation entries are
> typically within the same page and so we can cache our kmapping between
> them and avoid those pesky TLB flushes.
> 
> Note that there is some sleight-of-hand in how the slow relocate works
> as the reloc_entry_cache implies pagefaults disabled (as we are inside a
> kmap_atomic section). However, the slow relocate code is meant to be the
> fallback from the atomic fast path failing. Fortunately it works as we
> already have performed the copy_from_user for the relocation array (no
> more pagefaults there) and the kmap_atomic cache is enabled after we
> have waited upon an active buffer (so no more sleeping in atomic).
> Magic!
> 

You could also mention that you mangle the relocation <-> page logic,
so this is not purely about caching. Or maybe even split it.

Comments below; with those addressed:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 160 +++++++++++++++++++----------
>  1 file changed, 106 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index a29c4b6fea28..318c71b663f4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -302,9 +302,50 @@ relocation_target(struct drm_i915_gem_relocation_entry *reloc,
>  	return gen8_canonical_addr((int)reloc->delta + target_offset);
>  }
>  
> +struct reloc_cache {
> +	void *vaddr;
> +	unsigned page;
> +	enum { KMAP, IOMAP } type;
> +};
> +
> +static void reloc_cache_init(struct reloc_cache *cache)
> +{
> +	cache->page = -1;
> +	cache->vaddr = NULL;
> +}
> +
> +static void reloc_cache_fini(struct reloc_cache *cache)
> +{
> +	if (cache->vaddr == NULL)
> +		return;
> +
> +	switch (cache->type) {
> +	case KMAP: kunmap_atomic(cache->vaddr); break;
> +	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
> +	}
> +}
> +
> +static void *reloc_kmap(struct drm_i915_gem_object *obj,
> +			struct reloc_cache *cache,
> +			int page)
> +{
> +	if (cache->page == page)
> +		return cache->vaddr;
> +
> +	if (cache->vaddr)
> +		kunmap_atomic(cache->vaddr);

Maybe add some GEM_BUG_ON(cache->type != KMAP) here before running
kunmap_atomic? Because that assumption is made.
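
Something along these lines, as an untested illustration of the
suggestion:

	if (cache->vaddr) {
		GEM_BUG_ON(cache->type != KMAP);
		kunmap_atomic(cache->vaddr);
	}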

> +
> +	cache->page = page;
> +	cache->vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
> +	cache->type = KMAP;
> +
> +	return cache->vaddr;
> +}
> +
>  static int
>  relocate_entry_cpu(struct drm_i915_gem_object *obj,
>  		   struct drm_i915_gem_relocation_entry *reloc,
> +		   struct reloc_cache *cache,
>  		   uint64_t target_offset)
>  {
>  	struct drm_device *dev = obj->base.dev;
> @@ -317,34 +358,47 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> -	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
> -				reloc->offset >> PAGE_SHIFT));
> +	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
>  	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
>  
> -	if (INTEL_INFO(dev)->gen >= 8) {
> -		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
> -
> -		if (page_offset == 0) {
> -			kunmap_atomic(vaddr);
> -			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
> -			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
> +	if (INTEL_GEN(dev) >= 8) {
> +		page_offset += sizeof(uint32_t);
> +		if (page_offset == PAGE_SIZE) {
> +			vaddr = reloc_kmap(obj, cache, cache->page + 1);
> +			page_offset = 0;
>  		}
> -
>  		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
>  	}
>  
> -	kunmap_atomic(vaddr);
> -
>  	return 0;
>  }
>  
> +static void *reloc_iomap(struct drm_i915_private *i915,
> +			 struct reloc_cache *cache,
> +			 uint64_t offset)
> +{
> +	if (cache->page == offset >> PAGE_SHIFT)
> +		return cache->vaddr;
> +
> +	if (cache->vaddr)
> +		io_mapping_unmap_atomic(cache->vaddr);
> +
> +	cache->page = offset >> PAGE_SHIFT;
> +	cache->vaddr =
> +		io_mapping_map_atomic_wc(i915->ggtt.mappable,
> +					 offset & PAGE_MASK);
> +	cache->type = IOMAP;
> +
> +	return cache->vaddr;
> +}
> +
>  static int
>  relocate_entry_gtt(struct drm_i915_gem_object *obj,
>  		   struct drm_i915_gem_relocation_entry *reloc,
> +		   struct reloc_cache *cache,
>  		   uint64_t target_offset)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> -	struct i915_ggtt *ggtt = &dev_priv->ggtt;
>  	struct i915_vma *vma;
>  	uint64_t delta = relocation_target(reloc, target_offset);
>  	uint64_t offset;
> @@ -366,28 +420,19 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
>  	/* Map the page containing the relocation we're going to perform.  */
>  	offset = vma->node.start;
>  	offset += reloc->offset;
> -	reloc_page = io_mapping_map_atomic_wc(ggtt->mappable,
> -					      offset & PAGE_MASK);
> +	reloc_page = reloc_iomap(dev_priv, cache, offset);
>  	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
>  
>  	if (INTEL_GEN(dev_priv) >= 8) {
>  		offset += sizeof(uint32_t);
> -
> -		if (offset_in_page(offset) == 0) {
> -			io_mapping_unmap_atomic(reloc_page);
> -			reloc_page =
> -				io_mapping_map_atomic_wc(ggtt->mappable,
> -							 offset);
> -		}
> -
> +		if (offset_in_page(offset) == 0)
> +			reloc_page = reloc_iomap(dev_priv, cache, offset);
>  		iowrite32(upper_32_bits(delta),
>  			  reloc_page + offset_in_page(offset));
>  	}
>  
> -	io_mapping_unmap_atomic(reloc_page);
> -
>  unpin:
> -	i915_vma_unpin(vma);
> +	__i915_vma_unpin(vma);
>  	return ret;
>  }
>  
> @@ -403,6 +448,7 @@ clflush_write32(void *addr, uint32_t value)
>  static int
>  relocate_entry_clflush(struct drm_i915_gem_object *obj,
>  		       struct drm_i915_gem_relocation_entry *reloc,
> +		       struct reloc_cache *cache,

For consistency with rest of the patch, I would put the cache as last
argument, as it could easily be removed again in future, so it's the
least important.

>  		       uint64_t target_offset)
>  {
>  	struct drm_device *dev = obj->base.dev;
> @@ -415,24 +461,18 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> -	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
> -				reloc->offset >> PAGE_SHIFT));
> +	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
>  	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
>  
> -	if (INTEL_INFO(dev)->gen >= 8) {
> -		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
> -
> -		if (page_offset == 0) {
> -			kunmap_atomic(vaddr);
> -			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
> -			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
> +	if (INTEL_GEN(dev) >= 8) {
> +		page_offset += sizeof(uint32_t);
> +		if (page_offset == PAGE_SIZE) {
> +			vaddr = reloc_kmap(obj, cache, cache->page + 1);
> +			page_offset = 0;
>  		}
> -
>  		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
>  	}
>  
> -	kunmap_atomic(vaddr);
> -
>  	return 0;
>  }
>  
> @@ -453,7 +493,8 @@ static bool object_is_idle(struct drm_i915_gem_object *obj)
>  static int
>  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  				   struct eb_vmas *eb,
> -				   struct drm_i915_gem_relocation_entry *reloc)
> +				   struct drm_i915_gem_relocation_entry *reloc,
> +				   struct reloc_cache *cache)
>  {
>  	struct drm_device *dev = obj->base.dev;
>  	struct drm_gem_object *target_obj;
> @@ -537,11 +578,11 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  		return -EFAULT;
>  
>  	if (use_cpu_reloc(obj))
> -		ret = relocate_entry_cpu(obj, reloc, target_offset);
> +		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
>  	else if (obj->map_and_fenceable)
> -		ret = relocate_entry_gtt(obj, reloc, target_offset);
> +		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
>  	else if (static_cpu_has(X86_FEATURE_CLFLUSH))
> -		ret = relocate_entry_clflush(obj, reloc, target_offset);
> +		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
>  	else {
>  		WARN_ONCE(1, "Impossible case in relocation handling\n");
>  		ret = -ENODEV;
> @@ -564,9 +605,11 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>  	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
>  	struct drm_i915_gem_relocation_entry __user *user_relocs;
>  	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> -	int remain, ret;
> +	struct reloc_cache cache;
> +	int remain, ret = 0;
>  
>  	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
> +	reloc_cache_init(&cache);
>  
>  	remain = entry->relocation_count;
>  	while (remain) {
> @@ -576,19 +619,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>  			count = ARRAY_SIZE(stack_reloc);
>  		remain -= count;
>  
> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0])))
> -			return -EFAULT;
> +		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
>  
>  		do {
>  			u64 offset = r->presumed_offset;
>  
> -			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r);
> +			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r, &cache);
>  			if (ret)
> -				return ret;
> +				goto out;
>  
>  			if (r->presumed_offset != offset &&
> -			    __put_user(r->presumed_offset, &user_relocs->presumed_offset)) {
> -				return -EFAULT;
> +			    __put_user(r->presumed_offset,
> +				       &user_relocs->presumed_offset)) {
> +				ret = -EFAULT;
> +				goto out;
>  			}
>  
>  			user_relocs++;
> @@ -596,7 +643,9 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>  		} while (--count);
>  	}
>  
> -	return 0;
> +out:
> +	reloc_cache_fini(&cache);

<NL> here to keep it consistent with your next change.

Regards, Joonas

> +	return ret;
>  #undef N_RELOC
>  }
>  
> @@ -606,15 +655,18 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
>  				      struct drm_i915_gem_relocation_entry *relocs)
>  {
>  	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> -	int i, ret;
> +	struct reloc_cache cache;
> +	int i, ret = 0;
>  
> +	reloc_cache_init(&cache);
>  	for (i = 0; i < entry->relocation_count; i++) {
> -		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i]);
> +		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
>  		if (ret)
> -			return ret;
> +			break;
>  	}
> +	reloc_cache_fini(&cache);
>  
> -	return 0;
> +	return ret;
>  }
>  
>  static int
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state
  2016-06-09 11:29 ` [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
@ 2016-06-09 12:29   ` Joonas Lahtinen
  0 siblings, 0 replies; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 12:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> It is useful when looking at captured error states to check the recorded
> BBADDR register (the address of the last batchbuffer instruction loaded)
> against the expected offset of the batch buffer, and so do a quick check
> that (a) the capture is true or (b) HEAD hasn't wandered off into the
> badlands.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h       |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 48cf9dfbe4ac..b333cf4923bc 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -541,6 +541,7 @@ struct drm_i915_error_state {
>  		struct drm_i915_error_object {
>  			int page_count;
>  			u64 gtt_offset;
> +			u64 gtt_size;
>  			u32 *pages[0];
>  		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 76d63e047fae..ab0824b1ce6d 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -249,6 +249,13 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
>  	err_printf(m, "  IPEIR: 0x%08x\n", ring->ipeir);
>  	err_printf(m, "  IPEHR: 0x%08x\n", ring->ipehr);
>  	err_printf(m, "  INSTDONE: 0x%08x\n", ring->instdone);
> +	if (ring->batchbuffer) {
> +		u64 start = ring->batchbuffer->gtt_offset;
> +		u64 end = start + ring->batchbuffer->gtt_size;
> +		err_printf(m, "  batch: [0x%08x %08x, 0x%08x %08x]\n",
> +			   upper_32_bits(start), lower_32_bits(start),
> +			   upper_32_bits(end), lower_32_bits(end));

I'm not very fond of this 32/32 split done repeatedly and manually, but
it seems to be used already.
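
(If the repetition ever gets annoying, a tiny helper could hide it; purely
a hypothetical sketch, err_printf_u64() is a made-up name:

	static void err_printf_u64(struct drm_i915_error_state_buf *m,
				   const char *name, u64 value)
	{
		err_printf(m, "  %s: 0x%08x %08x\n", name,
			   upper_32_bits(value), lower_32_bits(value));
	}

Not something this patch needs to grow, though.)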

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> +	}
>  	if (INTEL_INFO(dev)->gen >= 4) {
>  		err_printf(m, "  BBADDR: 0x%08x %08x\n", (u32)(ring->bbaddr>>32), (u32)ring->bbaddr);
>  		err_printf(m, "  BB_STATE: 0x%08x\n", ring->bbstate);
> @@ -655,7 +662,10 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
>  	if (dst == NULL)
>  		return NULL;
>  
> -	reloc_offset = dst->gtt_offset = vma->node.start;
> +	dst->gtt_offset = vma->node.start;
> +	dst->gtt_size = vma->node.size;
> +
> +	reloc_offset = dst->gtt_offset;

This is completely unrelated fixup. You could split it with my R-b in
both.

>  	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
>  		   (vma->bound & GLOBAL_BIND) &&
>  		   reloc_offset + num_pages * PAGE_SIZE <= ggtt->mappable_end);
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write()
  2016-06-09 11:29 ` [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
  2016-06-09 11:39   ` Chris Wilson
@ 2016-06-09 12:47   ` Joonas Lahtinen
  1 sibling, 0 replies; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 12:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
> the backing storage for direct writes. It first serialises with the GPU,
> pins the backing storage and then indicates what clflushes are required in
> order for the writes to be coherent.
> 
> Whilst here, fix support for ancient CPUs without clflush for which we
> cannot do the GTT+clflush tricks.
> 

Could add here that WARN is kicked

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c |   2 +-
>  drivers/gpu/drm/i915/i915_drv.h        |   6 +-
>  drivers/gpu/drm/i915/i915_gem.c        | 119 +++++++++++++++++++++------------
>  3 files changed, 83 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index b0fd6a7b0603..de038da6c01b 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -970,7 +970,7 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
>  		       u32 batch_start_offset,
>  		       u32 batch_len)
>  {
> -	int needs_clflush = 0;
> +	unsigned needs_clflush;

Mildly irritated by the lingering allusion to a boolean. Not sure why not just "flags".

>  	void *src_base, *src;
>  	void *dst = NULL;
>  	int ret;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b333cf4923bc..ef321f84e5fc 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3026,7 +3026,11 @@ void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
>  void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
>  
>  int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> -				    int *needs_clflush);
> +				    unsigned *needs_clflush);
> +int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
> +				     unsigned *needs_clflush);
> +#define CLFLUSH_BEFORE 0x1
> +#define CLFLUSH_AFTER 0x2
>  
>  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a41fa01eb024..c6d06cb21191 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -563,34 +563,95 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
>   * flush the object from the CPU cache.
>   */
>  int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> -				    int *needs_clflush)
> +				    unsigned *needs_clflush)
>  {
>  	int ret;
>  
>  	*needs_clflush = 0;
> -
> -	if (WARN_ON((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0))
> +	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)

Why no more WARN?

>  		return -EINVAL;
>  
>  	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
> +		ret = i915_gem_object_wait_rendering(obj, true);
> +		if (ret)
> +			return ret;
> +
>  		/* If we're not in the cpu read domain, set ourself into the gtt
>  		 * read domain and manually flush cachelines (if required). This
>  		 * optimizes for the case when the gpu will dirty the data
>  		 * anyway again before the next pread happens. */
>  		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
>  							obj->cache_level);

Maybe add CLFLUSH_NONE and use ?:; this is deceiving now that it's not a
boolean, and that's why the name should be changed too.
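
For example, assuming a new CLFLUSH_NONE define (hypothetical, value 0,
with CLFLUSH_BEFORE being what the bare 1 means on the read side):

	*needs_clflush = cpu_cache_is_coherent(obj->base.dev,
					       obj->cache_level) ?
			 CLFLUSH_NONE : CLFLUSH_BEFORE;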

> -		ret = i915_gem_object_wait_rendering(obj, true);
> +	}
> +
> +	ret = i915_gem_object_get_pages(obj);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_object_pin_pages(obj);
> +
> +	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {

Your own comment applies here.

> +		ret = i915_gem_object_set_to_cpu_domain(obj, false);
> +		if (ret) {
> +			i915_gem_object_unpin_pages(obj);
> +			return ret;
> +		}
> +		*needs_clflush = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
> +				    unsigned *needs_clflush)
> +{
> +	int ret;
> +
> +	*needs_clflush = 0;
> +	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
> +		return -EINVAL;
> +
> +	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> +		ret = i915_gem_object_wait_rendering(obj, false);
>  		if (ret)
>  			return ret;
> +
> +		/* If we're not in the cpu write domain, set ourself into the
> +		 * gtt write domain and manually flush cachelines (as required).
> +		 * This optimizes for the case when the gpu will use the data
> +		 * right away and we therefore have to clflush anyway.
> +		 */
> +		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
>  	}
>  
> +	/* Same trick applies to invalidate partially written cachelines read
> +	 * before writing.
> +	 */
> +	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
> +		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
> +							 obj->cache_level);
> +
>  	ret = i915_gem_object_get_pages(obj);
>  	if (ret)
>  		return ret;
>  
>  	i915_gem_object_pin_pages(obj);
>  
> -	return ret;
> +	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
> +		ret = i915_gem_object_set_to_cpu_domain(obj, true);
> +		if (ret) {
> +			i915_gem_object_unpin_pages(obj);
> +			return ret;
> +		}
> +		*needs_clflush = 0;
> +	}

Relatively large common mid-section in these two, static func?
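
Something like this, say (rough untested sketch, helper name made up):

	static int __prepare_shmem_pin(struct drm_i915_gem_object *obj,
				       unsigned *needs_clflush, bool write)
	{
		int ret;

		ret = i915_gem_object_get_pages(obj);
		if (ret)
			return ret;

		i915_gem_object_pin_pages(obj);

		if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
			ret = i915_gem_object_set_to_cpu_domain(obj, write);
			if (ret) {
				i915_gem_object_unpin_pages(obj);
				return ret;
			}
			*needs_clflush = 0;
		}

		return 0;
	}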

Otherwise looks fine to me; with those explained or fixed:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

> +
> +	if ((*needs_clflush & CLFLUSH_AFTER) == 0)
> +		obj->cache_dirty = true;
> +
> +	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
> +	obj->dirty = 1;
> +	return 0;
> + }
>  
>  /* Per-page copy function for the shmem pread fastpath.
> @@ -999,41 +1060,17 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  	int shmem_page_offset, page_length, ret = 0;
>  	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
>  	int hit_slowpath = 0;
> -	int needs_clflush_after = 0;
> -	int needs_clflush_before = 0;
> +	unsigned needs_clflush;
>  	struct sg_page_iter sg_iter;
>  
> -	user_data = u64_to_user_ptr(args->data_ptr);
> -	remain = args->size;
> -
> -	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
> -
> -	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> -		/* If we're not in the cpu write domain, set ourself into the gtt
> -		 * write domain and manually flush cachelines (if required). This
> -		 * optimizes for the case when the gpu will use the data
> -		 * right away and we therefore have to clflush anyway. */
> -		needs_clflush_after = cpu_write_needs_clflush(obj);
> -		ret = i915_gem_object_wait_rendering(obj, false);
> -		if (ret)
> -			return ret;
> -	}
> -	/* Same trick applies to invalidate partially written cachelines read
> -	 * before writing. */
> -	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
> -		needs_clflush_before =
> -			!cpu_cache_is_coherent(dev, obj->cache_level);
> -
> -	ret = i915_gem_object_get_pages(obj);
> +	ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
>  	if (ret)
>  		return ret;
>  
> -	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
> -
> -	i915_gem_object_pin_pages(obj);
> -
> +	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
> +	user_data = u64_to_user_ptr(args->data_ptr);
>  	offset = args->offset;
> -	obj->dirty = 1;
> +	remain = args->size;
>  
>  	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
>  			 offset >> PAGE_SHIFT) {
> @@ -1057,7 +1094,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  		/* If we don't overwrite a cacheline completely we need to be
>  		 * careful to have up-to-date data by first clflushing. Don't
>  		 * overcomplicate things and flush the entire patch. */
> -		partial_cacheline_write = needs_clflush_before &&
> +		partial_cacheline_write = needs_clflush & CLFLUSH_BEFORE &&
>  			((shmem_page_offset | page_length)
>  				& (boot_cpu_data.x86_clflush_size - 1));
>  
> @@ -1067,7 +1104,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  		ret = shmem_pwrite_fast(page, shmem_page_offset, page_length,
>  					user_data, page_do_bit17_swizzling,
>  					partial_cacheline_write,
> -					needs_clflush_after);
> +					needs_clflush & CLFLUSH_AFTER);
>  		if (ret == 0)
>  			goto next_page;
>  
> @@ -1076,7 +1113,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  		ret = shmem_pwrite_slow(page, shmem_page_offset, page_length,
>  					user_data, page_do_bit17_swizzling,
>  					partial_cacheline_write,
> -					needs_clflush_after);
> +					needs_clflush & CLFLUSH_AFTER);
>  
>  		mutex_lock(&dev->struct_mutex);
>  
> @@ -1098,17 +1135,15 @@ out:
>  		 * cachelines in-line while writing and the object moved
>  		 * out of the cpu write domain while we've dropped the lock.
>  		 */
> -		if (!needs_clflush_after &&
> +		if (!(needs_clflush & CLFLUSH_AFTER) &&
>  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
>  			if (i915_gem_clflush_object(obj, obj->pin_display))
> -				needs_clflush_after = true;
> +				needs_clflush |= CLFLUSH_AFTER;
>  		}
>  	}
>  
> -	if (needs_clflush_after)
> +	if (needs_clflush & CLFLUSH_AFTER)
>  		i915_gem_chipset_flush(to_i915(dev));
> -	else
> -		obj->cache_dirty = true;
>  
>  	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
>  	return ret;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] drm/i915: Before accessing an object via the cpu, flush GTT writes
  2016-06-09 11:29 ` [PATCH 4/8] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
@ 2016-06-09 12:50   ` Joonas Lahtinen
  0 siblings, 0 replies; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 12:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> If we want to read the pages directly via the CPU, we have to be sure
> that we have to flush the writes via the GTT (as the CPU can not see
> the address aliasing).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c6d06cb21191..18b4a684ddde 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -571,6 +571,8 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>  	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
>  		return -EINVAL;
>  
> +	i915_gem_object_flush_gtt_write_domain(obj);
> +
>  	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
>  		ret = i915_gem_object_wait_rendering(obj, true);
>  		if (ret)
> @@ -611,6 +613,8 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
>  	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
>  		return -EINVAL;
>  
> +	i915_gem_object_flush_gtt_write_domain(obj);
> +
>  	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
>  		ret = i915_gem_object_wait_rendering(obj, false);
>  		if (ret)
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back
  2016-06-09 11:29 ` [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
  2016-06-09 11:36   ` Chris Wilson
@ 2016-06-09 12:54   ` Joonas Lahtinen
  1 sibling, 0 replies; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 12:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> If we quickly switch from writing through the GTT to a read of the
> physical page directly with the CPU (e.g. performing relocations through
> the GTT and then running the command parser), we can observe that the
> writes are not visible to the CPU. It is not a coherency problem, as
> extensive investigations with clflush have demonstrated, but a mere
> timing issue - we have to wait for the GTT to complete its write before
> we start our read from the CPU.
> 
> The issue can be illustrated in userspace with:
> 
> 	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
> 
> 	for (i = 0; i < OBJECT_SIZE / 64; i++) {
> 		int x = 16*i + (i%16);
> 		gtt[x] = i;
> 		clflush(&cpu[x], sizeof(cpu[x]));
> 		assert(cpu[x] == i);
> 	}
> 
> Experimenting with that shows that this behaviour is indeed limited to
> recent Atom-class hardware.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 18b4a684ddde..ffe3d3e9d69d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2898,20 +2898,30 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  static void
>  i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
>  {
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>  	uint32_t old_write_domain;
>  
>  	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
>  		return;
>  
>  	/* No actual flushing is required for the GTT write domain.  Writes
> -	 * to it immediately go to main memory as far as we know, so there's
> +	 * to it "immediately" go to main memory as far as we know, so there's
>  	 * no chipset flush.  It also doesn't land in render cache.
>  	 *
>  	 * However, we do have to enforce the order so that all writes through
>  	 * the GTT land before any writes to the device, such as updates to
>  	 * the GATT itself.
> +	 *
> +	 * We also have to wait a bit for the writes to land from the GTT.
> +	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
> +	 * timing. This issue has only been observed when switching quickly
> +	 * between GTT writes and CPU reads from inside the kernel on recent hw,
> +	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
> +	 * system agents we cannot reproduce this behaviour).

This screams for a Tested-by: tag before merging...

>  	 */
>  	wmb();
> +	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))

INTEL_GEN()
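
i.e. something like (sketch of the requested one-liner):

	if (INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv))
		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));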

With this fixed, and the Testcase: label added;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> +		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
>  
>  	old_write_domain = obj->base.write_domain;
>  	obj->base.write_domain = 0;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write
  2016-06-09 11:29 ` [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
@ 2016-06-09 13:06   ` Joonas Lahtinen
  2016-06-09 13:35     ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 13:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> There is an improbable, but not impossible, case that if we leave the
> pages unpinned as we operate on the object, then somebody may steal the
> lock and change the cache domains after we have already inspected them.
> 

Which lock exactly?

> (Whilst here, avail ourselves of the opportunity to take a couple of
> steps to make the two functions look more similar.)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 88 ++++++++++++++++++++++++-----------------
>  1 file changed, 51 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ffe3d3e9d69d..8c90b6a12d45 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -571,13 +571,22 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>  	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
>  		return -EINVAL;
>  
> +	ret = i915_gem_object_get_pages(obj);
> +	if (ret)
> +		return ret;
> +

There's still time for the pages to disappear at this point if somebody
is racing with us, and BUG_ON will be imminent. So I'm not sure which
lock you mean.

> +	i915_gem_object_pin_pages(obj);
> +
> +	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
> +		goto out;
> +
> +	ret = i915_gem_object_wait_rendering(obj, true);
> +	if (ret)
> +		goto err_unpin;
> +
>  	i915_gem_object_flush_gtt_write_domain(obj);
>  
>  	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
> -		ret = i915_gem_object_wait_rendering(obj, true);
> -		if (ret)
> -			return ret;
> -
>  		/* If we're not in the cpu read domain, set ourself into the gtt
>  		 * read domain and manually flush cachelines (if required). This
>  		 * optimizes for the case when the gpu will dirty the data
> @@ -586,26 +595,25 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>  							obj->cache_level);
>  	}
>  
> -	ret = i915_gem_object_get_pages(obj);
> -	if (ret)
> -		return ret;
> -
> -	i915_gem_object_pin_pages(obj);
> -
>  	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
>  		ret = i915_gem_object_set_to_cpu_domain(obj, false);
> -		if (ret) {
> -			i915_gem_object_unpin_pages(obj);
> -			return ret;
> -		}
> +		if (ret)
> +			goto err_unpin;
> +
>  		*needs_clflush = 0;
>  	}
>  
> +out:
> +	/* return with the pages pinned */
>  	return 0;
> +
> +err_unpin:
> +	i915_gem_object_unpin_pages(obj);
> +	return ret;
>  }
>  
>  int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
> -				    unsigned *needs_clflush)
> +				     unsigned *needs_clflush)
>  {
>  	int ret;
>  
> @@ -613,20 +621,27 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
>  	if ((obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE) == 0)
>  		return -EINVAL;
>  
> -	i915_gem_object_flush_gtt_write_domain(obj);
> +	ret = i915_gem_object_get_pages(obj);
> +	if (ret)
> +		return ret;
>  
> -	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> -		ret = i915_gem_object_wait_rendering(obj, false);
> -		if (ret)
> -			return ret;
> +	i915_gem_object_pin_pages(obj);
>  
> -		/* If we're not in the cpu write domain, set ourself into the
> -		 * gtt write domain and manually flush cachelines (as required).
> -		 * This optimizes for the case when the gpu will use the data
> -		 * right away and we therefore have to clflush anyway.
> -		 */
> -		*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
> -	}
> +	if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
> +		goto out;
> +
> +	ret = i915_gem_object_wait_rendering(obj, false);
> +	if (ret)
> +		goto err_unpin;
> +
> +	i915_gem_object_flush_gtt_write_domain(obj);
> +
> +	/* If we're not in the cpu write domain, set ourself into the
> +	 * gtt write domain and manually flush cachelines (as required).
> +	 * This optimizes for the case when the gpu will use the data
> +	 * right away and we therefore have to clflush anyway.
> +	 */
> +	*needs_clflush |= cpu_write_needs_clflush(obj) << 1;

Use ?: to keep the code readable.
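
e.g. (equivalent to the << 1, just more obvious at a glance):

	*needs_clflush |= cpu_write_needs_clflush(obj) ? CLFLUSH_AFTER : 0;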

>  
>  	/* Same trick applies to invalidate partially written cachelines read
>  	 * before writing.
> @@ -635,27 +650,26 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
>  		*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
>  							 obj->cache_level);
>  
> -	ret = i915_gem_object_get_pages(obj);
> -	if (ret)
> -		return ret;
> -
> -	i915_gem_object_pin_pages(obj);
> -
>  	if (*needs_clflush && !boot_cpu_has(X86_FEATURE_CLFLUSH)) {
>  		ret = i915_gem_object_set_to_cpu_domain(obj, true);
> -		if (ret) {
> -			i915_gem_object_unpin_pages(obj);
> -			return ret;
> -		}
> +		if (ret)
> +			goto err_unpin;
> +
>  		*needs_clflush = 0;
>  	}
>  
>  	if ((*needs_clflush & CLFLUSH_AFTER) == 0)
>  		obj->cache_dirty = true;
>  
> +out:
>  	intel_fb_obj_invalidate(obj, ORIGIN_CPU);
>  	obj->dirty = 1;
> +	/* return with the pages pinned */
>  	return 0;
> +
> +err_unpin:
> +	i915_gem_object_unpin_pages(obj);
> +	return ret;
>  }
>  
>  /* Per-page copy function for the shmem pread fastpath.
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/8] drm/i915: Tidy up flush cpu/gtt write domains
  2016-06-09 11:29 ` [PATCH 7/8] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
@ 2016-06-09 13:12   ` Joonas Lahtinen
  0 siblings, 0 replies; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 13:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> Since we know the write domain, we can drop the local variable and make
> the code look a tiny bit simpler.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 15 ++++-----------
>  1 file changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8c90b6a12d45..45f878350d66 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2913,7 +2913,6 @@ static void
>  i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> -	uint32_t old_write_domain;
>  
>  	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
>  		return;
> @@ -2937,36 +2936,30 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
>  	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))
>  		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
>  
> -	old_write_domain = obj->base.write_domain;
> -	obj->base.write_domain = 0;
> -
>  	intel_fb_obj_flush(obj, false, ORIGIN_GTT);
>  
> +	obj->base.write_domain = 0;
>  	trace_i915_gem_object_change_domain(obj,
>  					    obj->base.read_domains,
> -					    old_write_domain);
> +					    I915_GEM_DOMAIN_GTT);
>  }
>  
>  /** Flushes the CPU write domain for the object if it's dirty. */
>  static void
>  i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
>  {
> -	uint32_t old_write_domain;
> -
>  	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
>  		return;
>  
>  	if (i915_gem_clflush_object(obj, obj->pin_display))
>  		i915_gem_chipset_flush(to_i915(obj->base.dev));
>  
> -	old_write_domain = obj->base.write_domain;
> -	obj->base.write_domain = 0;
> -
>  	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
>  
> +	obj->base.write_domain = 0;
>  	trace_i915_gem_object_change_domain(obj,
>  					    obj->base.read_domains,
> -					    old_write_domain);
> +					    I915_GEM_DOMAIN_CPU);
>  }
>  
>  /**
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing
  2016-06-09 11:29 ` [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing Chris Wilson
@ 2016-06-09 13:31   ` Joonas Lahtinen
  2016-06-09 14:22     ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 13:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> With the introduction of the reloc page cache, we are just one step
> away from refactoring the relocation write functions into one. Not only
> does it tidy the code (slightly), but it greatly simplifies the control
> logic much to gcc's satisfaction.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 289 +++++++++++++++--------------
>  1 file changed, 150 insertions(+), 139 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 318c71b663f4..bda8ec8b02e6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -34,6 +34,8 @@
>  #include 
>  #include 
>  
> +#define DBG_USE_CPU_RELOC 0 /* force relocations to use the CPU write path */

Better connect to the new debug Kconfig menu you introduced?

> +
>  #define  __EXEC_OBJECT_HAS_PIN (1U<<31)
>  #define  __EXEC_OBJECT_HAS_FENCE (1U<<30)
>  #define  __EXEC_OBJECT_NEEDS_MAP (1U<<29)
> @@ -53,6 +55,7 @@ struct i915_execbuffer_params {
>  };
>  
>  struct eb_vmas {
> +	struct drm_i915_private *i915;
>  	struct list_head vmas;
>  	int and;
>  	union {
> @@ -62,7 +65,8 @@ struct eb_vmas {
>  };
>  
>  static struct eb_vmas *
> -eb_create(struct drm_i915_gem_execbuffer2 *args)
> +eb_create(struct drm_i915_private *i915,
> +	  struct drm_i915_gem_execbuffer2 *args)
>  {
>  	struct eb_vmas *eb = NULL;
>  
> @@ -89,6 +93,7 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
>  	} else
>  		eb->and = -args->buffer_count;
>  
> +	eb->i915 = i915;
>  	INIT_LIST_HEAD(&eb->vmas);
>  	return eb;
>  }
> @@ -272,7 +277,8 @@ static void eb_destroy(struct eb_vmas *eb)
>  
>  static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
>  {
> -	return (HAS_LLC(obj->base.dev) ||
> +	return (DBG_USE_CPU_RELOC ||
> +		HAS_LLC(obj->base.dev) ||
>  		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
>  		obj->cache_level != I915_CACHE_NONE);
>  }
> @@ -296,32 +302,58 @@ static inline uint64_t gen8_noncanonical_addr(uint64_t address)
>  }
>  
>  static inline uint64_t
> -relocation_target(struct drm_i915_gem_relocation_entry *reloc,
> +relocation_target(const struct drm_i915_gem_relocation_entry *reloc,

These const changes could go in as one bunch instead of being sprinkled around.
Unless we want to smuggle them in through the resistance.

>  		  uint64_t target_offset)
>  {
>  	return gen8_canonical_addr((int)reloc->delta + target_offset);
>  }
>  
>  struct reloc_cache {
> -	void *vaddr;
> +	struct drm_i915_private *i915;
> +	unsigned long vaddr;
>  	unsigned page;
> -	enum { KMAP, IOMAP } type;
> +	struct drm_mm_node node;
> +	bool use_64bit_reloc;
>  };
>  
> -static void reloc_cache_init(struct reloc_cache *cache)
> +static void reloc_cache_init(struct reloc_cache *cache,
> +			     struct drm_i915_private *i915)
>  {
>  	cache->page = -1;
> -	cache->vaddr = NULL;
> +	cache->vaddr = 0;
> +	cache->i915 = i915;
> +	cache->use_64bit_reloc = INTEL_GEN(cache->i915) >= 8;
> +}
> +
> +static inline void *unmask_page(unsigned long p)
> +{
> +	return (void *)(uintptr_t)(p & PAGE_MASK);
>  }
>  
> +static inline unsigned unmask_flags(unsigned long p)
> +{
> +	return p & ~PAGE_MASK;
> +}
> +
> +#define KMAP 0x4
> +
>  static void reloc_cache_fini(struct reloc_cache *cache)
>  {
> -	if (cache->vaddr == NULL)
> +	void *vaddr;
> +
> +	if (cache->vaddr == 0)
>  		return;
>  
> -	switch (cache->type) {
> -	case KMAP: kunmap_atomic(cache->vaddr); break;
> -	case IOMAP: io_mapping_unmap_atomic(cache->vaddr); break;
> +	vaddr = unmask_page(cache->vaddr);
> +	if (cache->vaddr & KMAP) {
> +		if (cache->vaddr & CLFLUSH_AFTER)
> +			mb();
> +
> +		kunmap_atomic(vaddr);
> +		i915_gem_object_unpin_pages((struct drm_i915_gem_object *)cache->node.mm);

Rather long line.

> +	} else {
> +		io_mapping_unmap_atomic(vaddr);
> +		i915_vma_unpin((struct i915_vma *)cache->node.mm);
>  	}
>  }
>  
> @@ -329,148 +361,138 @@ static void *reloc_kmap(struct drm_i915_gem_object *obj,
>  			struct reloc_cache *cache,
>  			int page)
>  {
> -	if (cache->page == page)
> -		return cache->vaddr;
> +	void *vaddr;
>  
> -	if (cache->vaddr)
> -		kunmap_atomic(cache->vaddr);
> +	if (cache->vaddr) {
> +		kunmap_atomic(unmask_page(cache->vaddr));
> +	} else {
> +		unsigned flushes;
> +		int ret;
> +
> +		ret = i915_gem_obj_prepare_shmem_write(obj, &flushes);

Yeah, needs_clflush is such a bad name earlier in the series that even you
don't use it here. "need_clflushes" or anything would be better.

> +		if (ret)
> +			return ERR_PTR(ret);
>  
> +		cache->vaddr = flushes | KMAP;
> +		cache->node.mm = (void *)obj;
> +		if (flushes)
> +			mb();
> +	}
> +
> +	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
> +	cache->vaddr = unmask_flags(cache->vaddr) | (unsigned long)vaddr;
>  	cache->page = page;
> -	cache->vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
> -	cache->type = KMAP;
>  
> -	return cache->vaddr;
> +	return vaddr;
>  }
>  
> -static int
> -relocate_entry_cpu(struct drm_i915_gem_object *obj,
> -		   struct drm_i915_gem_relocation_entry *reloc,
> -		   struct reloc_cache *cache,
> -		   uint64_t target_offset)
> +static void *reloc_iomap(struct drm_i915_gem_object *obj,
> +			 struct reloc_cache *cache,
> +			 int page)
>  {
> -	struct drm_device *dev = obj->base.dev;
> -	uint32_t page_offset = offset_in_page(reloc->offset);
> -	uint64_t delta = relocation_target(reloc, target_offset);
> -	char *vaddr;
> -	int ret;
> +	void *vaddr;
>  
> -	ret = i915_gem_object_set_to_cpu_domain(obj, true);
> -	if (ret)
> -		return ret;
> +	if (cache->vaddr) {
> +		io_mapping_unmap_atomic(unmask_page(cache->vaddr));
> +	} else {
> +		struct i915_vma *vma;
> +		int ret;
>  
> -	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
> -	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
> +		if (use_cpu_reloc(obj))
> +			return NULL;
>  
> -	if (INTEL_GEN(dev) >= 8) {
> -		page_offset += sizeof(uint32_t);
> -		if (page_offset == PAGE_SIZE) {
> -			vaddr = reloc_kmap(obj, cache, cache->page + 1);
> -			page_offset = 0;
> -		}
> -		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
> -	}
> +		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> +					       PIN_MAPPABLE | PIN_NONBLOCK);
> +		if (IS_ERR(vma))
> +			return NULL;
>  
> -	return 0;
> -}

Ugh, did you simultaneously move and rename functions? This is rather
hard to follow from this point on...

Regards, Joonas

> +		ret = i915_gem_object_set_to_gtt_domain(obj, true);
> +		if (ret)
> +			return ERR_PTR(ret);
>  
> -static void *reloc_iomap(struct drm_i915_private *i915,
> -			 struct reloc_cache *cache,
> -			 uint64_t offset)
> -{
> -	if (cache->page == offset >> PAGE_SHIFT)
> -		return cache->vaddr;
> +		ret = i915_gem_object_put_fence(obj);
> +		if (ret)
> +			return ERR_PTR(ret);
>  
> -	if (cache->vaddr)
> -		io_mapping_unmap_atomic(cache->vaddr);
> +		cache->node.start = vma->node.start;
> +		cache->node.mm = (void *)vma;
> +	}
>  
> -	cache->page = offset >> PAGE_SHIFT;
> -	cache->vaddr =
> -		io_mapping_map_atomic_wc(i915->ggtt.mappable,
> -					 offset & PAGE_MASK);
> -	cache->type = IOMAP;
> +	vaddr = io_mapping_map_atomic_wc(cache->i915->ggtt.mappable,
> +					 cache->node.start + (page << PAGE_SHIFT));
> +	cache->page = page;
> +	cache->vaddr = (unsigned long)vaddr;
>  
> -	return cache->vaddr;
> +	return vaddr;
>  }
>  
> -static int
> -relocate_entry_gtt(struct drm_i915_gem_object *obj,
> -		   struct drm_i915_gem_relocation_entry *reloc,
> -		   struct reloc_cache *cache,
> -		   uint64_t target_offset)
> +static void *reloc_vaddr(struct drm_i915_gem_object *obj,
> +			 struct reloc_cache *cache,
> +			 int page)
>  {
> -	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> -	struct i915_vma *vma;
> -	uint64_t delta = relocation_target(reloc, target_offset);
> -	uint64_t offset;
> -	void __iomem *reloc_page;
> -	int ret;
> -
> -	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
> -	if (IS_ERR(vma))
> -		return PTR_ERR(vma);
> -
> -	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> -	if (ret)
> -		goto unpin;
> -
> -	ret = i915_gem_object_put_fence(obj);
> -	if (ret)
> -		goto unpin;
> -
> -	/* Map the page containing the relocation we're going to perform.  */
> -	offset = vma->node.start;
> -	offset += reloc->offset;
> -	reloc_page = reloc_iomap(dev_priv, cache, offset);
> -	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
> +	void *vaddr;
>  
> -	if (INTEL_GEN(dev_priv) >= 8) {
> -		offset += sizeof(uint32_t);
> -		if (offset_in_page(offset) == 0)
> -			reloc_page = reloc_iomap(dev_priv, cache, offset);
> -		iowrite32(upper_32_bits(delta),
> -			  reloc_page + offset_in_page(offset));
> +	if (cache->page == page) {
> +		vaddr = unmask_page(cache->vaddr);
> +	} else {
> +		vaddr = NULL;
> +		if ((cache->vaddr & KMAP) == 0)
> +			vaddr = reloc_iomap(obj, cache, page);
> +		if (vaddr == NULL)
> +			vaddr = reloc_kmap(obj, cache, page);
>  	}
>  
> -unpin:
> -	__i915_vma_unpin(vma);
> -	return ret;
> +	return vaddr;
>  }
>  
>  static void
> -clflush_write32(void *addr, uint32_t value)
> +clflush_write32(uint32_t *addr, uint32_t value, unsigned flushes)
>  {
> -	/* This is not a fast path, so KISS. */
> -	drm_clflush_virt_range(addr, sizeof(uint32_t));
> -	*(uint32_t *)addr = value;
> -	drm_clflush_virt_range(addr, sizeof(uint32_t));
> +	if (unlikely(flushes & (CLFLUSH_BEFORE | CLFLUSH_AFTER))) {
> +		if (flushes & CLFLUSH_BEFORE) {
> +			clflushopt(addr);
> +			mb();
> +		}
> +
> +		*addr = value;
> +
> +		/* Writes to the same cacheline are serialised by the CPU
> +		 * (including clflush). On the write path, we only require
> +		 * that it hits memory in an orderly fashion and place
> +		 * mb barriers at the start and end of the relocation phase
> +		 * to ensure ordering of clflush wrt to the system.
> +		 */
> +		if (flushes & CLFLUSH_AFTER)
> +			clflushopt(addr);
> +	} else
> +		*addr = value;
>  }
>  
>  static int
> -relocate_entry_clflush(struct drm_i915_gem_object *obj,
> -		       struct drm_i915_gem_relocation_entry *reloc,
> -		       struct reloc_cache *cache,
> -		       uint64_t target_offset)
> +relocate_entry(struct drm_i915_gem_object *obj,
> +	       const struct drm_i915_gem_relocation_entry *reloc,
> +	       struct reloc_cache *cache,
> +	       uint64_t target_offset)
>  {
> -	struct drm_device *dev = obj->base.dev;
> -	uint32_t page_offset = offset_in_page(reloc->offset);
> -	uint64_t delta = relocation_target(reloc, target_offset);
> -	char *vaddr;
> -	int ret;
> +	uint64_t offset = reloc->offset;
> +	bool wide = cache->use_64bit_reloc;
> +	void *vaddr;
>  
> -	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> -	if (ret)
> -		return ret;
> +	target_offset = relocation_target(reloc, target_offset);
> +repeat:
> +	vaddr = reloc_vaddr(obj, cache, offset >> PAGE_SHIFT);
> +	if (IS_ERR(vaddr))
> +		return PTR_ERR(vaddr);
>  
> -	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
> -	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
> +	clflush_write32(vaddr + offset_in_page(offset),
> +			lower_32_bits(target_offset),
> +			cache->vaddr);
>  
> -	if (INTEL_GEN(dev) >= 8) {
> -		page_offset += sizeof(uint32_t);
> -		if (page_offset == PAGE_SIZE) {
> -			vaddr = reloc_kmap(obj, cache, cache->page + 1);
> -			page_offset = 0;
> -		}
> -		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
> +	if (wide) {
> +		offset += sizeof(uint32_t);
> +		target_offset >>= 32;
> +		wide = false;
> +		goto repeat;
>  	}
>  
>  	return 0;
> @@ -557,7 +579,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  
>  	/* Check that the relocation address is valid... */
>  	if (unlikely(reloc->offset >
> -		obj->base.size - (INTEL_INFO(dev)->gen >= 8 ? 8 : 4))) {
> +		     obj->base.size - (cache->use_64bit_reloc ? 8 : 4))) {
>  		DRM_DEBUG("Relocation beyond object bounds: "
>  			  "obj %p target %d offset %d size %d.\n",
>  			  obj, reloc->target_handle,
> @@ -577,23 +599,12 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  	if (pagefault_disabled() && !object_is_idle(obj))
>  		return -EFAULT;
>  
> -	if (use_cpu_reloc(obj))
> -		ret = relocate_entry_cpu(obj, reloc, cache, target_offset);
> -	else if (obj->map_and_fenceable)
> -		ret = relocate_entry_gtt(obj, reloc, cache, target_offset);
> -	else if (static_cpu_has(X86_FEATURE_CLFLUSH))
> -		ret = relocate_entry_clflush(obj, reloc, cache, target_offset);
> -	else {
> -		WARN_ONCE(1, "Impossible case in relocation handling\n");
> -		ret = -ENODEV;
> -	}
> -
> +	ret = relocate_entry(obj, reloc, cache, target_offset);
>  	if (ret)
>  		return ret;
>  
>  	/* and update the user's relocation entry */
>  	reloc->presumed_offset = target_offset;
> -
>  	return 0;
>  }
>  
> @@ -609,7 +620,7 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>  	int remain, ret = 0;
>  
>  	user_relocs = u64_to_user_ptr(entry->relocs_ptr);
> -	reloc_cache_init(&cache);
> +	reloc_cache_init(&cache, eb->i915);
>  
>  	remain = entry->relocation_count;
>  	while (remain) {
> @@ -658,7 +669,7 @@ i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
>  	struct reloc_cache cache;
>  	int i, ret = 0;
>  
> -	reloc_cache_init(&cache);
> +	reloc_cache_init(&cache, eb->i915);
>  	for (i = 0; i < entry->relocation_count; i++) {
>  		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i], &cache);
>  		if (ret)
> @@ -1073,8 +1084,8 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>  	if (flush_chipset)
>  		i915_gem_chipset_flush(req->engine->i915);
>  
> -	if (flush_domains & I915_GEM_DOMAIN_GTT)
> -		wmb();
> +	/* Make sure (untracked) CPU relocs/parsing are flushed */
> +	wmb();
>  
>  	/* Unconditionally invalidate gpu caches and TLBs. */
>  	return req->engine->emit_flush(req, I915_GEM_GPU_DOMAINS, 0);
> @@ -1606,7 +1617,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  
> 	memset(&params_master, 0x00, sizeof(params_master));
>  
> -	eb = eb_create(args);
> +	eb = eb_create(dev_priv, args);
>  	if (eb == NULL) {
>  		i915_gem_context_put(ctx);
>  		mutex_unlock(&dev->struct_mutex);
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write
  2016-06-09 13:06   ` Joonas Lahtinen
@ 2016-06-09 13:35     ` Chris Wilson
  2016-06-09 13:51       ` Joonas Lahtinen
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 13:35 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Jun 09, 2016 at 04:06:59PM +0300, Joonas Lahtinen wrote:
> On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> > There is an improbable, but not impossible, case that if we leave the
> > pages unpinned as we operate on the object, then somebody may steal the
> > lock and change the cache domains after we have already inspected them.
> > 
> 
> Which lock exactly?

The shrinker steals struct_mutex from underneath us. The guard is
pinning pages around operating on the object.
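
i.e. the guard is just (illustrative sketch, not the literal diff):

	ret = i915_gem_object_get_pages(obj);
	if (ret)
		return ret;

	i915_gem_object_pin_pages(obj);

	/* With the pages pinned, the shrinker cannot reap the object or
	 * change the cache domains we have already inspected, even if it
	 * grabs struct_mutex while we sleep.
	 */

	/* ... inspect domains, wait for rendering, do the access ... */

	i915_gem_object_unpin_pages(obj);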
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write
  2016-06-09 13:35     ` Chris Wilson
@ 2016-06-09 13:51       ` Joonas Lahtinen
  2016-06-09 14:13         ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Joonas Lahtinen @ 2016-06-09 13:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-06-09 at 14:35 +0100, Chris Wilson wrote:
> On Thu, Jun 09, 2016 at 04:06:59PM +0300, Joonas Lahtinen wrote:
> > 
> > On to, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> > > 
> > > There is an improbable, but not impossible, case that if we leave the
> > > pages unpinned as we operate on the object, then somebody may steal the
> > > lock and change the cache domains after we have already inspected them.
> > > 
> > Which lock exactly?
> The shrinker steals struct_mutex from underneath us. The guard is
> pinning pages around operating on the object.

Wouldn't the race scenario I described then apply (between get_pages
and pin_pages)?

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write
  2016-06-09 13:51       ` Joonas Lahtinen
@ 2016-06-09 14:13         ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 14:13 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Jun 09, 2016 at 04:51:41PM +0300, Joonas Lahtinen wrote:
> On Thu, 2016-06-09 at 14:35 +0100, Chris Wilson wrote:
> > On Thu, Jun 09, 2016 at 04:06:59PM +0300, Joonas Lahtinen wrote:
> > > 
> > > On Thu, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> > > > 
> > > > There is an improbable, but not impossible, case that if we leave the
> > > > pages unpinned as we operate on the object, then somebody may steal the
> > > > lock and change the cache domains after we have already inspected them.
> > > > 
> > > Which lock exactly?
> > The shrinker steals struct_mutex from underneath us. The guard is
> > pinning pages around operating on the object.
> 
> Wouldn't the race scenario I described then apply (between get_pages
> and pin_pages)?

Except that direct reclaim has no opportunity to run between them. I
have sent patches in the past to change the api so that the question
cannot even be raised (the only issue is a bit of ugliness where we
abuse the page pinning to work around unknown memory layouts), but
never pushed hard. In my recent patches for using pages outside of the
struct_mutex, closing that window makes it easier for the api to hide
the details of acquiring the pages + pinning them.
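
That is, something along the lines of (a hypothetical sketch of the
combined call, not the posted patches; today pin_pages() neither
acquires the pages nor returns an error):

	/* hypothetical combined api: acquiring the pages implies a pin,
	 * so there is never a window in which obj->pages exist but are
	 * unprotected from the shrinker
	 */
	err = i915_gem_object_pin_pages(obj);
	if (err)
		return err;

	/* ... operate on obj->pages, struct_mutex may come and go ... */

	i915_gem_object_unpin_pages(obj);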
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing
  2016-06-09 13:31   ` Joonas Lahtinen
@ 2016-06-09 14:22     ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-06-09 14:22 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Jun 09, 2016 at 04:31:25PM +0300, Joonas Lahtinen wrote:
> On Thu, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> > With the introduction of the reloc page cache, we are just one step
> > away from refactoring the relocation write functions into one. Not only
> > does it tidy the code (slightly), but it greatly simplifies the control
> > logic much to gcc's satisfaction.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 289 +++++++++++++++--------------
> >  1 file changed, 150 insertions(+), 139 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 318c71b663f4..bda8ec8b02e6 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -34,6 +34,8 @@
> >  #include 
> >  #include 
> >  
> > +#define DBG_USE_CPU_RELOC 0 /* force relocations to use the CPU write path */
> 
> Better connect to the new debug Kconfig menu you introduced?

I don't think so; it's not even intended as a general debug feature.
It's just needed to force certain code paths for ease of testing.

That becomes more important later on when we have fewer reasons to use
the cpu relocation path on !llc.
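
For reference, the intent is roughly this (a sketch of how the override
slots into the path selection; the body of use_cpu_reloc() below is
from memory, not lifted from the patch):

static bool use_cpu_reloc(struct drm_i915_gem_object *obj)
{
	if (DBG_USE_CPU_RELOC) /* compile-time override for testing */
		return DBG_USE_CPU_RELOC > 0;

	return HAS_LLC(obj->base.dev) ||
	       obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
	       obj->cache_level != I915_CACHE_NONE;
}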


> > -static int
> > -relocate_entry_cpu(struct drm_i915_gem_object *obj,
> > -		   struct drm_i915_gem_relocation_entry *reloc,
> > -		   struct reloc_cache *cache,
> > -		   uint64_t target_offset)
> > +static void *reloc_iomap(struct drm_i915_gem_object *obj,
> > +			 struct reloc_cache *cache,
> > +			 int page)
> >  {
> > -	struct drm_device *dev = obj->base.dev;
> > -	uint32_t page_offset = offset_in_page(reloc->offset);
> > -	uint64_t delta = relocation_target(reloc, target_offset);
> > -	char *vaddr;
> > -	int ret;
> > +	void *vaddr;
> >  
> > -	ret = i915_gem_object_set_to_cpu_domain(obj, true);
> > -	if (ret)
> > -		return ret;
> > +	if (cache->vaddr) {
> > +		io_mapping_unmap_atomic(unmask_page(cache->vaddr));
> > +	} else {
> > +		struct i915_vma *vma;
> > +		int ret;
> >  
> > -	vaddr = reloc_kmap(obj, cache, reloc->offset >> PAGE_SHIFT);
> > -	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
> > +		if (use_cpu_reloc(obj))
> > +			return NULL;
> >  
> > -	if (INTEL_GEN(dev) >= 8) {
> > -		page_offset += sizeof(uint32_t);
> > -		if (page_offset == PAGE_SIZE) {
> > -			vaddr = reloc_kmap(obj, cache, cache->page + 1);
> > -			page_offset = 0;
> > -		}
> > -		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
> > -	}
> > +		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
> > +					       PIN_MAPPABLE | PIN_NONBLOCK);
> > +		if (IS_ERR(vma))
> > +			return NULL;
> >  
> > -	return 0;
> > -}
> 
> Ugh, did you simultaneously move and rename functions? This is rather
> hard to follow from this point on...

No, it's a bunch of function removals. I have a feeling it is going to
look ugly in diff no matter what, unless diff stops trying to relate
code that doesn't actually correspond.
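
For what it's worth, the shape it collapses into is roughly this
(illustrative only; reloc_vaddr() stands in for "hand me a cached kmap
or iomap of that page" and is not the exact code in the patch):

static int relocate_entry(struct drm_i915_gem_object *obj,
			  struct drm_i915_gem_relocation_entry *reloc,
			  struct reloc_cache *cache,
			  u64 target_offset)
{
	u64 offset = reloc->offset;
	u64 value = relocation_target(reloc, target_offset);
	bool wide = INTEL_GEN(obj->base.dev) >= 8;
	void *vaddr;

repeat:
	vaddr = reloc_vaddr(obj, cache, offset >> PAGE_SHIFT);
	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

	/* write the low dword of the relocated address */
	*(u32 *)(vaddr + offset_in_page(offset)) = lower_32_bits(value);

	if (wide) {
		/* gen8+ stores a 64b address: write the high dword too,
		 * remapping if it happens to cross a page boundary
		 */
		offset += sizeof(u32);
		value >>= 32;
		wide = false;
		goto repeat;
	}

	return 0;
}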
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-06-09 14:22 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-09 11:29 Start tidying up execbuf relocations Chris Wilson
2016-06-09 11:29 ` [PATCH 1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
2016-06-09 12:29   ` Joonas Lahtinen
2016-06-09 11:29 ` [PATCH 2/8] drm/i915: Cache kmap between relocations Chris Wilson
2016-06-09 12:25   ` Joonas Lahtinen
2016-06-09 11:29 ` [PATCH 3/8] drm/i915: Extract i915_gem_obj_prepare_shmem_write() Chris Wilson
2016-06-09 11:39   ` Chris Wilson
2016-06-09 12:47   ` Joonas Lahtinen
2016-06-09 11:29 ` [PATCH 4/8] drm/i915: Before accessing an object via the cpu, flush GTT writes Chris Wilson
2016-06-09 12:50   ` Joonas Lahtinen
2016-06-09 11:29 ` [PATCH 5/8] drm/i915: Wait for writes through the GTT to land before reading back Chris Wilson
2016-06-09 11:36   ` Chris Wilson
2016-06-09 12:54   ` Joonas Lahtinen
2016-06-09 11:29 ` [PATCH 6/8] drm/i915: Pin the pages first in shmem prepare read/write Chris Wilson
2016-06-09 13:06   ` Joonas Lahtinen
2016-06-09 13:35     ` Chris Wilson
2016-06-09 13:51       ` Joonas Lahtinen
2016-06-09 14:13         ` Chris Wilson
2016-06-09 11:29 ` [PATCH 7/8] drm/i915: Tidy up flush cpu/gtt write domains Chris Wilson
2016-06-09 13:12   ` Joonas Lahtinen
2016-06-09 11:29 ` [PATCH 8/8] drm/i915: Refactor execbuffer relocation writing Chris Wilson
2016-06-09 13:31   ` Joonas Lahtinen
2016-06-09 14:22     ` Chris Wilson
2016-06-09 11:40 ` ✗ Ro.CI.BAT: failure for series starting with [1/8] drm/i915: Print the batchbuffer offset next to BBADDR in error state Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.