* [PATCH 1/5] agp/intel: Serialise after GTT updates
@ 2015-01-14 11:20 Chris Wilson
  2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Chris Wilson @ 2015-01-14 11:20 UTC (permalink / raw)
  To: intel-gfx

An interesting bug occurs on Pineview, the root cause of which is that
the writes of the PTE values into the GTT are not serialised with
subsequent memory accesses through the GTT. This is despite there being
a posting read after the GTT update. However, by changing the address of
the posting read, the memory access is indeed serialised correctly.

Whilst we are manipulating the memory barriers, we can remove the
compiler ":memory" constraint on the intermediate PTE writes, knowing
that we explicitly perform a posting read afterwards.
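
For illustration, a minimal sketch of the pattern the diff below adopts
(pte_encode() here is just a hypothetical stand-in for the per-chipset
write_entry() helpers):

	/* Relax the individual PTE writes... */
	for (i = pg_start; i < pg_start + mem->page_count; i++)
		writel_relaxed(pte_encode(mem, i), intel_private.gtt + i);

	/* ...then a single posting read from within the just-written
	 * range serialises the PTE updates with subsequent memory
	 * access through the GTT. */
	readl(intel_private.gtt + pg_start);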

Testcase: igt/gem_exec_big #pnv
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88191
Tested-by: huax.lu@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/char/agp/intel-gtt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 92aa43fa8d70..15685ca39193 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct agp_memory *mem, off_t pg_start,
 		intel_private.driver->write_entry(addr,
 						  i, type);
 	}
-	readl(intel_private.gtt+i-1);
+	readl(intel_private.gtt+pg_start);
 
 	return 0;
 }
@@ -329,7 +329,7 @@ static void i810_write_entry(dma_addr_t addr, unsigned int entry,
 		break;
 	}
 
-	writel(addr | pte_flags, intel_private.gtt + entry);
+	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
 static const struct aper_size_info_fixed intel_fake_agp_sizes[] = {
@@ -735,7 +735,7 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
 	if (flags ==  AGP_USER_CACHED_MEMORY)
 		pte_flags |= I830_PTE_SYSTEM_CACHED;
 
-	writel(addr | pte_flags, intel_private.gtt + entry);
+	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
 bool intel_enable_gtt(void)
@@ -858,7 +858,7 @@ void intel_gtt_insert_sg_entries(struct sg_table *st,
 			j++;
 		}
 	}
-	readl(intel_private.gtt+j-1);
+	readl(intel_private.gtt+pg_start);
 }
 EXPORT_SYMBOL(intel_gtt_insert_sg_entries);
 
@@ -875,7 +875,7 @@ static void intel_gtt_insert_pages(unsigned int first_entry,
 		intel_private.driver->write_entry(addr,
 						  j, flags);
 	}
-	readl(intel_private.gtt+j-1);
+	readl(intel_private.gtt+first_entry);
 }
 
 static int intel_fake_agp_insert_entries(struct agp_memory *mem,
@@ -938,7 +938,7 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
 		intel_private.driver->write_entry(intel_private.scratch_page_dma,
 						  i, 0);
 	}
-	readl(intel_private.gtt+i-1);
+	readl(intel_private.gtt+first_entry);
 }
 EXPORT_SYMBOL(intel_gtt_clear_range);
 
@@ -1106,7 +1106,7 @@ static void i965_write_entry(dma_addr_t addr,
 
 	/* Shift high bits down */
 	addr |= (addr >> 28) & 0xf0;
-	writel(addr | pte_flags, intel_private.gtt + entry);
+	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
 static int i9xx_setup(void)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
@ 2015-01-14 11:20 ` Chris Wilson
  2015-01-15  9:45   ` Daniel, Thomas
                     ` (2 more replies)
  2015-01-14 11:20 ` [PATCH 3/5] drm/i915: Trim the command parser allocations Chris Wilson
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 31+ messages in thread
From: Chris Wilson @ 2015-01-14 11:20 UTC (permalink / raw)
  To: intel-gfx

If the batch buffer is too large to fit into the aperture and we need a
GTT mapping for relocations, we currently fail. This only applies to a
subset of machines for a subset of environments, but is quite
undesirable. We can simply check, after failing to insert the batch into
the GTT, whether we only need a mappable binding for relocation and, if
so, revert to using a non-mappable binding and an alternate relocation
method. However, using relocate_entry_cpu() is excruciatingly slow for
large buffers on non-LLC machines as the entire buffer requires
clflushing before and after the relocation handling. Alternatively, we
can implement a third relocation method that only clflushes around the
relocation entry. This is still slower than updating through the GTT, so
we prefer the GTT where possible, but it is orders of magnitude faster
than relocate_entry_cpu() as we typically do not then have to clflush
the entire buffer.

An alternative idea of using a temporary WC mapping of the backing store
is promising (it should be faster than using the GTT itself), but it
requires fairly extensive arch/x86 support - along the lines of
kmap_atomic_prot_pfn() (which is not universally implemented even for
x86).
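
Condensed, the resulting choice of relocation method looks like the
following (a sketch of the hunk in i915_gem_execbuffer_relocate_entry()
below, with the rationale spelled out as comments):

	if (use_cpu_reloc(obj))
		/* LLC or CPU-domain objects: write via kmap */
		ret = relocate_entry_cpu(obj, reloc, target_offset);
	else if (obj->map_and_fenceable)
		/* bound within the mappable aperture: write via the GTT */
		ret = relocate_entry_gtt(obj, reloc, target_offset);
	else if (cpu_has_clflush)
		/* unmappable batch: clflush around just the entry */
		ret = relocate_entry_clflush(obj, reloc, target_offset);
	else
		ret = -ENODEV;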

Testcase: igt/gem_exec_big #pnv,byt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 84 +++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e3ef17783765..06bdf7670584 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -251,7 +251,6 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 {
 	return (HAS_LLC(obj->base.dev) ||
 		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
-		!obj->map_and_fenceable ||
 		obj->cache_level != I915_CACHE_NONE);
 }
 
@@ -337,6 +336,51 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
+static void
+clflush_write32(void *addr, uint32_t value)
+{
+	/* This is not a fast path, so KISS. */
+	drm_clflush_virt_range(addr, sizeof(uint32_t));
+	*(uint32_t *)addr = value;
+	drm_clflush_virt_range(addr, sizeof(uint32_t));
+}
+
+static int
+relocate_entry_clflush(struct drm_i915_gem_object *obj,
+		       struct drm_i915_gem_relocation_entry *reloc,
+		       uint64_t target_offset)
+{
+	struct drm_device *dev = obj->base.dev;
+	uint32_t page_offset = offset_in_page(reloc->offset);
+	uint64_t delta = (int)reloc->delta + target_offset;
+	char *vaddr;
+	int ret;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret)
+		return ret;
+
+	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
+				reloc->offset >> PAGE_SHIFT));
+	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
+
+	if (INTEL_INFO(dev)->gen >= 8) {
+		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
+
+		if (page_offset == 0) {
+			kunmap_atomic(vaddr);
+			vaddr = kmap_atomic(i915_gem_object_get_page(obj,
+			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
+		}
+
+		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
+	}
+
+	kunmap_atomic(vaddr);
+
+	return 0;
+}
+
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
@@ -426,9 +470,12 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 
 	if (use_cpu_reloc(obj))
 		ret = relocate_entry_cpu(obj, reloc, target_offset);
-	else
+	else if (obj->map_and_fenceable)
 		ret = relocate_entry_gtt(obj, reloc, target_offset);
-
+	else if (cpu_has_clflush)
+		ret = relocate_entry_clflush(obj, reloc, target_offset);
+	else
+		ret = -ENODEV;
 	if (ret)
 		return ret;
 
@@ -525,6 +572,12 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
 	return ret;
 }
 
+static bool only_mappable_for_reloc(unsigned int flags)
+{
+	return (flags & (EXEC_OBJECT_NEEDS_FENCE | __EXEC_OBJECT_NEEDS_MAP)) ==
+		__EXEC_OBJECT_NEEDS_MAP;
+}
+
 static int
 i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 				struct intel_engine_cs *ring,
@@ -536,14 +589,21 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
 	int ret;
 
 	flags = 0;
-	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
-		flags |= PIN_GLOBAL | PIN_MAPPABLE;
-	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
-		flags |= PIN_GLOBAL;
-	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
-		flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+	if (!drm_mm_node_allocated(&vma->node)) {
+		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
+			flags |= PIN_GLOBAL | PIN_MAPPABLE;
+		if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
+			flags |= PIN_GLOBAL;
+		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
+			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+	}
 
 	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
+	if ((ret == -ENOSPC  || ret == -E2BIG) &&
+	    only_mappable_for_reloc(entry->flags))
+		ret = i915_gem_object_pin(obj, vma->vm,
+					  entry->alignment,
+					  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
 	if (ret)
 		return ret;
 
@@ -605,13 +665,13 @@ eb_vma_misplaced(struct i915_vma *vma)
 	    vma->node.start & (entry->alignment - 1))
 		return true;
 
-	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
-		return true;
-
 	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
 	    vma->node.start < BATCH_OFFSET_BIAS)
 		return true;
 
+	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
+		return !only_mappable_for_reloc(entry->flags);
+
 	return false;
 }
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/5] drm/i915: Trim the command parser allocations
  2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
  2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
@ 2015-01-14 11:20 ` Chris Wilson
  2015-02-13 13:08   ` John Harrison
  2015-01-14 11:20 ` [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2015-01-14 11:20 UTC (permalink / raw)
  To: intel-gfx

Currently, the command parser tries to create a secondary batch exactly
as large as the original, and vmap both. This is open to abuse by
userspace using extremely large batch objects, but only executing very
short batches - for example, if userspace were to implement a command
submission ringbuffer. However, we only need to allocate pages for the
contents of the command sequence in the batch: all relocations copied to
the secondary batch will reference the original batch, and so there can
be no access to the secondary batch outside of the explicit execution
region.
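
As a worked example of the saving (illustrative numbers only, assuming a
4KiB page size), vmap_batch(obj, start, len) now maps only the pages
spanned by the commands:

	/* e.g. start = 0x1234, len = 0x2000:
	 *
	 *   first_page = 0x1234 >> PAGE_SHIFT                   = 1
	 *   last_page  = (0x2000 + 0x1234 + 4095) >> PAGE_SHIFT = 4
	 *   npages     = last_page - first_page                 = 3
	 *
	 * instead of obj->base.size >> PAGE_SHIFT pages for the whole
	 * (potentially huge) batch object.
	 */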

Testcase: igt/gem_exec_big #ivb,byt,hsw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 74 ++++++++++++++----------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 74 ++++++++++++++++--------------
 2 files changed, 73 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 806e812340d0..9a6da3536ae5 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -818,24 +818,26 @@ static bool valid_reg(const u32 *table, int count, u32 addr)
 	return false;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj)
+static u32 *vmap_batch(struct drm_i915_gem_object *obj,
+		       unsigned start, unsigned len)
 {
 	int i;
 	void *addr = NULL;
 	struct sg_page_iter sg_iter;
+	int first_page = start >> PAGE_SHIFT;
+	int last_page = (len + start + 4095) >> PAGE_SHIFT;
+	int npages = last_page - first_page;
 	struct page **pages;
 
-	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	pages = drm_malloc_ab(npages, sizeof(*pages));
 	if (pages == NULL) {
 		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
 		goto finish;
 	}
 
 	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
-		pages[i] = sg_page_iter_page(&sg_iter);
-		i++;
-	}
+	for_each_sg_page(obj->pages->sgl, &sg_iter, npages, first_page)
+		pages[i++] = sg_page_iter_page(&sg_iter);
 
 	addr = vmap(pages, i, 0, PAGE_KERNEL);
 	if (addr == NULL) {
@@ -855,61 +857,61 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		       u32 batch_start_offset,
 		       u32 batch_len)
 {
-	int ret = 0;
 	int needs_clflush = 0;
-	u32 *src_base, *dest_base = NULL;
-	u32 *src_addr, *dest_addr;
-	u32 offset = batch_start_offset / sizeof(*dest_addr);
-	u32 end = batch_start_offset + batch_len;
+	void *src_base, *src;
+	void *dst = NULL;
+	int ret;
 
-	if (end > dest_obj->base.size || end > src_obj->base.size)
+	if (batch_len > dest_obj->base.size ||
+	    batch_len + batch_start_offset > src_obj->base.size)
 		return ERR_PTR(-E2BIG);
 
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
 	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
+		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
 		return ERR_PTR(ret);
 	}
 
-	src_base = vmap_batch(src_obj);
+	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
 	if (!src_base) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
 		ret = -ENOMEM;
 		goto unpin_src;
 	}
 
-	src_addr = src_base + offset;
-
-	if (needs_clflush)
-		drm_clflush_virt_range((char *)src_addr, batch_len);
+	ret = i915_gem_object_get_pages(dest_obj);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to get pages for shadow batch\n");
+		goto unmap_src;
+	}
+	i915_gem_object_pin_pages(dest_obj);
 
 	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
 	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
+		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
 		goto unmap_src;
 	}
 
-	dest_base = vmap_batch(dest_obj);
-	if (!dest_base) {
+	dst = vmap_batch(dest_obj, 0, batch_len);
+	if (!dst) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
+		i915_gem_object_unpin_pages(dest_obj);
 		ret = -ENOMEM;
 		goto unmap_src;
 	}
 
-	dest_addr = dest_base + offset;
-
-	if (batch_start_offset != 0)
-		memset((u8 *)dest_base, 0, batch_start_offset);
+	src = src_base + offset_in_page(batch_start_offset);
+	if (needs_clflush)
+		drm_clflush_virt_range(src, batch_len);
 
-	memcpy(dest_addr, src_addr, batch_len);
-	memset((u8 *)dest_addr + batch_len, 0, dest_obj->base.size - end);
+	memcpy(dst, src, batch_len);
 
 unmap_src:
 	vunmap(src_base);
 unpin_src:
 	i915_gem_object_unpin_pages(src_obj);
 
-	return ret ? ERR_PTR(ret) : dest_base;
+	return ret ? ERR_PTR(ret) : dst;
 }
 
 /**
@@ -1046,34 +1048,26 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	int ret = 0;
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
-
-	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to pin shadow batch\n");
-		return -1;
-	}
+	int ret = 0;
 
 	batch_base = copy_batch(shadow_batch_obj, batch_obj,
 				batch_start_offset, batch_len);
 	if (IS_ERR(batch_base)) {
 		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		i915_gem_object_ggtt_unpin(shadow_batch_obj);
 		return PTR_ERR(batch_base);
 	}
 
-	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
-
 	/*
 	 * We use the batch length as size because the shadow object is as
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = cmd + (batch_len / sizeof(*batch_end));
+	batch_end = batch_base + (batch_len / sizeof(*batch_end));
 
+	cmd = batch_base;
 	while (cmd < batch_end) {
 		const struct drm_i915_cmd_descriptor *desc;
 		u32 length;
@@ -1132,7 +1126,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	}
 
 	vunmap(batch_base);
-	i915_gem_object_ggtt_unpin(shadow_batch_obj);
+	i915_gem_object_unpin_pages(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 06bdf7670584..3fbd5212225f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1136,16 +1136,15 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			  struct drm_i915_gem_object *batch_obj,
 			  u32 batch_start_offset,
 			  u32 batch_len,
-			  bool is_master,
-			  u32 *flags)
+			  bool is_master)
 {
 	struct drm_i915_private *dev_priv = to_i915(batch_obj->base.dev);
 	struct drm_i915_gem_object *shadow_batch_obj;
-	bool need_reloc = false;
+	struct i915_vma *vma;
 	int ret;
 
 	shadow_batch_obj = i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
-						   batch_obj->base.size);
+						   PAGE_ALIGN(batch_len));
 	if (IS_ERR(shadow_batch_obj))
 		return shadow_batch_obj;
 
@@ -1155,40 +1154,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			      batch_start_offset,
 			      batch_len,
 			      is_master);
-	if (ret) {
-		if (ret == -EACCES)
-			return batch_obj;
-	} else {
-		struct i915_vma *vma;
+	if (ret)
+		goto err;
 
-		memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
+	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
+	if (ret)
+		goto err;
 
-		vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
-		vma->exec_entry = shadow_exec_entry;
-		vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE;
-		drm_gem_object_reference(&shadow_batch_obj->base);
-		i915_gem_execbuffer_reserve_vma(vma, ring, &need_reloc);
-		list_add_tail(&vma->exec_list, &eb->vmas);
+	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
 
-		shadow_batch_obj->base.pending_read_domains =
-			batch_obj->base.pending_read_domains;
+	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
+	vma->exec_entry = shadow_exec_entry;
+	vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE | __EXEC_OBJECT_HAS_PIN;
+	drm_gem_object_reference(&shadow_batch_obj->base);
+	list_add_tail(&vma->exec_list, &eb->vmas);
 
-		/*
-		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
-		 * bit from MI_BATCH_BUFFER_START commands issued in the
-		 * dispatch_execbuffer implementations. We specifically
-		 * don't want that set when the command parser is
-		 * enabled.
-		 *
-		 * FIXME: with aliasing ppgtt, buffers that should only
-		 * be in ggtt still end up in the aliasing ppgtt. remove
-		 * this check when that is fixed.
-		 */
-		if (USES_FULL_PPGTT(dev))
-			*flags |= I915_DISPATCH_SECURE;
-	}
+	shadow_batch_obj->base.pending_read_domains = I915_GEM_DOMAIN_COMMAND;
+
+	return shadow_batch_obj;
 
-	return ret ? ERR_PTR(ret) : shadow_batch_obj;
+err:
+	if (ret == -EACCES) /* unhandled chained batch */
+		return batch_obj;
+	else
+		return ERR_PTR(ret);
 }
 
 int
@@ -1532,12 +1521,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 						      batch_obj,
 						      args->batch_start_offset,
 						      args->batch_len,
-						      file->is_master,
-						      &flags);
+						      file->is_master);
 		if (IS_ERR(batch_obj)) {
 			ret = PTR_ERR(batch_obj);
 			goto err;
 		}
+
+		/*
+		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
+		 * bit from MI_BATCH_BUFFER_START commands issued in the
+		 * dispatch_execbuffer implementations. We specifically
+		 * don't want that set when the command parser is
+		 * enabled.
+		 *
+		 * FIXME: with aliasing ppgtt, buffers that should only
+		 * be in ggtt still end up in the aliasing ppgtt. remove
+		 * this check when that is fixed.
+		 */
+		if (USES_FULL_PPGTT(dev))
+			flags |= I915_DISPATCH_SECURE;
+
+		exec_start = 0;
 	}
 
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
  2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
  2015-01-14 11:20 ` [PATCH 3/5] drm/i915: Trim the command parser allocations Chris Wilson
@ 2015-01-14 11:20 ` Chris Wilson
  2015-02-13 13:33   ` John Harrison
  2015-02-13 13:35   ` John Harrison
  2015-01-14 11:20 ` [PATCH 5/5] drm/i915: Tidy batch pool logic Chris Wilson
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Chris Wilson @ 2015-01-14 11:20 UTC (permalink / raw)
  To: intel-gfx

The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffer and relocations decreasing from
the end. However, the majority of consecutive lookups will still be in
the same page. We could also cache the start of the last sg chain, but
for most callers the entire sgl is inside a single chain, so we see no
improvement from that extra layer of caching.
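
In outline, the lookup now behaves like this (a simplified sketch of the
helper in the diff below; the chained-sg handling is omitted here):

	/* restart from the head of the sg list only on a backwards jump */
	if (n < obj->get_page.last) {
		obj->get_page.sg = obj->pages->sgl;
		obj->get_page.last = 0;
	}

	/* for mostly-ascending lookups this walks zero or one step */
	while (obj->get_page.last + sg_page_count(obj->get_page.sg) <= n) {
		obj->get_page.last += sg_page_count(obj->get_page.sg);
		obj->get_page.sg++;
	}

	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);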

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem.c |  4 ++++
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 66f0c607dbef..04a7d594d933 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2005,6 +2005,10 @@ struct drm_i915_gem_object {
 
 	struct sg_table *pages;
 	int pages_pin_count;
+	struct get_page {
+		struct scatterlist *sg;
+		int last;
+	} get_page;
 
 	/* prime dma-buf support */
 	void *dma_buf_vmapping;
@@ -2612,15 +2616,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 				    int *needs_clflush);
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
-static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
+
+static inline int sg_page_count(struct scatterlist *sg)
+{
+	return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT;
+}
+
+static inline struct page *
+i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
-	struct sg_page_iter sg_iter;
+	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
+		return NULL;
 
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, n)
-		return sg_page_iter_page(&sg_iter);
+	if (n < obj->get_page.last) {
+		obj->get_page.sg = obj->pages->sgl;
+		obj->get_page.last = 0;
+	}
+
+	while (obj->get_page.last + sg_page_count(obj->get_page.sg) <= n) {
+		obj->get_page.last += sg_page_count(obj->get_page.sg);
+		if (unlikely(sg_is_chain(++obj->get_page.sg)))
+			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
+	}
 
-	return NULL;
+	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
 }
+
 static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
 {
 	BUG_ON(obj->pages == NULL);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6c403654e33a..d710da099bdb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2260,6 +2260,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 		return ret;
 
 	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+
+	obj->get_page.sg = obj->pages->sgl;
+	obj->get_page.last = 0;
+
 	return 0;
 }
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/5] drm/i915: Tidy batch pool logic
  2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
                   ` (2 preceding siblings ...)
  2015-01-14 11:20 ` [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
@ 2015-01-14 11:20 ` Chris Wilson
  2015-01-14 20:54   ` shuang.he
  2015-02-13 14:00   ` John Harrison
  2015-01-26 10:47 ` [PATCH v2] agp/intel: Serialise after GTT updates Chris Wilson
  2015-02-06  0:11 ` [PATCH 1/5] " Jesse Barnes
  5 siblings, 2 replies; 31+ messages in thread
From: Chris Wilson @ 2015-01-14 11:20 UTC (permalink / raw)
  To: intel-gfx

Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 12 +++---------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c | 31 +++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++--------
 3 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 9a6da3536ae5..3e5e8cb54a88 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -866,6 +866,9 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 	    batch_len + batch_start_offset > src_obj->base.size)
 		return ERR_PTR(-E2BIG);
 
+	if (WARN_ON(dest_obj->pages_pin_count == 0))
+		return ERR_PTR(-ENODEV);
+
 	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
@@ -879,13 +882,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 		goto unpin_src;
 	}
 
-	ret = i915_gem_object_get_pages(dest_obj);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to get pages for shadow batch\n");
-		goto unmap_src;
-	}
-	i915_gem_object_pin_pages(dest_obj);
-
 	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
 	if (ret) {
 		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
@@ -895,7 +891,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
 	dst = vmap_batch(dest_obj, 0, batch_len);
 	if (!dst) {
 		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		i915_gem_object_unpin_pages(dest_obj);
 		ret = -ENOMEM;
 		goto unmap_src;
 	}
@@ -1126,7 +1121,6 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	}
 
 	vunmap(batch_base);
-	i915_gem_object_unpin_pages(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index c690170a1c4f..e5c88ddd8452 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -66,9 +66,7 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
 					 struct drm_i915_gem_object,
 					 batch_pool_list);
 
-		WARN_ON(obj->active);
-
-		list_del_init(&obj->batch_pool_list);
+		list_del(&obj->batch_pool_list);
 		drm_gem_object_unreference(&obj->base);
 	}
 }
@@ -96,10 +94,9 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
 
 	list_for_each_entry_safe(tmp, next,
-			&pool->cache_list, batch_pool_list) {
-
+				 &pool->cache_list, batch_pool_list) {
 		if (tmp->active)
-			continue;
+			break;
 
 		/* While we're looping, do some clean up */
 		if (tmp->madv == __I915_MADV_PURGED) {
@@ -113,25 +110,27 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		 * but not 'too much' bigger. A better way to do this
 		 * might be to bucket the pool objects based on size.
 		 */
-		if (tmp->base.size >= size &&
-		    tmp->base.size <= (2 * size)) {
+		if (tmp->base.size >= size && tmp->base.size <= 2 * size) {
 			obj = tmp;
 			break;
 		}
 	}
 
-	if (!obj) {
+	if (obj == NULL) {
+		int ret;
+
 		obj = i915_gem_alloc_object(pool->dev, size);
-		if (!obj)
+		if (obj == NULL)
 			return ERR_PTR(-ENOMEM);
 
-		list_add_tail(&obj->batch_pool_list, &pool->cache_list);
-	}
-	else
-		/* Keep list in LRU order */
-		list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+		ret = i915_gem_object_get_pages(obj);
+		if (ret)
+			return ERR_PTR(ret);
 
-	obj->madv = I915_MADV_WILLNEED;
+		obj->madv = I915_MADV_DONTNEED;
+	}
 
+	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
+	i915_gem_object_pin_pages(obj);
 	return obj;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3fbd5212225f..6357f01e6c46 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -37,7 +37,6 @@
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
 #define  __EXEC_OBJECT_NEEDS_MAP (1<<29)
 #define  __EXEC_OBJECT_NEEDS_BIAS (1<<28)
-#define  __EXEC_OBJECT_PURGEABLE (1<<27)
 
 #define BATCH_OFFSET_BIAS (256*1024)
 
@@ -224,12 +223,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
 		vma->pin_count--;
 
-	if (entry->flags & __EXEC_OBJECT_PURGEABLE)
-		obj->madv = I915_MADV_DONTNEED;
-
-	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE |
-			  __EXEC_OBJECT_HAS_PIN |
-			  __EXEC_OBJECT_PURGEABLE);
+	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
 }
 
 static void eb_destroy(struct eb_vmas *eb)
@@ -1154,6 +1148,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 			      batch_start_offset,
 			      batch_len,
 			      is_master);
+	i915_gem_object_unpin_pages(shadow_batch_obj);
 	if (ret)
 		goto err;
 
@@ -1165,7 +1160,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
 
 	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
 	vma->exec_entry = shadow_exec_entry;
-	vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE | __EXEC_OBJECT_HAS_PIN;
+	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	drm_gem_object_reference(&shadow_batch_obj->base);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] drm/i915: Tidy batch pool logic
  2015-01-14 11:20 ` [PATCH 5/5] drm/i915: Tidy batch pool logic Chris Wilson
@ 2015-01-14 20:54   ` shuang.he
  2015-02-13 14:00   ` John Harrison
  1 sibling, 0 replies; 31+ messages in thread
From: shuang.he @ 2015-01-14 20:54 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5581
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -1              353/353              352/353
ILK                                  353/353              353/353
SNB                                  400/422              400/422
IVB                                  487/487              487/487
BYT                                  296/296              296/296
HSW              +22                 486/508              508/508
BDW                 -1              402/402              401/402
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gen3_render_linear_blits      PASS(2, M25M23)      CRASH(1, M23)
 HSW  igt_kms_cursor_crc_cursor-size-change      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_kms_fence_pin_leak      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_kms_flip_event_leak      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_kms_mmio_vs_cs_flip_setcrtc_vs_cs_flip      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_kms_mmio_vs_cs_flip_setplane_vs_cs_flip      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_lpsp_non-edp      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_cursor      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_cursor-dpms      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_dpms-mode-unset-non-lpsp      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_dpms-non-lpsp      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_drm-resources-equal      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_fences      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_fences-dpms      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-execbuf      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-mmap-cpu      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-mmap-gtt      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_gem-pread      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_i2c      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_modeset-non-lpsp      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_modeset-non-lpsp-stress-no-wait      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_pci-d3-state      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
 HSW  igt_pm_rpm_rte      NSPT(1, M40)PASS(1, M20)      PASS(1, M20)
*BDW  igt_gem_concurrent_blit_gtt-rcs-early-read-interruptible      PASS(2, M30)      DMESG_WARN(1, M30)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
@ 2015-01-15  9:45   ` Daniel, Thomas
  2015-01-26  8:57   ` Jani Nikula
  2015-01-27 15:09   ` Daniel Vetter
  2 siblings, 0 replies; 31+ messages in thread
From: Daniel, Thomas @ 2015-01-15  9:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Chris Wilson
> Sent: Wednesday, January 14, 2015 11:21 AM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH 2/5] drm/i915: Fallback to using CPU relocations
> for large batch buffers
> 
> If the batch buffer is too large to fit into the aperture and we need a GTT
> mapping for relocations, we currently fail. This only applies to a subset of
> machines for a subset of environments, quite undesirable. We can simply
> check after failing to insert the batch into the GTT as to whether we only
> need a mappable binding for relocation and, if so, we can revert to using a
> non-mappable binding and an alternate relocation method. However, using
> relocate_entry_cpu() is excruciatingly slow for large buffers on non-LLC as
> the entire buffer requires clflushing before and after the relocation handling.
> Alternatively, we can implement a third relocation method that only clflushes
> around the relocation entry.
> This is still slower than updating through the GTT, so we prefer using the GTT
> where possible, but is orders of magnitude faster as we typically do not have
> to then clflush the entire buffer.
> 
> An alternative idea of using a temporary WC mapping of the backing store is
> promising (it should be faster than using the GTT itself), but requires fairly
> extensive arch/x86 support - along the lines of
> kmap_atomic_prof_pfn() (which is not universally implemented even for
> x86).
> 
> Testcase: igt/gem_exec_big #pnv,byt
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---

I was asked to look at the original version of this patch, so I did.  And I looked at this too.
All looks sensible.

Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>

 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
  2015-01-15  9:45   ` Daniel, Thomas
@ 2015-01-26  8:57   ` Jani Nikula
  2015-01-27 15:09   ` Daniel Vetter
  2 siblings, 0 replies; 31+ messages in thread
From: Jani Nikula @ 2015-01-26  8:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Wed, 14 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> If the batch buffer is too large to fit into the aperture and we need a
> GTT mapping for relocations, we currently fail. This only applies to a
> subset of machines for a subset of environments, quite undesirable. We
> can simply check after failing to insert the batch into the GTT as to
> whether we only need a mappable binding for relocation and, if so, we can
> revert to using a non-mappable binding and an alternate relocation
> method. However, using relocate_entry_cpu() is excruciatingly slow for
> large buffers on non-LLC as the entire buffer requires clflushing before
> and after the relocation handling. Alternatively, we can implement a
> third relocation method that only clflushes around the relocation entry.
> This is still slower than updating through the GTT, so we prefer using
> the GTT where possible, but is orders of magnitude faster as we
> typically do not have to then clflush the entire buffer.
>
> An alternative idea of using a temporary WC mapping of the backing store
> is promising (it should be faster than using the GTT itself), but
> requires fairly extensive arch/x86 support - along the lines of
> kmap_atomic_prof_pfn() (which is not universally implemented even for
> x86).
>
> Testcase: igt/gem_exec_big #pnv,byt
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392

Tested-by: lu hua <huax.lu@intel.com>

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 84 +++++++++++++++++++++++++-----
>  1 file changed, 72 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index e3ef17783765..06bdf7670584 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -251,7 +251,6 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
>  {
>  	return (HAS_LLC(obj->base.dev) ||
>  		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
> -		!obj->map_and_fenceable ||
>  		obj->cache_level != I915_CACHE_NONE);
>  }
>  
> @@ -337,6 +336,51 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
>  	return 0;
>  }
>  
> +static void
> +clflush_write32(void *addr, uint32_t value)
> +{
> +	/* This is not a fast path, so KISS. */
> +	drm_clflush_virt_range(addr, sizeof(uint32_t));
> +	*(uint32_t *)addr = value;
> +	drm_clflush_virt_range(addr, sizeof(uint32_t));
> +}
> +
> +static int
> +relocate_entry_clflush(struct drm_i915_gem_object *obj,
> +		       struct drm_i915_gem_relocation_entry *reloc,
> +		       uint64_t target_offset)
> +{
> +	struct drm_device *dev = obj->base.dev;
> +	uint32_t page_offset = offset_in_page(reloc->offset);
> +	uint64_t delta = (int)reloc->delta + target_offset;
> +	char *vaddr;
> +	int ret;
> +
> +	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> +	if (ret)
> +		return ret;
> +
> +	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
> +				reloc->offset >> PAGE_SHIFT));
> +	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
> +
> +	if (INTEL_INFO(dev)->gen >= 8) {
> +		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
> +
> +		if (page_offset == 0) {
> +			kunmap_atomic(vaddr);
> +			vaddr = kmap_atomic(i915_gem_object_get_page(obj,
> +			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
> +		}
> +
> +		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
> +	}
> +
> +	kunmap_atomic(vaddr);
> +
> +	return 0;
> +}
> +
>  static int
>  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  				   struct eb_vmas *eb,
> @@ -426,9 +470,12 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  
>  	if (use_cpu_reloc(obj))
>  		ret = relocate_entry_cpu(obj, reloc, target_offset);
> -	else
> +	else if (obj->map_and_fenceable)
>  		ret = relocate_entry_gtt(obj, reloc, target_offset);
> -
> +	else if (cpu_has_clflush)
> +		ret = relocate_entry_clflush(obj, reloc, target_offset);
> +	else
> +		ret = -ENODEV;
>  	if (ret)
>  		return ret;
>  
> @@ -525,6 +572,12 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
>  	return ret;
>  }
>  
> +static bool only_mappable_for_reloc(unsigned int flags)
> +{
> +	return (flags & (EXEC_OBJECT_NEEDS_FENCE | __EXEC_OBJECT_NEEDS_MAP)) ==
> +		__EXEC_OBJECT_NEEDS_MAP;
> +}
> +
>  static int
>  i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  				struct intel_engine_cs *ring,
> @@ -536,14 +589,21 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  	int ret;
>  
>  	flags = 0;
> -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> -		flags |= PIN_GLOBAL | PIN_MAPPABLE;
> -	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> -		flags |= PIN_GLOBAL;
> -	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> -		flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> +	if (!drm_mm_node_allocated(&vma->node)) {
> +		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> +			flags |= PIN_GLOBAL | PIN_MAPPABLE;
> +		if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> +			flags |= PIN_GLOBAL;
> +		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> +			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> +	}
>  
>  	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> +	if ((ret == -ENOSPC  || ret == -E2BIG) &&
> +	    only_mappable_for_reloc(entry->flags))
> +		ret = i915_gem_object_pin(obj, vma->vm,
> +					  entry->alignment,
> +					  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
>  	if (ret)
>  		return ret;
>  
> @@ -605,13 +665,13 @@ eb_vma_misplaced(struct i915_vma *vma)
>  	    vma->node.start & (entry->alignment - 1))
>  		return true;
>  
> -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> -		return true;
> -
>  	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
>  	    vma->node.start < BATCH_OFFSET_BIAS)
>  		return true;
>  
> +	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> +		return !only_mappable_for_reloc(entry->flags);
> +
>  	return false;
>  }
>  
> -- 
> 2.1.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2] agp/intel: Serialise after GTT updates
  2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
                   ` (3 preceding siblings ...)
  2015-01-14 11:20 ` [PATCH 5/5] drm/i915: Tidy batch pool logic Chris Wilson
@ 2015-01-26 10:47 ` Chris Wilson
  2015-01-27 14:58   ` Daniel Vetter
  2015-01-28  7:50   ` shuang.he
  2015-02-06  0:11 ` [PATCH 1/5] " Jesse Barnes
  5 siblings, 2 replies; 31+ messages in thread
From: Chris Wilson @ 2015-01-26 10:47 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

An interesting bug occurs on Pineview, the root cause of which is that
the writes of the PTE values into the GTT are not serialised with
subsequent memory accesses through the GTT (when using WC updates of the
PTE values). This is despite there being a posting read after the GTT
update. However, by changing the address of the posting read, the memory
access is indeed serialised correctly.

Whilst we are manipulating the memory barriers, we can remove the
compiler ":memory" constraint on the intermediate PTE writes, knowing
that we explicitly serialise afterwards.

v2: Replace posting reads with explicit write memory barriers - in
particular this is advantageous in the case of single-page objects.
Update comments to mention this issue is only with WC writes.
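
A condensed sketch of the v2 insert path (mirroring the diff below;
write_entry() is the per-chipset PTE writer already used here):

	for (i = 0; i < mem->page_count; i++)
		intel_private.driver->write_entry(page_to_phys(mem->pages[i]),
						  pg_start + i, type);
	wmb();	/* flush the WC writes before the GTT is used again */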

Testcase: igt/gem_exec_big #pnv
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88191
Tested-by: huax.lu@intel.com (v1)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/char/agp/intel-gtt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 92aa43fa8d70..0b4188b9af7c 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct agp_memory *mem, off_t pg_start,
 		intel_private.driver->write_entry(addr,
 						  i, type);
 	}
-	readl(intel_private.gtt+i-1);
+	wmb();
 
 	return 0;
 }
@@ -329,7 +329,7 @@ static void i810_write_entry(dma_addr_t addr, unsigned int entry,
 		break;
 	}
 
-	writel(addr | pte_flags, intel_private.gtt + entry);
+	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
 static const struct aper_size_info_fixed intel_fake_agp_sizes[] = {
@@ -735,7 +735,7 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
 	if (flags ==  AGP_USER_CACHED_MEMORY)
 		pte_flags |= I830_PTE_SYSTEM_CACHED;
 
-	writel(addr | pte_flags, intel_private.gtt + entry);
+	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
 bool intel_enable_gtt(void)
@@ -858,7 +858,7 @@ void intel_gtt_insert_sg_entries(struct sg_table *st,
 			j++;
 		}
 	}
-	readl(intel_private.gtt+j-1);
+	wmb();
 }
 EXPORT_SYMBOL(intel_gtt_insert_sg_entries);
 
@@ -875,7 +875,7 @@ static void intel_gtt_insert_pages(unsigned int first_entry,
 		intel_private.driver->write_entry(addr,
 						  j, flags);
 	}
-	readl(intel_private.gtt+j-1);
+	wmb();
 }
 
 static int intel_fake_agp_insert_entries(struct agp_memory *mem,
@@ -938,7 +938,7 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
 		intel_private.driver->write_entry(intel_private.scratch_page_dma,
 						  i, 0);
 	}
-	readl(intel_private.gtt+i-1);
+	wmb();
 }
 EXPORT_SYMBOL(intel_gtt_clear_range);
 
@@ -1106,7 +1106,7 @@ static void i965_write_entry(dma_addr_t addr,
 
 	/* Shift high bits down */
 	addr |= (addr >> 28) & 0xf0;
-	writel(addr | pte_flags, intel_private.gtt + entry);
+	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
 static int i9xx_setup(void)
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] agp/intel: Serialise after GTT updates
  2015-01-26 10:47 ` [PATCH v2] agp/intel: Serialise after GTT updates Chris Wilson
@ 2015-01-27 14:58   ` Daniel Vetter
  2015-01-27 21:44     ` Chris Wilson
  2015-01-28  7:50   ` shuang.he
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2015-01-27 14:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon, Jan 26, 2015 at 10:47:10AM +0000, Chris Wilson wrote:
> An interesting bug occurs on Pineview through which the root cause is
> that the writes of the PTE values into the GTT is not serialised with
> subsequent memory access through the GTT (when using WC updates of the
> PTE values). This is despite there being a posting read after the GTT
> update. However, by changing the address of the posting read, the memory
> access is indeed serialised correctly.
> 
> Whilst we are manipulating the memory barriers, we can remove the
> compiler :memory restraint on the intermediate PTE writes knowing that
> we explicitly perform a posting read afterwards.
> 
> v2: Replace posting reads with explicit write memory barriers - in
> particular this is advantages in case of single page objects. Update
> comments to mention this issue is only with WC writes.
> 
> Testcase: igt/gem_exec_big #pnv
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88191
> Tested-by: huax.lu@intel.com (v1)
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Shouldn't we Cc: stable@vger.kernel.org too?
-Daniel

> ---
>  drivers/char/agp/intel-gtt.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 92aa43fa8d70..0b4188b9af7c 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct agp_memory *mem, off_t pg_start,
>  		intel_private.driver->write_entry(addr,
>  						  i, type);
>  	}
> -	readl(intel_private.gtt+i-1);
> +	wmb();
>  
>  	return 0;
>  }
> @@ -329,7 +329,7 @@ static void i810_write_entry(dma_addr_t addr, unsigned int entry,
>  		break;
>  	}
>  
> -	writel(addr | pte_flags, intel_private.gtt + entry);
> +	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
>  }
>  
>  static const struct aper_size_info_fixed intel_fake_agp_sizes[] = {
> @@ -735,7 +735,7 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
>  	if (flags ==  AGP_USER_CACHED_MEMORY)
>  		pte_flags |= I830_PTE_SYSTEM_CACHED;
>  
> -	writel(addr | pte_flags, intel_private.gtt + entry);
> +	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
>  }
>  
>  bool intel_enable_gtt(void)
> @@ -858,7 +858,7 @@ void intel_gtt_insert_sg_entries(struct sg_table *st,
>  			j++;
>  		}
>  	}
> -	readl(intel_private.gtt+j-1);
> +	wmb();
>  }
>  EXPORT_SYMBOL(intel_gtt_insert_sg_entries);
>  
> @@ -875,7 +875,7 @@ static void intel_gtt_insert_pages(unsigned int first_entry,
>  		intel_private.driver->write_entry(addr,
>  						  j, flags);
>  	}
> -	readl(intel_private.gtt+j-1);
> +	wmb();
>  }
>  
>  static int intel_fake_agp_insert_entries(struct agp_memory *mem,
> @@ -938,7 +938,7 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
>  		intel_private.driver->write_entry(intel_private.scratch_page_dma,
>  						  i, 0);
>  	}
> -	readl(intel_private.gtt+i-1);
> +	wmb();
>  }
>  EXPORT_SYMBOL(intel_gtt_clear_range);
>  
> @@ -1106,7 +1106,7 @@ static void i965_write_entry(dma_addr_t addr,
>  
>  	/* Shift high bits down */
>  	addr |= (addr >> 28) & 0xf0;
> -	writel(addr | pte_flags, intel_private.gtt + entry);
> +	writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
>  }
>  
>  static int i9xx_setup(void)
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
  2015-01-15  9:45   ` Daniel, Thomas
  2015-01-26  8:57   ` Jani Nikula
@ 2015-01-27 15:09   ` Daniel Vetter
  2015-01-27 21:43     ` Chris Wilson
  2 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2015-01-27 15:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Jan 14, 2015 at 11:20:56AM +0000, Chris Wilson wrote:
> If the batch buffer is too large to fit into the aperture and we need a
> GTT mapping for relocations, we currently fail. This only applies to a
> subset of machines for a subset of environments, quite undesirable. We
> can simply check after failing to insert the batch into the GTT as to
> whether we only need a mappable binding for relocation and, if so, we can
> revert to using a non-mappable binding and an alternate relocation
> method. However, using relocate_entry_cpu() is excruciatingly slow for
> large buffers on non-LLC as the entire buffer requires clflushing before
> and after the relocation handling. Alternatively, we can implement a
> third relocation method that only clflushes around the relocation entry.
> This is still slower than updating through the GTT, so we prefer using
> the GTT where possible, but is orders of magnitude faster as we
> typically do not have to then clflush the entire buffer.
> 
> An alternative idea of using a temporary WC mapping of the backing store
> is promising (it should be faster than using the GTT itself), but
> requires fairly extensive arch/x86 support - along the lines of
> kmap_atomic_prof_pfn() (which is not universally implemented even for
> x86).
> 
> Testcase: igt/gem_exec_big #pnv,byt
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 84 +++++++++++++++++++++++++-----
>  1 file changed, 72 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index e3ef17783765..06bdf7670584 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -251,7 +251,6 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
>  {
>  	return (HAS_LLC(obj->base.dev) ||
>  		obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
> -		!obj->map_and_fenceable ||
>  		obj->cache_level != I915_CACHE_NONE);
>  }
>  
> @@ -337,6 +336,51 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
>  	return 0;
>  }
>  
> +static void
> +clflush_write32(void *addr, uint32_t value)
> +{
> +	/* This is not a fast path, so KISS. */
> +	drm_clflush_virt_range(addr, sizeof(uint32_t));
> +	*(uint32_t *)addr = value;
> +	drm_clflush_virt_range(addr, sizeof(uint32_t));
> +}
> +
> +static int
> +relocate_entry_clflush(struct drm_i915_gem_object *obj,
> +		       struct drm_i915_gem_relocation_entry *reloc,
> +		       uint64_t target_offset)
> +{
> +	struct drm_device *dev = obj->base.dev;
> +	uint32_t page_offset = offset_in_page(reloc->offset);
> +	uint64_t delta = (int)reloc->delta + target_offset;
> +	char *vaddr;
> +	int ret;
> +
> +	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> +	if (ret)
> +		return ret;
> +
> +	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
> +				reloc->offset >> PAGE_SHIFT));
> +	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
> +
> +	if (INTEL_INFO(dev)->gen >= 8) {
> +		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
> +
> +		if (page_offset == 0) {
> +			kunmap_atomic(vaddr);
> +			vaddr = kmap_atomic(i915_gem_object_get_page(obj,
> +			    (reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
> +		}
> +
> +		clflush_write32(vaddr + page_offset, upper_32_bits(delta));
> +	}
> +
> +	kunmap_atomic(vaddr);
> +
> +	return 0;
> +}
> +
>  static int
>  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  				   struct eb_vmas *eb,
> @@ -426,9 +470,12 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  
>  	if (use_cpu_reloc(obj))
>  		ret = relocate_entry_cpu(obj, reloc, target_offset);
> -	else
> +	else if (obj->map_and_fenceable)
>  		ret = relocate_entry_gtt(obj, reloc, target_offset);
> -
> +	else if (cpu_has_clflush)
> +		ret = relocate_entry_clflush(obj, reloc, target_offset);
> +	else
> +		ret = -ENODEV;

I think a DRM_ERROR here would be good since this should never happen. And
why don't we have an errno for -EKERNELBUG ...

>  	if (ret)
>  		return ret;
>  
> @@ -525,6 +572,12 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
>  	return ret;
>  }
>  
> +static bool only_mappable_for_reloc(unsigned int flags)
> +{
> +	return (flags & (EXEC_OBJECT_NEEDS_FENCE | __EXEC_OBJECT_NEEDS_MAP)) ==
> +		__EXEC_OBJECT_NEEDS_MAP;
> +}

I'm slightly freaked out by us encoding this here without making it
explicit in the flags. But I couldn't come up with a better idea?

> +
>  static int
>  i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  				struct intel_engine_cs *ring,
> @@ -536,14 +589,21 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>  	int ret;
>  
>  	flags = 0;
> -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> -		flags |= PIN_GLOBAL | PIN_MAPPABLE;
> -	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> -		flags |= PIN_GLOBAL;
> -	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> -		flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> +	if (!drm_mm_node_allocated(&vma->node)) {
> +		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> +			flags |= PIN_GLOBAL | PIN_MAPPABLE;
> +		if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> +			flags |= PIN_GLOBAL;
> +		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> +			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> +	}

Hm, aren't we always calling reserve on buffers we know we've just
unbound? Keeping the flags computation would at least be a good selfcheck
about the consistency of our placing decisions, so I'd like to keep it.

>  
>  	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> +	if ((ret == -ENOSPC  || ret == -E2BIG) &&
> +	    only_mappable_for_reloc(entry->flags))
> +		ret = i915_gem_object_pin(obj, vma->vm,
> +					  entry->alignment,
> +					  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
>  	if (ret)
>  		return ret;
>  
> @@ -605,13 +665,13 @@ eb_vma_misplaced(struct i915_vma *vma)
>  	    vma->node.start & (entry->alignment - 1))
>  		return true;
>  
> -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> -		return true;
> -
>  	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
>  	    vma->node.start < BATCH_OFFSET_BIAS)
>  		return true;
>  
> +	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> +		return !only_mappable_for_reloc(entry->flags);

Hm, this seems to contradict your commit message a bit since it'll result
in a behavioural change of what we try to push into mappable for relocs.

Shouldn't we instead just clear the NEEDS_MAP flag in reserve_vma when the
mappable pin fails and we fall back?
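
I.e. something along these lines (sketch only):

	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment,
				  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
	if (ret == 0)
		entry->flags &= ~__EXEC_OBJECT_NEEDS_MAP;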

Besides the nitpicks on integration lgtm.
-Daniel

> +
>  	return false;
>  }
>  
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-27 15:09   ` Daniel Vetter
@ 2015-01-27 21:43     ` Chris Wilson
  2015-01-28  9:14       ` Daniel Vetter
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2015-01-27 21:43 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Jan 27, 2015 at 04:09:37PM +0100, Daniel Vetter wrote:
> On Wed, Jan 14, 2015 at 11:20:56AM +0000, Chris Wilson wrote:
> >  static int
> >  i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> >  				struct intel_engine_cs *ring,
> > @@ -536,14 +589,21 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> >  	int ret;
> >  
> >  	flags = 0;
> > -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> > -		flags |= PIN_GLOBAL | PIN_MAPPABLE;
> > -	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> > -		flags |= PIN_GLOBAL;
> > -	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> > -		flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> > +	if (!drm_mm_node_allocated(&vma->node)) {
> > +		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> > +			flags |= PIN_GLOBAL | PIN_MAPPABLE;
> > +		if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> > +			flags |= PIN_GLOBAL;
> > +		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> > +			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> > +	}
> 
> Hm, aren't we always calling reserve on buffers we know we've just
> unbound? Keeping the flags computation would at least be a good selfcheck
> about the consistency of our placing decisions, so I'd like to keep it.
> 
> >  
> >  	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> > +	if ((ret == -ENOSPC  || ret == -E2BIG) &&
> > +	    only_mappable_for_reloc(entry->flags))
> > +		ret = i915_gem_object_pin(obj, vma->vm,
> > +					  entry->alignment,
> > +					  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -605,13 +665,13 @@ eb_vma_misplaced(struct i915_vma *vma)
> >  	    vma->node.start & (entry->alignment - 1))
> >  		return true;
> >  
> > -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> > -		return true;
> > -
> >  	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
> >  	    vma->node.start < BATCH_OFFSET_BIAS)
> >  		return true;
> >  
> > +	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> > +		return !only_mappable_for_reloc(entry->flags);
> 
> Hm, this seems to contradict your commit message a bit since it'll result
> in a behavioural change of what we try to push into mappable for relocs.
> 
> Shouldn't we instead just clear the NEEDS_MAP flag in reserve_vma when the
> mappable pin fails and we fall back?

This pair of chunks is required to prevent the ping-pong you just
described, which was very slow.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] agp/intel: Serialise after GTT updates
  2015-01-27 14:58   ` Daniel Vetter
@ 2015-01-27 21:44     ` Chris Wilson
  2015-01-28  9:15       ` Daniel Vetter
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2015-01-27 21:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx

On Tue, Jan 27, 2015 at 03:58:05PM +0100, Daniel Vetter wrote:
> On Mon, Jan 26, 2015 at 10:47:10AM +0000, Chris Wilson wrote:
> > An interesting bug occurs on Pineview through which the root cause is
> > that the writes of the PTE values into the GTT is not serialised with
> > subsequent memory access through the GTT (when using WC updates of the
> > PTE values). This is despite there being a posting read after the GTT
> > update. However, by changing the address of the posting read, the memory
> > access is indeed serialised correctly.
> > 
> > Whilst we are manipulating the memory barriers, we can remove the
> > compiler :memory restraint on the intermediate PTE writes knowing that
> > we explicitly perform a posting read afterwards.
> > 
> > v2: Replace posting reads with explicit write memory barriers - in
> > particular this is advantageous in case of single page objects. Update
> > comments to mention this issue is only with WC writes.
> > 
> > Testcase: igt/gem_exec_big #pnv
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88191
> > Tested-by: huax.lu@intel.com (v1)
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Shouldn't we Cc: stable@vger.kernel.org too?

Yes, if we can narrow down a user bug it fixes, definitely.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] agp/intel: Serialise after GTT updates
  2015-01-26 10:47 ` [PATCH v2] agp/intel: Serialise after GTT updates Chris Wilson
  2015-01-27 14:58   ` Daniel Vetter
@ 2015-01-28  7:50   ` shuang.he
  1 sibling, 0 replies; 31+ messages in thread
From: shuang.he @ 2015-01-28  7:50 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5643
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -1              353/353              352/353
ILK                                  353/353              353/353
SNB                                  400/422              400/422
IVB              +1-1              485/487              485/487
BYT                                  296/296              296/296
HSW              +1-2              507/508              506/508
BDW                 -3              401/402              398/402
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gem_concurrent_blit_cpu-bcs-overwrite-source-interruptible      PASS(2, M25)      NO_RESULT(1, M25)
 IVB  igt_gem_storedw_batches_loop_normal      DMESG_WARN(2, M34M4)PASS(5, M34M4M21)      PASS(1, M34)
*IVB  igt_gem_userptr_blits_forked-unsync-swapping-mempressure-interruptible      PASS(2, M34)      NO_RESULT(1, M34)
 HSW  igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance      DMESG_WARN(1, M40)PASS(7, M40M20)      PASS(1, M20)
 HSW  igt_gem_storedw_loop_vebox      DMESG_WARN(1, M20)PASS(2, M40M20)      DMESG_WARN(1, M20)
*HSW  igt_kms_flip_nonexisting-fb      PASS(2, M40M20)      NO_RESULT(1, M20)
*BDW  igt_template_B      PASS(2, M30M28)      NO_RESULT(1, M28)
 BDW  igt_gem_pwrite_pread_display-pwrite-blt-gtt_mmap-performance      DMESG_WARN(1, M28)PASS(1, M30)      DMESG_WARN(1, M28)
*BDW  igt_gem_pwrite_pread_uncached-pwrite-blt-gtt_mmap-performance      PASS(2, M30M28)      DMESG_WARN(1, M28)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-27 21:43     ` Chris Wilson
@ 2015-01-28  9:14       ` Daniel Vetter
  2015-01-28  9:34         ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2015-01-28  9:14 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Tue, Jan 27, 2015 at 09:43:00PM +0000, Chris Wilson wrote:
> On Tue, Jan 27, 2015 at 04:09:37PM +0100, Daniel Vetter wrote:
> > On Wed, Jan 14, 2015 at 11:20:56AM +0000, Chris Wilson wrote:
> > >  static int
> > >  i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> > >  				struct intel_engine_cs *ring,
> > > @@ -536,14 +589,21 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> > >  	int ret;
> > >  
> > >  	flags = 0;
> > > -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> > > -		flags |= PIN_GLOBAL | PIN_MAPPABLE;
> > > -	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> > > -		flags |= PIN_GLOBAL;
> > > -	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> > > -		flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> > > +	if (!drm_mm_node_allocated(&vma->node)) {
> > > +		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> > > +			flags |= PIN_GLOBAL | PIN_MAPPABLE;
> > > +		if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> > > +			flags |= PIN_GLOBAL;
> > > +		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> > > +			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> > > +	}
> > 
> > Hm, aren't we always calling reserve on buffers we know we've just
> > unbound? Keeping the flags computation would at least be a good selfcheck
> > about the consistency of our placing decisions, so I'd like to keep it.

I still think this isn't required, even with the ping-pong preventer below
kept. Maybe add a WARN_ON(drm_mm_node_allocated); just for paranoia
instead?

> > 
> > >  
> > >  	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> > > +	if ((ret == -ENOSPC  || ret == -E2BIG) &&
> > > +	    only_mappable_for_reloc(entry->flags))
> > > +		ret = i915_gem_object_pin(obj, vma->vm,
> > > +					  entry->alignment,
> > > +					  flags & ~(PIN_GLOBAL | PIN_MAPPABLE));
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > @@ -605,13 +665,13 @@ eb_vma_misplaced(struct i915_vma *vma)
> > >  	    vma->node.start & (entry->alignment - 1))
> > >  		return true;
> > >  
> > > -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> > > -		return true;
> > > -
> > >  	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS &&
> > >  	    vma->node.start < BATCH_OFFSET_BIAS)
> > >  		return true;
> > >  
> > > +	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> > > +		return !only_mappable_for_reloc(entry->flags);
> > 
> > Hm, this seems to contradict your commit message a bit since it'll result
> > in a behavioural change of what we try to push into mappable for relocs.
> > 
> > Shouldn't we instead just clear the NEEDS_MAP flag in reserve_vma when the
> > mappable pin fails and we fall back?
> 
> This pair of chunks is required to prevent the ping-pong you just
> described, which was very slow.

Makes sense, but imo then requires a comment (e.g. "prevent costly
ping-pong just for relocations") and some words in the commit message.
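
Roughly where and what I'd expect, just as a sketch:

	/* Avoid the costly ping-pong between mappable and !mappable:
	 * if the vma only needs the mappable aperture for relocations,
	 * don't consider a !mappable placement as misplaced.
	 */
	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
		return !only_mappable_for_reloc(entry->flags);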
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] agp/intel: Serialise after GTT updates
  2015-01-27 21:44     ` Chris Wilson
@ 2015-01-28  9:15       ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-01-28  9:15 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Daniel Vetter, Jani Nikula

On Tue, Jan 27, 2015 at 09:44:01PM +0000, Chris Wilson wrote:
> On Tue, Jan 27, 2015 at 03:58:05PM +0100, Daniel Vetter wrote:
> > On Mon, Jan 26, 2015 at 10:47:10AM +0000, Chris Wilson wrote:
> > > An interesting bug occurs on Pineview through which the root cause is
> > > that the writes of the PTE values into the GTT is not serialised with
> > > subsequent memory access through the GTT (when using WC updates of the
> > > PTE values). This is despite there being a posting read after the GTT
> > > update. However, by changing the address of the posting read, the memory
> > > access is indeed serialised correctly.
> > > 
> > > Whilst we are manipulating the memory barriers, we can remove the
> > > compiler :memory restraint on the intermediate PTE writes knowing that
> > > we explicitly perform a posting read afterwards.
> > > 
> > > v2: Replace posting reads with explicit write memory barriers - in
> > > particular this is advantageous in case of single page objects. Update
> > > comments to mention this issue is only with WC writes.
> > > 
> > > Testcase: igt/gem_exec_big #pnv
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88191
> > > Tested-by: huax.lu@intel.com (v1)
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Shouldn't we Cc: stable@vger.kernel.org too?
> 
> Yes, if we can narrow down a user bug it fixes, definitely.

Right, makes sense, so meanwhile queued for next only.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers
  2015-01-28  9:14       ` Daniel Vetter
@ 2015-01-28  9:34         ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2015-01-28  9:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Jan 28, 2015 at 10:14:47AM +0100, Daniel Vetter wrote:
> On Tue, Jan 27, 2015 at 09:43:00PM +0000, Chris Wilson wrote:
> > On Tue, Jan 27, 2015 at 04:09:37PM +0100, Daniel Vetter wrote:
> > > On Wed, Jan 14, 2015 at 11:20:56AM +0000, Chris Wilson wrote:
> > > >  static int
> > > >  i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> > > >  				struct intel_engine_cs *ring,
> > > > @@ -536,14 +589,21 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> > > >  	int ret;
> > > >  
> > > >  	flags = 0;
> > > > -	if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> > > > -		flags |= PIN_GLOBAL | PIN_MAPPABLE;
> > > > -	if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> > > > -		flags |= PIN_GLOBAL;
> > > > -	if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> > > > -		flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> > > > +	if (!drm_mm_node_allocated(&vma->node)) {
> > > > +		if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> > > > +			flags |= PIN_GLOBAL | PIN_MAPPABLE;
> > > > +		if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> > > > +			flags |= PIN_GLOBAL;
> > > > +		if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> > > > +			flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> > > > +	}
> > > 
> > > Hm, aren't we always calling reserve on buffers we know we've just
> > > unbound? Keeping the flags computation would at least be a good selfcheck
> > > about the consistency of our placing decisions, so I'd like to keep it.
> 
> I still think this isn't required, even with the ping-pong preventer below
> kept. Maybe add a WARN_ON(drm_mm_node_allocated); just for paranoia
> instead?

The issue I have with this chunk here is that we have two not quite
duplicate pieces of code deciding when to migrate an object:
i915_vma_misplaced() vs eb_vma_misplaced().

Maybe a __i915_vma_pin(), but as it stands atm
i915_gem_object_pin_view() requires some TLC first.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] agp/intel: Serialise after GTT updates
  2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
                   ` (4 preceding siblings ...)
  2015-01-26 10:47 ` [PATCH v2] agp/intel: Serialise after GTT updates Chris Wilson
@ 2015-02-06  0:11 ` Jesse Barnes
  2015-02-06  8:31   ` Chris Wilson
  2015-02-06  8:32   ` Daniel Vetter
  5 siblings, 2 replies; 31+ messages in thread
From: Jesse Barnes @ 2015-02-06  0:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, 14 Jan 2015 11:20:55 +0000
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> diff --git a/drivers/char/agp/intel-gtt.c
> b/drivers/char/agp/intel-gtt.c index 92aa43fa8d70..15685ca39193 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct
> agp_memory *mem, off_t pg_start,
> intel_private.driver->write_entry(addr, i, type);
>  	}
> -	readl(intel_private.gtt+i-1);
> +	readl(intel_private.gtt+pg_start);

Any idea why?  This one scares me...  is it that the read is being
serviced from the WC buffer w/o being flushed?  Or is the compiler
optimizing the last read based on the previous write?

Writing a non-sequential address should also cause a flush, but I don't
remember the rules for reads.  We should get this figured out while we
have an easy way to reproduce and a willing tester.

Thanks,
Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] agp/intel: Serialise after GTT updates
  2015-02-06  0:11 ` [PATCH 1/5] " Jesse Barnes
@ 2015-02-06  8:31   ` Chris Wilson
  2015-02-06  8:32   ` Daniel Vetter
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2015-02-06  8:31 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx

On Thu, Feb 05, 2015 at 04:11:00PM -0800, Jesse Barnes wrote:
> On Wed, 14 Jan 2015 11:20:55 +0000
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > diff --git a/drivers/char/agp/intel-gtt.c
> > b/drivers/char/agp/intel-gtt.c index 92aa43fa8d70..15685ca39193 100644
> > --- a/drivers/char/agp/intel-gtt.c
> > +++ b/drivers/char/agp/intel-gtt.c
> > @@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct
> > agp_memory *mem, off_t pg_start,
> > intel_private.driver->write_entry(addr, i, type);
> >  	}
> > -	readl(intel_private.gtt+i-1);
> > +	readl(intel_private.gtt+pg_start);
> 
> Any idea why?  This one scares me...  is it that the read is being
> serviced from the WC buffer w/o being flushed?  Or is the compiler
> optimizing the last read based on the previous write?

The former. I checked the assembly to verify that the compiler wasn't
playing tricks.
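
Condensed, the failure mode appears to be (comments are my reading of the
observed behaviour, not documented chipset rules):

	writel(pte, intel_private.gtt + i);	/* last PTE write, through the WC mapping */
	readl(intel_private.gtt + i);		/* posting read of the just-written entry:
						 * apparently serviced from the WC buffer
						 * without flushing it */
	readl(intel_private.gtt + pg_start);	/* reading a different entry forces the
						 * flush and serialises the GTT update */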
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] agp/intel: Serialise after GTT updates
  2015-02-06  0:11 ` [PATCH 1/5] " Jesse Barnes
  2015-02-06  8:31   ` Chris Wilson
@ 2015-02-06  8:32   ` Daniel Vetter
  2015-02-13  8:59     ` Ville Syrjälä
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Vetter @ 2015-02-06  8:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx

On Thu, Feb 05, 2015 at 04:11:00PM -0800, Jesse Barnes wrote:
> On Wed, 14 Jan 2015 11:20:55 +0000
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > diff --git a/drivers/char/agp/intel-gtt.c
> > b/drivers/char/agp/intel-gtt.c index 92aa43fa8d70..15685ca39193 100644
> > --- a/drivers/char/agp/intel-gtt.c
> > +++ b/drivers/char/agp/intel-gtt.c
> > @@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct
> > agp_memory *mem, off_t pg_start,
> > intel_private.driver->write_entry(addr, i, type);
> >  	}
> > -	readl(intel_private.gtt+i-1);
> > +	readl(intel_private.gtt+pg_start);
> 
> Any idea why?  This one scares me...  is it that the read is being
> serviced from the WC buffer w/o being flushed?  Or is the compiler
> optimizing the last read based on the previous write?
> 
> Writing a non-sequential address should also cause a flush, but I don't
> remember the rules for reads.  We should get this figured out while we
> have an easy way to reproduce and a willing tester.

Yeah agreed, but apparently a full mb(); is good enough too. So that's
what has landed.
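
I.e. the insert/clear paths now look roughly like this (a sketch with the
addresses elided, not the exact diff):

	for (i = first_entry; i < first_entry + num_entries; i++)
		intel_private.driver->write_entry(addr, i, flags);
	wmb();	/* flush WC writes to the PTEs before anyone uses the GTT */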
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] agp/intel: Serialise after GTT updates
  2015-02-06  8:32   ` Daniel Vetter
@ 2015-02-13  8:59     ` Ville Syrjälä
  2015-02-13  9:25       ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Ville Syrjälä @ 2015-02-13  8:59 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Feb 06, 2015 at 09:32:29AM +0100, Daniel Vetter wrote:
> On Thu, Feb 05, 2015 at 04:11:00PM -0800, Jesse Barnes wrote:
> > On Wed, 14 Jan 2015 11:20:55 +0000
> > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > 
> > > diff --git a/drivers/char/agp/intel-gtt.c
> > > b/drivers/char/agp/intel-gtt.c index 92aa43fa8d70..15685ca39193 100644
> > > --- a/drivers/char/agp/intel-gtt.c
> > > +++ b/drivers/char/agp/intel-gtt.c
> > > @@ -225,7 +225,7 @@ static int i810_insert_dcache_entries(struct
> > > agp_memory *mem, off_t pg_start,
> > > intel_private.driver->write_entry(addr, i, type);
> > >  	}
> > > -	readl(intel_private.gtt+i-1);
> > > +	readl(intel_private.gtt+pg_start);
> > 
> > Any idea why?  This one scares me...  is it that the read is being
> > serviced from the WC buffer w/o being flushed?  Or is the compiler
> > optimizing the last read based on the previous write?
> > 
> > Writing a non-sequential address should also cause a flush, but I don't
> > remember the rules for reads.  We should get this figured out while we
> > have an easy way to reproduce and a willing tester.
> 
> Yeah agreed, but apparently a full mb(); is good enough too. So that's
> what has landed.

I was first wondering if we need something like this for gen6+ too, but
then I realized the UC GFX_FLUSH_CNTL access should cause the WC flush
already.

Hmm, except we don't do it for clear_range(), but I guess that's not a
huge issue, just means someone could clobber some other memory besides
the scratch page if accidentally writing to a cleared area of the ggtt.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] agp/intel: Serialise after GTT updates
  2015-02-13  8:59     ` Ville Syrjälä
@ 2015-02-13  9:25       ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2015-02-13  9:25 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 10:59:42AM +0200, Ville Syrjälä wrote:
> Hmm, except we don't do it for clear_range(), but I guess that's not a
> huge issue, just means someone could clobber some other memory besides
> the scratch page if accidentally writing to a cleared area of the ggtt.

It's a very small hole, and since it is the ggtt the only user is the
kernel. (Except for i915.enable_ppgtt=0.) It is possible to forgo that
clear entirely (look at the PD implementation for a current violator,
and that is a good example of where the extra WC writes can have a
measurable impact on synmark).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/5] drm/i915: Trim the command parser allocations
  2015-01-14 11:20 ` [PATCH 3/5] drm/i915: Trim the command parser allocations Chris Wilson
@ 2015-02-13 13:08   ` John Harrison
  2015-02-13 13:23     ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: John Harrison @ 2015-02-13 13:08 UTC (permalink / raw)
  To: intel-gfx

Hello,

Apparently, I've been volunteered to review these patches despite not 
knowing too much about these areas of the driver...

On 14/01/2015 11:20, Chris Wilson wrote:
> Currently, the command parser tries to create a secondary batch exactly
> as large as the original, and vmap both. This is open to abuse by
> userspace using extremely large batch objects, but only executing very
> short batches. For example, this would be if userspace were to implement
> a command submission ringbuffer. However, we only need to allocate pages
> for just the contents of the command sequence in the batch - all
> relocations copied to the secondary batch will reference the original
> batch and so there can be no access to the secondary batch outside of
> the explicit execution region.
>
> Testcase: igt/gem_exec_big #ivb,byt,hsw
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88308
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_cmd_parser.c     | 74 ++++++++++++++----------------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 74 ++++++++++++++++--------------
>   2 files changed, 73 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 806e812340d0..9a6da3536ae5 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -818,24 +818,26 @@ static bool valid_reg(const u32 *table, int count, u32 addr)
>   	return false;
>   }
>   
> -static u32 *vmap_batch(struct drm_i915_gem_object *obj)
> +static u32 *vmap_batch(struct drm_i915_gem_object *obj,
> +		       unsigned start, unsigned len)
>   {
>   	int i;
>   	void *addr = NULL;
>   	struct sg_page_iter sg_iter;
> +	int first_page = start >> PAGE_SHIFT;
> +	int last_page = (len + start + 4095) >> PAGE_SHIFT;
> +	int npages = last_page - first_page;
>   	struct page **pages;
>   
> -	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
> +	pages = drm_malloc_ab(npages, sizeof(*pages));
>   	if (pages == NULL) {
>   		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
>   		goto finish;
>   	}
>   
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		pages[i] = sg_page_iter_page(&sg_iter);
> -		i++;
> -	}
> +	for_each_sg_page(obj->pages->sgl, &sg_iter, npages, first_page)
> +		pages[i++] = sg_page_iter_page(&sg_iter);
>   
>   	addr = vmap(pages, i, 0, PAGE_KERNEL);
>   	if (addr == NULL) {
> @@ -855,61 +857,61 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
>   		       u32 batch_start_offset,
>   		       u32 batch_len)
>   {
> -	int ret = 0;
>   	int needs_clflush = 0;
> -	u32 *src_base, *dest_base = NULL;
> -	u32 *src_addr, *dest_addr;
> -	u32 offset = batch_start_offset / sizeof(*dest_addr);
> -	u32 end = batch_start_offset + batch_len;
> +	void *src_base, *src;
> +	void *dst = NULL;
> +	int ret;
>   
> -	if (end > dest_obj->base.size || end > src_obj->base.size)
> +	if (batch_len > dest_obj->base.size ||
> +	    batch_len + batch_start_offset > src_obj->base.size)
>   		return ERR_PTR(-E2BIG);
>   
>   	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
>   	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
>   		return ERR_PTR(ret);
>   	}
>   
> -	src_base = vmap_batch(src_obj);
> +	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
>   	if (!src_base) {
>   		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
>   		ret = -ENOMEM;
>   		goto unpin_src;
>   	}
>   
> -	src_addr = src_base + offset;
> -
> -	if (needs_clflush)
> -		drm_clflush_virt_range((char *)src_addr, batch_len);
> +	ret = i915_gem_object_get_pages(dest_obj);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to get pages for shadow batch\n");
> +		goto unmap_src;
> +	}
> +	i915_gem_object_pin_pages(dest_obj);
>   
>   	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
>   	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> +		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
>   		goto unmap_src;
>   	}
>   
> -	dest_base = vmap_batch(dest_obj);
> -	if (!dest_base) {
> +	dst = vmap_batch(dest_obj, 0, batch_len);
> +	if (!dst) {
>   		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> +		i915_gem_object_unpin_pages(dest_obj);
>   		ret = -ENOMEM;
>   		goto unmap_src;
>   	}
>   
> -	dest_addr = dest_base + offset;
> -
> -	if (batch_start_offset != 0)
> -		memset((u8 *)dest_base, 0, batch_start_offset);
> +	src = src_base + offset_in_page(batch_start_offset);
> +	if (needs_clflush)
> +		drm_clflush_virt_range(src, batch_len);
>   
> -	memcpy(dest_addr, src_addr, batch_len);
> -	memset((u8 *)dest_addr + batch_len, 0, dest_obj->base.size - end);
> +	memcpy(dst, src, batch_len);
>   
>   unmap_src:
>   	vunmap(src_base);
>   unpin_src:
>   	i915_gem_object_unpin_pages(src_obj);
>   
> -	return ret ? ERR_PTR(ret) : dest_base;
> +	return ret ? ERR_PTR(ret) : dst;
>   }
>   
>   /**
> @@ -1046,34 +1048,26 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>   		    u32 batch_len,
>   		    bool is_master)
>   {
> -	int ret = 0;
>   	u32 *cmd, *batch_base, *batch_end;
>   	struct drm_i915_cmd_descriptor default_desc = { 0 };
>   	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
> -
> -	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to pin shadow batch\n");
> -		return -1;
> -	}
> +	int ret = 0;
>   
>   	batch_base = copy_batch(shadow_batch_obj, batch_obj,
>   				batch_start_offset, batch_len);
>   	if (IS_ERR(batch_base)) {
>   		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
> -		i915_gem_object_ggtt_unpin(shadow_batch_obj);
>   		return PTR_ERR(batch_base);
>   	}
>   
> -	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> -
>   	/*
>   	 * We use the batch length as size because the shadow object is as
>   	 * large or larger and copy_batch() will write MI_NOPs to the extra
>   	 * space. Parsing should be faster in some cases this way.
>   	 */
> -	batch_end = cmd + (batch_len / sizeof(*batch_end));
> +	batch_end = batch_base + (batch_len / sizeof(*batch_end));
>   
> +	cmd = batch_base;
>   	while (cmd < batch_end) {
>   		const struct drm_i915_cmd_descriptor *desc;
>   		u32 length;
> @@ -1132,7 +1126,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>   	}
>   
>   	vunmap(batch_base);
> -	i915_gem_object_ggtt_unpin(shadow_batch_obj);
> +	i915_gem_object_unpin_pages(shadow_batch_obj);
>   
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 06bdf7670584..3fbd5212225f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1136,16 +1136,15 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
>   			  struct drm_i915_gem_object *batch_obj,
>   			  u32 batch_start_offset,
>   			  u32 batch_len,
> -			  bool is_master,
> -			  u32 *flags)
> +			  bool is_master)
>   {
>   	struct drm_i915_private *dev_priv = to_i915(batch_obj->base.dev);
>   	struct drm_i915_gem_object *shadow_batch_obj;
> -	bool need_reloc = false;
> +	struct i915_vma *vma;
>   	int ret;
>   
>   	shadow_batch_obj = i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
> -						   batch_obj->base.size);
> +						   PAGE_ALIGN(batch_len));
>   	if (IS_ERR(shadow_batch_obj))
>   		return shadow_batch_obj;
>   
> @@ -1155,40 +1154,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
>   			      batch_start_offset,
>   			      batch_len,
>   			      is_master);
> -	if (ret) {
> -		if (ret == -EACCES)
> -			return batch_obj;
> -	} else {
> -		struct i915_vma *vma;
> +	if (ret)
> +		goto err;
>   
> -		memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
> +	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
There is no explicit unpin for this. Does it happen automatically due to 
adding the vma to the eb->vmas list?

Also, does it matter that it will be pinned again (and explicitly 
unpinned) if the SECURE flag is set?


> +	if (ret)
> +		goto err;
>   
> -		vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
> -		vma->exec_entry = shadow_exec_entry;
> -		vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE;
> -		drm_gem_object_reference(&shadow_batch_obj->base);
> -		i915_gem_execbuffer_reserve_vma(vma, ring, &need_reloc);
> -		list_add_tail(&vma->exec_list, &eb->vmas);
> +	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
>   
> -		shadow_batch_obj->base.pending_read_domains =
> -			batch_obj->base.pending_read_domains;
> +	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
> +	vma->exec_entry = shadow_exec_entry;
> +	vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE | __EXEC_OBJECT_HAS_PIN;
> +	drm_gem_object_reference(&shadow_batch_obj->base);
> +	list_add_tail(&vma->exec_list, &eb->vmas);
>   
> -		/*
> -		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
> -		 * bit from MI_BATCH_BUFFER_START commands issued in the
> -		 * dispatch_execbuffer implementations. We specifically
> -		 * don't want that set when the command parser is
> -		 * enabled.
> -		 *
> -		 * FIXME: with aliasing ppgtt, buffers that should only
> -		 * be in ggtt still end up in the aliasing ppgtt. remove
> -		 * this check when that is fixed.
> -		 */
> -		if (USES_FULL_PPGTT(dev))
> -			*flags |= I915_DISPATCH_SECURE;
> -	}
> +	shadow_batch_obj->base.pending_read_domains = I915_GEM_DOMAIN_COMMAND;
> +
> +	return shadow_batch_obj;
>   
> -	return ret ? ERR_PTR(ret) : shadow_batch_obj;
> +err:
> +	if (ret == -EACCES) /* unhandled chained batch */
> +		return batch_obj;
> +	else
> +		return ERR_PTR(ret);
>   }
>   
>   int
> @@ -1532,12 +1521,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   						      batch_obj,
>   						      args->batch_start_offset,
>   						      args->batch_len,
> -						      file->is_master,
> -						      &flags);
> +						      file->is_master);
>   		if (IS_ERR(batch_obj)) {
>   			ret = PTR_ERR(batch_obj);
>   			goto err;
>   		}
> +
> +		/*
> +		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
> +		 * bit from MI_BATCH_BUFFER_START commands issued in the
> +		 * dispatch_execbuffer implementations. We specifically
> +		 * don't want that set when the command parser is
> +		 * enabled.
> +		 *
> +		 * FIXME: with aliasing ppgtt, buffers that should only
> +		 * be in ggtt still end up in the aliasing ppgtt. remove
> +		 * this check when that is fixed.
> +		 */
> +		if (USES_FULL_PPGTT(dev))
> +			flags |= I915_DISPATCH_SECURE;
> +
> +		exec_start = 0;
>   	}
>   
>   	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/5] drm/i915: Trim the command parser allocations
  2015-02-13 13:08   ` John Harrison
@ 2015-02-13 13:23     ` Chris Wilson
  2015-02-13 16:43       ` John Harrison
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2015-02-13 13:23 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 01:08:59PM +0000, John Harrison wrote:
> >@@ -1155,40 +1154,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
> >  			      batch_start_offset,
> >  			      batch_len,
> >  			      is_master);
> >-	if (ret) {
> >-		if (ret == -EACCES)
> >-			return batch_obj;
> >-	} else {
> >-		struct i915_vma *vma;
> >+	if (ret)
> >+		goto err;
> >-		memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
> >+	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
> There is no explicit unpin for this. Does it happen automatically
> due to adding the vma to the eb->vmas list?

We set the exec_flag that tells us to unpin the obj when unwinding the
execbuf.
 
> Also, does it matter that it will be pinned again (and explicitly
> unpinned) if the SECURE flag is set?

No, pin/unpin is just a counter, it just needs to be balanced. (Long
answer, yes, the restrictions given to both pin requests must match or
else we will attempt to repin the buffer and fail miserably as the
object is already pinned.)
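
Illustrative only:

	i915_gem_obj_ggtt_pin(obj, 0, flags);	/* pin_count 0 -> 1, binds if needed */
	i915_gem_obj_ggtt_pin(obj, 0, flags);	/* 1 -> 2, only valid if the requests agree */
	i915_gem_object_ggtt_unpin(obj);	/* 2 -> 1 */
	i915_gem_object_ggtt_unpin(obj);	/* 1 -> 0, object may be moved again */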
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-01-14 11:20 ` [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
@ 2015-02-13 13:33   ` John Harrison
  2015-02-13 13:35   ` John Harrison
  1 sibling, 0 replies; 31+ messages in thread
From: John Harrison @ 2015-02-13 13:33 UTC (permalink / raw)
  To: intel-gfx

On 14/01/2015 11:20, Chris Wilson wrote:
> The biggest user of i915_gem_object_get_page() is the relocation
> processing during execbuffer. Typically userspace passes in a set of
> relocations in sorted order. Sadly, we alternate between relocations
> increasing from the start of the buffers, and relocations decreasing
> from the end. However the majority of consecutive lookups will still be
> in the same page. We could cache the start of the last sg chain, however
> for most callers, the entire sgl is inside a single chain and so we see
> no improve from the extra layer of caching.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
>   drivers/gpu/drm/i915/i915_gem.c |  4 ++++
>   2 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 66f0c607dbef..04a7d594d933 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2005,6 +2005,10 @@ struct drm_i915_gem_object {
>   
>   	struct sg_table *pages;
>   	int pages_pin_count;
> +	struct get_page {
> +		struct scatterlist *sg;
> +		int last;
> +	} get_page;
>   
>   	/* prime dma-buf support */
>   	void *dma_buf_vmapping;
> @@ -2612,15 +2616,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   				    int *needs_clflush);
>   
>   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> -static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
> +
> +static inline int sg_page_count(struct scatterlist *sg)
> +{
> +	return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT;
> +}
> +
> +static inline struct page *
> +i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
>   {
> -	struct sg_page_iter sg_iter;
> +	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
> +		return NULL;
>   
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, n)
> -		return sg_page_iter_page(&sg_iter);
> +	if (n < obj->get_page.last) {
> +		obj->get_page.sg = obj->pages->sgl;
> +		obj->get_page.last = 0;
> +	}
> +
> +	while (obj->get_page.last + sg_page_count(obj->get_page.sg) <= n) {
> +		obj->get_page.last += sg_page_count(obj->get_page.sg);
> +		if (unlikely(sg_is_chain(++obj->get_page.sg)))
Is it safe to do the ++ inside a nested pair of macros? There is at 
least one definition of 'unlikely' in the linux source that would cause 
multiple evaluations.
> +			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
> +	}
>   
> -	return NULL;
> +	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
>   }
> +
>   static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
>   {
>   	BUG_ON(obj->pages == NULL);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6c403654e33a..d710da099bdb 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2260,6 +2260,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
>   		return ret;
>   
>   	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> +
> +	obj->get_page.sg = obj->pages->sgl;
> +	obj->get_page.last = 0;
> +
>   	return 0;
>   }
>   

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-01-14 11:20 ` [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
  2015-02-13 13:33   ` John Harrison
@ 2015-02-13 13:35   ` John Harrison
  2015-02-13 14:28     ` Chris Wilson
  1 sibling, 1 reply; 31+ messages in thread
From: John Harrison @ 2015-02-13 13:35 UTC (permalink / raw)
  To: intel-gfx

Accidentally hit send too early, ignore the other reply!

On 14/01/2015 11:20, Chris Wilson wrote:
> The biggest user of i915_gem_object_get_page() is the relocation
> processing during execbuffer. Typically userspace passes in a set of
> relocations in sorted order. Sadly, we alternate between relocations
> increasing from the start of the buffers, and relocations decreasing
> from the end. However the majority of consecutive lookups will still be
> in the same page. We could cache the start of the last sg chain, however
> for most callers, the entire sgl is inside a single chain and so we see
> no improve from the extra layer of caching.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
>   drivers/gpu/drm/i915/i915_gem.c |  4 ++++
>   2 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 66f0c607dbef..04a7d594d933 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2005,6 +2005,10 @@ struct drm_i915_gem_object {
>   
>   	struct sg_table *pages;
>   	int pages_pin_count;
> +	struct get_page {
> +		struct scatterlist *sg;
> +		int last;
> +	} get_page;
>   
>   	/* prime dma-buf support */
>   	void *dma_buf_vmapping;
> @@ -2612,15 +2616,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   				    int *needs_clflush);
>   
>   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> -static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
> +
> +static inline int sg_page_count(struct scatterlist *sg)
> +{
> +	return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT;
Does this need to be rounded up or are sg->offset and sg->length 
guaranteed to always be a multiple of the page size?

> +}
> +
> +static inline struct page *
> +i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
>   {
> -	struct sg_page_iter sg_iter;
> +	if (WARN_ON(n >= obj->base.size >> PAGE_SHIFT))
> +		return NULL;
>   
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, n)
> -		return sg_page_iter_page(&sg_iter);
> +	if (n < obj->get_page.last) {
> +		obj->get_page.sg = obj->pages->sgl;
> +		obj->get_page.last = 0;
> +	}
> +
> +	while (obj->get_page.last + sg_page_count(obj->get_page.sg) <= n) {
> +		obj->get_page.last += sg_page_count(obj->get_page.sg);
> +		if (unlikely(sg_is_chain(++obj->get_page.sg)))
Is it safe to do the ++ inside a nested pair of macros? There is at 
least one definition of 'unlikely' in the linux source that would cause 
multiple evaluations.

> +			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
> +	}
>   
> -	return NULL;
> +	return nth_page(sg_page(obj->get_page.sg), n - obj->get_page.last);
>   }
> +
>   static inline void i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
>   {
>   	BUG_ON(obj->pages == NULL);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6c403654e33a..d710da099bdb 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2260,6 +2260,10 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
>   		return ret;
>   
>   	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> +
> +	obj->get_page.sg = obj->pages->sgl;
> +	obj->get_page.last = 0;
> +
>   	return 0;
>   }
>   

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] drm/i915: Tidy batch pool logic
  2015-01-14 11:20 ` [PATCH 5/5] drm/i915: Tidy batch pool logic Chris Wilson
  2015-01-14 20:54   ` shuang.he
@ 2015-02-13 14:00   ` John Harrison
  2015-02-13 14:57     ` Chris Wilson
  1 sibling, 1 reply; 31+ messages in thread
From: John Harrison @ 2015-02-13 14:00 UTC (permalink / raw)
  To: intel-gfx


On 14/01/2015 11:20, Chris Wilson wrote:
> Move the madvise logic out of the execbuffer main path into the
> relatively rare allocation path, making the execbuffer manipulation less
> fragile.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_cmd_parser.c     | 12 +++---------
>   drivers/gpu/drm/i915/i915_gem_batch_pool.c | 31 +++++++++++++++---------------
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++--------
>   3 files changed, 21 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 9a6da3536ae5..3e5e8cb54a88 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -866,6 +866,9 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
>   	    batch_len + batch_start_offset > src_obj->base.size)
>   		return ERR_PTR(-E2BIG);
>   
> +	if (WARN_ON(dest_obj->pages_pin_count == 0))
> +		return ERR_PTR(-ENODEV);
> +
>   	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
> @@ -879,13 +882,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
>   		goto unpin_src;
>   	}
>   
> -	ret = i915_gem_object_get_pages(dest_obj);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("CMD: Failed to get pages for shadow batch\n");
> -		goto unmap_src;
> -	}
> -	i915_gem_object_pin_pages(dest_obj);
> -
>   	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
> @@ -895,7 +891,6 @@ static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
>   	dst = vmap_batch(dest_obj, 0, batch_len);
>   	if (!dst) {
>   		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
> -		i915_gem_object_unpin_pages(dest_obj);
>   		ret = -ENOMEM;
>   		goto unmap_src;
>   	}
> @@ -1126,7 +1121,6 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
>   	}
>   
>   	vunmap(batch_base);
> -	i915_gem_object_unpin_pages(shadow_batch_obj);
>   
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> index c690170a1c4f..e5c88ddd8452 100644
> --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> @@ -66,9 +66,7 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
>   					 struct drm_i915_gem_object,
>   					 batch_pool_list);
>   
> -		WARN_ON(obj->active);
> -
> -		list_del_init(&obj->batch_pool_list);
> +		list_del(&obj->batch_pool_list);
>   		drm_gem_object_unreference(&obj->base);
>   	}
>   }
> @@ -96,10 +94,9 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
>   	WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));
>   
>   	list_for_each_entry_safe(tmp, next,
> -			&pool->cache_list, batch_pool_list) {
> -
> +				 &pool->cache_list, batch_pool_list) {
>   		if (tmp->active)
> -			continue;
> +			break;
>   
>   		/* While we're looping, do some clean up */
>   		if (tmp->madv == __I915_MADV_PURGED) {
> @@ -113,25 +110,27 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
>   		 * but not 'too much' bigger. A better way to do this
>   		 * might be to bucket the pool objects based on size.
>   		 */
> -		if (tmp->base.size >= size &&
> -		    tmp->base.size <= (2 * size)) {
> +		if (tmp->base.size >= size && tmp->base.size <= 2 * size) {
>   			obj = tmp;
>   			break;
>   		}
>   	}
>   
> -	if (!obj) {
> +	if (obj == NULL) {
> +		int ret;
> +
>   		obj = i915_gem_alloc_object(pool->dev, size);
> -		if (!obj)
> +		if (obj == NULL)
>   			return ERR_PTR(-ENOMEM);
>   
> -		list_add_tail(&obj->batch_pool_list, &pool->cache_list);
> -	}
> -	else
> -		/* Keep list in LRU order */
> -		list_move_tail(&obj->batch_pool_list, &pool->cache_list);
> +		ret = i915_gem_object_get_pages(obj);
> +		if (ret)
> +			return ERR_PTR(ret);
>   
> -	obj->madv = I915_MADV_WILLNEED;
> +		obj->madv = I915_MADV_DONTNEED;
> +	}
>   
> +	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
Why is it now safe to do a move_tail instead of add_tail if the node has 
just been allocated? Was the original add_tail() wrong or am I not 
spotting some critical difference to how new pool objects are created?
> +	i915_gem_object_pin_pages(obj);
Is it worth updating the function description comment to add a line 
about the returned buffer now being pinned and that the caller must worry 
about unpinning it?

>   	return obj;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 3fbd5212225f..6357f01e6c46 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -37,7 +37,6 @@
>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
>   #define  __EXEC_OBJECT_NEEDS_MAP (1<<29)
>   #define  __EXEC_OBJECT_NEEDS_BIAS (1<<28)
> -#define  __EXEC_OBJECT_PURGEABLE (1<<27)
>   
>   #define BATCH_OFFSET_BIAS (256*1024)
>   
> @@ -224,12 +223,7 @@ i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
>   	if (entry->flags & __EXEC_OBJECT_HAS_PIN)
>   		vma->pin_count--;
>   
> -	if (entry->flags & __EXEC_OBJECT_PURGEABLE)
> -		obj->madv = I915_MADV_DONTNEED;
> -
> -	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE |
> -			  __EXEC_OBJECT_HAS_PIN |
> -			  __EXEC_OBJECT_PURGEABLE);
> +	entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | __EXEC_OBJECT_HAS_PIN);
>   }
>   
>   static void eb_destroy(struct eb_vmas *eb)
> @@ -1154,6 +1148,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
>   			      batch_start_offset,
>   			      batch_len,
>   			      is_master);
> +	i915_gem_object_unpin_pages(shadow_batch_obj);
>   	if (ret)
>   		goto err;
>   
> @@ -1165,7 +1160,7 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
>   
>   	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
>   	vma->exec_entry = shadow_exec_entry;
> -	vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE | __EXEC_OBJECT_HAS_PIN;
> +	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
>   	drm_gem_object_reference(&shadow_batch_obj->base);
>   	list_add_tail(&vma->exec_list, &eb->vmas);
>   

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page()
  2015-02-13 13:35   ` John Harrison
@ 2015-02-13 14:28     ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2015-02-13 14:28 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 01:35:26PM +0000, John Harrison wrote:
> Accidentally hit send too early, ignore the other reply!
> 
> On 14/01/2015 11:20, Chris Wilson wrote:
> >The biggest user of i915_gem_object_get_page() is the relocation
> >processing during execbuffer. Typically userspace passes in a set of
> >relocations in sorted order. Sadly, we alternate between relocations
> >increasing from the start of the buffers, and relocations decreasing
> >from the end. However the majority of consecutive lookups will still be
> >in the same page. We could cache the start of the last sg chain, however
> >for most callers, the entire sgl is inside a single chain and so we see
> >no improve from the extra layer of caching.
> >
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_drv.h | 31 ++++++++++++++++++++++++++-----
> >  drivers/gpu/drm/i915/i915_gem.c |  4 ++++
> >  2 files changed, 30 insertions(+), 5 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 66f0c607dbef..04a7d594d933 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2005,6 +2005,10 @@ struct drm_i915_gem_object {
> >  	struct sg_table *pages;
> >  	int pages_pin_count;
> >+	struct get_page {
> >+		struct scatterlist *sg;
> >+		int last;
> >+	} get_page;
> >  	/* prime dma-buf support */
> >  	void *dma_buf_vmapping;
> >@@ -2612,15 +2616,32 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> >  				    int *needs_clflush);
> >  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> >-static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
> >+
> >+static inline int sg_page_count(struct scatterlist *sg)
> >+{
> >+	return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT;
> Does this need to be rounded up or are sg->offset and sg->length
> guaranteed to always be a multiple of the page size?

For our sg, sg->offset is always 0 and sg->length is always a whole
number of pages. I kept the generic version, but we could just do
sg->length >> PAGE_SHIFT.
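i.e. something like this (the i915-only helper name is invented here,
just to illustrate the shortcut):

static inline int sg_page_count(struct scatterlist *sg)
{
	/* generic: copes with a non-zero offset and a partial tail page */
	return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT;
}

/* i915-only shortcut, assuming sg->offset == 0 and sg->length is a
 * whole number of pages */
static inline int i915_sg_page_count(struct scatterlist *sg)
{
	return sg->length >> PAGE_SHIFT;
}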

> >+	while (obj->get_page.last + sg_page_count(obj->get_page.sg) <= n) {
> >+		obj->get_page.last += sg_page_count(obj->get_page.sg);
> >+		if (unlikely(sg_is_chain(++obj->get_page.sg)))
> Is it safe to do the ++ inside a nested pair of macros? There is at
> least one definition of 'unlikely' in the linux source that would
> cause multiple evaluations.

That's easy enough to fix.
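e.g. by pulling the increment out of the annotation, something like
this (a sketch only; the chain-following body is from memory, not the
exact patch):

	while (obj->get_page.last + sg_page_count(obj->get_page.sg) <= n) {
		obj->get_page.last += sg_page_count(obj->get_page.sg);
		obj->get_page.sg++;
		/* no side-effects inside unlikely() now */
		if (unlikely(sg_is_chain(obj->get_page.sg)))
			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
	}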
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 5/5] drm/i915: Tidy batch pool logic
  2015-02-13 14:00   ` John Harrison
@ 2015-02-13 14:57     ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2015-02-13 14:57 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 02:00:50PM +0000, John Harrison wrote:
> >+	list_move_tail(&obj->batch_pool_list, &pool->cache_list);
> Why is it now safe to do a move_tail instead of add_tail if the node
> has just been allocated? Was the original add_tail() wrong or am I
> not spotting some critical difference to how new pool objects are
> created?

The link is initialised in i915_gem_object_init(). It was always safe to
use list_move_tail.
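Roughly (paraphrasing the init site, not quoting it):

	/* in i915_gem_object_init() */
	INIT_LIST_HEAD(&obj->batch_pool_list);

	...

	/* list_move_tail() starts with a list_del() of the entry; on a
	 * self-linked (freshly initialised) node that is effectively a
	 * no-op, so this behaves like list_add_tail() for a new object
	 * and like a real move for one already on the cache_list. */
	list_move_tail(&obj->batch_pool_list, &pool->cache_list);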

> >+	i915_gem_object_pin_pages(obj);
> Is it worth updating the function description comment to add a line
> about the returned buffer now being pinned and that the caller must
> take care of unpinning it?

Didn't even spot that there was a function description. The other choice
is to just push the pinning into the caller, but the emphasis was on moving
get_pages() into the allocator, and so for consistency it should also
pin the pages. Will update.
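i.e. the expected caller pattern becomes something like (illustrative
only, variable names made up):

	obj = i915_gem_batch_pool_get(pool, size);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* the pool getter returned obj with its pages already pinned ... */

	/* ... use the pages ... */

	/* ... and the caller is responsible for dropping that pin */
	i915_gem_object_unpin_pages(obj);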

Now I just want to rename it from batch pool to buffer pool...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/5] drm/i915: Trim the command parser allocations
  2015-02-13 13:23     ` Chris Wilson
@ 2015-02-13 16:43       ` John Harrison
  2015-02-23 16:09         ` Daniel Vetter
  0 siblings, 1 reply; 31+ messages in thread
From: John Harrison @ 2015-02-13 16:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 13/02/2015 13:23, Chris Wilson wrote:
> On Fri, Feb 13, 2015 at 01:08:59PM +0000, John Harrison wrote:
>>> @@ -1155,40 +1154,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
>>>   			      batch_start_offset,
>>>   			      batch_len,
>>>   			      is_master);
>>> -	if (ret) {
>>> -		if (ret == -EACCES)
>>> -			return batch_obj;
>>> -	} else {
>>> -		struct i915_vma *vma;
>>> +	if (ret)
>>> +		goto err;
>>> -		memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
>>> +	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
>> There is no explicit unpin for this. Does it happen automatically
>> due to adding the vma to the eb->vmas list?
> We set the exec_flag that tells us to unpin the obj when unwinding the
> execbuf.
>   
>> Also, does it matter that it will be pinned again (and explicitly
>> unpinned) if the SECURE flag is set?
> No, pin/unpin is just a counter, it just needs to be balanced. (Long
> answer, yes, the restrictions given to both pin requests must match or
> else we will attempt to repin the buffer and fail miserably as the
> object is already pinned.)
> -Chris
>

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>


* Re: [PATCH 3/5] drm/i915: Trim the command parser allocations
  2015-02-13 16:43       ` John Harrison
@ 2015-02-23 16:09         ` Daniel Vetter
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Vetter @ 2015-02-23 16:09 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

On Fri, Feb 13, 2015 at 04:43:22PM +0000, John Harrison wrote:
> On 13/02/2015 13:23, Chris Wilson wrote:
> >On Fri, Feb 13, 2015 at 01:08:59PM +0000, John Harrison wrote:
> >>>@@ -1155,40 +1154,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *ring,
> >>>  			      batch_start_offset,
> >>>  			      batch_len,
> >>>  			      is_master);
> >>>-	if (ret) {
> >>>-		if (ret == -EACCES)
> >>>-			return batch_obj;
> >>>-	} else {
> >>>-		struct i915_vma *vma;
> >>>+	if (ret)
> >>>+		goto err;
> >>>-		memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
> >>>+	ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 0, 0);
> >>There is no explicit unpin for this. Does it happen automatically
> >>due to adding the vma to the eb->vmas list?
> >We set the exec_flag that tells us to unpin the obj when unwinding the
> >execbuf.
> >>Also, does it matter that it will be pinned again (and explicitly
> >>unpinned) if the SECURE flag is set?
> >No, pin/unpin is just a counter, it just needs to be balanced. (Long
> >answer, yes, the restrictions given to both pin requests must match or
> >else we will attempt to repin the buffer and fail miserably as the
> >object is already pinned.)
> >-Chris
> >
> 
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

end of thread, other threads:[~2015-02-23 16:08 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-14 11:20 [PATCH 1/5] agp/intel: Serialise after GTT updates Chris Wilson
2015-01-14 11:20 ` [PATCH 2/5] drm/i915: Fallback to using CPU relocations for large batch buffers Chris Wilson
2015-01-15  9:45   ` Daniel, Thomas
2015-01-26  8:57   ` Jani Nikula
2015-01-27 15:09   ` Daniel Vetter
2015-01-27 21:43     ` Chris Wilson
2015-01-28  9:14       ` Daniel Vetter
2015-01-28  9:34         ` Chris Wilson
2015-01-14 11:20 ` [PATCH 3/5] drm/i915: Trim the command parser allocations Chris Wilson
2015-02-13 13:08   ` John Harrison
2015-02-13 13:23     ` Chris Wilson
2015-02-13 16:43       ` John Harrison
2015-02-23 16:09         ` Daniel Vetter
2015-01-14 11:20 ` [PATCH 4/5] drm/i915: Cache last obj->pages location for i915_gem_object_get_page() Chris Wilson
2015-02-13 13:33   ` John Harrison
2015-02-13 13:35   ` John Harrison
2015-02-13 14:28     ` Chris Wilson
2015-01-14 11:20 ` [PATCH 5/5] drm/i915: Tidy batch pool logic Chris Wilson
2015-01-14 20:54   ` shuang.he
2015-02-13 14:00   ` John Harrison
2015-02-13 14:57     ` Chris Wilson
2015-01-26 10:47 ` [PATCH v2] agp/intel: Serialise after GTT updates Chris Wilson
2015-01-27 14:58   ` Daniel Vetter
2015-01-27 21:44     ` Chris Wilson
2015-01-28  9:15       ` Daniel Vetter
2015-01-28  7:50   ` shuang.he
2015-02-06  0:11 ` [PATCH 1/5] " Jesse Barnes
2015-02-06  8:31   ` Chris Wilson
2015-02-06  8:32   ` Daniel Vetter
2015-02-13  8:59     ` Ville Syrjälä
2015-02-13  9:25       ` Chris Wilson
