intel-gfx.lists.freedesktop.org archive mirror
* A bunch of random execbuffer patches
@ 2012-12-03 11:48 Chris Wilson
  2012-12-03 11:48 ` [PATCH 01/14] drm/i915: Move the get_pages assertions up to the right layer Chris Wilson
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:48 UTC (permalink / raw)
  To: intel-gfx

So Daniel likes the "Tighten the checks for invalid relocation domains"
patch so much that he wrote an i-g-t for it. Here is the patch to fix it -
only I had it inside a patch series that adjusts the execbuffer API to
reduce the relocation overhead, which in turn requires a couple of other
patches. Then I threw in a couple of patches that have been baking nearby
for good measure...

The last patch, which allows userspace to assume a known offset for a
particular bo, is both the most contentious and the one that pulls in a
bunch of the earlier patches - but those should be useful in their own
right.

Please review, and look at the proposed changes to execbuffer carefully.
-Chris

* [PATCH 01/14] drm/i915: Move the get_pages assertions up to the right layer
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
@ 2012-12-03 11:48 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 02/14] drm/i915: Decouple the object from the unbound list before freeing pages Chris Wilson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:48 UTC (permalink / raw)
  To: intel-gfx

A useful assertion resided inside the GTT-specific put_pages() backend,
but it is generally applicable, so hoist it up into the common
i915_gem_object_put_pages() layer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dfe7174..726bfc2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1653,8 +1653,6 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 	struct scatterlist *sg;
 	int ret, i;
 
-	BUG_ON(obj->madv == __I915_MADV_PURGED);
-
 	ret = i915_gem_object_set_to_cpu_domain(obj, true);
 	if (ret) {
 		/* In the event of a disaster, abandon all caches and
@@ -1697,6 +1695,7 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 		return 0;
 
 	BUG_ON(obj->gtt_space);
+	BUG_ON(obj->madv == __I915_MADV_PURGED);
 
 	if (obj->pages_pin_count)
 		return -EBUSY;
-- 
1.7.10.4

* [PATCH 02/14] drm/i915: Decouple the object from the unbound list before freeing pages
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
  2012-12-03 11:48 ` [PATCH 01/14] drm/i915: Move the get_pages assertions up to the right layer Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 16:16   ` Daniel Vetter
  2012-12-03 11:49 ` [PATCH 03/14] drm/i915: Bail if we attempt to allocate pages for a purged object Chris Wilson
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

As we may actually allocate memory in order to save the physical
swizzling bits during the free, we have to be careful not to trigger the
shrinker on the same object: if the shrinker runs whilst we allocate, it
walks the unbound list and may try to release the pages of the very
object we are freeing. So unlink the object from the unbound list before
calling put_pages().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 726bfc2..59202e4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1700,10 +1700,11 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 	if (obj->pages_pin_count)
 		return -EBUSY;
 
+	list_del(&obj->gtt_list);
+
 	ops->put_pages(obj);
 	obj->pages = NULL;
 
-	list_del(&obj->gtt_list);
 	if (i915_gem_object_is_purgeable(obj))
 		i915_gem_object_truncate(obj);
 
-- 
1.7.10.4

* [PATCH 03/14] drm/i915: Bail if we attempt to allocate pages for a purged object
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
  2012-12-03 11:48 ` [PATCH 01/14] drm/i915: Move the get_pages assertions up to the right layer Chris Wilson
  2012-12-03 11:49 ` [PATCH 02/14] drm/i915: Decouple the object from the unbound list before freeing pages Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 04/14] drm/i915: Defer the unbind for a fence change until the next get_fence() Chris Wilson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Move the existing check from inside bind_to_gtt() to the more
appropriate layer, get_pages(), in order to prevent the pages from being
recreated after they have been explicitly truncated.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 59202e4..f660f20 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1850,6 +1850,11 @@ i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
 	if (obj->pages)
 		return 0;
 
+	if (obj->madv != I915_MADV_WILLNEED) {
+		DRM_ERROR("Attempting to obtain a purgeable object\n");
+		return -EINVAL;
+	}
+
 	BUG_ON(obj->pages_pin_count);
 
 	ret = ops->get_pages(obj);
@@ -2442,7 +2447,7 @@ int
 i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 {
 	drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
-	int ret = 0;
+	int ret;
 
 	if (obj->gtt_space == NULL)
 		return 0;
@@ -2896,11 +2901,6 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	bool mappable, fenceable;
 	int ret;
 
-	if (obj->madv != I915_MADV_WILLNEED) {
-		DRM_ERROR("Attempting to bind a purgeable object\n");
-		return -EINVAL;
-	}
-
 	fence_size = i915_gem_get_gtt_size(dev,
 					   obj->base.size,
 					   obj->tiling_mode);
-- 
1.7.10.4

* [PATCH 04/14] drm/i915: Defer the unbind for a fence change until the next get_fence()
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (2 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 03/14] drm/i915: Bail if we attempt to allocate pages for a purged object Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 05/14] drm/i915: Avoid forcing relocations through the mappable GTT or CPU Chris Wilson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h        |    4 ++
 drivers/gpu/drm/i915/i915_gem.c        |   10 ++--
 drivers/gpu/drm/i915/i915_gem_tiling.c |   87 ++++++++------------------------
 3 files changed, 29 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e2944e9..32cfe23 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1437,6 +1437,10 @@ int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
 		      uint32_t handle, uint64_t *offset);
 int i915_gem_dumb_destroy(struct drm_file *file_priv, struct drm_device *dev,
 			  uint32_t handle);
+
+uint32_t i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode);
+uint32_t i915_gem_get_gtt_alignment(struct drm_device *dev, uint32_t size, int tiling_mode);
+
 /**
  * Returns true if seq1 is later than seq2.
  */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f660f20..987227b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1437,7 +1437,7 @@ i915_gem_release_mmap(struct drm_i915_gem_object *obj)
 	obj->fault_mappable = false;
 }
 
-static uint32_t
+uint32_t
 i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode)
 {
 	uint32_t gtt_size;
@@ -1465,7 +1465,7 @@ i915_gem_get_gtt_size(struct drm_device *dev, uint32_t size, int tiling_mode)
  * Return the required GTT alignment for an object, taking into account
  * potential fence register mapping.
  */
-static uint32_t
+uint32_t
 i915_gem_get_gtt_alignment(struct drm_device *dev,
 			   uint32_t size,
 			   int tiling_mode)
@@ -2484,13 +2484,13 @@ i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 
 	list_del(&obj->mm_list);
 	list_move_tail(&obj->gtt_list, &dev_priv->mm.unbound_list);
-	/* Avoid an unnecessary call to unbind on rebind. */
-	obj->map_and_fenceable = true;
 
 	drm_mm_put_block(obj->gtt_space);
 	obj->gtt_space = NULL;
 	obj->gtt_offset = 0;
 
+	obj->map_and_fenceable = false;
+
 	return 0;
 }
 
@@ -3689,8 +3689,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 
 	obj->fence_reg = I915_FENCE_REG_NONE;
 	obj->madv = I915_MADV_WILLNEED;
-	/* Avoid an unnecessary call to unbind on the first bind. */
-	obj->map_and_fenceable = true;
 
 	i915_gem_info_add_obj(obj->base.dev->dev_private, obj->base.size);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index cedbfd7..8e59bb5 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -252,47 +252,6 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode)
 	return true;
 }
 
-/* Is the current GTT allocation valid for the change in tiling? */
-static bool
-i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
-{
-	u32 size;
-
-	if (tiling_mode == I915_TILING_NONE)
-		return true;
-
-	if (INTEL_INFO(obj->base.dev)->gen >= 4)
-		return true;
-
-	if (INTEL_INFO(obj->base.dev)->gen == 3) {
-		if (obj->gtt_offset & ~I915_FENCE_START_MASK)
-			return false;
-	} else {
-		if (obj->gtt_offset & ~I830_FENCE_START_MASK)
-			return false;
-	}
-
-	/*
-	 * Previous chips need to be aligned to the size of the smallest
-	 * fence register that can contain the object.
-	 */
-	if (INTEL_INFO(obj->base.dev)->gen == 3)
-		size = 1024*1024;
-	else
-		size = 512*1024;
-
-	while (size < obj->base.size)
-		size <<= 1;
-
-	if (obj->gtt_space->size != size)
-		return false;
-
-	if (obj->gtt_offset & (size - 1))
-		return false;
-
-	return true;
-}
-
 /**
  * Sets the tiling mode of an object, returning the required swizzling of
  * bit 6 of addresses in the object.
@@ -365,33 +324,29 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 		 * has to also include the unfenced register the GPU uses
 		 * whilst executing a fenced command for an untiled object.
 		 */
-
-		obj->map_and_fenceable =
-			obj->gtt_space == NULL ||
-			(obj->gtt_offset + obj->base.size <= dev_priv->mm.gtt_mappable_end &&
-			 i915_gem_object_fence_ok(obj, args->tiling_mode));
-
-		/* Rebind if we need a change of alignment */
-		if (!obj->map_and_fenceable) {
-			u32 unfenced_alignment =
-				i915_gem_get_unfenced_gtt_alignment(dev,
-								    obj->base.size,
-								    args->tiling_mode);
-			if (obj->gtt_offset & (unfenced_alignment - 1))
-				ret = i915_gem_object_unbind(obj);
+		obj->fence_dirty =
+			obj->fenced_gpu_access ||
+			obj->fence_reg != I915_FENCE_REG_NONE;
+
+		obj->tiling_mode = args->tiling_mode;
+		obj->stride = args->stride;
+
+		if (obj->gtt_space) {
+			unsigned fence_size = i915_gem_get_gtt_size(dev, obj->base.size, obj->tiling_mode);
+			unsigned fence_alignment = i915_gem_get_gtt_alignment(dev,
+									      obj->base.size,
+									      obj->tiling_mode);
+			bool fenceable =
+				obj->gtt_space->size == fence_size &&
+				(obj->gtt_space->start & (fence_alignment - 1)) == 0;
+			bool mappable =
+				obj->gtt_offset + obj->base.size <= dev_priv->mm.gtt_mappable_end;
+
+			obj->map_and_fenceable = mappable && fenceable;
 		}
 
-		if (ret == 0) {
-			obj->fence_dirty =
-				obj->fenced_gpu_access ||
-				obj->fence_reg != I915_FENCE_REG_NONE;
-
-			obj->tiling_mode = args->tiling_mode;
-			obj->stride = args->stride;
-
-			/* Force the fence to be reacquired for GTT access */
-			i915_gem_release_mmap(obj);
-		}
+		/* Force the fence to be reacquired for GTT access */
+		i915_gem_release_mmap(obj);
 	}
 	/* we have to maintain this existing ABI... */
 	args->stride = obj->stride;
-- 
1.7.10.4

* [PATCH 05/14] drm/i915: Avoid forcing relocations through the mappable GTT or CPU
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (3 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 04/14] drm/i915: Defer the unbind for a fence change until the next get_fence() Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 06/14] drm: Optionally create mm blocks from top-to-bottom Chris Wilson
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

If the object lies outside of the mappable GTT aperture, do not force it
into the CPU domain for relocations; instead, flush the writes as we
perform them and then queue a chipset flush.
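
The interesting part is the new fallback path for objects beyond the
mappable aperture; as an annotated excerpt of the hunk below (comments
mine):

  char *vaddr = kmap_atomic(gtt_offset_to_page(obj, reloc->offset));

  /* The object stays in the GTT domain, so flush the CPU cache around
   * the write by hand: invalidate any stale cacheline first, then push
   * the new value out to memory. */
  drm_clflush_virt_range(vaddr + page_offset, 4);
  *(uint32_t *)(vaddr + page_offset) = reloc->delta;
  drm_clflush_virt_range(vaddr + page_offset, 4);

  kunmap_atomic(vaddr);

  /* mark the object so a chipset flush is queued before the batch runs */
  obj->base.pending_write_domain |= I915_GEM_DOMAIN_CPU;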

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   85 ++++++++++++++++------------
 1 file changed, 50 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 802d925..c77a57d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,6 +33,9 @@
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
 
+#define __EXEC_OBJECT_HAS_PIN (1<<31)
+#define __EXEC_OBJECT_HAS_FENCE (1<<30)
+
 struct eb_objects {
 	int and;
 	struct hlist_head buckets[0];
@@ -95,10 +98,16 @@ eb_destroy(struct eb_objects *eb)
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 {
 	return (obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
-		!obj->map_and_fenceable ||
 		obj->cache_level != I915_CACHE_NONE);
 }
 
+static inline struct page *
+gtt_offset_to_page(struct drm_i915_gem_object *obj, u32 offset)
+{
+	offset -= obj->gtt_space->start;
+	return i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
+}
+
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_objects *eb,
@@ -193,22 +202,20 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		return -EFAULT;
 
 	reloc->delta += target_offset;
+	reloc->offset += obj->gtt_offset;
 	if (use_cpu_reloc(obj)) {
-		uint32_t page_offset = reloc->offset & ~PAGE_MASK;
 		char *vaddr;
 
-		ret = i915_gem_object_set_to_cpu_domain(obj, 1);
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
 		if (ret)
 			return ret;
 
-		vaddr = kmap_atomic(i915_gem_object_get_page(obj,
-							     reloc->offset >> PAGE_SHIFT));
-		*(uint32_t *)(vaddr + page_offset) = reloc->delta;
+		vaddr = kmap_atomic(gtt_offset_to_page(obj, reloc->offset));
+		*(uint32_t *)(vaddr + offset_in_page(reloc->offset)) = reloc->delta;
 		kunmap_atomic(vaddr);
 	} else {
 		struct drm_i915_private *dev_priv = dev->dev_private;
-		uint32_t __iomem *reloc_entry;
-		void __iomem *reloc_page;
+		unsigned page_offset;
 
 		ret = i915_gem_object_set_to_gtt_domain(obj, true);
 		if (ret)
@@ -219,13 +226,28 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 			return ret;
 
 		/* Map the page containing the relocation we're going to perform.  */
-		reloc->offset += obj->gtt_offset;
-		reloc_page = io_mapping_map_atomic_wc(dev_priv->mm.gtt_mapping,
-						      reloc->offset & PAGE_MASK);
-		reloc_entry = (uint32_t __iomem *)
-			(reloc_page + (reloc->offset & ~PAGE_MASK));
-		iowrite32(reloc->delta, reloc_entry);
-		io_mapping_unmap_atomic(reloc_page);
+		page_offset = offset_in_page(reloc->offset);
+
+		if (reloc->offset < dev_priv->mm.gtt_mappable_end) {
+			void __iomem *reloc_page;
+
+			reloc_page = io_mapping_map_atomic_wc(dev_priv->mm.gtt_mapping,
+							      reloc->offset & PAGE_MASK);
+			iowrite32(reloc->delta, reloc_page + page_offset);
+			io_mapping_unmap_atomic(reloc_page);
+		} else {
+			char *vaddr;
+
+			vaddr = kmap_atomic(gtt_offset_to_page(obj, reloc->offset));
+
+			drm_clflush_virt_range(vaddr + page_offset, 4);
+			*(uint32_t *)(vaddr + page_offset) = reloc->delta;
+			drm_clflush_virt_range(vaddr + page_offset, 4);
+
+			kunmap_atomic(vaddr);
+
+			obj->base.pending_write_domain |= I915_GEM_DOMAIN_CPU;
+		}
 	}
 
 	/* and update the user's relocation entry */
@@ -323,16 +345,6 @@ i915_gem_execbuffer_relocate(struct drm_device *dev,
 	return ret;
 }
 
-#define  __EXEC_OBJECT_HAS_PIN (1<<31)
-#define  __EXEC_OBJECT_HAS_FENCE (1<<30)
-
-static int
-need_reloc_mappable(struct drm_i915_gem_object *obj)
-{
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
-	return entry->relocation_count && !use_cpu_reloc(obj);
-}
-
 static int
 i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 				   struct intel_ring_buffer *ring)
@@ -340,16 +352,15 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
-	bool need_fence, need_mappable;
+	bool need_fence;
 	int ret;
 
 	need_fence =
 		has_fenced_gpu_access &&
 		entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 		obj->tiling_mode != I915_TILING_NONE;
-	need_mappable = need_fence || need_reloc_mappable(obj);
 
-	ret = i915_gem_object_pin(obj, entry->alignment, need_mappable, false);
+	ret = i915_gem_object_pin(obj, entry->alignment, need_fence, false);
 	if (ret)
 		return ret;
 
@@ -412,7 +423,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 	INIT_LIST_HEAD(&ordered_objects);
 	while (!list_empty(objects)) {
 		struct drm_i915_gem_exec_object2 *entry;
-		bool need_fence, need_mappable;
+		bool need_fence;
 
 		obj = list_first_entry(objects,
 				       struct drm_i915_gem_object,
@@ -423,9 +434,8 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			has_fenced_gpu_access &&
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 			obj->tiling_mode != I915_TILING_NONE;
-		need_mappable = need_fence || need_reloc_mappable(obj);
 
-		if (need_mappable)
+		if (need_fence)
 			list_move(&obj->exec_list, &ordered_objects);
 		else
 			list_move_tail(&obj->exec_list, &ordered_objects);
@@ -455,7 +465,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 		/* Unbind any ill-fitting objects or pin. */
 		list_for_each_entry(obj, objects, exec_list) {
 			struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
-			bool need_fence, need_mappable;
+			bool need_fence;
 
 			if (!obj->gtt_space)
 				continue;
@@ -464,10 +474,9 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 				has_fenced_gpu_access &&
 				entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 				obj->tiling_mode != I915_TILING_NONE;
-			need_mappable = need_fence || need_reloc_mappable(obj);
 
 			if ((entry->alignment && obj->gtt_offset & (entry->alignment - 1)) ||
-			    (need_mappable && !obj->map_and_fenceable))
+			    (need_fence && !obj->map_and_fenceable))
 				ret = i915_gem_object_unbind(obj);
 			else
 				ret = i915_gem_execbuffer_reserve_object(obj, ring);
@@ -614,10 +623,16 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 		if (ret)
 			return ret;
 
+		flush_domains |= obj->base.write_domain;
+
 		if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
 			i915_gem_clflush_object(obj);
 
-		flush_domains |= obj->base.write_domain;
+		/* Used as an internal marker during relocation processing */
+		if (obj->base.pending_write_domain & ~I915_GEM_GPU_DOMAINS) {
+			flush_domains |= obj->base.pending_write_domain & ~I915_GEM_GPU_DOMAINS;
+			obj->base.pending_write_domain &= I915_GEM_GPU_DOMAINS;
+		}
 	}
 
 	if (flush_domains & I915_GEM_DOMAIN_CPU)
-- 
1.7.10.4

* [PATCH 06/14] drm: Optionally create mm blocks from top-to-bottom
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (4 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 05/14] drm/i915: Avoid forcing relocations through the mappable GTT or CPU Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 07/14] drm/i915: Preferentially allocate mappable GTT space to uncached bo Chris Wilson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Clients like i915 need to segregate cache domains within the GTT, which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.
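
For illustration, the bottom-up vs top-down placement rule in sketch
form (a minimal userspace rendition of the drm_mm_insert_helper() logic
below, assuming the hole is already known to fit the request):

  /* Place a block of 'size' bytes in the hole [hole_start, hole_end),
   * respecting 'alignment'; top-down placement picks the highest
   * admissible start instead of the lowest. */
  unsigned long place(unsigned long hole_start, unsigned long hole_end,
                      unsigned long size, unsigned alignment, int top)
  {
          unsigned long adj_start = top ? hole_end - size : hole_start;

          if (alignment) {
                  unsigned long tmp = adj_start % alignment;
                  if (tmp)
                          adj_start = top ? adj_start - tmp
                                          : adj_start + alignment - tmp;
          }
          return adj_start;
  }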

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_mm.c        |   73 +++++++++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem.c |    4 +--
 include/drm/drm_mm.h            |   42 ++++++++++++++--------
 3 files changed, 77 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index b751b8e..59b21ec 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -49,7 +49,7 @@
 
 #define MM_UNUSED_TARGET 4
 
-static struct drm_mm_node *drm_mm_kmalloc(struct drm_mm *mm, int atomic)
+static struct drm_mm_node *drm_mm_kmalloc(struct drm_mm *mm, bool atomic)
 {
 	struct drm_mm_node *child;
 
@@ -105,7 +105,8 @@ EXPORT_SYMBOL(drm_mm_pre_get);
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 				 struct drm_mm_node *node,
 				 unsigned long size, unsigned alignment,
-				 unsigned long color)
+				 unsigned long color,
+				 unsigned flags)
 {
 	struct drm_mm *mm = hole_node->mm;
 	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
@@ -118,12 +119,22 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 	if (mm->color_adjust)
 		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
 
+	if (flags & DRM_MM_CREATE_TOP)
+		adj_start = adj_end - size;
+
 	if (alignment) {
 		unsigned tmp = adj_start % alignment;
-		if (tmp)
-			adj_start += alignment - tmp;
+		if (tmp) {
+			if (flags & DRM_MM_CREATE_TOP)
+				adj_start -= tmp;
+			else
+				adj_start += alignment - tmp;
+		}
 	}
 
+	BUG_ON(adj_start < hole_start);
+	BUG_ON(adj_end > hole_end);
+
 	if (adj_start == hole_start) {
 		hole_node->hole_follows = 0;
 		list_del(&hole_node->hole_stack);
@@ -150,7 +161,7 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 struct drm_mm_node *drm_mm_create_block(struct drm_mm *mm,
 					unsigned long start,
 					unsigned long size,
-					bool atomic)
+					unsigned flags)
 {
 	struct drm_mm_node *hole, *node;
 	unsigned long end = start + size;
@@ -161,7 +172,7 @@ struct drm_mm_node *drm_mm_create_block(struct drm_mm *mm,
 		if (hole_start > start || hole_end < end)
 			continue;
 
-		node = drm_mm_kmalloc(mm, atomic);
+		node = drm_mm_kmalloc(mm, flags & DRM_MM_CREATE_ATOMIC);
 		if (unlikely(node == NULL))
 			return NULL;
 
@@ -196,15 +207,15 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
 					     unsigned long size,
 					     unsigned alignment,
 					     unsigned long color,
-					     int atomic)
+					     unsigned flags)
 {
 	struct drm_mm_node *node;
 
-	node = drm_mm_kmalloc(hole_node->mm, atomic);
+	node = drm_mm_kmalloc(hole_node->mm, flags & DRM_MM_CREATE_ATOMIC);
 	if (unlikely(node == NULL))
 		return NULL;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment, color);
+	drm_mm_insert_helper(hole_node, node, size, alignment, color, flags);
 
 	return node;
 }
@@ -220,11 +231,11 @@ int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
 {
 	struct drm_mm_node *hole_node;
 
-	hole_node = drm_mm_search_free(mm, size, alignment, false);
+	hole_node = drm_mm_search_free(mm, size, alignment, 0);
 	if (!hole_node)
 		return -ENOSPC;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment, 0);
+	drm_mm_insert_helper(hole_node, node, size, alignment, 0, 0);
 
 	return 0;
 }
@@ -234,7 +245,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
 				       struct drm_mm_node *node,
 				       unsigned long size, unsigned alignment,
 				       unsigned long color,
-				       unsigned long start, unsigned long end)
+				       unsigned long start, unsigned long end,
+				       unsigned flags)
 {
 	struct drm_mm *mm = hole_node->mm;
 	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
@@ -249,11 +261,20 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
 
 	if (adj_start < start)
 		adj_start = start;
+	if (adj_end > end)
+		adj_end = end;
+
+	if (flags & DRM_MM_CREATE_TOP)
+		adj_start = adj_end - size;
 
 	if (alignment) {
 		unsigned tmp = adj_start % alignment;
-		if (tmp)
-			adj_start += alignment - tmp;
+		if (tmp) {
+			if (flags & DRM_MM_CREATE_TOP)
+				adj_start -= tmp;
+			else
+				adj_start += alignment - tmp;
+		}
 	}
 
 	if (adj_start == hole_start) {
@@ -270,6 +291,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
+	BUG_ON(node->start < start);
+	BUG_ON(node->start < adj_start);
 	BUG_ON(node->start + node->size > adj_end);
 	BUG_ON(node->start + node->size > end);
 
@@ -286,16 +309,16 @@ struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node
 						unsigned long color,
 						unsigned long start,
 						unsigned long end,
-						int atomic)
+						unsigned flags)
 {
 	struct drm_mm_node *node;
 
-	node = drm_mm_kmalloc(hole_node->mm, atomic);
+	node = drm_mm_kmalloc(hole_node->mm, flags & DRM_MM_CREATE_ATOMIC);
 	if (unlikely(node == NULL))
 		return NULL;
 
 	drm_mm_insert_helper_range(hole_node, node, size, alignment, color,
-				   start, end);
+				   start, end, flags);
 
 	return node;
 }
@@ -313,12 +336,12 @@ int drm_mm_insert_node_in_range(struct drm_mm *mm, struct drm_mm_node *node,
 	struct drm_mm_node *hole_node;
 
 	hole_node = drm_mm_search_free_in_range(mm, size, alignment,
-						start, end, false);
+						start, end, 0);
 	if (!hole_node)
 		return -ENOSPC;
 
 	drm_mm_insert_helper_range(hole_node, node, size, alignment, 0,
-				   start, end);
+				   start, end, 0);
 
 	return 0;
 }
@@ -399,7 +422,7 @@ struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 					       unsigned long size,
 					       unsigned alignment,
 					       unsigned long color,
-					       bool best_match)
+					       unsigned flags)
 {
 	struct drm_mm_node *entry;
 	struct drm_mm_node *best;
@@ -412,7 +435,7 @@ struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 	best = NULL;
 	best_size = ~0UL;
 
-	drm_mm_for_each_hole(entry, mm, adj_start, adj_end) {
+	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end, flags & DRM_MM_SEARCH_BELOW) {
 		if (mm->color_adjust) {
 			mm->color_adjust(entry, color, &adj_start, &adj_end);
 			if (adj_end <= adj_start)
@@ -422,7 +445,7 @@ struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 		if (!check_free_hole(adj_start, adj_end, size, alignment))
 			continue;
 
-		if (!best_match)
+		if ((flags & DRM_MM_SEARCH_BEST) == 0)
 			return entry;
 
 		if (entry->size < best_size) {
@@ -441,7 +464,7 @@ struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
 							unsigned long color,
 							unsigned long start,
 							unsigned long end,
-							bool best_match)
+							unsigned flags)
 {
 	struct drm_mm_node *entry;
 	struct drm_mm_node *best;
@@ -454,7 +477,7 @@ struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
 	best = NULL;
 	best_size = ~0UL;
 
-	drm_mm_for_each_hole(entry, mm, adj_start, adj_end) {
+	__drm_mm_for_each_hole(entry, mm, adj_start, adj_end, flags & DRM_MM_SEARCH_BELOW) {
 		if (adj_start < start)
 			adj_start = start;
 		if (adj_end > end)
@@ -469,7 +492,7 @@ struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
 		if (!check_free_hole(adj_start, adj_end, size, alignment))
 			continue;
 
-		if (!best_match)
+		if ((flags & DRM_MM_SEARCH_BEST) == 0)
 			return entry;
 
 		if (entry->size < best_size) {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 987227b..ec07f33 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2954,12 +2954,12 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 				drm_mm_get_block_range_generic(free_space,
 							       size, alignment, obj->cache_level,
 							       0, dev_priv->mm.gtt_mappable_end,
-							       false);
+							       0);
 		else
 			free_space =
 				drm_mm_get_block_generic(free_space,
 							 size, alignment, obj->cache_level,
-							 false);
+							 0);
 	}
 	if (free_space == NULL) {
 		ret = i915_gem_evict_something(dev, size, alignment,
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index cd45365..1f3f5f1 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -135,18 +135,26 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node)
 	     1 : 0; \
 	     entry = list_entry(entry->hole_stack.next, struct drm_mm_node, hole_stack))
 
+#define __drm_mm_for_each_hole(entry, mm, hole_start, hole_end, backwards) \
+	for (entry = list_entry((backwards) ? (mm)->hole_stack.prev : (mm)->hole_stack.next, struct drm_mm_node, hole_stack); \
+	     &entry->hole_stack != &(mm)->hole_stack ? \
+	     hole_start = drm_mm_hole_node_start(entry), \
+	     hole_end = drm_mm_hole_node_end(entry), \
+	     1 : 0; \
+	     entry = list_entry((backwards) ? entry->hole_stack.prev : entry->hole_stack.next, struct drm_mm_node, hole_stack))
+
 /*
  * Basic range manager support (drm_mm.c)
  */
 extern struct drm_mm_node *drm_mm_create_block(struct drm_mm *mm,
 					       unsigned long start,
 					       unsigned long size,
-					       bool atomic);
+					       unsigned flags);
 extern struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *node,
 						    unsigned long size,
 						    unsigned alignment,
 						    unsigned long color,
-						    int atomic);
+						    unsigned flags);
 extern struct drm_mm_node *drm_mm_get_block_range_generic(
 						struct drm_mm_node *node,
 						unsigned long size,
@@ -154,7 +162,9 @@ extern struct drm_mm_node *drm_mm_get_block_range_generic(
 						unsigned long color,
 						unsigned long start,
 						unsigned long end,
-						int atomic);
+						unsigned flags);
+#define DRM_MM_CREATE_ATOMIC	0x1
+#define DRM_MM_CREATE_TOP	0x2
 static inline struct drm_mm_node *drm_mm_get_block(struct drm_mm_node *parent,
 						   unsigned long size,
 						   unsigned alignment)
@@ -165,7 +175,7 @@ static inline struct drm_mm_node *drm_mm_get_block_atomic(struct drm_mm_node *pa
 							  unsigned long size,
 							  unsigned alignment)
 {
-	return drm_mm_get_block_generic(parent, size, alignment, 0, 1);
+	return drm_mm_get_block_generic(parent, size, alignment, 0, DRM_MM_CREATE_ATOMIC);
 }
 static inline struct drm_mm_node *drm_mm_get_block_range(
 						struct drm_mm_node *parent,
@@ -196,7 +206,7 @@ static inline struct drm_mm_node *drm_mm_get_block_atomic_range(
 						unsigned long end)
 {
 	return drm_mm_get_block_range_generic(parent, size, alignment, 0,
-						start, end, 1);
+						start, end, DRM_MM_CREATE_ATOMIC);
 }
 extern int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
 			      unsigned long size, unsigned alignment);
@@ -211,7 +221,7 @@ extern struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
 						      unsigned long size,
 						      unsigned alignment,
 						      unsigned long color,
-						      bool best_match);
+						      unsigned flags);
 extern struct drm_mm_node *drm_mm_search_free_in_range_generic(
 						const struct drm_mm *mm,
 						unsigned long size,
@@ -219,13 +229,15 @@ extern struct drm_mm_node *drm_mm_search_free_in_range_generic(
 						unsigned long color,
 						unsigned long start,
 						unsigned long end,
-						bool best_match);
+						unsigned flags);
+#define DRM_MM_SEARCH_BEST	0x1
+#define DRM_MM_SEARCH_BELOW	0x2
 static inline struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
 						     unsigned long size,
 						     unsigned alignment,
-						     bool best_match)
+						     unsigned flags)
 {
-	return drm_mm_search_free_generic(mm,size, alignment, 0, best_match);
+	return drm_mm_search_free_generic(mm,size, alignment, 0, flags);
 }
 static inline  struct drm_mm_node *drm_mm_search_free_in_range(
 						const struct drm_mm *mm,
@@ -233,18 +245,18 @@ static inline  struct drm_mm_node *drm_mm_search_free_in_range(
 						unsigned alignment,
 						unsigned long start,
 						unsigned long end,
-						bool best_match)
+						unsigned flags)
 {
 	return drm_mm_search_free_in_range_generic(mm, size, alignment, 0,
-						   start, end, best_match);
+						   start, end, flags);
 }
 static inline struct drm_mm_node *drm_mm_search_free_color(const struct drm_mm *mm,
 							   unsigned long size,
 							   unsigned alignment,
 							   unsigned long color,
-							   bool best_match)
+							   unsigned flags)
 {
-	return drm_mm_search_free_generic(mm,size, alignment, color, best_match);
+	return drm_mm_search_free_generic(mm,size, alignment, color, flags);
 }
 static inline  struct drm_mm_node *drm_mm_search_free_in_range_color(
 						const struct drm_mm *mm,
@@ -253,10 +265,10 @@ static inline  struct drm_mm_node *drm_mm_search_free_in_range_color(
 						unsigned long color,
 						unsigned long start,
 						unsigned long end,
-						bool best_match)
+						unsigned flags)
 {
 	return drm_mm_search_free_in_range_generic(mm, size, alignment, color,
-						   start, end, best_match);
+						   start, end, flags);
 }
 extern int drm_mm_init(struct drm_mm *mm,
 		       unsigned long start,
-- 
1.7.10.4

* [PATCH 07/14] drm/i915: Preferentially allocate mappable GTT space to uncached bo
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (5 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 06/14] drm: Optionally create mm blocks from top-to-bottom Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 08/14] drm/i915: Tighten the checks for invalid relocation domains Chris Wilson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

This both reduces GTT wastage due to intermingled cache colouring and
reduces the number of cacheable bo allocated in the mappable aperture.
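
In sketch form, the policy added below (annotated excerpt from the i915
hunk; comments mine):

  search = 0;
  if (obj->cache_level != I915_CACHE_NONE && !map_and_fenceable)
          search |= DRM_MM_SEARCH_BELOW; /* cacheable and no CPU access
                                          * required: search top-down,
                                          * away from the mappable
                                          * aperture */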

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         |   13 +++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.c |   13 +++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ec07f33..86d549b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2898,6 +2898,7 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_mm_node *free_space;
 	u32 size, fence_size, fence_alignment, unfenced_alignment;
+	unsigned search;
 	bool mappable, fenceable;
 	int ret;
 
@@ -2937,16 +2938,20 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_pin_pages(obj);
 
+	search = 0;
+	if (obj->cache_level != I915_CACHE_NONE && !map_and_fenceable)
+		search |= DRM_MM_SEARCH_BELOW;
+
  search_free:
 	if (map_and_fenceable)
 		free_space = drm_mm_search_free_in_range_color(&dev_priv->mm.gtt_space,
 							       size, alignment, obj->cache_level,
 							       0, dev_priv->mm.gtt_mappable_end,
-							       false);
+							       search);
 	else
 		free_space = drm_mm_search_free_color(&dev_priv->mm.gtt_space,
 						      size, alignment, obj->cache_level,
-						      false);
+						      search);
 
 	if (free_space != NULL) {
 		if (map_and_fenceable)
@@ -2954,12 +2959,12 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 				drm_mm_get_block_range_generic(free_space,
 							       size, alignment, obj->cache_level,
 							       0, dev_priv->mm.gtt_mappable_end,
-							       0);
+							       search);
 		else
 			free_space =
 				drm_mm_get_block_generic(free_space,
 							 size, alignment, obj->cache_level,
-							 0);
+							 search);
 	}
 	if (free_space == NULL) {
 		ret = i915_gem_evict_something(dev, size, alignment,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 36e1e13a..c06d528 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -458,10 +458,13 @@ init_pipe_control(struct intel_ring_buffer *ring)
 
 	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
 
-	ret = i915_gem_object_pin(obj, 4096, true, false);
+	ret = i915_gem_object_pin(obj, 4096, false, false);
 	if (ret)
 		goto err_unref;
 
+	if (!obj->has_global_gtt_mapping)
+		i915_gem_gtt_bind_object(obj, obj->cache_level);
+
 	pc->gtt_offset = obj->gtt_offset;
 	pc->cpu_page =  kmap(sg_page(obj->pages->sgl));
 	if (pc->cpu_page == NULL)
@@ -1036,10 +1039,12 @@ static int init_status_page(struct intel_ring_buffer *ring)
 
 	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
 
-	ret = i915_gem_object_pin(obj, 4096, true, false);
-	if (ret != 0) {
+	ret = i915_gem_object_pin(obj, 4096, false, false);
+	if (ret != 0)
 		goto err_unref;
-	}
+
+	if (!obj->has_global_gtt_mapping)
+		i915_gem_gtt_bind_object(obj, obj->cache_level);
 
 	ring->status_page.gfx_addr = obj->gtt_offset;
 	ring->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
-- 
1.7.10.4

* [PATCH 08/14] drm/i915: Tighten the checks for invalid relocation domains
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (6 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 07/14] drm/i915: Preferentially allocate mappable GTT space to uncached bo Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 09/14] drm/i915: Remove check for conflicting relocation write-domains Chris Wilson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Enumerate the GPU domains explicitly so that we can detect if userspace
ever passes in an invalid combination, and so that we accurately reflect
the known GPU domains when printing state.
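
A stand-alone demonstration of the difference, using the uapi domain bit
values from i915_drm.h (userspace C; compile and run):

  #include <stdio.h>

  #define I915_GEM_DOMAIN_CPU         0x00000001
  #define I915_GEM_DOMAIN_RENDER      0x00000002
  #define I915_GEM_DOMAIN_SAMPLER     0x00000004
  #define I915_GEM_DOMAIN_COMMAND     0x00000008
  #define I915_GEM_DOMAIN_INSTRUCTION 0x00000010
  #define I915_GEM_DOMAIN_VERTEX      0x00000020
  #define I915_GEM_DOMAIN_GTT         0x00000040

  #define OLD_GPU_DOMAINS (~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT))
  #define NEW_GPU_DOMAINS (I915_GEM_DOMAIN_RENDER | \
                           I915_GEM_DOMAIN_SAMPLER | \
                           I915_GEM_DOMAIN_COMMAND | \
                           I915_GEM_DOMAIN_INSTRUCTION | \
                           I915_GEM_DOMAIN_VERTEX)

  int main(void)
  {
          unsigned bogus = 0x80000000; /* not a domain bit at all */

          /* old mask: any unknown bit counts as a "GPU domain" */
          printf("old mask rejects bogus bit: %d\n",
                 !!(bogus & ~OLD_GPU_DOMAINS));
          /* new mask: unknown bits are caught */
          printf("new mask rejects bogus bit: %d\n",
                 !!(bogus & ~NEW_GPU_DOMAINS));
          return 0;
  }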

Fixes i-g-t/gem_exec_bad_domains

References: https://bugs.freedesktop.org/show_bug.cgi?id=57826
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 32cfe23..dd67f94 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -83,7 +83,12 @@ enum port {
 };
 #define port_name(p) ((p) + 'A')
 
-#define I915_GEM_GPU_DOMAINS	(~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT))
+#define I915_GEM_GPU_DOMAINS \
+	(I915_GEM_DOMAIN_RENDER | \
+	 I915_GEM_DOMAIN_SAMPLER | \
+	 I915_GEM_DOMAIN_COMMAND | \
+	 I915_GEM_DOMAIN_INSTRUCTION | \
+	 I915_GEM_DOMAIN_VERTEX)
 
 #define for_each_pipe(p) for ((p) = 0; (p) < dev_priv->num_pipe; (p)++)
 
-- 
1.7.10.4

* [PATCH 09/14] drm/i915: Remove check for conflicting relocation write-domains
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (7 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 08/14] drm/i915: Tighten the checks for invalid relocation domains Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 19:18   ` Daniel Vetter
  2012-12-03 11:49 ` [PATCH 10/14] drm/i915: Take the handle idr spinlock once for looking up the exec objects Chris Wilson
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Simply use the last write-domain set for the object in the batch,
trusting userspace to have correctly flushed the caches between uses as
a write target. This check dates back to the golden age of having only a
single operation per batch, with the kernel repeating it for each
cliprect, and it conflicts both with userspace trying to efficiently
batch multiple operations and with reducing the kernel overhead of
relocation processing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 -----------
 1 file changed, 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c77a57d..1e53828 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -159,17 +159,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 			  reloc->write_domain);
 		return ret;
 	}
-	if (unlikely(reloc->write_domain && target_obj->pending_write_domain &&
-		     reloc->write_domain != target_obj->pending_write_domain)) {
-		DRM_DEBUG("Write domain conflict: "
-			  "obj %p target %d offset %d "
-			  "new %08x old %08x\n",
-			  obj, reloc->target_handle,
-			  (int) reloc->offset,
-			  reloc->write_domain,
-			  target_obj->pending_write_domain);
-		return ret;
-	}
 
 	target_obj->pending_read_domains |= reloc->read_domains;
 	target_obj->pending_write_domain |= reloc->write_domain;
-- 
1.7.10.4

* [PATCH 10/14] drm/i915: Take the handle idr spinlock once for looking up the exec objects
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (8 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 09/14] drm/i915: Remove check for conflicting relocation write-domains Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 11/14] drm/i915: Move the execbuffer objects list from the stack into the tracker Chris Wilson
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   86 +++++++++++++++-------------
 1 file changed, 46 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1e53828..75792f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -72,6 +72,46 @@ eb_add_object(struct eb_objects *eb, struct drm_i915_gem_object *obj)
 		       &eb->buckets[obj->exec_handle & eb->and]);
 }
 
+static int
+eb_lookup_objects(struct eb_objects *eb,
+		  struct drm_i915_gem_exec_object2 *exec,
+		  int count,
+		  struct drm_file *file,
+		  struct list_head *objects)
+{
+	int i;
+
+	spin_lock(&file->table_lock);
+	for (i = 0; i < count; i++) {
+		struct drm_i915_gem_object *obj;
+
+		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
+		if (obj == NULL) {
+			spin_unlock(&file->table_lock);
+			DRM_DEBUG("Invalid object handle %d at index %d\n",
+				   exec[i].handle, i);
+			return -ENOENT;
+		}
+
+		if (!list_empty(&obj->exec_list)) {
+			spin_unlock(&file->table_lock);
+			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
+				   obj, exec[i].handle, i);
+			return -EINVAL;
+		}
+
+		drm_gem_object_reference(&obj->base);
+		list_add_tail(&obj->exec_list, objects);
+
+		obj->exec_handle = exec[i].handle;
+		obj->exec_entry = &exec[i];
+		eb_add_object(eb, obj);
+	}
+	spin_unlock(&file->table_lock);
+
+	return 0;
+}
+
 static struct drm_i915_gem_object *
 eb_get_object(struct eb_objects *eb, unsigned long handle)
 {
@@ -559,21 +599,9 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	for (i = 0; i < count; i++) {
-		obj = to_intel_bo(drm_gem_object_lookup(dev, file,
-							exec[i].handle));
-		if (&obj->base == NULL) {
-			DRM_DEBUG("Invalid object handle %d at index %d\n",
-				   exec[i].handle, i);
-			ret = -ENOENT;
-			goto err;
-		}
-
-		list_add_tail(&obj->exec_list, objects);
-		obj->exec_handle = exec[i].handle;
-		obj->exec_entry = &exec[i];
-		eb_add_object(eb, obj);
-	}
+	ret = eb_lookup_objects(eb, exec, count, file, objects);
+	if (ret)
+		goto err;
 
 	ret = i915_gem_execbuffer_reserve(ring, file, objects);
 	if (ret)
@@ -885,31 +913,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	/* Look up object handles */
 	INIT_LIST_HEAD(&objects);
-	for (i = 0; i < args->buffer_count; i++) {
-		struct drm_i915_gem_object *obj;
-
-		obj = to_intel_bo(drm_gem_object_lookup(dev, file,
-							exec[i].handle));
-		if (&obj->base == NULL) {
-			DRM_DEBUG("Invalid object handle %d at index %d\n",
-				   exec[i].handle, i);
-			/* prevent error path from reading uninitialized data */
-			ret = -ENOENT;
-			goto err;
-		}
-
-		if (!list_empty(&obj->exec_list)) {
-			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
-				   obj, exec[i].handle, i);
-			ret = -EINVAL;
-			goto err;
-		}
-
-		list_add_tail(&obj->exec_list, &objects);
-		obj->exec_handle = exec[i].handle;
-		obj->exec_entry = &exec[i];
-		eb_add_object(eb, obj);
-	}
+	ret = eb_lookup_objects(eb, exec, args->buffer_count, file, &objects);
+	if (ret)
+		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
 	batch_obj = list_entry(objects.prev,
-- 
1.7.10.4

* [PATCH 11/14] drm/i915: Move the execbuffer objects list from the stack into the tracker
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (9 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 10/14] drm/i915: Take the handle idr spinlock once for looking up the exec objects Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 12/14] drm/i915: Allow userspace to hint that the relocations were known Chris Wilson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Instead of passing around the eb-objects hashtable and a separate object
list, we can embed the object list in the eb-objects structure for
convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   58 +++++++++++++---------------
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 75792f6..90e704e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -37,6 +37,7 @@
 #define __EXEC_OBJECT_HAS_FENCE (1<<30)
 
 struct eb_objects {
+	struct list_head objects;
 	int and;
 	struct hlist_head buckets[0];
 };
@@ -56,6 +57,7 @@ eb_create(int size)
 		return eb;
 
 	eb->and = count - 1;
+	INIT_LIST_HEAD(&eb->objects);
 	return eb;
 }
 
@@ -76,8 +78,7 @@ static int
 eb_lookup_objects(struct eb_objects *eb,
 		  struct drm_i915_gem_exec_object2 *exec,
 		  int count,
-		  struct drm_file *file,
-		  struct list_head *objects)
+		  struct drm_file *file)
 {
 	int i;
 
@@ -101,7 +102,7 @@ eb_lookup_objects(struct eb_objects *eb,
 		}
 
 		drm_gem_object_reference(&obj->base);
-		list_add_tail(&obj->exec_list, objects);
+		list_add_tail(&obj->exec_list, &eb->objects);
 
 		obj->exec_handle = exec[i].handle;
 		obj->exec_entry = &exec[i];
@@ -132,6 +133,15 @@ eb_get_object(struct eb_objects *eb, unsigned long handle)
 static void
 eb_destroy(struct eb_objects *eb)
 {
+	while (!list_empty(&eb->objects)) {
+		struct drm_i915_gem_object *obj;
+
+		obj = list_first_entry(&eb->objects,
+				       struct drm_i915_gem_object,
+				       exec_list);
+		list_del_init(&obj->exec_list);
+		drm_gem_object_unreference(&obj->base);
+	}
 	kfree(eb);
 }
 
@@ -350,8 +360,7 @@ i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
 
 static int
 i915_gem_execbuffer_relocate(struct drm_device *dev,
-			     struct eb_objects *eb,
-			     struct list_head *objects)
+			     struct eb_objects *eb)
 {
 	struct drm_i915_gem_object *obj;
 	int ret = 0;
@@ -364,7 +373,7 @@ i915_gem_execbuffer_relocate(struct drm_device *dev,
 	 * lockdep complains vehemently.
 	 */
 	pagefault_disable();
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(obj, &eb->objects, exec_list) {
 		ret = i915_gem_execbuffer_relocate_object(obj, eb);
 		if (ret)
 			break;
@@ -540,7 +549,6 @@ static int
 i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 				  struct drm_file *file,
 				  struct intel_ring_buffer *ring,
-				  struct list_head *objects,
 				  struct eb_objects *eb,
 				  struct drm_i915_gem_exec_object2 *exec,
 				  int count)
@@ -551,8 +559,8 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 	int i, total, ret;
 
 	/* We may process another execbuffer during the unlock... */
-	while (!list_empty(objects)) {
-		obj = list_first_entry(objects,
+	while (!list_empty(&eb->objects)) {
+		obj = list_first_entry(&eb->objects,
 				       struct drm_i915_gem_object,
 				       exec_list);
 		list_del_init(&obj->exec_list);
@@ -599,15 +607,15 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_objects(eb, exec, count, file, objects);
+	ret = eb_lookup_objects(eb, exec, count, file);
 	if (ret)
 		goto err;
 
-	ret = i915_gem_execbuffer_reserve(ring, file, objects);
+	ret = i915_gem_execbuffer_reserve(ring, file, &eb->objects);
 	if (ret)
 		goto err;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(obj, &eb->objects, exec_list) {
 		int offset = obj->exec_entry - exec;
 		ret = i915_gem_execbuffer_relocate_object_slow(obj, eb,
 							       reloc + reloc_offset[offset]);
@@ -771,7 +779,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_i915_gem_exec_object2 *exec)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct list_head objects;
 	struct eb_objects *eb;
 	struct drm_i915_gem_object *batch_obj;
 	struct drm_clip_rect *cliprects = NULL;
@@ -912,28 +919,26 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Look up object handles */
-	INIT_LIST_HEAD(&objects);
-	ret = eb_lookup_objects(eb, exec, args->buffer_count, file, &objects);
+	ret = eb_lookup_objects(eb, exec, args->buffer_count, file);
 	if (ret)
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	batch_obj = list_entry(objects.prev,
+	batch_obj = list_entry(eb->objects.prev,
 			       struct drm_i915_gem_object,
 			       exec_list);
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	ret = i915_gem_execbuffer_reserve(ring, file, &objects);
+	ret = i915_gem_execbuffer_reserve(ring, file, &eb->objects);
 	if (ret)
 		goto err;
 
 	/* The objects are in their final locations, apply the relocations. */
-	ret = i915_gem_execbuffer_relocate(dev, eb, &objects);
+	ret = i915_gem_execbuffer_relocate(dev, eb);
 	if (ret) {
 		if (ret == -EFAULT) {
 			ret = i915_gem_execbuffer_relocate_slow(dev, file, ring,
-								&objects, eb,
-								exec,
+								eb, exec,
 								args->buffer_count);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
@@ -956,7 +961,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (flags & I915_DISPATCH_SECURE && !batch_obj->has_global_gtt_mapping)
 		i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level);
 
-	ret = i915_gem_execbuffer_move_to_gpu(ring, &objects);
+	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->objects);
 	if (ret)
 		goto err;
 
@@ -1010,20 +1015,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
-	i915_gem_execbuffer_move_to_active(&objects, ring);
+	i915_gem_execbuffer_move_to_active(&eb->objects, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring);
 
 err:
 	eb_destroy(eb);
-	while (!list_empty(&objects)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
-	}
 
 	mutex_unlock(&dev->struct_mutex);
 
-- 
1.7.10.4

* [PATCH 12/14] drm/i915: Allow userspace to hint that the relocations were known
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (10 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 11/14] drm/i915: Move the execbuffer objects list from the stack into the tracker Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 13/14] drm/i915: Use the reloc.handle as an index into the execbuffer array Chris Wilson
  2012-12-03 11:49 ` [PATCH 14/14] drm/i915: Allow userspace to request an object at a specific offset Chris Wilson
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Userspace is able to hint to the kernel that its command stream and
auxiliary state buffers already hold the correct presumed addresses and
so the relocation process may be skipped if the kernel does not need to
move any buffers in preparation for the execbuffer. Thus for the common
case where the allotment of buffers is static between batches, we can
avoid the overhead of individually checking the relocation entries.

Note that this requires userspace itself to supply the domain tracking
and any requests for workarounds that would otherwise be computed based
upon the relocation entries.
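
A hedged sketch of how userspace might opt in, assuming the uapi names
this patch adds (I915_EXEC_NO_RELOC, EXEC_OBJECT_WRITE) plus libdrm's
drmIoctl(); the helper name and parameters are mine:

  #include <stdint.h>
  #include <xf86drm.h>
  #include <i915_drm.h>

  static int submit_noreloc(int fd, uint32_t bo_handle,
                            uint64_t presumed_offset, uint32_t batch_bytes)
  {
          struct drm_i915_gem_exec_object2 gem_exec = {
                  .handle = bo_handle,
                  .offset = presumed_offset,   /* must match the kernel's view */
                  .flags  = EXEC_OBJECT_WRITE, /* we supply the domain info */
          };
          struct drm_i915_gem_execbuffer2 execbuf = {
                  .buffers_ptr  = (uintptr_t)&gem_exec,
                  .buffer_count = 1,
                  .batch_len    = batch_bytes,
                  .flags        = I915_EXEC_RENDER | I915_EXEC_NO_RELOC,
          };

          /* If the kernel had to move the bo, it writes the new offset
           * back into gem_exec.offset and performs the relocations
           * itself; we then refresh our presumed addresses before the
           * next submission. */
          return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
  }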

Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 632000.0/sec.
i3-330m: 748000.0/sec to 830000.0/sec.

(measured relative to a baseline with neither optimisation applied).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c            |    3 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   62 ++++++++++++++++++----------
 include/uapi/drm/i915_drm.h                |   14 +++++++
 3 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 2635ee6..9da7863 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -989,6 +989,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_SECURE_BATCHES:
 		value = capable(CAP_SYS_ADMIN);
 		break;
+	case I915_PARAM_HAS_EXEC_NO_RELOC:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG_DRIVER("Unknown parameter %d\n",
 				 param->param);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 90e704e..62a1489 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -385,7 +385,8 @@ i915_gem_execbuffer_relocate(struct drm_device *dev,
 
 static int
 i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
-				   struct intel_ring_buffer *ring)
+				   struct intel_ring_buffer *ring,
+				   bool *need_reloc)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
@@ -425,7 +426,18 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 		obj->has_aliasing_ppgtt_mapping = 1;
 	}
 
-	entry->offset = obj->gtt_offset;
+	if (entry->offset != obj->gtt_offset) {
+		entry->offset = obj->gtt_offset;
+		*need_reloc = true;
+	}
+
+	if (entry->flags & EXEC_OBJECT_WRITE)
+		obj->base.pending_write_domain = I915_GEM_DOMAIN_RENDER;
+
+	if (entry->flags & EXEC_OBJECT_NEEDS_GTT &&
+	    !obj->has_global_gtt_mapping)
+		i915_gem_gtt_bind_object(obj, obj->cache_level);
+
 	return 0;
 }
 
@@ -451,7 +463,8 @@ i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
 static int
 i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			    struct drm_file *file,
-			    struct list_head *objects)
+			    struct list_head *objects,
+			    bool *need_relocs)
 {
 	struct drm_i915_gem_object *obj;
 	struct list_head ordered_objects;
@@ -478,7 +491,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 		else
 			list_move_tail(&obj->exec_list, &ordered_objects);
 
-		obj->base.pending_read_domains = 0;
+		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
 		obj->base.pending_write_domain = 0;
 		obj->pending_fenced_gpu_access = false;
 	}
@@ -517,7 +530,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			    (need_fence && !obj->map_and_fenceable))
 				ret = i915_gem_object_unbind(obj);
 			else
-				ret = i915_gem_execbuffer_reserve_object(obj, ring);
+				ret = i915_gem_execbuffer_reserve_object(obj, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
@@ -527,7 +540,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			if (obj->gtt_space)
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_object(obj, ring);
+			ret = i915_gem_execbuffer_reserve_object(obj, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
@@ -547,16 +560,18 @@ err:		/* Decrement pin count for bound objects */
 
 static int
 i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
+				  struct drm_i915_gem_execbuffer2 *args,
 				  struct drm_file *file,
 				  struct intel_ring_buffer *ring,
 				  struct eb_objects *eb,
-				  struct drm_i915_gem_exec_object2 *exec,
-				  int count)
+				  struct drm_i915_gem_exec_object2 *exec)
 {
 	struct drm_i915_gem_relocation_entry *reloc;
 	struct drm_i915_gem_object *obj;
+	bool need_relocs;
 	int *reloc_offset;
 	int i, total, ret;
+	int count = args->buffer_count;
 
 	/* We may process another execbuffer during the unlock... */
 	while (!list_empty(&eb->objects)) {
@@ -611,7 +626,8 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 	if (ret)
 		goto err;
 
-	ret = i915_gem_execbuffer_reserve(ring, file, &eb->objects);
+	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	ret = i915_gem_execbuffer_reserve(ring, file, &eb->objects, &need_relocs);
 	if (ret)
 		goto err;
 
@@ -675,6 +691,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 static bool
 i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 {
+	if (exec->flags & __I915_EXEC_UNKNOWN_FLAGS)
+		return false;
+
 	return ((exec->batch_start_offset | exec->batch_len) & 0x7) == 0;
 }
 
@@ -688,6 +707,9 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 		char __user *ptr = (char __user *)(uintptr_t)exec[i].relocs_ptr;
 		int length; /* limited by fault_in_pages_readable() */
 
+		if (exec[i].flags & __EXEC_OBJECT_UNKNOWN_FLAGS)
+			return -EINVAL;
+
 		/* First check for malicious input causing overflow */
 		if (exec[i].relocation_count >
 		    INT_MAX / sizeof(struct drm_i915_gem_relocation_entry))
@@ -695,9 +717,6 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 
 		length = exec[i].relocation_count *
 			sizeof(struct drm_i915_gem_relocation_entry);
-		if (!access_ok(VERIFY_READ, ptr, length))
-			return -EFAULT;
-
 		/* we may also need to update the presumed offsets */
 		if (!access_ok(VERIFY_WRITE, ptr, length))
 			return -EFAULT;
@@ -785,14 +804,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct intel_ring_buffer *ring;
 	u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 exec_start, exec_len;
-	u32 mask;
-	u32 flags;
+	u32 mask, flags;
 	int ret, mode, i;
+	bool need_relocs;
 
-	if (!i915_gem_check_execbuffer(args)) {
-		DRM_DEBUG("execbuf with invalid offset/length\n");
+	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
-	}
 
 	ret = validate_exec_list(exec, args->buffer_count);
 	if (ret)
@@ -929,17 +946,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			       exec_list);
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
-	ret = i915_gem_execbuffer_reserve(ring, file, &eb->objects);
+	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
+	ret = i915_gem_execbuffer_reserve(ring, file, &eb->objects, &need_relocs);
 	if (ret)
 		goto err;
 
 	/* The objects are in their final locations, apply the relocations. */
-	ret = i915_gem_execbuffer_relocate(dev, eb);
+	if (need_relocs)
+		ret = i915_gem_execbuffer_relocate(dev, eb);
 	if (ret) {
 		if (ret == -EFAULT) {
-			ret = i915_gem_execbuffer_relocate_slow(dev, file, ring,
-								eb, exec,
-								args->buffer_count);
+			ret = i915_gem_execbuffer_relocate_slow(dev, args, file, ring,
+								eb, exec);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
 		if (ret)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b746a3c..8057515 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -307,6 +307,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_PRIME_VMAP_FLUSH	 21
 #define I915_PARAM_RSVD_FOR_FUTURE_USE	 22
 #define I915_PARAM_HAS_SECURE_BATCHES	 23
+#define I915_PARAM_HAS_EXEC_NO_RELOC	 24
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -627,7 +628,11 @@ struct drm_i915_gem_exec_object2 {
 	__u64 offset;
 
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
+#define EXEC_OBJECT_NEEDS_GTT	(1<<1)
+#define EXEC_OBJECT_WRITE	(1<<2)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
 	__u64 flags;
+
 	__u64 rsvd1;
 	__u64 rsvd2;
 };
@@ -677,6 +682,15 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_SECURE		(1<<9)
 
+/** Provide a hint to the kernel that the command stream and auxiliary
+ * state buffers already hold the correct presumed addresses and so the
+ * relocation process may be skipped if no buffers need to be moved in
+ * preparation for the execbuffer.
+ */
+#define I915_EXEC_NO_RELOC		(1<<10)
+
+#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_NO_RELOC<<1)
+
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
 	(eb2).rsvd1 = context & I915_EXEC_CONTEXT_ID_MASK
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 13/14] drm/i915: Use the reloc.handle as an index into the execbuffer array
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (11 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 12/14] drm/i915: Allow userspace to hint that the relocations were known Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  2012-12-03 11:49 ` [PATCH 14/14] drm/i915: Allow userspace to request an object at a specific offset Chris Wilson
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 623000.0/sec.
i3-330m: 748000.0/sec to 789000.0/sec.

(measured relative to a baseline with neither optimisation applied).
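
For illustration (a sketch, not part of the patch): with
I915_EXEC_HANDLE_LUT set, reloc.target_handle is the buffer's position
in the exec array rather than its per-file GEM handle, sparing the
kernel an idr lookup per relocation. The names below are assumed:

    /* exec[0] is the render target; refer to it by index, not by handle. */
    struct drm_i915_gem_relocation_entry reloc = {
            .target_handle   = 0,              /* index into exec[], not a GEM handle */
            .offset          = reloc_offset,   /* where the address sits in the batch */
            .presumed_offset = exec[0].offset,
            .read_domains    = I915_GEM_DOMAIN_RENDER,
            .write_domain    = I915_GEM_DOMAIN_RENDER,
    };
    execbuf.flags |= I915_EXEC_HANDLE_LUT;     /* if I915_PARAM_HAS_EXEC_HANDLE_LUT */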

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c            |    3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  100 +++++++++++++++++-----------
 include/uapi/drm/i915_drm.h                |    8 ++-
 3 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 9da7863..63d27c9 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -992,6 +992,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_NO_RELOC:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG_DRIVER("Unknown parameter %d\n",
 				 param->param);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 62a1489..17b09bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -39,24 +39,40 @@
 struct eb_objects {
 	struct list_head objects;
 	int and;
-	struct hlist_head buckets[0];
+	union {
+		struct drm_i915_gem_object *lut[0];
+		struct hlist_head buckets[0];
+	};
 };
 
 static struct eb_objects *
-eb_create(int size)
+eb_create(struct drm_i915_gem_execbuffer2 *args)
 {
-	struct eb_objects *eb;
-	int count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
-	BUILD_BUG_ON(!is_power_of_2(PAGE_SIZE / sizeof(struct hlist_head)));
-	while (count > size)
-		count >>= 1;
-	eb = kzalloc(count*sizeof(struct hlist_head) +
-		     sizeof(struct eb_objects),
-		     GFP_KERNEL);
-	if (eb == NULL)
-		return eb;
-
-	eb->and = count - 1;
+	struct eb_objects *eb = NULL;
+
+	if (args->flags & I915_EXEC_HANDLE_LUT) {
+		int size = args->buffer_count;
+		size *= sizeof(struct drm_i915_gem_object *);
+		size += sizeof(struct eb_objects);
+		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+	}
+
+	if (eb == NULL) {
+		int size = args->buffer_count;
+		int count = PAGE_SIZE / sizeof(struct hlist_head) / 2;
+		BUILD_BUG_ON(!is_power_of_2(PAGE_SIZE / sizeof(struct hlist_head)));
+		while (count > 2*size)
+			count >>= 1;
+		eb = kzalloc(count*sizeof(struct hlist_head) +
+			     sizeof(struct eb_objects),
+			     GFP_TEMPORARY);
+		if (eb == NULL)
+			return eb;
+
+		eb->and = count - 1;
+	} else
+		eb->and = -args->buffer_count;
+
 	INIT_LIST_HEAD(&eb->objects);
 	return eb;
 }
@@ -64,26 +80,20 @@ eb_create(int size)
 static void
 eb_reset(struct eb_objects *eb)
 {
-	memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
-}
-
-static void
-eb_add_object(struct eb_objects *eb, struct drm_i915_gem_object *obj)
-{
-	hlist_add_head(&obj->exec_node,
-		       &eb->buckets[obj->exec_handle & eb->and]);
+	if (eb->and >= 0)
+		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static int
 eb_lookup_objects(struct eb_objects *eb,
 		  struct drm_i915_gem_exec_object2 *exec,
-		  int count,
+		  const struct drm_i915_gem_execbuffer2 *args,
 		  struct drm_file *file)
 {
 	int i;
 
 	spin_lock(&file->table_lock);
-	for (i = 0; i < count; i++) {
+	for (i = 0; i < args->buffer_count; i++) {
 		struct drm_i915_gem_object *obj;
 
 		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
@@ -104,9 +114,15 @@ eb_lookup_objects(struct eb_objects *eb,
 		drm_gem_object_reference(&obj->base);
 		list_add_tail(&obj->exec_list, &eb->objects);
 
-		obj->exec_handle = exec[i].handle;
 		obj->exec_entry = &exec[i];
-		eb_add_object(eb, obj);
+		if (eb->and < 0) {
+			eb->lut[i] = obj;
+		} else {
+			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
+			obj->exec_handle = handle;
+			hlist_add_head(&obj->exec_node,
+				       &eb->buckets[handle & eb->and]);
+		}
 	}
 	spin_unlock(&file->table_lock);
 
@@ -116,18 +132,24 @@ eb_lookup_objects(struct eb_objects *eb,
 static struct drm_i915_gem_object *
 eb_get_object(struct eb_objects *eb, unsigned long handle)
 {
-	struct hlist_head *head;
-	struct hlist_node *node;
-	struct drm_i915_gem_object *obj;
+	if (eb->and < 0) {
+		if (handle >= -eb->and)
+			return NULL;
+		return eb->lut[handle];
+	} else {
+		struct hlist_head *head;
+		struct hlist_node *node;
 
-	head = &eb->buckets[handle & eb->and];
-	hlist_for_each(node, head) {
-		obj = hlist_entry(node, struct drm_i915_gem_object, exec_node);
-		if (obj->exec_handle == handle)
-			return obj;
-	}
+		head = &eb->buckets[handle & eb->and];
+		hlist_for_each(node, head) {
+			struct drm_i915_gem_object *obj;
 
-	return NULL;
+			obj = hlist_entry(node, struct drm_i915_gem_object, exec_node);
+			if (obj->exec_handle == handle)
+				return obj;
+		}
+		return NULL;
+	}
 }
 
 static void
@@ -622,7 +644,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_objects(eb, exec, count, file);
+	ret = eb_lookup_objects(eb, exec, args, file);
 	if (ret)
 		goto err;
 
@@ -928,7 +950,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
-	eb = eb_create(args->buffer_count);
+	eb = eb_create(args);
 	if (eb == NULL) {
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
@@ -936,7 +958,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Look up object handles */
-	ret = eb_lookup_objects(eb, exec, args->buffer_count, file);
+	ret = eb_lookup_objects(eb, exec, args, file);
 	if (ret)
 		goto err;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8057515..6210872 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -308,6 +308,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_RSVD_FOR_FUTURE_USE	 22
 #define I915_PARAM_HAS_SECURE_BATCHES	 23
 #define I915_PARAM_HAS_EXEC_NO_RELOC	 24
+#define I915_PARAM_HAS_EXEC_HANDLE_LUT   25
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -689,7 +690,12 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_NO_RELOC		(1<<10)
 
-#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_NO_RELOC<<1)
+/** Use the reloc.handle as an index into the exec object array rather
+ * than as the per-file handle.
+ */
+#define I915_EXEC_HANDLE_LUT		(1<<11)
+
+#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_HANDLE_LUT<<1)
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 14/14] drm/i915: Allow userspace to request an object at a specific offset
  2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
                   ` (12 preceding siblings ...)
  2012-12-03 11:49 ` [PATCH 13/14] drm/i915: Use the reloc.handle as an index into the execbuffer array Chris Wilson
@ 2012-12-03 11:49 ` Chris Wilson
  13 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 11:49 UTC (permalink / raw)
  To: intel-gfx

Certain workarounds and workloads require objects at specific or at
least known offsets. Privileged users could pin an object into the GTT,
but that has obvious limitations for the general case. Instead, the user
can construct a batch assuming a particular layout for an object and
request that the kernel try its utmost to provide the object at that
location. This has the advantage that, unlike a hard pin, the request
can fail and such allocations are transitory - although contention
should be rare, so the object should persist at the same location
between batches.
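
For illustration (a sketch, not part of the patch; the handle and the
offset are assumed), the request is a per-object flag plus the desired
address, and it is best-effort:

    exec[i].handle = bo_handle;
    exec[i].offset = 0x10000000;         /* must satisfy the object's alignment */
    exec[i].flags  = EXEC_OBJECT_PINNED;
    /* execbuffer returns -EINVAL for a misaligned offset, or -EBUSY if the
     * range cannot be made available even after evicting everything. */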

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |    4 ++
 drivers/gpu/drm/i915/i915_gem.c            |    6 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  103 +++++++++++++++++++++++++++-
 include/uapi/drm/i915_drm.h                |    3 +-
 4 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dd67f94..931d722 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1394,6 +1394,10 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 void i915_gem_free_object(struct drm_gem_object *obj);
 
+bool i915_gem_valid_gtt_space(struct drm_device *dev,
+			      struct drm_mm_node *gtt_space,
+			      unsigned long cache_level);
+
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
 				     uint32_t alignment,
 				     bool map_and_fenceable,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 86d549b..a014784 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2815,9 +2815,9 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
-static bool i915_gem_valid_gtt_space(struct drm_device *dev,
-				     struct drm_mm_node *gtt_space,
-				     unsigned long cache_level)
+bool i915_gem_valid_gtt_space(struct drm_device *dev,
+			      struct drm_mm_node *gtt_space,
+			      unsigned long cache_level)
 {
 	struct drm_mm_node *other;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 17b09bd..3fc07ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -405,6 +405,90 @@ i915_gem_execbuffer_relocate(struct drm_device *dev,
 	return ret;
 }
 
+static struct drm_mm_node *
+get_pinned_block(struct drm_i915_gem_object *obj)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_mm_node *gtt;
+
+	gtt = drm_mm_create_block(&dev_priv->mm.gtt_space,
+				  obj->exec_entry->offset,
+				  i915_gem_get_gtt_size(dev, obj->base.size, obj->tiling_mode),
+				  false);
+	if (gtt == NULL)
+		return NULL;
+
+	if (!i915_gem_valid_gtt_space(dev, gtt, obj->cache_level)) {
+		drm_mm_put_block(gtt);
+		return NULL;
+	}
+
+	gtt->color = obj->cache_level;
+	return gtt;
+}
+
+static int
+i915_gem_execbuffer_pinned_object(struct drm_i915_gem_object *obj)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_mm_node *gtt;
+	int ret;
+
+	if (obj->gtt_offset == entry->offset)
+		return 0;
+
+	if (entry->offset & (i915_gem_get_gtt_alignment(dev, obj->base.size, obj->tiling_mode) - 1))
+		return -EINVAL;
+
+	if (entry->alignment && entry->offset & (entry->alignment - 1))
+		return -EINVAL;
+
+	i915_gem_object_pin_pages(obj);
+
+	ret = i915_gem_object_unbind(obj);
+	if (ret)
+		goto unpin_pages;
+
+	gtt = get_pinned_block(obj);
+	if (gtt == NULL) {
+		ret = i915_gem_evict_everything(dev);
+		if (ret)
+			goto unpin_pages;
+
+		gtt = get_pinned_block(obj);
+	}
+	if (gtt == NULL) {
+		ret = -EBUSY;
+		goto unpin_pages;
+	}
+
+	ret = i915_gem_gtt_prepare_object(obj);
+	if (ret) {
+		drm_mm_put_block(gtt);
+		goto unpin_pages;
+	}
+
+	list_move_tail(&obj->gtt_list, &dev_priv->mm.bound_list);
+	list_add_tail(&obj->mm_list, &dev_priv->mm.inactive_list);
+
+	obj->gtt_space = gtt;
+	obj->gtt_offset += gtt->start;
+
+	obj->map_and_fenceable =
+		obj->gtt_offset + obj->base.size <= dev_priv->mm.gtt_mappable_end;
+	trace_i915_gem_object_bind(obj, false);
+
+	if (!dev_priv->mm.aliasing_ppgtt)
+		i915_gem_gtt_bind_object(obj, obj->cache_level);
+
+unpin_pages:
+	i915_gem_object_unpin_pages(obj);
+	return ret;
+}
+
 static int
 i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 				   struct intel_ring_buffer *ring,
@@ -416,6 +500,12 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 	bool need_fence;
 	int ret;
 
+	if (entry->flags & EXEC_OBJECT_PINNED) {
+		ret = i915_gem_execbuffer_pinned_object(obj);
+		if (ret)
+			return ret;
+	}
+
 	need_fence =
 		has_fenced_gpu_access &&
 		entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
@@ -427,6 +517,10 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 
 	entry->flags |= __EXEC_OBJECT_HAS_PIN;
 
+	if (entry->flags & EXEC_OBJECT_PINNED &&
+	    obj->gtt_offset != entry->offset)
+		return -EINVAL;
+
 	if (has_fenced_gpu_access) {
 		if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
 			ret = i915_gem_object_get_fence(obj);
@@ -489,11 +583,12 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			    bool *need_relocs)
 {
 	struct drm_i915_gem_object *obj;
-	struct list_head ordered_objects;
+	struct list_head ordered_objects, pinned_objects;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
 	INIT_LIST_HEAD(&ordered_objects);
+	INIT_LIST_HEAD(&pinned_objects);
 	while (!list_empty(objects)) {
 		struct drm_i915_gem_exec_object2 *entry;
 		bool need_fence;
@@ -507,8 +602,9 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			has_fenced_gpu_access &&
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 			obj->tiling_mode != I915_TILING_NONE;
-
-		if (need_fence)
+		if (entry->flags & EXEC_OBJECT_PINNED)
+			list_move(&obj->exec_list, &pinned_objects);
+		else if (need_fence)
 			list_move(&obj->exec_list, &ordered_objects);
 		else
 			list_move_tail(&obj->exec_list, &ordered_objects);
@@ -517,6 +613,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 		obj->base.pending_write_domain = 0;
 		obj->pending_fenced_gpu_access = false;
 	}
+	list_splice(&pinned_objects, &ordered_objects);
 	list_splice(&ordered_objects, objects);
 
 	/* Attempt to pin all of the buffers into the GTT.
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6210872..525ab30 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -631,7 +631,8 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
 #define EXEC_OBJECT_NEEDS_GTT	(1<<1)
 #define EXEC_OBJECT_WRITE	(1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_PINNED	(1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PINNED<<1)
 	__u64 flags;
 
 	__u64 rsvd1;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 02/14] drm/i915: Decouple the object from the unbound list before freeing pages
  2012-12-03 11:49 ` [PATCH 02/14] drm/i915: Decouple the object from the unbound list before freeing pages Chris Wilson
@ 2012-12-03 16:16   ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2012-12-03 16:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Dec 03, 2012 at 11:49:00AM +0000, Chris Wilson wrote:
> As we may actually allocate in order to save the physical swizzling bits
> during the free, we have to be careful not to trigger the shrinker on
> the same object.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Since this one here smells like a potential OOPS fixer on relevant
machines, I've applied it to 3.8-fixes. Since it's tricky, I'll add a
small comment. Thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 726bfc2..59202e4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1700,10 +1700,11 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
>  	if (obj->pages_pin_count)
>  		return -EBUSY;
>  
> +	list_del(&obj->gtt_list);
> +
>  	ops->put_pages(obj);
>  	obj->pages = NULL;
>  
> -	list_del(&obj->gtt_list);
>  	if (i915_gem_object_is_purgeable(obj))
>  		i915_gem_object_truncate(obj);
>  
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 09/14] drm/i915: Remove check for conflicting relocation write-domains
  2012-12-03 11:49 ` [PATCH 09/14] drm/i915: Remove check for conflicting relocation write-domains Chris Wilson
@ 2012-12-03 19:18   ` Daniel Vetter
  2012-12-03 21:03     ` [PATCH] drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2012-12-03 19:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Dec 03, 2012 at 11:49:07AM +0000, Chris Wilson wrote:
> Simply use the last write-domain set for the object in the batch,
> trusting userspace to have correctly flushed the caches between usage as
> a write target. This check dates back from the golden age of having only
> a single operation per batch with the kernel repeating it for each
> cliprect, and conflicts both with userspace trying to efficiently batch
> multiple operations and with reducing the kernel overhead of relocation
> processing.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This and the previous patch are merged to dinq, thanks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages
  2012-12-03 19:18   ` Daniel Vetter
@ 2012-12-03 21:03     ` Chris Wilson
  2012-12-07  0:16       ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2012-12-03 21:03 UTC (permalink / raw)
  To: intel-gfx

On a machine with bit17 swizzling, we need to store the bit17 of the
physical page address in put-pages. This requires a memory allocation,
on average less than a page, which may be difficult to satisfy if the
request to put-pages is on behalf of the shrinker. We could allow that
allocation to pull from the reserved memory pools, but it seems much
safer to preallocate the array for tiled objects on affected machines.
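
For scale (a back-of-the-envelope check, not from the patch): the array
stores one bit of swizzle state per page, so the preallocation is small
even for large objects:

    size_t nbits  = obj_size >> PAGE_SHIFT;              /* 16 MiB -> 4096 pages */
    size_t nbytes = BITS_TO_LONGS(nbits) * sizeof(long); /* 64 longs = 512 bytes */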

v2: Export i915_gem_object_needs_bit17_swizzle() for reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h        |   10 ++++++++++
 drivers/gpu/drm/i915/i915_gem.c        |    8 --------
 drivers/gpu/drm/i915/i915_gem_tiling.c |   12 ++++++++++++
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d4a3595..656775c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -30,6 +30,8 @@
 #ifndef _I915_DRV_H_
 #define _I915_DRV_H_
 
+#include <uapi/drm/i915_drm.h>
+
 #include "i915_reg.h"
 #include "intel_bios.h"
 #include "intel_ringbuffer.h"
@@ -1648,6 +1650,14 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 void i915_gem_object_release_stolen(struct drm_i915_gem_object *obj);
 
 /* i915_gem_tiling.c */
+static inline bool i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
+{
+	drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
+
+	return dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17 &&
+		obj->tiling_mode != I915_TILING_NONE;
+}
+
 void i915_gem_detect_bit_6_swizzle(struct drm_device *dev);
 void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj);
 void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b00e55f..569d40e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -261,14 +261,6 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			       args->size, &args->handle);
 }
 
-static int i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
-{
-	drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
-
-	return dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17 &&
-		obj->tiling_mode != I915_TILING_NONE;
-}
-
 static inline int
 __copy_to_user_swizzled(char __user *cpu_vaddr,
 			const char *gpu_vaddr, int gpu_offset,
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 8e59bb5..78a7565 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -351,6 +351,18 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 	/* we have to maintain this existing ABI... */
 	args->stride = obj->stride;
 	args->tiling_mode = obj->tiling_mode;
+
+	/* Try to preallocate memory required to save swizzling on put-pages */
+	if (i915_gem_object_needs_bit17_swizzle(obj)) {
+		if (obj->bit_17 == NULL) {
+			obj->bit_17 = kmalloc(BITS_TO_LONGS(obj->base.size >> PAGE_SHIFT) *
+					      sizeof(long), GFP_KERNEL);
+		}
+	} else {
+		kfree(obj->bit_17);
+		obj->bit_17 = NULL;
+	}
+
 	drm_gem_object_unreference(&obj->base);
 	mutex_unlock(&dev->struct_mutex);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages
  2012-12-03 21:03     ` [PATCH] drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages Chris Wilson
@ 2012-12-07  0:16       ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2012-12-07  0:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Dec 03, 2012 at 09:03:14PM +0000, Chris Wilson wrote:
> On a machine with bit17 swizzling, we need to store the bit17 of the
> physical page address in put-pages. This requires a memory allocation,
> on average less than a page, which may be difficult to satisfy is the
> request to put-pages is on behalf of the shrinker. We could allow that
> allocation to pull from the reserved memory pools, but it seems much
> safer to preallocate the array for tiled objects on affected machines.
> 
> v2: Export i915_gem_object_needs_bit17_swizzle() for reuse.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Missed this one here buried in another thread. Queued for -next, thanks
for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-12-07  0:15 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-03 11:48 A bunch of random execbuffer patches Chris Wilson
2012-12-03 11:48 ` [PATCH 01/14] drm/i915: Move the get_pages assertions up to the right layer Chris Wilson
2012-12-03 11:49 ` [PATCH 02/14] drm/i915: Decouple the object from the unbound list before freeing pages Chris Wilson
2012-12-03 16:16   ` Daniel Vetter
2012-12-03 11:49 ` [PATCH 03/14] drm/i915: Bail if we attempt to allocate pages for a purged object Chris Wilson
2012-12-03 11:49 ` [PATCH 04/14] drm/i915: Defer the unbind for a fence change until the next get_fence() Chris Wilson
2012-12-03 11:49 ` [PATCH 05/14] drm/i915: Avoid forcing relocations through the mappable GTT or CPU Chris Wilson
2012-12-03 11:49 ` [PATCH 06/14] drm: Optionally create mm blocks from top-to-bottom Chris Wilson
2012-12-03 11:49 ` [PATCH 07/14] drm/i915: Preferentially allocate mappable GTT space to uncached bo Chris Wilson
2012-12-03 11:49 ` [PATCH 08/14] drm/i915: Tighten the checks for invalid relocation domains Chris Wilson
2012-12-03 11:49 ` [PATCH 09/14] drm/i915: Remove check for conflicting relocation write-domains Chris Wilson
2012-12-03 19:18   ` Daniel Vetter
2012-12-03 21:03     ` [PATCH] drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages Chris Wilson
2012-12-07  0:16       ` Daniel Vetter
2012-12-03 11:49 ` [PATCH 10/14] drm/i915: Take the handle idr spinlock once for looking up the exec objects Chris Wilson
2012-12-03 11:49 ` [PATCH 11/14] drm/i915: Move the execbuffer objects list from the stack into the tracker Chris Wilson
2012-12-03 11:49 ` [PATCH 12/14] drm/i915: Allow userspace to hint that the relocations were known Chris Wilson
2012-12-03 11:49 ` [PATCH 13/14] drm/i915: Use the reloc.handle as an index into the execbuffer array Chris Wilson
2012-12-03 11:49 ` [PATCH 14/14] drm/i915: Allow userspace to request an object at a specific offset Chris Wilson
