All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] [v2] drm/i915: Remove node only when allocated
@ 2013-08-14  1:09 Ben Widawsky
  2013-08-14  1:09 ` [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind Ben Widawsky
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14  1:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Paulo Zanoni, Ben Widawsky

VMAs can be created and not bound. One may think of it as lazy cleanup,
and safely gloss over the conditions which manufacture it. In either
case, when the object backing the i915 vma is destroyed, we must cleanup
the vma without stumbling into a bunch of pitfalls that assume the vma
is bound.

NOTE: I was pretty certain the above condition could only happen when we
introduced the use of VMAs being looked up at execbuf, and already
existing. Paulo has hit this though, so I must be missing something. As
I believe the patch is correct anyway, therefore I won't scratch my head
too hard.

v2: use goto destroy as a compromise (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3d9e248b..4a58ead 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2606,6 +2606,9 @@ int i915_vma_unbind(struct i915_vma *vma)
 	if (list_empty(&vma->vma_link))
 		return 0;
 
+	if (!drm_mm_node_allocated(&vma->node))
+		goto destroy;
+
 	if (obj->pin_count)
 		return -EBUSY;
 
@@ -2643,6 +2646,8 @@ int i915_vma_unbind(struct i915_vma *vma)
 		obj->map_and_fenceable = true;
 
 	drm_mm_remove_node(&vma->node);
+
+destroy:
 	i915_gem_vma_destroy(vma);
 
 	/* Since the unbound list is global, only move to that list if
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind
  2013-08-14  1:09 [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Ben Widawsky
@ 2013-08-14  1:09 ` Ben Widawsky
  2013-08-14  8:18   ` Daniel Vetter
  2013-08-14  1:09 ` [PATCH 3/4] drm: WARN when removing unallocated node Ben Widawsky
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14  1:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Ben Widawsky

Cleanup the map and fenceable setting during bind to make more sense,
and not check i915_is_ggtt() 2 unnecessary times

v2: Move the bools into the if block (Chris) - There are ways to tidy
this function (fence calculations for instance) even further, but they
are quite invasive, so I am punting on those unless specifically asked.

v3: Add newline between variable declaration and logic (Chris)

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4a58ead..01cc016 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3106,7 +3106,6 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	u32 size, fence_size, fence_alignment, unfenced_alignment;
-	bool mappable, fenceable;
 	size_t gtt_max =
 		map_and_fenceable ? dev_priv->gtt.mappable_end : vm->total;
 	struct i915_vma *vma;
@@ -3191,18 +3190,18 @@ search_free:
 	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
 	list_add_tail(&vma->mm_list, &vm->inactive_list);
 
-	fenceable =
-		i915_is_ggtt(vm) &&
-		i915_gem_obj_ggtt_size(obj) == fence_size &&
-		(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
+	if (i915_is_ggtt(vm)) {
+		bool mappable, fenceable;
 
-	mappable =
-		i915_is_ggtt(vm) &&
-		vma->node.start + obj->base.size <= dev_priv->gtt.mappable_end;
+		fenceable =
+			i915_gem_obj_ggtt_size(obj) == fence_size &&
+			(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
+
+		mappable =
+			vma->node.start + obj->base.size <= dev_priv->gtt.mappable_end;
 
-	/* Map and fenceable only changes if the VM is the global GGTT */
-	if (i915_is_ggtt(vm))
 		obj->map_and_fenceable = mappable && fenceable;
+	}
 
 	WARN_ON(map_and_fenceable && !obj->map_and_fenceable);
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/4] drm: WARN when removing unallocated node
  2013-08-14  1:09 [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Ben Widawsky
  2013-08-14  1:09 ` [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind Ben Widawsky
@ 2013-08-14  1:09 ` Ben Widawsky
  2013-08-14  8:52   ` [Intel-gfx] " Daniel Vetter
  2013-08-14  1:09 ` [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas Ben Widawsky
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14  1:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Dave Airlie, Ben Widawsky, dri-devel, Ben Widawsky

The conditional is usually a recoverable driver bug, and so WARNing, and
preventing the drm_mm code from doing potential damage (BUG) is
desirable.

This issue was hit and fixed twice while developing the i915 multiple
address space code. The first fix is the patch just before this, and is
hit on an not frequently occuring error path. Another was fixed during
patch iteration, so it's hard to see from the patch:

commit c6cfb325677ea6305fb19acf3a4d14ea267f923e
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Fri Jul 5 14:41:06 2013 -0700

    drm/i915: Embed drm_mm_node in i915 gem obj

>From the intel-gfx mailing list, we discussed this:
References: <20130705191235.GA3057@bwidawsk.net>

Cc: Dave Airlie <airlied@redhat.com>
CC: <dri-devel@lists.freedesktop.org>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/drm_mm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index aded1e1..af93cc5 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -254,6 +254,9 @@ void drm_mm_remove_node(struct drm_mm_node *node)
 	struct drm_mm *mm = node->mm;
 	struct drm_mm_node *prev_node;
 
+	if (WARN_ON(!node->allocated))
+		return;
+
 	BUG_ON(node->scanned_block || node->scanned_prev_free
 				   || node->scanned_next_free);
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas
  2013-08-14  1:09 [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Ben Widawsky
  2013-08-14  1:09 ` [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind Ben Widawsky
  2013-08-14  1:09 ` [PATCH 3/4] drm: WARN when removing unallocated node Ben Widawsky
@ 2013-08-14  1:09 ` Ben Widawsky
  2013-08-14  1:11   ` Ben Widawsky
  2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
  2013-08-14  8:06 ` [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Daniel Vetter
  2013-08-14  8:19 ` Chris Wilson
  4 siblings, 2 replies; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14  1:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

From: Ben Widawsky <ben@bwidawsk.net>

In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569d366beb4be72cacfde41adee2c25e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  12 +-
 drivers/gpu/drm/i915/i915_drv.h            |  25 ++-
 drivers/gpu/drm/i915/i915_gem.c            |  28 ++-
 drivers/gpu/drm/i915/i915_gem_evict.c      |  31 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 319 ++++++++++++++++-------------
 5 files changed, 232 insertions(+), 183 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index eb87865..4785d8c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -195,9 +195,9 @@ static int obj_rank_by_stolen(void *priv,
 			      struct list_head *A, struct list_head *B)
 {
 	struct drm_i915_gem_object *a =
-		container_of(A, struct drm_i915_gem_object, exec_list);
+		container_of(A, struct drm_i915_gem_object, obj_exec_link);
 	struct drm_i915_gem_object *b =
-		container_of(B, struct drm_i915_gem_object, exec_list);
+		container_of(B, struct drm_i915_gem_object, obj_exec_link);
 
 	return a->stolen->start - b->stolen->start;
 }
@@ -221,7 +221,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->exec_list, &stolen);
+		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		total_gtt_size += i915_gem_obj_ggtt_size(obj);
@@ -231,7 +231,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->exec_list, &stolen);
+		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		count++;
@@ -239,11 +239,11 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	list_sort(NULL, &stolen, obj_rank_by_stolen);
 	seq_puts(m, "Stolen:\n");
 	while (!list_empty(&stolen)) {
-		obj = list_first_entry(&stolen, typeof(*obj), exec_list);
+		obj = list_first_entry(&stolen, typeof(*obj), obj_exec_link);
 		seq_puts(m, "   ");
 		describe_obj(m, obj);
 		seq_putc(m, '\n');
-		list_del_init(&obj->exec_list);
+		list_del_init(&obj->obj_exec_link);
 	}
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3fc4324..2147e4d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -563,6 +563,17 @@ struct i915_vma {
 	struct list_head mm_list;
 
 	struct list_head vma_link; /* Link in the object's VMA list */
+
+	/** This vma's place in the batchbuffer or on the eviction list */
+	struct list_head exec_list;
+
+	/**
+	 * Used for performing relocations during execbuffer insertion.
+	 */
+	struct hlist_node exec_node;
+	unsigned long exec_handle;
+	struct drm_i915_gem_exec_object2 *exec_entry;
+
 };
 
 struct i915_ctx_hang_stats {
@@ -1312,8 +1323,8 @@ struct drm_i915_gem_object {
 	struct list_head global_list;
 
 	struct list_head ring_list;
-	/** This object's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
+	/** Used in execbuf to temporarily hold a ref */
+	struct list_head obj_exec_link;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
@@ -1399,13 +1410,6 @@ struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	/**
-	 * Used for performing relocations during execbuffer insertion.
-	 */
-	struct hlist_node exec_node;
-	unsigned long exec_handle;
-	struct drm_i915_gem_exec_object2 *exec_entry;
-
 	struct intel_ring_buffer *ring;
 
 	/** Breadcrumb of last rendering to the buffer. */
@@ -1907,6 +1911,9 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm);
 struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm);
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm);
 /* Some GGTT VM helpers */
 #define obj_to_ggtt(obj) \
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 01cc016..6bd341a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3111,8 +3111,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int ret;
 
-	if (WARN_ON(!list_empty(&obj->vma_list)))
-		return -EBUSY;
+	BUG_ON(!i915_is_ggtt(vm));
 
 	fence_size = i915_gem_get_gtt_size(dev,
 					   obj->base.size,
@@ -3152,16 +3151,15 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_pin_pages(obj);
 
-	/* FIXME: For now we only ever use 1 VMA per object */
-	BUG_ON(!i915_is_ggtt(vm));
-	WARN_ON(!list_empty(&obj->vma_list));
-
-	vma = i915_gem_vma_create(obj, vm);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unpin;
 	}
 
+	/* For now we only ever use 1 vma per object */
+	WARN_ON(!list_is_singular(&obj->vma_list));
+
 search_free:
 	ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
 						  size, alignment,
@@ -3981,7 +3979,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 {
 	INIT_LIST_HEAD(&obj->global_list);
 	INIT_LIST_HEAD(&obj->ring_list);
-	INIT_LIST_HEAD(&obj->exec_list);
+	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 
 	obj->ops = ops;
@@ -4121,6 +4119,7 @@ struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vma_link);
 	INIT_LIST_HEAD(&vma->mm_list);
+	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
 
@@ -4870,3 +4869,16 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 
 	return NULL;
 }
+
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm)
+{
+	struct i915_vma *vma;
+
+	vma = i915_gem_obj_to_vma(obj, vm);
+	if (!vma)
+		vma = i915_gem_vma_create(obj, vm);
+
+	return vma;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 425939b..8787588 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -37,7 +37,7 @@ mark_free(struct i915_vma *vma, struct list_head *unwind)
 	if (vma->obj->pin_count)
 		return false;
 
-	list_add(&vma->obj->exec_list, unwind);
+	list_add(&vma->exec_list, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
 
@@ -49,7 +49,6 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct list_head eviction_list, unwind_list;
 	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
 	int ret = 0;
 
 	trace_i915_gem_evict(dev, min_size, alignment, mappable);
@@ -104,14 +103,13 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 none:
 	/* Nothing found, clean up and bail out! */
 	while (!list_empty(&unwind_list)) {
-		obj = list_first_entry(&unwind_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&unwind_list,
+				       struct i915_vma,
 				       exec_list);
-		vma = i915_gem_obj_to_vma(obj, vm);
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 
-		list_del_init(&obj->exec_list);
+		list_del_init(&vma->exec_list);
 	}
 
 	/* We expect the caller to unpin, evict all and try again, or give up.
@@ -125,28 +123,27 @@ found:
 	 * temporary list. */
 	INIT_LIST_HEAD(&eviction_list);
 	while (!list_empty(&unwind_list)) {
-		obj = list_first_entry(&unwind_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&unwind_list,
+				       struct i915_vma,
 				       exec_list);
-		vma = i915_gem_obj_to_vma(obj, vm);
 		if (drm_mm_scan_remove_block(&vma->node)) {
-			list_move(&obj->exec_list, &eviction_list);
-			drm_gem_object_reference(&obj->base);
+			list_move(&vma->exec_list, &eviction_list);
+			drm_gem_object_reference(&vma->obj->base);
 			continue;
 		}
-		list_del_init(&obj->exec_list);
+		list_del_init(&vma->exec_list);
 	}
 
 	/* Unbinding will emit any required flushes */
 	while (!list_empty(&eviction_list)) {
-		obj = list_first_entry(&eviction_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&eviction_list,
+				       struct i915_vma,
 				       exec_list);
 		if (ret == 0)
-			ret = i915_vma_unbind(i915_gem_obj_to_vma(obj, vm));
+			ret = i915_vma_unbind(vma);
 
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7dcf78c..3ca18bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,24 +33,24 @@
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
 
-struct eb_objects {
-	struct list_head objects;
+struct eb_vmas {
+	struct list_head vmas;
 	int and;
 	union {
-		struct drm_i915_gem_object *lut[0];
+		struct i915_vma *lut[0];
 		struct hlist_head buckets[0];
 	};
 };
 
-static struct eb_objects *
-eb_create(struct drm_i915_gem_execbuffer2 *args)
+static struct eb_vmas *
+eb_create(struct drm_i915_gem_execbuffer2 *args, struct i915_address_space *vm)
 {
-	struct eb_objects *eb = NULL;
+	struct eb_vmas *eb = NULL;
 
 	if (args->flags & I915_EXEC_HANDLE_LUT) {
 		int size = args->buffer_count;
-		size *= sizeof(struct drm_i915_gem_object *);
-		size += sizeof(struct eb_objects);
+		size *= sizeof(struct i915_vma *);
+		size += sizeof(struct eb_vmas);
 		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	}
 
@@ -61,7 +61,7 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
 		while (count > 2*size)
 			count >>= 1;
 		eb = kzalloc(count*sizeof(struct hlist_head) +
-			     sizeof(struct eb_objects),
+			     sizeof(struct eb_vmas),
 			     GFP_TEMPORARY);
 		if (eb == NULL)
 			return eb;
@@ -70,64 +70,97 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
 	} else
 		eb->and = -args->buffer_count;
 
-	INIT_LIST_HEAD(&eb->objects);
+	INIT_LIST_HEAD(&eb->vmas);
 	return eb;
 }
 
 static void
-eb_reset(struct eb_objects *eb)
+eb_reset(struct eb_vmas *eb)
 {
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static int
-eb_lookup_objects(struct eb_objects *eb,
-		  struct drm_i915_gem_exec_object2 *exec,
-		  const struct drm_i915_gem_execbuffer2 *args,
-		  struct drm_file *file)
+eb_lookup_vmas(struct eb_vmas *eb,
+	       struct drm_i915_gem_exec_object2 *exec,
+	       const struct drm_i915_gem_execbuffer2 *args,
+	       struct i915_address_space *vm,
+	       struct drm_file *file)
 {
-	int i;
+	struct drm_i915_gem_object *obj;
+	struct list_head objects;
+	int i, ret = 0;
 
+	INIT_LIST_HEAD(&objects);
 	spin_lock(&file->table_lock);
+	/* Grab a reference to the object and release the lock so we can lookup
+	 * or create the VMA without using GFP_ATOMIC */
 	for (i = 0; i < args->buffer_count; i++) {
-		struct drm_i915_gem_object *obj;
-
 		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
 		if (obj == NULL) {
 			spin_unlock(&file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
 				   exec[i].handle, i);
-			return -ENOENT;
+			ret = -ENOENT;
+			goto out;
 		}
 
-		if (!list_empty(&obj->exec_list)) {
+		if (!list_empty(&obj->obj_exec_link)) {
 			spin_unlock(&file->table_lock);
 			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
 				   obj, exec[i].handle, i);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 
 		drm_gem_object_reference(&obj->base);
-		list_add_tail(&obj->exec_list, &eb->objects);
+		list_add_tail(&obj->obj_exec_link, &objects);
+	}
+	spin_unlock(&file->table_lock);
 
-		obj->exec_entry = &exec[i];
+	i = 0;
+	list_for_each_entry(obj, &objects, obj_exec_link) {
+		struct i915_vma *vma;
+
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
+		if (IS_ERR(vma)) {
+			/* XXX: We don't need an error path fro vma because if
+			 * the vma was created just for this execbuf, object
+			 * unreference should kill it off.*/
+			DRM_DEBUG("Failed to lookup VMA\n");
+			ret = PTR_ERR(vma);
+			goto out;
+		}
+
+		list_add_tail(&vma->exec_list, &eb->vmas);
+
+		vma->exec_entry = &exec[i];
 		if (eb->and < 0) {
-			eb->lut[i] = obj;
+			eb->lut[i] = vma;
 		} else {
 			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
-			obj->exec_handle = handle;
-			hlist_add_head(&obj->exec_node,
+			vma->exec_handle = handle;
+			hlist_add_head(&vma->exec_node,
 				       &eb->buckets[handle & eb->and]);
 		}
+		++i;
 	}
-	spin_unlock(&file->table_lock);
 
-	return 0;
+
+out:
+	while (!list_empty(&objects)) {
+		obj = list_first_entry(&objects,
+				       struct drm_i915_gem_object,
+				       obj_exec_link);
+		list_del_init(&obj->obj_exec_link);
+		if (ret)
+			drm_gem_object_unreference(&obj->base);
+	}
+	return ret;
 }
 
-static struct drm_i915_gem_object *
-eb_get_object(struct eb_objects *eb, unsigned long handle)
+static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
 {
 	if (eb->and < 0) {
 		if (handle >= -eb->and)
@@ -139,27 +172,25 @@ eb_get_object(struct eb_objects *eb, unsigned long handle)
 
 		head = &eb->buckets[handle & eb->and];
 		hlist_for_each(node, head) {
-			struct drm_i915_gem_object *obj;
+			struct i915_vma *vma;
 
-			obj = hlist_entry(node, struct drm_i915_gem_object, exec_node);
-			if (obj->exec_handle == handle)
-				return obj;
+			vma = hlist_entry(node, struct i915_vma, exec_node);
+			if (vma->exec_handle == handle)
+				return vma;
 		}
 		return NULL;
 	}
 }
 
-static void
-eb_destroy(struct eb_objects *eb)
-{
-	while (!list_empty(&eb->objects)) {
-		struct drm_i915_gem_object *obj;
+static void eb_destroy(struct eb_vmas *eb) {
+	while (!list_empty(&eb->vmas)) {
+		struct i915_vma *vma;
 
-		obj = list_first_entry(&eb->objects,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&eb->vmas,
+				       struct i915_vma,
 				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 	kfree(eb);
 }
@@ -173,22 +204,24 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
-				   struct eb_objects *eb,
+				   struct eb_vmas *eb,
 				   struct drm_i915_gem_relocation_entry *reloc,
 				   struct i915_address_space *vm)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
 	struct drm_i915_gem_object *target_i915_obj;
+	struct i915_vma *target_vma;
 	uint32_t target_offset;
 	int ret = -EINVAL;
 
 	/* we've already hold a reference to all valid objects */
-	target_obj = &eb_get_object(eb, reloc->target_handle)->base;
-	if (unlikely(target_obj == NULL))
+	target_vma = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(target_vma == NULL))
 		return -ENOENT;
+	target_i915_obj = target_vma->obj;
+	target_obj = &target_vma->obj->base;
 
-	target_i915_obj = to_intel_bo(target_obj);
 	target_offset = i915_gem_obj_ggtt_offset(target_i915_obj);
 
 	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
@@ -297,14 +330,13 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
-				    struct eb_objects *eb,
-				    struct i915_address_space *vm)
+i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
+				 struct eb_vmas *eb)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	int remain, ret;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
@@ -323,8 +355,8 @@ i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(obj, eb, r,
-								 vm);
+			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r,
+								 vma->vm);
 			if (ret)
 				return ret;
 
@@ -345,17 +377,16 @@ i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
-					 struct eb_objects *eb,
-					 struct drm_i915_gem_relocation_entry *relocs,
-					 struct i915_address_space *vm)
+i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
+				      struct eb_vmas *eb,
+				      struct drm_i915_gem_relocation_entry *relocs)
 {
-	const struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	int i, ret;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(obj, eb, &relocs[i],
-							 vm);
+		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i],
+							 vma->vm);
 		if (ret)
 			return ret;
 	}
@@ -364,10 +395,10 @@ i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate(struct eb_objects *eb,
+i915_gem_execbuffer_relocate(struct eb_vmas *eb,
 			     struct i915_address_space *vm)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	int ret = 0;
 
 	/* This is the fast path and we cannot handle a pagefault whilst
@@ -378,8 +409,8 @@ i915_gem_execbuffer_relocate(struct eb_objects *eb,
 	 * lockdep complains vehemently.
 	 */
 	pagefault_disable();
-	list_for_each_entry(obj, &eb->objects, exec_list) {
-		ret = i915_gem_execbuffer_relocate_object(obj, eb, vm);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
@@ -392,31 +423,32 @@ i915_gem_execbuffer_relocate(struct eb_objects *eb,
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
 
 static int
-need_reloc_mappable(struct drm_i915_gem_object *obj)
+need_reloc_mappable(struct i915_vma *vma)
 {
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
-	return entry->relocation_count && !use_cpu_reloc(obj);
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	return entry->relocation_count && !use_cpu_reloc(vma->obj) &&
+		i915_is_ggtt(vma->vm);
 }
 
 static int
-i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
-				   struct intel_ring_buffer *ring,
-				   struct i915_address_space *vm,
-				   bool *need_reloc)
+i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
+				struct intel_ring_buffer *ring,
+				bool *need_reloc)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	bool need_fence, need_mappable;
+	struct drm_i915_gem_object *obj = vma->obj;
 	int ret;
 
 	need_fence =
 		has_fenced_gpu_access &&
 		entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 		obj->tiling_mode != I915_TILING_NONE;
-	need_mappable = need_fence || need_reloc_mappable(obj);
+	need_mappable = need_fence || need_reloc_mappable(vma);
 
-	ret = i915_gem_object_pin(obj, vm, entry->alignment, need_mappable,
+	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, need_mappable,
 				  false);
 	if (ret)
 		return ret;
@@ -444,8 +476,8 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 		obj->has_aliasing_ppgtt_mapping = 1;
 	}
 
-	if (entry->offset != i915_gem_obj_offset(obj, vm)) {
-		entry->offset = i915_gem_obj_offset(obj, vm);
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start;
 		*need_reloc = true;
 	}
 
@@ -462,14 +494,15 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 }
 
 static void
-i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
+i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry;
+	struct drm_i915_gem_object *obj = vma->obj;
 
-	if (!i915_gem_obj_bound_any(obj))
+	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
-	entry = obj->exec_entry;
+	entry = vma->exec_entry;
 
 	if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
 		i915_gem_object_unpin_fence(obj);
@@ -482,41 +515,40 @@ i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
 
 static int
 i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
-			    struct list_head *objects,
-			    struct i915_address_space *vm,
+			    struct list_head *vmas,
 			    bool *need_relocs)
 {
 	struct drm_i915_gem_object *obj;
-	struct list_head ordered_objects;
+	struct i915_vma *vma;
+	struct list_head ordered_vmas;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	INIT_LIST_HEAD(&ordered_objects);
-	while (!list_empty(objects)) {
+	INIT_LIST_HEAD(&ordered_vmas);
+	while (!list_empty(vmas)) {
 		struct drm_i915_gem_exec_object2 *entry;
 		bool need_fence, need_mappable;
 
-		obj = list_first_entry(objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		entry = obj->exec_entry;
+		vma = list_first_entry(vmas, struct i915_vma, exec_list);
+		obj = vma->obj;
+		entry = vma->exec_entry;
 
 		need_fence =
 			has_fenced_gpu_access &&
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 			obj->tiling_mode != I915_TILING_NONE;
-		need_mappable = need_fence || need_reloc_mappable(obj);
+		need_mappable = need_fence || need_reloc_mappable(vma);
 
 		if (need_mappable)
-			list_move(&obj->exec_list, &ordered_objects);
+			list_move(&vma->exec_list, &ordered_vmas);
 		else
-			list_move_tail(&obj->exec_list, &ordered_objects);
+			list_move_tail(&vma->exec_list, &ordered_vmas);
 
 		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
 		obj->base.pending_write_domain = 0;
 		obj->pending_fenced_gpu_access = false;
 	}
-	list_splice(&ordered_objects, objects);
+	list_splice(&ordered_vmas, vmas);
 
 	/* Attempt to pin all of the buffers into the GTT.
 	 * This is done in 3 phases:
@@ -535,47 +567,47 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 		int ret = 0;
 
 		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(obj, objects, exec_list) {
-			struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+		list_for_each_entry(vma, vmas, exec_list) {
+			struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 			bool need_fence, need_mappable;
-			u32 obj_offset;
 
-			if (!i915_gem_obj_bound(obj, vm))
+			obj = vma->obj;
+
+			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
-			obj_offset = i915_gem_obj_offset(obj, vm);
 			need_fence =
 				has_fenced_gpu_access &&
 				entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 				obj->tiling_mode != I915_TILING_NONE;
-			need_mappable = need_fence || need_reloc_mappable(obj);
+			need_mappable = need_fence || need_reloc_mappable(vma);
 
 			WARN_ON((need_mappable || need_fence) &&
-				!i915_is_ggtt(vm));
+			       !i915_is_ggtt(vma->vm));
 
 			if ((entry->alignment &&
-			     obj_offset & (entry->alignment - 1)) ||
+			     vma->node.start & (entry->alignment - 1)) ||
 			    (need_mappable && !obj->map_and_fenceable))
-				ret = i915_vma_unbind(i915_gem_obj_to_vma(obj, vm));
+				ret = i915_vma_unbind(vma);
 			else
-				ret = i915_gem_execbuffer_reserve_object(obj, ring, vm, need_relocs);
+				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
 
 		/* Bind fresh objects */
-		list_for_each_entry(obj, objects, exec_list) {
-			if (i915_gem_obj_bound(obj, vm))
+		list_for_each_entry(vma, vmas, exec_list) {
+			if (drm_mm_node_allocated(&vma->node))
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_object(obj, ring, vm, need_relocs);
+			ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
 
 err:		/* Decrement pin count for bound objects */
-		list_for_each_entry(obj, objects, exec_list)
-			i915_gem_execbuffer_unreserve_object(obj);
+		list_for_each_entry(vma, vmas, exec_list)
+			i915_gem_execbuffer_unreserve_vma(vma);
 
 		if (ret != -ENOSPC || retry++)
 			return ret;
@@ -591,24 +623,27 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 				  struct drm_i915_gem_execbuffer2 *args,
 				  struct drm_file *file,
 				  struct intel_ring_buffer *ring,
-				  struct eb_objects *eb,
-				  struct drm_i915_gem_exec_object2 *exec,
-				  struct i915_address_space *vm)
+				  struct eb_vmas *eb,
+				  struct drm_i915_gem_exec_object2 *exec)
 {
 	struct drm_i915_gem_relocation_entry *reloc;
-	struct drm_i915_gem_object *obj;
+	struct i915_address_space *vm;
+	struct i915_vma *vma;
 	bool need_relocs;
 	int *reloc_offset;
 	int i, total, ret;
 	int count = args->buffer_count;
 
+	if (WARN_ON(list_empty(&eb->vmas)))
+		return 0;
+
+	vm = list_first_entry(&eb->vmas, struct i915_vma, exec_list)->vm;
+
 	/* We may process another execbuffer during the unlock... */
-	while (!list_empty(&eb->objects)) {
-		obj = list_first_entry(&eb->objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+	while (!list_empty(&eb->vmas)) {
+		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -672,20 +707,19 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_objects(eb, exec, args, file);
+	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
 		goto err;
 
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->objects, vm, &need_relocs);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
 	if (ret)
 		goto err;
 
-	list_for_each_entry(obj, &eb->objects, exec_list) {
-		int offset = obj->exec_entry - exec;
-		ret = i915_gem_execbuffer_relocate_object_slow(obj, eb,
-							       reloc + reloc_offset[offset],
-							       vm);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		int offset = vma->exec_entry - exec;
+		ret = i915_gem_execbuffer_relocate_vma_slow(vma, eb,
+							    reloc + reloc_offset[offset]);
 		if (ret)
 			goto err;
 	}
@@ -704,14 +738,15 @@ err:
 
 static int
 i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
-				struct list_head *objects)
+				struct list_head *vmas)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
 	int ret;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		ret = i915_gem_object_sync(obj, ring);
 		if (ret)
 			return ret;
@@ -786,13 +821,13 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 }
 
 static void
-i915_gem_execbuffer_move_to_active(struct list_head *objects,
-				   struct i915_address_space *vm,
+i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_ring_buffer *ring)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
 
@@ -803,7 +838,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *objects,
 		obj->fenced_gpu_access = obj->pending_fenced_gpu_access;
 
 		/* FIXME: This lookup gets fixed later <-- danvet */
-		list_move_tail(&i915_gem_obj_to_vma(obj, vm)->mm_list, &vm->active_list);
+		list_move_tail(&vma->mm_list, &vma->vm->active_list);
 		i915_gem_object_move_to_active(obj, ring);
 		if (obj->base.write_domain) {
 			obj->dirty = 1;
@@ -862,7 +897,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct i915_address_space *vm)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct eb_objects *eb;
+	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
 	struct drm_clip_rect *cliprects = NULL;
 	struct intel_ring_buffer *ring;
@@ -1002,7 +1037,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
-	eb = eb_create(args);
+	eb = eb_create(args, vm);
 	if (eb == NULL) {
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
@@ -1010,18 +1045,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Look up object handles */
-	ret = eb_lookup_objects(eb, exec, args, file);
+	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	batch_obj = list_entry(eb->objects.prev,
-			       struct drm_i915_gem_object,
-			       exec_list);
+	batch_obj = list_entry(eb->vmas.prev, struct i915_vma, exec_list)->obj;
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->objects, vm, &need_relocs);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
 	if (ret)
 		goto err;
 
@@ -1031,7 +1064,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret) {
 		if (ret == -EFAULT) {
 			ret = i915_gem_execbuffer_relocate_slow(dev, args, file, ring,
-								eb, exec, vm);
+								eb, exec);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
 		if (ret)
@@ -1053,7 +1086,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (flags & I915_DISPATCH_SECURE && !batch_obj->has_global_gtt_mapping)
 		i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level);
 
-	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->objects);
+	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
 	if (ret)
 		goto err;
 
@@ -1108,7 +1141,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
-	i915_gem_execbuffer_move_to_active(&eb->objects, vm, ring);
+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
 
 err:
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas
  2013-08-14  1:09 ` [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas Ben Widawsky
@ 2013-08-14  1:11   ` Ben Widawsky
  2013-08-14  7:58     ` Chris Wilson
  2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
  1 sibling, 1 reply; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14  1:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Aug 13, 2013 at 06:09:09PM -0700, Ben Widawsky wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> In order to transition more of our code over to using a VMA instead of
> an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
> until now, we've only had a VMA when actually binding an object.
> 
> The previous patch helped handle the distinction on bound vs. unbound.
> This patch will help us catch leaks, and other issues before we actually
> shuffle a bunch of stuff around.
> 
> This attempts to convert all the execbuf code to speak in vmas. Since
> the execbuf code is very self contained it was a nice isolated
> conversion.
> 
> The meat of the code is about turning eb_objects into eb_vma, and then
> wiring up the rest of the code to use vmas instead of obj, vm pairs.
> 
> Unfortunately, to do this, we must move the exec_list link from the obj
> structure. This list is reused in the eviction code, so we must also
> modify the eviction code to make this work.
> 
> WARNING: This patch makes an already hotly profiled path slower. The cost is
> unavoidable. In reply to this mail, I will attach the extra data.
> 

[snip]

Here is the output from gem_exec_lut_handle both before and after this
patch. The results honestly don't make sense to me, but I'll set Chris
parse it before scratching my head harder.

Before patch
============
relocation: buffers=   1: old=   8060 + 165.3*reloc, lut=   7816 + 164.8*reloc (ns)
relocation: buffers=   2: old=   6748 + 166.6*reloc, lut=   6952 + 165.4*reloc (ns)
relocation: buffers=   4: old=   8140 + 165.9*reloc, lut=   8216 + 165.4*reloc (ns)
relocation: buffers=   8: old=  10732 + 166.0*reloc, lut=  10615 + 165.2*reloc (ns)
relocation: buffers=  16: old=  15099 + 167.8*reloc, lut=  15337 + 165.3*reloc (ns)
relocation: buffers=  32: old=  26140 + 166.0*reloc, lut=  25488 + 165.5*reloc (ns)
relocation: buffers=  64: old=  46300 + 170.5*reloc, lut=  44279 + 166.7*reloc (ns)
relocation: buffers= 128: old=  84056 + 176.9*reloc, lut=  85379 + 166.3*reloc (ns)
relocation: buffers= 256: old= 174398 + 167.9*reloc, lut= 167744 + 167.0*reloc (ns)
relocation: buffers= 512: old= 349688 + 175.7*reloc, lut= 348590 + 170.8*reloc (ns)
relocation: buffers=1024: old= 726265 + 191.2*reloc, lut= 719774 + 180.2*reloc (ns)
relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)
skip-relocs: buffers=   1: old=   4445 + 16.0*reloc, lut=   4433 + 15.6*reloc (ns)
skip-relocs: buffers=   2: old=   4585 + 16.0*reloc, lut=   4571 + 15.6*reloc (ns)
skip-relocs: buffers=   4: old=   5667 + 16.0*reloc, lut=   5340 + 15.6*reloc (ns)
skip-relocs: buffers=   8: old=   6051 + 16.1*reloc, lut=   6026 + 15.6*reloc (ns)
skip-relocs: buffers=  16: old=   7953 + 16.1*reloc, lut=   7914 + 15.6*reloc (ns)
skip-relocs: buffers=  32: old=  11972 + 16.2*reloc, lut=  11875 + 15.7*reloc (ns)
skip-relocs: buffers=  64: old=  19999 + 16.5*reloc, lut=  19832 + 15.7*reloc (ns)
skip-relocs: buffers= 128: old=  37796 + 16.9*reloc, lut=  36539 + 15.9*reloc (ns)
skip-relocs: buffers= 256: old=  71604 + 18.1*reloc, lut=  71313 + 16.5*reloc (ns)
skip-relocs: buffers= 512: old= 152682 + 24.3*reloc, lut= 141379 + 27.9*reloc (ns)
skip-relocs: buffers=1024: old= 314116 + 41.7*reloc, lut= 303019 + 20.1*reloc (ns)
skip-relocs: buffers=2048: old= 619784 + 54.1*reloc, lut= 603931 + 20.0*reloc (ns)
no-relocs: buffers=   1: old=   4194 + 5.1*reloc, lut=   4206 + 4.8*reloc (ns)
no-relocs: buffers=   2: old=   4404 + 5.1*reloc, lut=   4381 + 4.8*reloc (ns)
no-relocs: buffers=   4: old=   4926 + 5.1*reloc, lut=   4921 + 4.8*reloc (ns)
no-relocs: buffers=   8: old=   5901 + 5.1*reloc, lut=   5822 + 4.9*reloc (ns)
no-relocs: buffers=  16: old=   7840 + 5.1*reloc, lut=   7737 + 4.9*reloc (ns)
no-relocs: buffers=  32: old=  11842 + 5.1*reloc, lut=  11681 + 4.9*reloc (ns)
no-relocs: buffers=  64: old=  19741 + 5.1*reloc, lut=  19542 + 4.8*reloc (ns)
no-relocs: buffers= 128: old=  36479 + 5.2*reloc, lut=  35958 + 4.9*reloc (ns)
no-relocs: buffers= 256: old=  70171 + 5.4*reloc, lut=  69390 + 5.2*reloc (ns)
no-relocs: buffers= 512: old= 147213 + 3.5*reloc, lut= 137953 + 13.0*reloc (ns)
no-relocs: buffers=1024: old= 300165 + 4.8*reloc, lut= 293852 + 4.9*reloc (ns)
no-relocs: buffers=2048: old= 597992 + 8.3*reloc, lut= 590185 + 2.1*reloc (ns)


After patch
===========
relocation: buffers=   1: old=   8075 + 81.4*reloc, lut=   7592 + 80.6*reloc (ns)
relocation: buffers=   2: old=   5744 + 82.3*reloc, lut=   5837 + 81.1*reloc (ns)
relocation: buffers=   4: old=   4875 + 82.7*reloc, lut=   4871 + 81.6*reloc (ns)
relocation: buffers=   8: old=   5729 + 82.7*reloc, lut=   5698 + 81.5*reloc (ns)
relocation: buffers=  16: old=   7952 + 83.0*reloc, lut=   7809 + 81.9*reloc (ns)
relocation: buffers=  32: old=  11884 + 82.9*reloc, lut=  11702 + 81.6*reloc (ns)
relocation: buffers=  64: old=  20388 + 83.4*reloc, lut=  19995 + 82.2*reloc (ns)
relocation: buffers= 128: old=  38057 + 85.0*reloc, lut=  37675 + 83.4*reloc (ns)
relocation: buffers= 256: old=  74912 + 87.0*reloc, lut=  74064 + 85.4*reloc (ns)
relocation: buffers= 512: old= 161136 + 94.8*reloc, lut= 157046 + 87.5*reloc (ns)
relocation: buffers=1024: old= 349443 + 107.0*reloc, lut= 342081 + 91.2*reloc (ns)
relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 + 96.9*reloc (ns)
skip-relocs: buffers=   1: old=   2966 + 16.6*reloc, lut=   2963 + 15.6*reloc (ns)
skip-relocs: buffers=   2: old=   3083 + 16.5*reloc, lut=   3056 + 15.5*reloc (ns)
skip-relocs: buffers=   4: old=   3279 + 16.6*reloc, lut=   3242 + 15.6*reloc (ns)
skip-relocs: buffers=   8: old=   3692 + 16.7*reloc, lut=   3654 + 15.6*reloc (ns)
skip-relocs: buffers=  16: old=   4522 + 16.7*reloc, lut=   4461 + 15.5*reloc (ns)
skip-relocs: buffers=  32: old=   6254 + 16.7*reloc, lut=   6138 + 15.7*reloc (ns)
skip-relocs: buffers=  64: old=  10098 + 16.8*reloc, lut=   9939 + 15.7*reloc (ns)
skip-relocs: buffers= 128: old=  17983 + 17.6*reloc, lut=  17729 + 16.3*reloc (ns)
skip-relocs: buffers= 256: old=  34388 + 18.8*reloc, lut=  33981 + 17.6*reloc (ns)
skip-relocs: buffers= 512: old=  74211 + 25.2*reloc, lut=  72185 + 18.6*reloc (ns)
skip-relocs: buffers=1024: old= 160514 + 34.1*reloc, lut= 157086 + 20.3*reloc (ns)
skip-relocs: buffers=2048: old= 323954 + 51.5*reloc, lut= 315928 + 22.5*reloc (ns)
no-relocs: buffers=   1: old=   2840 + 5.1*reloc, lut=   2834 + 4.8*reloc (ns)
no-relocs: buffers=   2: old=   2938 + 5.1*reloc, lut=   2917 + 4.8*reloc (ns)
no-relocs: buffers=   4: old=   3220 + 5.1*reloc, lut=   3201 + 4.8*reloc (ns)
no-relocs: buffers=   8: old=   3614 + 5.1*reloc, lut=   3545 + 4.8*reloc (ns)
no-relocs: buffers=  16: old=   4437 + 5.1*reloc, lut=   4368 + 4.8*reloc (ns)
no-relocs: buffers=  32: old=   6105 + 5.1*reloc, lut=   6024 + 4.9*reloc (ns)
no-relocs: buffers=  64: old=   9864 + 5.1*reloc, lut=   9652 + 4.9*reloc (ns)
no-relocs: buffers= 128: old=  17388 + 5.1*reloc, lut=  17126 + 4.9*reloc (ns)
no-relocs: buffers= 256: old=  33087 + 5.4*reloc, lut=  32668 + 5.3*reloc (ns)
no-relocs: buffers= 512: old=  71476 + 5.0*reloc, lut=  69464 + 4.9*reloc (ns)
no-relocs: buffers=1024: old= 154379 + 4.9*reloc, lut= 152796 + 4.3*reloc (ns)
no-relocs: buffers=2048: old= 309435 + 5.0*reloc, lut= 301095 + 4.9*reloc (ns)


-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas
  2013-08-14  1:11   ` Ben Widawsky
@ 2013-08-14  7:58     ` Chris Wilson
       [not found]       ` <20130814224358.GA21854@nuc-i3427.alporthouse.com>
  0 siblings, 1 reply; 29+ messages in thread
From: Chris Wilson @ 2013-08-14  7:58 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx

On Tue, Aug 13, 2013 at 06:11:59PM -0700, Ben Widawsky wrote:
> On Tue, Aug 13, 2013 at 06:09:09PM -0700, Ben Widawsky wrote:
> > From: Ben Widawsky <ben@bwidawsk.net>
> > 
> > In order to transition more of our code over to using a VMA instead of
> > an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
> > until now, we've only had a VMA when actually binding an object.
> > 
> > The previous patch helped handle the distinction on bound vs. unbound.
> > This patch will help us catch leaks, and other issues before we actually
> > shuffle a bunch of stuff around.
> > 
> > This attempts to convert all the execbuf code to speak in vmas. Since
> > the execbuf code is very self contained it was a nice isolated
> > conversion.
> > 
> > The meat of the code is about turning eb_objects into eb_vma, and then
> > wiring up the rest of the code to use vmas instead of obj, vm pairs.
> > 
> > Unfortunately, to do this, we must move the exec_list link from the obj
> > structure. This list is reused in the eviction code, so we must also
> > modify the eviction code to make this work.
> > 
> > WARNING: This patch makes an already hotly profiled path slower. The cost is
> > unavoidable. In reply to this mail, I will attach the extra data.
> > 
> 
> [snip]
> 
> Here is the output from gem_exec_lut_handle both before and after this
> patch. The results honestly don't make sense to me, but I'll set Chris
> parse it before scratching my head harder.
> 
> Before patch
> ============
> relocation: buffers=   1: old=   8060 + 165.3*reloc, lut=   7816 + 164.8*reloc (ns)
> relocation: buffers=   2: old=   6748 + 166.6*reloc, lut=   6952 + 165.4*reloc (ns)
> relocation: buffers=   4: old=   8140 + 165.9*reloc, lut=   8216 + 165.4*reloc (ns)
> relocation: buffers=   8: old=  10732 + 166.0*reloc, lut=  10615 + 165.2*reloc (ns)
> relocation: buffers=  16: old=  15099 + 167.8*reloc, lut=  15337 + 165.3*reloc (ns)
> relocation: buffers=  32: old=  26140 + 166.0*reloc, lut=  25488 + 165.5*reloc (ns)
> relocation: buffers=  64: old=  46300 + 170.5*reloc, lut=  44279 + 166.7*reloc (ns)
> relocation: buffers= 128: old=  84056 + 176.9*reloc, lut=  85379 + 166.3*reloc (ns)
> relocation: buffers= 256: old= 174398 + 167.9*reloc, lut= 167744 + 167.0*reloc (ns)
> relocation: buffers= 512: old= 349688 + 175.7*reloc, lut= 348590 + 170.8*reloc (ns)
> relocation: buffers=1024: old= 726265 + 191.2*reloc, lut= 719774 + 180.2*reloc (ns)
> relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)
> skip-relocs: buffers=   1: old=   4445 + 16.0*reloc, lut=   4433 + 15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   4585 + 16.0*reloc, lut=   4571 + 15.6*reloc (ns)
> skip-relocs: buffers=   4: old=   5667 + 16.0*reloc, lut=   5340 + 15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   6051 + 16.1*reloc, lut=   6026 + 15.6*reloc (ns)
> skip-relocs: buffers=  16: old=   7953 + 16.1*reloc, lut=   7914 + 15.6*reloc (ns)
> skip-relocs: buffers=  32: old=  11972 + 16.2*reloc, lut=  11875 + 15.7*reloc (ns)
> skip-relocs: buffers=  64: old=  19999 + 16.5*reloc, lut=  19832 + 15.7*reloc (ns)
> skip-relocs: buffers= 128: old=  37796 + 16.9*reloc, lut=  36539 + 15.9*reloc (ns)
> skip-relocs: buffers= 256: old=  71604 + 18.1*reloc, lut=  71313 + 16.5*reloc (ns)
> skip-relocs: buffers= 512: old= 152682 + 24.3*reloc, lut= 141379 + 27.9*reloc (ns)
> skip-relocs: buffers=1024: old= 314116 + 41.7*reloc, lut= 303019 + 20.1*reloc (ns)
> skip-relocs: buffers=2048: old= 619784 + 54.1*reloc, lut= 603931 + 20.0*reloc (ns)
> no-relocs: buffers=   1: old=   4194 + 5.1*reloc, lut=   4206 + 4.8*reloc (ns)
> no-relocs: buffers=   2: old=   4404 + 5.1*reloc, lut=   4381 + 4.8*reloc (ns)
> no-relocs: buffers=   4: old=   4926 + 5.1*reloc, lut=   4921 + 4.8*reloc (ns)
> no-relocs: buffers=   8: old=   5901 + 5.1*reloc, lut=   5822 + 4.9*reloc (ns)
> no-relocs: buffers=  16: old=   7840 + 5.1*reloc, lut=   7737 + 4.9*reloc (ns)
> no-relocs: buffers=  32: old=  11842 + 5.1*reloc, lut=  11681 + 4.9*reloc (ns)
> no-relocs: buffers=  64: old=  19741 + 5.1*reloc, lut=  19542 + 4.8*reloc (ns)
> no-relocs: buffers= 128: old=  36479 + 5.2*reloc, lut=  35958 + 4.9*reloc (ns)
> no-relocs: buffers= 256: old=  70171 + 5.4*reloc, lut=  69390 + 5.2*reloc (ns)
> no-relocs: buffers= 512: old= 147213 + 3.5*reloc, lut= 137953 + 13.0*reloc (ns)
> no-relocs: buffers=1024: old= 300165 + 4.8*reloc, lut= 293852 + 4.9*reloc (ns)
> no-relocs: buffers=2048: old= 597992 + 8.3*reloc, lut= 590185 + 2.1*reloc (ns)
> 
> 
> After patch
> ===========
> relocation: buffers=   1: old=   8075 + 81.4*reloc, lut=   7592 + 80.6*reloc (ns)
> relocation: buffers=   2: old=   5744 + 82.3*reloc, lut=   5837 + 81.1*reloc (ns)
> relocation: buffers=   4: old=   4875 + 82.7*reloc, lut=   4871 + 81.6*reloc (ns)
> relocation: buffers=   8: old=   5729 + 82.7*reloc, lut=   5698 + 81.5*reloc (ns)
> relocation: buffers=  16: old=   7952 + 83.0*reloc, lut=   7809 + 81.9*reloc (ns)
> relocation: buffers=  32: old=  11884 + 82.9*reloc, lut=  11702 + 81.6*reloc (ns)
> relocation: buffers=  64: old=  20388 + 83.4*reloc, lut=  19995 + 82.2*reloc (ns)
> relocation: buffers= 128: old=  38057 + 85.0*reloc, lut=  37675 + 83.4*reloc (ns)
> relocation: buffers= 256: old=  74912 + 87.0*reloc, lut=  74064 + 85.4*reloc (ns)
> relocation: buffers= 512: old= 161136 + 94.8*reloc, lut= 157046 + 87.5*reloc (ns)
> relocation: buffers=1024: old= 349443 + 107.0*reloc, lut= 342081 + 91.2*reloc (ns)
> relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 + 96.9*reloc (ns)
> skip-relocs: buffers=   1: old=   2966 + 16.6*reloc, lut=   2963 + 15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   3083 + 16.5*reloc, lut=   3056 + 15.5*reloc (ns)
> skip-relocs: buffers=   4: old=   3279 + 16.6*reloc, lut=   3242 + 15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   3692 + 16.7*reloc, lut=   3654 + 15.6*reloc (ns)
> skip-relocs: buffers=  16: old=   4522 + 16.7*reloc, lut=   4461 + 15.5*reloc (ns)
> skip-relocs: buffers=  32: old=   6254 + 16.7*reloc, lut=   6138 + 15.7*reloc (ns)
> skip-relocs: buffers=  64: old=  10098 + 16.8*reloc, lut=   9939 + 15.7*reloc (ns)
> skip-relocs: buffers= 128: old=  17983 + 17.6*reloc, lut=  17729 + 16.3*reloc (ns)
> skip-relocs: buffers= 256: old=  34388 + 18.8*reloc, lut=  33981 + 17.6*reloc (ns)
> skip-relocs: buffers= 512: old=  74211 + 25.2*reloc, lut=  72185 + 18.6*reloc (ns)
> skip-relocs: buffers=1024: old= 160514 + 34.1*reloc, lut= 157086 + 20.3*reloc (ns)
> skip-relocs: buffers=2048: old= 323954 + 51.5*reloc, lut= 315928 + 22.5*reloc (ns)
> no-relocs: buffers=   1: old=   2840 + 5.1*reloc, lut=   2834 + 4.8*reloc (ns)
> no-relocs: buffers=   2: old=   2938 + 5.1*reloc, lut=   2917 + 4.8*reloc (ns)
> no-relocs: buffers=   4: old=   3220 + 5.1*reloc, lut=   3201 + 4.8*reloc (ns)
> no-relocs: buffers=   8: old=   3614 + 5.1*reloc, lut=   3545 + 4.8*reloc (ns)
> no-relocs: buffers=  16: old=   4437 + 5.1*reloc, lut=   4368 + 4.8*reloc (ns)
> no-relocs: buffers=  32: old=   6105 + 5.1*reloc, lut=   6024 + 4.9*reloc (ns)
> no-relocs: buffers=  64: old=   9864 + 5.1*reloc, lut=   9652 + 4.9*reloc (ns)
> no-relocs: buffers= 128: old=  17388 + 5.1*reloc, lut=  17126 + 4.9*reloc (ns)
> no-relocs: buffers= 256: old=  33087 + 5.4*reloc, lut=  32668 + 5.3*reloc (ns)
> no-relocs: buffers= 512: old=  71476 + 5.0*reloc, lut=  69464 + 4.9*reloc (ns)
> no-relocs: buffers=1024: old= 154379 + 4.9*reloc, lut= 152796 + 4.3*reloc (ns)
> no-relocs: buffers=2048: old= 309435 + 5.0*reloc, lut= 301095 + 4.9*reloc (ns)

Hmm, either the patch really did make things faster or the results are
being subjected to cpufreq.

relocation: do the full relocation processing
skip-relocs: all the relocation addresses are correct, so no rewrite
no-relocs: no buffers moved, no relocation processing

"Buffers" is the number of buffers passed into execbuffer, and so
we measure the cost of trying to process a certain complexity of a batch.
Then we do a least-squares plot through a number of batches with varying
numbers of relocations to estimate the overhead of processing the
execbuffer array versus the cost of each addition relocation in the
batch. The first number (the x_0 intercept in nanoseconds) should be the
measure of how long it takes to grab all the buffers into our local
structures - this is what I was expecting to be worst hit by the vma
patches.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] [v2] drm/i915: Remove node only when allocated
  2013-08-14  1:09 [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Ben Widawsky
                   ` (2 preceding siblings ...)
  2013-08-14  1:09 ` [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas Ben Widawsky
@ 2013-08-14  8:06 ` Daniel Vetter
  2013-08-14  8:15   ` Daniel Vetter
  2013-08-14  8:19 ` Chris Wilson
  4 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  8:06 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Ben Widawsky, Paulo Zanoni

On Tue, Aug 13, 2013 at 06:09:06PM -0700, Ben Widawsky wrote:
> VMAs can be created and not bound. One may think of it as lazy cleanup,
> and safely gloss over the conditions which manufacture it. In either
> case, when the object backing the i915 vma is destroyed, we must cleanup
> the vma without stumbling into a bunch of pitfalls that assume the vma
> is bound.
> 
> NOTE: I was pretty certain the above condition could only happen when we
> introduced the use of VMAs being looked up at execbuf, and already
> existing. Paulo has hit this though, so I must be missing something. As
> I believe the patch is correct anyway, therefore I won't scratch my head
> too hard.

If we end up calling evict_everything from i915_gem_object_bind_to_vm then
we'll hit this. One more reason for a testcase here ;-) I'll amend the
commit message and merge this. I've also applied a tiny bikeshed I've
created while reviewing existing vma_create/destroy callsites.
-Daniel

> 
> v2: use goto destroy as a compromise (Chris)
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3d9e248b..4a58ead 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2606,6 +2606,9 @@ int i915_vma_unbind(struct i915_vma *vma)
>  	if (list_empty(&vma->vma_link))
>  		return 0;
>  
> +	if (!drm_mm_node_allocated(&vma->node))
> +		goto destroy;
> +
>  	if (obj->pin_count)
>  		return -EBUSY;
>  
> @@ -2643,6 +2646,8 @@ int i915_vma_unbind(struct i915_vma *vma)
>  		obj->map_and_fenceable = true;
>  
>  	drm_mm_remove_node(&vma->node);
> +
> +destroy:
>  	i915_gem_vma_destroy(vma);
>  
>  	/* Since the unbound list is global, only move to that list if
> -- 
> 1.8.3.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] [v2] drm/i915: Remove node only when allocated
  2013-08-14  8:06 ` [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Daniel Vetter
@ 2013-08-14  8:15   ` Daniel Vetter
  2013-08-15 14:05     ` Daniel Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  8:15 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Ben Widawsky, Paulo Zanoni

On Wed, Aug 14, 2013 at 10:06:30AM +0200, Daniel Vetter wrote:
> On Tue, Aug 13, 2013 at 06:09:06PM -0700, Ben Widawsky wrote:
> > VMAs can be created and not bound. One may think of it as lazy cleanup,
> > and safely gloss over the conditions which manufacture it. In either
> > case, when the object backing the i915 vma is destroyed, we must cleanup
> > the vma without stumbling into a bunch of pitfalls that assume the vma
> > is bound.
> > 
> > NOTE: I was pretty certain the above condition could only happen when we
> > introduced the use of VMAs being looked up at execbuf, and already
> > existing. Paulo has hit this though, so I must be missing something. As
> > I believe the patch is correct anyway, therefore I won't scratch my head
> > too hard.
> 
> If we end up calling evict_everything from i915_gem_object_bind_to_vm then
> we'll hit this. One more reason for a testcase here ;-) I'll amend the
> commit message and merge this. I've also applied a tiny bikeshed I've
> created while reviewing existing vma_create/destroy callsites.

Actually evict_everything isn't in the callpath, and there's no memory
allocation where the shrinker might play havoc. Furthermore the pages are
pinned so the shrinker shouldn't be able to sneak in at all. This is a bit
unsettling, I need to think more about this.

I'll wait with merging this for now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind
  2013-08-14  1:09 ` [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind Ben Widawsky
@ 2013-08-14  8:18   ` Daniel Vetter
  2013-08-14 17:27     ` Ben Widawsky
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  8:18 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Ben Widawsky

On Tue, Aug 13, 2013 at 06:09:07PM -0700, Ben Widawsky wrote:
> Cleanup the map and fenceable setting during bind to make more sense,
> and not check i915_is_ggtt() 2 unnecessary times
> 
> v2: Move the bools into the if block (Chris) - There are ways to tidy
> this function (fence calculations for instance) even further, but they
> are quite invasive, so I am punting on those unless specifically asked.
> 
> v3: Add newline between variable declaration and logic (Chris)
> 
> Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4a58ead..01cc016 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3106,7 +3106,6 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>  	struct drm_device *dev = obj->base.dev;
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	u32 size, fence_size, fence_alignment, unfenced_alignment;
> -	bool mappable, fenceable;
>  	size_t gtt_max =
>  		map_and_fenceable ? dev_priv->gtt.mappable_end : vm->total;
>  	struct i915_vma *vma;
> @@ -3191,18 +3190,18 @@ search_free:
>  	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
>  	list_add_tail(&vma->mm_list, &vm->inactive_list);
>  
> -	fenceable =
> -		i915_is_ggtt(vm) &&
> -		i915_gem_obj_ggtt_size(obj) == fence_size &&
> -		(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
> +	if (i915_is_ggtt(vm)) {
> +		bool mappable, fenceable;
>  
> -	mappable =
> -		i915_is_ggtt(vm) &&
> -		vma->node.start + obj->base.size <= dev_priv->gtt.mappable_end;
> +		fenceable =
> +			i915_gem_obj_ggtt_size(obj) == fence_size &&
> +			(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;

Why not go the full mounty and also use vma->node here? Would also make
checkpatch a bit more happy. I'll do a follow-up commit.
-Daniel

> +
> +		mappable =
> +			vma->node.start + obj->base.size <= dev_priv->gtt.mappable_end;
>  
> -	/* Map and fenceable only changes if the VM is the global GGTT */
> -	if (i915_is_ggtt(vm))
>  		obj->map_and_fenceable = mappable && fenceable;
> +	}
>  
>  	WARN_ON(map_and_fenceable && !obj->map_and_fenceable);
>  
> -- 
> 1.8.3.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] [v2] drm/i915: Remove node only when allocated
  2013-08-14  1:09 [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Ben Widawsky
                   ` (3 preceding siblings ...)
  2013-08-14  8:06 ` [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Daniel Vetter
@ 2013-08-14  8:19 ` Chris Wilson
  4 siblings, 0 replies; 29+ messages in thread
From: Chris Wilson @ 2013-08-14  8:19 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Paulo Zanoni, Ben Widawsky

On Tue, Aug 13, 2013 at 06:09:06PM -0700, Ben Widawsky wrote:
> VMAs can be created and not bound. One may think of it as lazy cleanup,
> and safely gloss over the conditions which manufacture it. In either
> case, when the object backing the i915 vma is destroyed, we must cleanup
> the vma without stumbling into a bunch of pitfalls that assume the vma
> is bound.
> 
> NOTE: I was pretty certain the above condition could only happen when we
> introduced the use of VMAs being looked up at execbuf, and already
> existing. Paulo has hit this though, so I must be missing something. As
> I believe the patch is correct anyway, therefore I won't scratch my head
> too hard.
> 
> v2: use goto destroy as a compromise (Chris)
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm: WARN when removing unallocated node
  2013-08-14  1:09 ` [PATCH 3/4] drm: WARN when removing unallocated node Ben Widawsky
@ 2013-08-14  8:52   ` Daniel Vetter
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  8:52 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Dave Airlie, intel-gfx, Ben Widawsky, dri-devel

On Tue, Aug 13, 2013 at 06:09:08PM -0700, Ben Widawsky wrote:
> The conditional is usually a recoverable driver bug, and so WARNing, and
> preventing the drm_mm code from doing potential damage (BUG) is
> desirable.
> 
> This issue was hit and fixed twice while developing the i915 multiple
> address space code. The first fix is the patch just before this, and is
> hit on an not frequently occuring error path. Another was fixed during
> patch iteration, so it's hard to see from the patch:
> 
> commit c6cfb325677ea6305fb19acf3a4d14ea267f923e
> Author: Ben Widawsky <ben@bwidawsk.net>
> Date:   Fri Jul 5 14:41:06 2013 -0700
> 
>     drm/i915: Embed drm_mm_node in i915 gem obj
> 
> From the intel-gfx mailing list, we discussed this:
> References: <20130705191235.GA3057@bwidawsk.net>
> 
> Cc: Dave Airlie <airlied@redhat.com>
> CC: <dri-devel@lists.freedesktop.org>
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Patches 2&3 of this series are merged to dinq, thanks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Split up execbuf vma conversion
  2013-08-14  1:09 ` [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas Ben Widawsky
  2013-08-14  1:11   ` Ben Widawsky
@ 2013-08-14  9:38   ` Daniel Vetter
  2013-08-14  9:38     ` [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link Daniel Vetter
                       ` (3 more replies)
  1 sibling, 4 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:38 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Ben Widawsky

Hi Ben,

For my own review benefit I've slashed your patch into pieces and split out 3
prep patches. Overall diff is unchanged with the exception of a now stale FIXME
comment that I've killed. Still need to do the in-depth review of the last
patch, but looks good thus far.

Cheers, Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link
  2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
@ 2013-08-14  9:38     ` Daniel Vetter
  2013-08-14  9:42       ` Daniel Vetter
  2013-08-14  9:38     ` [PATCH 2/4] drm/i915: Switch eviction code to use vmas Daniel Vetter
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:38 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

From: Ben Widawsky <ben@bwidawsk.net>

To convert the execbuf code over to use vmas natively we need to
shuffle the exec_list a bit. This patch here just prepares things with
the debugfs code, which also uses the old exec_list list_head, newly
called obj_exec_link.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 12 ++++++------
 drivers/gpu/drm/i915/i915_drv.h     |  2 ++
 drivers/gpu/drm/i915/i915_gem.c     |  1 +
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index eb87865..4785d8c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -195,9 +195,9 @@ static int obj_rank_by_stolen(void *priv,
 			      struct list_head *A, struct list_head *B)
 {
 	struct drm_i915_gem_object *a =
-		container_of(A, struct drm_i915_gem_object, exec_list);
+		container_of(A, struct drm_i915_gem_object, obj_exec_link);
 	struct drm_i915_gem_object *b =
-		container_of(B, struct drm_i915_gem_object, exec_list);
+		container_of(B, struct drm_i915_gem_object, obj_exec_link);
 
 	return a->stolen->start - b->stolen->start;
 }
@@ -221,7 +221,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->exec_list, &stolen);
+		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		total_gtt_size += i915_gem_obj_ggtt_size(obj);
@@ -231,7 +231,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->exec_list, &stolen);
+		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		count++;
@@ -239,11 +239,11 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	list_sort(NULL, &stolen, obj_rank_by_stolen);
 	seq_puts(m, "Stolen:\n");
 	while (!list_empty(&stolen)) {
-		obj = list_first_entry(&stolen, typeof(*obj), exec_list);
+		obj = list_first_entry(&stolen, typeof(*obj), obj_exec_link);
 		seq_puts(m, "   ");
 		describe_obj(m, obj);
 		seq_putc(m, '\n');
-		list_del_init(&obj->exec_list);
+		list_del_init(&obj->obj_exec_link);
 	}
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2e7d5f9..6532d97 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1312,6 +1312,8 @@ struct drm_i915_gem_object {
 	struct list_head global_list;
 
 	struct list_head ring_list;
+	/** Used in execbuf to temporarily hold a ref */
+	struct list_head obj_exec_link;
 	/** This object's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fd48724..219e2fb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3993,6 +3993,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->global_list);
 	INIT_LIST_HEAD(&obj->ring_list);
 	INIT_LIST_HEAD(&obj->exec_list);
+	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 
 	obj->ops = ops;
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/4] drm/i915: Switch eviction code to use vmas
  2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
  2013-08-14  9:38     ` [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link Daniel Vetter
@ 2013-08-14  9:38     ` Daniel Vetter
  2013-08-14  9:38     ` [PATCH 3/4] drm/i915: prepare bind_to_vm for preallocated vma Daniel Vetter
  2013-08-14  9:38     ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Daniel Vetter
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:38 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

From: Ben Widawsky <ben@bwidawsk.net>

The execbuf wants to do relocations usings vmas, so we need a
vma->exec_list. The eviction code also uses the old obj execbuf list
for it's own book-keeping, but would really prefer to deal in vmas
only. So switch it over to the new list.

Again this is just a prep patch for the big execbuf vma conversion.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf vma patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h       |  4 ++++
 drivers/gpu/drm/i915/i915_gem.c       |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c | 31 ++++++++++++++-----------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6532d97..b2b9836 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -563,6 +563,10 @@ struct i915_vma {
 	struct list_head mm_list;
 
 	struct list_head vma_link; /* Link in the object's VMA list */
+
+	/** This vma's place in the batchbuffer or on the eviction list */
+	struct list_head exec_list;
+
 };
 
 struct i915_ctx_hang_stats {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 219e2fb..122c6b0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4133,6 +4133,7 @@ struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vma_link);
 	INIT_LIST_HEAD(&vma->mm_list);
+	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 425939b..8787588 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -37,7 +37,7 @@ mark_free(struct i915_vma *vma, struct list_head *unwind)
 	if (vma->obj->pin_count)
 		return false;
 
-	list_add(&vma->obj->exec_list, unwind);
+	list_add(&vma->exec_list, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
 
@@ -49,7 +49,6 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct list_head eviction_list, unwind_list;
 	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
 	int ret = 0;
 
 	trace_i915_gem_evict(dev, min_size, alignment, mappable);
@@ -104,14 +103,13 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 none:
 	/* Nothing found, clean up and bail out! */
 	while (!list_empty(&unwind_list)) {
-		obj = list_first_entry(&unwind_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&unwind_list,
+				       struct i915_vma,
 				       exec_list);
-		vma = i915_gem_obj_to_vma(obj, vm);
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 
-		list_del_init(&obj->exec_list);
+		list_del_init(&vma->exec_list);
 	}
 
 	/* We expect the caller to unpin, evict all and try again, or give up.
@@ -125,28 +123,27 @@ found:
 	 * temporary list. */
 	INIT_LIST_HEAD(&eviction_list);
 	while (!list_empty(&unwind_list)) {
-		obj = list_first_entry(&unwind_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&unwind_list,
+				       struct i915_vma,
 				       exec_list);
-		vma = i915_gem_obj_to_vma(obj, vm);
 		if (drm_mm_scan_remove_block(&vma->node)) {
-			list_move(&obj->exec_list, &eviction_list);
-			drm_gem_object_reference(&obj->base);
+			list_move(&vma->exec_list, &eviction_list);
+			drm_gem_object_reference(&vma->obj->base);
 			continue;
 		}
-		list_del_init(&obj->exec_list);
+		list_del_init(&vma->exec_list);
 	}
 
 	/* Unbinding will emit any required flushes */
 	while (!list_empty(&eviction_list)) {
-		obj = list_first_entry(&eviction_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&eviction_list,
+				       struct i915_vma,
 				       exec_list);
 		if (ret == 0)
-			ret = i915_vma_unbind(i915_gem_obj_to_vma(obj, vm));
+			ret = i915_vma_unbind(vma);
 
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	return ret;
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/4] drm/i915: prepare bind_to_vm for preallocated vma
  2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
  2013-08-14  9:38     ` [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link Daniel Vetter
  2013-08-14  9:38     ` [PATCH 2/4] drm/i915: Switch eviction code to use vmas Daniel Vetter
@ 2013-08-14  9:38     ` Daniel Vetter
  2013-08-14  9:38     ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Daniel Vetter
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:38 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

From: Ben Widawsky <ben@bwidawsk.net>

In the new execbuf code we want to track buffers using the vmas even
before they're all properly mapped. Which means that bind_to_vm needs
to deal with buffers which have preallocated vmas which aren't yet
bound.

This patch implements this prep work and adjusts our WARN/BUG checks.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf patch. Also move one BUG
back to its original place to deflate the diff a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h |  3 +++
 drivers/gpu/drm/i915/i915_gem.c | 23 +++++++++++++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b2b9836..4bd66e9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1914,6 +1914,9 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm);
 struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm);
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm);
 /* Some GGTT VM helpers */
 #define obj_to_ggtt(obj) \
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 122c6b0..d8f322e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3124,9 +3124,6 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int ret;
 
-	if (WARN_ON(!list_empty(&obj->vma_list)))
-		return -EBUSY;
-
 	fence_size = i915_gem_get_gtt_size(dev,
 					   obj->base.size,
 					   obj->tiling_mode);
@@ -3165,16 +3162,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_pin_pages(obj);
 
-	/* FIXME: For now we only ever use 1 VMA per object */
 	BUG_ON(!i915_is_ggtt(vm));
-	WARN_ON(!list_empty(&obj->vma_list));
 
-	vma = i915_gem_vma_create(obj, vm);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unpin;
 	}
 
+	/* For now we only ever use 1 vma per object */
+	WARN_ON(!list_is_singular(&obj->vma_list));
+
 search_free:
 	ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
 						  size, alignment,
@@ -4883,3 +4881,16 @@ struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 
 	return NULL;
 }
+
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm)
+{
+	struct i915_vma *vma;
+
+	vma = i915_gem_obj_to_vma(obj, vm);
+	if (!vma)
+		vma = i915_gem_vma_create(obj, vm);
+
+	return vma;
+}
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/4] drm/i915: Convert execbuf code to use vmas
  2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
                       ` (2 preceding siblings ...)
  2013-08-14  9:38     ` [PATCH 3/4] drm/i915: prepare bind_to_vm for preallocated vma Daniel Vetter
@ 2013-08-14  9:38     ` Daniel Vetter
  2013-08-14  9:59       ` [PATCH] drm/i915: inline vma_create into lookup_or_create_vma Daniel Vetter
  2013-08-14 11:54       ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Chris Wilson
  3 siblings, 2 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:38 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

From: Ben Widawsky <ben@bwidawsk.net>

In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569d366beb4be72cacfde41adee2c25e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h            |  16 +-
 drivers/gpu/drm/i915/i915_gem.c            |   1 -
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 320 ++++++++++++++++-------------
 3 files changed, 183 insertions(+), 154 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4bd66e9..fc35ae3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -567,6 +567,13 @@ struct i915_vma {
 	/** This vma's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
 
+	/**
+	 * Used for performing relocations during execbuffer insertion.
+	 */
+	struct hlist_node exec_node;
+	unsigned long exec_handle;
+	struct drm_i915_gem_exec_object2 *exec_entry;
+
 };
 
 struct i915_ctx_hang_stats {
@@ -1318,8 +1325,6 @@ struct drm_i915_gem_object {
 	struct list_head ring_list;
 	/** Used in execbuf to temporarily hold a ref */
 	struct list_head obj_exec_link;
-	/** This object's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
@@ -1405,13 +1410,6 @@ struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	/**
-	 * Used for performing relocations during execbuffer insertion.
-	 */
-	struct hlist_node exec_node;
-	unsigned long exec_handle;
-	struct drm_i915_gem_exec_object2 *exec_entry;
-
 	struct intel_ring_buffer *ring;
 
 	/** Breadcrumb of last rendering to the buffer. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d8f322e..4be3be9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3990,7 +3990,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 {
 	INIT_LIST_HEAD(&obj->global_list);
 	INIT_LIST_HEAD(&obj->ring_list);
-	INIT_LIST_HEAD(&obj->exec_list);
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7dcf78c..c3b36f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,24 +33,24 @@
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
 
-struct eb_objects {
-	struct list_head objects;
+struct eb_vmas {
+	struct list_head vmas;
 	int and;
 	union {
-		struct drm_i915_gem_object *lut[0];
+		struct i915_vma *lut[0];
 		struct hlist_head buckets[0];
 	};
 };
 
-static struct eb_objects *
-eb_create(struct drm_i915_gem_execbuffer2 *args)
+static struct eb_vmas *
+eb_create(struct drm_i915_gem_execbuffer2 *args, struct i915_address_space *vm)
 {
-	struct eb_objects *eb = NULL;
+	struct eb_vmas *eb = NULL;
 
 	if (args->flags & I915_EXEC_HANDLE_LUT) {
 		int size = args->buffer_count;
-		size *= sizeof(struct drm_i915_gem_object *);
-		size += sizeof(struct eb_objects);
+		size *= sizeof(struct i915_vma *);
+		size += sizeof(struct eb_vmas);
 		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	}
 
@@ -61,7 +61,7 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
 		while (count > 2*size)
 			count >>= 1;
 		eb = kzalloc(count*sizeof(struct hlist_head) +
-			     sizeof(struct eb_objects),
+			     sizeof(struct eb_vmas),
 			     GFP_TEMPORARY);
 		if (eb == NULL)
 			return eb;
@@ -70,64 +70,97 @@ eb_create(struct drm_i915_gem_execbuffer2 *args)
 	} else
 		eb->and = -args->buffer_count;
 
-	INIT_LIST_HEAD(&eb->objects);
+	INIT_LIST_HEAD(&eb->vmas);
 	return eb;
 }
 
 static void
-eb_reset(struct eb_objects *eb)
+eb_reset(struct eb_vmas *eb)
 {
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static int
-eb_lookup_objects(struct eb_objects *eb,
-		  struct drm_i915_gem_exec_object2 *exec,
-		  const struct drm_i915_gem_execbuffer2 *args,
-		  struct drm_file *file)
+eb_lookup_vmas(struct eb_vmas *eb,
+	       struct drm_i915_gem_exec_object2 *exec,
+	       const struct drm_i915_gem_execbuffer2 *args,
+	       struct i915_address_space *vm,
+	       struct drm_file *file)
 {
-	int i;
+	struct drm_i915_gem_object *obj;
+	struct list_head objects;
+	int i, ret = 0;
 
+	INIT_LIST_HEAD(&objects);
 	spin_lock(&file->table_lock);
+	/* Grab a reference to the object and release the lock so we can lookup
+	 * or create the VMA without using GFP_ATOMIC */
 	for (i = 0; i < args->buffer_count; i++) {
-		struct drm_i915_gem_object *obj;
-
 		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
 		if (obj == NULL) {
 			spin_unlock(&file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
 				   exec[i].handle, i);
-			return -ENOENT;
+			ret = -ENOENT;
+			goto out;
 		}
 
-		if (!list_empty(&obj->exec_list)) {
+		if (!list_empty(&obj->obj_exec_link)) {
 			spin_unlock(&file->table_lock);
 			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
 				   obj, exec[i].handle, i);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 
 		drm_gem_object_reference(&obj->base);
-		list_add_tail(&obj->exec_list, &eb->objects);
+		list_add_tail(&obj->obj_exec_link, &objects);
+	}
+	spin_unlock(&file->table_lock);
 
-		obj->exec_entry = &exec[i];
+	i = 0;
+	list_for_each_entry(obj, &objects, obj_exec_link) {
+		struct i915_vma *vma;
+
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
+		if (IS_ERR(vma)) {
+			/* XXX: We don't need an error path fro vma because if
+			 * the vma was created just for this execbuf, object
+			 * unreference should kill it off.*/
+			DRM_DEBUG("Failed to lookup VMA\n");
+			ret = PTR_ERR(vma);
+			goto out;
+		}
+
+		list_add_tail(&vma->exec_list, &eb->vmas);
+
+		vma->exec_entry = &exec[i];
 		if (eb->and < 0) {
-			eb->lut[i] = obj;
+			eb->lut[i] = vma;
 		} else {
 			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
-			obj->exec_handle = handle;
-			hlist_add_head(&obj->exec_node,
+			vma->exec_handle = handle;
+			hlist_add_head(&vma->exec_node,
 				       &eb->buckets[handle & eb->and]);
 		}
+		++i;
 	}
-	spin_unlock(&file->table_lock);
 
-	return 0;
+
+out:
+	while (!list_empty(&objects)) {
+		obj = list_first_entry(&objects,
+				       struct drm_i915_gem_object,
+				       obj_exec_link);
+		list_del_init(&obj->obj_exec_link);
+		if (ret)
+			drm_gem_object_unreference(&obj->base);
+	}
+	return ret;
 }
 
-static struct drm_i915_gem_object *
-eb_get_object(struct eb_objects *eb, unsigned long handle)
+static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
 {
 	if (eb->and < 0) {
 		if (handle >= -eb->and)
@@ -139,27 +172,25 @@ eb_get_object(struct eb_objects *eb, unsigned long handle)
 
 		head = &eb->buckets[handle & eb->and];
 		hlist_for_each(node, head) {
-			struct drm_i915_gem_object *obj;
+			struct i915_vma *vma;
 
-			obj = hlist_entry(node, struct drm_i915_gem_object, exec_node);
-			if (obj->exec_handle == handle)
-				return obj;
+			vma = hlist_entry(node, struct i915_vma, exec_node);
+			if (vma->exec_handle == handle)
+				return vma;
 		}
 		return NULL;
 	}
 }
 
-static void
-eb_destroy(struct eb_objects *eb)
-{
-	while (!list_empty(&eb->objects)) {
-		struct drm_i915_gem_object *obj;
+static void eb_destroy(struct eb_vmas *eb) {
+	while (!list_empty(&eb->vmas)) {
+		struct i915_vma *vma;
 
-		obj = list_first_entry(&eb->objects,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&eb->vmas,
+				       struct i915_vma,
 				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 	kfree(eb);
 }
@@ -173,22 +204,24 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
-				   struct eb_objects *eb,
+				   struct eb_vmas *eb,
 				   struct drm_i915_gem_relocation_entry *reloc,
 				   struct i915_address_space *vm)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
 	struct drm_i915_gem_object *target_i915_obj;
+	struct i915_vma *target_vma;
 	uint32_t target_offset;
 	int ret = -EINVAL;
 
 	/* we've already hold a reference to all valid objects */
-	target_obj = &eb_get_object(eb, reloc->target_handle)->base;
-	if (unlikely(target_obj == NULL))
+	target_vma = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(target_vma == NULL))
 		return -ENOENT;
+	target_i915_obj = target_vma->obj;
+	target_obj = &target_vma->obj->base;
 
-	target_i915_obj = to_intel_bo(target_obj);
 	target_offset = i915_gem_obj_ggtt_offset(target_i915_obj);
 
 	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
@@ -297,14 +330,13 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
-				    struct eb_objects *eb,
-				    struct i915_address_space *vm)
+i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
+				 struct eb_vmas *eb)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	int remain, ret;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
@@ -323,8 +355,8 @@ i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(obj, eb, r,
-								 vm);
+			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r,
+								 vma->vm);
 			if (ret)
 				return ret;
 
@@ -345,17 +377,16 @@ i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
-					 struct eb_objects *eb,
-					 struct drm_i915_gem_relocation_entry *relocs,
-					 struct i915_address_space *vm)
+i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
+				      struct eb_vmas *eb,
+				      struct drm_i915_gem_relocation_entry *relocs)
 {
-	const struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	int i, ret;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(obj, eb, &relocs[i],
-							 vm);
+		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i],
+							 vma->vm);
 		if (ret)
 			return ret;
 	}
@@ -364,10 +395,10 @@ i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate(struct eb_objects *eb,
+i915_gem_execbuffer_relocate(struct eb_vmas *eb,
 			     struct i915_address_space *vm)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	int ret = 0;
 
 	/* This is the fast path and we cannot handle a pagefault whilst
@@ -378,8 +409,8 @@ i915_gem_execbuffer_relocate(struct eb_objects *eb,
 	 * lockdep complains vehemently.
 	 */
 	pagefault_disable();
-	list_for_each_entry(obj, &eb->objects, exec_list) {
-		ret = i915_gem_execbuffer_relocate_object(obj, eb, vm);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
@@ -392,31 +423,32 @@ i915_gem_execbuffer_relocate(struct eb_objects *eb,
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
 
 static int
-need_reloc_mappable(struct drm_i915_gem_object *obj)
+need_reloc_mappable(struct i915_vma *vma)
 {
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
-	return entry->relocation_count && !use_cpu_reloc(obj);
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	return entry->relocation_count && !use_cpu_reloc(vma->obj) &&
+		i915_is_ggtt(vma->vm);
 }
 
 static int
-i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
-				   struct intel_ring_buffer *ring,
-				   struct i915_address_space *vm,
-				   bool *need_reloc)
+i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
+				struct intel_ring_buffer *ring,
+				bool *need_reloc)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	bool need_fence, need_mappable;
+	struct drm_i915_gem_object *obj = vma->obj;
 	int ret;
 
 	need_fence =
 		has_fenced_gpu_access &&
 		entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 		obj->tiling_mode != I915_TILING_NONE;
-	need_mappable = need_fence || need_reloc_mappable(obj);
+	need_mappable = need_fence || need_reloc_mappable(vma);
 
-	ret = i915_gem_object_pin(obj, vm, entry->alignment, need_mappable,
+	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, need_mappable,
 				  false);
 	if (ret)
 		return ret;
@@ -444,8 +476,8 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 		obj->has_aliasing_ppgtt_mapping = 1;
 	}
 
-	if (entry->offset != i915_gem_obj_offset(obj, vm)) {
-		entry->offset = i915_gem_obj_offset(obj, vm);
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start;
 		*need_reloc = true;
 	}
 
@@ -462,14 +494,15 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 }
 
 static void
-i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
+i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry;
+	struct drm_i915_gem_object *obj = vma->obj;
 
-	if (!i915_gem_obj_bound_any(obj))
+	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
-	entry = obj->exec_entry;
+	entry = vma->exec_entry;
 
 	if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
 		i915_gem_object_unpin_fence(obj);
@@ -482,41 +515,40 @@ i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
 
 static int
 i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
-			    struct list_head *objects,
-			    struct i915_address_space *vm,
+			    struct list_head *vmas,
 			    bool *need_relocs)
 {
 	struct drm_i915_gem_object *obj;
-	struct list_head ordered_objects;
+	struct i915_vma *vma;
+	struct list_head ordered_vmas;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	INIT_LIST_HEAD(&ordered_objects);
-	while (!list_empty(objects)) {
+	INIT_LIST_HEAD(&ordered_vmas);
+	while (!list_empty(vmas)) {
 		struct drm_i915_gem_exec_object2 *entry;
 		bool need_fence, need_mappable;
 
-		obj = list_first_entry(objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		entry = obj->exec_entry;
+		vma = list_first_entry(vmas, struct i915_vma, exec_list);
+		obj = vma->obj;
+		entry = vma->exec_entry;
 
 		need_fence =
 			has_fenced_gpu_access &&
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 			obj->tiling_mode != I915_TILING_NONE;
-		need_mappable = need_fence || need_reloc_mappable(obj);
+		need_mappable = need_fence || need_reloc_mappable(vma);
 
 		if (need_mappable)
-			list_move(&obj->exec_list, &ordered_objects);
+			list_move(&vma->exec_list, &ordered_vmas);
 		else
-			list_move_tail(&obj->exec_list, &ordered_objects);
+			list_move_tail(&vma->exec_list, &ordered_vmas);
 
 		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
 		obj->base.pending_write_domain = 0;
 		obj->pending_fenced_gpu_access = false;
 	}
-	list_splice(&ordered_objects, objects);
+	list_splice(&ordered_vmas, vmas);
 
 	/* Attempt to pin all of the buffers into the GTT.
 	 * This is done in 3 phases:
@@ -535,47 +567,47 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 		int ret = 0;
 
 		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(obj, objects, exec_list) {
-			struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+		list_for_each_entry(vma, vmas, exec_list) {
+			struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 			bool need_fence, need_mappable;
-			u32 obj_offset;
 
-			if (!i915_gem_obj_bound(obj, vm))
+			obj = vma->obj;
+
+			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
-			obj_offset = i915_gem_obj_offset(obj, vm);
 			need_fence =
 				has_fenced_gpu_access &&
 				entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 				obj->tiling_mode != I915_TILING_NONE;
-			need_mappable = need_fence || need_reloc_mappable(obj);
+			need_mappable = need_fence || need_reloc_mappable(vma);
 
 			WARN_ON((need_mappable || need_fence) &&
-				!i915_is_ggtt(vm));
+			       !i915_is_ggtt(vma->vm));
 
 			if ((entry->alignment &&
-			     obj_offset & (entry->alignment - 1)) ||
+			     vma->node.start & (entry->alignment - 1)) ||
 			    (need_mappable && !obj->map_and_fenceable))
-				ret = i915_vma_unbind(i915_gem_obj_to_vma(obj, vm));
+				ret = i915_vma_unbind(vma);
 			else
-				ret = i915_gem_execbuffer_reserve_object(obj, ring, vm, need_relocs);
+				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
 
 		/* Bind fresh objects */
-		list_for_each_entry(obj, objects, exec_list) {
-			if (i915_gem_obj_bound(obj, vm))
+		list_for_each_entry(vma, vmas, exec_list) {
+			if (drm_mm_node_allocated(&vma->node))
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_object(obj, ring, vm, need_relocs);
+			ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
 
 err:		/* Decrement pin count for bound objects */
-		list_for_each_entry(obj, objects, exec_list)
-			i915_gem_execbuffer_unreserve_object(obj);
+		list_for_each_entry(vma, vmas, exec_list)
+			i915_gem_execbuffer_unreserve_vma(vma);
 
 		if (ret != -ENOSPC || retry++)
 			return ret;
@@ -591,24 +623,27 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 				  struct drm_i915_gem_execbuffer2 *args,
 				  struct drm_file *file,
 				  struct intel_ring_buffer *ring,
-				  struct eb_objects *eb,
-				  struct drm_i915_gem_exec_object2 *exec,
-				  struct i915_address_space *vm)
+				  struct eb_vmas *eb,
+				  struct drm_i915_gem_exec_object2 *exec)
 {
 	struct drm_i915_gem_relocation_entry *reloc;
-	struct drm_i915_gem_object *obj;
+	struct i915_address_space *vm;
+	struct i915_vma *vma;
 	bool need_relocs;
 	int *reloc_offset;
 	int i, total, ret;
 	int count = args->buffer_count;
 
+	if (WARN_ON(list_empty(&eb->vmas)))
+		return 0;
+
+	vm = list_first_entry(&eb->vmas, struct i915_vma, exec_list)->vm;
+
 	/* We may process another execbuffer during the unlock... */
-	while (!list_empty(&eb->objects)) {
-		obj = list_first_entry(&eb->objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+	while (!list_empty(&eb->vmas)) {
+		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -672,20 +707,19 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_objects(eb, exec, args, file);
+	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
 		goto err;
 
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->objects, vm, &need_relocs);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
 	if (ret)
 		goto err;
 
-	list_for_each_entry(obj, &eb->objects, exec_list) {
-		int offset = obj->exec_entry - exec;
-		ret = i915_gem_execbuffer_relocate_object_slow(obj, eb,
-							       reloc + reloc_offset[offset],
-							       vm);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		int offset = vma->exec_entry - exec;
+		ret = i915_gem_execbuffer_relocate_vma_slow(vma, eb,
+							    reloc + reloc_offset[offset]);
 		if (ret)
 			goto err;
 	}
@@ -704,14 +738,15 @@ err:
 
 static int
 i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
-				struct list_head *objects)
+				struct list_head *vmas)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
 	int ret;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		ret = i915_gem_object_sync(obj, ring);
 		if (ret)
 			return ret;
@@ -786,13 +821,13 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 }
 
 static void
-i915_gem_execbuffer_move_to_active(struct list_head *objects,
-				   struct i915_address_space *vm,
+i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_ring_buffer *ring)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
 
@@ -802,8 +837,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *objects,
 		obj->base.read_domains = obj->base.pending_read_domains;
 		obj->fenced_gpu_access = obj->pending_fenced_gpu_access;
 
-		/* FIXME: This lookup gets fixed later <-- danvet */
-		list_move_tail(&i915_gem_obj_to_vma(obj, vm)->mm_list, &vm->active_list);
+		list_move_tail(&vma->mm_list, &vma->vm->active_list);
 		i915_gem_object_move_to_active(obj, ring);
 		if (obj->base.write_domain) {
 			obj->dirty = 1;
@@ -862,7 +896,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct i915_address_space *vm)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct eb_objects *eb;
+	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
 	struct drm_clip_rect *cliprects = NULL;
 	struct intel_ring_buffer *ring;
@@ -1002,7 +1036,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
-	eb = eb_create(args);
+	eb = eb_create(args, vm);
 	if (eb == NULL) {
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
@@ -1010,18 +1044,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Look up object handles */
-	ret = eb_lookup_objects(eb, exec, args, file);
+	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	batch_obj = list_entry(eb->objects.prev,
-			       struct drm_i915_gem_object,
-			       exec_list);
+	batch_obj = list_entry(eb->vmas.prev, struct i915_vma, exec_list)->obj;
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->objects, vm, &need_relocs);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
 	if (ret)
 		goto err;
 
@@ -1031,7 +1063,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret) {
 		if (ret == -EFAULT) {
 			ret = i915_gem_execbuffer_relocate_slow(dev, args, file, ring,
-								eb, exec, vm);
+								eb, exec);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
 		if (ret)
@@ -1053,7 +1085,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (flags & I915_DISPATCH_SECURE && !batch_obj->has_global_gtt_mapping)
 		i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level);
 
-	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->objects);
+	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
 	if (ret)
 		goto err;
 
@@ -1108,7 +1140,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
-	i915_gem_execbuffer_move_to_active(&eb->objects, vm, ring);
+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
 
 err:
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link
  2013-08-14  9:38     ` [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link Daniel Vetter
@ 2013-08-14  9:42       ` Daniel Vetter
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:42 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

On Wed, Aug 14, 2013 at 11:38:33AM +0200, Daniel Vetter wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> To convert the execbuf code over to use vmas natively we need to
> shuffle the exec_list a bit. This patch here just prepares things with
> the debugfs code, which also uses the old exec_list list_head, newly
> called obj_exec_link.

Meh, patch subject is missing a "in debugfs" at the end. I'll fix this
when merging.
-Daniel

> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> [danvet: Split out from Ben's big patch.]
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 12 ++++++------
>  drivers/gpu/drm/i915/i915_drv.h     |  2 ++
>  drivers/gpu/drm/i915/i915_gem.c     |  1 +
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index eb87865..4785d8c 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -195,9 +195,9 @@ static int obj_rank_by_stolen(void *priv,
>  			      struct list_head *A, struct list_head *B)
>  {
>  	struct drm_i915_gem_object *a =
> -		container_of(A, struct drm_i915_gem_object, exec_list);
> +		container_of(A, struct drm_i915_gem_object, obj_exec_link);
>  	struct drm_i915_gem_object *b =
> -		container_of(B, struct drm_i915_gem_object, exec_list);
> +		container_of(B, struct drm_i915_gem_object, obj_exec_link);
>  
>  	return a->stolen->start - b->stolen->start;
>  }
> @@ -221,7 +221,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
>  		if (obj->stolen == NULL)
>  			continue;
>  
> -		list_add(&obj->exec_list, &stolen);
> +		list_add(&obj->obj_exec_link, &stolen);
>  
>  		total_obj_size += obj->base.size;
>  		total_gtt_size += i915_gem_obj_ggtt_size(obj);
> @@ -231,7 +231,7 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
>  		if (obj->stolen == NULL)
>  			continue;
>  
> -		list_add(&obj->exec_list, &stolen);
> +		list_add(&obj->obj_exec_link, &stolen);
>  
>  		total_obj_size += obj->base.size;
>  		count++;
> @@ -239,11 +239,11 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
>  	list_sort(NULL, &stolen, obj_rank_by_stolen);
>  	seq_puts(m, "Stolen:\n");
>  	while (!list_empty(&stolen)) {
> -		obj = list_first_entry(&stolen, typeof(*obj), exec_list);
> +		obj = list_first_entry(&stolen, typeof(*obj), obj_exec_link);
>  		seq_puts(m, "   ");
>  		describe_obj(m, obj);
>  		seq_putc(m, '\n');
> -		list_del_init(&obj->exec_list);
> +		list_del_init(&obj->obj_exec_link);
>  	}
>  	mutex_unlock(&dev->struct_mutex);
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2e7d5f9..6532d97 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1312,6 +1312,8 @@ struct drm_i915_gem_object {
>  	struct list_head global_list;
>  
>  	struct list_head ring_list;
> +	/** Used in execbuf to temporarily hold a ref */
> +	struct list_head obj_exec_link;
>  	/** This object's place in the batchbuffer or on the eviction list */
>  	struct list_head exec_list;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index fd48724..219e2fb 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3993,6 +3993,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
>  	INIT_LIST_HEAD(&obj->global_list);
>  	INIT_LIST_HEAD(&obj->ring_list);
>  	INIT_LIST_HEAD(&obj->exec_list);
> +	INIT_LIST_HEAD(&obj->obj_exec_link);
>  	INIT_LIST_HEAD(&obj->vma_list);
>  
>  	obj->ops = ops;
> -- 
> 1.8.4.rc1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] drm/i915: inline vma_create into lookup_or_create_vma
  2013-08-14  9:38     ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Daniel Vetter
@ 2013-08-14  9:59       ` Daniel Vetter
  2013-08-14 11:49         ` Chris Wilson
  2013-08-14 16:47         ` Daniel Vetter
  2013-08-14 11:54       ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Chris Wilson
  1 sibling, 2 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14  9:59 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---

That's the only concern I could come up with when reading the execbuf
vma conversion patch. So looks good and I'll slurp it all in as soon
as some more head scratching is done for the very first patch in this
series about the vma_unbind fix to only call vma_destroy if the vma
isn't bound.
-Daniel
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 --
 drivers/gpu/drm/i915/i915_gem.c            | 41 +++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++---
 drivers/gpu/drm/i915/i915_gem_stolen.c     |  2 +-
 4 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fc35ae3..b06a90f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1748,8 +1748,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 void i915_gem_free_object(struct drm_gem_object *obj);
-struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm);
 void i915_gem_vma_destroy(struct i915_vma *vma);
 
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4be3be9..bb029a4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4121,28 +4121,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	i915_gem_object_free(obj);
 }
 
-struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm)
-{
-	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
-	if (vma == NULL)
-		return ERR_PTR(-ENOMEM);
-
-	INIT_LIST_HEAD(&vma->vma_link);
-	INIT_LIST_HEAD(&vma->mm_list);
-	INIT_LIST_HEAD(&vma->exec_list);
-	vma->vm = vm;
-	vma->obj = obj;
-
-	/* Keep GGTT vmas first to make debug easier */
-	if (i915_is_ggtt(vm))
-		list_add(&vma->vma_link, &obj->vma_list);
-	else
-		list_add_tail(&vma->vma_link, &obj->vma_list);
-
-	return vma;
-}
-
 void i915_gem_vma_destroy(struct i915_vma *vma)
 {
 	WARN_ON(vma->node.allocated);
@@ -4888,8 +4866,23 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 
 	vma = i915_gem_obj_to_vma(obj, vm);
-	if (!vma)
-		vma = i915_gem_vma_create(obj, vm);
+	if (!vma) {
+		vma = kzalloc(sizeof(*vma), GFP_KERNEL);
+		if (vma == NULL)
+			return ERR_PTR(-ENOMEM);
+
+		INIT_LIST_HEAD(&vma->vma_link);
+		INIT_LIST_HEAD(&vma->mm_list);
+		INIT_LIST_HEAD(&vma->exec_list);
+		vma->vm = vm;
+		vma->obj = obj;
+
+		/* Keep GGTT vmas first to make debug easier */
+		if (i915_is_ggtt(vm))
+			list_add(&vma->vma_link, &obj->vma_list);
+		else
+			list_add_tail(&vma->vma_link, &obj->vma_list);
+	}
 
 	return vma;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c3b36f5..9b3b5f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -123,11 +123,16 @@ eb_lookup_vmas(struct eb_vmas *eb,
 	list_for_each_entry(obj, &objects, obj_exec_link) {
 		struct i915_vma *vma;
 
+		/*
+		 * NOTE: We can leak any vmas created here when something fails
+		 * later on. But that's no issue since vma_unbind can deal with
+		 * vmas which are not actually bound. And since only
+		 * lookup_or_create exists as an interface to get at the vma
+		 * from the (obj, vm) we don't run the risk of creating
+		 * duplicated vmas for the same vm.
+		 */
 		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
 		if (IS_ERR(vma)) {
-			/* XXX: We don't need an error path fro vma because if
-			 * the vma was created just for this execbuf, object
-			 * unreference should kill it off.*/
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
 			goto out;
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 7f4c510..b239185 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -375,7 +375,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (gtt_offset == I915_GTT_OFFSET_NONE)
 		return obj;
 
-	vma = i915_gem_vma_create(obj, ggtt);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, ggtt);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_out;
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/i915: inline vma_create into lookup_or_create_vma
  2013-08-14  9:59       ` [PATCH] drm/i915: inline vma_create into lookup_or_create_vma Daniel Vetter
@ 2013-08-14 11:49         ` Chris Wilson
  2013-08-14 12:08           ` Daniel Vetter
  2013-08-14 12:14           ` Daniel Vetter
  2013-08-14 16:47         ` Daniel Vetter
  1 sibling, 2 replies; 29+ messages in thread
From: Chris Wilson @ 2013-08-14 11:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky

On Wed, Aug 14, 2013 at 11:59:09AM +0200, Daniel Vetter wrote:
> In the execbuf code we don't clean up any vmas which ended up not
> getting bound for code simplicity. To make sure that we don't end up
> creating multiple vma for the same vm kill the somewhat dangerous
> vma_create function and inline it into lookup_or_create.

Just make it static. It's a nice little standalone function that helps
the reader.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] drm/i915: Convert execbuf code to use vmas
  2013-08-14  9:38     ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Daniel Vetter
  2013-08-14  9:59       ` [PATCH] drm/i915: inline vma_create into lookup_or_create_vma Daniel Vetter
@ 2013-08-14 11:54       ` Chris Wilson
  1 sibling, 0 replies; 29+ messages in thread
From: Chris Wilson @ 2013-08-14 11:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development, Ben Widawsky

On Wed, Aug 14, 2013 at 11:38:36AM +0200, Daniel Vetter wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> +		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
> +		if (IS_ERR(vma)) {
> +			/* XXX: We don't need an error path fro vma because if
> +			 * the vma was created just for this execbuf, object
> +			 * unreference should kill it off.*/

The XXX part here is that before the object is finally unreffed we
have a dangling vma and so we need to take care else way not to assume
that a vma is implicitly bound.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/i915: inline vma_create into lookup_or_create_vma
  2013-08-14 11:49         ` Chris Wilson
@ 2013-08-14 12:08           ` Daniel Vetter
  2013-08-14 12:14           ` Daniel Vetter
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14 12:08 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Intel Graphics Development, Ben Widawsky

On Wed, Aug 14, 2013 at 12:49:04PM +0100, Chris Wilson wrote:
> On Wed, Aug 14, 2013 at 11:59:09AM +0200, Daniel Vetter wrote:
> > In the execbuf code we don't clean up any vmas which ended up not
> > getting bound for code simplicity. To make sure that we don't end up
> > creating multiple vma for the same vm kill the somewhat dangerous
> > vma_create function and inline it into lookup_or_create.
> 
> Just make it static. It's a nice little standalone function that helps
> the reader.

I'm afraid that someone will use it eventually. I'll smash a __ prefix
onto it to make it clear that people better don't touch it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] drm/i915: inline vma_create into lookup_or_create_vma
  2013-08-14 11:49         ` Chris Wilson
  2013-08-14 12:08           ` Daniel Vetter
@ 2013-08-14 12:14           ` Daniel Vetter
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14 12:14 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h            |  2 --
 drivers/gpu/drm/i915/i915_gem.c            | 50 +++++++++++++++---------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++--
 drivers/gpu/drm/i915/i915_gem_stolen.c     |  2 +-
 4 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fc35ae3..b06a90f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1748,8 +1748,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
 void i915_gem_free_object(struct drm_gem_object *obj);
-struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm);
 void i915_gem_vma_destroy(struct i915_vma *vma);
 
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4be3be9..0832f61 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4121,9 +4121,20 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	i915_gem_object_free(obj);
 }
 
-struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
+struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm)
 {
+	struct i915_vma *vma;
+	list_for_each_entry(vma, &obj->vma_list, vma_link)
+		if (vma->vm == vm)
+			return vma;
+
+	return NULL;
+}
+
+static struct i915_vma *__i915_gem_vma_create(struct drm_i915_gem_object *obj,
+					      struct i915_address_space *vm)
+{
 	struct i915_vma *vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
@@ -4143,6 +4154,19 @@ struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	return vma;
 }
 
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm)
+{
+	struct i915_vma *vma;
+
+	vma = i915_gem_obj_to_vma(obj, vm);
+	if (!vma)
+		vma = __i915_gem_vma_create(obj, vm);
+
+	return vma;
+}
+
 void i915_gem_vma_destroy(struct i915_vma *vma)
 {
 	WARN_ON(vma->node.allocated);
@@ -4869,27 +4893,3 @@ unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 
 	return 0;
 }
-
-struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, vma_link)
-		if (vma->vm == vm)
-			return vma;
-
-	return NULL;
-}
-
-struct i915_vma *
-i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-
-	vma = i915_gem_obj_to_vma(obj, vm);
-	if (!vma)
-		vma = i915_gem_vma_create(obj, vm);
-
-	return vma;
-}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c3b36f5..9b3b5f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -123,11 +123,16 @@ eb_lookup_vmas(struct eb_vmas *eb,
 	list_for_each_entry(obj, &objects, obj_exec_link) {
 		struct i915_vma *vma;
 
+		/*
+		 * NOTE: We can leak any vmas created here when something fails
+		 * later on. But that's no issue since vma_unbind can deal with
+		 * vmas which are not actually bound. And since only
+		 * lookup_or_create exists as an interface to get at the vma
+		 * from the (obj, vm) we don't run the risk of creating
+		 * duplicated vmas for the same vm.
+		 */
 		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
 		if (IS_ERR(vma)) {
-			/* XXX: We don't need an error path fro vma because if
-			 * the vma was created just for this execbuf, object
-			 * unreference should kill it off.*/
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
 			goto out;
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 7f4c510..b239185 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -375,7 +375,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (gtt_offset == I915_GTT_OFFSET_NONE)
 		return obj;
 
-	vma = i915_gem_vma_create(obj, ggtt);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, ggtt);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_out;
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/i915: inline vma_create into lookup_or_create_vma
  2013-08-14  9:59       ` [PATCH] drm/i915: inline vma_create into lookup_or_create_vma Daniel Vetter
  2013-08-14 11:49         ` Chris Wilson
@ 2013-08-14 16:47         ` Daniel Vetter
  2013-08-14 17:35           ` Ben Widawsky
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14 16:47 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Ben Widawsky

On Wed, Aug 14, 2013 at 11:59:09AM +0200, Daniel Vetter wrote:
> In the execbuf code we don't clean up any vmas which ended up not
> getting bound for code simplicity. To make sure that we don't end up
> creating multiple vma for the same vm kill the somewhat dangerous
> vma_create function and inline it into lookup_or_create.
> 
> This is just a safety measure to prevent surprises in the future.
> 
> Also update the somewhat confused comment in the execbuf code and
> clarify what kind of magic is going on with a new one.
> 
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> 
> That's the only concern I could come up with when reading the execbuf
> vma conversion patch. So looks good and I'll slurp it all in as soon
> as some more head scratching is done for the very first patch in this
> series about the vma_unbind fix to only call vma_destroy if the vma
> isn't bound.

One thing I've noticed but forgot to mention here is that the reloc code
still uses obj_ggtt_size/offset. I guess that will be fixed later on?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind
  2013-08-14  8:18   ` Daniel Vetter
@ 2013-08-14 17:27     ` Ben Widawsky
  2013-08-14 18:08       ` Daniel Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14 17:27 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Ben Widawsky

On Wed, Aug 14, 2013 at 10:18:55AM +0200, Daniel Vetter wrote:
> On Tue, Aug 13, 2013 at 06:09:07PM -0700, Ben Widawsky wrote:
> > Cleanup the map and fenceable setting during bind to make more sense,
> > and not check i915_is_ggtt() 2 unnecessary times
> > 
> > v2: Move the bools into the if block (Chris) - There are ways to tidy
> > this function (fence calculations for instance) even further, but they
> > are quite invasive, so I am punting on those unless specifically asked.
> > 
> > v3: Add newline between variable declaration and logic (Chris)
> > 
> > Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++----------
> >  1 file changed, 9 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 4a58ead..01cc016 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3106,7 +3106,6 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> >  	struct drm_device *dev = obj->base.dev;
> >  	drm_i915_private_t *dev_priv = dev->dev_private;
> >  	u32 size, fence_size, fence_alignment, unfenced_alignment;
> > -	bool mappable, fenceable;
> >  	size_t gtt_max =
> >  		map_and_fenceable ? dev_priv->gtt.mappable_end : vm->total;
> >  	struct i915_vma *vma;
> > @@ -3191,18 +3190,18 @@ search_free:
> >  	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
> >  	list_add_tail(&vma->mm_list, &vm->inactive_list);
> >  
> > -	fenceable =
> > -		i915_is_ggtt(vm) &&
> > -		i915_gem_obj_ggtt_size(obj) == fence_size &&
> > -		(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
> > +	if (i915_is_ggtt(vm)) {
> > +		bool mappable, fenceable;
> >  
> > -	mappable =
> > -		i915_is_ggtt(vm) &&
> > -		vma->node.start + obj->base.size <= dev_priv->gtt.mappable_end;
> > +		fenceable =
> > +			i915_gem_obj_ggtt_size(obj) == fence_size &&
> > +			(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
> 
> Why not go the full mounty and also use vma->node here? Would also make
> checkpatch a bit more happy. I'll do a follow-up commit.
> -Daniel
> 

You've already done it, so it's moot. The idea I had was to use "ggtt"
as much as possible when it can only every be ggtt. I think this would
be helpful for both clarity, and debug.

The extra indirection would be immeasurable.

Again, you've already changed it, so meh.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/i915: inline vma_create into lookup_or_create_vma
  2013-08-14 16:47         ` Daniel Vetter
@ 2013-08-14 17:35           ` Ben Widawsky
  0 siblings, 0 replies; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14 17:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Wed, Aug 14, 2013 at 06:47:00PM +0200, Daniel Vetter wrote:
> On Wed, Aug 14, 2013 at 11:59:09AM +0200, Daniel Vetter wrote:
> > In the execbuf code we don't clean up any vmas which ended up not
> > getting bound for code simplicity. To make sure that we don't end up
> > creating multiple vma for the same vm kill the somewhat dangerous
> > vma_create function and inline it into lookup_or_create.
> > 
> > This is just a safety measure to prevent surprises in the future.
> > 
> > Also update the somewhat confused comment in the execbuf code and
> > clarify what kind of magic is going on with a new one.
> > 
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> > 
> > That's the only concern I could come up with when reading the execbuf
> > vma conversion patch. So looks good and I'll slurp it all in as soon
> > as some more head scratching is done for the very first patch in this
> > series about the vma_unbind fix to only call vma_destroy if the vma
> > isn't bound.
> 
> One thing I've noticed but forgot to mention here is that the reloc code
> still uses obj_ggtt_size/offset. I guess that will be fixed later on?
> -Daniel

Yes.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind
  2013-08-14 17:27     ` Ben Widawsky
@ 2013-08-14 18:08       ` Daniel Vetter
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2013-08-14 18:08 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Ben Widawsky

On Wed, Aug 14, 2013 at 10:27:42AM -0700, Ben Widawsky wrote:
> On Wed, Aug 14, 2013 at 10:18:55AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 13, 2013 at 06:09:07PM -0700, Ben Widawsky wrote:
> > > Cleanup the map and fenceable setting during bind to make more sense,
> > > and not check i915_is_ggtt() 2 unnecessary times
> > > 
> > > v2: Move the bools into the if block (Chris) - There are ways to tidy
> > > this function (fence calculations for instance) even further, but they
> > > are quite invasive, so I am punting on those unless specifically asked.
> > > 
> > > v3: Add newline between variable declaration and logic (Chris)
> > > 
> > > Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++----------
> > >  1 file changed, 9 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 4a58ead..01cc016 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3106,7 +3106,6 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> > >  	struct drm_device *dev = obj->base.dev;
> > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > >  	u32 size, fence_size, fence_alignment, unfenced_alignment;
> > > -	bool mappable, fenceable;
> > >  	size_t gtt_max =
> > >  		map_and_fenceable ? dev_priv->gtt.mappable_end : vm->total;
> > >  	struct i915_vma *vma;
> > > @@ -3191,18 +3190,18 @@ search_free:
> > >  	list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);
> > >  	list_add_tail(&vma->mm_list, &vm->inactive_list);
> > >  
> > > -	fenceable =
> > > -		i915_is_ggtt(vm) &&
> > > -		i915_gem_obj_ggtt_size(obj) == fence_size &&
> > > -		(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
> > > +	if (i915_is_ggtt(vm)) {
> > > +		bool mappable, fenceable;
> > >  
> > > -	mappable =
> > > -		i915_is_ggtt(vm) &&
> > > -		vma->node.start + obj->base.size <= dev_priv->gtt.mappable_end;
> > > +		fenceable =
> > > +			i915_gem_obj_ggtt_size(obj) == fence_size &&
> > > +			(i915_gem_obj_ggtt_offset(obj) & (fence_alignment - 1)) == 0;
> > 
> > Why not go the full mounty and also use vma->node here? Would also make
> > checkpatch a bit more happy. I'll do a follow-up commit.
> > -Daniel
> > 
> 
> You've already done it, so it's moot. The idea I had was to use "ggtt"
> as much as possible when it can only every be ggtt. I think this would
> be helpful for both clarity, and debug.

I disagree here and I think the extra indirection hampers code readability
- I always end up checking your little functions to make sure they
actually fish out the right value from the right vma. So my plan is that
once this all lands I'll fully rip them out.

This wasn't ever about performance, although I admit that unnecessary
looping in our gem code does irk me a bit ;-) But again mostly from a
clarify pov.

For marking ggtt-only paths I think Chris' suggestion to enclose such code
in if (is_ggtt(vm)) {} blocks which you've implemented here is much better
for clarity.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas
       [not found]       ` <20130814224358.GA21854@nuc-i3427.alporthouse.com>
@ 2013-08-14 23:22         ` Ben Widawsky
  0 siblings, 0 replies; 29+ messages in thread
From: Ben Widawsky @ 2013-08-14 23:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On Wed, Aug 14, 2013 at 11:43:58PM +0100, Chris Wilson wrote:
> These are my numbers for a beefy haswell box (note the really
> interesting numbers will be on Baytrail):
> 
> unpatched:
> 
> relocation: buffers=   1: old=  21945 + 34.4*reloc, lut=  21814 + 34.0*reloc (ns)
> relocation: buffers=   2: old=  15947 + 36.4*reloc, lut=  16169 + 35.4*reloc (ns)
> relocation: buffers=   4: old=  12711 + 37.6*reloc, lut=  13039 + 36.7*reloc (ns)
> relocation: buffers=   8: old=   6154 + 40.9*reloc, lut=   7201 + 38.9*reloc (ns)
> relocation: buffers=  16: old=   4846 + 41.6*reloc, lut=   5337 + 40.6*reloc (ns)
> relocation: buffers=  32: old=   7097 + 41.9*reloc, lut=   6943 + 41.0*reloc (ns)
> relocation: buffers=  64: old=  13318 + 41.9*reloc, lut=  12748 + 41.2*reloc (ns)
> relocation: buffers= 128: old=  27282 + 43.0*reloc, lut=  25778 + 41.7*reloc (ns)
> relocation: buffers= 256: old=  54535 + 45.2*reloc, lut=  51912 + 43.7*reloc (ns)
> relocation: buffers= 512: old= 137447 + 53.2*reloc, lut= 129333 + 45.5*reloc (ns)
> relocation: buffers=1024: old= 307347 + 66.5*reloc, lut= 291487 + 48.1*reloc (ns)
> relocation: buffers=2048: old= 606300 + 92.1*reloc, lut= 574774 + 51.6*reloc (ns)
> skip-relocs: buffers=   1: old=   1583 + 15.6*reloc, lut=   1516 + 14.5*reloc (ns)
> skip-relocs: buffers=   2: old=   1621 + 15.6*reloc, lut=   1603 + 14.5*reloc (ns)
> skip-relocs: buffers=   4: old=   1791 + 15.6*reloc, lut=   1777 + 14.5*reloc (ns)
> skip-relocs: buffers=   8: old=   2009 + 15.6*reloc, lut=   2024 + 14.6*reloc (ns)
> skip-relocs: buffers=  16: old=   2637 + 15.7*reloc, lut=   2564 + 14.6*reloc (ns)
> skip-relocs: buffers=  32: old=   3835 + 15.8*reloc, lut=   3785 + 14.7*reloc (ns)
> skip-relocs: buffers=  64: old=   6996 + 15.8*reloc, lut=   6681 + 14.7*reloc (ns)
> skip-relocs: buffers= 128: old=  14333 + 16.4*reloc, lut=  13560 + 15.2*reloc (ns)
> skip-relocs: buffers= 256: old=  28092 + 17.7*reloc, lut=  26759 + 16.2*reloc (ns)
> skip-relocs: buffers= 512: old=  70885 + 25.2*reloc, lut=  66713 + 17.9*reloc (ns)
> skip-relocs: buffers=1024: old= 158520 + 35.2*reloc, lut= 150828 + 20.1*reloc (ns)
> skip-relocs: buffers=2048: old= 314208 + 54.3*reloc, lut= 298343 + 22.1*reloc (ns)
> no-relocs: buffers=   1: old=   1533 + 5.2*reloc, lut=   1498 + 4.9*reloc (ns)
> no-relocs: buffers=   2: old=   1518 + 5.2*reloc, lut=   1505 + 4.9*reloc (ns)
> no-relocs: buffers=   4: old=   1647 + 5.2*reloc, lut=   1593 + 4.9*reloc (ns)
> no-relocs: buffers=   8: old=   1882 + 5.3*reloc, lut=   1874 + 5.0*reloc (ns)
> no-relocs: buffers=  16: old=   2399 + 5.3*reloc, lut=   2341 + 5.0*reloc (ns)
> no-relocs: buffers=  32: old=   3638 + 5.3*reloc, lut=   3554 + 5.0*reloc (ns)
> no-relocs: buffers=  64: old=   6622 + 5.3*reloc, lut=   6308 + 5.1*reloc (ns)
> no-relocs: buffers= 128: old=  13584 + 5.3*reloc, lut=  12872 + 5.1*reloc (ns)
> no-relocs: buffers= 256: old=  26519 + 5.8*reloc, lut=  25234 + 5.5*reloc (ns)
> no-relocs: buffers= 512: old=  67128 + 5.4*reloc, lut=  63054 + 5.2*reloc (ns)
> no-relocs: buffers=1024: old= 146705 + 5.2*reloc, lut= 139020 + 5.1*reloc (ns)
> no-relocs: buffers=2048: old= 290319 + 5.4*reloc, lut= 274705 + 5.4*reloc (ns)
> 
> vma(execbuffer):
> 
> relocation: buffers=   1: old=  21922 + 34.6*reloc, lut=  21510 + 34.0*reloc (ns)
> relocation: buffers=   2: old=  16851 + 37.4*reloc, lut=  17123 + 35.4*reloc (ns)
> relocation: buffers=   4: old=  13234 + 37.8*reloc, lut=  13436 + 36.9*reloc (ns)
> relocation: buffers=   8: old=   6549 + 40.8*reloc, lut=   6512 + 39.8*reloc (ns)
> relocation: buffers=  16: old=   5012 + 41.8*reloc, lut=   4883 + 41.0*reloc (ns)
> relocation: buffers=  32: old=   8591 + 42.2*reloc, lut=   8377 + 41.1*reloc (ns)
> relocation: buffers=  64: old=  16051 + 42.8*reloc, lut=  15658 + 41.7*reloc (ns)
> relocation: buffers= 128: old=  33397 + 44.5*reloc, lut=  32705 + 43.3*reloc (ns)
> relocation: buffers= 256: old=  68012 + 46.8*reloc, lut=  66904 + 45.5*reloc (ns)
> relocation: buffers= 512: old= 160162 + 56.4*reloc, lut= 155586 + 49.1*reloc (ns)
> relocation: buffers=1024: old= 348728 + 71.8*reloc, lut= 338113 + 55.1*reloc (ns)
> relocation: buffers=2048: old= 699331 + 98.7*reloc, lut= 675969 + 62.2*reloc (ns)
> skip-relocs: buffers=   1: old=   1642 + 16.5*reloc, lut=   1588 + 15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   1676 + 16.4*reloc, lut=   1663 + 15.6*reloc (ns)
> skip-relocs: buffers=   4: old=   1926 + 16.4*reloc, lut=   1891 + 15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   2218 + 16.6*reloc, lut=   2212 + 15.7*reloc (ns)
> skip-relocs: buffers=  16: old=   2933 + 16.6*reloc, lut=   2880 + 15.7*reloc (ns)
> skip-relocs: buffers=  32: old=   4594 + 16.6*reloc, lut=   4523 + 15.8*reloc (ns)
> skip-relocs: buffers=  64: old=   8414 + 16.8*reloc, lut=   8210 + 15.9*reloc (ns)
> skip-relocs: buffers= 128: old=  17429 + 17.9*reloc, lut=  17062 + 16.8*reloc (ns)
> skip-relocs: buffers= 256: old=  34794 + 19.8*reloc, lut=  34144 + 18.4*reloc (ns)
> skip-relocs: buffers= 512: old=  82287 + 27.6*reloc, lut=  80002 + 20.8*reloc (ns)
> skip-relocs: buffers=1024: old= 179851 + 38.0*reloc, lut= 174574 + 23.9*reloc (ns)
> skip-relocs: buffers=2048: old= 361511 + 57.2*reloc, lut= 350132 + 26.8*reloc (ns)
> no-relocs: buffers=   1: old=   1581 + 5.2*reloc, lut=   1579 + 4.9*reloc (ns)
> no-relocs: buffers=   2: old=   1609 + 5.2*reloc, lut=   1572 + 4.9*reloc (ns)
> no-relocs: buffers=   4: old=   1701 + 5.3*reloc, lut=   1685 + 4.9*reloc (ns)
> no-relocs: buffers=   8: old=   2084 + 5.3*reloc, lut=   2033 + 5.0*reloc (ns)
> no-relocs: buffers=  16: old=   2747 + 5.3*reloc, lut=   2686 + 5.0*reloc (ns)
> no-relocs: buffers=  32: old=   4379 + 5.3*reloc, lut=   4285 + 5.0*reloc (ns)
> no-relocs: buffers=  64: old=   8049 + 5.3*reloc, lut=   7850 + 5.1*reloc (ns)
> no-relocs: buffers= 128: old=  16641 + 5.4*reloc, lut=  16301 + 5.2*reloc (ns)
> no-relocs: buffers= 256: old=  33111 + 5.7*reloc, lut=  32539 + 5.5*reloc (ns)
> no-relocs: buffers= 512: old=  79898 + 5.4*reloc, lut=  77517 + 5.2*reloc (ns)
> no-relocs: buffers=1024: old= 172199 + 5.2*reloc, lut= 166907 + 5.1*reloc (ns)
> no-relocs: buffers=2048: old= 345542 + 5.2*reloc, lut= 334300 + 5.3*reloc (ns)
> 
> So there is measurable degradation for the extra indirections, both for
> looking up the execbuffers and for performing the relocations. Though it
> doesn't merit anything more than a footnote in the changelog.
> -Chris
> 

I'm sad I can't reproduce it. I think I amended the commit message
already, I can do more if you want.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] [v2] drm/i915: Remove node only when allocated
  2013-08-14  8:15   ` Daniel Vetter
@ 2013-08-15 14:05     ` Daniel Vetter
  2013-08-15 21:42       ` Ben Widawsky
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2013-08-15 14:05 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Ben Widawsky, Paulo Zanoni

On Wed, Aug 14, 2013 at 10:15:50AM +0200, Daniel Vetter wrote:
> On Wed, Aug 14, 2013 at 10:06:30AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 13, 2013 at 06:09:06PM -0700, Ben Widawsky wrote:
> > > VMAs can be created and not bound. One may think of it as lazy cleanup,
> > > and safely gloss over the conditions which manufacture it. In either
> > > case, when the object backing the i915 vma is destroyed, we must cleanup
> > > the vma without stumbling into a bunch of pitfalls that assume the vma
> > > is bound.
> > > 
> > > NOTE: I was pretty certain the above condition could only happen when we
> > > introduced the use of VMAs being looked up at execbuf, and already
> > > existing. Paulo has hit this though, so I must be missing something. As
> > > I believe the patch is correct anyway, therefore I won't scratch my head
> > > too hard.
> > 
> > If we end up calling evict_everything from i915_gem_object_bind_to_vm then
> > we'll hit this. One more reason for a testcase here ;-) I'll amend the
> > commit message and merge this. I've also applied a tiny bikeshed I've
> > created while reviewing existing vma_create/destroy callsites.
> 
> Actually evict_everything isn't in the callpath, and there's no memory
> allocation where the shrinker might play havoc. Furthermore the pages are
> pinned so the shrinker shouldn't be able to sneak in at all. This is a bit
> unsettling, I need to think more about this.
> 
> I'll wait with merging this for now.

Ok, I've merged the entire pile. I think now's the time to throw a bit of
igt on top to exercise the corner cases here ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] [v2] drm/i915: Remove node only when allocated
  2013-08-15 14:05     ` Daniel Vetter
@ 2013-08-15 21:42       ` Ben Widawsky
  0 siblings, 0 replies; 29+ messages in thread
From: Ben Widawsky @ 2013-08-15 21:42 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Paulo Zanoni, Ben Widawsky

On Thu, Aug 15, 2013 at 04:05:56PM +0200, Daniel Vetter wrote:
> On Wed, Aug 14, 2013 at 10:15:50AM +0200, Daniel Vetter wrote:
> > On Wed, Aug 14, 2013 at 10:06:30AM +0200, Daniel Vetter wrote:
> > > On Tue, Aug 13, 2013 at 06:09:06PM -0700, Ben Widawsky wrote:
> > > > VMAs can be created and not bound. One may think of it as lazy cleanup,
> > > > and safely gloss over the conditions which manufacture it. In either
> > > > case, when the object backing the i915 vma is destroyed, we must cleanup
> > > > the vma without stumbling into a bunch of pitfalls that assume the vma
> > > > is bound.
> > > > 
> > > > NOTE: I was pretty certain the above condition could only happen when we
> > > > introduced the use of VMAs being looked up at execbuf, and already
> > > > existing. Paulo has hit this though, so I must be missing something. As
> > > > I believe the patch is correct anyway, therefore I won't scratch my head
> > > > too hard.
> > > 
> > > If we end up calling evict_everything from i915_gem_object_bind_to_vm then
> > > we'll hit this. One more reason for a testcase here ;-) I'll amend the
> > > commit message and merge this. I've also applied a tiny bikeshed I've
> > > created while reviewing existing vma_create/destroy callsites.
> > 
> > Actually evict_everything isn't in the callpath, and there's no memory
> > allocation where the shrinker might play havoc. Furthermore the pages are
> > pinned so the shrinker shouldn't be able to sneak in at all. This is a bit
> > unsettling, I need to think more about this.
> > 
> > I'll wait with merging this for now.
> 
> Ok, I've merged the entire pile. I think now's the time to throw a bit of
> igt on top to exercise the corner cases here ...
> -Daniel

Was that a specific request for me to do something, I missed the message
if so? When should I rework the remaining patches and resend?

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-08-15 21:42 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-14  1:09 [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Ben Widawsky
2013-08-14  1:09 ` [PATCH 2/4] [v3] drm/i915: cleanup map&fence in bind Ben Widawsky
2013-08-14  8:18   ` Daniel Vetter
2013-08-14 17:27     ` Ben Widawsky
2013-08-14 18:08       ` Daniel Vetter
2013-08-14  1:09 ` [PATCH 3/4] drm: WARN when removing unallocated node Ben Widawsky
2013-08-14  8:52   ` [Intel-gfx] " Daniel Vetter
2013-08-14  1:09 ` [PATCH 4/4] [v4] drm/i915: Convert execbuf code to use vmas Ben Widawsky
2013-08-14  1:11   ` Ben Widawsky
2013-08-14  7:58     ` Chris Wilson
     [not found]       ` <20130814224358.GA21854@nuc-i3427.alporthouse.com>
2013-08-14 23:22         ` Ben Widawsky
2013-08-14  9:38   ` Split up execbuf vma conversion Daniel Vetter
2013-08-14  9:38     ` [PATCH 1/4] drm/i915: s/obj->exec_list/obj->obj_exec_link Daniel Vetter
2013-08-14  9:42       ` Daniel Vetter
2013-08-14  9:38     ` [PATCH 2/4] drm/i915: Switch eviction code to use vmas Daniel Vetter
2013-08-14  9:38     ` [PATCH 3/4] drm/i915: prepare bind_to_vm for preallocated vma Daniel Vetter
2013-08-14  9:38     ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Daniel Vetter
2013-08-14  9:59       ` [PATCH] drm/i915: inline vma_create into lookup_or_create_vma Daniel Vetter
2013-08-14 11:49         ` Chris Wilson
2013-08-14 12:08           ` Daniel Vetter
2013-08-14 12:14           ` Daniel Vetter
2013-08-14 16:47         ` Daniel Vetter
2013-08-14 17:35           ` Ben Widawsky
2013-08-14 11:54       ` [PATCH 4/4] drm/i915: Convert execbuf code to use vmas Chris Wilson
2013-08-14  8:06 ` [PATCH 1/4] [v2] drm/i915: Remove node only when allocated Daniel Vetter
2013-08-14  8:15   ` Daniel Vetter
2013-08-15 14:05     ` Daniel Vetter
2013-08-15 21:42       ` Ben Widawsky
2013-08-14  8:19 ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.