All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
@ 2016-11-16  8:58 Chris Wilson
  2016-11-16  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
  2016-11-16 14:41 ` [PATCH] " Daniel Vetter
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Wilson @ 2016-11-16  8:58 UTC (permalink / raw)
  To: intel-gfx

Soft-pinning depends upon being able to check for availabilty of an
interval and evict overlapping object from a drm_mm range manager very
quickly. Currently it uses a linear list, and so performance is dire and
not suitable as a general replacement. Worse, the current code will oops
if it tries to evict an active buffer.

It also helps if the routine reports the correct error codes as expected
by its callers and emits a tracepoint upon use.

For posterity since the wrong patch was pushed (i.e. that missed these
key points and had known bugs), this is the changelog that should have
been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for
execbuffer"):

Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.

This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
* if the user supplies a virtual address via the execobject->offset
  *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
  that object is placed at that offset in the address space selected
  by the context specifier in execbuffer.
* the location must be aligned to the GTT page size, 4096 bytes
* as the object is placed exactly as specified, it may be used by this
  execbuffer call without relocations pointing to it

It may fail to do so if:
* EINVAL is returned if the object does not have a 4096 byte aligned
  address
* the object conflicts with another pinned object (either pinned by
  hardware in that address space, e.g. scanouts in the aliasing ppgtt)
  or within the same batch.
  EBUSY is returned if the location is pinned by hardware
  EINVAL is returned if the location is already in use by the batch
* EINVAL is returned if the object conflicts with its own alignment (as meets
  the hardware requirements) or if the placement of the object does not fit
  within the address space

All other execbuffer errors apply.

Presence of this execbuf extension may be queried by passing
I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
a reported value of 1 (or greater).

Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_mm.c                   |  9 ++++
 drivers/gpu/drm/i915/gvt/aperture_gm.c     |  4 +-
 drivers/gpu/drm/i915/i915_drv.h            |  3 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      | 87 +++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++--
 drivers/gpu/drm/i915/i915_trace.h          | 23 ++++++++
 drivers/gpu/drm/i915/i915_vma.c            |  8 +--
 8 files changed, 105 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 632473beb40c..06f319e54dc1 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -339,6 +339,15 @@ int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
 	if (hole_start > node->start || hole_end < end)
 		return -ENOSPC;
 
+	if (mm->color_adjust) {
+		u64 adj_start = hole_start, adj_end = hole_end;
+
+		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
+		if (adj_start > node->start ||
+		    adj_end < node->start + node->size)
+			return -ENOSPC;
+	}
+
 	node->mm = mm;
 	node->allocated = 1;
 
diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 0d41ebc4aea6..46e66feaef32 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -73,12 +73,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
 	mutex_lock(&dev_priv->drm.struct_mutex);
 search_again:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->ggtt.base.mm,
-						  node, size, 4096, 0,
+						  node, size, 4096, -1,
 						  start, end, search_flag,
 						  alloc_flag);
 	if (ret) {
 		ret = i915_gem_evict_something(&dev_priv->ggtt.base,
-					       size, 4096, 0, start, end, 0);
+					       size, 4096, -1, start, end, 0);
 		if (ret == 0 && ++retried < 3)
 			goto search_again;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 006914ca36fe..fc3fedb99ef2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3243,7 +3243,8 @@ int __must_check i915_gem_evict_something(struct i915_address_space *vm,
 					  unsigned cache_level,
 					  u64 start, u64 end,
 					  unsigned flags);
-int __must_check i915_gem_evict_for_vma(struct i915_vma *target);
+int __must_check i915_gem_evict_for_vma(struct i915_vma *vma,
+					unsigned int flags);
 int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
 
 /* belongs in i915_gem_gtt.h */
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index bd08814b015c..094ca96843a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -212,45 +212,82 @@ i915_gem_evict_something(struct i915_address_space *vm,
 	return ret;
 }
 
-int
-i915_gem_evict_for_vma(struct i915_vma *target)
+int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
 {
-	struct drm_mm_node *node, *next;
+	struct list_head eviction_list;
+	struct drm_mm_node *node;
+	u64 start = target->node.start;
+	u64 end = start + target->node.size - 1;
+	struct i915_vma *vma, *next;
+	bool check_snoop;
+	int ret = 0;
 
 	lockdep_assert_held(&target->vm->dev->struct_mutex);
+	trace_i915_gem_evict_vma(target, flags);
+
+	check_snoop = target->vm->mm.color_adjust;
+	if (check_snoop) {
+		if (start > target->vm->start)
+			start -= 4096;
+		if (end < target->vm->start + target->vm->total)
+			end += 4096;
+	}
 
-	list_for_each_entry_safe(node, next,
-			&target->vm->mm.head_node.node_list,
-			node_list) {
-		struct i915_vma *vma;
-		int ret;
+	node = drm_mm_interval_first(&target->vm->mm, start, end);
+	if (!node)
+		return 0;
 
-		if (node->start + node->size <= target->node.start)
-			continue;
-		if (node->start >= target->node.start + target->node.size)
+	INIT_LIST_HEAD(&eviction_list);
+	vma = container_of(node, typeof(*vma), node);
+	list_for_each_entry_from(vma,
+				 &target->vm->mm.head_node.node_list,
+				 node.node_list) {
+		if (vma->node.start > end)
 			break;
 
-		vma = container_of(node, typeof(*vma), node);
+		if (check_snoop) {
+			if (vma->node.start + vma->node.size == target->node.start) {
+				if (vma->node.color == target->node.color)
+					continue;
+			}
+			if (vma->node.start == target->node.start + target->node.size) {
+				if (vma->node.color == target->node.color)
+					continue;
+			}
+		}
 
-		if (i915_vma_is_pinned(vma)) {
-			if (!vma->exec_entry || i915_vma_pin_count(vma) > 1)
-				/* Object is pinned for some other use */
-				return -EBUSY;
+		if (vma->node.color == -1) {
+			ret = -ENOSPC;
+			break;
+		}
 
-			/* We need to evict a buffer in the same batch */
-			if (vma->exec_entry->flags & EXEC_OBJECT_PINNED)
-				/* Overlapping fixed objects in the same batch */
-				return -EINVAL;
+		if (flags & PIN_NONBLOCK &&
+		    (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))) {
+			ret = -ENOSPC;
+			break;
+		}
 
-			return -ENOSPC;
+		/* Overlap of objects in the same batch? */
+		if (i915_vma_is_pinned(vma)) {
+			ret = -ENOSPC;
+			if (vma->exec_entry &&
+			    vma->exec_entry->flags & EXEC_OBJECT_PINNED)
+				ret = -EINVAL;
+			break;
 		}
 
-		ret = i915_vma_unbind(vma);
-		if (ret)
-			return ret;
+		__i915_vma_pin(vma);
+		list_add(&vma->exec_list, &eviction_list);
 	}
 
-	return 0;
+	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
+		list_del_init(&vma->exec_list);
+		__i915_vma_unpin(vma);
+		if (ret == 0)
+			ret = i915_vma_unbind(vma);
+	}
+
+	return ret;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e804cb2fa57e..1905e4419b1f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -274,6 +274,7 @@ static void eb_destroy(struct eb_vmas *eb)
 				       exec_list);
 		list_del_init(&vma->exec_list);
 		i915_gem_execbuffer_unreserve_vma(vma);
+		vma->exec_entry = NULL;
 		i915_vma_put(vma);
 	}
 	kfree(eb);
@@ -437,7 +438,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
 			memset(&cache->node, 0, sizeof(cache->node));
 			ret = drm_mm_insert_node_in_range_generic
 				(&ggtt->base.mm, &cache->node,
-				 4096, 0, 0,
+				 4096, 0, -1,
 				 0, ggtt->mappable_end,
 				 DRM_MM_SEARCH_DEFAULT,
 				 DRM_MM_CREATE_DEFAULT);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 01f238adfb67..bae4e9d8f682 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2072,16 +2072,14 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 
 alloc:
-	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm,
-						  &ppgtt->node, GEN6_PD_SIZE,
-						  GEN6_PD_ALIGN, 0,
-						  0, ggtt->base.total,
+	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm, &ppgtt->node,
+						  GEN6_PD_SIZE, GEN6_PD_ALIGN,
+						  -1, 0, ggtt->base.total,
 						  DRM_MM_TOPDOWN);
 	if (ret == -ENOSPC && !retried) {
 		ret = i915_gem_evict_something(&ggtt->base,
 					       GEN6_PD_SIZE, GEN6_PD_ALIGN,
-					       I915_CACHE_NONE,
-					       0, ggtt->base.total,
+					       -1, 0, ggtt->base.total,
 					       0);
 		if (ret)
 			goto err_out;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index c5d210ebaa9a..2ed60ed70fe1 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -450,6 +450,29 @@ TRACE_EVENT(i915_gem_evict_vm,
 	    TP_printk("dev=%d, vm=%p", __entry->dev, __entry->vm)
 );
 
+TRACE_EVENT(i915_gem_evict_vma,
+	    TP_PROTO(struct i915_vma *vma, unsigned int flags),
+	    TP_ARGS(vma, flags),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(struct i915_address_space *, vm)
+			     __field(u64, start)
+			     __field(u64, size)
+			     __field(unsigned int, flags)
+			    ),
+
+	    TP_fast_assign(
+			   __entry->dev = vma->vm->dev->primary->index;
+			   __entry->vm = vma->vm;
+			   __entry->start = vma->node.start;
+			   __entry->size = vma->node.size;
+			   __entry->flags = flags;
+			  ),
+
+	    TP_printk("dev=%d, vm=%p, start=%llx size=%llx, flags=%x", __entry->dev, __entry->vm, (long long)__entry->start, (long long)__entry->size, __entry->flags)
+);
+
 TRACE_EVENT(i915_gem_ring_sync_to,
 	    TP_PROTO(struct drm_i915_gem_request *to,
 		     struct drm_i915_gem_request *from),
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 738ff3a5cd6e..827eaeb75524 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -325,12 +325,6 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma,
 	if (vma->vm->mm.color_adjust == NULL)
 		return true;
 
-	if (!drm_mm_node_allocated(gtt_space))
-		return true;
-
-	if (list_empty(&gtt_space->node_list))
-		return true;
-
 	other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
 	if (other->allocated && !other->hole_follows && other->color != cache_level)
 		return false;
@@ -413,7 +407,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		vma->node.color = obj->cache_level;
 		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 		if (ret) {
-			ret = i915_gem_evict_for_vma(vma);
+			ret = i915_gem_evict_for_vma(vma, flags);
 			if (ret == 0)
 				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
 			if (ret)
-- 
2.10.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* ✗ Fi.CI.BAT: warning for drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-11-16  8:58 [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
@ 2016-11-16  9:46 ` Patchwork
  2016-11-16  9:50   ` Chris Wilson
  2016-11-16 14:41 ` [PATCH] " Daniel Vetter
  1 sibling, 1 reply; 7+ messages in thread
From: Patchwork @ 2016-11-16  9:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
URL   : https://patchwork.freedesktop.org/series/15395/
State : warning

== Summary ==

Series 15395v1 drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
https://patchwork.freedesktop.org/api/1.0/series/15395/revisions/1/mbox/

Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-snb-2520m)

fi-bdw-5557u     total:244  pass:229  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:244  pass:204  dwarn:0   dfail:0   fail:0   skip:40 
fi-bxt-t5700     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-j1900     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-n2820     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32 
fi-hsw-4770      total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20 
fi-hsw-4770r     total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20 
fi-ilk-650       total:244  pass:191  dwarn:0   dfail:0   fail:0   skip:53 
fi-ivb-3520m     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-ivb-3770      total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-kbl-7200u     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-skl-6260u     total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:244  pass:223  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6700k     total:244  pass:222  dwarn:1   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14 
fi-snb-2520m     total:244  pass:211  dwarn:1   dfail:0   fail:0   skip:32 
fi-snb-2600      total:244  pass:211  dwarn:0   dfail:0   fail:0   skip:33 

a68195b8144bca52c6fe1aa1e541a2ff14bf984e drm-intel-nightly: 2016y-11m-16d-05h-46m-49s UTC integration manifest
c4cc338 drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3012/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ✗ Fi.CI.BAT: warning for drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-11-16  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
@ 2016-11-16  9:50   ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2016-11-16  9:50 UTC (permalink / raw)
  To: intel-gfx

On Wed, Nov 16, 2016 at 09:46:21AM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
> URL   : https://patchwork.freedesktop.org/series/15395/
> State : warning
> 
> == Summary ==
> 
> Series 15395v1 drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
> https://patchwork.freedesktop.org/api/1.0/series/15395/revisions/1/mbox/
> 
> Test kms_pipe_crc_basic:
>         Subgroup hang-read-crc-pipe-b:
>                 pass       -> DMESG-WARN (fi-snb-2520m)

Guess how many softpin users we have in BAT.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-11-16  8:58 [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
  2016-11-16  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
@ 2016-11-16 14:41 ` Daniel Vetter
  2016-11-16 14:55   ` Chris Wilson
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2016-11-16 14:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Nov 16, 2016 at 08:58:30AM +0000, Chris Wilson wrote:
> Soft-pinning depends upon being able to check for availabilty of an
> interval and evict overlapping object from a drm_mm range manager very
> quickly. Currently it uses a linear list, and so performance is dire and
> not suitable as a general replacement. Worse, the current code will oops
> if it tries to evict an active buffer.
> 
> It also helps if the routine reports the correct error codes as expected
> by its callers and emits a tracepoint upon use.
> 
> For posterity since the wrong patch was pushed (i.e. that missed these
> key points and had known bugs), this is the changelog that should have
> been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for
> execbuffer"):
> 
> Userspace can pass in an offset that it presumes the object is located
> at. The kernel will then do its utmost to fit the object into that
> location. The assumption is that userspace is handling its own object
> locations (for example along with full-ppgtt) and that the kernel will
> rarely have to make space for the user's requests.
> 
> This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
> * if the user supplies a virtual address via the execobject->offset
>   *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
>   that object is placed at that offset in the address space selected
>   by the context specifier in execbuffer.
> * the location must be aligned to the GTT page size, 4096 bytes
> * as the object is placed exactly as specified, it may be used by this
>   execbuffer call without relocations pointing to it
> 
> It may fail to do so if:
> * EINVAL is returned if the object does not have a 4096 byte aligned
>   address
> * the object conflicts with another pinned object (either pinned by
>   hardware in that address space, e.g. scanouts in the aliasing ppgtt)
>   or within the same batch.
>   EBUSY is returned if the location is pinned by hardware
>   EINVAL is returned if the location is already in use by the batch
> * EINVAL is returned if the object conflicts with its own alignment (as meets
>   the hardware requirements) or if the placement of the object does not fit
>   within the address space
> 
> All other execbuffer errors apply.
> 
> Presence of this execbuf extension may be queried by passing
> I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
> a reported value of 1 (or greater).
> 
> Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/drm_mm.c                   |  9 ++++
>  drivers/gpu/drm/i915/gvt/aperture_gm.c     |  4 +-
>  drivers/gpu/drm/i915/i915_drv.h            |  3 +-
>  drivers/gpu/drm/i915/i915_gem_evict.c      | 87 +++++++++++++++++++++---------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++--
>  drivers/gpu/drm/i915/i915_trace.h          | 23 ++++++++
>  drivers/gpu/drm/i915/i915_vma.c            |  8 +--
>  8 files changed, 105 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index 632473beb40c..06f319e54dc1 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -339,6 +339,15 @@ int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
>  	if (hole_start > node->start || hole_end < end)
>  		return -ENOSPC;
>  
> +	if (mm->color_adjust) {
> +		u64 adj_start = hole_start, adj_end = hole_end;
> +
> +		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
> +		if (adj_start > node->start ||
> +		    adj_end < node->start + node->size)
> +			return -ENOSPC;
> +	}

Can't we move that up and combine with the earlier check like in other
places, i.e.

    	u64 adj_start = hole_start, adj_end = hole_end;

	if (mm->color_ajust)
		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
    	if (adj_start > node->start ||
    	    adj_end < node->start + node->size)
    		return -ENOSPC;

gcc can then optimize this if it sees fit.

>  	node->mm = mm;
>  	node->allocated = 1;

And we should split out the drm_mm bugfix into a separate patch.

>  
> diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> index 0d41ebc4aea6..46e66feaef32 100644
> --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> @@ -73,12 +73,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
>  	mutex_lock(&dev_priv->drm.struct_mutex);
>  search_again:
>  	ret = drm_mm_insert_node_in_range_generic(&dev_priv->ggtt.base.mm,
> -						  node, size, 4096, 0,
> +						  node, size, 4096, -1,
>  						  start, end, search_flag,
>  						  alloc_flag);
>  	if (ret) {
>  		ret = i915_gem_evict_something(&dev_priv->ggtt.base,
> -					       size, 4096, 0, start, end, 0);
> +					       size, 4096, -1, start, end, 0);

Bugfix for gvt coloring? Should we have a GVT_COLOR? Feels like separate
patch once more.

>  		if (ret == 0 && ++retried < 3)
>  			goto search_again;
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 006914ca36fe..fc3fedb99ef2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3243,7 +3243,8 @@ int __must_check i915_gem_evict_something(struct i915_address_space *vm,
>  					  unsigned cache_level,
>  					  u64 start, u64 end,
>  					  unsigned flags);
> -int __must_check i915_gem_evict_for_vma(struct i915_vma *target);
> +int __must_check i915_gem_evict_for_vma(struct i915_vma *vma,
> +					unsigned int flags);
>  int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
>  
>  /* belongs in i915_gem_gtt.h */
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index bd08814b015c..094ca96843a6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -212,45 +212,82 @@ i915_gem_evict_something(struct i915_address_space *vm,
>  	return ret;
>  }
>  
> -int
> -i915_gem_evict_for_vma(struct i915_vma *target)
> +int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)

Kerneldoc for this one here is missing.

>  {
> -	struct drm_mm_node *node, *next;
> +	struct list_head eviction_list;
> +	struct drm_mm_node *node;
> +	u64 start = target->node.start;
> +	u64 end = start + target->node.size - 1;
> +	struct i915_vma *vma, *next;
> +	bool check_snoop;
> +	int ret = 0;
>  
>  	lockdep_assert_held(&target->vm->dev->struct_mutex);
> +	trace_i915_gem_evict_vma(target, flags);
> +
> +	check_snoop = target->vm->mm.color_adjust;
> +	if (check_snoop) {
> +		if (start > target->vm->start)
> +			start -= 4096;
> +		if (end < target->vm->start + target->vm->total)
> +			end += 4096;
> +	}
>  
> -	list_for_each_entry_safe(node, next,
> -			&target->vm->mm.head_node.node_list,
> -			node_list) {
> -		struct i915_vma *vma;
> -		int ret;
> +	node = drm_mm_interval_first(&target->vm->mm, start, end);

drm_mm_interval_first is lacking some pretty kernel-doc, should be fixed!

> +	if (!node)
> +		return 0;
>  
> -		if (node->start + node->size <= target->node.start)
> -			continue;
> -		if (node->start >= target->node.start + target->node.size)
> +	INIT_LIST_HEAD(&eviction_list);
> +	vma = container_of(node, typeof(*vma), node);
> +	list_for_each_entry_from(vma,
> +				 &target->vm->mm.head_node.node_list,
> +				 node.node_list) {
> +		if (vma->node.start > end)
>  			break;

We have this nice drm_mm_interval_next helper, imo should use it here
instead of open-coding.
>  
> -		vma = container_of(node, typeof(*vma), node);
> +		if (check_snoop) {
> +			if (vma->node.start + vma->node.size == target->node.start) {
> +				if (vma->node.color == target->node.color)
> +					continue;
> +			}
> +			if (vma->node.start == target->node.start + target->node.size) {
> +				if (vma->node.color == target->node.color)
> +					continue;
> +			}
> +		}

Not feeling too happy with open-coding our color_adjust here.

>  
> -		if (i915_vma_is_pinned(vma)) {
> -			if (!vma->exec_entry || i915_vma_pin_count(vma) > 1)
> -				/* Object is pinned for some other use */
> -				return -EBUSY;
> +		if (vma->node.color == -1) {

The magic -1!

> +			ret = -ENOSPC;
> +			break;
> +		}
>  
> -			/* We need to evict a buffer in the same batch */
> -			if (vma->exec_entry->flags & EXEC_OBJECT_PINNED)
> -				/* Overlapping fixed objects in the same batch */
> -				return -EINVAL;
> +		if (flags & PIN_NONBLOCK &&
> +		    (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))) {
> +			ret = -ENOSPC;
> +			break;
> +		}
>  
> -			return -ENOSPC;
> +		/* Overlap of objects in the same batch? */
> +		if (i915_vma_is_pinned(vma)) {
> +			ret = -ENOSPC;
> +			if (vma->exec_entry &&
> +			    vma->exec_entry->flags & EXEC_OBJECT_PINNED)
> +				ret = -EINVAL;
> +			break;
>  		}
>  
> -		ret = i915_vma_unbind(vma);
> -		if (ret)
> -			return ret;
> +		__i915_vma_pin(vma);
> +		list_add(&vma->exec_list, &eviction_list);
>  	}
>  
> -	return 0;
> +	list_for_each_entry_safe(vma, next, &eviction_list, exec_list) {
> +		list_del_init(&vma->exec_list);
> +		__i915_vma_unpin(vma);
> +		if (ret == 0)
> +			ret = i915_vma_unbind(vma);
> +	}
> +
> +	return ret;
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index e804cb2fa57e..1905e4419b1f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -274,6 +274,7 @@ static void eb_destroy(struct eb_vmas *eb)
>  				       exec_list);
>  		list_del_init(&vma->exec_list);
>  		i915_gem_execbuffer_unreserve_vma(vma);
> +		vma->exec_entry = NULL;
>  		i915_vma_put(vma);
>  	}
>  	kfree(eb);
> @@ -437,7 +438,7 @@ static void *reloc_iomap(struct drm_i915_gem_object *obj,
>  			memset(&cache->node, 0, sizeof(cache->node));
>  			ret = drm_mm_insert_node_in_range_generic
>  				(&ggtt->base.mm, &cache->node,
> -				 4096, 0, 0,
> +				 4096, 0, -1,

More magic color, please paint in some nice bikeshed ;-)

>  				 0, ggtt->mappable_end,
>  				 DRM_MM_SEARCH_DEFAULT,
>  				 DRM_MM_CREATE_DEFAULT);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 01f238adfb67..bae4e9d8f682 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2072,16 +2072,14 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
>  		return ret;
>  
>  alloc:
> -	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm,
> -						  &ppgtt->node, GEN6_PD_SIZE,
> -						  GEN6_PD_ALIGN, 0,
> -						  0, ggtt->base.total,
> +	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm, &ppgtt->node,
> +						  GEN6_PD_SIZE, GEN6_PD_ALIGN,
> +						  -1, 0, ggtt->base.total,

Yet another magic -1 color. Definitely want a define for this ...

>  						  DRM_MM_TOPDOWN);
>  	if (ret == -ENOSPC && !retried) {
>  		ret = i915_gem_evict_something(&ggtt->base,
>  					       GEN6_PD_SIZE, GEN6_PD_ALIGN,
> -					       I915_CACHE_NONE,
> -					       0, ggtt->base.total,
> +					       -1, 0, ggtt->base.total,
>  					       0);
>  		if (ret)
>  			goto err_out;
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index c5d210ebaa9a..2ed60ed70fe1 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -450,6 +450,29 @@ TRACE_EVENT(i915_gem_evict_vm,
>  	    TP_printk("dev=%d, vm=%p", __entry->dev, __entry->vm)
>  );
>  
> +TRACE_EVENT(i915_gem_evict_vma,
> +	    TP_PROTO(struct i915_vma *vma, unsigned int flags),
> +	    TP_ARGS(vma, flags),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(struct i915_address_space *, vm)
> +			     __field(u64, start)
> +			     __field(u64, size)
> +			     __field(unsigned int, flags)
> +			    ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = vma->vm->dev->primary->index;
> +			   __entry->vm = vma->vm;
> +			   __entry->start = vma->node.start;
> +			   __entry->size = vma->node.size;
> +			   __entry->flags = flags;
> +			  ),
> +
> +	    TP_printk("dev=%d, vm=%p, start=%llx size=%llx, flags=%x", __entry->dev, __entry->vm, (long long)__entry->start, (long long)__entry->size, __entry->flags)
> +);
> +
>  TRACE_EVENT(i915_gem_ring_sync_to,
>  	    TP_PROTO(struct drm_i915_gem_request *to,
>  		     struct drm_i915_gem_request *from),
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 738ff3a5cd6e..827eaeb75524 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -325,12 +325,6 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma,
>  	if (vma->vm->mm.color_adjust == NULL)
>  		return true;
>  
> -	if (!drm_mm_node_allocated(gtt_space))
> -		return true;
> -
> -	if (list_empty(&gtt_space->node_list))
> -		return true;

Not clear to me why we need this: set_cache_level should never see
pre-filled vmas, and vma_insert only has this check in the success case
where the same should hold. Why remove these 2 cases? This seems to change
set_cache_level behaviour ...

Otherwise looks all good, but needs some splitting&polish imo.
-Daniel

> -
>  	other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
>  	if (other->allocated && !other->hole_follows && other->color != cache_level)
>  		return false;
> @@ -413,7 +407,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>  		vma->node.color = obj->cache_level;
>  		ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
>  		if (ret) {
> -			ret = i915_gem_evict_for_vma(vma);
> +			ret = i915_gem_evict_for_vma(vma, flags);
>  			if (ret == 0)
>  				ret = drm_mm_reserve_node(&vma->vm->mm, &vma->node);
>  			if (ret)
> -- 
> 2.10.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-11-16 14:41 ` [PATCH] " Daniel Vetter
@ 2016-11-16 14:55   ` Chris Wilson
  2016-11-17  8:42     ` Daniel Vetter
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2016-11-16 14:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Nov 16, 2016 at 03:41:53PM +0100, Daniel Vetter wrote:
> On Wed, Nov 16, 2016 at 08:58:30AM +0000, Chris Wilson wrote:
> > Soft-pinning depends upon being able to check for availabilty of an
> > interval and evict overlapping object from a drm_mm range manager very
> > quickly. Currently it uses a linear list, and so performance is dire and
> > not suitable as a general replacement. Worse, the current code will oops
> > if it tries to evict an active buffer.
> > 
> > It also helps if the routine reports the correct error codes as expected
> > by its callers and emits a tracepoint upon use.
> > 
> > For posterity since the wrong patch was pushed (i.e. that missed these
> > key points and had known bugs), this is the changelog that should have
> > been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for
> > execbuffer"):
> > 
> > Userspace can pass in an offset that it presumes the object is located
> > at. The kernel will then do its utmost to fit the object into that
> > location. The assumption is that userspace is handling its own object
> > locations (for example along with full-ppgtt) and that the kernel will
> > rarely have to make space for the user's requests.
> > 
> > This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
> > * if the user supplies a virtual address via the execobject->offset
> >   *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
> >   that object is placed at that offset in the address space selected
> >   by the context specifier in execbuffer.
> > * the location must be aligned to the GTT page size, 4096 bytes
> > * as the object is placed exactly as specified, it may be used by this
> >   execbuffer call without relocations pointing to it
> > 
> > It may fail to do so if:
> > * EINVAL is returned if the object does not have a 4096 byte aligned
> >   address
> > * the object conflicts with another pinned object (either pinned by
> >   hardware in that address space, e.g. scanouts in the aliasing ppgtt)
> >   or within the same batch.
> >   EBUSY is returned if the location is pinned by hardware
> >   EINVAL is returned if the location is already in use by the batch
> > * EINVAL is returned if the object conflicts with its own alignment (as meets
> >   the hardware requirements) or if the placement of the object does not fit
> >   within the address space
> > 
> > All other execbuffer errors apply.
> > 
> > Presence of this execbuf extension may be queried by passing
> > I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
> > a reported value of 1 (or greater).
> > 
> > Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/drm_mm.c                   |  9 ++++
> >  drivers/gpu/drm/i915/gvt/aperture_gm.c     |  4 +-
> >  drivers/gpu/drm/i915/i915_drv.h            |  3 +-
> >  drivers/gpu/drm/i915/i915_gem_evict.c      | 87 +++++++++++++++++++++---------
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +-
> >  drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++--
> >  drivers/gpu/drm/i915/i915_trace.h          | 23 ++++++++
> >  drivers/gpu/drm/i915/i915_vma.c            |  8 +--
> >  8 files changed, 105 insertions(+), 42 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> > index 632473beb40c..06f319e54dc1 100644
> > --- a/drivers/gpu/drm/drm_mm.c
> > +++ b/drivers/gpu/drm/drm_mm.c
> > @@ -339,6 +339,15 @@ int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
> >  	if (hole_start > node->start || hole_end < end)
> >  		return -ENOSPC;
> >  
> > +	if (mm->color_adjust) {
> > +		u64 adj_start = hole_start, adj_end = hole_end;
> > +
> > +		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
> > +		if (adj_start > node->start ||
> > +		    adj_end < node->start + node->size)
> > +			return -ENOSPC;
> > +	}
> 
> Can't we move that up and combine with the earlier check like in other
> places, i.e.

Unlike the other places we don't use adj_start, adj_end.

> > --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > @@ -73,12 +73,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
> >  	mutex_lock(&dev_priv->drm.struct_mutex);
> >  search_again:
> >  	ret = drm_mm_insert_node_in_range_generic(&dev_priv->ggtt.base.mm,
> > -						  node, size, 4096, 0,
> > +						  node, size, 4096, -1,
> >  						  start, end, search_flag,
> >  						  alloc_flag);
> >  	if (ret) {
> >  		ret = i915_gem_evict_something(&dev_priv->ggtt.base,
> > -					       size, 4096, 0, start, end, 0);
> > +					       size, 4096, -1, start, end, 0);
> 
> Bugfix for gvt coloring? Should we have a GVT_COLOR? Feels like separate
> patch once more.

No, copy paste of broken code. The pattern was broken by evict_for_vma.

> > -int
> > -i915_gem_evict_for_vma(struct i915_vma *target)
> > +int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
> 
> Kerneldoc for this one here is missing.

There should be exactly one caller. It is not an interface that should
be used.

> >  {
> > -	struct drm_mm_node *node, *next;
> > +	struct list_head eviction_list;
> > +	struct drm_mm_node *node;
> > +	u64 start = target->node.start;
> > +	u64 end = start + target->node.size - 1;
> > +	struct i915_vma *vma, *next;
> > +	bool check_snoop;
> > +	int ret = 0;
> >  
> >  	lockdep_assert_held(&target->vm->dev->struct_mutex);
> > +	trace_i915_gem_evict_vma(target, flags);
> > +
> > +	check_snoop = target->vm->mm.color_adjust;
> > +	if (check_snoop) {
> > +		if (start > target->vm->start)
> > +			start -= 4096;
> > +		if (end < target->vm->start + target->vm->total)
> > +			end += 4096;
> > +	}
> >  
> > -	list_for_each_entry_safe(node, next,
> > -			&target->vm->mm.head_node.node_list,
> > -			node_list) {
> > -		struct i915_vma *vma;
> > -		int ret;
> > +	node = drm_mm_interval_first(&target->vm->mm, start, end);
> 
> drm_mm_interval_first is lacking some pretty kernel-doc, should be fixed!
> 
> > +	if (!node)
> > +		return 0;
> >  
> > -		if (node->start + node->size <= target->node.start)
> > -			continue;
> > -		if (node->start >= target->node.start + target->node.size)
> > +	INIT_LIST_HEAD(&eviction_list);
> > +	vma = container_of(node, typeof(*vma), node);
> > +	list_for_each_entry_from(vma,
> > +				 &target->vm->mm.head_node.node_list,
> > +				 node.node_list) {
> > +		if (vma->node.start > end)
> >  			break;
> 
> We have this nice drm_mm_interval_next helper, imo should use it here
> instead of open-coding.

It's not the same. This is a linear walk which is fewer pointer dances.

> >  
> > -		vma = container_of(node, typeof(*vma), node);
> > +		if (check_snoop) {
> > +			if (vma->node.start + vma->node.size == target->node.start) {
> > +				if (vma->node.color == target->node.color)
> > +					continue;
> > +			}
> > +			if (vma->node.start == target->node.start + target->node.size) {
> > +				if (vma->node.color == target->node.color)
> > +					continue;
> > +			}
> > +		}
> 
> Not feeling too happy with open-coding our color_adjust here.

Good, because it doesn't.

> > -		if (i915_vma_is_pinned(vma)) {
> > -			if (!vma->exec_entry || i915_vma_pin_count(vma) > 1)
> > -				/* Object is pinned for some other use */
> > -				return -EBUSY;
> > +		if (vma->node.color == -1) {
> 
> The magic -1!

And now you see why it is absolutely required. And not any more magic
than any other color value.

> > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> > index 738ff3a5cd6e..827eaeb75524 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -325,12 +325,6 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma,
> >  	if (vma->vm->mm.color_adjust == NULL)
> >  		return true;
> >  
> > -	if (!drm_mm_node_allocated(gtt_space))
> > -		return true;
> > -
> > -	if (list_empty(&gtt_space->node_list))
> > -		return true;
> 
> Not clear to me why we need this: set_cache_level should never see
> pre-filled vmas, and vma_insert only has this check in the success case
> where the same should hold. Why remove these 2 cases? This seems to change
> set_cache_level behaviour ...

Because they're irrelevant. We only get called when bound and the second
is not our bug.

There's no change in caller behaviour.

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-11-16 14:55   ` Chris Wilson
@ 2016-11-17  8:42     ` Daniel Vetter
  2016-11-17  9:22       ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2016-11-17  8:42 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Wed, Nov 16, 2016 at 02:55:59PM +0000, Chris Wilson wrote:
> On Wed, Nov 16, 2016 at 03:41:53PM +0100, Daniel Vetter wrote:
> > On Wed, Nov 16, 2016 at 08:58:30AM +0000, Chris Wilson wrote:
> > > Soft-pinning depends upon being able to check for availabilty of an
> > > interval and evict overlapping object from a drm_mm range manager very
> > > quickly. Currently it uses a linear list, and so performance is dire and
> > > not suitable as a general replacement. Worse, the current code will oops
> > > if it tries to evict an active buffer.
> > > 
> > > It also helps if the routine reports the correct error codes as expected
> > > by its callers and emits a tracepoint upon use.
> > > 
> > > For posterity since the wrong patch was pushed (i.e. that missed these
> > > key points and had known bugs), this is the changelog that should have
> > > been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for
> > > execbuffer"):
> > > 
> > > Userspace can pass in an offset that it presumes the object is located
> > > at. The kernel will then do its utmost to fit the object into that
> > > location. The assumption is that userspace is handling its own object
> > > locations (for example along with full-ppgtt) and that the kernel will
> > > rarely have to make space for the user's requests.
> > > 
> > > This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
> > > * if the user supplies a virtual address via the execobject->offset
> > >   *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
> > >   that object is placed at that offset in the address space selected
> > >   by the context specifier in execbuffer.
> > > * the location must be aligned to the GTT page size, 4096 bytes
> > > * as the object is placed exactly as specified, it may be used by this
> > >   execbuffer call without relocations pointing to it
> > > 
> > > It may fail to do so if:
> > > * EINVAL is returned if the object does not have a 4096 byte aligned
> > >   address
> > > * the object conflicts with another pinned object (either pinned by
> > >   hardware in that address space, e.g. scanouts in the aliasing ppgtt)
> > >   or within the same batch.
> > >   EBUSY is returned if the location is pinned by hardware
> > >   EINVAL is returned if the location is already in use by the batch
> > > * EINVAL is returned if the object conflicts with its own alignment (as meets
> > >   the hardware requirements) or if the placement of the object does not fit
> > >   within the address space
> > > 
> > > All other execbuffer errors apply.
> > > 
> > > Presence of this execbuf extension may be queried by passing
> > > I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
> > > a reported value of 1 (or greater).
> > > 
> > > Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/drm_mm.c                   |  9 ++++
> > >  drivers/gpu/drm/i915/gvt/aperture_gm.c     |  4 +-
> > >  drivers/gpu/drm/i915/i915_drv.h            |  3 +-
> > >  drivers/gpu/drm/i915/i915_gem_evict.c      | 87 +++++++++++++++++++++---------
> > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +-
> > >  drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++--
> > >  drivers/gpu/drm/i915/i915_trace.h          | 23 ++++++++
> > >  drivers/gpu/drm/i915/i915_vma.c            |  8 +--
> > >  8 files changed, 105 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> > > index 632473beb40c..06f319e54dc1 100644
> > > --- a/drivers/gpu/drm/drm_mm.c
> > > +++ b/drivers/gpu/drm/drm_mm.c
> > > @@ -339,6 +339,15 @@ int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
> > >  	if (hole_start > node->start || hole_end < end)
> > >  		return -ENOSPC;
> > >  
> > > +	if (mm->color_adjust) {
> > > +		u64 adj_start = hole_start, adj_end = hole_end;
> > > +
> > > +		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
> > > +		if (adj_start > node->start ||
> > > +		    adj_end < node->start + node->size)
> > > +			return -ENOSPC;
> > > +	}
> > 
> > Can't we move that up and combine with the earlier check like in other
> > places, i.e.
> 
> Unlike the other places we don't use adj_start, adj_end.

I meant the duplication of the if (does_it_fit) return -ENOSPC; snippet,
and imo unifying that makes the code more readable irrespective of whether
we need adj_start/end later on. And would be more consistent with other
places in drm_mm.c.

> > > --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > > +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > > @@ -73,12 +73,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
> > >  	mutex_lock(&dev_priv->drm.struct_mutex);
> > >  search_again:
> > >  	ret = drm_mm_insert_node_in_range_generic(&dev_priv->ggtt.base.mm,
> > > -						  node, size, 4096, 0,
> > > +						  node, size, 4096, -1,
> > >  						  start, end, search_flag,
> > >  						  alloc_flag);
> > >  	if (ret) {
> > >  		ret = i915_gem_evict_something(&dev_priv->ggtt.base,
> > > -					       size, 4096, 0, start, end, 0);
> > > +					       size, 4096, -1, start, end, 0);
> > 
> > Bugfix for gvt coloring? Should we have a GVT_COLOR? Feels like separate
> > patch once more.
> 
> No, copy paste of broken code. The pattern was broken by evict_for_vma.
> 
> > > -int
> > > -i915_gem_evict_for_vma(struct i915_vma *target)
> > > +int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
> > 
> > Kerneldoc for this one here is missing.
> 
> There should be exactly one caller. It is not an interface that should
> be used.

Same holds for others, I just think it's a good place to put some
documentation for how this is supposed to work. At least personally I do
believe a bit more prose does make the code more approachable, but if the
i915 gem gang deems them useless we can forgo them too.

> > >  {
> > > -	struct drm_mm_node *node, *next;
> > > +	struct list_head eviction_list;
> > > +	struct drm_mm_node *node;
> > > +	u64 start = target->node.start;
> > > +	u64 end = start + target->node.size - 1;
> > > +	struct i915_vma *vma, *next;
> > > +	bool check_snoop;
> > > +	int ret = 0;
> > >  
> > >  	lockdep_assert_held(&target->vm->dev->struct_mutex);
> > > +	trace_i915_gem_evict_vma(target, flags);
> > > +
> > > +	check_snoop = target->vm->mm.color_adjust;
> > > +	if (check_snoop) {
> > > +		if (start > target->vm->start)
> > > +			start -= 4096;
> > > +		if (end < target->vm->start + target->vm->total)
> > > +			end += 4096;
> > > +	}
> > >  
> > > -	list_for_each_entry_safe(node, next,
> > > -			&target->vm->mm.head_node.node_list,
> > > -			node_list) {
> > > -		struct i915_vma *vma;
> > > -		int ret;
> > > +	node = drm_mm_interval_first(&target->vm->mm, start, end);
> > 
> > drm_mm_interval_first is lacking some pretty kernel-doc, should be fixed!
> > 
> > > +	if (!node)
> > > +		return 0;
> > >  
> > > -		if (node->start + node->size <= target->node.start)
> > > -			continue;
> > > -		if (node->start >= target->node.start + target->node.size)
> > > +	INIT_LIST_HEAD(&eviction_list);
> > > +	vma = container_of(node, typeof(*vma), node);
> > > +	list_for_each_entry_from(vma,
> > > +				 &target->vm->mm.head_node.node_list,
> > > +				 node.node_list) {
> > > +		if (vma->node.start > end)
> > >  			break;
> > 
> > We have this nice drm_mm_interval_next helper, imo should use it here
> > instead of open-coding.
> 
> It's not the same. This is a linear walk which is fewer pointer dances.

Shouldn't we fix up drm_mm_interval_next then to be linear? drm_mm never
has overlapping intervals, so the full r-b search is overkill.

> > > -		vma = container_of(node, typeof(*vma), node);
> > > +		if (check_snoop) {
> > > +			if (vma->node.start + vma->node.size == target->node.start) {
> > > +				if (vma->node.color == target->node.color)
> > > +					continue;
> > > +			}
> > > +			if (vma->node.start == target->node.start + target->node.size) {
> > > +				if (vma->node.color == target->node.color)
> > > +					continue;
> > > +			}
> > > +		}
> > 
> > Not feeling too happy with open-coding our color_adjust here.
> 
> Good, because it doesn't.

Ok, what does it do then? It looked like it makes sure the guard page is
there if needed, which is what our color_adjust does too.

> > > -		if (i915_vma_is_pinned(vma)) {
> > > -			if (!vma->exec_entry || i915_vma_pin_count(vma) > 1)
> > > -				/* Object is pinned for some other use */
> > > -				return -EBUSY;
> > > +		if (vma->node.color == -1) {
> > 
> > The magic -1!
> 
> And now you see why it is absolutely required. And not any more magic
> than any other color value.

#define MAGIC -1

For a suitable name of MAGIC is what I meant of course ;-)

> > > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> > > index 738ff3a5cd6e..827eaeb75524 100644
> > > --- a/drivers/gpu/drm/i915/i915_vma.c
> > > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > > @@ -325,12 +325,6 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma,
> > >  	if (vma->vm->mm.color_adjust == NULL)
> > >  		return true;
> > >  
> > > -	if (!drm_mm_node_allocated(gtt_space))
> > > -		return true;
> > > -
> > > -	if (list_empty(&gtt_space->node_list))
> > > -		return true;
> > 
> > Not clear to me why we need this: set_cache_level should never see
> > pre-filled vmas, and vma_insert only has this check in the success case
> > where the same should hold. Why remove these 2 cases? This seems to change
> > set_cache_level behaviour ...
> 
> Because they're irrelevant. We only get called when bound and the second
> is not our bug.
> 
> There's no change in caller behaviour.

Yeah, but then the change seems unrelated to the bugfix here and imo
should be split out.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
  2016-11-17  8:42     ` Daniel Vetter
@ 2016-11-17  9:22       ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2016-11-17  9:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Nov 17, 2016 at 09:42:52AM +0100, Daniel Vetter wrote:
> On Wed, Nov 16, 2016 at 02:55:59PM +0000, Chris Wilson wrote:
> > On Wed, Nov 16, 2016 at 03:41:53PM +0100, Daniel Vetter wrote:
> > > On Wed, Nov 16, 2016 at 08:58:30AM +0000, Chris Wilson wrote:
> > > > Soft-pinning depends upon being able to check for availabilty of an
> > > > interval and evict overlapping object from a drm_mm range manager very
> > > > quickly. Currently it uses a linear list, and so performance is dire and
> > > > not suitable as a general replacement. Worse, the current code will oops
> > > > if it tries to evict an active buffer.
> > > > 
> > > > It also helps if the routine reports the correct error codes as expected
> > > > by its callers and emits a tracepoint upon use.
> > > > 
> > > > For posterity since the wrong patch was pushed (i.e. that missed these
> > > > key points and had known bugs), this is the changelog that should have
> > > > been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for
> > > > execbuffer"):
> > > > 
> > > > Userspace can pass in an offset that it presumes the object is located
> > > > at. The kernel will then do its utmost to fit the object into that
> > > > location. The assumption is that userspace is handling its own object
> > > > locations (for example along with full-ppgtt) and that the kernel will
> > > > rarely have to make space for the user's requests.
> > > > 
> > > > This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
> > > > * if the user supplies a virtual address via the execobject->offset
> > > >   *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
> > > >   that object is placed at that offset in the address space selected
> > > >   by the context specifier in execbuffer.
> > > > * the location must be aligned to the GTT page size, 4096 bytes
> > > > * as the object is placed exactly as specified, it may be used by this
> > > >   execbuffer call without relocations pointing to it
> > > > 
> > > > It may fail to do so if:
> > > > * EINVAL is returned if the object does not have a 4096 byte aligned
> > > >   address
> > > > * the object conflicts with another pinned object (either pinned by
> > > >   hardware in that address space, e.g. scanouts in the aliasing ppgtt)
> > > >   or within the same batch.
> > > >   EBUSY is returned if the location is pinned by hardware
> > > >   EINVAL is returned if the location is already in use by the batch
> > > > * EINVAL is returned if the object conflicts with its own alignment (as meets
> > > >   the hardware requirements) or if the placement of the object does not fit
> > > >   within the address space
> > > > 
> > > > All other execbuffer errors apply.
> > > > 
> > > > Presence of this execbuf extension may be queried by passing
> > > > I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
> > > > a reported value of 1 (or greater).
> > > > 
> > > > Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer")
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > ---
> > > >  drivers/gpu/drm/drm_mm.c                   |  9 ++++
> > > >  drivers/gpu/drm/i915/gvt/aperture_gm.c     |  4 +-
> > > >  drivers/gpu/drm/i915/i915_drv.h            |  3 +-
> > > >  drivers/gpu/drm/i915/i915_gem_evict.c      | 87 +++++++++++++++++++++---------
> > > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +-
> > > >  drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++--
> > > >  drivers/gpu/drm/i915/i915_trace.h          | 23 ++++++++
> > > >  drivers/gpu/drm/i915/i915_vma.c            |  8 +--
> > > >  8 files changed, 105 insertions(+), 42 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> > > > index 632473beb40c..06f319e54dc1 100644
> > > > --- a/drivers/gpu/drm/drm_mm.c
> > > > +++ b/drivers/gpu/drm/drm_mm.c
> > > > @@ -339,6 +339,15 @@ int drm_mm_reserve_node(struct drm_mm *mm, struct drm_mm_node *node)
> > > >  	if (hole_start > node->start || hole_end < end)
> > > >  		return -ENOSPC;
> > > >  
> > > > +	if (mm->color_adjust) {
> > > > +		u64 adj_start = hole_start, adj_end = hole_end;
> > > > +
> > > > +		mm->color_adjust(hole, node->color, &adj_start, &adj_end);
> > > > +		if (adj_start > node->start ||
> > > > +		    adj_end < node->start + node->size)
> > > > +			return -ENOSPC;
> > > > +	}
> > > 
> > > Can't we move that up and combine with the earlier check like in other
> > > places, i.e.
> > 
> > Unlike the other places we don't use adj_start, adj_end.
> 
> I meant the duplication of the if (does_it_fit) return -ENOSPC; snippet,
> and imo unifying that makes the code more readable irrespective of whether
> we need adj_start/end later on. And would be more consistent with other
> places in drm_mm.c.

(Of which there are too many.)
 
> > > > --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > > > +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> > > > @@ -73,12 +73,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
> > > >  	mutex_lock(&dev_priv->drm.struct_mutex);
> > > >  search_again:
> > > >  	ret = drm_mm_insert_node_in_range_generic(&dev_priv->ggtt.base.mm,
> > > > -						  node, size, 4096, 0,
> > > > +						  node, size, 4096, -1,
> > > >  						  start, end, search_flag,
> > > >  						  alloc_flag);
> > > >  	if (ret) {
> > > >  		ret = i915_gem_evict_something(&dev_priv->ggtt.base,
> > > > -					       size, 4096, 0, start, end, 0);
> > > > +					       size, 4096, -1, start, end, 0);
> > > 
> > > Bugfix for gvt coloring? Should we have a GVT_COLOR? Feels like separate
> > > patch once more.
> > 
> > No, copy paste of broken code. The pattern was broken by evict_for_vma.
> > 
> > > > -int
> > > > -i915_gem_evict_for_vma(struct i915_vma *target)
> > > > +int i915_gem_evict_for_vma(struct i915_vma *target, unsigned int flags)
> > > 
> > > Kerneldoc for this one here is missing.
> > 
> > There should be exactly one caller. It is not an interface that should
> > be used.
> 
> Same holds for others, I just think it's a good place to put some
> documentation for how this is supposed to work. At least personally I do
> believe a bit more prose does make the code more approachable, but if the
> i915 gem gang deems them useless we can forgo them too.

Eviction is, this just applies the same rules to an explicit region.
This would be "evicts the matching region in the address space as
described by VMA". The rules on what can and cannot be evicted based on
the requirements of the caller as the same as the others and are derived
from the pin/bind/insert function which is the general interface that we
expose to the rest of the driver. That is the interface (around
drm_mm_reserve_node in this case) that we are missing.

> > > >  {
> > > > -	struct drm_mm_node *node, *next;
> > > > +	struct list_head eviction_list;
> > > > +	struct drm_mm_node *node;
> > > > +	u64 start = target->node.start;
> > > > +	u64 end = start + target->node.size - 1;
> > > > +	struct i915_vma *vma, *next;
> > > > +	bool check_snoop;
> > > > +	int ret = 0;
> > > >  
> > > >  	lockdep_assert_held(&target->vm->dev->struct_mutex);
> > > > +	trace_i915_gem_evict_vma(target, flags);
> > > > +
> > > > +	check_snoop = target->vm->mm.color_adjust;
> > > > +	if (check_snoop) {
> > > > +		if (start > target->vm->start)
> > > > +			start -= 4096;
> > > > +		if (end < target->vm->start + target->vm->total)
> > > > +			end += 4096;
> > > > +	}
> > > >  
> > > > -	list_for_each_entry_safe(node, next,
> > > > -			&target->vm->mm.head_node.node_list,
> > > > -			node_list) {
> > > > -		struct i915_vma *vma;
> > > > -		int ret;
> > > > +	node = drm_mm_interval_first(&target->vm->mm, start, end);
> > > 
> > > drm_mm_interval_first is lacking some pretty kernel-doc, should be fixed!
> > > 
> > > > +	if (!node)
> > > > +		return 0;
> > > >  
> > > > -		if (node->start + node->size <= target->node.start)
> > > > -			continue;
> > > > -		if (node->start >= target->node.start + target->node.size)
> > > > +	INIT_LIST_HEAD(&eviction_list);
> > > > +	vma = container_of(node, typeof(*vma), node);
> > > > +	list_for_each_entry_from(vma,
> > > > +				 &target->vm->mm.head_node.node_list,
> > > > +				 node.node_list) {
> > > > +		if (vma->node.start > end)
> > > >  			break;
> > > 
> > > We have this nice drm_mm_interval_next helper, imo should use it here
> > > instead of open-coding.
> > 
> > It's not the same. This is a linear walk which is fewer pointer dances.
> 
> Shouldn't we fix up drm_mm_interval_next then to be linear? drm_mm never
> has overlapping intervals, so the full r-b search is overkill.
> 
> > > > -		vma = container_of(node, typeof(*vma), node);
> > > > +		if (check_snoop) {
> > > > +			if (vma->node.start + vma->node.size == target->node.start) {
> > > > +				if (vma->node.color == target->node.color)
> > > > +					continue;
> > > > +			}
> > > > +			if (vma->node.start == target->node.start + target->node.size) {
> > > > +				if (vma->node.color == target->node.color)
> > > > +					continue;
> > > > +			}
> > > > +		}
> > > 
> > > Not feeling too happy with open-coding our color_adjust here.
> > 
> > Good, because it doesn't.
> 
> Ok, what does it do then? It looked like it makes sure the guard page is
> there if needed, which is what our color_adjust does too.

color_adjust adds the guard page following the object. This is the
complementary function to ask if we need a guard page on this or the
previous object, having expanded our search to include the bookends.
Unlike color_adjust we have to check both for a missing page and
previously adjust the search.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-11-17  9:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-16  8:58 [PATCH] drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) Chris Wilson
2016-11-16  9:46 ` ✗ Fi.CI.BAT: warning for " Patchwork
2016-11-16  9:50   ` Chris Wilson
2016-11-16 14:41 ` [PATCH] " Daniel Vetter
2016-11-16 14:55   ` Chris Wilson
2016-11-17  8:42     ` Daniel Vetter
2016-11-17  9:22       ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.