From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 10/42] drm/i915: Defer active reference until required
Date: Fri, 7 Oct 2016 17:58:30 +0100
Message-ID: <20161007165830.GT22676@nuc-i3427.alporthouse.com>
In-Reply-To: <3820e608-0508-ac0c-19d4-dccd00e55fdd@linux.intel.com>

On Fri, Oct 07, 2016 at 05:35:38PM +0100, Tvrtko Ursulin wrote:
> 
> On 07/10/2016 10:46, Chris Wilson wrote:
> >We only need the active reference to keep the object alive after the
> >handle has been deleted (so as to prevent a synchronous gem_close). Why
> >then pay the price of a kref on every execbuf when we can insert that
> >final active ref just in time for the handle deletion?
> 
> I really dislike this.  Where there was elegance with obj/vma_put,
> it is now replaced with the out-of-place-looking
> __i915_gem_object_release_unless_active. I don't see why
> higher-level layers should have to concern themselves with calling
> something with such a low-level-sounding name.
> 
> How much does this influence performance, and in which cases? If the
> effect is significant, could we try to come up with something similar
> but more elegant?

Back in the day, this was one of the most frequent atomic operations we
did. And whilst perf overemphasizes the stalls from locked instructions,
the sheer number of them we do is significant (since we do one at the
start and one at the end of every execbuf for every object in typical
conditions). Whilst it is less significant in the face of obj->resv
undoing all of the gains, it is still a deep paper cut. (At the GL level,
consider about 100 objects per batch, several thousand batches a second,
x 2: these ops are low-hanging fruit.)
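
As a rough back-of-envelope check of the figures above, the sketch below simply multiplies them out. The helper name is hypothetical, and the batch rate you pass in is an assumption standing in for "several thousand"; nothing here is a measurement.

```c
/*
 * Hypothetical helper: estimated reference-count operations per second
 * when every object takes one get at the start and one put at the end
 * of each execbuf.
 */
static unsigned long kref_ops_per_second(unsigned long objects_per_batch,
					 unsigned long batches_per_second)
{
	/* x 2: one increment plus one decrement per object per batch */
	return objects_per_batch * batches_per_second * 2;
}
```

With 100 objects per batch and an assumed 3000 batches a second, kref_ops_per_second(100, 3000) evaluates to 600000 locked operations per second that the deferred scheme avoids in the common case.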

What's needed is a function to take the place of close_object for
internally allocated objects. It is also worth noting that they are either
already part of a cache, or are suitable for caching...
-Chris
> 
> Regards,
> 
> Tvrtko
> 
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_drv.h              | 28 ++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_gem.c              | 22 +++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  2 +-
> >  drivers/gpu/drm/i915/i915_gem_context.c      |  2 +-
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  2 --
> >  drivers/gpu/drm/i915/i915_gem_gtt.c          |  7 ++++++-
> >  drivers/gpu/drm/i915/i915_gem_render_state.c |  3 ++-
> >  drivers/gpu/drm/i915/intel_ringbuffer.c      | 15 ++++++++++++---
> >  8 files changed, 71 insertions(+), 10 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index ee25e265416f..fee5cc92e2f2 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2232,6 +2232,12 @@ struct drm_i915_gem_object {
> >  	((READ_ONCE((bo)->flags) >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK)
> >  	/**
> >+	 * Have we taken a reference for the object for incomplete GPU
> >+	 * activity?
> >+	 */
> >+#define I915_BO_ACTIVE_REF (I915_BO_ACTIVE_SHIFT + I915_NUM_ENGINES)
> >+
> >+	/**
> >  	 * This is set if the object has been written to since last bound
> >  	 * to the GTT
> >  	 */
> >@@ -2399,6 +2405,28 @@ i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
> >  	return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
> >  }
> >+static inline bool
> >+i915_gem_object_has_active_reference(const struct drm_i915_gem_object *obj)
> >+{
> >+	return test_bit(I915_BO_ACTIVE_REF, &obj->flags);
> >+}
> >+
> >+static inline void
> >+i915_gem_object_set_active_reference(struct drm_i915_gem_object *obj)
> >+{
> >+	lockdep_assert_held(&obj->base.dev->struct_mutex);
> >+	__set_bit(I915_BO_ACTIVE_REF, &obj->flags);
> >+}
> >+
> >+static inline void
> >+i915_gem_object_clear_active_reference(struct drm_i915_gem_object *obj)
> >+{
> >+	lockdep_assert_held(&obj->base.dev->struct_mutex);
> >+	__clear_bit(I915_BO_ACTIVE_REF, &obj->flags);
> >+}
> >+
> >+void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj);
> >+
> >  static inline unsigned int
> >  i915_gem_object_get_tiling(struct drm_i915_gem_object *obj)
> >  {
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 7fa5cb764739..b560263bf446 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2618,7 +2618,10 @@ i915_gem_object_retire__read(struct i915_gem_active *active,
> >  		list_move_tail(&obj->global_list,
> >  			       &request->i915->mm.bound_list);
> >-	i915_gem_object_put(obj);
> >+	if (i915_gem_object_has_active_reference(obj)) {
> >+		i915_gem_object_clear_active_reference(obj);
> >+		i915_gem_object_put(obj);
> >+	}
> >  }
> >  static bool i915_context_is_banned(const struct i915_gem_context *ctx)
> >@@ -2889,6 +2892,12 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
> >  	list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
> >  		if (vma->vm->file == fpriv)
> >  			i915_vma_close(vma);
> >+
> >+	if (i915_gem_object_is_active(obj) &&
> >+	    !i915_gem_object_has_active_reference(obj)) {
> >+		i915_gem_object_set_active_reference(obj);
> >+		i915_gem_object_get(obj);
> >+	}
> >  	mutex_unlock(&obj->base.dev->struct_mutex);
> >  }
> >@@ -4365,6 +4374,17 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
> >  	intel_runtime_pm_put(dev_priv);
> >  }
> >+void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
> >+{
> >+	lockdep_assert_held(&obj->base.dev->struct_mutex);
> >+
> >+	GEM_BUG_ON(i915_gem_object_has_active_reference(obj));
> >+	if (i915_gem_object_is_active(obj))
> >+		i915_gem_object_set_active_reference(obj);
> >+	else
> >+		i915_gem_object_put(obj);
> >+}
> >+
> >  int i915_gem_suspend(struct drm_device *dev)
> >  {
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> >diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> >index ed989596d9a3..cb25cad3318c 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> >@@ -73,7 +73,7 @@ void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
> >  		list_for_each_entry_safe(obj, next,
> >  					 &pool->cache_list[n],
> >  					 batch_pool_link)
> >-			i915_gem_object_put(obj);
> >+			__i915_gem_object_release_unless_active(obj);
> >  		INIT_LIST_HEAD(&pool->cache_list[n]);
> >  	}
> >diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >index df10f4e95736..1d2ab73a8f43 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_context.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >@@ -155,7 +155,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
> >  		if (ce->ring)
> >  			intel_ring_free(ce->ring);
> >-		i915_vma_put(ce->state);
> >+		__i915_gem_object_release_unless_active(ce->state->obj);
> >  	}
> >  	put_pid(ctx->pid);
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 72c7c1855e70..0deecd4e3b6c 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -1299,8 +1299,6 @@ void i915_vma_move_to_active(struct i915_vma *vma,
> >  	 * add the active reference first and queue for it to be dropped
> >  	 * *last*.
> >  	 */
> >-	if (!i915_gem_object_is_active(obj))
> >-		i915_gem_object_get(obj);
> >  	i915_gem_object_set_active(obj, idx);
> >  	i915_gem_active_set(&obj->last_read[idx], req);
> >diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >index 2d846aa39ca5..1c95da8424cb 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >@@ -3712,11 +3712,16 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
> >  void i915_vma_unpin_and_release(struct i915_vma **p_vma)
> >  {
> >  	struct i915_vma *vma;
> >+	struct drm_i915_gem_object *obj;
> >  	vma = fetch_and_zero(p_vma);
> >  	if (!vma)
> >  		return;
> >+	obj = vma->obj;
> >+
> >  	i915_vma_unpin(vma);
> >-	i915_vma_put(vma);
> >+	i915_vma_close(vma);
> >+
> >+	__i915_gem_object_release_unless_active(obj);
> >  }
> >diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> >index 95b7e9afd5f8..09cf4874c45f 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> >@@ -224,7 +224,8 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
> >  	i915_vma_move_to_active(so.vma, req, 0);
> >  err_unpin:
> >  	i915_vma_unpin(so.vma);
> >+	i915_vma_close(so.vma);
> >  err_obj:
> >-	i915_gem_object_put(obj);
> >+	__i915_gem_object_release_unless_active(obj);
> >  	return ret;
> >  }
> >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >index b60c6f09fbfd..f3dfb7ca625d 100644
> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >@@ -1763,14 +1763,19 @@ static void cleanup_phys_status_page(struct intel_engine_cs *engine)
> >  static void cleanup_status_page(struct intel_engine_cs *engine)
> >  {
> >  	struct i915_vma *vma;
> >+	struct drm_i915_gem_object *obj;
> >  	vma = fetch_and_zero(&engine->status_page.vma);
> >  	if (!vma)
> >  		return;
> >+	obj = vma->obj;
> >+
> >  	i915_vma_unpin(vma);
> >-	i915_gem_object_unpin_map(vma->obj);
> >-	i915_vma_put(vma);
> >+	i915_vma_close(vma);
> >+
> >+	i915_gem_object_unpin_map(obj);
> >+	__i915_gem_object_release_unless_active(obj);
> >  }
> >  static int init_status_page(struct intel_engine_cs *engine)
> >@@ -1968,7 +1973,11 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
> >  void
> >  intel_ring_free(struct intel_ring *ring)
> >  {
> >-	i915_vma_put(ring->vma);
> >+	struct drm_i915_gem_object *obj = ring->vma->obj;
> >+
> >+	i915_vma_close(ring->vma);
> >+	__i915_gem_object_release_unless_active(obj);
> >+
> >  	kfree(ring);
> >  }
> 
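
For readers following the quoted diff, here is a minimal userspace model of the reference flow it describes. It is a sketch under stated assumptions, not the driver code: every toy_* name is hypothetical, the refcount is a plain int rather than a kref, locking is omitted, and toy_close_handle() folds in the handle's own reference drop, which in the kernel happens outside i915_gem_close_object().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for drm_i915_gem_object; all names are hypothetical. */
struct toy_obj {
	int refcount;		/* models the object kref (no atomics here) */
	unsigned int active;	/* bitmask of engines with outstanding work */
	bool active_ref;	/* models the I915_BO_ACTIVE_REF flag */
	bool freed;		/* set once the last reference is dropped */
};

static void toy_get(struct toy_obj *o)
{
	o->refcount++;
}

static void toy_put(struct toy_obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->freed = true;
}

/* execbuf path: mark the object busy, but take no reference */
static void toy_move_to_active(struct toy_obj *o, unsigned int engine)
{
	o->active |= 1u << engine;
}

/* Handle deletion: insert the active reference just in time. */
static void toy_close_handle(struct toy_obj *o)
{
	if (o->active && !o->active_ref) {
		o->active_ref = true;
		toy_get(o);
	}
	toy_put(o);	/* assumption: the handle's own reference goes here */
}

/* Retirement: once idle on all engines, drop the deferred reference. */
static void toy_retire(struct toy_obj *o, unsigned int engine)
{
	o->active &= ~(1u << engine);
	if (o->active)
		return;
	if (o->active_ref) {
		o->active_ref = false;
		toy_put(o);
	}
}

/* Internal owners: hand the reference to retirement, or drop it now. */
static void toy_release_unless_active(struct toy_obj *o)
{
	assert(!o->active_ref);
	if (o->active)
		o->active_ref = true;	/* the caller's reference becomes the active one */
	else
		toy_put(o);
}
```

In this model an object that is busy when its handle is closed survives until toy_retire() clears the last engine bit, while an idle object is freed immediately. The execbuf path never touches the refcount.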

-- 
Chris Wilson, Intel Open Source Technology Centre
