From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Cc: "Goel, Akash" <akash.goel@intel.com>,
	Josh Triplett <josh@joshtriplett.org>
Subject: [PATCH 16/38] drm/i915: Enable lockless lookup of request tracking via RCU
Date: Fri,  3 Jun 2016 17:55:31 +0100
Message-ID: <1464972953-2726-17-git-send-email-chris@chris-wilson.co.uk>
In-Reply-To: <1464972953-2726-1-git-send-email-chris@chris-wilson.co.uk>

If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
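
To illustrate, the lookup pattern this enables looks roughly like the
sketch below; it mirrors the i915_gem_active_get_rcu() helper added by
this patch, with the acquired reference handed back to the caller:

	struct drm_i915_gem_request *request;

	rcu_read_lock();
	do {
		request = rcu_dereference(active->__request);
		if (!request || i915_gem_request_completed(request))
			break; /* tracker is idle */

		/* Try to acquire a reference; fails once refcnt hits zero */
		request = i915_gem_request_get_rcu(request);

		/* Recheck that the slab did not recycle the request for a
		 * new submission while we were acquiring the reference.
		 */
		if (!request || request == rcu_dereference(active->__request))
			break;

		i915_gem_request_put(request);
	} while (1);
	rcu_read_unlock();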

However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second; with a quick insertion of synchronize_rcu_expedited() inside our
shrinker callback, that issue disappears.

v2: Currently, it is our responsibility to handle reclaim, i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests; both are nondeterministic.)
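
As a sketch only (this watermark alternative is not what the patch
implements, and the names and thresholds below are invented for
illustration), it might look like:

	/* After queueing each deferred request free */
	count = atomic_inc_return(&i915->nr_pending_frees);
	if (count == LOW_WATERMARK)
		/* flush: reset nr_pending_frees after a grace period */
		call_rcu(&i915->free_flush, clear_pending_frees);
	else if (count > HIGH_WATERMARK)
		rcu_barrier(); /* block until that flush has completed */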

Paul E. McKenney wrote:

Another approach is synchronize_rcu() after some largish number of
requests.  The advantage of this approach is that it throttles the
production of callbacks at the source.  The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it.  Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
idea is to do something like this:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going.  This would not block unless there was less than
one grace period's worth of time between invocations.  But this
assumes a busy system, where there is almost always a grace period
in flight.  But you can make that happen as follows:

        cond_synchronize_rcu(cookie);
        cookie = get_state_synchronize_rcu();
        call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one.  Setting and clearing a flag
with appropriate memory ordering control suffices (e.g., smp_load_acquire()
and smp_store_release()).
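
Putting those pieces together, a minimal sketch of such a throttle (all
names below are illustrative, not part of this patch, and an initial
get_state_synchronize_rcu() at init time is assumed) might be:

	static struct rcu_head throttle_head;
	static bool throttle_busy;
	static unsigned long throttle_cookie;

	static void throttle_done(struct rcu_head *head)
	{
		/* Pairs with the smp_load_acquire() below */
		smp_store_release(&throttle_busy, false);
	}

	static void throttle_request_frees(void)
	{
		/* Blocks only if a full grace period has yet to elapse
		 * since the previous invocation.
		 */
		cond_synchronize_rcu(throttle_cookie);
		throttle_cookie = get_state_synchronize_rcu();

		/* Keep a grace period in flight, but only requeue the
		 * callback once the previous one has completed.
		 */
		if (!smp_load_acquire(&throttle_busy)) {
			throttle_busy = true;
			call_rcu(&throttle_head, throttle_done);
		}
	}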

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
---
 drivers/gpu/drm/i915/i915_gem.c          |   7 +-
 drivers/gpu/drm/i915/i915_gem_request.c  |   2 +-
 drivers/gpu/drm/i915/i915_gem_request.h  | 110 ++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_shrinker.c |  15 +++--
 4 files changed, 113 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f6f039aad6e2..4c0e3632214f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4158,7 +4158,9 @@ i915_gem_load_init(struct drm_device *dev)
 	dev_priv->requests =
 		kmem_cache_create("i915_gem_request",
 				  sizeof(struct drm_i915_gem_request), 0,
-				  SLAB_HWCACHE_ALIGN,
+				  SLAB_HWCACHE_ALIGN |
+				  SLAB_RECLAIM_ACCOUNT |
+				  SLAB_DESTROY_BY_RCU,
 				  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->context_list);
@@ -4194,6 +4196,9 @@ void i915_gem_load_cleanup(struct drm_device *dev)
 	kmem_cache_destroy(dev_priv->requests);
 	kmem_cache_destroy(dev_priv->vmas);
 	kmem_cache_destroy(dev_priv->objects);
+
+	/* And ensure that our DESTROY_BY_RCU slabs are truly destroyed */
+	rcu_barrier();
 }
 
 int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 59afc8e547c4..a0cdd3f10566 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -344,7 +344,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 		prefetchw(next);
 
 		INIT_LIST_HEAD(&active->link);
-		active->__request = NULL;
+		RCU_INIT_POINTER(active->__request, NULL);
 
 		active->retire(active, request);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index e794801baf07..6aa246848894 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -178,6 +178,12 @@ i915_gem_request_get(struct drm_i915_gem_request *req)
 	return to_request(fence_get(&req->fence));
 }
 
+static inline struct drm_i915_gem_request *
+i915_gem_request_get_rcu(struct drm_i915_gem_request *req)
+{
+	return to_request(fence_get_rcu(&req->fence));
+}
+
 static inline void
 i915_gem_request_put(struct drm_i915_gem_request *req)
 {
@@ -276,21 +282,12 @@ static inline bool i915_spin_request(const struct drm_i915_gem_request *request,
  * resource including itself.
  */
 struct i915_gem_active {
-	struct drm_i915_gem_request *__request;
+	struct drm_i915_gem_request __rcu *__request;
 	struct list_head link;
 	void (*retire)(struct i915_gem_active *,
 		       struct drm_i915_gem_request *);
 };
 
-/**
- * i915_gem_active_set - updates the tracker to watch the current request
- * @active - the active tracker
- * @request - the request to watch
- *
- * i915_gem_active_set() watches the given @request for completion. Whilst
- * that @request is busy, the @active reports busy. When that @request is
- * retired, the @active tracker is updated to report idle.
- */
 static inline void
 init_request_active(struct i915_gem_active *active,
 		    void (*func)(struct i915_gem_active *,
@@ -300,18 +297,33 @@ init_request_active(struct i915_gem_active *active,
 	active->retire = func;
 }
 
+/**
+ * i915_gem_active_set - updates the tracker to watch the current request
+ * @active - the active tracker
+ * @request - the request to watch
+ *
+ * i915_gem_active_set() watches the given @request for completion. Whilst
+ * that @request is busy, the @active reports busy. When that @request is
+ * retired, the @active tracker is updated to report idle.
+ */
 static inline void
 i915_gem_active_set(struct i915_gem_active *active,
 		    struct drm_i915_gem_request *request)
 {
 	list_move(&active->link, &request->active_list);
-	active->__request = request;
+	rcu_assign_pointer(active->__request, request);
 }
 
 static inline struct drm_i915_gem_request *
 __i915_gem_active_peek(const struct i915_gem_active *active)
 {
-	return active->__request;
+	/* Inside the error capture (running with the driver in an unknown
+	 * state), we want to bend the rules slightly (a lot).
+	 *
+	 * Work is in progress to make it safer, in the meantime this keeps
+	 * the known issue from spamming the logs.
+	 */
+	return rcu_dereference_protected(active->__request, 1);
 }
 
 /**
@@ -326,8 +338,9 @@ static inline struct drm_i915_gem_request *
 i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex)
 {
 	struct drm_i915_gem_request *request;
-       
-	request = active->__request;
+
+	request = rcu_dereference_protected(active->__request,
+					    lockdep_is_held(mutex));
 	if (!request || i915_gem_request_completed(request))
 		return NULL;
 
@@ -348,6 +361,72 @@ i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex)
 }
 
 /**
+ * i915_gem_active_get_rcu - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_gem_active_get_rcu() returns a reference to the active request, or
+ * NULL if the active tracker is idle. The caller must hold the RCU read lock.
+ */
+static inline struct drm_i915_gem_request *
+i915_gem_active_get_rcu(const struct i915_gem_active *active)
+{
+	/* Performing a lockless retrieval of the active request is super
+	 * tricky. SLAB_DESTROY_BY_RCU merely guarantees that the backing
+	 * slab of request objects will not be freed whilst we hold the
+	 * RCU read lock. It does not guarantee that the request itself
+	 * will not be freed and then *reused*. Viz,
+	 *
+	 * Thread A			Thread B
+	 *
+	 * req = active.request
+	 * 				retire(req) -> free(req);
+	 * 				(req is now first on the slab freelist)
+	 * 				active.request = NULL
+	 *
+	 * 				req = new submission on a new object
+	 * ref(req)
+	 *
+	 * To prevent the request from being reused whilst the caller
+	 * uses it, we take a reference like normal. Whilst acquiring
+	 * the reference we check that it is not in a destroyed state
+	 * (refcnt == 0). That prevents the request being reallocated
+	 * whilst the caller holds on to it. To check that the request
+	 * was not reallocated as we acquired the reference we have to
+	 * check that our request remains the active request across
+	 * the lookup, in the same manner as a seqlock. The visibility
+	 * of the pointer versus the reference counting is controlled
+	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
+	 *
+	 * In the middle of all that, we inspect whether the request is
+	 * complete. Retiring is lazy so the request may be completed long
+	 * before the active tracker is updated. Querying whether the
+	 * request is complete is far cheaper (as it involves no locked
+	 * instructions setting cachelines to exclusive) than acquiring
+	 * the reference, so we do it first. The RCU read lock ensures the
+	 * pointer dereference is valid, but does not ensure that either
+	 * the seqno or the HWS is the right one! However, if the request was
+	 * reallocated, that means the active tracker's request was complete.
+	 * If the new request is also complete, then both are and we can
+	 * just report the active tracker is idle. If the new request is
+	 * incomplete, then we acquire a reference on it and check that
+	 * it remained the active request.
+	 */
+	do {
+		struct drm_i915_gem_request *request;
+
+		request = rcu_dereference(active->__request);
+		if (!request || i915_gem_request_completed(request))
+			return NULL;
+
+		request = i915_gem_request_get_rcu(request);
+		if (!request || request == rcu_dereference(active->__request))
+			return request;
+
+		i915_gem_request_put(request);
+	} while (1);
+}
+
+/**
  * __i915_gem_active_is_busy - report whether the active tracker is assigned
  * @active - the active tracker
  *
@@ -411,7 +490,8 @@ i915_gem_active_retire(const struct i915_gem_active *active,
 {
 	struct drm_i915_gem_request *request;
 
-	request = active->__request;
+	request = rcu_dereference_protected(active->__request,
+					    lockdep_is_held(mutex));
 	if (!request)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 5cbc4ee52c6d..6eea4abeb9ce 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -191,6 +191,8 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 		intel_runtime_pm_put(dev_priv);
 
 	i915_gem_retire_requests(dev_priv);
+	/* expedite the RCU grace period to free some request slabs */
+	synchronize_rcu_expedited();
 
 	return count;
 }
@@ -211,10 +213,15 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
  */
 unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
-	return i915_gem_shrink(dev_priv, -1UL,
-			       I915_SHRINK_BOUND |
-			       I915_SHRINK_UNBOUND |
-			       I915_SHRINK_ACTIVE);
+	unsigned long freed;
+
+	freed = i915_gem_shrink(dev_priv, -1UL,
+				I915_SHRINK_BOUND |
+				I915_SHRINK_UNBOUND |
+				I915_SHRINK_ACTIVE);
+	rcu_barrier(); /* wait until our RCU delayed slab frees are completed */
+
+	return freed;
 }
 
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
-- 
2.8.1
