* [RFC 0/3] Waitboost drm syncobj waits
@ 2023-02-09 15:43 Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 1/3] dma-fence: Track explicit waiters Tvrtko Ursulin
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Tvrtko Ursulin @ 2023-02-09 15:43 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In i915 we have the concept of "wait boosting", where we give a priority boost,
for instance, to fences which are actively waited upon from userspace. This has
its pros and cons and can certainly be discussed at length, but the fact is some
workloads really benefit from it.

The problem is that with the arrival of drm syncobj, and the new userspace wait
entry point it added, the waitboost mechanism gets bypassed. Hence I cooked up
this mini series really (really) quickly to see if some discussion can be had.

It adds a "wait count" to dma fence, which is incremented for every explicit
dma_fence_enable_sw_signaling call and for every dma_fence_add_wait_callback (a
new helper, like dma_fence_add_callback but meant for explicit/userspace wait
paths).

Individual drivers can then inspect this count via dma_fence_wait_count() and
decide to waitboost the waits on such fences.
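
For illustration, a driver could consult the counter from its
dma_fence_ops.enable_signaling hook along the lines of the sketch below. This is
only a sketch - to_my_fence(), my_boost_clocks() and my_arm_irq() are
hypothetical stand-ins for whatever fence container, boosting mechanism and
signaling setup a given driver already has:

  static bool my_fence_enable_signaling(struct dma_fence *fence)
  {
          struct my_fence *f = to_my_fence(fence);

          /* Somebody is explicitly waiting on this fence - boost. */
          if (dma_fence_wait_count(fence))
                  my_boost_clocks(f);

          /* Usual signaling setup, e.g. arming an interrupt. */
          return my_arm_irq(f);
  }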

Again, this was put together quickly and smoke tested only - no guarantees
whatsoever, and I will rely on interested parties to test and report whether it
even works, and how well.

Tvrtko Ursulin (3):
  dma-fence: Track explicit waiters
  drm/syncobj: Mark syncobj waits as external waiters
  drm/i915: Waitboost external waits

 drivers/dma-buf/dma-fence.c         | 102 ++++++++++++++++++++--------
 drivers/gpu/drm/drm_syncobj.c       |   6 +-
 drivers/gpu/drm/i915/i915_request.c |  13 +++-
 include/linux/dma-fence.h           |  14 ++++
 4 files changed, 101 insertions(+), 34 deletions(-)

-- 
2.34.1


* [RFC 1/3] dma-fence: Track explicit waiters
  2023-02-09 15:43 [RFC 0/3] Waitboost drm syncobj waits Tvrtko Ursulin
@ 2023-02-09 15:43 ` Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 2/3] drm/syncobj: Mark syncobj waits as external waiters Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 3/3] drm/i915: Waitboost external waits Tvrtko Ursulin
  2 siblings, 0 replies; 4+ messages in thread
From: Tvrtko Ursulin @ 2023-02-09 15:43 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Track how many callers are explicitly waiting on a fence to signal and
allow querying that via the new dma_fence_wait_count() API.

This provides infrastructure on top of which generic "waitboost" concepts
can be implemented by individual drivers. Wait-boosting is any reactive
activity, such as raising the GPU clocks, performed while there are
active external waiters.
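
For reference, a rough sketch of how an explicit wait path would use the
new helper (names like my_explicit_wait, my_waiter and my_wait_wake are
made up for illustration and are not part of this patch):

  static int my_explicit_wait(struct dma_fence *fence, struct my_waiter *wait)
  {
          int err;

          /* Registers the callback and bumps the fence's wait count. */
          err = dma_fence_add_wait_callback(fence, &wait->cb, my_wait_wake);
          if (err == -ENOENT)
                  return 0; /* Fence already signaled. */

          return err;
  }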

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/dma-buf/dma-fence.c | 102 ++++++++++++++++++++++++++----------
 include/linux/dma-fence.h   |  14 +++++
 2 files changed, 88 insertions(+), 28 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0de0482cd36e..d463c06ebd6d 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -344,6 +344,24 @@ void __dma_fence_might_wait(void)
 }
 #endif
 
+static void incr_wait_count(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	lockdep_assert_held(fence->lock);
+
+	__set_bit(DMA_FENCE_CB_FLAG_WAITCOUNT_BIT, &cb->flags);
+	fence->waitcount++;
+	WARN_ON_ONCE(!fence->waitcount);
+}
+
+static void decr_wait_count(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	lockdep_assert_held(fence->lock);
+
+	if (__test_and_clear_bit(DMA_FENCE_CB_FLAG_WAITCOUNT_BIT, &cb->flags)) {
+		WARN_ON_ONCE(!fence->waitcount);
+		fence->waitcount--;
+	}
+}
 
 /**
  * dma_fence_signal_timestamp_locked - signal completion of a fence
@@ -384,6 +402,7 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 
 	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
 		INIT_LIST_HEAD(&cur->node);
+		decr_wait_count(fence, cur);
 		cur->func(fence, cur);
 	}
 
@@ -612,35 +631,15 @@ void dma_fence_enable_sw_signaling(struct dma_fence *fence)
 	unsigned long flags;
 
 	spin_lock_irqsave(fence->lock, flags);
+	fence->waitcount++;
+	WARN_ON_ONCE(!fence->waitcount);
 	__dma_fence_enable_signaling(fence);
 	spin_unlock_irqrestore(fence->lock, flags);
 }
 EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
 
-/**
- * dma_fence_add_callback - add a callback to be called when the fence
- * is signaled
- * @fence: the fence to wait on
- * @cb: the callback to register
- * @func: the function to call
- *
- * Add a software callback to the fence. The caller should keep a reference to
- * the fence.
- *
- * @cb will be initialized by dma_fence_add_callback(), no initialization
- * by the caller is required. Any number of callbacks can be registered
- * to a fence, but a callback can only be registered to one fence at a time.
- *
- * If fence is already signaled, this function will return -ENOENT (and
- * *not* call the callback).
- *
- * Note that the callback can be called from an atomic context or irq context.
- *
- * Returns 0 in case of success, -ENOENT if the fence is already signaled
- * and -EINVAL in case of error.
- */
-int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
-			   dma_fence_func_t func)
+static int add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
+			dma_fence_func_t func, bool wait)
 {
 	unsigned long flags;
 	int ret = 0;
@@ -655,10 +654,15 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
 
 	spin_lock_irqsave(fence->lock, flags);
 
+	if (wait)
+		incr_wait_count(fence, cb);
+
 	if (__dma_fence_enable_signaling(fence)) {
 		cb->func = func;
 		list_add_tail(&cb->node, &fence->cb_list);
 	} else {
+		if (wait)
+			decr_wait_count(fence, cb);
 		INIT_LIST_HEAD(&cb->node);
 		ret = -ENOENT;
 	}
@@ -667,8 +671,44 @@ int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
 
 	return ret;
 }
+
+/**
+ * dma_fence_add_callback - add a callback to be called when the fence
+ * is signaled
+ * @fence: the fence to wait on
+ * @cb: the callback to register
+ * @func: the function to call
+ *
+ * Add a software callback to the fence. The caller should keep a reference to
+ * the fence.
+ *
+ * @cb will be initialized by dma_fence_add_callback(), no initialization
+ * by the caller is required. Any number of callbacks can be registered
+ * to a fence, but a callback can only be registered to one fence at a time.
+ *
+ * If fence is already signaled, this function will return -ENOENT (and
+ * *not* call the callback).
+ *
+ * Note that the callback can be called from an atomic context or irq context.
+ *
+ * Returns 0 in case of success, -ENOENT if the fence is already signaled
+ * and -EINVAL in case of error.
+ */
+int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
+			   dma_fence_func_t func)
+{
+	return add_callback(fence, cb, func, false);
+}
 EXPORT_SYMBOL(dma_fence_add_callback);
 
+int dma_fence_add_wait_callback(struct dma_fence *fence,
+				struct dma_fence_cb *cb,
+				dma_fence_func_t func)
+{
+	return add_callback(fence, cb, func, true);
+}
+EXPORT_SYMBOL(dma_fence_add_wait_callback);
+
 /**
  * dma_fence_get_status - returns the status upon completion
  * @fence: the dma_fence to query
@@ -721,8 +761,10 @@ dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
 	spin_lock_irqsave(fence->lock, flags);
 
 	ret = !list_empty(&cb->node);
-	if (ret)
+	if (ret) {
+		decr_wait_count(fence, cb);
 		list_del_init(&cb->node);
+	}
 
 	spin_unlock_irqrestore(fence->lock, flags);
 
@@ -780,6 +822,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 
 	cb.base.func = dma_fence_default_wait_cb;
 	cb.task = current;
+	incr_wait_count(fence, &cb.base);
 	list_add(&cb.base.node, &fence->cb_list);
 
 	while (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
@@ -796,8 +839,10 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
 			ret = -ERESTARTSYS;
 	}
 
-	if (!list_empty(&cb.base.node))
+	if (!list_empty(&cb.base.node)) {
+		decr_wait_count(fence, &cb.base);
 		list_del(&cb.base.node);
+	}
 	__set_current_state(TASK_RUNNING);
 
 out:
@@ -875,8 +920,8 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 		struct dma_fence *fence = fences[i];
 
 		cb[i].task = current;
-		if (dma_fence_add_callback(fence, &cb[i].base,
-					   dma_fence_default_wait_cb)) {
+		if (dma_fence_add_wait_callback(fence, &cb[i].base,
+						dma_fence_default_wait_cb)) {
 			/* This fence is already signaled */
 			if (idx)
 				*idx = i;
@@ -957,6 +1002,7 @@ dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	fence->context = context;
 	fence->seqno = seqno;
 	fence->flags = 0UL;
+	fence->waitcount = 0;
 	fence->error = 0;
 
 	trace_dma_fence_init(fence);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..d0ed923e4545 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -92,6 +92,7 @@ struct dma_fence {
 	u64 seqno;
 	unsigned long flags;
 	struct kref refcount;
+	unsigned int waitcount;
 	int error;
 };
 
@@ -116,6 +117,11 @@ typedef void (*dma_fence_func_t)(struct dma_fence *fence,
 struct dma_fence_cb {
 	struct list_head node;
 	dma_fence_func_t func;
+	unsigned long flags;
+};
+
+enum dma_fence_cb_flag_bits {
+	DMA_FENCE_CB_FLAG_WAITCOUNT_BIT,
 };
 
 /**
@@ -377,6 +383,9 @@ signed long dma_fence_default_wait(struct dma_fence *fence,
 int dma_fence_add_callback(struct dma_fence *fence,
 			   struct dma_fence_cb *cb,
 			   dma_fence_func_t func);
+int dma_fence_add_wait_callback(struct dma_fence *fence,
+				struct dma_fence_cb *cb,
+				dma_fence_func_t func);
 bool dma_fence_remove_callback(struct dma_fence *fence,
 			       struct dma_fence_cb *cb);
 void dma_fence_enable_sw_signaling(struct dma_fence *fence);
@@ -528,6 +537,11 @@ static inline int dma_fence_get_status_locked(struct dma_fence *fence)
 
 int dma_fence_get_status(struct dma_fence *fence);
 
+static inline unsigned int dma_fence_wait_count(struct dma_fence *fence)
+{
+	return fence->waitcount;
+}
+
 /**
  * dma_fence_set_error - flag an error condition on the fence
  * @fence: the dma_fence
-- 
2.34.1


* [RFC 2/3] drm/syncobj: Mark syncobj waits as external waiters
  2023-02-09 15:43 [RFC 0/3] Waitboost drm syncobj waits Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 1/3] dma-fence: Track explicit waiters Tvrtko Ursulin
@ 2023-02-09 15:43 ` Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 3/3] drm/i915: Waitboost external waits Tvrtko Ursulin
  2 siblings, 0 replies; 4+ messages in thread
From: Tvrtko Ursulin @ 2023-02-09 15:43 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Use the previously added dma-fence tracking of explicit waiters.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/drm_syncobj.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0c2be8360525..776b90774a64 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1065,9 +1065,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 			if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE) ||
 			    dma_fence_is_signaled(fence) ||
 			    (!entries[i].fence_cb.func &&
-			     dma_fence_add_callback(fence,
-						    &entries[i].fence_cb,
-						    syncobj_wait_fence_func))) {
+			     dma_fence_add_wait_callback(fence,
+							 &entries[i].fence_cb,
+							 syncobj_wait_fence_func))) {
 				/* The fence has been signaled */
 				if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
 					signaled_count++;
-- 
2.34.1


* [RFC 3/3] drm/i915: Waitboost external waits
  2023-02-09 15:43 [RFC 0/3] Waitboost drm syncobj waits Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 1/3] dma-fence: Track explicit waiters Tvrtko Ursulin
  2023-02-09 15:43 ` [RFC 2/3] drm/syncobj: Mark syncobj waits as external waiters Tvrtko Ursulin
@ 2023-02-09 15:43 ` Tvrtko Ursulin
  2 siblings, 0 replies; 4+ messages in thread
From: Tvrtko Ursulin @ 2023-02-09 15:43 UTC (permalink / raw)
  To: Intel-gfx, dri-devel; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Userspace waits coming via the drm_syncobj route have so far been
bypassing the waitboost mechanism.

Use the previously added dma-fence wait tracking API and apply the
same waitboosting logic as is used for the other entry points.

This should fix the performance regressions experienced by clvk and
similar userspace which relies on drm_syncobj.

At the same time, and for documentation purposes only, use the new
dma-fence API from i915_request_wait too.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..e24fac5c1567 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -94,7 +94,12 @@ static bool i915_fence_signaled(struct dma_fence *fence)
 
 static bool i915_fence_enable_signaling(struct dma_fence *fence)
 {
-	return i915_request_enable_breadcrumb(to_request(fence));
+	struct i915_request *rq = to_request(fence);
+
+	if (dma_fence_wait_count(&rq->fence) && !i915_request_started(rq))
+		intel_rps_boost(rq);
+
+	return i915_request_enable_breadcrumb(rq);
 }
 
 static signed long i915_fence_wait(struct dma_fence *fence,
@@ -2037,11 +2042,13 @@ long i915_request_wait_timeout(struct i915_request *rq,
 	 * but at a cost of spending more power processing the workload
 	 * (bad for battery).
 	 */
-	if (flags & I915_WAIT_PRIORITY && !i915_request_started(rq))
+	if (((flags & I915_WAIT_PRIORITY) || dma_fence_wait_count(&rq->fence))
+	    && !i915_request_started(rq))
 		intel_rps_boost(rq);
 
 	wait.tsk = current;
-	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
+	if (dma_fence_add_wait_callback(&rq->fence, &wait.cb,
+					request_wait_wake))
 		goto out;
 
 	/*
-- 
2.34.1


