linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/4] dma-fence: Deadline awareness
@ 2021-07-26 23:38 Rob Clark
  2021-07-26 23:38 ` [RFC 1/4] dma-fence: Add deadline awareness Rob Clark
                   ` (6 more replies)
  0 siblings, 7 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-26 23:38 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Christian König, Rob Clark,
	Alex Deucher, Andrey Grodzovsky, Christian König,
	Gustavo Padovan, Jack Zhang, Lee Jones,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list,
	open list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Maarten Lankhorst, Roy Sun, Tian Tao

From: Rob Clark <robdclark@chromium.org>

Based on discussion from a previous series[1] to add a "boost" mechanism
when, for example, vblank deadlines are missed.  Instead of a boost
callback, this approach adds a way to set a deadline on the fence, by
which the waiter would like to see the fence signalled.

I've not yet had a chance to re-work the drm/msm part of this, but
wanted to send this out as an RFC in case I don't have a chance to
finish the drm/msm part this week.

Original description:

In some cases, like double-buffered rendering, missing vblanks can
trick the GPU into running at a lower frequence, when really we
want to be running at a higher frequency to not miss the vblanks
in the first place.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

[1] https://patchwork.freedesktop.org/series/90331/

Rob Clark (4):
  dma-fence: Add deadline awareness
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/scheduler: Add fence deadline support

 drivers/dma-buf/dma-fence.c             | 39 +++++++++++++++++++++++++
 drivers/gpu/drm/drm_atomic_helper.c     | 36 +++++++++++++++++++++++
 drivers/gpu/drm/drm_vblank.c            | 31 ++++++++++++++++++++
 drivers/gpu/drm/scheduler/sched_fence.c | 10 +++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  3 ++
 include/drm/drm_vblank.h                |  1 +
 include/linux/dma-fence.h               | 17 +++++++++++
 7 files changed, 137 insertions(+)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
@ 2021-07-26 23:38 ` Rob Clark
  2021-07-27  7:11   ` Christian König
  2021-07-26 23:38 ` [RFC 2/4] drm/vblank: Add helper to get next vblank time Rob Clark
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-26 23:38 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Christian König, Rob Clark,
	Sumit Semwal, Gustavo Padovan, Christian König,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

From: Rob Clark <robdclark@chromium.org>

Add a way to hint to the fence signaler of an upcoming deadline, such as
vblank, which the fence waiter would prefer not to miss.  This is to aid
the fence signaler in making power management decisions, like boosting
frequency as the deadline approaches and awareness of missing deadlines
so that can be factored in to the frequency scaling.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/dma-buf/dma-fence.c | 39 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-fence.h   | 17 ++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index ce0f5eff575d..2e0d25ab457e 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);
 
+
+/**
+ * dma_fence_set_deadline - set desired fence-wait deadline
+ * @fence:    the fence that is to be waited on
+ * @deadline: the time by which the waiter hopes for the fence to be
+ *            signaled
+ *
+ * Inform the fence signaler of an upcoming deadline, such as vblank, by
+ * which point the waiter would prefer the fence to be signaled by.  This
+ * is intended to give feedback to the fence signaler to aid in power
+ * management decisions, such as boosting GPU frequency if a periodic
+ * vblank deadline is approaching.
+ */
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	unsigned long flags;
+
+	if (dma_fence_is_signaled(fence))
+		return;
+
+	spin_lock_irqsave(fence->lock, flags);
+
+	/* If we already have an earlier deadline, keep it: */
+	if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
+	    ktime_before(fence->deadline, deadline)) {
+		spin_unlock_irqrestore(fence->lock, flags);
+		return;
+	}
+
+	fence->deadline = deadline;
+	set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
+
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	if (fence->ops->set_deadline)
+		fence->ops->set_deadline(fence, deadline);
+}
+EXPORT_SYMBOL(dma_fence_set_deadline);
+
 /**
  * dma_fence_init - Initialize a custom fence.
  * @fence: the fence to initialize
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 6ffb4b2c6371..4e6cfe4e6fbc 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -88,6 +88,7 @@ struct dma_fence {
 		/* @timestamp replaced by @rcu on dma_fence_release() */
 		struct rcu_head rcu;
 	};
+	ktime_t deadline;
 	u64 context;
 	u64 seqno;
 	unsigned long flags;
@@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+	DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
 	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
 };
 
@@ -261,6 +263,19 @@ struct dma_fence_ops {
 	 */
 	void (*timeline_value_str)(struct dma_fence *fence,
 				   char *str, int size);
+
+	/**
+	 * @set_deadline:
+	 *
+	 * Callback to allow a fence waiter to inform the fence signaler of an
+	 * upcoming deadline, such as vblank, by which point the waiter would
+	 * prefer the fence to be signaled by.  This is intended to give feedback
+	 * to the fence signaler to aid in power management decisions, such as
+	 * boosting GPU frequency.
+	 *
+	 * This callback is optional.
+	 */
+	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
 };
 
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
 	return ret < 0 ? ret : 0;
 }
 
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
+
 struct dma_fence *dma_fence_get_stub(void);
 struct dma_fence *dma_fence_allocate_private_stub(void);
 u64 dma_fence_context_alloc(unsigned num);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC 2/4] drm/vblank: Add helper to get next vblank time
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
  2021-07-26 23:38 ` [RFC 1/4] dma-fence: Add deadline awareness Rob Clark
@ 2021-07-26 23:38 ` Rob Clark
  2021-07-26 23:38 ` [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank Rob Clark
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-26 23:38 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Christian König, Rob Clark,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, open list

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/drm_vblank.c | 31 +++++++++++++++++++++++++++++++
 include/drm/drm_vblank.h     |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 3417e1ac7918..88c824c294dc 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -980,6 +980,37 @@ u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
 }
 EXPORT_SYMBOL(drm_crtc_vblank_count_and_time);
 
+/**
+ * drm_crtc_next_vblank_time - calculate the time of the next vblank
+ * @crtc: the crtc for which to calculate next vblank time
+ * @vblanktime: pointer to time to receive the next vblank timestamp.
+ *
+ * Calculate the expected time of the next vblank based on time of previous
+ * vblank and frame duration
+ */
+int drm_crtc_next_vblank_time(struct drm_crtc *crtc, ktime_t *vblanktime)
+{
+	unsigned int pipe = drm_crtc_index(crtc);
+	struct drm_vblank_crtc *vblank = &crtc->dev->vblank[pipe];
+	u64 count;
+
+	if (!vblank->framedur_ns)
+		return -EINVAL;
+
+	count = drm_vblank_count_and_time(crtc->dev, pipe, vblanktime);
+
+	/*
+	 * If we don't get a valid count, then we probably also don't
+	 * have a valid time:
+	 */
+	if (!count)
+		return -EINVAL;
+
+	*vblanktime = ktime_add(*vblanktime, ns_to_ktime(vblank->framedur_ns));
+
+	return 0;
+}
+
 static void send_vblank_event(struct drm_device *dev,
 		struct drm_pending_vblank_event *e,
 		u64 seq, ktime_t now)
diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
index 733a3e2d1d10..a63bc2c92f3c 100644
--- a/include/drm/drm_vblank.h
+++ b/include/drm/drm_vblank.h
@@ -230,6 +230,7 @@ bool drm_dev_has_vblank(const struct drm_device *dev);
 u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
 u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
 				   ktime_t *vblanktime);
+int drm_crtc_next_vblank_time(struct drm_crtc *crtc, ktime_t *vblanktime);
 void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
 			       struct drm_pending_vblank_event *e);
 void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
  2021-07-26 23:38 ` [RFC 1/4] dma-fence: Add deadline awareness Rob Clark
  2021-07-26 23:38 ` [RFC 2/4] drm/vblank: Add helper to get next vblank time Rob Clark
@ 2021-07-26 23:38 ` Rob Clark
  2021-07-27 10:44   ` Michel Dänzer
  2021-07-26 23:38 ` [RFC 4/4] drm/scheduler: Add fence deadline support Rob Clark
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-26 23:38 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Christian König, Rob Clark,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, open list

From: Rob Clark <robdclark@chromium.org>

For an atomic commit updating a single CRTC (ie. a pageflip) calculate
the next vblank time, and inform the fence(s) of that deadline.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/drm_atomic_helper.c | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index bc3487964fb5..f81b20775b15 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1406,6 +1406,40 @@ void drm_atomic_helper_commit_modeset_enables(struct drm_device *dev,
 }
 EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables);
 
+/*
+ * For atomic updates which touch just a single CRTC, calculate the time of the
+ * next vblank, and inform all the fences of the of the deadline.
+ */
+static void set_fence_deadline(struct drm_device *dev,
+			       struct drm_atomic_state *state)
+{
+	struct drm_crtc *crtc, *wait_crtc = NULL;
+	struct drm_crtc_state *new_crtc_state;
+	struct drm_plane *plane;
+	struct drm_plane_state *new_plane_state;
+	ktime_t vbltime;
+	int i;
+
+	for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
+		if (!wait_crtc)
+			return;
+		wait_crtc = crtc;
+	}
+
+	/* If no CRTCs updated, then nothing to do: */
+	if (!wait_crtc)
+		return;
+
+	if (drm_crtc_next_vblank_time(wait_crtc, &vbltime))
+		return;
+
+	for_each_new_plane_in_state (state, plane, new_plane_state, i) {
+		if (!new_plane_state->fence)
+			continue;
+		dma_fence_set_deadline(new_plane_state->fence, vbltime);
+	}
+}
+
 /**
  * drm_atomic_helper_wait_for_fences - wait for fences stashed in plane state
  * @dev: DRM device
@@ -1435,6 +1469,8 @@ int drm_atomic_helper_wait_for_fences(struct drm_device *dev,
 	struct drm_plane_state *new_plane_state;
 	int i, ret;
 
+	set_fence_deadline(dev, state);
+
 	for_each_new_plane_in_state(state, plane, new_plane_state, i) {
 		if (!new_plane_state->fence)
 			continue;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC 4/4] drm/scheduler: Add fence deadline support
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
                   ` (2 preceding siblings ...)
  2021-07-26 23:38 ` [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank Rob Clark
@ 2021-07-26 23:38 ` Rob Clark
  2021-07-26 23:51 ` [RFC 0/4] dma-fence: Deadline awareness Rob Clark
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-26 23:38 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Christian König, Rob Clark,
	David Airlie, Sumit Semwal, Christian König, Alex Deucher,
	Tian Tao, Luben Tuikov, Andrey Grodzovsky, Alyssa Rosenzweig,
	Maarten Lankhorst, Jack Zhang, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

From: Rob Clark <robdclark@chromium.org>

As the finished fence is the one that is exposed to userspace, and
therefore the one that other operations, like atomic update, would
block on, we need to propagate the deadline from from the finished
fence to the actual hw fence.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/scheduler/sched_fence.c | 10 ++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 69de2c76731f..3aa6351d2101 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -128,6 +128,15 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)
 	dma_fence_put(&fence->scheduled);
 }
 
+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+						  ktime_t deadline)
+{
+	struct drm_sched_fence *fence = to_drm_sched_fence(f);
+
+	if (fence->parent)
+		dma_fence_set_deadline(fence->parent, deadline);
+}
+
 static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -138,6 +147,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
 	.release = drm_sched_fence_release_finished,
+	.set_deadline = drm_sched_fence_set_deadline_finished,
 };
 
 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..fcc601962e92 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -818,6 +818,9 @@ static int drm_sched_main(void *param)
 
 		if (!IS_ERR_OR_NULL(fence)) {
 			s_fence->parent = dma_fence_get(fence);
+			if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
+				     &s_fence->finished.flags))
+				dma_fence_set_deadline(fence, s_fence->finished.deadline);
 			r = dma_fence_add_callback(fence, &sched_job->cb,
 						   drm_sched_job_done_cb);
 			if (r == -ENOENT)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
                   ` (3 preceding siblings ...)
  2021-07-26 23:38 ` [RFC 4/4] drm/scheduler: Add fence deadline support Rob Clark
@ 2021-07-26 23:51 ` Rob Clark
  2021-07-27 14:41 ` Michel Dänzer
  2021-07-27 21:17 ` [RFC 5/4] drm/msm: Add deadline based boost support Rob Clark
  6 siblings, 0 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-26 23:51 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Christian König, Rob Clark,
	Alex Deucher, Andrey Grodzovsky, Christian König,
	Gustavo Padovan, Jack Zhang, Lee Jones,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list,
	open list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Maarten Lankhorst, Roy Sun, Tian Tao

On Mon, Jul 26, 2021 at 4:34 PM Rob Clark <robdclark@gmail.com> wrote:
>
> From: Rob Clark <robdclark@chromium.org>
>
> Based on discussion from a previous series[1] to add a "boost" mechanism
> when, for example, vblank deadlines are missed.  Instead of a boost
> callback, this approach adds a way to set a deadline on the fence, by
> which the waiter would like to see the fence signalled.
>
> I've not yet had a chance to re-work the drm/msm part of this, but
> wanted to send this out as an RFC in case I don't have a chance to
> finish the drm/msm part this week.

Fwiw, what I'm thinking for the drm/msm part is a timer set to expire
a bit (couple ms?) before the deadline, which boosts if the timer
expires before the fence is signaled.

Assuming this is roughly in line with what other drivers would do,
possibly there is some room to build this timer into dma-fence itself?

BR,
-R

>
> Original description:
>
> In some cases, like double-buffered rendering, missing vblanks can
> trick the GPU into running at a lower frequence, when really we
> want to be running at a higher frequency to not miss the vblanks
> in the first place.
>
> This is partially inspired by a trick i915 does, but implemented
> via dma-fence for a couple of reasons:
>
> 1) To continue to be able to use the atomic helpers
> 2) To support cases where display and gpu are different drivers
>
> [1] https://patchwork.freedesktop.org/series/90331/
>
> Rob Clark (4):
>   dma-fence: Add deadline awareness
>   drm/vblank: Add helper to get next vblank time
>   drm/atomic-helper: Set fence deadline for vblank
>   drm/scheduler: Add fence deadline support
>
>  drivers/dma-buf/dma-fence.c             | 39 +++++++++++++++++++++++++
>  drivers/gpu/drm/drm_atomic_helper.c     | 36 +++++++++++++++++++++++
>  drivers/gpu/drm/drm_vblank.c            | 31 ++++++++++++++++++++
>  drivers/gpu/drm/scheduler/sched_fence.c | 10 +++++++
>  drivers/gpu/drm/scheduler/sched_main.c  |  3 ++
>  include/drm/drm_vblank.h                |  1 +
>  include/linux/dma-fence.h               | 17 +++++++++++
>  7 files changed, 137 insertions(+)
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-26 23:38 ` [RFC 1/4] dma-fence: Add deadline awareness Rob Clark
@ 2021-07-27  7:11   ` Christian König
  2021-07-27 14:25     ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-27  7:11 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: Daniel Vetter, Matthew Brost, Rob Clark, Sumit Semwal,
	Gustavo Padovan, Christian König,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

Am 27.07.21 um 01:38 schrieb Rob Clark:
> From: Rob Clark <robdclark@chromium.org>
>
> Add a way to hint to the fence signaler of an upcoming deadline, such as
> vblank, which the fence waiter would prefer not to miss.  This is to aid
> the fence signaler in making power management decisions, like boosting
> frequency as the deadline approaches and awareness of missing deadlines
> so that can be factored in to the frequency scaling.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/dma-buf/dma-fence.c | 39 +++++++++++++++++++++++++++++++++++++
>   include/linux/dma-fence.h   | 17 ++++++++++++++++
>   2 files changed, 56 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ce0f5eff575d..2e0d25ab457e 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
>   }
>   EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>   
> +
> +/**
> + * dma_fence_set_deadline - set desired fence-wait deadline
> + * @fence:    the fence that is to be waited on
> + * @deadline: the time by which the waiter hopes for the fence to be
> + *            signaled
> + *
> + * Inform the fence signaler of an upcoming deadline, such as vblank, by
> + * which point the waiter would prefer the fence to be signaled by.  This
> + * is intended to give feedback to the fence signaler to aid in power
> + * management decisions, such as boosting GPU frequency if a periodic
> + * vblank deadline is approaching.
> + */
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> +	unsigned long flags;
> +
> +	if (dma_fence_is_signaled(fence))
> +		return;
> +
> +	spin_lock_irqsave(fence->lock, flags);
> +
> +	/* If we already have an earlier deadline, keep it: */
> +	if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> +	    ktime_before(fence->deadline, deadline)) {
> +		spin_unlock_irqrestore(fence->lock, flags);
> +		return;
> +	}
> +
> +	fence->deadline = deadline;
> +	set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> +
> +	spin_unlock_irqrestore(fence->lock, flags);
> +
> +	if (fence->ops->set_deadline)
> +		fence->ops->set_deadline(fence, deadline);
> +}
> +EXPORT_SYMBOL(dma_fence_set_deadline);
> +
>   /**
>    * dma_fence_init - Initialize a custom fence.
>    * @fence: the fence to initialize
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -88,6 +88,7 @@ struct dma_fence {
>   		/* @timestamp replaced by @rcu on dma_fence_release() */
>   		struct rcu_head rcu;
>   	};
> +	ktime_t deadline;

Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding the 
deadline as extra field here.

We tuned the dma_fence structure intentionally so that it is only 64 bytes.

Regards,
Christian.

>   	u64 context;
>   	u64 seqno;
>   	unsigned long flags;
> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
>   	DMA_FENCE_FLAG_SIGNALED_BIT,
>   	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>   	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +	DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
>   	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>   };
>   
> @@ -261,6 +263,19 @@ struct dma_fence_ops {
>   	 */
>   	void (*timeline_value_str)(struct dma_fence *fence,
>   				   char *str, int size);
> +
> +	/**
> +	 * @set_deadline:
> +	 *
> +	 * Callback to allow a fence waiter to inform the fence signaler of an
> +	 * upcoming deadline, such as vblank, by which point the waiter would
> +	 * prefer the fence to be signaled by.  This is intended to give feedback
> +	 * to the fence signaler to aid in power management decisions, such as
> +	 * boosting GPU frequency.
> +	 *
> +	 * This callback is optional.
> +	 */
> +	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>   };
>   
>   void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
>   	return ret < 0 ? ret : 0;
>   }
>   
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> +
>   struct dma_fence *dma_fence_get_stub(void);
>   struct dma_fence *dma_fence_allocate_private_stub(void);
>   u64 dma_fence_context_alloc(unsigned num);


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank
  2021-07-26 23:38 ` [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank Rob Clark
@ 2021-07-27 10:44   ` Michel Dänzer
  2021-07-27 14:33     ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Michel Dänzer @ 2021-07-27 10:44 UTC (permalink / raw)
  To: Rob Clark
  Cc: Matthew Brost, Rob Clark, Thomas Zimmermann, David Airlie,
	Christian König, open list, dri-devel

On 2021-07-27 1:38 a.m., Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> the next vblank time, and inform the fence(s) of that deadline.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 36 +++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index bc3487964fb5..f81b20775b15 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1406,6 +1406,40 @@ void drm_atomic_helper_commit_modeset_enables(struct drm_device *dev,
>  }
>  EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables);
>  
> +/*
> + * For atomic updates which touch just a single CRTC, calculate the time of the
> + * next vblank, and inform all the fences of the of the deadline.
> + */
> +static void set_fence_deadline(struct drm_device *dev,
> +			       struct drm_atomic_state *state)
> +{
> +	struct drm_crtc *crtc, *wait_crtc = NULL;
> +	struct drm_crtc_state *new_crtc_state;
> +	struct drm_plane *plane;
> +	struct drm_plane_state *new_plane_state;
> +	ktime_t vbltime;
> +	int i;
> +
> +	for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> +		if (!wait_crtc)
> +			return;

Either this return or the next one below would always be taken, I doubt this was intended.


> +		wait_crtc = crtc;
> +	}
> +
> +	/* If no CRTCs updated, then nothing to do: */
> +	if (!wait_crtc)
> +		return;
> +
> +	if (drm_crtc_next_vblank_time(wait_crtc, &vbltime))
> +		return;
> +
> +	for_each_new_plane_in_state (state, plane, new_plane_state, i) {
> +		if (!new_plane_state->fence)
> +			continue;
> +		dma_fence_set_deadline(new_plane_state->fence, vbltime);
> +	}

vblank timestamps correspond to the end of vertical blank, the deadline should be the start of vertical blank though.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-27  7:11   ` Christian König
@ 2021-07-27 14:25     ` Rob Clark
  2021-07-28  7:03       ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-27 14:25 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, Daniel Vetter, Matthew Brost, Rob Clark, Sumit Semwal,
	Gustavo Padovan, Christian König,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

On Tue, Jul 27, 2021 at 12:11 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 27.07.21 um 01:38 schrieb Rob Clark:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > vblank, which the fence waiter would prefer not to miss.  This is to aid
> > the fence signaler in making power management decisions, like boosting
> > frequency as the deadline approaches and awareness of missing deadlines
> > so that can be factored in to the frequency scaling.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/dma-buf/dma-fence.c | 39 +++++++++++++++++++++++++++++++++++++
> >   include/linux/dma-fence.h   | 17 ++++++++++++++++
> >   2 files changed, 56 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index ce0f5eff575d..2e0d25ab457e 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
> >   }
> >   EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> >
> > +
> > +/**
> > + * dma_fence_set_deadline - set desired fence-wait deadline
> > + * @fence:    the fence that is to be waited on
> > + * @deadline: the time by which the waiter hopes for the fence to be
> > + *            signaled
> > + *
> > + * Inform the fence signaler of an upcoming deadline, such as vblank, by
> > + * which point the waiter would prefer the fence to be signaled by.  This
> > + * is intended to give feedback to the fence signaler to aid in power
> > + * management decisions, such as boosting GPU frequency if a periodic
> > + * vblank deadline is approaching.
> > + */
> > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > +{
> > +     unsigned long flags;
> > +
> > +     if (dma_fence_is_signaled(fence))
> > +             return;
> > +
> > +     spin_lock_irqsave(fence->lock, flags);
> > +
> > +     /* If we already have an earlier deadline, keep it: */
> > +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> > +         ktime_before(fence->deadline, deadline)) {
> > +             spin_unlock_irqrestore(fence->lock, flags);
> > +             return;
> > +     }
> > +
> > +     fence->deadline = deadline;
> > +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> > +
> > +     spin_unlock_irqrestore(fence->lock, flags);
> > +
> > +     if (fence->ops->set_deadline)
> > +             fence->ops->set_deadline(fence, deadline);
> > +}
> > +EXPORT_SYMBOL(dma_fence_set_deadline);
> > +
> >   /**
> >    * dma_fence_init - Initialize a custom fence.
> >    * @fence: the fence to initialize
> > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> > --- a/include/linux/dma-fence.h
> > +++ b/include/linux/dma-fence.h
> > @@ -88,6 +88,7 @@ struct dma_fence {
> >               /* @timestamp replaced by @rcu on dma_fence_release() */
> >               struct rcu_head rcu;
> >       };
> > +     ktime_t deadline;
>
> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding the
> deadline as extra field here.
>
> We tuned the dma_fence structure intentionally so that it is only 64 bytes.

Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?

We could push the ktime_t (and timer) down into the derived fence
class, but I think there is going to need to be some extra storage
*somewhere*.. maybe the fence signaler could get away with just
storing the nearest upcoming deadline per fence-context instead?

BR,
-R

> Regards,
> Christian.
>
> >       u64 context;
> >       u64 seqno;
> >       unsigned long flags;
> > @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> >       DMA_FENCE_FLAG_SIGNALED_BIT,
> >       DMA_FENCE_FLAG_TIMESTAMP_BIT,
> >       DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> > +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> >       DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> >   };
> >
> > @@ -261,6 +263,19 @@ struct dma_fence_ops {
> >        */
> >       void (*timeline_value_str)(struct dma_fence *fence,
> >                                  char *str, int size);
> > +
> > +     /**
> > +      * @set_deadline:
> > +      *
> > +      * Callback to allow a fence waiter to inform the fence signaler of an
> > +      * upcoming deadline, such as vblank, by which point the waiter would
> > +      * prefer the fence to be signaled by.  This is intended to give feedback
> > +      * to the fence signaler to aid in power management decisions, such as
> > +      * boosting GPU frequency.
> > +      *
> > +      * This callback is optional.
> > +      */
> > +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> >   };
> >
> >   void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> > @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
> >       return ret < 0 ? ret : 0;
> >   }
> >
> > +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> > +
> >   struct dma_fence *dma_fence_get_stub(void);
> >   struct dma_fence *dma_fence_allocate_private_stub(void);
> >   u64 dma_fence_context_alloc(unsigned num);
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank
  2021-07-27 10:44   ` Michel Dänzer
@ 2021-07-27 14:33     ` Rob Clark
  0 siblings, 0 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-27 14:33 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Rob Clark, Matthew Brost, Thomas Zimmermann, David Airlie,
	Christian König, open list, dri-devel

On Tue, Jul 27, 2021 at 3:44 AM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> > the next vblank time, and inform the fence(s) of that deadline.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 36 +++++++++++++++++++++++++++++
> >  1 file changed, 36 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index bc3487964fb5..f81b20775b15 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1406,6 +1406,40 @@ void drm_atomic_helper_commit_modeset_enables(struct drm_device *dev,
> >  }
> >  EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables);
> >
> > +/*
> > + * For atomic updates which touch just a single CRTC, calculate the time of the
> > + * next vblank, and inform all the fences of the of the deadline.
> > + */
> > +static void set_fence_deadline(struct drm_device *dev,
> > +                            struct drm_atomic_state *state)
> > +{
> > +     struct drm_crtc *crtc, *wait_crtc = NULL;
> > +     struct drm_crtc_state *new_crtc_state;
> > +     struct drm_plane *plane;
> > +     struct drm_plane_state *new_plane_state;
> > +     ktime_t vbltime;
> > +     int i;
> > +
> > +     for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> > +             if (!wait_crtc)
> > +                     return;
>
> Either this return or the next one below would always be taken, I doubt this was intended.

oops, the condition here is mistakenly inverted, it was meant to bail
if there is more than a single CRTC

>
> > +             wait_crtc = crtc;
> > +     }
> > +
> > +     /* If no CRTCs updated, then nothing to do: */
> > +     if (!wait_crtc)
> > +             return;
> > +
> > +     if (drm_crtc_next_vblank_time(wait_crtc, &vbltime))
> > +             return;
> > +
> > +     for_each_new_plane_in_state (state, plane, new_plane_state, i) {
> > +             if (!new_plane_state->fence)
> > +                     continue;
> > +             dma_fence_set_deadline(new_plane_state->fence, vbltime);
> > +     }
>
> vblank timestamps correspond to the end of vertical blank, the deadline should be the start of vertical blank though.
>

hmm, I suppose this depends on whether the hw actually has separate
irq's for frame-done and vblank (and whether the driver
differentiates).. and if the display controller is doing some
buffering, the point at which it wants to flip could be a bit earlier
still.  Maybe we just want a kms driver provided offset for how early
it wants the deadline relative to vblank?

BR,
-R

>
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
                   ` (4 preceding siblings ...)
  2021-07-26 23:51 ` [RFC 0/4] dma-fence: Deadline awareness Rob Clark
@ 2021-07-27 14:41 ` Michel Dänzer
  2021-07-27 15:12   ` Rob Clark
  2021-07-27 21:17 ` [RFC 5/4] drm/msm: Add deadline based boost support Rob Clark
  6 siblings, 1 reply; 52+ messages in thread
From: Michel Dänzer @ 2021-07-27 14:41 UTC (permalink / raw)
  To: Rob Clark
  Cc: Matthew Brost, Rob Clark, Jack Zhang, Roy Sun,
	Christian König, open list,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK,
	dri-devel

On 2021-07-27 1:38 a.m., Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Based on discussion from a previous series[1] to add a "boost" mechanism
> when, for example, vblank deadlines are missed.  Instead of a boost
> callback, this approach adds a way to set a deadline on the fence, by
> which the waiter would like to see the fence signalled.
> 
> I've not yet had a chance to re-work the drm/msm part of this, but
> wanted to send this out as an RFC in case I don't have a chance to
> finish the drm/msm part this week.
> 
> Original description:
> 
> In some cases, like double-buffered rendering, missing vblanks can
> trick the GPU into running at a lower frequence, when really we
> want to be running at a higher frequency to not miss the vblanks
> in the first place.
> 
> This is partially inspired by a trick i915 does, but implemented
> via dma-fence for a couple of reasons:
> 
> 1) To continue to be able to use the atomic helpers
> 2) To support cases where display and gpu are different drivers
> 
> [1] https://patchwork.freedesktop.org/series/90331/

Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-27 14:41 ` Michel Dänzer
@ 2021-07-27 15:12   ` Rob Clark
  2021-07-27 15:19     ` Michel Dänzer
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-27 15:12 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Matthew Brost, Rob Clark, Jack Zhang, Roy Sun,
	Christian König, open list,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK,
	dri-devel

On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Based on discussion from a previous series[1] to add a "boost" mechanism
> > when, for example, vblank deadlines are missed.  Instead of a boost
> > callback, this approach adds a way to set a deadline on the fence, by
> > which the waiter would like to see the fence signalled.
> >
> > I've not yet had a chance to re-work the drm/msm part of this, but
> > wanted to send this out as an RFC in case I don't have a chance to
> > finish the drm/msm part this week.
> >
> > Original description:
> >
> > In some cases, like double-buffered rendering, missing vblanks can
> > trick the GPU into running at a lower frequence, when really we
> > want to be running at a higher frequency to not miss the vblanks
> > in the first place.
> >
> > This is partially inspired by a trick i915 does, but implemented
> > via dma-fence for a couple of reasons:
> >
> > 1) To continue to be able to use the atomic helpers
> > 2) To support cases where display and gpu are different drivers
> >
> > [1] https://patchwork.freedesktop.org/series/90331/
>
> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>

I guess you mean "no effect at all *except* for fullscreen..."?  Games
are usually running fullscreen, it is a case I care about a lot ;-)

I'd perhaps recommend that wayland compositors, in cases where only a
single layer is changing, not try to be clever and just push the
update down to the kernel.

BR,
-R

>
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-27 15:12   ` Rob Clark
@ 2021-07-27 15:19     ` Michel Dänzer
  2021-07-27 15:37       ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Michel Dänzer @ 2021-07-27 15:19 UTC (permalink / raw)
  To: Rob Clark
  Cc: Matthew Brost, Rob Clark, Jack Zhang, Christian König,
	open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On 2021-07-27 5:12 p.m., Rob Clark wrote:
> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>
>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>> callback, this approach adds a way to set a deadline on the fence, by
>>> which the waiter would like to see the fence signalled.
>>>
>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>> wanted to send this out as an RFC in case I don't have a chance to
>>> finish the drm/msm part this week.
>>>
>>> Original description:
>>>
>>> In some cases, like double-buffered rendering, missing vblanks can
>>> trick the GPU into running at a lower frequence, when really we
>>> want to be running at a higher frequency to not miss the vblanks
>>> in the first place.
>>>
>>> This is partially inspired by a trick i915 does, but implemented
>>> via dma-fence for a couple of reasons:
>>>
>>> 1) To continue to be able to use the atomic helpers
>>> 2) To support cases where display and gpu are different drivers
>>>
>>> [1] https://patchwork.freedesktop.org/series/90331/
>>
>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>
> 
> I guess you mean "no effect at all *except* for fullscreen..."?

I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.


> I'd perhaps recommend that wayland compositors, in cases where only a
> single layer is changing, not try to be clever and just push the
> update down to the kernel.

Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.

For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-27 15:19     ` Michel Dänzer
@ 2021-07-27 15:37       ` Rob Clark
  2021-07-28 11:36         ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-27 15:37 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Rob Clark, Matthew Brost, Jack Zhang, Christian König,
	open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2021-07-27 5:12 p.m., Rob Clark wrote:
> > On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
> >>
> >> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Based on discussion from a previous series[1] to add a "boost" mechanism
> >>> when, for example, vblank deadlines are missed.  Instead of a boost
> >>> callback, this approach adds a way to set a deadline on the fence, by
> >>> which the waiter would like to see the fence signalled.
> >>>
> >>> I've not yet had a chance to re-work the drm/msm part of this, but
> >>> wanted to send this out as an RFC in case I don't have a chance to
> >>> finish the drm/msm part this week.
> >>>
> >>> Original description:
> >>>
> >>> In some cases, like double-buffered rendering, missing vblanks can
> >>> trick the GPU into running at a lower frequence, when really we
> >>> want to be running at a higher frequency to not miss the vblanks
> >>> in the first place.
> >>>
> >>> This is partially inspired by a trick i915 does, but implemented
> >>> via dma-fence for a couple of reasons:
> >>>
> >>> 1) To continue to be able to use the atomic helpers
> >>> 2) To support cases where display and gpu are different drivers
> >>>
> >>> [1] https://patchwork.freedesktop.org/series/90331/
> >>
> >> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
> >>
> >
> > I guess you mean "no effect at all *except* for fullscreen..."?
>
> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>
>
> > I'd perhaps recommend that wayland compositors, in cases where only a
> > single layer is changing, not try to be clever and just push the
> > update down to the kernel.
>
> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>
> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>

Well, in the end, there is more than one compositor out there.. and if
some wayland compositors are going this route, they can also implement
the same mechanism in userspace using the sysfs that devfreq exports.

But it sounds simpler to me for the compositor to have a sort of "game
mode" for fullscreen games.. I'm less worried about UI interactive
workloads, boosting the GPU freq upon sudden activity after a period
of inactivity seems to work reasonably well there.

BR,
-R

>
> --
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC 5/4] drm/msm: Add deadline based boost support
  2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
                   ` (5 preceding siblings ...)
  2021-07-27 14:41 ` Michel Dänzer
@ 2021-07-27 21:17 ` Rob Clark
  6 siblings, 0 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-27 21:17 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Matthew Brost, Michel Dänzer,
	Christian König, Rob Clark, Rob Clark, Sean Paul,
	David Airlie, Sumit Semwal, Christian König,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
This is a quick implementation of what I had in mind for driver side
of deadline boost.  For a couple games with bad gpu devfreq behavior
this boosts "Render quality" from ~35% to ~95%.  (The "Render quality"
metric in chrome://arc-overview-tracing is basically a measure of the
deviation in frame/commit time, so 100% would be a consistent fps
with no variantion.)  Not quite 100%, this is still a bit of a re-
active mechanism.

A similar result can be had by tuning devfreq to boost to max OPP at
a much lower threshold of busyness.  With the obvious downside that
you spend a lot of time running the GPU much faster than needed.

 drivers/gpu/drm/msm/msm_fence.c       | 76 +++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h       | 20 +++++++
 drivers/gpu/drm/msm/msm_gpu.h         |  1 +
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++++++
 4 files changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index f2cece542c3f..67c2a96e1c85 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -8,6 +8,37 @@
 
 #include "msm_drv.h"
 #include "msm_fence.h"
+#include "msm_gpu.h"
+
+static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence);
+
+static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
+{
+	struct msm_drm_private *priv = fctx->dev->dev_private;
+	return priv->gpu;
+}
+
+static enum hrtimer_restart deadline_timer(struct hrtimer *t)
+{
+	struct msm_fence_context *fctx = container_of(t,
+			struct msm_fence_context, deadline_timer);
+
+	kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
+
+	return HRTIMER_NORESTART;
+}
+
+static void deadline_work(struct kthread_work *work)
+{
+	struct msm_fence_context *fctx = container_of(work,
+			struct msm_fence_context, deadline_work);
+
+	/* If deadline fence has already passed, nothing to do: */
+	if (fence_completed(fctx, fctx->next_deadline_fence))
+		return;
+
+	msm_devfreq_boost(fctx2gpu(fctx), 2);
+}
 
 
 struct msm_fence_context *
@@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
 	fctx->fenceptr = fenceptr;
 	spin_lock_init(&fctx->spinlock);
 
+	hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	fctx->deadline_timer.function = deadline_timer;
+
+	kthread_init_work(&fctx->deadline_work, deadline_work);
+
+	fctx->next_deadline = ktime_get();
+
 	return fctx;
 }
 
@@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
 {
 	spin_lock(&fctx->spinlock);
 	fctx->completed_fence = max(fence, fctx->completed_fence);
+	if (fence_completed(fctx, fctx->next_deadline_fence))
+		hrtimer_cancel(&fctx->deadline_timer);
 	spin_unlock(&fctx->spinlock);
 }
 
@@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
 	return fence_completed(f->fctx, f->base.seqno);
 }
 
+static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct msm_fence *f = to_msm_fence(fence);
+	struct msm_fence_context *fctx = f->fctx;
+	unsigned long flags;
+	ktime_t now;
+
+	spin_lock_irqsave(&fctx->spinlock, flags);
+	now = ktime_get();
+
+	if (ktime_after(now, fctx->next_deadline) ||
+			ktime_before(deadline, fctx->next_deadline)) {
+		fctx->next_deadline = deadline;
+		fctx->next_deadline_fence =
+			max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
+
+		/*
+		 * Set timer to trigger boost 3ms before deadline, or
+		 * if we are already less than 3ms before the deadline
+		 * schedule boost work immediately.
+		 */
+		deadline = ktime_sub(deadline, ms_to_ktime(3));
+
+		if (ktime_after(now, deadline)) {
+			kthread_queue_work(fctx2gpu(fctx)->worker,
+					&fctx->deadline_work);
+		} else {
+			hrtimer_start(&fctx->deadline_timer, deadline,
+					HRTIMER_MODE_ABS);
+		}
+	}
+
+	spin_unlock_irqrestore(&fctx->spinlock, flags);
+}
+
 static const struct dma_fence_ops msm_fence_ops = {
 	.get_driver_name = msm_fence_get_driver_name,
 	.get_timeline_name = msm_fence_get_timeline_name,
 	.signaled = msm_fence_signaled,
+	.set_deadline = msm_fence_set_deadline,
 };
 
 struct dma_fence *
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 4783db528bcc..d34e853c555a 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -50,6 +50,26 @@ struct msm_fence_context {
 	volatile uint32_t *fenceptr;
 
 	spinlock_t spinlock;
+
+	/*
+	 * TODO this doesn't really deal with multiple deadlines, like
+	 * if userspace got multiple frames ahead.. OTOH atomic updates
+	 * don't queue, so maybe that is ok
+	 */
+
+	/** next_deadline: Time of next deadline */
+	ktime_t next_deadline;
+
+	/**
+	 * next_deadline_fence:
+	 *
+	 * Fence value for next pending deadline.  The deadline timer is
+	 * canceled when this fence is signaled.
+	 */
+	uint32_t next_deadline_fence;
+
+	struct hrtimer deadline_timer;
+	struct kthread_work deadline_work;
 };
 
 struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 0e4b45bff2e6..e031c9b495ed 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -425,6 +425,7 @@ void msm_devfreq_init(struct msm_gpu *gpu);
 void msm_devfreq_cleanup(struct msm_gpu *gpu);
 void msm_devfreq_resume(struct msm_gpu *gpu);
 void msm_devfreq_suspend(struct msm_gpu *gpu);
+void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor);
 void msm_devfreq_active(struct msm_gpu *gpu);
 void msm_devfreq_idle(struct msm_gpu *gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 0a1ee20296a2..8a8d7b9028a3 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -144,6 +144,26 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
 	devfreq_suspend_device(gpu->devfreq.devfreq);
 }
 
+void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor)
+{
+	struct msm_gpu_devfreq *df = &gpu->devfreq;
+	unsigned long freq;
+
+	/*
+	 * Hold devfreq lock to synchronize with get_dev_status()/
+	 * target() callbacks
+	 */
+	mutex_lock(&df->devfreq->lock);
+
+	freq = get_freq(gpu);
+
+	freq *= factor;
+
+	msm_devfreq_target(&gpu->pdev->dev, &freq, 0);
+
+	mutex_unlock(&df->devfreq->lock);
+}
+
 void msm_devfreq_active(struct msm_gpu *gpu)
 {
 	struct msm_gpu_devfreq *df = &gpu->devfreq;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-27 14:25     ` Rob Clark
@ 2021-07-28  7:03       ` Christian König
  2021-07-28 11:37         ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-28  7:03 UTC (permalink / raw)
  To: Rob Clark, Christian König
  Cc: dri-devel, Daniel Vetter, Matthew Brost, Rob Clark, Sumit Semwal,
	Gustavo Padovan, open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

Am 27.07.21 um 16:25 schrieb Rob Clark:
> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 27.07.21 um 01:38 schrieb Rob Clark:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Add a way to hint to the fence signaler of an upcoming deadline, such as
>>> vblank, which the fence waiter would prefer not to miss.  This is to aid
>>> the fence signaler in making power management decisions, like boosting
>>> frequency as the deadline approaches and awareness of missing deadlines
>>> so that can be factored in to the frequency scaling.
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>>    drivers/dma-buf/dma-fence.c | 39 +++++++++++++++++++++++++++++++++++++
>>>    include/linux/dma-fence.h   | 17 ++++++++++++++++
>>>    2 files changed, 56 insertions(+)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index ce0f5eff575d..2e0d25ab457e 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
>>>    }
>>>    EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>>>
>>> +
>>> +/**
>>> + * dma_fence_set_deadline - set desired fence-wait deadline
>>> + * @fence:    the fence that is to be waited on
>>> + * @deadline: the time by which the waiter hopes for the fence to be
>>> + *            signaled
>>> + *
>>> + * Inform the fence signaler of an upcoming deadline, such as vblank, by
>>> + * which point the waiter would prefer the fence to be signaled by.  This
>>> + * is intended to give feedback to the fence signaler to aid in power
>>> + * management decisions, such as boosting GPU frequency if a periodic
>>> + * vblank deadline is approaching.
>>> + */
>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>> +{
>>> +     unsigned long flags;
>>> +
>>> +     if (dma_fence_is_signaled(fence))
>>> +             return;
>>> +
>>> +     spin_lock_irqsave(fence->lock, flags);
>>> +
>>> +     /* If we already have an earlier deadline, keep it: */
>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
>>> +         ktime_before(fence->deadline, deadline)) {
>>> +             spin_unlock_irqrestore(fence->lock, flags);
>>> +             return;
>>> +     }
>>> +
>>> +     fence->deadline = deadline;
>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
>>> +
>>> +     spin_unlock_irqrestore(fence->lock, flags);
>>> +
>>> +     if (fence->ops->set_deadline)
>>> +             fence->ops->set_deadline(fence, deadline);
>>> +}
>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
>>> +
>>>    /**
>>>     * dma_fence_init - Initialize a custom fence.
>>>     * @fence: the fence to initialize
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -88,6 +88,7 @@ struct dma_fence {
>>>                /* @timestamp replaced by @rcu on dma_fence_release() */
>>>                struct rcu_head rcu;
>>>        };
>>> +     ktime_t deadline;
>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding the
>> deadline as extra field here.
>>
>> We tuned the dma_fence structure intentionally so that it is only 64 bytes.
> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
>
> We could push the ktime_t (and timer) down into the derived fence
> class, but I think there is going to need to be some extra storage
> *somewhere*.. maybe the fence signaler could get away with just
> storing the nearest upcoming deadline per fence-context instead?

I would just push that into the driver instead.

You most likely don't want the deadline per fence anyway in complex 
scenarios, but rather per frame. And a frame is usually composed from 
multiple fences.

Regards,
Christian.

>
> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>>        u64 context;
>>>        u64 seqno;
>>>        unsigned long flags;
>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
>>>        DMA_FENCE_FLAG_SIGNALED_BIT,
>>>        DMA_FENCE_FLAG_TIMESTAMP_BIT,
>>>        DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
>>>        DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>>>    };
>>>
>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
>>>         */
>>>        void (*timeline_value_str)(struct dma_fence *fence,
>>>                                   char *str, int size);
>>> +
>>> +     /**
>>> +      * @set_deadline:
>>> +      *
>>> +      * Callback to allow a fence waiter to inform the fence signaler of an
>>> +      * upcoming deadline, such as vblank, by which point the waiter would
>>> +      * prefer the fence to be signaled by.  This is intended to give feedback
>>> +      * to the fence signaler to aid in power management decisions, such as
>>> +      * boosting GPU frequency.
>>> +      *
>>> +      * This callback is optional.
>>> +      */
>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>>>    };
>>>
>>>    void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
>>>        return ret < 0 ? ret : 0;
>>>    }
>>>
>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
>>> +
>>>    struct dma_fence *dma_fence_get_stub(void);
>>>    struct dma_fence *dma_fence_allocate_private_stub(void);
>>>    u64 dma_fence_context_alloc(unsigned num);


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-27 15:37       ` Rob Clark
@ 2021-07-28 11:36         ` Christian König
  2021-07-28 13:08           ` Michel Dänzer
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-28 11:36 UTC (permalink / raw)
  To: Rob Clark, Michel Dänzer
  Cc: Rob Clark, Matthew Brost, Jack Zhang, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

Am 27.07.21 um 17:37 schrieb Rob Clark:
> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>
>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>> which the waiter would like to see the fence signalled.
>>>>>
>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>> finish the drm/msm part this week.
>>>>>
>>>>> Original description:
>>>>>
>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>> trick the GPU into running at a lower frequence, when really we
>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>> in the first place.
>>>>>
>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>> via dma-fence for a couple of reasons:
>>>>>
>>>>> 1) To continue to be able to use the atomic helpers
>>>>> 2) To support cases where display and gpu are different drivers
>>>>>
>>>>> [1] https://patchwork.freedesktop.org/series/90331/
>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>>>
>>> I guess you mean "no effect at all *except* for fullscreen..."?
>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>>
>>
>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>> single layer is changing, not try to be clever and just push the
>>> update down to the kernel.
>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>>
>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>>
> Well, in the end, there is more than one compositor out there.. and if
> some wayland compositors are going this route, they can also implement
> the same mechanism in userspace using the sysfs that devfreq exports.
>
> But it sounds simpler to me for the compositor to have a sort of "game
> mode" for fullscreen games.. I'm less worried about UI interactive
> workloads, boosting the GPU freq upon sudden activity after a period
> of inactivity seems to work reasonably well there.

At least AMD hardware is already capable of flipping frames on GPU 
events like finishing rendering (or uploading etc).

By waiting in userspace on the CPU before send the frame to the hardware 
you are completely killing of such features.

For composing use cases that makes sense, but certainly not for full 
screen applications as far as I can see.

Regards,
Christian.

>
> BR,
> -R
>
>> --
>> Earthling Michel Dänzer               |               https://redhat.com
>> Libre software enthusiast             |             Mesa and X developer


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-28  7:03       ` Christian König
@ 2021-07-28 11:37         ` Christian König
  2021-07-28 15:15           ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-28 11:37 UTC (permalink / raw)
  To: Christian König, Rob Clark
  Cc: dri-devel, Daniel Vetter, Matthew Brost, Rob Clark, Sumit Semwal,
	Gustavo Padovan, open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

Am 28.07.21 um 09:03 schrieb Christian König:
> Am 27.07.21 um 16:25 schrieb Rob Clark:
>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
>>>> From: Rob Clark <robdclark@chromium.org>
>>>>
>>>> Add a way to hint to the fence signaler of an upcoming deadline, 
>>>> such as
>>>> vblank, which the fence waiter would prefer not to miss. This is to 
>>>> aid
>>>> the fence signaler in making power management decisions, like boosting
>>>> frequency as the deadline approaches and awareness of missing 
>>>> deadlines
>>>> so that can be factored in to the frequency scaling.
>>>>
>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>> ---
>>>>    drivers/dma-buf/dma-fence.c | 39 
>>>> +++++++++++++++++++++++++++++++++++++
>>>>    include/linux/dma-fence.h   | 17 ++++++++++++++++
>>>>    2 files changed, 56 insertions(+)
>>>>
>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>> index ce0f5eff575d..2e0d25ab457e 100644
>>>> --- a/drivers/dma-buf/dma-fence.c
>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence 
>>>> **fences, uint32_t count,
>>>>    }
>>>>    EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>>>>
>>>> +
>>>> +/**
>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
>>>> + * @fence:    the fence that is to be waited on
>>>> + * @deadline: the time by which the waiter hopes for the fence to be
>>>> + *            signaled
>>>> + *
>>>> + * Inform the fence signaler of an upcoming deadline, such as 
>>>> vblank, by
>>>> + * which point the waiter would prefer the fence to be signaled 
>>>> by.  This
>>>> + * is intended to give feedback to the fence signaler to aid in power
>>>> + * management decisions, such as boosting GPU frequency if a periodic
>>>> + * vblank deadline is approaching.
>>>> + */
>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t 
>>>> deadline)
>>>> +{
>>>> +     unsigned long flags;
>>>> +
>>>> +     if (dma_fence_is_signaled(fence))
>>>> +             return;
>>>> +
>>>> +     spin_lock_irqsave(fence->lock, flags);
>>>> +
>>>> +     /* If we already have an earlier deadline, keep it: */
>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
>>>> +         ktime_before(fence->deadline, deadline)) {
>>>> +             spin_unlock_irqrestore(fence->lock, flags);
>>>> +             return;
>>>> +     }
>>>> +
>>>> +     fence->deadline = deadline;
>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
>>>> +
>>>> +     spin_unlock_irqrestore(fence->lock, flags);
>>>> +
>>>> +     if (fence->ops->set_deadline)
>>>> +             fence->ops->set_deadline(fence, deadline);
>>>> +}
>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
>>>> +
>>>>    /**
>>>>     * dma_fence_init - Initialize a custom fence.
>>>>     * @fence: the fence to initialize
>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
>>>> --- a/include/linux/dma-fence.h
>>>> +++ b/include/linux/dma-fence.h
>>>> @@ -88,6 +88,7 @@ struct dma_fence {
>>>>                /* @timestamp replaced by @rcu on 
>>>> dma_fence_release() */
>>>>                struct rcu_head rcu;
>>>>        };
>>>> +     ktime_t deadline;
>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding 
>>> the
>>> deadline as extra field here.
>>>
>>> We tuned the dma_fence structure intentionally so that it is only 64 
>>> bytes.
>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
>>
>> We could push the ktime_t (and timer) down into the derived fence
>> class, but I think there is going to need to be some extra storage
>> *somewhere*.. maybe the fence signaler could get away with just
>> storing the nearest upcoming deadline per fence-context instead?
>
> I would just push that into the driver instead.
>
> You most likely don't want the deadline per fence anyway in complex 
> scenarios, but rather per frame. And a frame is usually composed from 
> multiple fences.

Thinking more about it we could probably kill the spinlock pointer and 
make the flags 32bit if we absolutely need that here.

But I still don't see the need for that, especially since most drivers 
probably won't implement it.

Regards,
Christian.

>
> Regards,
> Christian.
>
>>
>> BR,
>> -R
>>
>>> Regards,
>>> Christian.
>>>
>>>>        u64 context;
>>>>        u64 seqno;
>>>>        unsigned long flags;
>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
>>>>        DMA_FENCE_FLAG_SIGNALED_BIT,
>>>>        DMA_FENCE_FLAG_TIMESTAMP_BIT,
>>>>        DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
>>>>        DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>>>>    };
>>>>
>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
>>>>         */
>>>>        void (*timeline_value_str)(struct dma_fence *fence,
>>>>                                   char *str, int size);
>>>> +
>>>> +     /**
>>>> +      * @set_deadline:
>>>> +      *
>>>> +      * Callback to allow a fence waiter to inform the fence 
>>>> signaler of an
>>>> +      * upcoming deadline, such as vblank, by which point the 
>>>> waiter would
>>>> +      * prefer the fence to be signaled by.  This is intended to 
>>>> give feedback
>>>> +      * to the fence signaler to aid in power management 
>>>> decisions, such as
>>>> +      * boosting GPU frequency.
>>>> +      *
>>>> +      * This callback is optional.
>>>> +      */
>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>>>>    };
>>>>
>>>>    void dma_fence_init(struct dma_fence *fence, const struct 
>>>> dma_fence_ops *ops,
>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct 
>>>> dma_fence *fence, bool intr)
>>>>        return ret < 0 ? ret : 0;
>>>>    }
>>>>
>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t 
>>>> deadline);
>>>> +
>>>>    struct dma_fence *dma_fence_get_stub(void);
>>>>    struct dma_fence *dma_fence_allocate_private_stub(void);
>>>>    u64 dma_fence_context_alloc(unsigned num);
>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 11:36         ` Christian König
@ 2021-07-28 13:08           ` Michel Dänzer
  2021-07-28 13:13             ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Michel Dänzer @ 2021-07-28 13:08 UTC (permalink / raw)
  To: Christian König, Rob Clark
  Cc: Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones, Christian König,
	open list:DMA BUFFER SHARING FRAMEWORK

On 2021-07-28 1:36 p.m., Christian König wrote:
> Am 27.07.21 um 17:37 schrieb Rob Clark:
>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>
>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>>> which the waiter would like to see the fence signalled.
>>>>>>
>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>> finish the drm/msm part this week.
>>>>>>
>>>>>> Original description:
>>>>>>
>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>> trick the GPU into running at a lower frequence, when really we
>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>> in the first place.
>>>>>>
>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>> via dma-fence for a couple of reasons:
>>>>>>
>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>
>>>>>> [1] https://patchwork.freedesktop.org/series/90331/
>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>>>>
>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>>>
>>>
>>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>>> single layer is changing, not try to be clever and just push the
>>>> update down to the kernel.
>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>>>
>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>>>
>> Well, in the end, there is more than one compositor out there.. and if
>> some wayland compositors are going this route, they can also implement
>> the same mechanism in userspace using the sysfs that devfreq exports.
>>
>> But it sounds simpler to me for the compositor to have a sort of "game
>> mode" for fullscreen games.. I'm less worried about UI interactive
>> workloads, boosting the GPU freq upon sudden activity after a period
>> of inactivity seems to work reasonably well there.
> 
> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> 
> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> 
> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.

Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.

(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:08           ` Michel Dänzer
@ 2021-07-28 13:13             ` Christian König
  2021-07-28 13:24               ` Michel Dänzer
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-28 13:13 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Rob Clark
  Cc: Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 28.07.21 um 15:08 schrieb Michel Dänzer:
> On 2021-07-28 1:36 p.m., Christian König wrote:
>> Am 27.07.21 um 17:37 schrieb Rob Clark:
>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>
>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>>>> which the waiter would like to see the fence signalled.
>>>>>>>
>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>>> finish the drm/msm part this week.
>>>>>>>
>>>>>>> Original description:
>>>>>>>
>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>>> trick the GPU into running at a lower frequence, when really we
>>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>>> in the first place.
>>>>>>>
>>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>>> via dma-fence for a couple of reasons:
>>>>>>>
>>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>>
>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eYaSOSS5wOngNAd9wufp5eWCx5GtAwo6GkultJgrjmA%3D&amp;reserved=0
>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1ZkOzLqbiKSyCixGZ0u7Hd%2Fc1YnUZub%2F%2Fx7RuEclFKg%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>>>>>
>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>>>>
>>>>
>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>>>> single layer is changing, not try to be clever and just push the
>>>>> update down to the kernel.
>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>>>>
>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>>>>
>>> Well, in the end, there is more than one compositor out there.. and if
>>> some wayland compositors are going this route, they can also implement
>>> the same mechanism in userspace using the sysfs that devfreq exports.
>>>
>>> But it sounds simpler to me for the compositor to have a sort of "game
>>> mode" for fullscreen games.. I'm less worried about UI interactive
>>> workloads, boosting the GPU freq upon sudden activity after a period
>>> of inactivity seems to work reasonably well there.
>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>
>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>
>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>
> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)

Well then let's extend the KMS API instead of hacking together 
workarounds in userspace.

Making such decisions is the responsibility of the kernel and not 
userspace in my opinion.

E.g. we could for example also need to reshuffle BOs so that a BO is 
even scanout able. Userspace can't know about such stuff before hand 
because the memory usage can change at any time.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:13             ` Christian König
@ 2021-07-28 13:24               ` Michel Dänzer
  2021-07-28 13:31                 ` Christian König
  2021-07-28 15:34                 ` Rob Clark
  0 siblings, 2 replies; 52+ messages in thread
From: Michel Dänzer @ 2021-07-28 13:24 UTC (permalink / raw)
  To: Christian König, Christian König, Rob Clark
  Cc: Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

On 2021-07-28 3:13 p.m., Christian König wrote:
> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>> Am 27.07.21 um 17:37 schrieb Rob Clark:
>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>
>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>>>>> which the waiter would like to see the fence signalled.
>>>>>>>>
>>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>>>> finish the drm/msm part this week.
>>>>>>>>
>>>>>>>> Original description:
>>>>>>>>
>>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>>>> trick the GPU into running at a lower frequence, when really we
>>>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>>>> in the first place.
>>>>>>>>
>>>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>>>> via dma-fence for a couple of reasons:
>>>>>>>>
>>>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>>>
>>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eYaSOSS5wOngNAd9wufp5eWCx5GtAwo6GkultJgrjmA%3D&amp;reserved=0
>>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1ZkOzLqbiKSyCixGZ0u7Hd%2Fc1YnUZub%2F%2Fx7RuEclFKg%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>>>>>>
>>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>>>>>
>>>>>
>>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>>>>> single layer is changing, not try to be clever and just push the
>>>>>> update down to the kernel.
>>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>>>>>
>>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>>>>>
>>>> Well, in the end, there is more than one compositor out there.. and if
>>>> some wayland compositors are going this route, they can also implement
>>>> the same mechanism in userspace using the sysfs that devfreq exports.
>>>>
>>>> But it sounds simpler to me for the compositor to have a sort of "game
>>>> mode" for fullscreen games.. I'm less worried about UI interactive
>>>> workloads, boosting the GPU freq upon sudden activity after a period
>>>> of inactivity seems to work reasonably well there.
>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>
>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>
>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>
>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
> 
> Well then let's extend the KMS API instead of hacking together workarounds in userspace.

That's indeed a possible solution for the fullscreen / direct scanout case.

Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:24               ` Michel Dänzer
@ 2021-07-28 13:31                 ` Christian König
  2021-07-28 13:57                   ` Pekka Paalanen
  2021-07-28 15:34                 ` Rob Clark
  1 sibling, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-28 13:31 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Rob Clark
  Cc: Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 28.07.21 um 15:24 schrieb Michel Dänzer:
> On 2021-07-28 3:13 p.m., Christian König wrote:
>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:
>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>>
>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>>>>>> which the waiter would like to see the fence signalled.
>>>>>>>>>
>>>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>>>>> finish the drm/msm part this week.
>>>>>>>>>
>>>>>>>>> Original description:
>>>>>>>>>
>>>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>>>>> trick the GPU into running at a lower frequence, when really we
>>>>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>>>>> in the first place.
>>>>>>>>>
>>>>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>>>>> via dma-fence for a couple of reasons:
>>>>>>>>>
>>>>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>>>>
>>>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Ccd365bc397c247bb96b108d951cb0686%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630754720055928%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1s0obbeqH%2FoiVh1JRfNaZIqPVK3EbKB0OP9zZ%2BAz874%3D&amp;reserved=0
>>>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Ccd365bc397c247bb96b108d951cb0686%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630754720065922%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6aTvBugWzfyZ13NLRYW3Qh9t2%2BDbmKpC1390m07BxV0%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>>>>>>>
>>>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>>>>>>
>>>>>>
>>>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>>>>>> single layer is changing, not try to be clever and just push the
>>>>>>> update down to the kernel.
>>>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>>>>>>
>>>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>>>>>>
>>>>> Well, in the end, there is more than one compositor out there.. and if
>>>>> some wayland compositors are going this route, they can also implement
>>>>> the same mechanism in userspace using the sysfs that devfreq exports.
>>>>>
>>>>> But it sounds simpler to me for the compositor to have a sort of "game
>>>>> mode" for fullscreen games.. I'm less worried about UI interactive
>>>>> workloads, boosting the GPU freq upon sudden activity after a period
>>>>> of inactivity seems to work reasonably well there.
>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>>
>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>>
>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>>
>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
> That's indeed a possible solution for the fullscreen / direct scanout case.
>
> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.

Yeah, that's true as well.

At least as long as nobody invents a mechanism to do this decision on 
the GPU instead.

Christian.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:31                 ` Christian König
@ 2021-07-28 13:57                   ` Pekka Paalanen
  2021-07-28 14:30                     ` Christian König
  2021-07-28 15:27                     ` Rob Clark
  0 siblings, 2 replies; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-28 13:57 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, Christian König, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 2090 bytes --]

On Wed, 28 Jul 2021 15:31:41 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 28.07.21 um 15:24 schrieb Michel Dänzer:
> > On 2021-07-28 3:13 p.m., Christian König wrote:  
> >> Am 28.07.21 um 15:08 schrieb Michel Dänzer:  
> >>> On 2021-07-28 1:36 p.m., Christian König wrote:  

> >>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> >>>>
> >>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> >>>>
> >>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.  
> >>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> >>>
> >>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)  
> >> Well then let's extend the KMS API instead of hacking together workarounds in userspace.  
> > That's indeed a possible solution for the fullscreen / direct scanout case.
> >
> > Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.  
> 
> Yeah, that's true as well.
> 
> At least as long as nobody invents a mechanism to do this decision on 
> the GPU instead.

That would mean putting the whole window manager into the GPU.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:57                   ` Pekka Paalanen
@ 2021-07-28 14:30                     ` Christian König
  2021-07-29  8:08                       ` Michel Dänzer
  2021-07-29  8:23                       ` Pekka Paalanen
  2021-07-28 15:27                     ` Rob Clark
  1 sibling, 2 replies; 52+ messages in thread
From: Christian König @ 2021-07-28 14:30 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Michel Dänzer, Christian König, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
> On Wed, 28 Jul 2021 15:31:41 +0200
> Christian König <christian.koenig@amd.com> wrote:
>
>> Am 28.07.21 um 15:24 schrieb Michel Dänzer:
>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>>>>
>>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>>>>
>>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>>>>
>>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
>>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>
>>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
>> Yeah, that's true as well.
>>
>> At least as long as nobody invents a mechanism to do this decision on
>> the GPU instead.
> That would mean putting the whole window manager into the GPU.

Not really. You only need to decide if you want to use the new backing 
store or the old one based on if the new surface is ready or not.

At AMD hardware can already do this, we just don't have an OpenGL 
extension for it (but maybe already in Vulkan).

Regards,
Christian.

>
>
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-28 11:37         ` Christian König
@ 2021-07-28 15:15           ` Rob Clark
  2021-07-28 17:23             ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-28 15:15 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, dri-devel, Daniel Vetter, Matthew Brost,
	Rob Clark, Sumit Semwal, Gustavo Padovan,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

On Wed, Jul 28, 2021 at 4:37 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 28.07.21 um 09:03 schrieb Christian König:
> > Am 27.07.21 um 16:25 schrieb Rob Clark:
> >> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> >> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>> Am 27.07.21 um 01:38 schrieb Rob Clark:
> >>>> From: Rob Clark <robdclark@chromium.org>
> >>>>
> >>>> Add a way to hint to the fence signaler of an upcoming deadline,
> >>>> such as
> >>>> vblank, which the fence waiter would prefer not to miss. This is to
> >>>> aid
> >>>> the fence signaler in making power management decisions, like boosting
> >>>> frequency as the deadline approaches and awareness of missing
> >>>> deadlines
> >>>> so that can be factored in to the frequency scaling.
> >>>>
> >>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>> ---
> >>>>    drivers/dma-buf/dma-fence.c | 39
> >>>> +++++++++++++++++++++++++++++++++++++
> >>>>    include/linux/dma-fence.h   | 17 ++++++++++++++++
> >>>>    2 files changed, 56 insertions(+)
> >>>>
> >>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> >>>> index ce0f5eff575d..2e0d25ab457e 100644
> >>>> --- a/drivers/dma-buf/dma-fence.c
> >>>> +++ b/drivers/dma-buf/dma-fence.c
> >>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
> >>>> **fences, uint32_t count,
> >>>>    }
> >>>>    EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> >>>>
> >>>> +
> >>>> +/**
> >>>> + * dma_fence_set_deadline - set desired fence-wait deadline
> >>>> + * @fence:    the fence that is to be waited on
> >>>> + * @deadline: the time by which the waiter hopes for the fence to be
> >>>> + *            signaled
> >>>> + *
> >>>> + * Inform the fence signaler of an upcoming deadline, such as
> >>>> vblank, by
> >>>> + * which point the waiter would prefer the fence to be signaled
> >>>> by.  This
> >>>> + * is intended to give feedback to the fence signaler to aid in power
> >>>> + * management decisions, such as boosting GPU frequency if a periodic
> >>>> + * vblank deadline is approaching.
> >>>> + */
> >>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> >>>> deadline)
> >>>> +{
> >>>> +     unsigned long flags;
> >>>> +
> >>>> +     if (dma_fence_is_signaled(fence))
> >>>> +             return;
> >>>> +
> >>>> +     spin_lock_irqsave(fence->lock, flags);
> >>>> +
> >>>> +     /* If we already have an earlier deadline, keep it: */
> >>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> >>>> +         ktime_before(fence->deadline, deadline)) {
> >>>> +             spin_unlock_irqrestore(fence->lock, flags);
> >>>> +             return;
> >>>> +     }
> >>>> +
> >>>> +     fence->deadline = deadline;
> >>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> >>>> +
> >>>> +     spin_unlock_irqrestore(fence->lock, flags);
> >>>> +
> >>>> +     if (fence->ops->set_deadline)
> >>>> +             fence->ops->set_deadline(fence, deadline);
> >>>> +}
> >>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
> >>>> +
> >>>>    /**
> >>>>     * dma_fence_init - Initialize a custom fence.
> >>>>     * @fence: the fence to initialize
> >>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> >>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> >>>> --- a/include/linux/dma-fence.h
> >>>> +++ b/include/linux/dma-fence.h
> >>>> @@ -88,6 +88,7 @@ struct dma_fence {
> >>>>                /* @timestamp replaced by @rcu on
> >>>> dma_fence_release() */
> >>>>                struct rcu_head rcu;
> >>>>        };
> >>>> +     ktime_t deadline;
> >>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
> >>> the
> >>> deadline as extra field here.
> >>>
> >>> We tuned the dma_fence structure intentionally so that it is only 64
> >>> bytes.
> >> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
> >>
> >> We could push the ktime_t (and timer) down into the derived fence
> >> class, but I think there is going to need to be some extra storage
> >> *somewhere*.. maybe the fence signaler could get away with just
> >> storing the nearest upcoming deadline per fence-context instead?
> >
> > I would just push that into the driver instead.
> >
> > You most likely don't want the deadline per fence anyway in complex
> > scenarios, but rather per frame. And a frame is usually composed from
> > multiple fences.

Right, I ended up keeping track of the nearest deadline in patch 5/4
which added drm/msm support:

  https://patchwork.freedesktop.org/patch/447138/

But if we do have the ktime_t in dma_fence in dma_fence, we can add
some checks and avoid calling back to the driver if a later deadline
is set on a fence that already has an earlier deadline.  OTOH I
suppose I can push all that back to the driver to start, and we can
revisit once we have more drivers implementing deadline support.

>
> Thinking more about it we could probably kill the spinlock pointer and
> make the flags 32bit if we absolutely need that here.

If we had a 'struct dma_fence_context' we could push the spinlock, ops
pointer, and u64 context into that and replace with a single
dma_fence_context ptr, fwiw

BR,
-R

> But I still don't see the need for that, especially since most drivers
> probably won't implement it.
>
> Regards,
> Christian.
>
> >
> > Regards,
> > Christian.
> >
> >>
> >> BR,
> >> -R
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>        u64 context;
> >>>>        u64 seqno;
> >>>>        unsigned long flags;
> >>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> >>>>        DMA_FENCE_FLAG_SIGNALED_BIT,
> >>>>        DMA_FENCE_FLAG_TIMESTAMP_BIT,
> >>>>        DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> >>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> >>>>        DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> >>>>    };
> >>>>
> >>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
> >>>>         */
> >>>>        void (*timeline_value_str)(struct dma_fence *fence,
> >>>>                                   char *str, int size);
> >>>> +
> >>>> +     /**
> >>>> +      * @set_deadline:
> >>>> +      *
> >>>> +      * Callback to allow a fence waiter to inform the fence
> >>>> signaler of an
> >>>> +      * upcoming deadline, such as vblank, by which point the
> >>>> waiter would
> >>>> +      * prefer the fence to be signaled by.  This is intended to
> >>>> give feedback
> >>>> +      * to the fence signaler to aid in power management
> >>>> decisions, such as
> >>>> +      * boosting GPU frequency.
> >>>> +      *
> >>>> +      * This callback is optional.
> >>>> +      */
> >>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> >>>>    };
> >>>>
> >>>>    void dma_fence_init(struct dma_fence *fence, const struct
> >>>> dma_fence_ops *ops,
> >>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
> >>>> dma_fence *fence, bool intr)
> >>>>        return ret < 0 ? ret : 0;
> >>>>    }
> >>>>
> >>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> >>>> deadline);
> >>>> +
> >>>>    struct dma_fence *dma_fence_get_stub(void);
> >>>>    struct dma_fence *dma_fence_allocate_private_stub(void);
> >>>>    u64 dma_fence_context_alloc(unsigned num);
> >
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:57                   ` Pekka Paalanen
  2021-07-28 14:30                     ` Christian König
@ 2021-07-28 15:27                     ` Rob Clark
  2021-07-28 17:20                       ` Christian König
  1 sibling, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-28 15:27 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Christian König, Rob Clark, Matthew Brost, Jack Zhang,
	Christian König, Michel Dänzer, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jul 28, 2021 at 6:57 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Wed, 28 Jul 2021 15:31:41 +0200
> Christian König <christian.koenig@amd.com> wrote:
>
> > Am 28.07.21 um 15:24 schrieb Michel Dänzer:
> > > On 2021-07-28 3:13 p.m., Christian König wrote:
> > >> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
> > >>> On 2021-07-28 1:36 p.m., Christian König wrote:
>
> > >>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> > >>>>
> > >>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> > >>>>
> > >>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
> > >>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> > >>>
> > >>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
> > >> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
> > > That's indeed a possible solution for the fullscreen / direct scanout case.
> > >
> > > Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
> >
> > Yeah, that's true as well.
> >
> > At least as long as nobody invents a mechanism to do this decision on
> > the GPU instead.
>
> That would mean putting the whole window manager into the GPU.
>

Hmm, seems like we could come up with a way for a shader to figure out
if a fence has signaled or not on the GPU, and then either sample from
the current or previous window surface?

BR,
-R

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 13:24               ` Michel Dänzer
  2021-07-28 13:31                 ` Christian König
@ 2021-07-28 15:34                 ` Rob Clark
  2021-07-29  7:09                   ` Daniel Vetter
  1 sibling, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-28 15:34 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Christian König, Christian König, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2021-07-28 3:13 p.m., Christian König wrote:
> > Am 28.07.21 um 15:08 schrieb Michel Dänzer:
> >> On 2021-07-28 1:36 p.m., Christian König wrote:
> >>> Am 27.07.21 um 17:37 schrieb Rob Clark:
> >>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
> >>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
> >>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
> >>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> >>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>
> >>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> >>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> >>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> >>>>>>>> which the waiter would like to see the fence signalled.
> >>>>>>>>
> >>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
> >>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
> >>>>>>>> finish the drm/msm part this week.
> >>>>>>>>
> >>>>>>>> Original description:
> >>>>>>>>
> >>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
> >>>>>>>> trick the GPU into running at a lower frequence, when really we
> >>>>>>>> want to be running at a higher frequency to not miss the vblanks
> >>>>>>>> in the first place.
> >>>>>>>>
> >>>>>>>> This is partially inspired by a trick i915 does, but implemented
> >>>>>>>> via dma-fence for a couple of reasons:
> >>>>>>>>
> >>>>>>>> 1) To continue to be able to use the atomic helpers
> >>>>>>>> 2) To support cases where display and gpu are different drivers
> >>>>>>>>
> >>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eYaSOSS5wOngNAd9wufp5eWCx5GtAwo6GkultJgrjmA%3D&amp;reserved=0
> >>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1ZkOzLqbiKSyCixGZ0u7Hd%2Fc1YnUZub%2F%2Fx7RuEclFKg%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
> >>>>>>>
> >>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
> >>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
> >>>>>
> >>>>>
> >>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
> >>>>>> single layer is changing, not try to be clever and just push the
> >>>>>> update down to the kernel.
> >>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
> >>>>>
> >>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
> >>>>>
> >>>> Well, in the end, there is more than one compositor out there.. and if
> >>>> some wayland compositors are going this route, they can also implement
> >>>> the same mechanism in userspace using the sysfs that devfreq exports.
> >>>>
> >>>> But it sounds simpler to me for the compositor to have a sort of "game
> >>>> mode" for fullscreen games.. I'm less worried about UI interactive
> >>>> workloads, boosting the GPU freq upon sudden activity after a period
> >>>> of inactivity seems to work reasonably well there.
> >>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> >>>
> >>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> >>>
> >>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
> >> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> >>
> >> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
> >
> > Well then let's extend the KMS API instead of hacking together workarounds in userspace.
>
> That's indeed a possible solution for the fullscreen / direct scanout case.
>
> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
>

I think solving the fullscreen game case is sufficient enough forward
progress to be useful.  And the results I'm seeing[1] are sufficiently
positive to convince me that dma-fence deadline support is the right
thing to do.

But maybe the solution to make this also useful for mutter is to, once
we have deadline support, extend it with an ioctl to the dma-fence fd
so userspace can be the one setting the deadline.

[1] https://patchwork.freedesktop.org/patch/447138/

BR,
-R

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 15:27                     ` Rob Clark
@ 2021-07-28 17:20                       ` Christian König
  0 siblings, 0 replies; 52+ messages in thread
From: Christian König @ 2021-07-28 17:20 UTC (permalink / raw)
  To: Rob Clark, Pekka Paalanen
  Cc: Rob Clark, Matthew Brost, Jack Zhang, Christian König,
	Michel Dänzer, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK



Am 28.07.21 um 17:27 schrieb Rob Clark:
> On Wed, Jul 28, 2021 at 6:57 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>> On Wed, 28 Jul 2021 15:31:41 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> Am 28.07.21 um 15:24 schrieb Michel Dänzer:
>>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>>>>>
>>>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>>>>>
>>>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>>>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>>>>>
>>>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
>>>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
>>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>>
>>>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
>>> Yeah, that's true as well.
>>>
>>> At least as long as nobody invents a mechanism to do this decision on
>>> the GPU instead.
>> That would mean putting the whole window manager into the GPU.
>>
> Hmm, seems like we could come up with a way for a shader to figure out
> if a fence has signaled or not on the GPU, and then either sample from
> the current or previous window surface?

Yeah, exactly that's my thinking as well.

Christian.

>
> BR,
> -R


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-28 15:15           ` Rob Clark
@ 2021-07-28 17:23             ` Christian König
  2021-07-28 17:58               ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-28 17:23 UTC (permalink / raw)
  To: Rob Clark, Christian König
  Cc: dri-devel, Daniel Vetter, Matthew Brost, Rob Clark, Sumit Semwal,
	Gustavo Padovan, open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list



Am 28.07.21 um 17:15 schrieb Rob Clark:
> On Wed, Jul 28, 2021 at 4:37 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 28.07.21 um 09:03 schrieb Christian König:
>>> Am 27.07.21 um 16:25 schrieb Rob Clark:
>>>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>
>>>>>> Add a way to hint to the fence signaler of an upcoming deadline,
>>>>>> such as
>>>>>> vblank, which the fence waiter would prefer not to miss. This is to
>>>>>> aid
>>>>>> the fence signaler in making power management decisions, like boosting
>>>>>> frequency as the deadline approaches and awareness of missing
>>>>>> deadlines
>>>>>> so that can be factored in to the frequency scaling.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>> ---
>>>>>>     drivers/dma-buf/dma-fence.c | 39
>>>>>> +++++++++++++++++++++++++++++++++++++
>>>>>>     include/linux/dma-fence.h   | 17 ++++++++++++++++
>>>>>>     2 files changed, 56 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>>>> index ce0f5eff575d..2e0d25ab457e 100644
>>>>>> --- a/drivers/dma-buf/dma-fence.c
>>>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
>>>>>> **fences, uint32_t count,
>>>>>>     }
>>>>>>     EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>>>>>>
>>>>>> +
>>>>>> +/**
>>>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
>>>>>> + * @fence:    the fence that is to be waited on
>>>>>> + * @deadline: the time by which the waiter hopes for the fence to be
>>>>>> + *            signaled
>>>>>> + *
>>>>>> + * Inform the fence signaler of an upcoming deadline, such as
>>>>>> vblank, by
>>>>>> + * which point the waiter would prefer the fence to be signaled
>>>>>> by.  This
>>>>>> + * is intended to give feedback to the fence signaler to aid in power
>>>>>> + * management decisions, such as boosting GPU frequency if a periodic
>>>>>> + * vblank deadline is approaching.
>>>>>> + */
>>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
>>>>>> deadline)
>>>>>> +{
>>>>>> +     unsigned long flags;
>>>>>> +
>>>>>> +     if (dma_fence_is_signaled(fence))
>>>>>> +             return;
>>>>>> +
>>>>>> +     spin_lock_irqsave(fence->lock, flags);
>>>>>> +
>>>>>> +     /* If we already have an earlier deadline, keep it: */
>>>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
>>>>>> +         ktime_before(fence->deadline, deadline)) {
>>>>>> +             spin_unlock_irqrestore(fence->lock, flags);
>>>>>> +             return;
>>>>>> +     }
>>>>>> +
>>>>>> +     fence->deadline = deadline;
>>>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
>>>>>> +
>>>>>> +     spin_unlock_irqrestore(fence->lock, flags);
>>>>>> +
>>>>>> +     if (fence->ops->set_deadline)
>>>>>> +             fence->ops->set_deadline(fence, deadline);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
>>>>>> +
>>>>>>     /**
>>>>>>      * dma_fence_init - Initialize a custom fence.
>>>>>>      * @fence: the fence to initialize
>>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
>>>>>> --- a/include/linux/dma-fence.h
>>>>>> +++ b/include/linux/dma-fence.h
>>>>>> @@ -88,6 +88,7 @@ struct dma_fence {
>>>>>>                 /* @timestamp replaced by @rcu on
>>>>>> dma_fence_release() */
>>>>>>                 struct rcu_head rcu;
>>>>>>         };
>>>>>> +     ktime_t deadline;
>>>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
>>>>> the
>>>>> deadline as extra field here.
>>>>>
>>>>> We tuned the dma_fence structure intentionally so that it is only 64
>>>>> bytes.
>>>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
>>>>
>>>> We could push the ktime_t (and timer) down into the derived fence
>>>> class, but I think there is going to need to be some extra storage
>>>> *somewhere*.. maybe the fence signaler could get away with just
>>>> storing the nearest upcoming deadline per fence-context instead?
>>> I would just push that into the driver instead.
>>>
>>> You most likely don't want the deadline per fence anyway in complex
>>> scenarios, but rather per frame. And a frame is usually composed from
>>> multiple fences.
> Right, I ended up keeping track of the nearest deadline in patch 5/4
> which added drm/msm support:
>
>    https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F447138%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cce6ace85263d448bbc9f08d951d9f06c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630819606427306%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ameszAOlClaZNeUDlYr37ZdIytVXNgiEUKuctjXLqZ0%3D&amp;reserved=0
>
> But if we do have the ktime_t in dma_fence in dma_fence, we can add
> some checks and avoid calling back to the driver if a later deadline
> is set on a fence that already has an earlier deadline.  OTOH I
> suppose I can push all that back to the driver to start, and we can
> revisit once we have more drivers implementing deadline support.

I still think that all of this is rather specific to your use case and 
have strong doubt that anybody else will implement that.

>> Thinking more about it we could probably kill the spinlock pointer and
>> make the flags 32bit if we absolutely need that here.
> If we had a 'struct dma_fence_context' we could push the spinlock, ops
> pointer, and u64 context into that and replace with a single
> dma_fence_context ptr, fwiw

That won't work. We have a lot of use cases where you can't allocate 
memory, but must allocate a context.

Christian.

>
> BR,
> -R
>
>> But I still don't see the need for that, especially since most drivers
>> probably won't implement it.
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Christian.
>>>
>>>> BR,
>>>> -R
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>         u64 context;
>>>>>>         u64 seqno;
>>>>>>         unsigned long flags;
>>>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
>>>>>>         DMA_FENCE_FLAG_SIGNALED_BIT,
>>>>>>         DMA_FENCE_FLAG_TIMESTAMP_BIT,
>>>>>>         DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>>>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
>>>>>>         DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>>>>>>     };
>>>>>>
>>>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
>>>>>>          */
>>>>>>         void (*timeline_value_str)(struct dma_fence *fence,
>>>>>>                                    char *str, int size);
>>>>>> +
>>>>>> +     /**
>>>>>> +      * @set_deadline:
>>>>>> +      *
>>>>>> +      * Callback to allow a fence waiter to inform the fence
>>>>>> signaler of an
>>>>>> +      * upcoming deadline, such as vblank, by which point the
>>>>>> waiter would
>>>>>> +      * prefer the fence to be signaled by.  This is intended to
>>>>>> give feedback
>>>>>> +      * to the fence signaler to aid in power management
>>>>>> decisions, such as
>>>>>> +      * boosting GPU frequency.
>>>>>> +      *
>>>>>> +      * This callback is optional.
>>>>>> +      */
>>>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>>>>>>     };
>>>>>>
>>>>>>     void dma_fence_init(struct dma_fence *fence, const struct
>>>>>> dma_fence_ops *ops,
>>>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
>>>>>> dma_fence *fence, bool intr)
>>>>>>         return ret < 0 ? ret : 0;
>>>>>>     }
>>>>>>
>>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
>>>>>> deadline);
>>>>>> +
>>>>>>     struct dma_fence *dma_fence_get_stub(void);
>>>>>>     struct dma_fence *dma_fence_allocate_private_stub(void);
>>>>>>     u64 dma_fence_context_alloc(unsigned num);


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-28 17:23             ` Christian König
@ 2021-07-28 17:58               ` Rob Clark
  2021-07-29  7:03                 ` Daniel Vetter
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-28 17:58 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, dri-devel, Daniel Vetter, Matthew Brost,
	Rob Clark, Sumit Semwal, Gustavo Padovan,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

On Wed, Jul 28, 2021 at 10:23 AM Christian König
<christian.koenig@amd.com> wrote:
>
>
>
> Am 28.07.21 um 17:15 schrieb Rob Clark:
> > On Wed, Jul 28, 2021 at 4:37 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> Am 28.07.21 um 09:03 schrieb Christian König:
> >>> Am 27.07.21 um 16:25 schrieb Rob Clark:
> >>>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
> >>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>
> >>>>>> Add a way to hint to the fence signaler of an upcoming deadline,
> >>>>>> such as
> >>>>>> vblank, which the fence waiter would prefer not to miss. This is to
> >>>>>> aid
> >>>>>> the fence signaler in making power management decisions, like boosting
> >>>>>> frequency as the deadline approaches and awareness of missing
> >>>>>> deadlines
> >>>>>> so that can be factored in to the frequency scaling.
> >>>>>>
> >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>> ---
> >>>>>>     drivers/dma-buf/dma-fence.c | 39
> >>>>>> +++++++++++++++++++++++++++++++++++++
> >>>>>>     include/linux/dma-fence.h   | 17 ++++++++++++++++
> >>>>>>     2 files changed, 56 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> >>>>>> index ce0f5eff575d..2e0d25ab457e 100644
> >>>>>> --- a/drivers/dma-buf/dma-fence.c
> >>>>>> +++ b/drivers/dma-buf/dma-fence.c
> >>>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
> >>>>>> **fences, uint32_t count,
> >>>>>>     }
> >>>>>>     EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> >>>>>>
> >>>>>> +
> >>>>>> +/**
> >>>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
> >>>>>> + * @fence:    the fence that is to be waited on
> >>>>>> + * @deadline: the time by which the waiter hopes for the fence to be
> >>>>>> + *            signaled
> >>>>>> + *
> >>>>>> + * Inform the fence signaler of an upcoming deadline, such as
> >>>>>> vblank, by
> >>>>>> + * which point the waiter would prefer the fence to be signaled
> >>>>>> by.  This
> >>>>>> + * is intended to give feedback to the fence signaler to aid in power
> >>>>>> + * management decisions, such as boosting GPU frequency if a periodic
> >>>>>> + * vblank deadline is approaching.
> >>>>>> + */
> >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> >>>>>> deadline)
> >>>>>> +{
> >>>>>> +     unsigned long flags;
> >>>>>> +
> >>>>>> +     if (dma_fence_is_signaled(fence))
> >>>>>> +             return;
> >>>>>> +
> >>>>>> +     spin_lock_irqsave(fence->lock, flags);
> >>>>>> +
> >>>>>> +     /* If we already have an earlier deadline, keep it: */
> >>>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> >>>>>> +         ktime_before(fence->deadline, deadline)) {
> >>>>>> +             spin_unlock_irqrestore(fence->lock, flags);
> >>>>>> +             return;
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     fence->deadline = deadline;
> >>>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> >>>>>> +
> >>>>>> +     spin_unlock_irqrestore(fence->lock, flags);
> >>>>>> +
> >>>>>> +     if (fence->ops->set_deadline)
> >>>>>> +             fence->ops->set_deadline(fence, deadline);
> >>>>>> +}
> >>>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
> >>>>>> +
> >>>>>>     /**
> >>>>>>      * dma_fence_init - Initialize a custom fence.
> >>>>>>      * @fence: the fence to initialize
> >>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> >>>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> >>>>>> --- a/include/linux/dma-fence.h
> >>>>>> +++ b/include/linux/dma-fence.h
> >>>>>> @@ -88,6 +88,7 @@ struct dma_fence {
> >>>>>>                 /* @timestamp replaced by @rcu on
> >>>>>> dma_fence_release() */
> >>>>>>                 struct rcu_head rcu;
> >>>>>>         };
> >>>>>> +     ktime_t deadline;
> >>>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
> >>>>> the
> >>>>> deadline as extra field here.
> >>>>>
> >>>>> We tuned the dma_fence structure intentionally so that it is only 64
> >>>>> bytes.
> >>>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
> >>>>
> >>>> We could push the ktime_t (and timer) down into the derived fence
> >>>> class, but I think there is going to need to be some extra storage
> >>>> *somewhere*.. maybe the fence signaler could get away with just
> >>>> storing the nearest upcoming deadline per fence-context instead?
> >>> I would just push that into the driver instead.
> >>>
> >>> You most likely don't want the deadline per fence anyway in complex
> >>> scenarios, but rather per frame. And a frame is usually composed from
> >>> multiple fences.
> > Right, I ended up keeping track of the nearest deadline in patch 5/4
> > which added drm/msm support:
> >
> >    https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F447138%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cce6ace85263d448bbc9f08d951d9f06c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630819606427306%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ameszAOlClaZNeUDlYr37ZdIytVXNgiEUKuctjXLqZ0%3D&amp;reserved=0
> >
> > But if we do have the ktime_t in dma_fence in dma_fence, we can add
> > some checks and avoid calling back to the driver if a later deadline
> > is set on a fence that already has an earlier deadline.  OTOH I
> > suppose I can push all that back to the driver to start, and we can
> > revisit once we have more drivers implementing deadline support.
>
> I still think that all of this is rather specific to your use case and
> have strong doubt that anybody else will implement that.

i915 does already have a similar thing in it's hand-rolled atomic
commit path.  So I think msm won't be the only one.  It should be also
useful to the other mobile GPUs with a gpu vs kms driver split,
although looking at the other gpu devfreq implementations, I don't
think they've yet gotten to this point in the fine tuning..

BR,
-R

> >> Thinking more about it we could probably kill the spinlock pointer and
> >> make the flags 32bit if we absolutely need that here.
> > If we had a 'struct dma_fence_context' we could push the spinlock, ops
> > pointer, and u64 context into that and replace with a single
> > dma_fence_context ptr, fwiw
>
> That won't work. We have a lot of use cases where you can't allocate
> memory, but must allocate a context.
>
> Christian.
>
> >
> > BR,
> > -R
> >
> >> But I still don't see the need for that, especially since most drivers
> >> probably won't implement it.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>> BR,
> >>>> -R
> >>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>>         u64 context;
> >>>>>>         u64 seqno;
> >>>>>>         unsigned long flags;
> >>>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> >>>>>>         DMA_FENCE_FLAG_SIGNALED_BIT,
> >>>>>>         DMA_FENCE_FLAG_TIMESTAMP_BIT,
> >>>>>>         DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> >>>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> >>>>>>         DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> >>>>>>     };
> >>>>>>
> >>>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
> >>>>>>          */
> >>>>>>         void (*timeline_value_str)(struct dma_fence *fence,
> >>>>>>                                    char *str, int size);
> >>>>>> +
> >>>>>> +     /**
> >>>>>> +      * @set_deadline:
> >>>>>> +      *
> >>>>>> +      * Callback to allow a fence waiter to inform the fence
> >>>>>> signaler of an
> >>>>>> +      * upcoming deadline, such as vblank, by which point the
> >>>>>> waiter would
> >>>>>> +      * prefer the fence to be signaled by.  This is intended to
> >>>>>> give feedback
> >>>>>> +      * to the fence signaler to aid in power management
> >>>>>> decisions, such as
> >>>>>> +      * boosting GPU frequency.
> >>>>>> +      *
> >>>>>> +      * This callback is optional.
> >>>>>> +      */
> >>>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> >>>>>>     };
> >>>>>>
> >>>>>>     void dma_fence_init(struct dma_fence *fence, const struct
> >>>>>> dma_fence_ops *ops,
> >>>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
> >>>>>> dma_fence *fence, bool intr)
> >>>>>>         return ret < 0 ? ret : 0;
> >>>>>>     }
> >>>>>>
> >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> >>>>>> deadline);
> >>>>>> +
> >>>>>>     struct dma_fence *dma_fence_get_stub(void);
> >>>>>>     struct dma_fence *dma_fence_allocate_private_stub(void);
> >>>>>>     u64 dma_fence_context_alloc(unsigned num);
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-28 17:58               ` Rob Clark
@ 2021-07-29  7:03                 ` Daniel Vetter
  2021-07-29 15:23                   ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2021-07-29  7:03 UTC (permalink / raw)
  To: Rob Clark
  Cc: Christian König, Christian König, dri-devel,
	Daniel Vetter, Matthew Brost, Rob Clark, Sumit Semwal,
	Gustavo Padovan, open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

On Wed, Jul 28, 2021 at 10:58:51AM -0700, Rob Clark wrote:
> On Wed, Jul 28, 2021 at 10:23 AM Christian König
> <christian.koenig@amd.com> wrote:
> >
> >
> >
> > Am 28.07.21 um 17:15 schrieb Rob Clark:
> > > On Wed, Jul 28, 2021 at 4:37 AM Christian König
> > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > >> Am 28.07.21 um 09:03 schrieb Christian König:
> > >>> Am 27.07.21 um 16:25 schrieb Rob Clark:
> > >>>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> > >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> > >>>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
> > >>>>>> From: Rob Clark <robdclark@chromium.org>
> > >>>>>>
> > >>>>>> Add a way to hint to the fence signaler of an upcoming deadline,
> > >>>>>> such as
> > >>>>>> vblank, which the fence waiter would prefer not to miss. This is to
> > >>>>>> aid
> > >>>>>> the fence signaler in making power management decisions, like boosting
> > >>>>>> frequency as the deadline approaches and awareness of missing
> > >>>>>> deadlines
> > >>>>>> so that can be factored in to the frequency scaling.
> > >>>>>>
> > >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > >>>>>> ---
> > >>>>>>     drivers/dma-buf/dma-fence.c | 39
> > >>>>>> +++++++++++++++++++++++++++++++++++++
> > >>>>>>     include/linux/dma-fence.h   | 17 ++++++++++++++++
> > >>>>>>     2 files changed, 56 insertions(+)
> > >>>>>>
> > >>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > >>>>>> index ce0f5eff575d..2e0d25ab457e 100644
> > >>>>>> --- a/drivers/dma-buf/dma-fence.c
> > >>>>>> +++ b/drivers/dma-buf/dma-fence.c
> > >>>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
> > >>>>>> **fences, uint32_t count,
> > >>>>>>     }
> > >>>>>>     EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > >>>>>>
> > >>>>>> +
> > >>>>>> +/**
> > >>>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
> > >>>>>> + * @fence:    the fence that is to be waited on
> > >>>>>> + * @deadline: the time by which the waiter hopes for the fence to be
> > >>>>>> + *            signaled
> > >>>>>> + *
> > >>>>>> + * Inform the fence signaler of an upcoming deadline, such as
> > >>>>>> vblank, by
> > >>>>>> + * which point the waiter would prefer the fence to be signaled
> > >>>>>> by.  This
> > >>>>>> + * is intended to give feedback to the fence signaler to aid in power
> > >>>>>> + * management decisions, such as boosting GPU frequency if a periodic
> > >>>>>> + * vblank deadline is approaching.
> > >>>>>> + */
> > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > >>>>>> deadline)
> > >>>>>> +{
> > >>>>>> +     unsigned long flags;
> > >>>>>> +
> > >>>>>> +     if (dma_fence_is_signaled(fence))
> > >>>>>> +             return;
> > >>>>>> +
> > >>>>>> +     spin_lock_irqsave(fence->lock, flags);
> > >>>>>> +
> > >>>>>> +     /* If we already have an earlier deadline, keep it: */
> > >>>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> > >>>>>> +         ktime_before(fence->deadline, deadline)) {
> > >>>>>> +             spin_unlock_irqrestore(fence->lock, flags);
> > >>>>>> +             return;
> > >>>>>> +     }
> > >>>>>> +
> > >>>>>> +     fence->deadline = deadline;
> > >>>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> > >>>>>> +
> > >>>>>> +     spin_unlock_irqrestore(fence->lock, flags);
> > >>>>>> +
> > >>>>>> +     if (fence->ops->set_deadline)
> > >>>>>> +             fence->ops->set_deadline(fence, deadline);
> > >>>>>> +}
> > >>>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
> > >>>>>> +
> > >>>>>>     /**
> > >>>>>>      * dma_fence_init - Initialize a custom fence.
> > >>>>>>      * @fence: the fence to initialize
> > >>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > >>>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> > >>>>>> --- a/include/linux/dma-fence.h
> > >>>>>> +++ b/include/linux/dma-fence.h
> > >>>>>> @@ -88,6 +88,7 @@ struct dma_fence {
> > >>>>>>                 /* @timestamp replaced by @rcu on
> > >>>>>> dma_fence_release() */
> > >>>>>>                 struct rcu_head rcu;
> > >>>>>>         };
> > >>>>>> +     ktime_t deadline;
> > >>>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
> > >>>>> the
> > >>>>> deadline as extra field here.
> > >>>>>
> > >>>>> We tuned the dma_fence structure intentionally so that it is only 64
> > >>>>> bytes.
> > >>>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
> > >>>>
> > >>>> We could push the ktime_t (and timer) down into the derived fence
> > >>>> class, but I think there is going to need to be some extra storage
> > >>>> *somewhere*.. maybe the fence signaler could get away with just
> > >>>> storing the nearest upcoming deadline per fence-context instead?
> > >>> I would just push that into the driver instead.
> > >>>
> > >>> You most likely don't want the deadline per fence anyway in complex
> > >>> scenarios, but rather per frame. And a frame is usually composed from
> > >>> multiple fences.
> > > Right, I ended up keeping track of the nearest deadline in patch 5/4
> > > which added drm/msm support:
> > >
> > >    https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F447138%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cce6ace85263d448bbc9f08d951d9f06c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630819606427306%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ameszAOlClaZNeUDlYr37ZdIytVXNgiEUKuctjXLqZ0%3D&amp;reserved=0
> > >
> > > But if we do have the ktime_t in dma_fence in dma_fence, we can add
> > > some checks and avoid calling back to the driver if a later deadline
> > > is set on a fence that already has an earlier deadline.  OTOH I
> > > suppose I can push all that back to the driver to start, and we can
> > > revisit once we have more drivers implementing deadline support.
> >
> > I still think that all of this is rather specific to your use case and
> > have strong doubt that anybody else will implement that.
> 
> i915 does already have a similar thing in it's hand-rolled atomic
> commit path.  So I think msm won't be the only one.  It should be also
> useful to the other mobile GPUs with a gpu vs kms driver split,
> although looking at the other gpu devfreq implementations, I don't
> think they've yet gotten to this point in the fine tuning..

Yeah I have a dream that maybe i915 will use the atomic commit helpers, I
originally wrote them with i915 in mind :-) even had patches!

I also think we'll need this eventually in other areas, Android also has
some hacks like this to make sure idle->first touch doesn't suck and
similar things.
-Daniel

> 
> BR,
> -R
> 
> > >> Thinking more about it we could probably kill the spinlock pointer and
> > >> make the flags 32bit if we absolutely need that here.
> > > If we had a 'struct dma_fence_context' we could push the spinlock, ops
> > > pointer, and u64 context into that and replace with a single
> > > dma_fence_context ptr, fwiw
> >
> > That won't work. We have a lot of use cases where you can't allocate
> > memory, but must allocate a context.
> >
> > Christian.
> >
> > >
> > > BR,
> > > -R
> > >
> > >> But I still don't see the need for that, especially since most drivers
> > >> probably won't implement it.
> > >>
> > >> Regards,
> > >> Christian.
> > >>
> > >>> Regards,
> > >>> Christian.
> > >>>
> > >>>> BR,
> > >>>> -R
> > >>>>
> > >>>>> Regards,
> > >>>>> Christian.
> > >>>>>
> > >>>>>>         u64 context;
> > >>>>>>         u64 seqno;
> > >>>>>>         unsigned long flags;
> > >>>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> > >>>>>>         DMA_FENCE_FLAG_SIGNALED_BIT,
> > >>>>>>         DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > >>>>>>         DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> > >>>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> > >>>>>>         DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> > >>>>>>     };
> > >>>>>>
> > >>>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
> > >>>>>>          */
> > >>>>>>         void (*timeline_value_str)(struct dma_fence *fence,
> > >>>>>>                                    char *str, int size);
> > >>>>>> +
> > >>>>>> +     /**
> > >>>>>> +      * @set_deadline:
> > >>>>>> +      *
> > >>>>>> +      * Callback to allow a fence waiter to inform the fence
> > >>>>>> signaler of an
> > >>>>>> +      * upcoming deadline, such as vblank, by which point the
> > >>>>>> waiter would
> > >>>>>> +      * prefer the fence to be signaled by.  This is intended to
> > >>>>>> give feedback
> > >>>>>> +      * to the fence signaler to aid in power management
> > >>>>>> decisions, such as
> > >>>>>> +      * boosting GPU frequency.
> > >>>>>> +      *
> > >>>>>> +      * This callback is optional.
> > >>>>>> +      */
> > >>>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> > >>>>>>     };
> > >>>>>>
> > >>>>>>     void dma_fence_init(struct dma_fence *fence, const struct
> > >>>>>> dma_fence_ops *ops,
> > >>>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
> > >>>>>> dma_fence *fence, bool intr)
> > >>>>>>         return ret < 0 ? ret : 0;
> > >>>>>>     }
> > >>>>>>
> > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > >>>>>> deadline);
> > >>>>>> +
> > >>>>>>     struct dma_fence *dma_fence_get_stub(void);
> > >>>>>>     struct dma_fence *dma_fence_allocate_private_stub(void);
> > >>>>>>     u64 dma_fence_context_alloc(unsigned num);
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 15:34                 ` Rob Clark
@ 2021-07-29  7:09                   ` Daniel Vetter
  2021-07-29  8:17                     ` Michel Dänzer
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2021-07-29  7:09 UTC (permalink / raw)
  To: Rob Clark
  Cc: Michel Dänzer, Rob Clark, Matthew Brost, Jack Zhang,
	Christian König, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:
> >
> > On 2021-07-28 3:13 p.m., Christian König wrote:
> > > Am 28.07.21 um 15:08 schrieb Michel Dänzer:
> > >> On 2021-07-28 1:36 p.m., Christian König wrote:
> > >>> Am 27.07.21 um 17:37 schrieb Rob Clark:
> > >>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
> > >>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
> > >>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
> > >>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> > >>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > >>>>>>>>
> > >>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> > >>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> > >>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> > >>>>>>>> which the waiter would like to see the fence signalled.
> > >>>>>>>>
> > >>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
> > >>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
> > >>>>>>>> finish the drm/msm part this week.
> > >>>>>>>>
> > >>>>>>>> Original description:
> > >>>>>>>>
> > >>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
> > >>>>>>>> trick the GPU into running at a lower frequence, when really we
> > >>>>>>>> want to be running at a higher frequency to not miss the vblanks
> > >>>>>>>> in the first place.
> > >>>>>>>>
> > >>>>>>>> This is partially inspired by a trick i915 does, but implemented
> > >>>>>>>> via dma-fence for a couple of reasons:
> > >>>>>>>>
> > >>>>>>>> 1) To continue to be able to use the atomic helpers
> > >>>>>>>> 2) To support cases where display and gpu are different drivers
> > >>>>>>>>
> > >>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eYaSOSS5wOngNAd9wufp5eWCx5GtAwo6GkultJgrjmA%3D&amp;reserved=0
> > >>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1ZkOzLqbiKSyCixGZ0u7Hd%2Fc1YnUZub%2F%2Fx7RuEclFKg%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
> > >>>>>>>
> > >>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
> > >>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
> > >>>>>
> > >>>>>
> > >>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
> > >>>>>> single layer is changing, not try to be clever and just push the
> > >>>>>> update down to the kernel.
> > >>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
> > >>>>>
> > >>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
> > >>>>>
> > >>>> Well, in the end, there is more than one compositor out there.. and if
> > >>>> some wayland compositors are going this route, they can also implement
> > >>>> the same mechanism in userspace using the sysfs that devfreq exports.
> > >>>>
> > >>>> But it sounds simpler to me for the compositor to have a sort of "game
> > >>>> mode" for fullscreen games.. I'm less worried about UI interactive
> > >>>> workloads, boosting the GPU freq upon sudden activity after a period
> > >>>> of inactivity seems to work reasonably well there.
> > >>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> > >>>
> > >>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> > >>>
> > >>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
> > >> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> > >>
> > >> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
> > >
> > > Well then let's extend the KMS API instead of hacking together workarounds in userspace.
> >
> > That's indeed a possible solution for the fullscreen / direct scanout case.
> >
> > Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
> >
> 
> I think solving the fullscreen game case is sufficient enough forward
> progress to be useful.  And the results I'm seeing[1] are sufficiently
> positive to convince me that dma-fence deadline support is the right
> thing to do.
> 
> But maybe the solution to make this also useful for mutter is to, once
> we have deadline support, extend it with an ioctl to the dma-fence fd
> so userspace can be the one setting the deadline.

atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the
option to bail out with an old frame if it's too late?

Also mutter would need to supply the deadline, because we need to fit the
rendering in still before the actual flip. So gets a bit quirky maybe ...
-Daniel
> 
> [1] https://patchwork.freedesktop.org/patch/447138/
> 
> BR,
> -R

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 14:30                     ` Christian König
@ 2021-07-29  8:08                       ` Michel Dänzer
  2021-07-29  8:23                       ` Pekka Paalanen
  1 sibling, 0 replies; 52+ messages in thread
From: Michel Dänzer @ 2021-07-29  8:08 UTC (permalink / raw)
  To: Christian König, Pekka Paalanen
  Cc: Rob Clark, Matthew Brost, Jack Zhang, Christian König,
	open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

On 2021-07-28 4:30 p.m., Christian König wrote:
> Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
>> On Wed, 28 Jul 2021 15:31:41 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> Am 28.07.21 um 15:24 schrieb Michel Dänzer:
>>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>>>>>
>>>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>>>>>
>>>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>>>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>>>>>
>>>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
>>>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
>>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>>
>>>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
>>> Yeah, that's true as well.
>>>
>>> At least as long as nobody invents a mechanism to do this decision on
>>> the GPU instead.
>> That would mean putting the whole window manager into the GPU.
> 
> Not really. You only need to decide if you want to use the new backing store or the old one based on if the new surface is ready or not.

While something like that might be a possible optimization for (probably common) cases where the new buffer does not come with other state changes which affect the output frame beyond the buffer contents, there will always be cases (at least until a Wayland compositor can fully run on the GPU, as Pekka noted somewhat jokingly :) where this needs to be handled on the CPU.

I'm currently focusing on that general case. Optimizations for special cases may follow later.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  7:09                   ` Daniel Vetter
@ 2021-07-29  8:17                     ` Michel Dänzer
  2021-07-29  9:03                       ` Daniel Vetter
  0 siblings, 1 reply; 52+ messages in thread
From: Michel Dänzer @ 2021-07-29  8:17 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, Rob Clark, Matthew Brost, Christian König,
	open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
> On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
>> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:
>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:
>>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
>>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
>>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>
>>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
>>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
>>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
>>>>>>>>>>> which the waiter would like to see the fence signalled.
>>>>>>>>>>>
>>>>>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
>>>>>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
>>>>>>>>>>> finish the drm/msm part this week.
>>>>>>>>>>>
>>>>>>>>>>> Original description:
>>>>>>>>>>>
>>>>>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
>>>>>>>>>>> trick the GPU into running at a lower frequence, when really we
>>>>>>>>>>> want to be running at a higher frequency to not miss the vblanks
>>>>>>>>>>> in the first place.
>>>>>>>>>>>
>>>>>>>>>>> This is partially inspired by a trick i915 does, but implemented
>>>>>>>>>>> via dma-fence for a couple of reasons:
>>>>>>>>>>>
>>>>>>>>>>> 1) To continue to be able to use the atomic helpers
>>>>>>>>>>> 2) To support cases where display and gpu are different drivers
>>>>>>>>>>>
>>>>>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eYaSOSS5wOngNAd9wufp5eWCx5GtAwo6GkultJgrjmA%3D&amp;reserved=0
>>>>>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1ZkOzLqbiKSyCixGZ0u7Hd%2Fc1YnUZub%2F%2Fx7RuEclFKg%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
>>>>>>>>>>
>>>>>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
>>>>>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
>>>>>>>>> single layer is changing, not try to be clever and just push the
>>>>>>>>> update down to the kernel.
>>>>>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
>>>>>>>>
>>>>>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
>>>>>>>>
>>>>>>> Well, in the end, there is more than one compositor out there.. and if
>>>>>>> some wayland compositors are going this route, they can also implement
>>>>>>> the same mechanism in userspace using the sysfs that devfreq exports.
>>>>>>>
>>>>>>> But it sounds simpler to me for the compositor to have a sort of "game
>>>>>>> mode" for fullscreen games.. I'm less worried about UI interactive
>>>>>>> workloads, boosting the GPU freq upon sudden activity after a period
>>>>>>> of inactivity seems to work reasonably well there.
>>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>>>>
>>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>>>>
>>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>>>>
>>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
>>>>
>>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
>>>
>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>
>>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
>>
>> I think solving the fullscreen game case is sufficient enough forward
>> progress to be useful.  And the results I'm seeing[1] are sufficiently
>> positive to convince me that dma-fence deadline support is the right
>> thing to do.

I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.


>> But maybe the solution to make this also useful for mutter

It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.


>> is to, once we have deadline support, extend it with an ioctl to the
>> dma-fence fd so userspace can be the one setting the deadline.

I was thinking in a similar direction.

> atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the
> option to bail out with an old frame if it's too late?

This is a bit cryptic though, can you elaborate?


> Also mutter would need to supply the deadline, because we need to fit the
> rendering in still before the actual flip. So gets a bit quirky maybe ...

That should be fine. mutter is already keeping track of how long its rendering takes.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-28 14:30                     ` Christian König
  2021-07-29  8:08                       ` Michel Dänzer
@ 2021-07-29  8:23                       ` Pekka Paalanen
  2021-07-29  8:43                         ` Christian König
  1 sibling, 1 reply; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29  8:23 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, Christian König, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 3414 bytes --]

On Wed, 28 Jul 2021 16:30:13 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
> > On Wed, 28 Jul 2021 15:31:41 +0200
> > Christian König <christian.koenig@amd.com> wrote:
> >  
> >> Am 28.07.21 um 15:24 schrieb Michel Dänzer:  
> >>> On 2021-07-28 3:13 p.m., Christian König wrote:  
> >>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:  
> >>>>> On 2021-07-28 1:36 p.m., Christian König wrote:  
> >>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> >>>>>>
> >>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> >>>>>>
> >>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.  
> >>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> >>>>>
> >>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)  
> >>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.  
> >>> That's indeed a possible solution for the fullscreen / direct scanout case.
> >>>
> >>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.  
> >> Yeah, that's true as well.
> >>
> >> At least as long as nobody invents a mechanism to do this decision on
> >> the GPU instead.  
> > That would mean putting the whole window manager into the GPU.  
> 
> Not really. You only need to decide if you want to use the new backing 
> store or the old one based on if the new surface is ready or not.

Except that a window content update in Wayland must be synchronised with
all the possible and arbitrary other window system state changes, that
will affect how and where other windows will get drawn *this frame*,
how input events are routed, and more.

But, if the window manager made sure that *only* window contents are
about to change and *all* other state remains as it was, then it would
be possible to let the GPU decide which frame it uses. As long as it
also tells back which one it actually did, so that presentation
feedback etc. can trigger the right Wayland events.

Wayland has "atomic commits" to windows, and arbitrary protocol
extensions can add arbitrary state to be tracked with it. A bit like KMS
properties. Even atomic commits affecting multiple windows together are
a thing, and they must be latched either all or none.

So it's quite a lot of work to determine if one can allow the GPU to
choose the buffer it will texture from, or not.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  8:23                       ` Pekka Paalanen
@ 2021-07-29  8:43                         ` Christian König
  2021-07-29  9:15                           ` Pekka Paalanen
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-29  8:43 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Michel Dänzer, Christian König, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 29.07.21 um 10:23 schrieb Pekka Paalanen:
> On Wed, 28 Jul 2021 16:30:13 +0200
> Christian König <christian.koenig@amd.com> wrote:
>
>> Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
>>> On Wed, 28 Jul 2021 15:31:41 +0200
>>> Christian König <christian.koenig@amd.com> wrote:
>>>   
>>>> Am 28.07.21 um 15:24 schrieb Michel Dänzer:
>>>>> On 2021-07-28 3:13 p.m., Christian König wrote:
>>>>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
>>>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
>>>>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
>>>>>>>>
>>>>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
>>>>>>>>
>>>>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
>>>>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
>>>>>>>
>>>>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
>>>>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
>>>>> That's indeed a possible solution for the fullscreen / direct scanout case.
>>>>>
>>>>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
>>>> Yeah, that's true as well.
>>>>
>>>> At least as long as nobody invents a mechanism to do this decision on
>>>> the GPU instead.
>>> That would mean putting the whole window manager into the GPU.
>> Not really. You only need to decide if you want to use the new backing
>> store or the old one based on if the new surface is ready or not.
> Except that a window content update in Wayland must be synchronised with
> all the possible and arbitrary other window system state changes, that
> will affect how and where other windows will get drawn *this frame*,
> how input events are routed, and more.
>
> But, if the window manager made sure that *only* window contents are
> about to change and *all* other state remains as it was, then it would
> be possible to let the GPU decide which frame it uses. As long as it
> also tells back which one it actually did, so that presentation
> feedback etc. can trigger the right Wayland events.
>
> Wayland has "atomic commits" to windows, and arbitrary protocol
> extensions can add arbitrary state to be tracked with it. A bit like KMS
> properties. Even atomic commits affecting multiple windows together are
> a thing, and they must be latched either all or none.
>
> So it's quite a lot of work to determine if one can allow the GPU to
> choose the buffer it will texture from, or not.

But how does it then help to wait on the CPU instead?

See what I'm proposing is to either render the next state of the window 
or compose from the old state (including all atomic properties).

E.g. what do you do if you timeout and can't have the new window content 
on time? What's the fallback here?

Regards,
Christian.

>
>
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  8:17                     ` Michel Dänzer
@ 2021-07-29  9:03                       ` Daniel Vetter
  2021-07-29  9:37                         ` Pekka Paalanen
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2021-07-29  9:03 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Matthew Brost,
	Christian König, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
> On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
> > On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
> >> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:
> >>> On 2021-07-28 3:13 p.m., Christian König wrote:
> >>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
> >>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
> >>>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:
> >>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
> >>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
> >>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
> >>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> >>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>>
> >>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> >>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> >>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> >>>>>>>>>>> which the waiter would like to see the fence signalled.
> >>>>>>>>>>>
> >>>>>>>>>>> I've not yet had a chance to re-work the drm/msm part of this, but
> >>>>>>>>>>> wanted to send this out as an RFC in case I don't have a chance to
> >>>>>>>>>>> finish the drm/msm part this week.
> >>>>>>>>>>>
> >>>>>>>>>>> Original description:
> >>>>>>>>>>>
> >>>>>>>>>>> In some cases, like double-buffered rendering, missing vblanks can
> >>>>>>>>>>> trick the GPU into running at a lower frequence, when really we
> >>>>>>>>>>> want to be running at a higher frequency to not miss the vblanks
> >>>>>>>>>>> in the first place.
> >>>>>>>>>>>
> >>>>>>>>>>> This is partially inspired by a trick i915 does, but implemented
> >>>>>>>>>>> via dma-fence for a couple of reasons:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) To continue to be able to use the atomic helpers
> >>>>>>>>>>> 2) To support cases where display and gpu are different drivers
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fseries%2F90331%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eYaSOSS5wOngNAd9wufp5eWCx5GtAwo6GkultJgrjmA%3D&amp;reserved=0
> >>>>>>>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C269b2df3e1dc4f0b856d08d951c8c768%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630745091538563%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1ZkOzLqbiKSyCixGZ0u7Hd%2Fc1YnUZub%2F%2Fx7RuEclFKg%3D&amp;reserved=0 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
> >>>>>>>>>>
> >>>>>>>>> I guess you mean "no effect at all *except* for fullscreen..."?
> >>>>>>>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I'd perhaps recommend that wayland compositors, in cases where only a
> >>>>>>>>> single layer is changing, not try to be clever and just push the
> >>>>>>>>> update down to the kernel.
> >>>>>>>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
> >>>>>>>>
> >>>>>>>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
> >>>>>>>>
> >>>>>>> Well, in the end, there is more than one compositor out there.. and if
> >>>>>>> some wayland compositors are going this route, they can also implement
> >>>>>>> the same mechanism in userspace using the sysfs that devfreq exports.
> >>>>>>>
> >>>>>>> But it sounds simpler to me for the compositor to have a sort of "game
> >>>>>>> mode" for fullscreen games.. I'm less worried about UI interactive
> >>>>>>> workloads, boosting the GPU freq upon sudden activity after a period
> >>>>>>> of inactivity seems to work reasonably well there.
> >>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> >>>>>>
> >>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> >>>>>>
> >>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
> >>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> >>>>>
> >>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
> >>>>
> >>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.
> >>>
> >>> That's indeed a possible solution for the fullscreen / direct scanout case.
> >>>
> >>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
> >>
> >> I think solving the fullscreen game case is sufficient enough forward
> >> progress to be useful.  And the results I'm seeing[1] are sufficiently
> >> positive to convince me that dma-fence deadline support is the right
> >> thing to do.
> 
> I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
> 
> 
> >> But maybe the solution to make this also useful for mutter
> 
> It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
> 
> 
> >> is to, once we have deadline support, extend it with an ioctl to the
> >> dma-fence fd so userspace can be the one setting the deadline.
> 
> I was thinking in a similar direction.
> 
> > atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the
> > option to bail out with an old frame if it's too late?
> 
> This is a bit cryptic though, can you elaborate?

So essentially when the mutter compositor guesstimator is fairly confident
about the next frame's composition (recall you're keeping track of clients
to estimate their usual latency or something like that), then it does a
TEST_ONLY commit to check it all works and prep the rendering, but _not_
yet fire it off.

Instead it waits until all buffers complete, and if some don't, pick the
previous one. Which I guess in an extreme case would mean you need a
different window tree configuration and maybe different TEST_ONLY check
and all that, not sure how you solve that.

Anyway, in that TEST_ONLY commit my idea is that you'd also supply all the
in-fences you expect to depend upon (maybe we need an additional list of
in-fences for your rendering job), plus a deadline when you want to have
them done (so that there's enough time for your render job still). And the
kernel then calls dma_fence_set_deadline on all of them.

Pondering this more, maybe a separate ioctl is simpler where you just
supply a list of in-fences and deadlines.

The real reason I want to tie this to atomic is for priviledge checking
reasons. I don't think normal userspace should have the power to set
arbitrary deadlines like this - at least on i915 it will also give you a
slight priority boost and stuff like that, to make sure your rendering for
the current frame goes in ahead of the next frame's prep work.

So maybe just a new ioctl that does this which is limited to the current
kms owner (aka drm_master)?

In i915 we also do a mild boost for when userspace waits on a buffer
(assuming it's blocking the cpu), but that boost has a pretty sharp
decay/cooldown to prevent abuse and keeping the gpu artificially
upclocked. That really is just meant to avoid the tailspin when you have a
ping-pong workload between gpu and cpu and both downclock in turn because
the other side is too slow and the gpu/cpu aren't really busy enough.
Until you're crawling at idle clocks getting nothing done.

I think on the windows side they "fix" this by making the clock
adjustments extremely conservative and slow (except when they detect that
it's a game/benchmark). Good enough for battery tests, not so great in
reality.
-Daniel

> > Also mutter would need to supply the deadline, because we need to fit the
> > rendering in still before the actual flip. So gets a bit quirky maybe ...
> 
> That should be fine. mutter is already keeping track of how long its rendering takes.
> 
> 
> -- 
> Earthling Michel Dänzer               |               https://redhat.com
> Libre software enthusiast             |             Mesa and X developer

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  8:43                         ` Christian König
@ 2021-07-29  9:15                           ` Pekka Paalanen
  2021-07-29 10:14                             ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29  9:15 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, Christian König, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 5188 bytes --]

On Thu, 29 Jul 2021 10:43:16 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 29.07.21 um 10:23 schrieb Pekka Paalanen:
> > On Wed, 28 Jul 2021 16:30:13 +0200
> > Christian König <christian.koenig@amd.com> wrote:
> >  
> >> Am 28.07.21 um 15:57 schrieb Pekka Paalanen:  
> >>> On Wed, 28 Jul 2021 15:31:41 +0200
> >>> Christian König <christian.koenig@amd.com> wrote:
> >>>     
> >>>> Am 28.07.21 um 15:24 schrieb Michel Dänzer:  
> >>>>> On 2021-07-28 3:13 p.m., Christian König wrote:  
> >>>>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:  
> >>>>>>> On 2021-07-28 1:36 p.m., Christian König wrote:  
> >>>>>>>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
> >>>>>>>>
> >>>>>>>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
> >>>>>>>>
> >>>>>>>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.  
> >>>>>>> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
> >>>>>>>
> >>>>>>> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)  
> >>>>>> Well then let's extend the KMS API instead of hacking together workarounds in userspace.  
> >>>>> That's indeed a possible solution for the fullscreen / direct scanout case.
> >>>>>
> >>>>> Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.  
> >>>> Yeah, that's true as well.
> >>>>
> >>>> At least as long as nobody invents a mechanism to do this decision on
> >>>> the GPU instead.  
> >>> That would mean putting the whole window manager into the GPU.  
> >> Not really. You only need to decide if you want to use the new backing
> >> store or the old one based on if the new surface is ready or not.  
> > Except that a window content update in Wayland must be synchronised with
> > all the possible and arbitrary other window system state changes, that
> > will affect how and where other windows will get drawn *this frame*,
> > how input events are routed, and more.
> >
> > But, if the window manager made sure that *only* window contents are
> > about to change and *all* other state remains as it was, then it would
> > be possible to let the GPU decide which frame it uses. As long as it
> > also tells back which one it actually did, so that presentation
> > feedback etc. can trigger the right Wayland events.
> >
> > Wayland has "atomic commits" to windows, and arbitrary protocol
> > extensions can add arbitrary state to be tracked with it. A bit like KMS
> > properties. Even atomic commits affecting multiple windows together are
> > a thing, and they must be latched either all or none.
> >
> > So it's quite a lot of work to determine if one can allow the GPU to
> > choose the buffer it will texture from, or not.  
> 
> But how does it then help to wait on the CPU instead?

A compositor does not "wait" literally. It would only check which state
set is ready to be used, and uses the most recent set that is ready. Any
state sets that are not ready are ignored and reconsidered the next
time the compositor updates the screen.

Depending on which state sets are selected for a screen update, the
global window manager state may be updated accordingly, before the
drawing commands for the composition can be created.

> See what I'm proposing is to either render the next state of the window 
> or compose from the old state (including all atomic properties).

Yes, that's exactly how it would work. It's just that state for a
window is not an independent thing, it can affect how unrelated windows
are managed.

A simplified example would be two windows side by side where the
resizing of one causes the other to move. You can't resize the window
or move the other until the buffer with the new size is ready. Until
then the compositor uses the old state.

> E.g. what do you do if you timeout and can't have the new window content 
> on time? What's the fallback here?

As there is no wait, there is no timeout either.

If the app happens to be frozen (e.g. some weird bug in fence handling
to make it never ready, or maybe it's just bugged itself and never
drawing again), then the app is frozen, and all the rest of the desktop
continues running normally without a glitch.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  9:03                       ` Daniel Vetter
@ 2021-07-29  9:37                         ` Pekka Paalanen
  2021-07-29 12:18                           ` Daniel Vetter
  0 siblings, 1 reply; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29  9:37 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Michel Dänzer, Rob Clark, Matthew Brost, Roy Sun,
	Christian König, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 5068 bytes --]

On Thu, 29 Jul 2021 11:03:36 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
> > On 2021-07-29 9:09 a.m., Daniel Vetter wrote:  
> > > On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:  
> > >> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:  
> > >>> On 2021-07-28 3:13 p.m., Christian König wrote:  
> > >>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:  
> > >>>>> On 2021-07-28 1:36 p.m., Christian König wrote:  
> > >>>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:  
> > >>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:  
> > >>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:  
> > >>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:  
> > >>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:  
> > >>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> > >>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> > >>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> > >>>>>>>>>>> which the waiter would like to see the fence signalled.

...

> > I'm not questioning that this approach helps when there's a direct
> > chain of fences from the client to the page flip. I'm pointing out
> > there will not always be such a chain.
> > 
> >   
> > >> But maybe the solution to make this also useful for mutter  
> > 
> > It's not just mutter BTW. I understand gamescope has been doing
> > this for some time already. And there seems to be consensus among
> > developers of Wayland compositors that this is needed, so I expect
> > at least all the major compositors to do this longer term.
> > 
> >   
> > >> is to, once we have deadline support, extend it with an ioctl to
> > >> the dma-fence fd so userspace can be the one setting the
> > >> deadline.  
> > 
> > I was thinking in a similar direction.
> >   
> > > atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter
> > > the option to bail out with an old frame if it's too late?  
> > 
> > This is a bit cryptic though, can you elaborate?  
> 
> So essentially when the mutter compositor guesstimator is fairly
> confident about the next frame's composition (recall you're keeping
> track of clients to estimate their usual latency or something like
> that), then it does a TEST_ONLY commit to check it all works and prep
> the rendering, but _not_ yet fire it off.
> 
> Instead it waits until all buffers complete, and if some don't, pick
> the previous one. Which I guess in an extreme case would mean you
> need a different window tree configuration and maybe different
> TEST_ONLY check and all that, not sure how you solve that.
> 
> Anyway, in that TEST_ONLY commit my idea is that you'd also supply
> all the in-fences you expect to depend upon (maybe we need an
> additional list of in-fences for your rendering job), plus a deadline
> when you want to have them done (so that there's enough time for your
> render job still). And the kernel then calls dma_fence_set_deadline
> on all of them.
> 
> Pondering this more, maybe a separate ioctl is simpler where you just
> supply a list of in-fences and deadlines.
> 
> The real reason I want to tie this to atomic is for priviledge
> checking reasons. I don't think normal userspace should have the
> power to set arbitrary deadlines like this - at least on i915 it will
> also give you a slight priority boost and stuff like that, to make
> sure your rendering for the current frame goes in ahead of the next
> frame's prep work.
> 
> So maybe just a new ioctl that does this which is limited to the
> current kms owner (aka drm_master)?

Yeah.

Why not have a Wayland compositor *always* "set the deadlines" for the
next screen update as soon as it gets the wl_surface.commit with the
new buffer and fences (a simplified description of what is actually
necessary to take a new window state set into use)?

The Wayland client posted the frame to the compositor, so surely it
wants it ready and displayed ASAP. If we happen to have a Wayland frame
queuing extension, then also take that into account when setting the
deadline.

Then, *independently* of that, the compositor will choose which frames
it will actually use in its composition when the time comes.

No need for any KMS atomic commit fiddling, userspace just explicitly
sets the deadline on the fence and that's it. You could tie the
privilege of setting deadlines to simply holding DRM master on whatever
device? So the ioctl would need both the fence and any DRM device fd.

A rogue application opening a DRM device and becoming DRM master on it
just to be able to abuse deadlines feels both unlikely and with
insignificant consequences. It stops the obvious abuse, and if someone
actually goes the extra effort, then so what.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  9:15                           ` Pekka Paalanen
@ 2021-07-29 10:14                             ` Christian König
  2021-07-29 10:28                               ` Michel Dänzer
  2021-07-29 11:00                               ` Pekka Paalanen
  0 siblings, 2 replies; 52+ messages in thread
From: Christian König @ 2021-07-29 10:14 UTC (permalink / raw)
  To: Pekka Paalanen, Christian König
  Cc: Michel Dänzer, Rob Clark, Matthew Brost, Jack Zhang,
	Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
> [SNIP]
>> But how does it then help to wait on the CPU instead?
> A compositor does not "wait" literally. It would only check which state
> set is ready to be used, and uses the most recent set that is ready. Any
> state sets that are not ready are ignored and reconsidered the next
> time the compositor updates the screen.

Mhm, then I'm not understanding what Michel's changes are actually doing.

> Depending on which state sets are selected for a screen update, the
> global window manager state may be updated accordingly, before the
> drawing commands for the composition can be created.
>
>> See what I'm proposing is to either render the next state of the window
>> or compose from the old state (including all atomic properties).
> Yes, that's exactly how it would work. It's just that state for a
> window is not an independent thing, it can affect how unrelated windows
> are managed.
>
> A simplified example would be two windows side by side where the
> resizing of one causes the other to move. You can't resize the window
> or move the other until the buffer with the new size is ready. Until
> then the compositor uses the old state.
>
>> E.g. what do you do if you timeout and can't have the new window content
>> on time? What's the fallback here?
> As there is no wait, there is no timeout either.
>
> If the app happens to be frozen (e.g. some weird bug in fence handling
> to make it never ready, or maybe it's just bugged itself and never
> drawing again), then the app is frozen, and all the rest of the desktop
> continues running normally without a glitch.

But that is in contradict to what you told me before.

See when the window should move but fails to draw it's new content what 
happens?

Are the other windows which would be affected by the move not drawn as well?

Regards,
Christian.

>
>
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 10:14                             ` Christian König
@ 2021-07-29 10:28                               ` Michel Dänzer
  2021-07-29 11:00                               ` Pekka Paalanen
  1 sibling, 0 replies; 52+ messages in thread
From: Michel Dänzer @ 2021-07-29 10:28 UTC (permalink / raw)
  To: Christian König, Pekka Paalanen, Christian König
  Cc: Rob Clark, Matthew Brost, Jack Zhang, Gustavo Padovan, open list,
	dri-devel, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Luben Tuikov, Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

On 2021-07-29 12:14 p.m., Christian König wrote:
> Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
>> [SNIP]
>>> But how does it then help to wait on the CPU instead?
>> A compositor does not "wait" literally. It would only check which state
>> set is ready to be used, and uses the most recent set that is ready. Any
>> state sets that are not ready are ignored and reconsidered the next
>> time the compositor updates the screen.
> 
> Mhm, then I'm not understanding what Michel's changes are actually doing.

In a nutshell, my mutter MR holds back all Wayland state changes which were committed together with a new buffer (and dependent later ones) until the dma-buf file descriptors for that buffer have become readable. This is achieved by adding the fds to the main event loop (if they aren't readable already when the buffer is committed), and when they become readable, all corresponding state changes are propagated such that they will be taken into account for drawing the next frame.


>> Depending on which state sets are selected for a screen update, the
>> global window manager state may be updated accordingly, before the
>> drawing commands for the composition can be created.
>>
>>> See what I'm proposing is to either render the next state of the window
>>> or compose from the old state (including all atomic properties).
>> Yes, that's exactly how it would work. It's just that state for a
>> window is not an independent thing, it can affect how unrelated windows
>> are managed.
>>
>> A simplified example would be two windows side by side where the
>> resizing of one causes the other to move. You can't resize the window
>> or move the other until the buffer with the new size is ready. Until
>> then the compositor uses the old state.
>>
>>> E.g. what do you do if you timeout and can't have the new window content
>>> on time? What's the fallback here?
>> As there is no wait, there is no timeout either.
>>
>> If the app happens to be frozen (e.g. some weird bug in fence handling
>> to make it never ready, or maybe it's just bugged itself and never
>> drawing again), then the app is frozen, and all the rest of the desktop
>> continues running normally without a glitch.
> 
> But that is in contradict to what you told me before.
> 
> See when the window should move but fails to draw it's new content what happens?
> 
> Are the other windows which would be affected by the move not drawn as well?

Basically, the compositor draws its output as if the new buffer and all connected Wayland state changes had not been committed yet.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 10:14                             ` Christian König
  2021-07-29 10:28                               ` Michel Dänzer
@ 2021-07-29 11:00                               ` Pekka Paalanen
  2021-07-29 11:43                                 ` Christian König
  1 sibling, 1 reply; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29 11:00 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Michel Dänzer, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 2088 bytes --]

On Thu, 29 Jul 2021 12:14:18 +0200
Christian König <ckoenig.leichtzumerken@gmail.com> wrote:

> Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
> >
> > If the app happens to be frozen (e.g. some weird bug in fence handling
> > to make it never ready, or maybe it's just bugged itself and never
> > drawing again), then the app is frozen, and all the rest of the desktop
> > continues running normally without a glitch.  
> 
> But that is in contradict to what you told me before.
> 
> See when the window should move but fails to draw it's new content what 
> happens?
> 
> Are the other windows which would be affected by the move not drawn as well?

No, all the other windows will continue behaving normally just like
they always did. It's just that one frozen window there that won't
update; it won't resize, so there is no reason to move that other
window either.

Everything continues as if the frozen window never even sent anything
to the compositor after its last good update.

We have a principle in Wayland: the compositor cannot afford to wait
for clients, the desktop as a whole must remain responsive. So there is
always a backup plan even for cases where the compositor expects the
client to change something. For resizes, in a floating-window manager
it's easy: just let things continue as they are; in a tiling window
manager they may have a timeout after which... whatever is appropriate.

Another example: If a compositor decides to make a window maximized, it
tells the client the new size and state it must have. Until the client
acks that specific state change, the compositor will continue managing
that window as if nothing changed. Given the asynchronous nature of
Wayland, the client might even continue submitting updates
non-maximized for a while, and that will go through as if the
compositor didn't ask for maximized. But at some point the client acks
the window state change, and from that point on if it doesn't behave
like maximized state requires, it will get a protocol error and be
disconnected.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 11:00                               ` Pekka Paalanen
@ 2021-07-29 11:43                                 ` Christian König
  2021-07-29 12:49                                   ` Pekka Paalanen
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-29 11:43 UTC (permalink / raw)
  To: Pekka Paalanen, Christian König
  Cc: Michel Dänzer, Rob Clark, Matthew Brost, Jack Zhang,
	Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
> On Thu, 29 Jul 2021 12:14:18 +0200
> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
>>> If the app happens to be frozen (e.g. some weird bug in fence handling
>>> to make it never ready, or maybe it's just bugged itself and never
>>> drawing again), then the app is frozen, and all the rest of the desktop
>>> continues running normally without a glitch.
>> But that is in contradict to what you told me before.
>>
>> See when the window should move but fails to draw it's new content what
>> happens?
>>
>> Are the other windows which would be affected by the move not drawn as well?
> No, all the other windows will continue behaving normally just like
> they always did. It's just that one frozen window there that won't
> update; it won't resize, so there is no reason to move that other
> window either.
>
> Everything continues as if the frozen window never even sent anything
> to the compositor after its last good update.
>
> We have a principle in Wayland: the compositor cannot afford to wait
> for clients, the desktop as a whole must remain responsive. So there is
> always a backup plan even for cases where the compositor expects the
> client to change something. For resizes, in a floating-window manager
> it's easy: just let things continue as they are; in a tiling window
> manager they may have a timeout after which... whatever is appropriate.
>
> Another example: If a compositor decides to make a window maximized, it
> tells the client the new size and state it must have. Until the client
> acks that specific state change, the compositor will continue managing
> that window as if nothing changed. Given the asynchronous nature of
> Wayland, the client might even continue submitting updates
> non-maximized for a while, and that will go through as if the
> compositor didn't ask for maximized. But at some point the client acks
> the window state change, and from that point on if it doesn't behave
> like maximized state requires, it will get a protocol error and be
> disconnected.

Yeah and all of this totally makes sense.

The problem is that not forwarding the state changes to the hardware 
adds a CPU round trip which is rather bad for the driver design, 
especially power management.

E.g. when you submit the work only after everybody becomes available the 
GPU becomes idle in between and might think it is a good idea to reduce 
clocks etc...

How about doing this instead:

1. As soon as at least one window has new committed state you submit the 
rendering.
         As far as I understand it that is already the case anyway.

2. Before starting rendering the hardware driver waits with a timeout 
for all the window content to become ready.
         The timeout is picked in a way so that we at least reach a 
reasonable fps. Making that depending on the maximum refresh rate of the 
display device sounds reasonable to me.

3a. If all windows become ready on time we draw the frame as expected.
3b. If a timeout occurs the compositor is noted of this and goes on a 
fallback path rendering only the content known to be ready.

4. Repeat.

This way we should be able to handle all use cases gracefully, e.g. the 
hanging client won't cause the server to block and when everything 
becomes ready on time we just render as expected.

Regards,
Christian.

>
>
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29  9:37                         ` Pekka Paalanen
@ 2021-07-29 12:18                           ` Daniel Vetter
  2021-07-29 12:59                             ` Pekka Paalanen
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2021-07-29 12:18 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Daniel Vetter, Michel Dänzer, Rob Clark, Matthew Brost,
	Roy Sun, Christian König, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jul 29, 2021 at 12:37:32PM +0300, Pekka Paalanen wrote:
> On Thu, 29 Jul 2021 11:03:36 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
> > > On 2021-07-29 9:09 a.m., Daniel Vetter wrote:  
> > > > On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:  
> > > >> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:  
> > > >>> On 2021-07-28 3:13 p.m., Christian König wrote:  
> > > >>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:  
> > > >>>>> On 2021-07-28 1:36 p.m., Christian König wrote:  
> > > >>>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:  
> > > >>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:  
> > > >>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:  
> > > >>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:  
> > > >>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:  
> > > >>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> > > >>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> > > >>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> > > >>>>>>>>>>> which the waiter would like to see the fence signalled.
> 
> ...
> 
> > > I'm not questioning that this approach helps when there's a direct
> > > chain of fences from the client to the page flip. I'm pointing out
> > > there will not always be such a chain.
> > > 
> > >   
> > > >> But maybe the solution to make this also useful for mutter  
> > > 
> > > It's not just mutter BTW. I understand gamescope has been doing
> > > this for some time already. And there seems to be consensus among
> > > developers of Wayland compositors that this is needed, so I expect
> > > at least all the major compositors to do this longer term.
> > > 
> > >   
> > > >> is to, once we have deadline support, extend it with an ioctl to
> > > >> the dma-fence fd so userspace can be the one setting the
> > > >> deadline.  
> > > 
> > > I was thinking in a similar direction.
> > >   
> > > > atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter
> > > > the option to bail out with an old frame if it's too late?  
> > > 
> > > This is a bit cryptic though, can you elaborate?  
> > 
> > So essentially when the mutter compositor guesstimator is fairly
> > confident about the next frame's composition (recall you're keeping
> > track of clients to estimate their usual latency or something like
> > that), then it does a TEST_ONLY commit to check it all works and prep
> > the rendering, but _not_ yet fire it off.
> > 
> > Instead it waits until all buffers complete, and if some don't, pick
> > the previous one. Which I guess in an extreme case would mean you
> > need a different window tree configuration and maybe different
> > TEST_ONLY check and all that, not sure how you solve that.
> > 
> > Anyway, in that TEST_ONLY commit my idea is that you'd also supply
> > all the in-fences you expect to depend upon (maybe we need an
> > additional list of in-fences for your rendering job), plus a deadline
> > when you want to have them done (so that there's enough time for your
> > render job still). And the kernel then calls dma_fence_set_deadline
> > on all of them.
> > 
> > Pondering this more, maybe a separate ioctl is simpler where you just
> > supply a list of in-fences and deadlines.
> > 
> > The real reason I want to tie this to atomic is for priviledge
> > checking reasons. I don't think normal userspace should have the
> > power to set arbitrary deadlines like this - at least on i915 it will
> > also give you a slight priority boost and stuff like that, to make
> > sure your rendering for the current frame goes in ahead of the next
> > frame's prep work.
> > 
> > So maybe just a new ioctl that does this which is limited to the
> > current kms owner (aka drm_master)?
> 
> Yeah.
> 
> Why not have a Wayland compositor *always* "set the deadlines" for the
> next screen update as soon as it gets the wl_surface.commit with the
> new buffer and fences (a simplified description of what is actually
> necessary to take a new window state set into use)?

Yeah taht's probably best. And if the frame is scheduled (video at 24fps
or whatever) you can also immediately set the deadline for that too, just
a few frames later. Always minus compositor budget taken into account.

> The Wayland client posted the frame to the compositor, so surely it
> wants it ready and displayed ASAP. If we happen to have a Wayland frame
> queuing extension, then also take that into account when setting the
> deadline.
> 
> Then, *independently* of that, the compositor will choose which frames
> it will actually use in its composition when the time comes.
> 
> No need for any KMS atomic commit fiddling, userspace just explicitly
> sets the deadline on the fence and that's it. You could tie the
> privilege of setting deadlines to simply holding DRM master on whatever
> device? So the ioctl would need both the fence and any DRM device fd.

Yeah tying that up with atomic doesn't make sense.

> A rogue application opening a DRM device and becoming DRM master on it
> just to be able to abuse deadlines feels both unlikely and with
> insignificant consequences. It stops the obvious abuse, and if someone
> actually goes the extra effort, then so what.

With logind you can't become drm master just for lolz anymore, so I'm not
worried about that. On such systems only logind has the rights to access
the primary node, everyone doing headless goes through the render node.

So just limiting the deadline ioctl to current kms owner is imo perfectly
good enough for a security model.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 11:43                                 ` Christian König
@ 2021-07-29 12:49                                   ` Pekka Paalanen
  2021-07-29 13:41                                     ` Christian König
  0 siblings, 1 reply; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29 12:49 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Michel Dänzer, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 5389 bytes --]

On Thu, 29 Jul 2021 13:43:20 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
> > On Thu, 29 Jul 2021 12:14:18 +0200
> > Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> >  
> >> Am 29.07.21 um 11:15 schrieb Pekka Paalanen:  
> >>> If the app happens to be frozen (e.g. some weird bug in fence handling
> >>> to make it never ready, or maybe it's just bugged itself and never
> >>> drawing again), then the app is frozen, and all the rest of the desktop
> >>> continues running normally without a glitch.  
> >> But that is in contradict to what you told me before.
> >>
> >> See when the window should move but fails to draw it's new content what
> >> happens?
> >>
> >> Are the other windows which would be affected by the move not drawn as well?  
> > No, all the other windows will continue behaving normally just like
> > they always did. It's just that one frozen window there that won't
> > update; it won't resize, so there is no reason to move that other
> > window either.
> >
> > Everything continues as if the frozen window never even sent anything
> > to the compositor after its last good update.
> >
> > We have a principle in Wayland: the compositor cannot afford to wait
> > for clients, the desktop as a whole must remain responsive. So there is
> > always a backup plan even for cases where the compositor expects the
> > client to change something. For resizes, in a floating-window manager
> > it's easy: just let things continue as they are; in a tiling window
> > manager they may have a timeout after which... whatever is appropriate.
> >
> > Another example: If a compositor decides to make a window maximized, it
> > tells the client the new size and state it must have. Until the client
> > acks that specific state change, the compositor will continue managing
> > that window as if nothing changed. Given the asynchronous nature of
> > Wayland, the client might even continue submitting updates
> > non-maximized for a while, and that will go through as if the
> > compositor didn't ask for maximized. But at some point the client acks
> > the window state change, and from that point on if it doesn't behave
> > like maximized state requires, it will get a protocol error and be
> > disconnected.  
> 
> Yeah and all of this totally makes sense.
> 
> The problem is that not forwarding the state changes to the hardware 
> adds a CPU round trip which is rather bad for the driver design, 
> especially power management.
> 
> E.g. when you submit the work only after everybody becomes available the 
> GPU becomes idle in between and might think it is a good idea to reduce 
> clocks etc...

Everybody does not need to be available. The compositor can submit its
work anyway, it just uses old state for some of the windows.

But if everybody happens to be ready before the compositor repaints,
then the GPU will be idle anyway, whether the compositor looked at the
buffer readyness at all or not.

Given that Wayland clients are not expected (but can if they want) to
draw again until the frame callback which ensures that their previous
frame is definitely going to be used on screen, this idling of GPU
might happen regularly with well-behaved clients I guess?

The aim is that co-operative clients never draw a frame that will only
get discarded.

> How about doing this instead:
> 
> 1. As soon as at least one window has new committed state you submit the 
> rendering.
>          As far as I understand it that is already the case anyway.

At least Weston does not work like that. Doing that means that the
first client to send a new frame will lock all other client updates out
of that update cycle.

Hence, a compositor usually waits until some point before the target
vblank before it starts the repaint, which locks the window state in
place for the frame.

Any client update could contain window state changes that prevents the
GPU from choosing the content buffer to use.

> 2. Before starting rendering the hardware driver waits with a timeout 
> for all the window content to become ready.
>          The timeout is picked in a way so that we at least reach a 
> reasonable fps. Making that depending on the maximum refresh rate of the 
> display device sounds reasonable to me.
> 
> 3a. If all windows become ready on time we draw the frame as expected.
> 3b. If a timeout occurs the compositor is noted of this and goes on a 
> fallback path rendering only the content known to be ready.

Sounds like the fallback path, where the compositor's rendering is
already late, would need to re-do all the rendering with an extremely
tight schedule just before the KMS submission deadline. IOW, when
you're not going to make it in time, you have to do even more work and
ping-pong even more between CPU and GPU after being a bit late already.
Is that really a good idea?

It also means the compositor cannot submit the KMS atomic commit until
the GPU is done or timed out the compositing job, which is another
GPU-CPU ping-pong.

> 4. Repeat.
> 
> This way we should be able to handle all use cases gracefully, e.g. the 
> hanging client won't cause the server to block and when everything 
> becomes ready on time we just render as expected.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 12:18                           ` Daniel Vetter
@ 2021-07-29 12:59                             ` Pekka Paalanen
  2021-07-29 14:05                               ` Daniel Vetter
  0 siblings, 1 reply; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29 12:59 UTC (permalink / raw)
  To: Daniel Vetter, Simon Ser
  Cc: Michel Dänzer, Rob Clark, Matthew Brost, Roy Sun,
	Christian König, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 7080 bytes --]

On Thu, 29 Jul 2021 14:18:29 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Thu, Jul 29, 2021 at 12:37:32PM +0300, Pekka Paalanen wrote:
> > On Thu, 29 Jul 2021 11:03:36 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:  
> > > > On 2021-07-29 9:09 a.m., Daniel Vetter wrote:    
> > > > > On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:    
> > > > >> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:    
> > > > >>> On 2021-07-28 3:13 p.m., Christian König wrote:    
> > > > >>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:    
> > > > >>>>> On 2021-07-28 1:36 p.m., Christian König wrote:    
> > > > >>>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:    
> > > > >>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:    
> > > > >>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:    
> > > > >>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:    
> > > > >>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:    
> > > > >>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> > > > >>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> > > > >>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> > > > >>>>>>>>>>> which the waiter would like to see the fence signalled.  
> > 
> > ...
> >   
> > > > I'm not questioning that this approach helps when there's a direct
> > > > chain of fences from the client to the page flip. I'm pointing out
> > > > there will not always be such a chain.
> > > > 
> > > >     
> > > > >> But maybe the solution to make this also useful for mutter    
> > > > 
> > > > It's not just mutter BTW. I understand gamescope has been doing
> > > > this for some time already. And there seems to be consensus among
> > > > developers of Wayland compositors that this is needed, so I expect
> > > > at least all the major compositors to do this longer term.
> > > > 
> > > >     
> > > > >> is to, once we have deadline support, extend it with an ioctl to
> > > > >> the dma-fence fd so userspace can be the one setting the
> > > > >> deadline.    
> > > > 
> > > > I was thinking in a similar direction.
> > > >     
> > > > > atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter
> > > > > the option to bail out with an old frame if it's too late?    
> > > > 
> > > > This is a bit cryptic though, can you elaborate?    
> > > 
> > > So essentially when the mutter compositor guesstimator is fairly
> > > confident about the next frame's composition (recall you're keeping
> > > track of clients to estimate their usual latency or something like
> > > that), then it does a TEST_ONLY commit to check it all works and prep
> > > the rendering, but _not_ yet fire it off.
> > > 
> > > Instead it waits until all buffers complete, and if some don't, pick
> > > the previous one. Which I guess in an extreme case would mean you
> > > need a different window tree configuration and maybe different
> > > TEST_ONLY check and all that, not sure how you solve that.
> > > 
> > > Anyway, in that TEST_ONLY commit my idea is that you'd also supply
> > > all the in-fences you expect to depend upon (maybe we need an
> > > additional list of in-fences for your rendering job), plus a deadline
> > > when you want to have them done (so that there's enough time for your
> > > render job still). And the kernel then calls dma_fence_set_deadline
> > > on all of them.
> > > 
> > > Pondering this more, maybe a separate ioctl is simpler where you just
> > > supply a list of in-fences and deadlines.
> > > 
> > > The real reason I want to tie this to atomic is for priviledge
> > > checking reasons. I don't think normal userspace should have the
> > > power to set arbitrary deadlines like this - at least on i915 it will
> > > also give you a slight priority boost and stuff like that, to make
> > > sure your rendering for the current frame goes in ahead of the next
> > > frame's prep work.
> > > 
> > > So maybe just a new ioctl that does this which is limited to the
> > > current kms owner (aka drm_master)?  
> > 
> > Yeah.
> > 
> > Why not have a Wayland compositor *always* "set the deadlines" for the
> > next screen update as soon as it gets the wl_surface.commit with the
> > new buffer and fences (a simplified description of what is actually
> > necessary to take a new window state set into use)?  
> 
> Yeah taht's probably best. And if the frame is scheduled (video at 24fps
> or whatever) you can also immediately set the deadline for that too, just
> a few frames later. Always minus compositor budget taken into account.
> 
> > The Wayland client posted the frame to the compositor, so surely it
> > wants it ready and displayed ASAP. If we happen to have a Wayland frame
> > queuing extension, then also take that into account when setting the
> > deadline.
> > 
> > Then, *independently* of that, the compositor will choose which frames
> > it will actually use in its composition when the time comes.
> > 
> > No need for any KMS atomic commit fiddling, userspace just explicitly
> > sets the deadline on the fence and that's it. You could tie the
> > privilege of setting deadlines to simply holding DRM master on whatever
> > device? So the ioctl would need both the fence and any DRM device fd.  
> 
> Yeah tying that up with atomic doesn't make sense.
> 
> > A rogue application opening a DRM device and becoming DRM master on it
> > just to be able to abuse deadlines feels both unlikely and with
> > insignificant consequences. It stops the obvious abuse, and if someone
> > actually goes the extra effort, then so what.  
> 
> With logind you can't become drm master just for lolz anymore, so I'm not
> worried about that. On such systems only logind has the rights to access
> the primary node, everyone doing headless goes through the render node.

Mm, I hope the DRM leasing protocols don't rely on clients being able
to open KMS nodes anymore... they used to at some point, I think, for
the initial resource discovery before actually leasing anything.

"only logind has rights" might be a bit off still.

> So just limiting the deadline ioctl to current kms owner is imo perfectly
> good enough for a security model.

There could be multiple DRM devices. Including VKMS. Some of them not
used. The deadline setting ioctl can't guarantee the fenced buffer is
going to be used on the same DRM device the ioctl was called with. Or
used at all with KMS.

Anyway, even if that is not completely secure, I wouldn't think that
setting deadlines can do more than change GPU job priorities and power
consumption, which seem quite benign. It's enough hoops to jump through
that I think it stops everything we care to stop.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 12:49                                   ` Pekka Paalanen
@ 2021-07-29 13:41                                     ` Christian König
  2021-07-29 14:10                                       ` Pekka Paalanen
  0 siblings, 1 reply; 52+ messages in thread
From: Christian König @ 2021-07-29 13:41 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Christian König, Michel Dänzer, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 29.07.21 um 14:49 schrieb Pekka Paalanen:
> On Thu, 29 Jul 2021 13:43:20 +0200
> Christian König <christian.koenig@amd.com> wrote:
>
>> Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
>>> On Thu, 29 Jul 2021 12:14:18 +0200
>>> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>>>   
>>>> Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
>>>>> If the app happens to be frozen (e.g. some weird bug in fence handling
>>>>> to make it never ready, or maybe it's just bugged itself and never
>>>>> drawing again), then the app is frozen, and all the rest of the desktop
>>>>> continues running normally without a glitch.
>>>> But that is in contradict to what you told me before.
>>>>
>>>> See when the window should move but fails to draw it's new content what
>>>> happens?
>>>>
>>>> Are the other windows which would be affected by the move not drawn as well?
>>> No, all the other windows will continue behaving normally just like
>>> they always did. It's just that one frozen window there that won't
>>> update; it won't resize, so there is no reason to move that other
>>> window either.
>>>
>>> Everything continues as if the frozen window never even sent anything
>>> to the compositor after its last good update.
>>>
>>> We have a principle in Wayland: the compositor cannot afford to wait
>>> for clients, the desktop as a whole must remain responsive. So there is
>>> always a backup plan even for cases where the compositor expects the
>>> client to change something. For resizes, in a floating-window manager
>>> it's easy: just let things continue as they are; in a tiling window
>>> manager they may have a timeout after which... whatever is appropriate.
>>>
>>> Another example: If a compositor decides to make a window maximized, it
>>> tells the client the new size and state it must have. Until the client
>>> acks that specific state change, the compositor will continue managing
>>> that window as if nothing changed. Given the asynchronous nature of
>>> Wayland, the client might even continue submitting updates
>>> non-maximized for a while, and that will go through as if the
>>> compositor didn't ask for maximized. But at some point the client acks
>>> the window state change, and from that point on if it doesn't behave
>>> like maximized state requires, it will get a protocol error and be
>>> disconnected.
>> Yeah and all of this totally makes sense.
>>
>> The problem is that not forwarding the state changes to the hardware
>> adds a CPU round trip which is rather bad for the driver design,
>> especially power management.
>>
>> E.g. when you submit the work only after everybody becomes available the
>> GPU becomes idle in between and might think it is a good idea to reduce
>> clocks etc...
> Everybody does not need to be available. The compositor can submit its
> work anyway, it just uses old state for some of the windows.
>
> But if everybody happens to be ready before the compositor repaints,
> then the GPU will be idle anyway, whether the compositor looked at the
> buffer readyness at all or not.

Ok good point.

> Given that Wayland clients are not expected (but can if they want) to
> draw again until the frame callback which ensures that their previous
> frame is definitely going to be used on screen, this idling of GPU
> might happen regularly with well-behaved clients I guess?

Maybe I wasn't clear what the problem is: That the GPU goes idle is 
expected, but it should it should just not go idle multiple times.

> The aim is that co-operative clients never draw a frame that will only
> get discarded.
>
>> How about doing this instead:
>>
>> 1. As soon as at least one window has new committed state you submit the
>> rendering.
>>           As far as I understand it that is already the case anyway.
> At least Weston does not work like that. Doing that means that the
> first client to send a new frame will lock all other client updates out
> of that update cycle.
>
> Hence, a compositor usually waits until some point before the target
> vblank before it starts the repaint, which locks the window state in
> place for the frame.

Uff, that means we have lost this game anyway.

See you get the best energy utilization if the hardware wakes up as few 
as possible and still get everything done.

So what happens in the case you describes is that the hardware comes out 
of sleep at least twice, once for the client and once for the server 
which is rather sub optimal.

> Any client update could contain window state changes that prevents the
> GPU from choosing the content buffer to use.
>
>> 2. Before starting rendering the hardware driver waits with a timeout
>> for all the window content to become ready.
>>           The timeout is picked in a way so that we at least reach a
>> reasonable fps. Making that depending on the maximum refresh rate of the
>> display device sounds reasonable to me.
>>
>> 3a. If all windows become ready on time we draw the frame as expected.
>> 3b. If a timeout occurs the compositor is noted of this and goes on a
>> fallback path rendering only the content known to be ready.
> Sounds like the fallback path, where the compositor's rendering is
> already late, would need to re-do all the rendering with an extremely
> tight schedule just before the KMS submission deadline. IOW, when
> you're not going to make it in time, you have to do even more work and
> ping-pong even more between CPU and GPU after being a bit late already.
> Is that really a good idea?

My idea is that both the fallback path and the normal rendering are 
submitted at the same time, just with a big if/then/else around it. E.g. 
the timeout happens on the GPU hardware and not on the CPU.

But I think that stuff is just to complicated to implement.

I want to describe once more what the ideal configuration would be:
1. When you render a frame one or more clients submit jobs to the hardware.
2. Those jobs then execute on the hardware asynchronously to the CPU.
3. At the same time the CPU prepares a composition job which takes all 
the window content from clients and renders a new frame.
4. This new frame gets submitted to the hardware driver as new content 
on the screen.
5. The hardware driver waits for all the rendering to be completed and 
flips the screen.


The idea is that you have only one block of activity on the hardware, 
e.g. something like this:
_------------_______flip_-------------_____flip.....


But what happens with Wayland currently is that you end up with:
_--------_______-__flip_------------___-__flip.....


Or even worse when you have multiple clients rendering at random times:
_---_---_---____-__flip_---_---_---___-__flip.....


I'm actually not that of a power management guy, but it is rather 
obvious that this is not optimal.

Regards,
Christian.


> It also means the compositor cannot submit the KMS atomic commit until
> the GPU is done or timed out the compositing job, which is another
> GPU-CPU ping-pong.
>
>> 4. Repeat.
>>
>> This way we should be able to handle all use cases gracefully, e.g. the
>> hanging client won't cause the server to block and when everything
>> becomes ready on time we just render as expected.
>
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 12:59                             ` Pekka Paalanen
@ 2021-07-29 14:05                               ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2021-07-29 14:05 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Simon Ser, Michel Dänzer, Rob Clark, Matthew Brost, Roy Sun,
	Christian König, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Gustavo Padovan, Alex Deucher, Tian Tao, Lee Jones,
	Christian König, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jul 29, 2021 at 3:00 PM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> On Thu, 29 Jul 2021 14:18:29 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Thu, Jul 29, 2021 at 12:37:32PM +0300, Pekka Paalanen wrote:
> > > On Thu, 29 Jul 2021 11:03:36 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
> > > > > On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
> > > > > > On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
> > > > > >> On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer <michel@daenzer.net> wrote:
> > > > > >>> On 2021-07-28 3:13 p.m., Christian König wrote:
> > > > > >>>> Am 28.07.21 um 15:08 schrieb Michel Dänzer:
> > > > > >>>>> On 2021-07-28 1:36 p.m., Christian König wrote:
> > > > > >>>>>> Am 27.07.21 um 17:37 schrieb Rob Clark:
> > > > > >>>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer <michel@daenzer.net> wrote:
> > > > > >>>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote:
> > > > > >>>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer <michel@daenzer.net> wrote:
> > > > > >>>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote:
> > > > > >>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism
> > > > > >>>>>>>>>>> when, for example, vblank deadlines are missed.  Instead of a boost
> > > > > >>>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by
> > > > > >>>>>>>>>>> which the waiter would like to see the fence signalled.
> > >
> > > ...
> > >
> > > > > I'm not questioning that this approach helps when there's a direct
> > > > > chain of fences from the client to the page flip. I'm pointing out
> > > > > there will not always be such a chain.
> > > > >
> > > > >
> > > > > >> But maybe the solution to make this also useful for mutter
> > > > >
> > > > > It's not just mutter BTW. I understand gamescope has been doing
> > > > > this for some time already. And there seems to be consensus among
> > > > > developers of Wayland compositors that this is needed, so I expect
> > > > > at least all the major compositors to do this longer term.
> > > > >
> > > > >
> > > > > >> is to, once we have deadline support, extend it with an ioctl to
> > > > > >> the dma-fence fd so userspace can be the one setting the
> > > > > >> deadline.
> > > > >
> > > > > I was thinking in a similar direction.
> > > > >
> > > > > > atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter
> > > > > > the option to bail out with an old frame if it's too late?
> > > > >
> > > > > This is a bit cryptic though, can you elaborate?
> > > >
> > > > So essentially when the mutter compositor guesstimator is fairly
> > > > confident about the next frame's composition (recall you're keeping
> > > > track of clients to estimate their usual latency or something like
> > > > that), then it does a TEST_ONLY commit to check it all works and prep
> > > > the rendering, but _not_ yet fire it off.
> > > >
> > > > Instead it waits until all buffers complete, and if some don't, pick
> > > > the previous one. Which I guess in an extreme case would mean you
> > > > need a different window tree configuration and maybe different
> > > > TEST_ONLY check and all that, not sure how you solve that.
> > > >
> > > > Anyway, in that TEST_ONLY commit my idea is that you'd also supply
> > > > all the in-fences you expect to depend upon (maybe we need an
> > > > additional list of in-fences for your rendering job), plus a deadline
> > > > when you want to have them done (so that there's enough time for your
> > > > render job still). And the kernel then calls dma_fence_set_deadline
> > > > on all of them.
> > > >
> > > > Pondering this more, maybe a separate ioctl is simpler where you just
> > > > supply a list of in-fences and deadlines.
> > > >
> > > > The real reason I want to tie this to atomic is for priviledge
> > > > checking reasons. I don't think normal userspace should have the
> > > > power to set arbitrary deadlines like this - at least on i915 it will
> > > > also give you a slight priority boost and stuff like that, to make
> > > > sure your rendering for the current frame goes in ahead of the next
> > > > frame's prep work.
> > > >
> > > > So maybe just a new ioctl that does this which is limited to the
> > > > current kms owner (aka drm_master)?
> > >
> > > Yeah.
> > >
> > > Why not have a Wayland compositor *always* "set the deadlines" for the
> > > next screen update as soon as it gets the wl_surface.commit with the
> > > new buffer and fences (a simplified description of what is actually
> > > necessary to take a new window state set into use)?
> >
> > Yeah taht's probably best. And if the frame is scheduled (video at 24fps
> > or whatever) you can also immediately set the deadline for that too, just
> > a few frames later. Always minus compositor budget taken into account.
> >
> > > The Wayland client posted the frame to the compositor, so surely it
> > > wants it ready and displayed ASAP. If we happen to have a Wayland frame
> > > queuing extension, then also take that into account when setting the
> > > deadline.
> > >
> > > Then, *independently* of that, the compositor will choose which frames
> > > it will actually use in its composition when the time comes.
> > >
> > > No need for any KMS atomic commit fiddling, userspace just explicitly
> > > sets the deadline on the fence and that's it. You could tie the
> > > privilege of setting deadlines to simply holding DRM master on whatever
> > > device? So the ioctl would need both the fence and any DRM device fd.
> >
> > Yeah tying that up with atomic doesn't make sense.
> >
> > > A rogue application opening a DRM device and becoming DRM master on it
> > > just to be able to abuse deadlines feels both unlikely and with
> > > insignificant consequences. It stops the obvious abuse, and if someone
> > > actually goes the extra effort, then so what.
> >
> > With logind you can't become drm master just for lolz anymore, so I'm not
> > worried about that. On such systems only logind has the rights to access
> > the primary node, everyone doing headless goes through the render node.
>
> Mm, I hope the DRM leasing protocols don't rely on clients being able
> to open KMS nodes anymore... they used to at some point, I think, for
> the initial resource discovery before actually leasing anything.

Yeah I thought that was fixed with additional xrandr/wayland discovery
protocols. It doesn't work anyone on systems with display/render
split. I think that was just to get it all going.

> "only logind has rights" might be a bit off still.
>
> > So just limiting the deadline ioctl to current kms owner is imo perfectly
> > good enough for a security model.
>
> There could be multiple DRM devices. Including VKMS. Some of them not
> used. The deadline setting ioctl can't guarantee the fenced buffer is
> going to be used on the same DRM device the ioctl was called with. Or
> used at all with KMS.

That's not a problem, fence deadline interface is cross-driver.

> Anyway, even if that is not completely secure, I wouldn't think that
> setting deadlines can do more than change GPU job priorities and power
> consumption, which seem quite benign. It's enough hoops to jump through
> that I think it stops everything we care to stop.

Yeah. Plus with this patch set you can do this already, just need to
send out an atomic flip with all the fences merged together into your
in-fence slots.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 0/4] dma-fence: Deadline awareness
  2021-07-29 13:41                                     ` Christian König
@ 2021-07-29 14:10                                       ` Pekka Paalanen
  0 siblings, 0 replies; 52+ messages in thread
From: Pekka Paalanen @ 2021-07-29 14:10 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Michel Dänzer, Rob Clark,
	Matthew Brost, Jack Zhang, Gustavo Padovan, open list, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Luben Tuikov,
	Roy Sun, Alex Deucher, Tian Tao, Lee Jones,
	open list:DMA BUFFER SHARING FRAMEWORK

[-- Attachment #1: Type: text/plain, Size: 9414 bytes --]

On Thu, 29 Jul 2021 15:41:09 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 29.07.21 um 14:49 schrieb Pekka Paalanen:
> > On Thu, 29 Jul 2021 13:43:20 +0200
> > Christian König <christian.koenig@amd.com> wrote:
> >  
> >> Am 29.07.21 um 13:00 schrieb Pekka Paalanen:  
> >>> On Thu, 29 Jul 2021 12:14:18 +0200
> >>> Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>     
> >>>> Am 29.07.21 um 11:15 schrieb Pekka Paalanen:  
> >>>>> If the app happens to be frozen (e.g. some weird bug in fence handling
> >>>>> to make it never ready, or maybe it's just bugged itself and never
> >>>>> drawing again), then the app is frozen, and all the rest of the desktop
> >>>>> continues running normally without a glitch.  
> >>>> But that is in contradict to what you told me before.
> >>>>
> >>>> See when the window should move but fails to draw it's new content what
> >>>> happens?
> >>>>
> >>>> Are the other windows which would be affected by the move not drawn as well?  
> >>> No, all the other windows will continue behaving normally just like
> >>> they always did. It's just that one frozen window there that won't
> >>> update; it won't resize, so there is no reason to move that other
> >>> window either.
> >>>
> >>> Everything continues as if the frozen window never even sent anything
> >>> to the compositor after its last good update.
> >>>
> >>> We have a principle in Wayland: the compositor cannot afford to wait
> >>> for clients, the desktop as a whole must remain responsive. So there is
> >>> always a backup plan even for cases where the compositor expects the
> >>> client to change something. For resizes, in a floating-window manager
> >>> it's easy: just let things continue as they are; in a tiling window
> >>> manager they may have a timeout after which... whatever is appropriate.
> >>>
> >>> Another example: If a compositor decides to make a window maximized, it
> >>> tells the client the new size and state it must have. Until the client
> >>> acks that specific state change, the compositor will continue managing
> >>> that window as if nothing changed. Given the asynchronous nature of
> >>> Wayland, the client might even continue submitting updates
> >>> non-maximized for a while, and that will go through as if the
> >>> compositor didn't ask for maximized. But at some point the client acks
> >>> the window state change, and from that point on if it doesn't behave
> >>> like maximized state requires, it will get a protocol error and be
> >>> disconnected.  
> >> Yeah and all of this totally makes sense.
> >>
> >> The problem is that not forwarding the state changes to the hardware
> >> adds a CPU round trip which is rather bad for the driver design,
> >> especially power management.
> >>
> >> E.g. when you submit the work only after everybody becomes available the
> >> GPU becomes idle in between and might think it is a good idea to reduce
> >> clocks etc...  
> > Everybody does not need to be available. The compositor can submit its
> > work anyway, it just uses old state for some of the windows.
> >
> > But if everybody happens to be ready before the compositor repaints,
> > then the GPU will be idle anyway, whether the compositor looked at the
> > buffer readyness at all or not.  
> 
> Ok good point.
> 
> > Given that Wayland clients are not expected (but can if they want) to
> > draw again until the frame callback which ensures that their previous
> > frame is definitely going to be used on screen, this idling of GPU
> > might happen regularly with well-behaved clients I guess?  
> 
> Maybe I wasn't clear what the problem is: That the GPU goes idle is 
> expected, but it should it should just not go idle multiple times.
> 
> > The aim is that co-operative clients never draw a frame that will only
> > get discarded.
> >  
> >> How about doing this instead:
> >>
> >> 1. As soon as at least one window has new committed state you submit the
> >> rendering.
> >>           As far as I understand it that is already the case anyway.  
> > At least Weston does not work like that. Doing that means that the
> > first client to send a new frame will lock all other client updates out
> > of that update cycle.
> >
> > Hence, a compositor usually waits until some point before the target
> > vblank before it starts the repaint, which locks the window state in
> > place for the frame.  
> 
> Uff, that means we have lost this game anyway.
> 
> See you get the best energy utilization if the hardware wakes up as few 
> as possible and still get everything done.
> 
> So what happens in the case you describes is that the hardware comes out 
> of sleep at least twice, once for the client and once for the server 
> which is rather sub optimal.

I can see the point, but what do we know about its significance?

If the alternative is the first-to-win and everyone else gets postponed
by another full refresh cycle, isn't that worse? It could even cause
jitter rather than just "high" latency to screen.

Is there any approach that would not have either disadvantage?

Here is an analysis of why Weston does what it does right now (the new
algorithm):
https://ppaalanen.blogspot.com/2015/02/weston-repaint-scheduling.html


Are we talking about desktops in general, or games, or fullscreen use
case?

It's not unthinkable to have a different compositor scheduling policy
for outputs that happen have only one fullscreen window.

> > Any client update could contain window state changes that prevents the
> > GPU from choosing the content buffer to use.
> >  
> >> 2. Before starting rendering the hardware driver waits with a timeout
> >> for all the window content to become ready.
> >>           The timeout is picked in a way so that we at least reach a
> >> reasonable fps. Making that depending on the maximum refresh rate of the
> >> display device sounds reasonable to me.
> >>
> >> 3a. If all windows become ready on time we draw the frame as expected.
> >> 3b. If a timeout occurs the compositor is noted of this and goes on a
> >> fallback path rendering only the content known to be ready.  
> > Sounds like the fallback path, where the compositor's rendering is
> > already late, would need to re-do all the rendering with an extremely
> > tight schedule just before the KMS submission deadline. IOW, when
> > you're not going to make it in time, you have to do even more work and
> > ping-pong even more between CPU and GPU after being a bit late already.
> > Is that really a good idea?  
> 
> My idea is that both the fallback path and the normal rendering are 
> submitted at the same time, just with a big if/then/else around it. E.g. 
> the timeout happens on the GPU hardware and not on the CPU.

So for every refresh, the compositor needs to prepare a combinatorial
explosion number of possible compositions to be rendered?

Or do we have the assumption that everything we talk about here is
conditional to not having any window state changes other than content
change?

Remember the example where one window is pending a resize, and if/when
that happens another window needs to move.

> But I think that stuff is just to complicated to implement.
> 
> I want to describe once more what the ideal configuration would be:
> 1. When you render a frame one or more clients submit jobs to the hardware.
> 2. Those jobs then execute on the hardware asynchronously to the CPU.
> 3. At the same time the CPU prepares a composition job which takes all 
> the window content from clients and renders a new frame.
> 4. This new frame gets submitted to the hardware driver as new content 
> on the screen.
> 5. The hardware driver waits for all the rendering to be completed and 
> flips the screen.

I believe this is what happens right now, when compositors do not take
into account that client buffers might not be ready, with the problem
that any client GPU job that takes ages will stall the whole desktop's
refresh.

> 
> The idea is that you have only one block of activity on the hardware, 
> e.g. something like this:
> _------------_______flip_-------------_____flip.....
> 
> 
> But what happens with Wayland currently is that you end up with:
> _--------_______-__flip_------------___-__flip.....
> 
> 
> Or even worse when you have multiple clients rendering at random times:
> _---_---_---____-__flip_---_---_---___-__flip.....
> 
> 
> I'm actually not that of a power management guy, but it is rather 
> obvious that this is not optimal.

Possibly, but I haven't seen anyone come up with a better plan given the
constraints that Wayland window state management raises.

Seems like it all boils down to the fundamental trade-off between
latency and throughput, or latency and power efficiency.


Thanks,
pq

> 
> Regards,
> Christian.
> 
> 
> > It also means the compositor cannot submit the KMS atomic commit until
> > the GPU is done or timed out the compositing job, which is another
> > GPU-CPU ping-pong.
> >  
> >> 4. Repeat.
> >>
> >> This way we should be able to handle all use cases gracefully, e.g. the
> >> hanging client won't cause the server to block and when everything
> >> becomes ready on time we just render as expected.  
> >
> > Thanks,
> > pq  
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-29  7:03                 ` Daniel Vetter
@ 2021-07-29 15:23                   ` Rob Clark
  2021-07-29 16:18                     ` Daniel Vetter
  0 siblings, 1 reply; 52+ messages in thread
From: Rob Clark @ 2021-07-29 15:23 UTC (permalink / raw)
  To: Rob Clark, Christian König, Christian König, dri-devel,
	Matthew Brost, Rob Clark, Sumit Semwal, Gustavo Padovan,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list
  Cc: Daniel Vetter

On Thu, Jul 29, 2021 at 12:03 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jul 28, 2021 at 10:58:51AM -0700, Rob Clark wrote:
> > On Wed, Jul 28, 2021 at 10:23 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > >
> > >
> > >
> > > Am 28.07.21 um 17:15 schrieb Rob Clark:
> > > > On Wed, Jul 28, 2021 at 4:37 AM Christian König
> > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > >> Am 28.07.21 um 09:03 schrieb Christian König:
> > > >>> Am 27.07.21 um 16:25 schrieb Rob Clark:
> > > >>>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> > > >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> > > >>>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
> > > >>>>>> From: Rob Clark <robdclark@chromium.org>
> > > >>>>>>
> > > >>>>>> Add a way to hint to the fence signaler of an upcoming deadline,
> > > >>>>>> such as
> > > >>>>>> vblank, which the fence waiter would prefer not to miss. This is to
> > > >>>>>> aid
> > > >>>>>> the fence signaler in making power management decisions, like boosting
> > > >>>>>> frequency as the deadline approaches and awareness of missing
> > > >>>>>> deadlines
> > > >>>>>> so that can be factored in to the frequency scaling.
> > > >>>>>>
> > > >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > >>>>>> ---
> > > >>>>>>     drivers/dma-buf/dma-fence.c | 39
> > > >>>>>> +++++++++++++++++++++++++++++++++++++
> > > >>>>>>     include/linux/dma-fence.h   | 17 ++++++++++++++++
> > > >>>>>>     2 files changed, 56 insertions(+)
> > > >>>>>>
> > > >>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > >>>>>> index ce0f5eff575d..2e0d25ab457e 100644
> > > >>>>>> --- a/drivers/dma-buf/dma-fence.c
> > > >>>>>> +++ b/drivers/dma-buf/dma-fence.c
> > > >>>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
> > > >>>>>> **fences, uint32_t count,
> > > >>>>>>     }
> > > >>>>>>     EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > >>>>>>
> > > >>>>>> +
> > > >>>>>> +/**
> > > >>>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
> > > >>>>>> + * @fence:    the fence that is to be waited on
> > > >>>>>> + * @deadline: the time by which the waiter hopes for the fence to be
> > > >>>>>> + *            signaled
> > > >>>>>> + *
> > > >>>>>> + * Inform the fence signaler of an upcoming deadline, such as
> > > >>>>>> vblank, by
> > > >>>>>> + * which point the waiter would prefer the fence to be signaled
> > > >>>>>> by.  This
> > > >>>>>> + * is intended to give feedback to the fence signaler to aid in power
> > > >>>>>> + * management decisions, such as boosting GPU frequency if a periodic
> > > >>>>>> + * vblank deadline is approaching.
> > > >>>>>> + */
> > > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > > >>>>>> deadline)
> > > >>>>>> +{
> > > >>>>>> +     unsigned long flags;
> > > >>>>>> +
> > > >>>>>> +     if (dma_fence_is_signaled(fence))
> > > >>>>>> +             return;
> > > >>>>>> +
> > > >>>>>> +     spin_lock_irqsave(fence->lock, flags);
> > > >>>>>> +
> > > >>>>>> +     /* If we already have an earlier deadline, keep it: */
> > > >>>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> > > >>>>>> +         ktime_before(fence->deadline, deadline)) {
> > > >>>>>> +             spin_unlock_irqrestore(fence->lock, flags);
> > > >>>>>> +             return;
> > > >>>>>> +     }
> > > >>>>>> +
> > > >>>>>> +     fence->deadline = deadline;
> > > >>>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> > > >>>>>> +
> > > >>>>>> +     spin_unlock_irqrestore(fence->lock, flags);
> > > >>>>>> +
> > > >>>>>> +     if (fence->ops->set_deadline)
> > > >>>>>> +             fence->ops->set_deadline(fence, deadline);
> > > >>>>>> +}
> > > >>>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
> > > >>>>>> +
> > > >>>>>>     /**
> > > >>>>>>      * dma_fence_init - Initialize a custom fence.
> > > >>>>>>      * @fence: the fence to initialize
> > > >>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > >>>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> > > >>>>>> --- a/include/linux/dma-fence.h
> > > >>>>>> +++ b/include/linux/dma-fence.h
> > > >>>>>> @@ -88,6 +88,7 @@ struct dma_fence {
> > > >>>>>>                 /* @timestamp replaced by @rcu on
> > > >>>>>> dma_fence_release() */
> > > >>>>>>                 struct rcu_head rcu;
> > > >>>>>>         };
> > > >>>>>> +     ktime_t deadline;
> > > >>>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
> > > >>>>> the
> > > >>>>> deadline as extra field here.
> > > >>>>>
> > > >>>>> We tuned the dma_fence structure intentionally so that it is only 64
> > > >>>>> bytes.
> > > >>>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
> > > >>>>
> > > >>>> We could push the ktime_t (and timer) down into the derived fence
> > > >>>> class, but I think there is going to need to be some extra storage
> > > >>>> *somewhere*.. maybe the fence signaler could get away with just
> > > >>>> storing the nearest upcoming deadline per fence-context instead?
> > > >>> I would just push that into the driver instead.
> > > >>>
> > > >>> You most likely don't want the deadline per fence anyway in complex
> > > >>> scenarios, but rather per frame. And a frame is usually composed from
> > > >>> multiple fences.
> > > > Right, I ended up keeping track of the nearest deadline in patch 5/4
> > > > which added drm/msm support:
> > > >
> > > >    https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F447138%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cce6ace85263d448bbc9f08d951d9f06c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630819606427306%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ameszAOlClaZNeUDlYr37ZdIytVXNgiEUKuctjXLqZ0%3D&amp;reserved=0
> > > >
> > > > But if we do have the ktime_t in dma_fence in dma_fence, we can add
> > > > some checks and avoid calling back to the driver if a later deadline
> > > > is set on a fence that already has an earlier deadline.  OTOH I
> > > > suppose I can push all that back to the driver to start, and we can
> > > > revisit once we have more drivers implementing deadline support.
> > >
> > > I still think that all of this is rather specific to your use case and
> > > have strong doubt that anybody else will implement that.
> >
> > i915 does already have a similar thing in it's hand-rolled atomic
> > commit path.  So I think msm won't be the only one.  It should be also
> > useful to the other mobile GPUs with a gpu vs kms driver split,
> > although looking at the other gpu devfreq implementations, I don't
> > think they've yet gotten to this point in the fine tuning..
>
> Yeah I have a dream that maybe i915 will use the atomic commit helpers, I
> originally wrote them with i915 in mind :-) even had patches!
>
> I also think we'll need this eventually in other areas, Android also has
> some hacks like this to make sure idle->first touch doesn't suck and
> similar things.

input-boost is another thing I have on my roadmap.. part of the solution is:

    commit 9bc95570175a7fbca29d86d22c54bbf399f4ad5a
    Author:     Rob Clark <robdclark@chromium.org>
    AuthorDate: Mon Jul 26 07:46:50 2021 -0700
    Commit:     Rob Clark <robdclark@chromium.org>
    CommitDate: Tue Jul 27 17:54:36 2021 -0700

        drm/msm: Devfreq tuning

which gives the freq a bit of a nudge if the GPU has been idle for
longer than a certain threshold.

But the other part is that if the GPU has been idle for more than 66ms
(typical autosuspend delay for adreno) it will suspend.  For modern
adreno's it takes ~2ms to "boot up" the GPU from suspend.  Which is
something you want to take out of the submit/execbuf path if you are
trying to reduce input-to-pageflip latency.

We have a downstream patch that boosts the CPUs on input events (with
a cooldown period to prevent spacebar-heater) and I have been thinking
of something along those lines to trigger resuming the GPU.. it is
straightforward enough for touch based devices, but gets more
complicated with keyboard input.  In particular, some keys you want to
trigger boost on key-release.  Ie. modifier keys (ctrl/shift/alt/etc..
the "search" key on chromebooks, etc) you want to boost on
key-release, not on key-press because unless you type *really* fast
you'll be in the cooldown period when the key-release event happens.
Unfortunately the kernel doesn't really know this "policy" sort of
information about which keys should boost on press vs release.  So I
think the long-term/upstream solution is to do input-boost in
userspace.. sysfs already has all the knobs that a userspace
input-boost daemon would need to twiddle, so no real need for this to
be in the kernel.  I guess the only drawback is the sysfs knobs are a
bit less standardized on the "desktop GPUs" which don't use devfreq.

BR,
-R

> -Daniel
>
> >
> > BR,
> > -R
> >
> > > >> Thinking more about it we could probably kill the spinlock pointer and
> > > >> make the flags 32bit if we absolutely need that here.
> > > > If we had a 'struct dma_fence_context' we could push the spinlock, ops
> > > > pointer, and u64 context into that and replace with a single
> > > > dma_fence_context ptr, fwiw
> > >
> > > That won't work. We have a lot of use cases where you can't allocate
> > > memory, but must allocate a context.
> > >
> > > Christian.
> > >
> > > >
> > > > BR,
> > > > -R
> > > >
> > > >> But I still don't see the need for that, especially since most drivers
> > > >> probably won't implement it.
> > > >>
> > > >> Regards,
> > > >> Christian.
> > > >>
> > > >>> Regards,
> > > >>> Christian.
> > > >>>
> > > >>>> BR,
> > > >>>> -R
> > > >>>>
> > > >>>>> Regards,
> > > >>>>> Christian.
> > > >>>>>
> > > >>>>>>         u64 context;
> > > >>>>>>         u64 seqno;
> > > >>>>>>         unsigned long flags;
> > > >>>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> > > >>>>>>         DMA_FENCE_FLAG_SIGNALED_BIT,
> > > >>>>>>         DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > > >>>>>>         DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> > > >>>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> > > >>>>>>         DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> > > >>>>>>     };
> > > >>>>>>
> > > >>>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
> > > >>>>>>          */
> > > >>>>>>         void (*timeline_value_str)(struct dma_fence *fence,
> > > >>>>>>                                    char *str, int size);
> > > >>>>>> +
> > > >>>>>> +     /**
> > > >>>>>> +      * @set_deadline:
> > > >>>>>> +      *
> > > >>>>>> +      * Callback to allow a fence waiter to inform the fence
> > > >>>>>> signaler of an
> > > >>>>>> +      * upcoming deadline, such as vblank, by which point the
> > > >>>>>> waiter would
> > > >>>>>> +      * prefer the fence to be signaled by.  This is intended to
> > > >>>>>> give feedback
> > > >>>>>> +      * to the fence signaler to aid in power management
> > > >>>>>> decisions, such as
> > > >>>>>> +      * boosting GPU frequency.
> > > >>>>>> +      *
> > > >>>>>> +      * This callback is optional.
> > > >>>>>> +      */
> > > >>>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> > > >>>>>>     };
> > > >>>>>>
> > > >>>>>>     void dma_fence_init(struct dma_fence *fence, const struct
> > > >>>>>> dma_fence_ops *ops,
> > > >>>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
> > > >>>>>> dma_fence *fence, bool intr)
> > > >>>>>>         return ret < 0 ? ret : 0;
> > > >>>>>>     }
> > > >>>>>>
> > > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > > >>>>>> deadline);
> > > >>>>>> +
> > > >>>>>>     struct dma_fence *dma_fence_get_stub(void);
> > > >>>>>>     struct dma_fence *dma_fence_allocate_private_stub(void);
> > > >>>>>>     u64 dma_fence_context_alloc(unsigned num);
> > >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-29 15:23                   ` Rob Clark
@ 2021-07-29 16:18                     ` Daniel Vetter
  2021-07-29 17:32                       ` Rob Clark
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2021-07-29 16:18 UTC (permalink / raw)
  To: Rob Clark
  Cc: Christian König, Christian König, dri-devel,
	Matthew Brost, Rob Clark, Sumit Semwal, Gustavo Padovan,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

On Thu, Jul 29, 2021 at 5:19 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Thu, Jul 29, 2021 at 12:03 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Jul 28, 2021 at 10:58:51AM -0700, Rob Clark wrote:
> > > On Wed, Jul 28, 2021 at 10:23 AM Christian König
> > > <christian.koenig@amd.com> wrote:
> > > >
> > > >
> > > >
> > > > Am 28.07.21 um 17:15 schrieb Rob Clark:
> > > > > On Wed, Jul 28, 2021 at 4:37 AM Christian König
> > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > >> Am 28.07.21 um 09:03 schrieb Christian König:
> > > > >>> Am 27.07.21 um 16:25 schrieb Rob Clark:
> > > > >>>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> > > > >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > >>>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
> > > > >>>>>> From: Rob Clark <robdclark@chromium.org>
> > > > >>>>>>
> > > > >>>>>> Add a way to hint to the fence signaler of an upcoming deadline,
> > > > >>>>>> such as
> > > > >>>>>> vblank, which the fence waiter would prefer not to miss. This is to
> > > > >>>>>> aid
> > > > >>>>>> the fence signaler in making power management decisions, like boosting
> > > > >>>>>> frequency as the deadline approaches and awareness of missing
> > > > >>>>>> deadlines
> > > > >>>>>> so that can be factored in to the frequency scaling.
> > > > >>>>>>
> > > > >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > >>>>>> ---
> > > > >>>>>>     drivers/dma-buf/dma-fence.c | 39
> > > > >>>>>> +++++++++++++++++++++++++++++++++++++
> > > > >>>>>>     include/linux/dma-fence.h   | 17 ++++++++++++++++
> > > > >>>>>>     2 files changed, 56 insertions(+)
> > > > >>>>>>
> > > > >>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > >>>>>> index ce0f5eff575d..2e0d25ab457e 100644
> > > > >>>>>> --- a/drivers/dma-buf/dma-fence.c
> > > > >>>>>> +++ b/drivers/dma-buf/dma-fence.c
> > > > >>>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
> > > > >>>>>> **fences, uint32_t count,
> > > > >>>>>>     }
> > > > >>>>>>     EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > >>>>>>
> > > > >>>>>> +
> > > > >>>>>> +/**
> > > > >>>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
> > > > >>>>>> + * @fence:    the fence that is to be waited on
> > > > >>>>>> + * @deadline: the time by which the waiter hopes for the fence to be
> > > > >>>>>> + *            signaled
> > > > >>>>>> + *
> > > > >>>>>> + * Inform the fence signaler of an upcoming deadline, such as
> > > > >>>>>> vblank, by
> > > > >>>>>> + * which point the waiter would prefer the fence to be signaled
> > > > >>>>>> by.  This
> > > > >>>>>> + * is intended to give feedback to the fence signaler to aid in power
> > > > >>>>>> + * management decisions, such as boosting GPU frequency if a periodic
> > > > >>>>>> + * vblank deadline is approaching.
> > > > >>>>>> + */
> > > > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > > > >>>>>> deadline)
> > > > >>>>>> +{
> > > > >>>>>> +     unsigned long flags;
> > > > >>>>>> +
> > > > >>>>>> +     if (dma_fence_is_signaled(fence))
> > > > >>>>>> +             return;
> > > > >>>>>> +
> > > > >>>>>> +     spin_lock_irqsave(fence->lock, flags);
> > > > >>>>>> +
> > > > >>>>>> +     /* If we already have an earlier deadline, keep it: */
> > > > >>>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> > > > >>>>>> +         ktime_before(fence->deadline, deadline)) {
> > > > >>>>>> +             spin_unlock_irqrestore(fence->lock, flags);
> > > > >>>>>> +             return;
> > > > >>>>>> +     }
> > > > >>>>>> +
> > > > >>>>>> +     fence->deadline = deadline;
> > > > >>>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> > > > >>>>>> +
> > > > >>>>>> +     spin_unlock_irqrestore(fence->lock, flags);
> > > > >>>>>> +
> > > > >>>>>> +     if (fence->ops->set_deadline)
> > > > >>>>>> +             fence->ops->set_deadline(fence, deadline);
> > > > >>>>>> +}
> > > > >>>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
> > > > >>>>>> +
> > > > >>>>>>     /**
> > > > >>>>>>      * dma_fence_init - Initialize a custom fence.
> > > > >>>>>>      * @fence: the fence to initialize
> > > > >>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > > >>>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> > > > >>>>>> --- a/include/linux/dma-fence.h
> > > > >>>>>> +++ b/include/linux/dma-fence.h
> > > > >>>>>> @@ -88,6 +88,7 @@ struct dma_fence {
> > > > >>>>>>                 /* @timestamp replaced by @rcu on
> > > > >>>>>> dma_fence_release() */
> > > > >>>>>>                 struct rcu_head rcu;
> > > > >>>>>>         };
> > > > >>>>>> +     ktime_t deadline;
> > > > >>>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
> > > > >>>>> the
> > > > >>>>> deadline as extra field here.
> > > > >>>>>
> > > > >>>>> We tuned the dma_fence structure intentionally so that it is only 64
> > > > >>>>> bytes.
> > > > >>>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
> > > > >>>>
> > > > >>>> We could push the ktime_t (and timer) down into the derived fence
> > > > >>>> class, but I think there is going to need to be some extra storage
> > > > >>>> *somewhere*.. maybe the fence signaler could get away with just
> > > > >>>> storing the nearest upcoming deadline per fence-context instead?
> > > > >>> I would just push that into the driver instead.
> > > > >>>
> > > > >>> You most likely don't want the deadline per fence anyway in complex
> > > > >>> scenarios, but rather per frame. And a frame is usually composed from
> > > > >>> multiple fences.
> > > > > Right, I ended up keeping track of the nearest deadline in patch 5/4
> > > > > which added drm/msm support:
> > > > >
> > > > >    https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F447138%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cce6ace85263d448bbc9f08d951d9f06c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630819606427306%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ameszAOlClaZNeUDlYr37ZdIytVXNgiEUKuctjXLqZ0%3D&amp;reserved=0
> > > > >
> > > > > But if we do have the ktime_t in dma_fence in dma_fence, we can add
> > > > > some checks and avoid calling back to the driver if a later deadline
> > > > > is set on a fence that already has an earlier deadline.  OTOH I
> > > > > suppose I can push all that back to the driver to start, and we can
> > > > > revisit once we have more drivers implementing deadline support.
> > > >
> > > > I still think that all of this is rather specific to your use case and
> > > > have strong doubt that anybody else will implement that.
> > >
> > > i915 does already have a similar thing in it's hand-rolled atomic
> > > commit path.  So I think msm won't be the only one.  It should be also
> > > useful to the other mobile GPUs with a gpu vs kms driver split,
> > > although looking at the other gpu devfreq implementations, I don't
> > > think they've yet gotten to this point in the fine tuning..
> >
> > Yeah I have a dream that maybe i915 will use the atomic commit helpers, I
> > originally wrote them with i915 in mind :-) even had patches!
> >
> > I also think we'll need this eventually in other areas, Android also has
> > some hacks like this to make sure idle->first touch doesn't suck and
> > similar things.
>
> input-boost is another thing I have on my roadmap.. part of the solution is:
>
>     commit 9bc95570175a7fbca29d86d22c54bbf399f4ad5a
>     Author:     Rob Clark <robdclark@chromium.org>
>     AuthorDate: Mon Jul 26 07:46:50 2021 -0700
>     Commit:     Rob Clark <robdclark@chromium.org>
>     CommitDate: Tue Jul 27 17:54:36 2021 -0700
>
>         drm/msm: Devfreq tuning
>
> which gives the freq a bit of a nudge if the GPU has been idle for
> longer than a certain threshold.
>
> But the other part is that if the GPU has been idle for more than 66ms
> (typical autosuspend delay for adreno) it will suspend.  For modern
> adreno's it takes ~2ms to "boot up" the GPU from suspend.  Which is
> something you want to take out of the submit/execbuf path if you are
> trying to reduce input-to-pageflip latency.
>
> We have a downstream patch that boosts the CPUs on input events (with
> a cooldown period to prevent spacebar-heater) and I have been thinking
> of something along those lines to trigger resuming the GPU.. it is
> straightforward enough for touch based devices, but gets more
> complicated with keyboard input.  In particular, some keys you want to
> trigger boost on key-release.  Ie. modifier keys (ctrl/shift/alt/etc..
> the "search" key on chromebooks, etc) you want to boost on
> key-release, not on key-press because unless you type *really* fast
> you'll be in the cooldown period when the key-release event happens.
> Unfortunately the kernel doesn't really know this "policy" sort of
> information about which keys should boost on press vs release.  So I
> think the long-term/upstream solution is to do input-boost in
> userspace.. sysfs already has all the knobs that a userspace
> input-boost daemon would need to twiddle, so no real need for this to
> be in the kernel.  I guess the only drawback is the sysfs knobs are a
> bit less standardized on the "desktop GPUs" which don't use devfreq.

I think we could do a standard interface for this, either on the drm
owner/master or somewhere in sysfs. Essentially "I expect to use the
gpu for the very next frame, get it going". Across all hw there's a
lot of things we can do. I think abuse is pretty easy to prevent with
a cooldown or similar.
-Daniel

>
> BR,
> -R
>
> > -Daniel
> >
> > >
> > > BR,
> > > -R
> > >
> > > > >> Thinking more about it we could probably kill the spinlock pointer and
> > > > >> make the flags 32bit if we absolutely need that here.
> > > > > If we had a 'struct dma_fence_context' we could push the spinlock, ops
> > > > > pointer, and u64 context into that and replace with a single
> > > > > dma_fence_context ptr, fwiw
> > > >
> > > > That won't work. We have a lot of use cases where you can't allocate
> > > > memory, but must allocate a context.
> > > >
> > > > Christian.
> > > >
> > > > >
> > > > > BR,
> > > > > -R
> > > > >
> > > > >> But I still don't see the need for that, especially since most drivers
> > > > >> probably won't implement it.
> > > > >>
> > > > >> Regards,
> > > > >> Christian.
> > > > >>
> > > > >>> Regards,
> > > > >>> Christian.
> > > > >>>
> > > > >>>> BR,
> > > > >>>> -R
> > > > >>>>
> > > > >>>>> Regards,
> > > > >>>>> Christian.
> > > > >>>>>
> > > > >>>>>>         u64 context;
> > > > >>>>>>         u64 seqno;
> > > > >>>>>>         unsigned long flags;
> > > > >>>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> > > > >>>>>>         DMA_FENCE_FLAG_SIGNALED_BIT,
> > > > >>>>>>         DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > > > >>>>>>         DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> > > > >>>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> > > > >>>>>>         DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> > > > >>>>>>     };
> > > > >>>>>>
> > > > >>>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
> > > > >>>>>>          */
> > > > >>>>>>         void (*timeline_value_str)(struct dma_fence *fence,
> > > > >>>>>>                                    char *str, int size);
> > > > >>>>>> +
> > > > >>>>>> +     /**
> > > > >>>>>> +      * @set_deadline:
> > > > >>>>>> +      *
> > > > >>>>>> +      * Callback to allow a fence waiter to inform the fence
> > > > >>>>>> signaler of an
> > > > >>>>>> +      * upcoming deadline, such as vblank, by which point the
> > > > >>>>>> waiter would
> > > > >>>>>> +      * prefer the fence to be signaled by.  This is intended to
> > > > >>>>>> give feedback
> > > > >>>>>> +      * to the fence signaler to aid in power management
> > > > >>>>>> decisions, such as
> > > > >>>>>> +      * boosting GPU frequency.
> > > > >>>>>> +      *
> > > > >>>>>> +      * This callback is optional.
> > > > >>>>>> +      */
> > > > >>>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> > > > >>>>>>     };
> > > > >>>>>>
> > > > >>>>>>     void dma_fence_init(struct dma_fence *fence, const struct
> > > > >>>>>> dma_fence_ops *ops,
> > > > >>>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
> > > > >>>>>> dma_fence *fence, bool intr)
> > > > >>>>>>         return ret < 0 ? ret : 0;
> > > > >>>>>>     }
> > > > >>>>>>
> > > > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > > > >>>>>> deadline);
> > > > >>>>>> +
> > > > >>>>>>     struct dma_fence *dma_fence_get_stub(void);
> > > > >>>>>>     struct dma_fence *dma_fence_allocate_private_stub(void);
> > > > >>>>>>     u64 dma_fence_context_alloc(unsigned num);
> > > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC 1/4] dma-fence: Add deadline awareness
  2021-07-29 16:18                     ` Daniel Vetter
@ 2021-07-29 17:32                       ` Rob Clark
  0 siblings, 0 replies; 52+ messages in thread
From: Rob Clark @ 2021-07-29 17:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Christian König, dri-devel,
	Matthew Brost, Rob Clark, Sumit Semwal, Gustavo Padovan,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

On Thu, Jul 29, 2021 at 9:18 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Jul 29, 2021 at 5:19 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Thu, Jul 29, 2021 at 12:03 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Wed, Jul 28, 2021 at 10:58:51AM -0700, Rob Clark wrote:
> > > > On Wed, Jul 28, 2021 at 10:23 AM Christian König
> > > > <christian.koenig@amd.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > Am 28.07.21 um 17:15 schrieb Rob Clark:
> > > > > > On Wed, Jul 28, 2021 at 4:37 AM Christian König
> > > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > >> Am 28.07.21 um 09:03 schrieb Christian König:
> > > > > >>> Am 27.07.21 um 16:25 schrieb Rob Clark:
> > > > > >>>> On Tue, Jul 27, 2021 at 12:11 AM Christian König
> > > > > >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > >>>>> Am 27.07.21 um 01:38 schrieb Rob Clark:
> > > > > >>>>>> From: Rob Clark <robdclark@chromium.org>
> > > > > >>>>>>
> > > > > >>>>>> Add a way to hint to the fence signaler of an upcoming deadline,
> > > > > >>>>>> such as
> > > > > >>>>>> vblank, which the fence waiter would prefer not to miss. This is to
> > > > > >>>>>> aid
> > > > > >>>>>> the fence signaler in making power management decisions, like boosting
> > > > > >>>>>> frequency as the deadline approaches and awareness of missing
> > > > > >>>>>> deadlines
> > > > > >>>>>> so that can be factored in to the frequency scaling.
> > > > > >>>>>>
> > > > > >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > >>>>>> ---
> > > > > >>>>>>     drivers/dma-buf/dma-fence.c | 39
> > > > > >>>>>> +++++++++++++++++++++++++++++++++++++
> > > > > >>>>>>     include/linux/dma-fence.h   | 17 ++++++++++++++++
> > > > > >>>>>>     2 files changed, 56 insertions(+)
> > > > > >>>>>>
> > > > > >>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > > >>>>>> index ce0f5eff575d..2e0d25ab457e 100644
> > > > > >>>>>> --- a/drivers/dma-buf/dma-fence.c
> > > > > >>>>>> +++ b/drivers/dma-buf/dma-fence.c
> > > > > >>>>>> @@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence
> > > > > >>>>>> **fences, uint32_t count,
> > > > > >>>>>>     }
> > > > > >>>>>>     EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > >>>>>>
> > > > > >>>>>> +
> > > > > >>>>>> +/**
> > > > > >>>>>> + * dma_fence_set_deadline - set desired fence-wait deadline
> > > > > >>>>>> + * @fence:    the fence that is to be waited on
> > > > > >>>>>> + * @deadline: the time by which the waiter hopes for the fence to be
> > > > > >>>>>> + *            signaled
> > > > > >>>>>> + *
> > > > > >>>>>> + * Inform the fence signaler of an upcoming deadline, such as
> > > > > >>>>>> vblank, by
> > > > > >>>>>> + * which point the waiter would prefer the fence to be signaled
> > > > > >>>>>> by.  This
> > > > > >>>>>> + * is intended to give feedback to the fence signaler to aid in power
> > > > > >>>>>> + * management decisions, such as boosting GPU frequency if a periodic
> > > > > >>>>>> + * vblank deadline is approaching.
> > > > > >>>>>> + */
> > > > > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > > > > >>>>>> deadline)
> > > > > >>>>>> +{
> > > > > >>>>>> +     unsigned long flags;
> > > > > >>>>>> +
> > > > > >>>>>> +     if (dma_fence_is_signaled(fence))
> > > > > >>>>>> +             return;
> > > > > >>>>>> +
> > > > > >>>>>> +     spin_lock_irqsave(fence->lock, flags);
> > > > > >>>>>> +
> > > > > >>>>>> +     /* If we already have an earlier deadline, keep it: */
> > > > > >>>>>> +     if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
> > > > > >>>>>> +         ktime_before(fence->deadline, deadline)) {
> > > > > >>>>>> +             spin_unlock_irqrestore(fence->lock, flags);
> > > > > >>>>>> +             return;
> > > > > >>>>>> +     }
> > > > > >>>>>> +
> > > > > >>>>>> +     fence->deadline = deadline;
> > > > > >>>>>> +     set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
> > > > > >>>>>> +
> > > > > >>>>>> +     spin_unlock_irqrestore(fence->lock, flags);
> > > > > >>>>>> +
> > > > > >>>>>> +     if (fence->ops->set_deadline)
> > > > > >>>>>> +             fence->ops->set_deadline(fence, deadline);
> > > > > >>>>>> +}
> > > > > >>>>>> +EXPORT_SYMBOL(dma_fence_set_deadline);
> > > > > >>>>>> +
> > > > > >>>>>>     /**
> > > > > >>>>>>      * dma_fence_init - Initialize a custom fence.
> > > > > >>>>>>      * @fence: the fence to initialize
> > > > > >>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > > > > >>>>>> index 6ffb4b2c6371..4e6cfe4e6fbc 100644
> > > > > >>>>>> --- a/include/linux/dma-fence.h
> > > > > >>>>>> +++ b/include/linux/dma-fence.h
> > > > > >>>>>> @@ -88,6 +88,7 @@ struct dma_fence {
> > > > > >>>>>>                 /* @timestamp replaced by @rcu on
> > > > > >>>>>> dma_fence_release() */
> > > > > >>>>>>                 struct rcu_head rcu;
> > > > > >>>>>>         };
> > > > > >>>>>> +     ktime_t deadline;
> > > > > >>>>> Mhm, adding the flag sounds ok to me but I'm a bit hesitating adding
> > > > > >>>>> the
> > > > > >>>>> deadline as extra field here.
> > > > > >>>>>
> > > > > >>>>> We tuned the dma_fence structure intentionally so that it is only 64
> > > > > >>>>> bytes.
> > > > > >>>> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
> > > > > >>>>
> > > > > >>>> We could push the ktime_t (and timer) down into the derived fence
> > > > > >>>> class, but I think there is going to need to be some extra storage
> > > > > >>>> *somewhere*.. maybe the fence signaler could get away with just
> > > > > >>>> storing the nearest upcoming deadline per fence-context instead?
> > > > > >>> I would just push that into the driver instead.
> > > > > >>>
> > > > > >>> You most likely don't want the deadline per fence anyway in complex
> > > > > >>> scenarios, but rather per frame. And a frame is usually composed from
> > > > > >>> multiple fences.
> > > > > > Right, I ended up keeping track of the nearest deadline in patch 5/4
> > > > > > which added drm/msm support:
> > > > > >
> > > > > >    https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2F447138%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cce6ace85263d448bbc9f08d951d9f06c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637630819606427306%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ameszAOlClaZNeUDlYr37ZdIytVXNgiEUKuctjXLqZ0%3D&amp;reserved=0
> > > > > >
> > > > > > But if we do have the ktime_t in dma_fence in dma_fence, we can add
> > > > > > some checks and avoid calling back to the driver if a later deadline
> > > > > > is set on a fence that already has an earlier deadline.  OTOH I
> > > > > > suppose I can push all that back to the driver to start, and we can
> > > > > > revisit once we have more drivers implementing deadline support.
> > > > >
> > > > > I still think that all of this is rather specific to your use case and
> > > > > have strong doubt that anybody else will implement that.
> > > >
> > > > i915 does already have a similar thing in it's hand-rolled atomic
> > > > commit path.  So I think msm won't be the only one.  It should be also
> > > > useful to the other mobile GPUs with a gpu vs kms driver split,
> > > > although looking at the other gpu devfreq implementations, I don't
> > > > think they've yet gotten to this point in the fine tuning..
> > >
> > > Yeah I have a dream that maybe i915 will use the atomic commit helpers, I
> > > originally wrote them with i915 in mind :-) even had patches!
> > >
> > > I also think we'll need this eventually in other areas, Android also has
> > > some hacks like this to make sure idle->first touch doesn't suck and
> > > similar things.
> >
> > input-boost is another thing I have on my roadmap.. part of the solution is:
> >
> >     commit 9bc95570175a7fbca29d86d22c54bbf399f4ad5a
> >     Author:     Rob Clark <robdclark@chromium.org>
> >     AuthorDate: Mon Jul 26 07:46:50 2021 -0700
> >     Commit:     Rob Clark <robdclark@chromium.org>
> >     CommitDate: Tue Jul 27 17:54:36 2021 -0700
> >
> >         drm/msm: Devfreq tuning
> >
> > which gives the freq a bit of a nudge if the GPU has been idle for
> > longer than a certain threshold.
> >
> > But the other part is that if the GPU has been idle for more than 66ms
> > (typical autosuspend delay for adreno) it will suspend.  For modern
> > adreno's it takes ~2ms to "boot up" the GPU from suspend.  Which is
> > something you want to take out of the submit/execbuf path if you are
> > trying to reduce input-to-pageflip latency.
> >
> > We have a downstream patch that boosts the CPUs on input events (with
> > a cooldown period to prevent spacebar-heater) and I have been thinking
> > of something along those lines to trigger resuming the GPU.. it is
> > straightforward enough for touch based devices, but gets more
> > complicated with keyboard input.  In particular, some keys you want to
> > trigger boost on key-release.  Ie. modifier keys (ctrl/shift/alt/etc..
> > the "search" key on chromebooks, etc) you want to boost on
> > key-release, not on key-press because unless you type *really* fast
> > you'll be in the cooldown period when the key-release event happens.
> > Unfortunately the kernel doesn't really know this "policy" sort of
> > information about which keys should boost on press vs release.  So I
> > think the long-term/upstream solution is to do input-boost in
> > userspace.. sysfs already has all the knobs that a userspace
> > input-boost daemon would need to twiddle, so no real need for this to
> > be in the kernel.  I guess the only drawback is the sysfs knobs are a
> > bit less standardized on the "desktop GPUs" which don't use devfreq.
>
> I think we could do a standard interface for this, either on the drm
> owner/master or somewhere in sysfs. Essentially "I expect to use the
> gpu for the very next frame, get it going". Across all hw there's a
> lot of things we can do. I think abuse is pretty easy to prevent with
> a cooldown or similar.

The userspace input-boost ends up needing to be either part of the
compositor, or a privileged process, in order to sniff input events..
so I don't think the kernel needs to try to prevent abuse here (but
the userspace part defn wants a cooldown period)

BR,
-R

> -Daniel
>
> >
> > BR,
> > -R
> >
> > > -Daniel
> > >
> > > >
> > > > BR,
> > > > -R
> > > >
> > > > > >> Thinking more about it we could probably kill the spinlock pointer and
> > > > > >> make the flags 32bit if we absolutely need that here.
> > > > > > If we had a 'struct dma_fence_context' we could push the spinlock, ops
> > > > > > pointer, and u64 context into that and replace with a single
> > > > > > dma_fence_context ptr, fwiw
> > > > >
> > > > > That won't work. We have a lot of use cases where you can't allocate
> > > > > memory, but must allocate a context.
> > > > >
> > > > > Christian.
> > > > >
> > > > > >
> > > > > > BR,
> > > > > > -R
> > > > > >
> > > > > >> But I still don't see the need for that, especially since most drivers
> > > > > >> probably won't implement it.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Christian.
> > > > > >>
> > > > > >>> Regards,
> > > > > >>> Christian.
> > > > > >>>
> > > > > >>>> BR,
> > > > > >>>> -R
> > > > > >>>>
> > > > > >>>>> Regards,
> > > > > >>>>> Christian.
> > > > > >>>>>
> > > > > >>>>>>         u64 context;
> > > > > >>>>>>         u64 seqno;
> > > > > >>>>>>         unsigned long flags;
> > > > > >>>>>> @@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
> > > > > >>>>>>         DMA_FENCE_FLAG_SIGNALED_BIT,
> > > > > >>>>>>         DMA_FENCE_FLAG_TIMESTAMP_BIT,
> > > > > >>>>>>         DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> > > > > >>>>>> +     DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> > > > > >>>>>>         DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> > > > > >>>>>>     };
> > > > > >>>>>>
> > > > > >>>>>> @@ -261,6 +263,19 @@ struct dma_fence_ops {
> > > > > >>>>>>          */
> > > > > >>>>>>         void (*timeline_value_str)(struct dma_fence *fence,
> > > > > >>>>>>                                    char *str, int size);
> > > > > >>>>>> +
> > > > > >>>>>> +     /**
> > > > > >>>>>> +      * @set_deadline:
> > > > > >>>>>> +      *
> > > > > >>>>>> +      * Callback to allow a fence waiter to inform the fence
> > > > > >>>>>> signaler of an
> > > > > >>>>>> +      * upcoming deadline, such as vblank, by which point the
> > > > > >>>>>> waiter would
> > > > > >>>>>> +      * prefer the fence to be signaled by.  This is intended to
> > > > > >>>>>> give feedback
> > > > > >>>>>> +      * to the fence signaler to aid in power management
> > > > > >>>>>> decisions, such as
> > > > > >>>>>> +      * boosting GPU frequency.
> > > > > >>>>>> +      *
> > > > > >>>>>> +      * This callback is optional.
> > > > > >>>>>> +      */
> > > > > >>>>>> +     void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
> > > > > >>>>>>     };
> > > > > >>>>>>
> > > > > >>>>>>     void dma_fence_init(struct dma_fence *fence, const struct
> > > > > >>>>>> dma_fence_ops *ops,
> > > > > >>>>>> @@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct
> > > > > >>>>>> dma_fence *fence, bool intr)
> > > > > >>>>>>         return ret < 0 ? ret : 0;
> > > > > >>>>>>     }
> > > > > >>>>>>
> > > > > >>>>>> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t
> > > > > >>>>>> deadline);
> > > > > >>>>>> +
> > > > > >>>>>>     struct dma_fence *dma_fence_get_stub(void);
> > > > > >>>>>>     struct dma_fence *dma_fence_allocate_private_stub(void);
> > > > > >>>>>>     u64 dma_fence_context_alloc(unsigned num);
> > > > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2021-07-29 17:28 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-26 23:38 [RFC 0/4] dma-fence: Deadline awareness Rob Clark
2021-07-26 23:38 ` [RFC 1/4] dma-fence: Add deadline awareness Rob Clark
2021-07-27  7:11   ` Christian König
2021-07-27 14:25     ` Rob Clark
2021-07-28  7:03       ` Christian König
2021-07-28 11:37         ` Christian König
2021-07-28 15:15           ` Rob Clark
2021-07-28 17:23             ` Christian König
2021-07-28 17:58               ` Rob Clark
2021-07-29  7:03                 ` Daniel Vetter
2021-07-29 15:23                   ` Rob Clark
2021-07-29 16:18                     ` Daniel Vetter
2021-07-29 17:32                       ` Rob Clark
2021-07-26 23:38 ` [RFC 2/4] drm/vblank: Add helper to get next vblank time Rob Clark
2021-07-26 23:38 ` [RFC 3/4] drm/atomic-helper: Set fence deadline for vblank Rob Clark
2021-07-27 10:44   ` Michel Dänzer
2021-07-27 14:33     ` Rob Clark
2021-07-26 23:38 ` [RFC 4/4] drm/scheduler: Add fence deadline support Rob Clark
2021-07-26 23:51 ` [RFC 0/4] dma-fence: Deadline awareness Rob Clark
2021-07-27 14:41 ` Michel Dänzer
2021-07-27 15:12   ` Rob Clark
2021-07-27 15:19     ` Michel Dänzer
2021-07-27 15:37       ` Rob Clark
2021-07-28 11:36         ` Christian König
2021-07-28 13:08           ` Michel Dänzer
2021-07-28 13:13             ` Christian König
2021-07-28 13:24               ` Michel Dänzer
2021-07-28 13:31                 ` Christian König
2021-07-28 13:57                   ` Pekka Paalanen
2021-07-28 14:30                     ` Christian König
2021-07-29  8:08                       ` Michel Dänzer
2021-07-29  8:23                       ` Pekka Paalanen
2021-07-29  8:43                         ` Christian König
2021-07-29  9:15                           ` Pekka Paalanen
2021-07-29 10:14                             ` Christian König
2021-07-29 10:28                               ` Michel Dänzer
2021-07-29 11:00                               ` Pekka Paalanen
2021-07-29 11:43                                 ` Christian König
2021-07-29 12:49                                   ` Pekka Paalanen
2021-07-29 13:41                                     ` Christian König
2021-07-29 14:10                                       ` Pekka Paalanen
2021-07-28 15:27                     ` Rob Clark
2021-07-28 17:20                       ` Christian König
2021-07-28 15:34                 ` Rob Clark
2021-07-29  7:09                   ` Daniel Vetter
2021-07-29  8:17                     ` Michel Dänzer
2021-07-29  9:03                       ` Daniel Vetter
2021-07-29  9:37                         ` Pekka Paalanen
2021-07-29 12:18                           ` Daniel Vetter
2021-07-29 12:59                             ` Pekka Paalanen
2021-07-29 14:05                               ` Daniel Vetter
2021-07-27 21:17 ` [RFC 5/4] drm/msm: Add deadline based boost support Rob Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).