linux-media.vger.kernel.org archive mirror
* [PATCH 0/2] drm/msm: Get rid of fence allocation in job_run()
@ 2023-03-11 17:35 Rob Clark
  2023-03-11 17:35 ` [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref() Rob Clark
  2023-03-11 17:35 ` [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit Rob Clark
  0 siblings, 2 replies; 10+ messages in thread
From: Rob Clark @ 2023-03-11 17:35 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Rob Clark, Gustavo Padovan,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list,
	open list:DMA BUFFER SHARING FRAMEWORK, Sean Paul

From: Rob Clark <robdclark@chromium.org>

Inspired by https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vetter@ffwll.ch/
it seemed like a good idea to get rid of memory allocation in job_run()
by embedding the hw dma_fence in the job/submit struct.

Applies on top of https://patchwork.freedesktop.org/series/93035/ but I
can re-work it to swap the order.  I think the first patch would be
useful to amdgpu and perhaps anyone else embedding the hw_fence in the
struct containing drm_sched_job.

Rob Clark (2):
  dma-buf/dma-fence: Add dma_fence_init_noref()
  drm/msm: Embed the hw_fence in msm_gem_submit

 drivers/dma-buf/dma-fence.c          | 43 +++++++++++++++++++-------
 drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
 drivers/gpu/drm/msm/msm_fence.h      |  2 +-
 drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
 drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
 drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
 drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
 include/linux/dma-fence.h            |  2 ++
 8 files changed, 66 insertions(+), 52 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref()
  2023-03-11 17:35 [PATCH 0/2] drm/msm: Get rid of fence allocation in job_run() Rob Clark
@ 2023-03-11 17:35 ` Rob Clark
  2023-03-13  7:13   ` Christian König
  2023-03-11 17:35 ` [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit Rob Clark
  1 sibling, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-03-11 17:35 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Rob Clark, Sumit Semwal,
	Gustavo Padovan, Christian König,
	open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

From: Rob Clark <robdclark@chromium.org>

Add a way to initialize a fence without touching the refcount.  This is
useful, for example, if the fence is embedded in a drm_sched_job.  In
this case the refcount will be initialized before the job is queued.
But the seqno of the hw_fence is not known until job_run().

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/dma-buf/dma-fence.c | 43 ++++++++++++++++++++++++++++---------
 include/linux/dma-fence.h   |  2 ++
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 74e36f6d05b0..97c05a465cb4 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -989,28 +989,27 @@ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
 EXPORT_SYMBOL(dma_fence_describe);
 
 /**
- * dma_fence_init - Initialize a custom fence.
+ * dma_fence_init_noref - Initialize a custom fence without initializing refcount.
  * @fence: the fence to initialize
  * @ops: the dma_fence_ops for operations on this fence
  * @lock: the irqsafe spinlock to use for locking this fence
  * @context: the execution context this fence is run on
  * @seqno: a linear increasing sequence number for this context
  *
- * Initializes an allocated fence, the caller doesn't have to keep its
- * refcount after committing with this fence, but it will need to hold a
- * refcount again if &dma_fence_ops.enable_signaling gets called.
- *
- * context and seqno are used for easy comparison between fences, allowing
- * to check which fence is later by simply using dma_fence_later().
 + * Like &dma_fence_init but does not initialize the refcount.  Suitable
 + * for cases where the fence is embedded in another struct which has its
 + * refcount initialized before the fence is initialized, such as when
 + * embedding in a &drm_sched_job, where the job is created before knowing
 + * the seqno of the hw_fence.
  */
 void
-dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
-	       spinlock_t *lock, u64 context, u64 seqno)
+dma_fence_init_noref(struct dma_fence *fence, const struct dma_fence_ops *ops,
+		     spinlock_t *lock, u64 context, u64 seqno)
 {
 	BUG_ON(!lock);
 	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
+	BUG_ON(!kref_read(&fence->refcount));
 
-	kref_init(&fence->refcount);
 	fence->ops = ops;
 	INIT_LIST_HEAD(&fence->cb_list);
 	fence->lock = lock;
@@ -1021,4 +1020,28 @@ dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 
 	trace_dma_fence_init(fence);
 }
+EXPORT_SYMBOL(dma_fence_init_noref);
+
+/**
+ * dma_fence_init - Initialize a custom fence.
+ * @fence: the fence to initialize
+ * @ops: the dma_fence_ops for operations on this fence
+ * @lock: the irqsafe spinlock to use for locking this fence
+ * @context: the execution context this fence is run on
+ * @seqno: a linear increasing sequence number for this context
+ *
+ * Initializes an allocated fence, the caller doesn't have to keep its
+ * refcount after committing with this fence, but it will need to hold a
+ * refcount again if &dma_fence_ops.enable_signaling gets called.
+ *
+ * context and seqno are used for easy comparison between fences, allowing
+ * to check which fence is later by simply using dma_fence_later().
+ */
+void
+dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
+	       spinlock_t *lock, u64 context, u64 seqno)
+{
+	kref_init(&fence->refcount);
+	dma_fence_init_noref(fence, ops, lock, context, seqno);
+}
 EXPORT_SYMBOL(dma_fence_init);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index d54b595a0fe0..f617c78a2e0a 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -279,6 +279,8 @@ struct dma_fence_ops {
 	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
 };
 
+void dma_fence_init_noref(struct dma_fence *fence, const struct dma_fence_ops *ops,
+			  spinlock_t *lock, u64 context, u64 seqno);
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 		    spinlock_t *lock, u64 context, u64 seqno);
 
-- 
2.39.2



* [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit
  2023-03-11 17:35 [PATCH 0/2] drm/msm: Get rid of fence allocation in job_run() Rob Clark
  2023-03-11 17:35 ` [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref() Rob Clark
@ 2023-03-11 17:35 ` Rob Clark
  2023-03-13  7:19   ` Christian König
  1 sibling, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-03-11 17:35 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Rob Clark, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, David Airlie, Daniel Vetter,
	Sumit Semwal, Christian König, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

From: Rob Clark <robdclark@chromium.org>

Avoid allocating memory in job_run() by embedding the fence in the
submit object.  Since msm gpu fences are always 1:1 with msm_gem_submit
we can just use the fence's refcnt to track the submit.  And since we
can get the fence ctx from the submit we can just drop the msm_fence
struct altogether.  This uses the new dma_fence_init_noref() to deal
with the fact that the fence's refcnt is initialized when the submit is
created, long before job_run().

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
Note that this applies on top of https://patchwork.freedesktop.org/series/93035/
out of convenience for myself, but I can re-work it to go before
depending on the order that things land.

 drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
 drivers/gpu/drm/msm/msm_fence.h      |  2 +-
 drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
 drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
 drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
 drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
 6 files changed, 31 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 51b461f32103..51f9f1f0cb66 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -103,14 +103,9 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
 	spin_unlock_irqrestore(&fctx->spinlock, flags);
 }
 
-struct msm_fence {
-	struct dma_fence base;
-	struct msm_fence_context *fctx;
-};
-
-static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
+static inline struct msm_gem_submit *fence_to_submit(struct dma_fence *fence)
 {
-	return container_of(fence, struct msm_fence, base);
+	return container_of(fence, struct msm_gem_submit, hw_fence);
 }
 
 static const char *msm_fence_get_driver_name(struct dma_fence *fence)
@@ -120,20 +115,20 @@ static const char *msm_fence_get_driver_name(struct dma_fence *fence)
 
 static const char *msm_fence_get_timeline_name(struct dma_fence *fence)
 {
-	struct msm_fence *f = to_msm_fence(fence);
-	return f->fctx->name;
+	struct msm_gem_submit *submit = fence_to_submit(fence);
+	return submit->ring->fctx->name;
 }
 
 static bool msm_fence_signaled(struct dma_fence *fence)
 {
-	struct msm_fence *f = to_msm_fence(fence);
-	return msm_fence_completed(f->fctx, f->base.seqno);
+	struct msm_gem_submit *submit = fence_to_submit(fence);
+	return msm_fence_completed(submit->ring->fctx, fence->seqno);
 }
 
 static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 {
-	struct msm_fence *f = to_msm_fence(fence);
-	struct msm_fence_context *fctx = f->fctx;
+	struct msm_gem_submit *submit = fence_to_submit(fence);
+	struct msm_fence_context *fctx = submit->ring->fctx;
 	unsigned long flags;
 	ktime_t now;
 
@@ -165,26 +160,22 @@ static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 	spin_unlock_irqrestore(&fctx->spinlock, flags);
 }
 
+static void msm_fence_release(struct dma_fence *fence)
+{
+	__msm_gem_submit_destroy(fence_to_submit(fence));
+}
+
 static const struct dma_fence_ops msm_fence_ops = {
 	.get_driver_name = msm_fence_get_driver_name,
 	.get_timeline_name = msm_fence_get_timeline_name,
 	.signaled = msm_fence_signaled,
 	.set_deadline = msm_fence_set_deadline,
+	.release = msm_fence_release,
 };
 
-struct dma_fence *
-msm_fence_alloc(struct msm_fence_context *fctx)
+void
+msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f)
 {
-	struct msm_fence *f;
-
-	f = kzalloc(sizeof(*f), GFP_KERNEL);
-	if (!f)
-		return ERR_PTR(-ENOMEM);
-
-	f->fctx = fctx;
-
-	dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
-		       fctx->context, ++fctx->last_fence);
-
-	return &f->base;
+	dma_fence_init_noref(f, &msm_fence_ops, &fctx->spinlock,
+			     fctx->context, ++fctx->last_fence);
 }
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index cdaebfb94f5c..8fca37e9773b 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -81,7 +81,7 @@ void msm_fence_context_free(struct msm_fence_context *fctx);
 bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence);
 void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
 
-struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
+void msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f);
 
 static inline bool
 fence_before(uint32_t a, uint32_t b)
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index c4844cf3a585..e06afed99d5b 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -259,10 +259,10 @@ struct msm_gem_submit {
 	struct ww_acquire_ctx ticket;
 	uint32_t seqno;		/* Sequence number of the submit on the ring */
 
-	/* Hw fence, which is created when the scheduler executes the job, and
+	/* Hw fence, which is initialized when the scheduler executes the job, and
 	 * is signaled when the hw finishes (via seqno write from cmdstream)
 	 */
-	struct dma_fence *hw_fence;
+	struct dma_fence hw_fence;
 
 	/* Userspace visible fence, which is signaled by the scheduler after
 	 * the hw_fence is signaled.
@@ -309,16 +309,16 @@ static inline struct msm_gem_submit *to_msm_submit(struct drm_sched_job *job)
 	return container_of(job, struct msm_gem_submit, base);
 }
 
-void __msm_gem_submit_destroy(struct kref *kref);
+void __msm_gem_submit_destroy(struct msm_gem_submit *submit);
 
 static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
 {
-	kref_get(&submit->ref);
+	dma_fence_get(&submit->hw_fence);
 }
 
 static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
 {
-	kref_put(&submit->ref, __msm_gem_submit_destroy);
+	dma_fence_put(&submit->hw_fence);
 }
 
 void msm_submit_retire(struct msm_gem_submit *submit);
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index be4bf77103cd..522c8c82e827 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -47,7 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		return ERR_PTR(ret);
 	}
 
-	kref_init(&submit->ref);
+	kref_init(&submit->hw_fence.refcount);
 	submit->dev = dev;
 	submit->aspace = queue->ctx->aspace;
 	submit->gpu = gpu;
@@ -65,10 +65,9 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 	return submit;
 }
 
-void __msm_gem_submit_destroy(struct kref *kref)
+/* Called when the hw_fence is destroyed: */
+void __msm_gem_submit_destroy(struct msm_gem_submit *submit)
 {
-	struct msm_gem_submit *submit =
-			container_of(kref, struct msm_gem_submit, ref);
 	unsigned i;
 
 	if (submit->fence_id) {
@@ -78,7 +77,6 @@ void __msm_gem_submit_destroy(struct kref *kref)
 	}
 
 	dma_fence_put(submit->user_fence);
-	dma_fence_put(submit->hw_fence);
 
 	put_pid(submit->pid);
 	msm_submitqueue_put(submit->queue);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 380249500325..a82d11dd5fcf 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -716,7 +716,7 @@ static void retire_submits(struct msm_gpu *gpu)
 			 * been signalled, then later submits are not signalled
 			 * either, so we are also done.
 			 */
-			if (submit && dma_fence_is_signaled(submit->hw_fence)) {
+			if (submit && dma_fence_is_signaled(&submit->hw_fence)) {
 				retire_submit(gpu, ring, submit);
 			} else {
 				break;
@@ -760,7 +760,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 	msm_gpu_hw_init(gpu);
 
-	submit->seqno = submit->hw_fence->seqno;
+	submit->seqno = submit->hw_fence.seqno;
 
 	msm_rd_dump_submit(priv->rd, submit, NULL);
 
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 57a8e9564540..5c54befa2427 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -18,7 +18,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 	struct msm_gpu *gpu = submit->gpu;
 	int i;
 
-	submit->hw_fence = msm_fence_alloc(fctx);
+	msm_fence_init(fctx, &submit->hw_fence);
 
 	for (i = 0; i < submit->nr_bos; i++) {
 		struct drm_gem_object *obj = &submit->bos[i].obj->base;
@@ -37,7 +37,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 
 	mutex_unlock(&gpu->lock);
 
-	return dma_fence_get(submit->hw_fence);
+	return dma_fence_get(&submit->hw_fence);
 }
 
 static void msm_job_free(struct drm_sched_job *job)
-- 
2.39.2



* Re: [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref()
  2023-03-11 17:35 ` [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref() Rob Clark
@ 2023-03-13  7:13   ` Christian König
  2023-03-13  7:31     ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2023-03-13  7:13 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Rob Clark, Sumit Semwal,
	Gustavo Padovan, open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

Am 11.03.23 um 18:35 schrieb Rob Clark:
> From: Rob Clark <robdclark@chromium.org>
>
> Add a way to initialize a fence without touching the refcount.  This is
> useful, for example, if the fence is embedded in a drm_sched_job.  In
> this case the refcount will be initialized before the job is queued.
> But the seqno of the hw_fence is not known until job_run().
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>

Well that approach won't work. The fence can only be initialized in the 
job_run() callback, because only then can the sequence number be determined.

Regards,
Christian.

> ---
>   drivers/dma-buf/dma-fence.c | 43 ++++++++++++++++++++++++++++---------
>   include/linux/dma-fence.h   |  2 ++
>   2 files changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 74e36f6d05b0..97c05a465cb4 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -989,28 +989,27 @@ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
>   EXPORT_SYMBOL(dma_fence_describe);
>   
>   /**
> - * dma_fence_init - Initialize a custom fence.
> + * dma_fence_init_noref - Initialize a custom fence without initializing refcount.
>    * @fence: the fence to initialize
>    * @ops: the dma_fence_ops for operations on this fence
>    * @lock: the irqsafe spinlock to use for locking this fence
>    * @context: the execution context this fence is run on
>    * @seqno: a linear increasing sequence number for this context
>    *
> - * Initializes an allocated fence, the caller doesn't have to keep its
> - * refcount after committing with this fence, but it will need to hold a
> - * refcount again if &dma_fence_ops.enable_signaling gets called.
> - *
> - * context and seqno are used for easy comparison between fences, allowing
> - * to check which fence is later by simply using dma_fence_later().
> + * Like &dma_fence_init but does not initialize the refcount.  Suitable
> + * for cases where the fence is embedded in another struct which has its
> + * refcount initialized before the fence is initialized, such as when
> + * embedding in a &drm_sched_job, where the job is created before knowing
> + * the seqno of the hw_fence.
>    */
>   void
> -dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> -	       spinlock_t *lock, u64 context, u64 seqno)
> +dma_fence_init_noref(struct dma_fence *fence, const struct dma_fence_ops *ops,
> +		     spinlock_t *lock, u64 context, u64 seqno)
>   {
>   	BUG_ON(!lock);
>   	BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
> +	BUG_ON(!kref_read(&fence->refcount));
>   
> -	kref_init(&fence->refcount);
>   	fence->ops = ops;
>   	INIT_LIST_HEAD(&fence->cb_list);
>   	fence->lock = lock;
> @@ -1021,4 +1020,28 @@ dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   
>   	trace_dma_fence_init(fence);
>   }
> +EXPORT_SYMBOL(dma_fence_init_noref);
> +
> +/**
> + * dma_fence_init - Initialize a custom fence.
> + * @fence: the fence to initialize
> + * @ops: the dma_fence_ops for operations on this fence
> + * @lock: the irqsafe spinlock to use for locking this fence
> + * @context: the execution context this fence is run on
> + * @seqno: a linear increasing sequence number for this context
> + *
> + * Initializes an allocated fence, the caller doesn't have to keep its
> + * refcount after committing with this fence, but it will need to hold a
> + * refcount again if &dma_fence_ops.enable_signaling gets called.
> + *
> + * context and seqno are used for easy comparison between fences, allowing
> + * to check which fence is later by simply using dma_fence_later().
> + */
> +void
> +dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> +	       spinlock_t *lock, u64 context, u64 seqno)
> +{
> +	kref_init(&fence->refcount);
> +	dma_fence_init_noref(fence, ops, lock, context, seqno);
> +}
>   EXPORT_SYMBOL(dma_fence_init);
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index d54b595a0fe0..f617c78a2e0a 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -279,6 +279,8 @@ struct dma_fence_ops {
>   	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>   };
>   
> +void dma_fence_init_noref(struct dma_fence *fence, const struct dma_fence_ops *ops,
> +			  spinlock_t *lock, u64 context, u64 seqno);
>   void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>   		    spinlock_t *lock, u64 context, u64 seqno);
>   



* Re: [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit
  2023-03-11 17:35 ` [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit Rob Clark
@ 2023-03-13  7:19   ` Christian König
  2023-03-13 14:45     ` Rob Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2023-03-13  7:19 UTC (permalink / raw)
  To: Rob Clark, dri-devel, Tuikov, Luben
  Cc: linux-arm-msm, freedreno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, David Airlie, Daniel Vetter,
	Sumit Semwal, open list, open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

Am 11.03.23 um 18:35 schrieb Rob Clark:
> From: Rob Clark <robdclark@chromium.org>
>
> Avoid allocating memory in job_run() by embedding the fence in the
> submit object.  Since msm gpu fences are always 1:1 with msm_gem_submit
> we can just use the fence's refcnt to track the submit.  And since we
> can get the fence ctx from the submit we can just drop the msm_fence
> struct altogether.  This uses the new dma_fence_init_noref() to deal
> with the fact that the fence's refcnt is initialized when the submit is
> created, long before job_run().

Well this is a very, very bad idea; we made the same mistake with amdgpu 
as well.

It's true that you should not have any memory allocation in your run_job 
callback, but you could also just allocate the hw fence during job 
creation and initialize it later on.

I suggested embedding the fence into the job for amdgpu because some 
people insisted on re-submitting jobs during timeout and GPU reset. This 
turned into a nightmare with tons of bug fixes on top of bug fixes on 
top of bug fixes, because it messes up the job and fence lifetime as 
defined by the DRM scheduler and DMA-buf framework.

Luben is currently working on cleaning all this up.

Regards,
Christian.

>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> Note that this applies on top of https://patchwork.freedesktop.org/series/93035/
> out of convenience for myself, but I can re-work it to go before
> depending on the order that things land.
>
>   drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
>   drivers/gpu/drm/msm/msm_fence.h      |  2 +-
>   drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
>   drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
>   drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
>   drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
>   6 files changed, 31 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> index 51b461f32103..51f9f1f0cb66 100644
> --- a/drivers/gpu/drm/msm/msm_fence.c
> +++ b/drivers/gpu/drm/msm/msm_fence.c
> @@ -103,14 +103,9 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
>   	spin_unlock_irqrestore(&fctx->spinlock, flags);
>   }
>   
> -struct msm_fence {
> -	struct dma_fence base;
> -	struct msm_fence_context *fctx;
> -};
> -
> -static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
> +static inline struct msm_gem_submit *fence_to_submit(struct dma_fence *fence)
>   {
> -	return container_of(fence, struct msm_fence, base);
> +	return container_of(fence, struct msm_gem_submit, hw_fence);
>   }
>   
>   static const char *msm_fence_get_driver_name(struct dma_fence *fence)
> @@ -120,20 +115,20 @@ static const char *msm_fence_get_driver_name(struct dma_fence *fence)
>   
>   static const char *msm_fence_get_timeline_name(struct dma_fence *fence)
>   {
> -	struct msm_fence *f = to_msm_fence(fence);
> -	return f->fctx->name;
> +	struct msm_gem_submit *submit = fence_to_submit(fence);
> +	return submit->ring->fctx->name;
>   }
>   
>   static bool msm_fence_signaled(struct dma_fence *fence)
>   {
> -	struct msm_fence *f = to_msm_fence(fence);
> -	return msm_fence_completed(f->fctx, f->base.seqno);
> +	struct msm_gem_submit *submit = fence_to_submit(fence);
> +	return msm_fence_completed(submit->ring->fctx, fence->seqno);
>   }
>   
>   static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>   {
> -	struct msm_fence *f = to_msm_fence(fence);
> -	struct msm_fence_context *fctx = f->fctx;
> +	struct msm_gem_submit *submit = fence_to_submit(fence);
> +	struct msm_fence_context *fctx = submit->ring->fctx;
>   	unsigned long flags;
>   	ktime_t now;
>   
> @@ -165,26 +160,22 @@ static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>   	spin_unlock_irqrestore(&fctx->spinlock, flags);
>   }
>   
> +static void msm_fence_release(struct dma_fence *fence)
> +{
> +	__msm_gem_submit_destroy(fence_to_submit(fence));
> +}
> +
>   static const struct dma_fence_ops msm_fence_ops = {
>   	.get_driver_name = msm_fence_get_driver_name,
>   	.get_timeline_name = msm_fence_get_timeline_name,
>   	.signaled = msm_fence_signaled,
>   	.set_deadline = msm_fence_set_deadline,
> +	.release = msm_fence_release,
>   };
>   
> -struct dma_fence *
> -msm_fence_alloc(struct msm_fence_context *fctx)
> +void
> +msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f)
>   {
> -	struct msm_fence *f;
> -
> -	f = kzalloc(sizeof(*f), GFP_KERNEL);
> -	if (!f)
> -		return ERR_PTR(-ENOMEM);
> -
> -	f->fctx = fctx;
> -
> -	dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
> -		       fctx->context, ++fctx->last_fence);
> -
> -	return &f->base;
> +	dma_fence_init_noref(f, &msm_fence_ops, &fctx->spinlock,
> +			     fctx->context, ++fctx->last_fence);
>   }
> diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
> index cdaebfb94f5c..8fca37e9773b 100644
> --- a/drivers/gpu/drm/msm/msm_fence.h
> +++ b/drivers/gpu/drm/msm/msm_fence.h
> @@ -81,7 +81,7 @@ void msm_fence_context_free(struct msm_fence_context *fctx);
>   bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence);
>   void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
>   
> -struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
> +void msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f);
>   
>   static inline bool
>   fence_before(uint32_t a, uint32_t b)
> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> index c4844cf3a585..e06afed99d5b 100644
> --- a/drivers/gpu/drm/msm/msm_gem.h
> +++ b/drivers/gpu/drm/msm/msm_gem.h
> @@ -259,10 +259,10 @@ struct msm_gem_submit {
>   	struct ww_acquire_ctx ticket;
>   	uint32_t seqno;		/* Sequence number of the submit on the ring */
>   
> -	/* Hw fence, which is created when the scheduler executes the job, and
> +	/* Hw fence, which is initialized when the scheduler executes the job, and
>   	 * is signaled when the hw finishes (via seqno write from cmdstream)
>   	 */
> -	struct dma_fence *hw_fence;
> +	struct dma_fence hw_fence;
>   
>   	/* Userspace visible fence, which is signaled by the scheduler after
>   	 * the hw_fence is signaled.
> @@ -309,16 +309,16 @@ static inline struct msm_gem_submit *to_msm_submit(struct drm_sched_job *job)
>   	return container_of(job, struct msm_gem_submit, base);
>   }
>   
> -void __msm_gem_submit_destroy(struct kref *kref);
> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit);
>   
>   static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
>   {
> -	kref_get(&submit->ref);
> +	dma_fence_get(&submit->hw_fence);
>   }
>   
>   static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
>   {
> -	kref_put(&submit->ref, __msm_gem_submit_destroy);
> +	dma_fence_put(&submit->hw_fence);
>   }
>   
>   void msm_submit_retire(struct msm_gem_submit *submit);
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index be4bf77103cd..522c8c82e827 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -47,7 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>   		return ERR_PTR(ret);
>   	}
>   
> -	kref_init(&submit->ref);
> +	kref_init(&submit->hw_fence.refcount);
>   	submit->dev = dev;
>   	submit->aspace = queue->ctx->aspace;
>   	submit->gpu = gpu;
> @@ -65,10 +65,9 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>   	return submit;
>   }
>   
> -void __msm_gem_submit_destroy(struct kref *kref)
> +/* Called when the hw_fence is destroyed: */
> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit)
>   {
> -	struct msm_gem_submit *submit =
> -			container_of(kref, struct msm_gem_submit, ref);
>   	unsigned i;
>   
>   	if (submit->fence_id) {
> @@ -78,7 +77,6 @@ void __msm_gem_submit_destroy(struct kref *kref)
>   	}
>   
>   	dma_fence_put(submit->user_fence);
> -	dma_fence_put(submit->hw_fence);
>   
>   	put_pid(submit->pid);
>   	msm_submitqueue_put(submit->queue);
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index 380249500325..a82d11dd5fcf 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -716,7 +716,7 @@ static void retire_submits(struct msm_gpu *gpu)
>   			 * been signalled, then later submits are not signalled
>   			 * either, so we are also done.
>   			 */
> -			if (submit && dma_fence_is_signaled(submit->hw_fence)) {
> +			if (submit && dma_fence_is_signaled(&submit->hw_fence)) {
>   				retire_submit(gpu, ring, submit);
>   			} else {
>   				break;
> @@ -760,7 +760,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>   
>   	msm_gpu_hw_init(gpu);
>   
> -	submit->seqno = submit->hw_fence->seqno;
> +	submit->seqno = submit->hw_fence.seqno;
>   
>   	msm_rd_dump_submit(priv->rd, submit, NULL);
>   
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index 57a8e9564540..5c54befa2427 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -18,7 +18,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
>   	struct msm_gpu *gpu = submit->gpu;
>   	int i;
>   
> -	submit->hw_fence = msm_fence_alloc(fctx);
> +	msm_fence_init(fctx, &submit->hw_fence);
>   
>   	for (i = 0; i < submit->nr_bos; i++) {
>   		struct drm_gem_object *obj = &submit->bos[i].obj->base;
> @@ -37,7 +37,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
>   
>   	mutex_unlock(&gpu->lock);
>   
> -	return dma_fence_get(submit->hw_fence);
> +	return dma_fence_get(&submit->hw_fence);
>   }
>   
>   static void msm_job_free(struct drm_sched_job *job)



* Re: [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref()
  2023-03-13  7:13   ` Christian König
@ 2023-03-13  7:31     ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2023-03-13  7:31 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Rob Clark, Sumit Semwal,
	Gustavo Padovan, open list:SYNC FILE FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, open list

Am 13.03.23 um 08:13 schrieb Christian König:
> Am 11.03.23 um 18:35 schrieb Rob Clark:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> Add a way to initialize a fence without touching the refcount. This is
>> useful, for example, if the fence is embedded in a drm_sched_job.  In
>> this case the refcount will be initialized before the job is queued.
>> But the seqno of the hw_fence is not known until job_run().
>>
>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>
> Well that approach won't work. The fence can only be initialized in 
> the job_run() callback because only then the sequence number can be 
> determined.

Ah, wait a second! After reading the msm code I realized you are going 
to use this exactly the other way around from how I thought you would.

In this case it would work, but that really needs better documentation. 
And I'm pretty sure it's not a good idea for msm, but let's discuss that 
on the other patch.

Regards,
Christian.

>
> Regards,
> Christian.
>
>> ---
>>   drivers/dma-buf/dma-fence.c | 43 ++++++++++++++++++++++++++++---------
>>   include/linux/dma-fence.h   |  2 ++
>>   2 files changed, 35 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 74e36f6d05b0..97c05a465cb4 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -989,28 +989,27 @@ void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
>>   EXPORT_SYMBOL(dma_fence_describe);
>>  
>>   /**
>> - * dma_fence_init - Initialize a custom fence.
>> + * dma_fence_init_noref - Initialize a custom fence without initializing refcount.
>>    * @fence: the fence to initialize
>>    * @ops: the dma_fence_ops for operations on this fence
>>    * @lock: the irqsafe spinlock to use for locking this fence
>>    * @context: the execution context this fence is run on
>>    * @seqno: a linear increasing sequence number for this context
>>    *
>> - * Initializes an allocated fence, the caller doesn't have to keep its
>> - * refcount after committing with this fence, but it will need to hold a
>> - * refcount again if &dma_fence_ops.enable_signaling gets called.
>> - *
>> - * context and seqno are used for easy comparison between fences, allowing
>> - * to check which fence is later by simply using dma_fence_later().
>> + * Like &dma_fence_init but does not initialize the refcount.  Suitable
>> + * for cases where the fence is embedded in another struct which has its
>> + * refcount initialized before the fence is initialized.  Such as embedding
>> + * in a &drm_sched_job, where the job is created before knowing the seqno
>> + * of the hw_fence.
>>    */
>>   void
>> -dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> -           spinlock_t *lock, u64 context, u64 seqno)
>> +dma_fence_init_noref(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> +             spinlock_t *lock, u64 context, u64 seqno)
>>   {
>>       BUG_ON(!lock);
>>       BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
>> +    BUG_ON(!kref_read(&fence->refcount));
>>  
>> -    kref_init(&fence->refcount);
>>       fence->ops = ops;
>>       INIT_LIST_HEAD(&fence->cb_list);
>>       fence->lock = lock;
>> @@ -1021,4 +1020,28 @@ dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>  
>>       trace_dma_fence_init(fence);
>>   }
>> +EXPORT_SYMBOL(dma_fence_init_noref);
>> +
>> +/**
>> + * dma_fence_init - Initialize a custom fence.
>> + * @fence: the fence to initialize
>> + * @ops: the dma_fence_ops for operations on this fence
>> + * @lock: the irqsafe spinlock to use for locking this fence
>> + * @context: the execution context this fence is run on
>> + * @seqno: a linear increasing sequence number for this context
>> + *
>> + * Initializes an allocated fence, the caller doesn't have to keep its
>> + * refcount after committing with this fence, but it will need to hold a
>> + * refcount again if &dma_fence_ops.enable_signaling gets called.
>> + *
>> + * context and seqno are used for easy comparison between fences, allowing
>> + * to check which fence is later by simply using dma_fence_later().
>> + */
>> +void
>> +dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> +           spinlock_t *lock, u64 context, u64 seqno)
>> +{
>> +    kref_init(&fence->refcount);
>> +    dma_fence_init_noref(fence, ops, lock, context, seqno);
>> +}
>>   EXPORT_SYMBOL(dma_fence_init);
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index d54b595a0fe0..f617c78a2e0a 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -279,6 +279,8 @@ struct dma_fence_ops {
>>       void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>>   };
>>  
>> +void dma_fence_init_noref(struct dma_fence *fence, const struct dma_fence_ops *ops,
>> +              spinlock_t *lock, u64 context, u64 seqno);
>>   void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
>>               spinlock_t *lock, u64 context, u64 seqno);
>



* Re: [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit
  2023-03-13  7:19   ` Christian König
@ 2023-03-13 14:45     ` Rob Clark
  2023-03-13 16:15       ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-03-13 14:45 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, Tuikov, Luben, linux-arm-msm, freedreno, Rob Clark,
	Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, Sumit Semwal, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Mon, Mar 13, 2023 at 12:19 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 11.03.23 um 18:35 schrieb Rob Clark:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Avoid allocating memory in job_run() by embedding the fence in the
> > submit object.  Since msm gpu fences are always 1:1 with msm_gem_submit
> > we can just use the fence's refcnt to track the submit.  And since we
> > can get the fence ctx from the submit we can just drop the msm_fence
> > struct altogether.  This uses the new dma_fence_init_noref() to deal
> > with the fact that the fence's refcnt is initialized when the submit is
> > created, long before job_run().
>
> Well this is a very very bad idea, we made the same mistake with amdgpu
> as well.
>
> It's true that you should not have any memory allocation in your run_job
> callback, but you could also just allocate the hw fence during job
> creation and initializing it later on.
>
> I've suggested to embed the fence into the job for amdgpu because some
> people insisted on re-submitting jobs during timeout and GPU reset. This
> turned into a nightmare with tons of bug fixes on top of bug fixes on
> top of bug fixes because it messes up the job and fence lifetime as
> defined by the DRM scheduler and DMA-buf framework.
>
> Luben is currently working on cleaning all this up.

This actually shouldn't be a problem with msm, as the fence doesn't
change if there is a gpu reset.  We simply signal the fence for the
offending job, reset the GPU, and re-play the remaining in-flight jobs
(ie. things that already had their job_run() called) with the original
fences.  (We don't use gpu sched's reset/timeout handling.. when I
migrated to gpu sched I kept our existing hangcheck/recovery
mechanism.)
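The recovery flow described above can be sketched in userspace as a toy model (hypothetical stand-in structs, not the actual msm code): only the offending job's fence is signalled, and everything queued after it is re-played with its original, already-assigned fence.

```c
/* Toy sketch of msm's hangcheck/recovery flow: signal the fence of
 * the hung job only, then re-play the remaining in-flight jobs with
 * their original fences.  All names here are illustrative. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fence { bool signaled; };

struct toy_job {
	struct toy_fence hw_fence;   /* embedded, as in the msm patch */
	bool resubmitted;
};

static void gpu_recover(struct toy_job *inflight, size_t n, size_t hung)
{
	/* Signal the fence for the offending job... */
	inflight[hung].hw_fence.signaled = true;

	/* ...then re-play the jobs after it (things that already had
	 * job_run() called) with their original fences intact. */
	for (size_t i = hung + 1; i < n; i++)
		inflight[i].resubmitted = true;
}
```

The key property is that no fence changes identity across the reset, which is why the job/fence lifetime issues Christian describes for amdgpu do not arise here.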

BR,
-R

> Regards,
> Christian.
>
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > Note that this applies on top of https://patchwork.freedesktop.org/series/93035/
> > out of convenience for myself, but I can re-work it to go before
> > depending on the order that things land.
> >
> >   drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
> >   drivers/gpu/drm/msm/msm_fence.h      |  2 +-
> >   drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
> >   drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
> >   drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
> >   drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
> >   6 files changed, 31 insertions(+), 42 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index 51b461f32103..51f9f1f0cb66 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -103,14 +103,9 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> >       spin_unlock_irqrestore(&fctx->spinlock, flags);
> >   }
> >
> > -struct msm_fence {
> > -     struct dma_fence base;
> > -     struct msm_fence_context *fctx;
> > -};
> > -
> > -static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
> > +static inline struct msm_gem_submit *fence_to_submit(struct dma_fence *fence)
> >   {
> > -     return container_of(fence, struct msm_fence, base);
> > +     return container_of(fence, struct msm_gem_submit, hw_fence);
> >   }
> >
> >   static const char *msm_fence_get_driver_name(struct dma_fence *fence)
> > @@ -120,20 +115,20 @@ static const char *msm_fence_get_driver_name(struct dma_fence *fence)
> >
> >   static const char *msm_fence_get_timeline_name(struct dma_fence *fence)
> >   {
> > -     struct msm_fence *f = to_msm_fence(fence);
> > -     return f->fctx->name;
> > +     struct msm_gem_submit *submit = fence_to_submit(fence);
> > +     return submit->ring->fctx->name;
> >   }
> >
> >   static bool msm_fence_signaled(struct dma_fence *fence)
> >   {
> > -     struct msm_fence *f = to_msm_fence(fence);
> > -     return msm_fence_completed(f->fctx, f->base.seqno);
> > +     struct msm_gem_submit *submit = fence_to_submit(fence);
> > +     return msm_fence_completed(submit->ring->fctx, fence->seqno);
> >   }
> >
> >   static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> >   {
> > -     struct msm_fence *f = to_msm_fence(fence);
> > -     struct msm_fence_context *fctx = f->fctx;
> > +     struct msm_gem_submit *submit = fence_to_submit(fence);
> > +     struct msm_fence_context *fctx = submit->ring->fctx;
> >       unsigned long flags;
> >       ktime_t now;
> >
> > @@ -165,26 +160,22 @@ static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> >       spin_unlock_irqrestore(&fctx->spinlock, flags);
> >   }
> >
> > +static void msm_fence_release(struct dma_fence *fence)
> > +{
> > +     __msm_gem_submit_destroy(fence_to_submit(fence));
> > +}
> > +
> >   static const struct dma_fence_ops msm_fence_ops = {
> >       .get_driver_name = msm_fence_get_driver_name,
> >       .get_timeline_name = msm_fence_get_timeline_name,
> >       .signaled = msm_fence_signaled,
> >       .set_deadline = msm_fence_set_deadline,
> > +     .release = msm_fence_release,
> >   };
> >
> > -struct dma_fence *
> > -msm_fence_alloc(struct msm_fence_context *fctx)
> > +void
> > +msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f)
> >   {
> > -     struct msm_fence *f;
> > -
> > -     f = kzalloc(sizeof(*f), GFP_KERNEL);
> > -     if (!f)
> > -             return ERR_PTR(-ENOMEM);
> > -
> > -     f->fctx = fctx;
> > -
> > -     dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
> > -                    fctx->context, ++fctx->last_fence);
> > -
> > -     return &f->base;
> > +     dma_fence_init_noref(f, &msm_fence_ops, &fctx->spinlock,
> > +                          fctx->context, ++fctx->last_fence);
> >   }
> > diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
> > index cdaebfb94f5c..8fca37e9773b 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.h
> > +++ b/drivers/gpu/drm/msm/msm_fence.h
> > @@ -81,7 +81,7 @@ void msm_fence_context_free(struct msm_fence_context *fctx);
> >   bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence);
> >   void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
> >
> > -struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
> > +void msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f);
> >
> >   static inline bool
> >   fence_before(uint32_t a, uint32_t b)
> > diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> > index c4844cf3a585..e06afed99d5b 100644
> > --- a/drivers/gpu/drm/msm/msm_gem.h
> > +++ b/drivers/gpu/drm/msm/msm_gem.h
> > @@ -259,10 +259,10 @@ struct msm_gem_submit {
> >       struct ww_acquire_ctx ticket;
> >       uint32_t seqno;         /* Sequence number of the submit on the ring */
> >
> > -     /* Hw fence, which is created when the scheduler executes the job, and
> > +     /* Hw fence, which is initialized when the scheduler executes the job, and
> >        * is signaled when the hw finishes (via seqno write from cmdstream)
> >        */
> > -     struct dma_fence *hw_fence;
> > +     struct dma_fence hw_fence;
> >
> >       /* Userspace visible fence, which is signaled by the scheduler after
> >        * the hw_fence is signaled.
> > @@ -309,16 +309,16 @@ static inline struct msm_gem_submit *to_msm_submit(struct drm_sched_job *job)
> >       return container_of(job, struct msm_gem_submit, base);
> >   }
> >
> > -void __msm_gem_submit_destroy(struct kref *kref);
> > +void __msm_gem_submit_destroy(struct msm_gem_submit *submit);
> >
> >   static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
> >   {
> > -     kref_get(&submit->ref);
> > +     dma_fence_get(&submit->hw_fence);
> >   }
> >
> >   static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
> >   {
> > -     kref_put(&submit->ref, __msm_gem_submit_destroy);
> > +     dma_fence_put(&submit->hw_fence);
> >   }
> >
> >   void msm_submit_retire(struct msm_gem_submit *submit);
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index be4bf77103cd..522c8c82e827 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -47,7 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
> >               return ERR_PTR(ret);
> >       }
> >
> > -     kref_init(&submit->ref);
> > +     kref_init(&submit->hw_fence.refcount);
> >       submit->dev = dev;
> >       submit->aspace = queue->ctx->aspace;
> >       submit->gpu = gpu;
> > @@ -65,10 +65,9 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
> >       return submit;
> >   }
> >
> > -void __msm_gem_submit_destroy(struct kref *kref)
> > +/* Called when the hw_fence is destroyed: */
> > +void __msm_gem_submit_destroy(struct msm_gem_submit *submit)
> >   {
> > -     struct msm_gem_submit *submit =
> > -                     container_of(kref, struct msm_gem_submit, ref);
> >       unsigned i;
> >
> >       if (submit->fence_id) {
> > @@ -78,7 +77,6 @@ void __msm_gem_submit_destroy(struct kref *kref)
> >       }
> >
> >       dma_fence_put(submit->user_fence);
> > -     dma_fence_put(submit->hw_fence);
> >
> >       put_pid(submit->pid);
> >       msm_submitqueue_put(submit->queue);
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index 380249500325..a82d11dd5fcf 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -716,7 +716,7 @@ static void retire_submits(struct msm_gpu *gpu)
> >                        * been signalled, then later submits are not signalled
> >                        * either, so we are also done.
> >                        */
> > -                     if (submit && dma_fence_is_signaled(submit->hw_fence)) {
> > +                     if (submit && dma_fence_is_signaled(&submit->hw_fence)) {
> >                               retire_submit(gpu, ring, submit);
> >                       } else {
> >                               break;
> > @@ -760,7 +760,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >
> >       msm_gpu_hw_init(gpu);
> >
> > -     submit->seqno = submit->hw_fence->seqno;
> > +     submit->seqno = submit->hw_fence.seqno;
> >
> >       msm_rd_dump_submit(priv->rd, submit, NULL);
> >
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index 57a8e9564540..5c54befa2427 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -18,7 +18,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
> >       struct msm_gpu *gpu = submit->gpu;
> >       int i;
> >
> > -     submit->hw_fence = msm_fence_alloc(fctx);
> > +     msm_fence_init(fctx, &submit->hw_fence);
> >
> >       for (i = 0; i < submit->nr_bos; i++) {
> >               struct drm_gem_object *obj = &submit->bos[i].obj->base;
> > @@ -37,7 +37,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
> >
> >       mutex_unlock(&gpu->lock);
> >
> > -     return dma_fence_get(submit->hw_fence);
> > +     return dma_fence_get(&submit->hw_fence);
> >   }
> >
> >   static void msm_job_free(struct drm_sched_job *job)
>


* Re: [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit
  2023-03-13 14:45     ` Rob Clark
@ 2023-03-13 16:15       ` Christian König
  2023-03-13 16:43         ` Rob Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2023-03-13 16:15 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, Tuikov, Luben, linux-arm-msm, freedreno, Rob Clark,
	Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, Sumit Semwal, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

Am 13.03.23 um 15:45 schrieb Rob Clark:
> On Mon, Mar 13, 2023 at 12:19 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 11.03.23 um 18:35 schrieb Rob Clark:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Avoid allocating memory in job_run() by embedding the fence in the
>>> submit object.  Since msm gpu fences are always 1:1 with msm_gem_submit
>>> we can just use the fence's refcnt to track the submit.  And since we
>>> can get the fence ctx from the submit we can just drop the msm_fence
>>> struct altogether.  This uses the new dma_fence_init_noref() to deal
>>> with the fact that the fence's refcnt is initialized when the submit is
>>> created, long before job_run().
>> Well this is a very very bad idea, we made the same mistake with amdgpu
>> as well.
>>
>> It's true that you should not have any memory allocation in your run_job
>> callback, but you could also just allocate the hw fence during job
>> creation and initializing it later on.
>>
>> I've suggested to embed the fence into the job for amdgpu because some
>> people insisted of re-submitting jobs during timeout and GPU reset. This
>> turned into a nightmare with tons of bug fixes on top of bug fixes on
>> top of bug fixes because it messes up the job and fence lifetime as
>> defined by the DRM scheduler and DMA-buf framework.
>>
>> Luben is currently working on cleaning all this up.
> This actually shouldn't be a problem with msm, as the fence doesn't
> change if there is a gpu reset.  We simply signal the fence for the
> offending job, reset the GPU, and re-play the remaining in-flight jobs
> (ie. things that already had their job_run() called) with the original
> fences.  (We don't use gpu sched's reset/timeout handling.. when I
> migrated to gpu sched I kept our existing hangcheck/recovery
> mechanism.)

That sounds much saner than what we did.

So you basically need the dma_fence reference counting separate from 
initializing the other dma_fence fields?

What would happen if a dma_fence which is not completely initialized 
gets freed? E.g. because of an error?

Would it be too much to just keep the handling as it is today and only 
allocate the dma_fence without initializing it? If necessary we could 
easily add a dma_fence_is_initialized() function which checks the 
fence_ops for NULL.
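The suggested check could look roughly like the following userspace sketch (hypothetical `toy_*` names, not the real dma_fence API): `ops == NULL` marks a fence whose construction never completed, so the final put can fall back to a plain free instead of calling a release op that was never set.

```c
/* Toy sketch of "dma_fence_is_initialized()": use ops == NULL to
 * detect a half-constructed embedded fence on an error path, so the
 * last put doesn't chase an unset ->release.  Illustrative only. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_ops { void (*release)(void *owner); };

struct toy_fence {
	const struct toy_ops *ops;   /* NULL until "init" has run */
	void *owner;                 /* the struct embedding the fence */
};

static bool toy_fence_is_initialized(const struct toy_fence *f)
{
	return f->ops != NULL;
}

static void toy_fence_put_last(struct toy_fence *f)
{
	/* On the last reference: an uninitialized fence has no
	 * ->release, so free the containing object directly. */
	if (toy_fence_is_initialized(f))
		f->ops->release(f->owner);
	else
		free(f->owner);
}
```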

Thanks,
Christian.

>
> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>> Note that this applies on top of https://patchwork.freedesktop.org/series/93035/
>>> out of convenience for myself, but I can re-work it to go before
>>> depending on the order that things land.
>>>
>>>    drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
>>>    drivers/gpu/drm/msm/msm_fence.h      |  2 +-
>>>    drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
>>>    drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
>>>    drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
>>>    drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
>>>    6 files changed, 31 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
>>> index 51b461f32103..51f9f1f0cb66 100644
>>> --- a/drivers/gpu/drm/msm/msm_fence.c
>>> +++ b/drivers/gpu/drm/msm/msm_fence.c
>>> @@ -103,14 +103,9 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
>>>        spin_unlock_irqrestore(&fctx->spinlock, flags);
>>>    }
>>>
>>> -struct msm_fence {
>>> -     struct dma_fence base;
>>> -     struct msm_fence_context *fctx;
>>> -};
>>> -
>>> -static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
>>> +static inline struct msm_gem_submit *fence_to_submit(struct dma_fence *fence)
>>>    {
>>> -     return container_of(fence, struct msm_fence, base);
>>> +     return container_of(fence, struct msm_gem_submit, hw_fence);
>>>    }
>>>
>>>    static const char *msm_fence_get_driver_name(struct dma_fence *fence)
>>> @@ -120,20 +115,20 @@ static const char *msm_fence_get_driver_name(struct dma_fence *fence)
>>>
>>>    static const char *msm_fence_get_timeline_name(struct dma_fence *fence)
>>>    {
>>> -     struct msm_fence *f = to_msm_fence(fence);
>>> -     return f->fctx->name;
>>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
>>> +     return submit->ring->fctx->name;
>>>    }
>>>
>>>    static bool msm_fence_signaled(struct dma_fence *fence)
>>>    {
>>> -     struct msm_fence *f = to_msm_fence(fence);
>>> -     return msm_fence_completed(f->fctx, f->base.seqno);
>>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
>>> +     return msm_fence_completed(submit->ring->fctx, fence->seqno);
>>>    }
>>>
>>>    static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>>    {
>>> -     struct msm_fence *f = to_msm_fence(fence);
>>> -     struct msm_fence_context *fctx = f->fctx;
>>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
>>> +     struct msm_fence_context *fctx = submit->ring->fctx;
>>>        unsigned long flags;
>>>        ktime_t now;
>>>
>>> @@ -165,26 +160,22 @@ static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>>        spin_unlock_irqrestore(&fctx->spinlock, flags);
>>>    }
>>>
>>> +static void msm_fence_release(struct dma_fence *fence)
>>> +{
>>> +     __msm_gem_submit_destroy(fence_to_submit(fence));
>>> +}
>>> +
>>>    static const struct dma_fence_ops msm_fence_ops = {
>>>        .get_driver_name = msm_fence_get_driver_name,
>>>        .get_timeline_name = msm_fence_get_timeline_name,
>>>        .signaled = msm_fence_signaled,
>>>        .set_deadline = msm_fence_set_deadline,
>>> +     .release = msm_fence_release,
>>>    };
>>>
>>> -struct dma_fence *
>>> -msm_fence_alloc(struct msm_fence_context *fctx)
>>> +void
>>> +msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f)
>>>    {
>>> -     struct msm_fence *f;
>>> -
>>> -     f = kzalloc(sizeof(*f), GFP_KERNEL);
>>> -     if (!f)
>>> -             return ERR_PTR(-ENOMEM);
>>> -
>>> -     f->fctx = fctx;
>>> -
>>> -     dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
>>> -                    fctx->context, ++fctx->last_fence);
>>> -
>>> -     return &f->base;
>>> +     dma_fence_init_noref(f, &msm_fence_ops, &fctx->spinlock,
>>> +                          fctx->context, ++fctx->last_fence);
>>>    }
>>> diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
>>> index cdaebfb94f5c..8fca37e9773b 100644
>>> --- a/drivers/gpu/drm/msm/msm_fence.h
>>> +++ b/drivers/gpu/drm/msm/msm_fence.h
>>> @@ -81,7 +81,7 @@ void msm_fence_context_free(struct msm_fence_context *fctx);
>>>    bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence);
>>>    void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
>>>
>>> -struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
>>> +void msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f);
>>>
>>>    static inline bool
>>>    fence_before(uint32_t a, uint32_t b)
>>> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
>>> index c4844cf3a585..e06afed99d5b 100644
>>> --- a/drivers/gpu/drm/msm/msm_gem.h
>>> +++ b/drivers/gpu/drm/msm/msm_gem.h
>>> @@ -259,10 +259,10 @@ struct msm_gem_submit {
>>>        struct ww_acquire_ctx ticket;
>>>        uint32_t seqno;         /* Sequence number of the submit on the ring */
>>>
>>> -     /* Hw fence, which is created when the scheduler executes the job, and
>>> +     /* Hw fence, which is initialized when the scheduler executes the job, and
>>>         * is signaled when the hw finishes (via seqno write from cmdstream)
>>>         */
>>> -     struct dma_fence *hw_fence;
>>> +     struct dma_fence hw_fence;
>>>
>>>        /* Userspace visible fence, which is signaled by the scheduler after
>>>         * the hw_fence is signaled.
>>> @@ -309,16 +309,16 @@ static inline struct msm_gem_submit *to_msm_submit(struct drm_sched_job *job)
>>>        return container_of(job, struct msm_gem_submit, base);
>>>    }
>>>
>>> -void __msm_gem_submit_destroy(struct kref *kref);
>>> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit);
>>>
>>>    static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
>>>    {
>>> -     kref_get(&submit->ref);
>>> +     dma_fence_get(&submit->hw_fence);
>>>    }
>>>
>>>    static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
>>>    {
>>> -     kref_put(&submit->ref, __msm_gem_submit_destroy);
>>> +     dma_fence_put(&submit->hw_fence);
>>>    }
>>>
>>>    void msm_submit_retire(struct msm_gem_submit *submit);
>>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
>>> index be4bf77103cd..522c8c82e827 100644
>>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
>>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
>>> @@ -47,7 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>>>                return ERR_PTR(ret);
>>>        }
>>>
>>> -     kref_init(&submit->ref);
>>> +     kref_init(&submit->hw_fence.refcount);
>>>        submit->dev = dev;
>>>        submit->aspace = queue->ctx->aspace;
>>>        submit->gpu = gpu;
>>> @@ -65,10 +65,9 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>>>        return submit;
>>>    }
>>>
>>> -void __msm_gem_submit_destroy(struct kref *kref)
>>> +/* Called when the hw_fence is destroyed: */
>>> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit)
>>>    {
>>> -     struct msm_gem_submit *submit =
>>> -                     container_of(kref, struct msm_gem_submit, ref);
>>>        unsigned i;
>>>
>>>        if (submit->fence_id) {
>>> @@ -78,7 +77,6 @@ void __msm_gem_submit_destroy(struct kref *kref)
>>>        }
>>>
>>>        dma_fence_put(submit->user_fence);
>>> -     dma_fence_put(submit->hw_fence);
>>>
>>>        put_pid(submit->pid);
>>>        msm_submitqueue_put(submit->queue);
>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
>>> index 380249500325..a82d11dd5fcf 100644
>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>> @@ -716,7 +716,7 @@ static void retire_submits(struct msm_gpu *gpu)
>>>                         * been signalled, then later submits are not signalled
>>>                         * either, so we are also done.
>>>                         */
>>> -                     if (submit && dma_fence_is_signaled(submit->hw_fence)) {
>>> +                     if (submit && dma_fence_is_signaled(&submit->hw_fence)) {
>>>                                retire_submit(gpu, ring, submit);
>>>                        } else {
>>>                                break;
>>> @@ -760,7 +760,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>>>
>>>        msm_gpu_hw_init(gpu);
>>>
>>> -     submit->seqno = submit->hw_fence->seqno;
>>> +     submit->seqno = submit->hw_fence.seqno;
>>>
>>>        msm_rd_dump_submit(priv->rd, submit, NULL);
>>>
>>> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
>>> index 57a8e9564540..5c54befa2427 100644
>>> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
>>> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
>>> @@ -18,7 +18,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
>>>        struct msm_gpu *gpu = submit->gpu;
>>>        int i;
>>>
>>> -     submit->hw_fence = msm_fence_alloc(fctx);
>>> +     msm_fence_init(fctx, &submit->hw_fence);
>>>
>>>        for (i = 0; i < submit->nr_bos; i++) {
>>>                struct drm_gem_object *obj = &submit->bos[i].obj->base;
>>> @@ -37,7 +37,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
>>>
>>>        mutex_unlock(&gpu->lock);
>>>
>>> -     return dma_fence_get(submit->hw_fence);
>>> +     return dma_fence_get(&submit->hw_fence);
>>>    }
>>>
>>>    static void msm_job_free(struct drm_sched_job *job)



* Re: [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit
  2023-03-13 16:15       ` Christian König
@ 2023-03-13 16:43         ` Rob Clark
  2023-03-13 19:21           ` [Linaro-mm-sig] " Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-03-13 16:43 UTC (permalink / raw)
  To: Christian König
  Cc: dri-devel, Tuikov, Luben, linux-arm-msm, freedreno, Rob Clark,
	Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, Sumit Semwal, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Mon, Mar 13, 2023 at 9:15 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 13.03.23 um 15:45 schrieb Rob Clark:
> > On Mon, Mar 13, 2023 at 12:19 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 11.03.23 um 18:35 schrieb Rob Clark:
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Avoid allocating memory in job_run() by embedding the fence in the
> >>> submit object.  Since msm gpu fences are always 1:1 with msm_gem_submit
> >>> we can just use the fence's refcnt to track the submit.  And since we
> >>> can get the fence ctx from the submit we can just drop the msm_fence
> >>> struct altogether.  This uses the new dma_fence_init_noref() to deal
> >>> with the fact that the fence's refcnt is initialized when the submit is
> >>> created, long before job_run().
> >> Well this is a very very bad idea, we made the same mistake with amdgpu
> >> as well.
> >>
> >> It's true that you should not have any memory allocation in your run_job
> >> callback, but you could also just allocate the hw fence during job
> >> creation and initializing it later on.
> >>
> >> I've suggested to embed the fence into the job for amdgpu because some
> >> people insisted of re-submitting jobs during timeout and GPU reset. This
> >> turned into a nightmare with tons of bug fixes on top of bug fixes on
> >> top of bug fixes because it messes up the job and fence lifetime as
> >> defined by the DRM scheduler and DMA-buf framework.
> >>
> >> Luben is currently working on cleaning all this up.
> > This actually shouldn't be a problem with msm, as the fence doesn't
> > change if there is a gpu reset.  We simply signal the fence for the
> > offending job, reset the GPU, and re-play the remaining in-flight jobs
> > (ie. things that already had their job_run() called) with the original
> > fences.  (We don't use gpu sched's reset/timeout handling.. when I
> > migrated to gpu sched I kept our existing hangcheck/recovery
> > mechanism.)
>
> That sounds much saner than what we did.
>
> So you basically need the dma_fence reference counting separate from
> initializing the other dma_fence fields?

yeah, that was the idea

> What would happen if a dma_fence which is not completely initialized
> gets freed? E.g. because of an error?

hmm, yes, this would be a problem since ops->release is not set yet,
and I'm relying on that to free the submit
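The lifetime hazard described here can be modeled in a few lines of userspace C (all names hypothetical — this is a sketch of the problem under discussion, not the real dma_fence API): the refcount is live from submit creation, but ops->release only exists once the fence is initialized, so dropping the last reference on an error path before job_run() has nowhere to go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the lifetime problem; none of these names
 * are the actual kernel API. */

struct model_fence;

struct model_fence_ops {
	void (*release)(struct model_fence *f);
};

struct model_fence {
	int refcount;                      /* live from submit creation */
	const struct model_fence_ops *ops; /* NULL until "job_run" fills it in */
};

/* Drop a reference.  Returns true if the release callback ran. */
static bool model_fence_put(struct model_fence *f)
{
	if (--f->refcount > 0)
		return false;
	if (!f->ops || !f->ops->release)
		return false;	/* no release hook yet: the submit leaks */
	f->ops->release(f);
	return true;
}

static int model_release_count;

static void model_release(struct model_fence *f)
{
	(void)f;
	model_release_count++;	/* stands in for __msm_gem_submit_destroy() */
}

static const struct model_fence_ops model_ops = {
	.release = model_release,
};
```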

> Would it be to much to just keep the handling as it is today and only
> allocate the dma_fence without initializing it? If necessary we could
> easily add a dma_fence_is_initialized() function which checks the
> fence_ops for NULL.

Yeah, that would also be possible

I guess we could split creation of the fence (initializing ops,
refcount) and "arming" it later when the seqno is known?  But maybe
that is going to too great lengths to avoid a separate allocation.
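A hedged sketch of what that create/arm split might look like (no such API exists in dma_fence today; the names are invented): creation makes the refcount valid immediately so error paths can free safely, and arming later fills in the seqno exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-step fence setup; layout and names are illustrative. */

struct arm_fence {
	int refcount;
	bool armed;
	uint64_t context;
	uint64_t seqno;		/* unknown until "arm" time */
};

/* Step 1, at submit creation: refcount (and, in the real thing, ops)
 * become valid right away, so an early free has a working release path. */
static void arm_fence_create(struct arm_fence *f, uint64_t context)
{
	f->refcount = 1;
	f->armed = false;
	f->context = context;
	f->seqno = 0;
}

/* Step 2, at job_run(): assign the seqno.  Arming the same fence twice
 * is the misuse the thread warns about, so reject it outright. */
static bool arm_fence_arm(struct arm_fence *f, uint64_t seqno)
{
	if (f->armed)
		return false;	/* refuse a second arm */
	f->armed = true;
	f->seqno = seqno;
	return true;
}
```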

BR,
-R

>
> Thanks,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>> ---
> >>> Note that this applies on top of https://patchwork.freedesktop.org/series/93035/
> >>> out of convenience for myself, but I can re-work it to go before
> >>> depending on the order that things land.
> >>>
> >>>    drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
> >>>    drivers/gpu/drm/msm/msm_fence.h      |  2 +-
> >>>    drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
> >>>    drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
> >>>    drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
> >>>    drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
> >>>    6 files changed, 31 insertions(+), 42 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> >>> index 51b461f32103..51f9f1f0cb66 100644
> >>> --- a/drivers/gpu/drm/msm/msm_fence.c
> >>> +++ b/drivers/gpu/drm/msm/msm_fence.c
> >>> @@ -103,14 +103,9 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> >>>        spin_unlock_irqrestore(&fctx->spinlock, flags);
> >>>    }
> >>>
> >>> -struct msm_fence {
> >>> -     struct dma_fence base;
> >>> -     struct msm_fence_context *fctx;
> >>> -};
> >>> -
> >>> -static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
> >>> +static inline struct msm_gem_submit *fence_to_submit(struct dma_fence *fence)
> >>>    {
> >>> -     return container_of(fence, struct msm_fence, base);
> >>> +     return container_of(fence, struct msm_gem_submit, hw_fence);
> >>>    }
> >>>
> >>>    static const char *msm_fence_get_driver_name(struct dma_fence *fence)
> >>> @@ -120,20 +115,20 @@ static const char *msm_fence_get_driver_name(struct dma_fence *fence)
> >>>
> >>>    static const char *msm_fence_get_timeline_name(struct dma_fence *fence)
> >>>    {
> >>> -     struct msm_fence *f = to_msm_fence(fence);
> >>> -     return f->fctx->name;
> >>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
> >>> +     return submit->ring->fctx->name;
> >>>    }
> >>>
> >>>    static bool msm_fence_signaled(struct dma_fence *fence)
> >>>    {
> >>> -     struct msm_fence *f = to_msm_fence(fence);
> >>> -     return msm_fence_completed(f->fctx, f->base.seqno);
> >>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
> >>> +     return msm_fence_completed(submit->ring->fctx, fence->seqno);
> >>>    }
> >>>
> >>>    static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> >>>    {
> >>> -     struct msm_fence *f = to_msm_fence(fence);
> >>> -     struct msm_fence_context *fctx = f->fctx;
> >>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
> >>> +     struct msm_fence_context *fctx = submit->ring->fctx;
> >>>        unsigned long flags;
> >>>        ktime_t now;
> >>>
> >>> @@ -165,26 +160,22 @@ static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> >>>        spin_unlock_irqrestore(&fctx->spinlock, flags);
> >>>    }
> >>>
> >>> +static void msm_fence_release(struct dma_fence *fence)
> >>> +{
> >>> +     __msm_gem_submit_destroy(fence_to_submit(fence));
> >>> +}
> >>> +
> >>>    static const struct dma_fence_ops msm_fence_ops = {
> >>>        .get_driver_name = msm_fence_get_driver_name,
> >>>        .get_timeline_name = msm_fence_get_timeline_name,
> >>>        .signaled = msm_fence_signaled,
> >>>        .set_deadline = msm_fence_set_deadline,
> >>> +     .release = msm_fence_release,
> >>>    };
> >>>
> >>> -struct dma_fence *
> >>> -msm_fence_alloc(struct msm_fence_context *fctx)
> >>> +void
> >>> +msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f)
> >>>    {
> >>> -     struct msm_fence *f;
> >>> -
> >>> -     f = kzalloc(sizeof(*f), GFP_KERNEL);
> >>> -     if (!f)
> >>> -             return ERR_PTR(-ENOMEM);
> >>> -
> >>> -     f->fctx = fctx;
> >>> -
> >>> -     dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
> >>> -                    fctx->context, ++fctx->last_fence);
> >>> -
> >>> -     return &f->base;
> >>> +     dma_fence_init_noref(f, &msm_fence_ops, &fctx->spinlock,
> >>> +                          fctx->context, ++fctx->last_fence);
> >>>    }
> >>> diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
> >>> index cdaebfb94f5c..8fca37e9773b 100644
> >>> --- a/drivers/gpu/drm/msm/msm_fence.h
> >>> +++ b/drivers/gpu/drm/msm/msm_fence.h
> >>> @@ -81,7 +81,7 @@ void msm_fence_context_free(struct msm_fence_context *fctx);
> >>>    bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence);
> >>>    void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
> >>>
> >>> -struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
> >>> +void msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f);
> >>>
> >>>    static inline bool
> >>>    fence_before(uint32_t a, uint32_t b)
> >>> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> >>> index c4844cf3a585..e06afed99d5b 100644
> >>> --- a/drivers/gpu/drm/msm/msm_gem.h
> >>> +++ b/drivers/gpu/drm/msm/msm_gem.h
> >>> @@ -259,10 +259,10 @@ struct msm_gem_submit {
> >>>        struct ww_acquire_ctx ticket;
> >>>        uint32_t seqno;         /* Sequence number of the submit on the ring */
> >>>
> >>> -     /* Hw fence, which is created when the scheduler executes the job, and
> >>> +     /* Hw fence, which is initialized when the scheduler executes the job, and
> >>>         * is signaled when the hw finishes (via seqno write from cmdstream)
> >>>         */
> >>> -     struct dma_fence *hw_fence;
> >>> +     struct dma_fence hw_fence;
> >>>
> >>>        /* Userspace visible fence, which is signaled by the scheduler after
> >>>         * the hw_fence is signaled.
> >>> @@ -309,16 +309,16 @@ static inline struct msm_gem_submit *to_msm_submit(struct drm_sched_job *job)
> >>>        return container_of(job, struct msm_gem_submit, base);
> >>>    }
> >>>
> >>> -void __msm_gem_submit_destroy(struct kref *kref);
> >>> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit);
> >>>
> >>>    static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
> >>>    {
> >>> -     kref_get(&submit->ref);
> >>> +     dma_fence_get(&submit->hw_fence);
> >>>    }
> >>>
> >>>    static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
> >>>    {
> >>> -     kref_put(&submit->ref, __msm_gem_submit_destroy);
> >>> +     dma_fence_put(&submit->hw_fence);
> >>>    }
> >>>
> >>>    void msm_submit_retire(struct msm_gem_submit *submit);
> >>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> >>> index be4bf77103cd..522c8c82e827 100644
> >>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> >>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> >>> @@ -47,7 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
> >>>                return ERR_PTR(ret);
> >>>        }
> >>>
> >>> -     kref_init(&submit->ref);
> >>> +     kref_init(&submit->hw_fence.refcount);
> >>>        submit->dev = dev;
> >>>        submit->aspace = queue->ctx->aspace;
> >>>        submit->gpu = gpu;
> >>> @@ -65,10 +65,9 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
> >>>        return submit;
> >>>    }
> >>>
> >>> -void __msm_gem_submit_destroy(struct kref *kref)
> >>> +/* Called when the hw_fence is destroyed: */
> >>> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit)
> >>>    {
> >>> -     struct msm_gem_submit *submit =
> >>> -                     container_of(kref, struct msm_gem_submit, ref);
> >>>        unsigned i;
> >>>
> >>>        if (submit->fence_id) {
> >>> @@ -78,7 +77,6 @@ void __msm_gem_submit_destroy(struct kref *kref)
> >>>        }
> >>>
> >>>        dma_fence_put(submit->user_fence);
> >>> -     dma_fence_put(submit->hw_fence);
> >>>
> >>>        put_pid(submit->pid);
> >>>        msm_submitqueue_put(submit->queue);
> >>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> >>> index 380249500325..a82d11dd5fcf 100644
> >>> --- a/drivers/gpu/drm/msm/msm_gpu.c
> >>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> >>> @@ -716,7 +716,7 @@ static void retire_submits(struct msm_gpu *gpu)
> >>>                         * been signalled, then later submits are not signalled
> >>>                         * either, so we are also done.
> >>>                         */
> >>> -                     if (submit && dma_fence_is_signaled(submit->hw_fence)) {
> >>> +                     if (submit && dma_fence_is_signaled(&submit->hw_fence)) {
> >>>                                retire_submit(gpu, ring, submit);
> >>>                        } else {
> >>>                                break;
> >>> @@ -760,7 +760,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >>>
> >>>        msm_gpu_hw_init(gpu);
> >>>
> >>> -     submit->seqno = submit->hw_fence->seqno;
> >>> +     submit->seqno = submit->hw_fence.seqno;
> >>>
> >>>        msm_rd_dump_submit(priv->rd, submit, NULL);
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> >>> index 57a8e9564540..5c54befa2427 100644
> >>> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> >>> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> >>> @@ -18,7 +18,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
> >>>        struct msm_gpu *gpu = submit->gpu;
> >>>        int i;
> >>>
> >>> -     submit->hw_fence = msm_fence_alloc(fctx);
> >>> +     msm_fence_init(fctx, &submit->hw_fence);
> >>>
> >>>        for (i = 0; i < submit->nr_bos; i++) {
> >>>                struct drm_gem_object *obj = &submit->bos[i].obj->base;
> >>> @@ -37,7 +37,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
> >>>
> >>>        mutex_unlock(&gpu->lock);
> >>>
> >>> -     return dma_fence_get(submit->hw_fence);
> >>> +     return dma_fence_get(&submit->hw_fence);
> >>>    }
> >>>
> >>>    static void msm_job_free(struct drm_sched_job *job)
>


* Re: [Linaro-mm-sig] Re: [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit
  2023-03-13 16:43         ` Rob Clark
@ 2023-03-13 19:21           ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2023-03-13 19:21 UTC (permalink / raw)
  To: Rob Clark, Christian König
  Cc: dri-devel, Tuikov, Luben, linux-arm-msm, freedreno, Rob Clark,
	Abhinav Kumar, Dmitry Baryshkov, Sean Paul, Daniel Vetter,
	Sumit Semwal, open list, open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

Am 13.03.23 um 17:43 schrieb Rob Clark:
> On Mon, Mar 13, 2023 at 9:15 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 13.03.23 um 15:45 schrieb Rob Clark:
>>> On Mon, Mar 13, 2023 at 12:19 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> Am 11.03.23 um 18:35 schrieb Rob Clark:
>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>
>>>>> Avoid allocating memory in job_run() by embedding the fence in the
>>>>> submit object.  Since msm gpu fences are always 1:1 with msm_gem_submit
>>>>> we can just use the fence's refcnt to track the submit.  And since we
>>>>> can get the fence ctx from the submit we can just drop the msm_fence
>>>>> struct altogether.  This uses the new dma_fence_init_noref() to deal
>>>>> with the fact that the fence's refcnt is initialized when the submit is
>>>>> created, long before job_run().
>>>> Well this is a very very bad idea, we made the same mistake with amdgpu
>>>> as well.
>>>>
>>>> It's true that you should not have any memory allocation in your run_job
>>>> callback, but you could also just allocate the hw fence during job
>>>> creation and initializing it later on.
>>>>
>>>> I've suggested to embed the fence into the job for amdgpu because some
>>>> people insisted on re-submitting jobs during timeout and GPU reset. This
>>>> turned into a nightmare with tons of bug fixes on top of bug fixes on
>>>> top of bug fixes because it messes up the job and fence lifetime as
>>>> defined by the DRM scheduler and DMA-buf framework.
>>>>
>>>> Luben is currently working on cleaning all this up.
>>> This actually shouldn't be a problem with msm, as the fence doesn't
>>> change if there is a gpu reset.  We simply signal the fence for the
>>> offending job, reset the GPU, and re-play the remaining in-flight jobs
>>> (ie. things that already had their job_run() called) with the original
>>> fences.  (We don't use gpu sched's reset/timeout handling.. when I
>>> migrated to gpu sched I kept our existing hangcheck/recovery
>>> mechanism.)
>> That sounds much saner than what we did.
>>
>> So you basically need the dma_fence reference counting separate from
>> initializing the other dma_fence fields?
> yeah, that was the idea
>
>> What would happen if a dma_fence which is not completely initialized
>> gets freed? E.g. because of an error?
> hmm, yes, this would be a problem since ops->release is not set yet..
> and I'm relying on that to free the submit
>
>> Would it be too much to just keep the handling as it is today and only
>> allocate the dma_fence without initializing it? If necessary we could
>> easily add a dma_fence_is_initialized() function which checks the
>> fence_ops for NULL.
> Yeah, that would also be possible
>
> I guess we could split creation of the fence (initializing ops,
> refcount) and "arming" it later when the seqno is known?  But maybe
> that is going to too great lengths to avoid a separate allocation.

I would really like to avoid that. It gives people the opportunity once 
more to do multiple "arm" operations on the same fence, and that was a 
really bad idea for us.

So yeah, if that's just to avoid the extra allocation, it's probably not 
worth it.
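The allocate-early/initialize-late shape suggested earlier in the thread, with the proposed dma_fence_is_initialized() modeled as a NULL check on the fence ops, could be sketched like this (userspace model; none of these names are the real kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch of the alternative kept in this subthread: the fence is
 * allocated at job-creation time but initialized only in run_job().
 * sk_fence_is_initialized() models the proposed helper as a NULL
 * check on ops; all names here are hypothetical. */

struct sk_fence_ops { int dummy; };

struct sk_fence {
	const struct sk_fence_ops *ops;	/* NULL until initialized */
	unsigned int seqno;
};

static bool sk_fence_is_initialized(const struct sk_fence *f)
{
	return f->ops != NULL;
}

/* Job creation: allocate only, so run_job() itself never allocates. */
static struct sk_fence *sk_fence_alloc(void)
{
	return calloc(1, sizeof(struct sk_fence));
}

/* run_job(): initialize the preallocated fence. */
static void sk_fence_init(struct sk_fence *f, const struct sk_fence_ops *ops,
			  unsigned int seqno)
{
	f->ops = ops;
	f->seqno = seqno;
}

static const struct sk_fence_ops sk_ops = { 0 };
```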

Christian.

>
> BR,
> -R
>
>> Thanks,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>> ---
>>>>> Note that this applies on top of https://patchwork.freedesktop.org/series/93035/
>>>>> out of convenience for myself, but I can re-work it to go before
>>>>> depending on the order that things land.
>>>>>
>>>>>     drivers/gpu/drm/msm/msm_fence.c      | 45 +++++++++++-----------------
>>>>>     drivers/gpu/drm/msm/msm_fence.h      |  2 +-
>>>>>     drivers/gpu/drm/msm/msm_gem.h        | 10 +++----
>>>>>     drivers/gpu/drm/msm/msm_gem_submit.c |  8 ++---
>>>>>     drivers/gpu/drm/msm/msm_gpu.c        |  4 +--
>>>>>     drivers/gpu/drm/msm/msm_ringbuffer.c |  4 +--
>>>>>     6 files changed, 31 insertions(+), 42 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
>>>>> index 51b461f32103..51f9f1f0cb66 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_fence.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_fence.c
>>>>> @@ -103,14 +103,9 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
>>>>>         spin_unlock_irqrestore(&fctx->spinlock, flags);
>>>>>     }
>>>>>
>>>>> -struct msm_fence {
>>>>> -     struct dma_fence base;
>>>>> -     struct msm_fence_context *fctx;
>>>>> -};
>>>>> -
>>>>> -static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
>>>>> +static inline struct msm_gem_submit *fence_to_submit(struct dma_fence *fence)
>>>>>     {
>>>>> -     return container_of(fence, struct msm_fence, base);
>>>>> +     return container_of(fence, struct msm_gem_submit, hw_fence);
>>>>>     }
>>>>>
>>>>>     static const char *msm_fence_get_driver_name(struct dma_fence *fence)
>>>>> @@ -120,20 +115,20 @@ static const char *msm_fence_get_driver_name(struct dma_fence *fence)
>>>>>
>>>>>     static const char *msm_fence_get_timeline_name(struct dma_fence *fence)
>>>>>     {
>>>>> -     struct msm_fence *f = to_msm_fence(fence);
>>>>> -     return f->fctx->name;
>>>>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
>>>>> +     return submit->ring->fctx->name;
>>>>>     }
>>>>>
>>>>>     static bool msm_fence_signaled(struct dma_fence *fence)
>>>>>     {
>>>>> -     struct msm_fence *f = to_msm_fence(fence);
>>>>> -     return msm_fence_completed(f->fctx, f->base.seqno);
>>>>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
>>>>> +     return msm_fence_completed(submit->ring->fctx, fence->seqno);
>>>>>     }
>>>>>
>>>>>     static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>>>>     {
>>>>> -     struct msm_fence *f = to_msm_fence(fence);
>>>>> -     struct msm_fence_context *fctx = f->fctx;
>>>>> +     struct msm_gem_submit *submit = fence_to_submit(fence);
>>>>> +     struct msm_fence_context *fctx = submit->ring->fctx;
>>>>>         unsigned long flags;
>>>>>         ktime_t now;
>>>>>
>>>>> @@ -165,26 +160,22 @@ static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
>>>>>         spin_unlock_irqrestore(&fctx->spinlock, flags);
>>>>>     }
>>>>>
>>>>> +static void msm_fence_release(struct dma_fence *fence)
>>>>> +{
>>>>> +     __msm_gem_submit_destroy(fence_to_submit(fence));
>>>>> +}
>>>>> +
>>>>>     static const struct dma_fence_ops msm_fence_ops = {
>>>>>         .get_driver_name = msm_fence_get_driver_name,
>>>>>         .get_timeline_name = msm_fence_get_timeline_name,
>>>>>         .signaled = msm_fence_signaled,
>>>>>         .set_deadline = msm_fence_set_deadline,
>>>>> +     .release = msm_fence_release,
>>>>>     };
>>>>>
>>>>> -struct dma_fence *
>>>>> -msm_fence_alloc(struct msm_fence_context *fctx)
>>>>> +void
>>>>> +msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f)
>>>>>     {
>>>>> -     struct msm_fence *f;
>>>>> -
>>>>> -     f = kzalloc(sizeof(*f), GFP_KERNEL);
>>>>> -     if (!f)
>>>>> -             return ERR_PTR(-ENOMEM);
>>>>> -
>>>>> -     f->fctx = fctx;
>>>>> -
>>>>> -     dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
>>>>> -                    fctx->context, ++fctx->last_fence);
>>>>> -
>>>>> -     return &f->base;
>>>>> +     dma_fence_init_noref(f, &msm_fence_ops, &fctx->spinlock,
>>>>> +                          fctx->context, ++fctx->last_fence);
>>>>>     }
>>>>> diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
>>>>> index cdaebfb94f5c..8fca37e9773b 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_fence.h
>>>>> +++ b/drivers/gpu/drm/msm/msm_fence.h
>>>>> @@ -81,7 +81,7 @@ void msm_fence_context_free(struct msm_fence_context *fctx);
>>>>>     bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence);
>>>>>     void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
>>>>>
>>>>> -struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
>>>>> +void msm_fence_init(struct msm_fence_context *fctx, struct dma_fence *f);
>>>>>
>>>>>     static inline bool
>>>>>     fence_before(uint32_t a, uint32_t b)
>>>>> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
>>>>> index c4844cf3a585..e06afed99d5b 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_gem.h
>>>>> +++ b/drivers/gpu/drm/msm/msm_gem.h
>>>>> @@ -259,10 +259,10 @@ struct msm_gem_submit {
>>>>>         struct ww_acquire_ctx ticket;
>>>>>         uint32_t seqno;         /* Sequence number of the submit on the ring */
>>>>>
>>>>> -     /* Hw fence, which is created when the scheduler executes the job, and
>>>>> +     /* Hw fence, which is initialized when the scheduler executes the job, and
>>>>>          * is signaled when the hw finishes (via seqno write from cmdstream)
>>>>>          */
>>>>> -     struct dma_fence *hw_fence;
>>>>> +     struct dma_fence hw_fence;
>>>>>
>>>>>         /* Userspace visible fence, which is signaled by the scheduler after
>>>>>          * the hw_fence is signaled.
>>>>> @@ -309,16 +309,16 @@ static inline struct msm_gem_submit *to_msm_submit(struct drm_sched_job *job)
>>>>>         return container_of(job, struct msm_gem_submit, base);
>>>>>     }
>>>>>
>>>>> -void __msm_gem_submit_destroy(struct kref *kref);
>>>>> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit);
>>>>>
>>>>>     static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
>>>>>     {
>>>>> -     kref_get(&submit->ref);
>>>>> +     dma_fence_get(&submit->hw_fence);
>>>>>     }
>>>>>
>>>>>     static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
>>>>>     {
>>>>> -     kref_put(&submit->ref, __msm_gem_submit_destroy);
>>>>> +     dma_fence_put(&submit->hw_fence);
>>>>>     }
>>>>>
>>>>>     void msm_submit_retire(struct msm_gem_submit *submit);
>>>>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
>>>>> index be4bf77103cd..522c8c82e827 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
>>>>> @@ -47,7 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>>>>>                 return ERR_PTR(ret);
>>>>>         }
>>>>>
>>>>> -     kref_init(&submit->ref);
>>>>> +     kref_init(&submit->hw_fence.refcount);
>>>>>         submit->dev = dev;
>>>>>         submit->aspace = queue->ctx->aspace;
>>>>>         submit->gpu = gpu;
>>>>> @@ -65,10 +65,9 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>>>>>         return submit;
>>>>>     }
>>>>>
>>>>> -void __msm_gem_submit_destroy(struct kref *kref)
>>>>> +/* Called when the hw_fence is destroyed: */
>>>>> +void __msm_gem_submit_destroy(struct msm_gem_submit *submit)
>>>>>     {
>>>>> -     struct msm_gem_submit *submit =
>>>>> -                     container_of(kref, struct msm_gem_submit, ref);
>>>>>         unsigned i;
>>>>>
>>>>>         if (submit->fence_id) {
>>>>> @@ -78,7 +77,6 @@ void __msm_gem_submit_destroy(struct kref *kref)
>>>>>         }
>>>>>
>>>>>         dma_fence_put(submit->user_fence);
>>>>> -     dma_fence_put(submit->hw_fence);
>>>>>
>>>>>         put_pid(submit->pid);
>>>>>         msm_submitqueue_put(submit->queue);
>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
>>>>> index 380249500325..a82d11dd5fcf 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>>>> @@ -716,7 +716,7 @@ static void retire_submits(struct msm_gpu *gpu)
>>>>>                          * been signalled, then later submits are not signalled
>>>>>                          * either, so we are also done.
>>>>>                          */
>>>>> -                     if (submit && dma_fence_is_signaled(submit->hw_fence)) {
>>>>> +                     if (submit && dma_fence_is_signaled(&submit->hw_fence)) {
>>>>>                                 retire_submit(gpu, ring, submit);
>>>>>                         } else {
>>>>>                                 break;
>>>>> @@ -760,7 +760,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>>>>>
>>>>>         msm_gpu_hw_init(gpu);
>>>>>
>>>>> -     submit->seqno = submit->hw_fence->seqno;
>>>>> +     submit->seqno = submit->hw_fence.seqno;
>>>>>
>>>>>         msm_rd_dump_submit(priv->rd, submit, NULL);
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
>>>>> index 57a8e9564540..5c54befa2427 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
>>>>> @@ -18,7 +18,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
>>>>>         struct msm_gpu *gpu = submit->gpu;
>>>>>         int i;
>>>>>
>>>>> -     submit->hw_fence = msm_fence_alloc(fctx);
>>>>> +     msm_fence_init(fctx, &submit->hw_fence);
>>>>>
>>>>>         for (i = 0; i < submit->nr_bos; i++) {
>>>>>                 struct drm_gem_object *obj = &submit->bos[i].obj->base;
>>>>> @@ -37,7 +37,7 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
>>>>>
>>>>>         mutex_unlock(&gpu->lock);
>>>>>
>>>>> -     return dma_fence_get(submit->hw_fence);
>>>>> +     return dma_fence_get(&submit->hw_fence);
>>>>>     }
>>>>>
>>>>>     static void msm_job_free(struct drm_sched_job *job)
> _______________________________________________
> Linaro-mm-sig mailing list -- linaro-mm-sig@lists.linaro.org
> To unsubscribe send an email to linaro-mm-sig-leave@lists.linaro.org



end of thread, other threads:[~2023-03-13 19:21 UTC | newest]

Thread overview: 10+ messages
2023-03-11 17:35 [PATCH 0/2] drm/msm: Get rid of fence allocation in job_run() Rob Clark
2023-03-11 17:35 ` [PATCH 1/2] dma-buf/dma-fence: Add dma_fence_init_noref() Rob Clark
2023-03-13  7:13   ` Christian König
2023-03-13  7:31     ` Christian König
2023-03-11 17:35 ` [PATCH 2/2] drm/msm: Embed the hw_fence in msm_gem_submit Rob Clark
2023-03-13  7:19   ` Christian König
2023-03-13 14:45     ` Rob Clark
2023-03-13 16:15       ` Christian König
2023-03-13 16:43         ` Rob Clark
2023-03-13 19:21           ` [Linaro-mm-sig] " Christian König
