* [PATCH v2 00/11] drm/scheduler dependency tracking
@ 2021-07-02 21:38 Daniel Vetter
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development; +Cc: Daniel Vetter

Hi all

2nd major round of my scheduler dependency handling patches.

Emma noticed a big fumble: I just didn't bother cleaning up between
drm_sched_job_init() and drm_sched_job_arm(). This round should fix that.

Review and testing very much welcome.

Cheers, Daniel

Daniel Vetter (11):
  drm/sched: Split drm_sched_job_init
  drm/sched: Add dependency tracking
  drm/sched: drop entity parameter from drm_sched_push_job
  drm/panfrost: use scheduler dependency tracking
  drm/lima: use scheduler dependency tracking
  drm/v3d: Move drm_sched_job_init to v3d_job_init
  drm/v3d: Use scheduler dependency handling
  drm/etnaviv: Use scheduler dependency handling
  drm/gem: Delete gem array fencing helpers
  drm/sched: Don't store self-dependencies
  drm/sched: Check locking in drm_sched_job_await_implicit

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c       |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |   4 +-
 drivers/gpu/drm/drm_gem.c                    |  96 -----------
 drivers/gpu/drm/etnaviv/etnaviv_gem.h        |   5 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  32 ++--
 drivers/gpu/drm/etnaviv/etnaviv_sched.c      |  63 +-------
 drivers/gpu/drm/etnaviv/etnaviv_sched.h      |   3 +-
 drivers/gpu/drm/lima/lima_gem.c              |   7 +-
 drivers/gpu/drm/lima/lima_sched.c            |  28 +---
 drivers/gpu/drm/lima/lima_sched.h            |   6 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c      |  16 +-
 drivers/gpu/drm/panfrost/panfrost_job.c      |  39 +----
 drivers/gpu/drm/panfrost/panfrost_job.h      |   5 +-
 drivers/gpu/drm/scheduler/sched_entity.c     |  30 ++--
 drivers/gpu/drm/scheduler/sched_fence.c      |  17 +-
 drivers/gpu/drm/scheduler/sched_main.c       | 158 ++++++++++++++++++-
 drivers/gpu/drm/v3d/v3d_drv.h                |   6 +-
 drivers/gpu/drm/v3d/v3d_gem.c                | 115 ++++++--------
 drivers/gpu/drm/v3d/v3d_sched.c              |  44 +-----
 include/drm/drm_gem.h                        |   5 -
 include/drm/gpu_scheduler.h                  |  41 ++++-
 21 files changed, 330 insertions(+), 394 deletions(-)

-- 
2.32.0.rc2


* [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Daniel Vetter, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Daniel Vetter,
	Sumit Semwal, Christian König, Masahiro Yamada, Kees Cook,
	Adam Borowski, Nick Terrell, Mauro Carvalho Chehab, Paul Menzel,
	Sami Tolvanen, Viresh Kumar, Alex Deucher, Dave Airlie,
	Nirmoy Das, Deepak R Varma, Lee Jones, Kevin Wang, Chen Li,
	Luben Tuikov, Marek Olšák, Dennis Li,
	Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, etnaviv, lima,
	linux-media, linaro-mm-sig, Emma Anholt

This is a very confusingly named function: it does not just init an
object, it also arms it and provides the point of no return for
pushing a job into the scheduler. It would be nice if that were a bit
clearer in the interface.

But the real reason is that I want to push the dependency tracking
helpers into the scheduler code, and that means drm_sched_job_init
must be called a lot earlier, without arming the job.

v2:
- don't change .gitignore (Steven)
- don't forget v3d (Emma)

v3: Emma noticed that I leak the memory allocated in
drm_sched_job_init() if we bail out before the point of no return in
subsequent driver patches. To be able to fix this, change
drm_sched_job_cleanup() so that it can handle being called both before
and after drm_sched_job_arm().

Also improve the kerneldoc for this.
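
To illustrate, the intended driver flow after this patch looks roughly
like this (a sketch only; the foo_* names are made up for illustration,
the drm_sched_* calls are the ones touched here):

    int foo_job_submit(struct foo_device *fdev, struct foo_job *job)
    {
            int ret;

            /* Only allocates job->base.s_fence, nothing visible yet. */
            ret = drm_sched_job_init(&job->base, &fdev->entity, fdev);
            if (ret)
                    return ret;

            ret = foo_lock_and_prepare_bos(job);
            if (ret)
                    goto err_cleanup;

            /* Point of no return: the scheduler fences become valid. */
            drm_sched_job_arm(&job->base);

            job->done_fence = dma_fence_get(&job->base.s_fence->finished);
            drm_sched_entity_push_job(&job->base, &fdev->entity);
            return 0;

    err_cleanup:
            /* With this patch, legal before drm_sched_job_arm() too. */
            drm_sched_job_cleanup(&job->base);
            return ret;
    }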

Acked-by: Steven Price <steven.price@arm.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: "Marek Olšák" <marek.olsak@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Sonny Jiang <sonny.jiang@amd.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Tian Tao <tiantao6@hisilicon.com>
Cc: Jack Zhang <Jack.Zhang1@amd.com>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: Emma Anholt <emma@anholt.net>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
 drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
 drivers/gpu/drm/lima/lima_sched.c        |  2 ++
 drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
 drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
 drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
 drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
 drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
 include/drm/gpu_scheduler.h              |  7 +++-
 10 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c5386d13eb4a..a4ec092af9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	if (r)
 		goto error_unlock;
 
+	drm_sched_job_arm(&job->base);
+
 	/* No memory allocation is allowed while holding the notifier lock.
 	 * The lock is held until amdgpu_cs_submit is finished and fence is
 	 * added to BOs.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d33e6d97cc89..5ddb955d2315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
 	if (r)
 		return r;
 
+	drm_sched_job_arm(&job->base);
+
 	*f = dma_fence_get(&job->base.s_fence->finished);
 	amdgpu_job_free_resources(job);
 	drm_sched_entity_push_job(&job->base, entity);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index feb6da1b6ceb..05f412204118 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
 	if (ret)
 		goto out_unlock;
 
+	drm_sched_job_arm(&submit->sched_job);
+
 	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
 	submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
 						submit->out_fence, 0,
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index dba8329937a3..38f755580507 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
 		return err;
 	}
 
+	drm_sched_job_arm(&task->base);
+
 	task->num_bos = num_bos;
 	task->vm = lima_vm_get(vm);
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 71a72fb50e6b..2992dc85325f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
 		goto unlock;
 	}
 
+	drm_sched_job_arm(&job->base);
+
 	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
 
 	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 79554aa4dbb1..f7347c284886 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
  * @sched_job: job to submit
  * @entity: scheduler entity
  *
- * Note: To guarantee that the order of insertion to queue matches
- * the job's fence sequence number this function should be
- * called with drm_sched_job_init under common lock.
+ * Note: To guarantee that the order of insertion to queue matches the job's
+ * fence sequence number this function should be called with drm_sched_job_arm()
+ * under common lock.
  *
  * Returns 0 for success, negative error code otherwise.
  */
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 69de2c76731f..c451ee9a30d7 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
  *
  * Free up the fence memory after the RCU grace period.
  */
-static void drm_sched_fence_free(struct rcu_head *rcu)
+void drm_sched_fence_free(struct rcu_head *rcu)
 {
 	struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
 	struct drm_sched_fence *fence = to_drm_sched_fence(f);
@@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
 }
 EXPORT_SYMBOL(to_drm_sched_fence);
 
-struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
-					       void *owner)
+struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
+					      void *owner)
 {
 	struct drm_sched_fence *fence = NULL;
-	unsigned seq;
 
 	fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
 	if (fence == NULL)
@@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
 	fence->sched = entity->rq->sched;
 	spin_lock_init(&fence->lock);
 
+	return fence;
+}
+
+void drm_sched_fence_init(struct drm_sched_fence *fence,
+			  struct drm_sched_entity *entity)
+{
+	unsigned seq;
+
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
 		       &fence->lock, entity->fence_context, seq);
 	dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
 		       &fence->lock, entity->fence_context + 1, seq);
-
-	return fence;
 }
 
 module_init(drm_sched_fence_slab_init);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 33c414d55fab..5e84e1500c32 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -48,9 +48,11 @@
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
+#include <linux/dma-resv.h>
 #include <uapi/linux/sched/types.h>
 
 #include <drm/drm_print.h>
+#include <drm/drm_gem.h>
 #include <drm/gpu_scheduler.h>
 #include <drm/spsc_queue.h>
 
@@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
 
 /**
  * drm_sched_job_init - init a scheduler job
- *
  * @job: scheduler job to init
  * @entity: scheduler entity to use
  * @owner: job owner for debugging
@@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
  * Refer to drm_sched_entity_push_job() documentation
  * for locking considerations.
  *
+ * Drivers must make sure to call drm_sched_job_cleanup() if this function
+ * returns successfully, even when @job is aborted before drm_sched_job_arm().
+ *
  * Returns 0 for success, negative error code otherwise.
  */
 int drm_sched_job_init(struct drm_sched_job *job,
@@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 	job->sched = sched;
 	job->entity = entity;
 	job->s_priority = entity->rq - sched->sched_rq;
-	job->s_fence = drm_sched_fence_create(entity, owner);
+	job->s_fence = drm_sched_fence_alloc(entity, owner);
 	if (!job->s_fence)
 		return -ENOMEM;
 	job->id = atomic64_inc_return(&sched->job_id_count);
@@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
 EXPORT_SYMBOL(drm_sched_job_init);
 
 /**
- * drm_sched_job_cleanup - clean up scheduler job resources
+ * drm_sched_job_arm - arm a scheduler job for execution
+ * @job: scheduler job to arm
+ *
+ * This arms a scheduler job for execution. Specifically it initializes the
+ * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
+ * or other places that need to track the completion of this job.
+ *
+ * Refer to drm_sched_entity_push_job() documentation for locking
+ * considerations.
  *
+ * This can only be called if drm_sched_job_init() succeeded.
+ */
+void drm_sched_job_arm(struct drm_sched_job *job)
+{
+	drm_sched_fence_init(job->s_fence, job->entity);
+}
+EXPORT_SYMBOL(drm_sched_job_arm);
+
+/**
+ * drm_sched_job_cleanup - clean up scheduler job resources
  * @job: scheduler job to clean up
+ *
+ * Cleans up the resources allocated with drm_sched_job_init().
+ *
+ * Drivers should call this from their error unwind code if @job is aborted
+ * before drm_sched_job_arm() is called.
+ *
+ * After that point of no return @job is committed to be executed by the
+ * scheduler, and this function should be called from the
+ * &drm_sched_backend_ops.free_job callback.
  */
 void drm_sched_job_cleanup(struct drm_sched_job *job)
 {
-	dma_fence_put(&job->s_fence->finished);
+	if (kref_read(&job->s_fence->finished.refcount)) {
+		/* drm_sched_job_arm() has been called */
+		dma_fence_put(&job->s_fence->finished);
+	} else {
+		/* aborted job before committing to run it */
+		drm_sched_fence_free(&job->s_fence->finished.rcu);
+	}
+
 	job->s_fence = NULL;
 }
 EXPORT_SYMBOL(drm_sched_job_cleanup);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 4eb354226972..5c3a99027ecd 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
 	if (ret)
 		return ret;
 
+	drm_sched_job_arm(&job->base);
+
 	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
 
 	/* put by scheduler job completion */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 88ae7f331bb1..83afc3aa8e2f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
 		       void *owner);
+void drm_sched_job_arm(struct drm_sched_job *job);
 void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
                                    unsigned int num_sched_list);
@@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
 				   enum drm_sched_priority priority);
 bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
 
-struct drm_sched_fence *drm_sched_fence_create(
+struct drm_sched_fence *drm_sched_fence_alloc(
 	struct drm_sched_entity *s_entity, void *owner);
+void drm_sched_fence_init(struct drm_sched_fence *fence,
+			  struct drm_sched_entity *entity);
+void drm_sched_fence_free(struct rcu_head *rcu);
+
 void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
 void drm_sched_fence_finished(struct drm_sched_fence *fence);
 
-- 
2.32.0.rc2


* [PATCH v2 02/11] drm/sched: Add dependency tracking
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Daniel Vetter, Steven Price, Daniel Vetter, David Airlie,
	Daniel Vetter, Sumit Semwal, Christian König,
	Andrey Grodzovsky, Lee Jones, Nirmoy Das, Boris Brezillon,
	Luben Tuikov, Alex Deucher, Jack Zhang, linux-media,
	linaro-mm-sig

Instead of just a callback we can glue in the GEM helpers that
panfrost, v3d and lima currently use. There really aren't that many
ways to skin this cat.

On the naming bikeshed: The idea for using _await_ to denote adding
dependencies to a job comes from i915, where that's used quite
extensively all over the place, in lots of datastructures.

v2: Rebased.
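
As a usage sketch (not taken from any driver in this series; the foo_*
and in_fence names are hypothetical), a submit path would gather all of
its dependencies between drm_sched_job_init() and drm_sched_job_arm():

    /* write = true also makes us wait for the shared fences of the BO */
    for (i = 0; i < job->bo_count; i++) {
            ret = drm_sched_job_await_implicit(&job->base, job->bos[i],
                                               job->bo_write[i]);
            if (ret)
                    goto err_cleanup;
    }

    /* Explicit in-fence from userspace; the reference is consumed
     * even on error, so no extra dma_fence_put() on failure.
     */
    ret = drm_sched_job_await_fence(&job->base, in_fence);
    if (ret)
            goto err_cleanup;

    drm_sched_job_arm(&job->base);
    drm_sched_entity_push_job(&job->base, entity);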

Reviewed-by: Steven Price <steven.price@arm.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jack Zhang <Jack.Zhang1@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/gpu/drm/scheduler/sched_entity.c |  18 +++-
 drivers/gpu/drm/scheduler/sched_main.c   | 103 +++++++++++++++++++++++
 include/drm/gpu_scheduler.h              |  31 ++++++-
 3 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index f7347c284886..b6f72fafd504 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -211,6 +211,19 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 	job->sched->ops->free_job(job);
 }
 
+static struct dma_fence *
+drm_sched_job_dependency(struct drm_sched_job *job,
+			 struct drm_sched_entity *entity)
+{
+	if (!xa_empty(&job->dependencies))
+		return xa_erase(&job->dependencies, job->last_dependency++);
+
+	if (job->sched->ops->dependency)
+		return job->sched->ops->dependency(job, entity);
+
+	return NULL;
+}
+
 /**
  * drm_sched_entity_kill_jobs - Make sure all remaining jobs are killed
  *
@@ -229,7 +242,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 		struct drm_sched_fence *s_fence = job->s_fence;
 
 		/* Wait for all dependencies to avoid data corruptions */
-		while ((f = job->sched->ops->dependency(job, entity)))
+		while ((f = drm_sched_job_dependency(job, entity)))
 			dma_fence_wait(f, false);
 
 		drm_sched_fence_scheduled(s_fence);
@@ -419,7 +432,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
  */
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
-	struct drm_gpu_scheduler *sched = entity->rq->sched;
 	struct drm_sched_job *sched_job;
 
 	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -427,7 +439,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 		return NULL;
 
 	while ((entity->dependency =
-			sched->ops->dependency(sched_job, entity))) {
+			drm_sched_job_dependency(sched_job, entity))) {
 		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
 		if (drm_sched_entity_add_dependency_cb(entity))
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 5e84e1500c32..12d533486518 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -605,6 +605,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
 
 	INIT_LIST_HEAD(&job->list);
 
+	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_sched_job_init);
@@ -628,6 +630,98 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 }
 EXPORT_SYMBOL(drm_sched_job_arm);
 
+/**
+ * drm_sched_job_await_fence - adds the fence as a job dependency
+ * @job: scheduler job to add the dependencies to
+ * @fence: the dma_fence to add to the list of dependencies.
+ *
+ * Note that @fence is consumed in both the success and error cases.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_await_fence(struct drm_sched_job *job,
+			      struct dma_fence *fence)
+{
+	struct dma_fence *entry;
+	unsigned long index;
+	u32 id = 0;
+	int ret;
+
+	if (!fence)
+		return 0;
+
+	/* Deduplicate if we already depend on a fence from the same context.
+	 * This lets the size of the array of deps scale with the number of
+	 * engines involved, rather than the number of BOs.
+	 */
+	xa_for_each(&job->dependencies, index, entry) {
+		if (entry->context != fence->context)
+			continue;
+
+		if (dma_fence_is_later(fence, entry)) {
+			dma_fence_put(entry);
+			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
+		} else {
+			dma_fence_put(fence);
+		}
+		return 0;
+	}
+
+	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
+	if (ret != 0)
+		dma_fence_put(fence);
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_sched_job_await_fence);
+
+/**
+ * drm_sched_job_await_implicit - adds implicit dependencies as job dependencies
+ * @job: scheduler job to add the dependencies to
+ * @obj: the gem object to add new dependencies from.
+ * @write: whether the job might write the object (so we need to depend on
+ * shared fences in the reservation object).
+ *
+ * This should be called after drm_gem_lock_reservations() on your array of
+ * GEM objects used in the job but before updating the reservations with your
+ * own fences.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_await_implicit(struct drm_sched_job *job,
+				 struct drm_gem_object *obj,
+				 bool write)
+{
+	int ret;
+	struct dma_fence **fences;
+	unsigned int i, fence_count;
+
+	if (!write) {
+		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
+
+		return drm_sched_job_await_fence(job, fence);
+	}
+
+	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
+	if (ret || !fence_count)
+		return ret;
+
+	for (i = 0; i < fence_count; i++) {
+		ret = drm_sched_job_await_fence(job, fences[i]);
+		if (ret)
+			break;
+	}
+	/* drm_sched_job_await_fence() consumed fences[i] even on error */
+	for (i++; i < fence_count; i++)
+		dma_fence_put(fences[i]);
+	kfree(fences);
+	return ret;
+}
+EXPORT_SYMBOL(drm_sched_job_await_implicit);
+
+
 /**
  * drm_sched_job_cleanup - clean up scheduler job resources
  * @job: scheduler job to clean up
@@ -643,6 +737,9 @@ EXPORT_SYMBOL(drm_sched_job_arm);
  */
 void drm_sched_job_cleanup(struct drm_sched_job *job)
 {
+	struct dma_fence *fence;
+	unsigned long index;
+
 	if (kref_read(&job->s_fence->finished.refcount)) {
 		/* drm_sched_job_arm() has been called */
 		dma_fence_put(&job->s_fence->finished);
@@ -652,6 +749,12 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 	}
 
 	job->s_fence = NULL;
+
+	xa_for_each(&job->dependencies, index, fence) {
+		dma_fence_put(fence);
+	}
+	xa_destroy(&job->dependencies);
+
 }
 EXPORT_SYMBOL(drm_sched_job_cleanup);
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 83afc3aa8e2f..74fb321dbc44 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -27,9 +27,12 @@
 #include <drm/spsc_queue.h>
 #include <linux/dma-fence.h>
 #include <linux/completion.h>
+#include <linux/xarray.h>
 
 #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
 
+struct drm_gem_object;
+
 struct drm_gpu_scheduler;
 struct drm_sched_rq;
 
@@ -198,6 +201,16 @@ struct drm_sched_job {
 	enum drm_sched_priority		s_priority;
 	struct drm_sched_entity         *entity;
 	struct dma_fence_cb		cb;
+	/**
+	 * @dependencies:
+	 *
+	 * Contains the dependencies as struct dma_fence for this job, see
+	 * drm_sched_job_await_fence() and drm_sched_job_await_implicit().
+	 */
+	struct xarray			dependencies;
+
+	/** @last_dependency: tracks @dependencies as they signal */
+	unsigned long			last_dependency;
 };
 
 static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
@@ -220,9 +233,14 @@ enum drm_gpu_sched_stat {
  */
 struct drm_sched_backend_ops {
 	/**
-         * @dependency: Called when the scheduler is considering scheduling
-         * this job next, to get another struct dma_fence for this job to
-	 * block on.  Once it returns NULL, run_job() may be called.
+	 * @dependency:
+	 *
+	 * Called when the scheduler is considering scheduling this job next, to
+	 * get another struct dma_fence for this job to block on.  Once it
+	 * returns NULL, run_job() may be called.
+	 *
+	 * If a driver exclusively uses drm_sched_job_await_fence() and
+	 * drm_sched_job_await_implicit() this can be omitted and left as NULL.
 	 */
 	struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
 					struct drm_sched_entity *s_entity);
@@ -349,6 +367,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
 		       void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
+int drm_sched_job_await_fence(struct drm_sched_job *job,
+			      struct dma_fence *fence);
+int drm_sched_job_await_implicit(struct drm_sched_job *job,
+				 struct drm_gem_object *obj,
+				 bool write);
+
+
 void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
                                    unsigned int num_sched_list);
-- 
2.32.0.rc2


* [PATCH v2 03/11] drm/sched: drop entity parameter from drm_sched_push_job
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Daniel Vetter, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, Emma Anholt, David Airlie,
	Daniel Vetter, Sumit Semwal, Christian König, Alex Deucher,
	Nirmoy Das, Dave Airlie, Chen Li, Lee Jones, Deepak R Varma,
	Kevin Wang, Luben Tuikov, Marek Olšák,
	Maarten Lankhorst, Andrey Grodzovsky, Dennis Li, Boris Brezillon,
	etnaviv, lima, linux-media, linaro-mm-sig

Originally a job was only bound to the queue when we pushed it, but
now that's done in drm_sched_job_init(), making that parameter
entirely redundant.

Remove it.

The same applies to the context parameter in
lima_sched_context_queue_task(), so simplify that too.
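
The resulting change at each call site is purely mechanical, e.g.:

  -	drm_sched_entity_push_job(&job->base, entity);
  +	drm_sched_entity_push_job(&job->base);

since drm_sched_job_init() already stored the entity in
&drm_sched_job.entity.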

Reviewed-by: Steven Price <steven.price@arm.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chen Li <chenli@uniontech.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Deepak R Varma <mh12gx2825@gmail.com>
Cc: Kevin Wang <kevin1.wang@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: "Marek Olšák" <marek.olsak@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Dennis Li <Dennis.Li@amd.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c  | 2 +-
 drivers/gpu/drm/lima/lima_gem.c          | 3 +--
 drivers/gpu/drm/lima/lima_sched.c        | 5 ++---
 drivers/gpu/drm/lima/lima_sched.h        | 3 +--
 drivers/gpu/drm/panfrost/panfrost_job.c  | 2 +-
 drivers/gpu/drm/scheduler/sched_entity.c | 6 ++----
 drivers/gpu/drm/v3d/v3d_gem.c            | 2 +-
 include/drm/gpu_scheduler.h              | 3 +--
 10 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index a4ec092af9a7..18f63567fb69 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1267,7 +1267,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 
 	trace_amdgpu_cs_ioctl(job);
 	amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->ticket);
-	drm_sched_entity_push_job(&job->base, entity);
+	drm_sched_entity_push_job(&job->base);
 
 	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 5ddb955d2315..b8609cccc9c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -174,7 +174,7 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
 
 	*f = dma_fence_get(&job->base.s_fence->finished);
 	amdgpu_job_free_resources(job);
-	drm_sched_entity_push_job(&job->base, entity);
+	drm_sched_entity_push_job(&job->base);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 05f412204118..180bb633d5c5 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -178,7 +178,7 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
 	/* the scheduler holds on to the job now */
 	kref_get(&submit->refcount);
 
-	drm_sched_entity_push_job(&submit->sched_job, sched_entity);
+	drm_sched_entity_push_job(&submit->sched_job);
 
 out_unlock:
 	mutex_unlock(&submit->gpu->fence_lock);
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index de62966243cd..c528f40981bb 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -359,8 +359,7 @@ int lima_gem_submit(struct drm_file *file, struct lima_submit *submit)
 			goto err_out2;
 	}
 
-	fence = lima_sched_context_queue_task(
-		submit->ctx->context + submit->pipe, submit->task);
+	fence = lima_sched_context_queue_task(submit->task);
 
 	for (i = 0; i < submit->nr_bos; i++) {
 		if (submit->bos[i].flags & LIMA_SUBMIT_BO_WRITE)
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 38f755580507..e968b5a8f0b0 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -177,13 +177,12 @@ void lima_sched_context_fini(struct lima_sched_pipe *pipe,
 	drm_sched_entity_fini(&context->base);
 }
 
-struct dma_fence *lima_sched_context_queue_task(struct lima_sched_context *context,
-						struct lima_sched_task *task)
+struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task)
 {
 	struct dma_fence *fence = dma_fence_get(&task->base.s_fence->finished);
 
 	trace_lima_task_submit(task);
-	drm_sched_entity_push_job(&task->base, &context->base);
+	drm_sched_entity_push_job(&task->base);
 	return fence;
 }
 
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index 90f03c48ef4a..ac70006b0e26 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -98,8 +98,7 @@ int lima_sched_context_init(struct lima_sched_pipe *pipe,
 			    atomic_t *guilty);
 void lima_sched_context_fini(struct lima_sched_pipe *pipe,
 			     struct lima_sched_context *context);
-struct dma_fence *lima_sched_context_queue_task(struct lima_sched_context *context,
-						struct lima_sched_task *task);
+struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task);
 
 int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name);
 void lima_sched_pipe_fini(struct lima_sched_pipe *pipe);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 2992dc85325f..4bc962763e1f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -301,7 +301,7 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	kref_get(&job->refcount); /* put by scheduler job completion */
 
-	drm_sched_entity_push_job(&job->base, entity);
+	drm_sched_entity_push_job(&job->base);
 
 	mutex_unlock(&pfdev->sched_lock);
 
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index b6f72fafd504..2ab1b9e648f2 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -493,9 +493,7 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 
 /**
  * drm_sched_entity_push_job - Submit a job to the entity's job queue
- *
  * @sched_job: job to submit
- * @entity: scheduler entity
  *
  * Note: To guarantee that the order of insertion to queue matches the job's
  * fence sequence number this function should be called with drm_sched_job_arm()
@@ -503,9 +501,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
  *
  * Returns 0 for success, negative error code otherwise.
  */
-void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
-			       struct drm_sched_entity *entity)
+void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
+	struct drm_sched_entity *entity = sched_job->entity;
 	bool first;
 
 	trace_drm_sched_job(sched_job, entity);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 5c3a99027ecd..69ac20e11b09 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -482,7 +482,7 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
 	/* put by scheduler job completion */
 	kref_get(&job->refcount);
 
-	drm_sched_entity_push_job(&job->base, &v3d_priv->sched_entity[queue]);
+	drm_sched_entity_push_job(&job->base);
 
 	return 0;
 }
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 74fb321dbc44..2bb1869f2352 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -407,8 +407,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity);
 void drm_sched_entity_destroy(struct drm_sched_entity *entity);
 void drm_sched_entity_select_rq(struct drm_sched_entity *entity);
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity);
-void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
-			       struct drm_sched_entity *entity);
+void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
 				   enum drm_sched_priority priority);
 bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 04/11] drm/panfrost: use scheduler dependency tracking
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
@ 2021-07-02 21:38   ` Daniel Vetter
  2021-07-02 21:38   ` Daniel Vetter
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Daniel Vetter, Steven Price, Daniel Vetter, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, Sumit Semwal,
	Christian König, linux-media, linaro-mm-sig

This just deletes some code that is now handled by the shared
scheduler dependency tracking.

Note that thanks to the drm_sched_job_init/arm split we can now easily
pull the _init() part out from under the submission lock, way ahead to
where the sync file in-fences are added as dependencies.

v2: Correctly clean up the partially set up job, now that job_init()
and job_arm() are apart (Emma).
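
A condensed sketch of the resulting submit flow (error handling and
unrelated setup elided; all calls as used in the diff below):

    ret = drm_sched_job_init(&job->base,
                             &job->file_priv->sched_entity[slot],
                             NULL);
    /* sync file in-fences become dependencies before any locks */
    ret = drm_sched_job_await_fence(&job->base, fence);
    ...
    mutex_lock(&pfdev->sched_lock);
    drm_sched_job_arm(&job->base);
    /* implicit fences via drm_sched_job_await_implicit(), then: */
    drm_sched_entity_push_job(&job->base);
    mutex_unlock(&pfdev->sched_lock);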

Reviewed-by: Steven Price <steven.price@arm.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 16 ++++++++---
 drivers/gpu/drm/panfrost/panfrost_job.c | 37 +++----------------------
 drivers/gpu/drm/panfrost/panfrost_job.h |  5 +---
 3 files changed, 17 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ffaef5ec5ff..9f53bea07d61 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -218,7 +218,7 @@ panfrost_copy_in_sync(struct drm_device *dev,
 		if (ret)
 			goto fail;
 
-		ret = drm_gem_fence_array_add(&job->deps, fence);
+		ret = drm_sched_job_await_fence(&job->base, fence);
 
 		if (ret)
 			goto fail;
@@ -236,7 +236,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	struct drm_panfrost_submit *args = data;
 	struct drm_syncobj *sync_out = NULL;
 	struct panfrost_job *job;
-	int ret = 0;
+	int ret = 0, slot;
 
 	if (!args->jc)
 		return -EINVAL;
@@ -258,14 +258,20 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 
 	kref_init(&job->refcount);
 
-	xa_init_flags(&job->deps, XA_FLAGS_ALLOC);
-
 	job->pfdev = pfdev;
 	job->jc = args->jc;
 	job->requirements = args->requirements;
 	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
 	job->file_priv = file->driver_priv;
 
+	slot = panfrost_job_get_slot(job);
+
+	ret = drm_sched_job_init(&job->base,
+				 &job->file_priv->sched_entity[slot],
+				 NULL);
+	if (ret)
+		goto fail_job_put;
+
 	ret = panfrost_copy_in_sync(dev, file, args, job);
 	if (ret)
 		goto fail_job;
@@ -283,6 +289,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 		drm_syncobj_replace_fence(sync_out, job->render_done_fence);
 
 fail_job:
+	drm_sched_job_cleanup(&job->base);
+fail_job_put:
 	panfrost_job_put(job);
 fail_out_sync:
 	if (sync_out)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 4bc962763e1f..86c843d8822e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -102,7 +102,7 @@ static struct dma_fence *panfrost_fence_create(struct panfrost_device *pfdev, in
 	return &fence->base;
 }
 
-static int panfrost_job_get_slot(struct panfrost_job *job)
+int panfrost_job_get_slot(struct panfrost_job *job)
 {
 	/* JS0: fragment jobs.
 	 * JS1: vertex/tiler jobs
@@ -242,13 +242,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 
 static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
 					  int bo_count,
-					  struct xarray *deps)
+					  struct drm_sched_job *job)
 {
 	int i, ret;
 
 	for (i = 0; i < bo_count; i++) {
 		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+		ret = drm_sched_job_await_implicit(job, bos[i], true);
 		if (ret)
 			return ret;
 	}
@@ -269,31 +269,21 @@ static void panfrost_attach_object_fences(struct drm_gem_object **bos,
 int panfrost_job_push(struct panfrost_job *job)
 {
 	struct panfrost_device *pfdev = job->pfdev;
-	int slot = panfrost_job_get_slot(job);
-	struct drm_sched_entity *entity = &job->file_priv->sched_entity[slot];
 	struct ww_acquire_ctx acquire_ctx;
 	int ret = 0;
 
-
 	ret = drm_gem_lock_reservations(job->bos, job->bo_count,
 					    &acquire_ctx);
 	if (ret)
 		return ret;
 
 	mutex_lock(&pfdev->sched_lock);
-
-	ret = drm_sched_job_init(&job->base, entity, NULL);
-	if (ret) {
-		mutex_unlock(&pfdev->sched_lock);
-		goto unlock;
-	}
-
 	drm_sched_job_arm(&job->base);
 
 	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
 
 	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-					     &job->deps);
+					     &job->base);
 	if (ret) {
 		mutex_unlock(&pfdev->sched_lock);
 		goto unlock;
@@ -318,15 +308,8 @@ static void panfrost_job_cleanup(struct kref *ref)
 {
 	struct panfrost_job *job = container_of(ref, struct panfrost_job,
 						refcount);
-	struct dma_fence *fence;
-	unsigned long index;
 	unsigned int i;
 
-	xa_for_each(&job->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&job->deps);
-
 	dma_fence_put(job->done_fence);
 	dma_fence_put(job->render_done_fence);
 
@@ -365,17 +348,6 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
 	panfrost_job_put(job);
 }
 
-static struct dma_fence *panfrost_job_dependency(struct drm_sched_job *sched_job,
-						 struct drm_sched_entity *s_entity)
-{
-	struct panfrost_job *job = to_panfrost_job(sched_job);
-
-	if (!xa_empty(&job->deps))
-		return xa_erase(&job->deps, job->last_dep++);
-
-	return NULL;
-}
-
 static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job)
 {
 	struct panfrost_job *job = to_panfrost_job(sched_job);
@@ -765,7 +737,6 @@ static void panfrost_reset_work(struct work_struct *work)
 }
 
 static const struct drm_sched_backend_ops panfrost_sched_ops = {
-	.dependency = panfrost_job_dependency,
 	.run_job = panfrost_job_run,
 	.timedout_job = panfrost_job_timedout,
 	.free_job = panfrost_job_free
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..77e6d0e6f612 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -19,10 +19,6 @@ struct panfrost_job {
 	struct panfrost_device *pfdev;
 	struct panfrost_file_priv *file_priv;
 
-	/* Contains both explicit and implicit fences */
-	struct xarray deps;
-	unsigned long last_dep;
-
 	/* Fence to be signaled by IRQ handler when the job is complete. */
 	struct dma_fence *done_fence;
 
@@ -42,6 +38,7 @@ int panfrost_job_init(struct panfrost_device *pfdev);
 void panfrost_job_fini(struct panfrost_device *pfdev);
 int panfrost_job_open(struct panfrost_file_priv *panfrost_priv);
 void panfrost_job_close(struct panfrost_file_priv *panfrost_priv);
+int panfrost_job_get_slot(struct panfrost_job *job);
 int panfrost_job_push(struct panfrost_job *job);
 void panfrost_job_put(struct panfrost_job *job);
 void panfrost_job_enable_interrupts(struct panfrost_device *pfdev);
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 05/11] drm/lima: use scheduler dependency tracking
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
                   ` (3 preceding siblings ...)
  2021-07-02 21:38   ` Daniel Vetter
@ 2021-07-02 21:38 ` Daniel Vetter
  2021-07-02 21:38 ` [PATCH v2 06/11] drm/v3d: Move drm_sched_job_init to v3d_job_init Daniel Vetter
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development; +Cc: Daniel Vetter, Daniel Vetter

Nothing special going on here.

Aside from reviewing the code: it seems like drm_sched_job_arm()
should be moved into lima_sched_context_queue_task() and put under
some mutex together with drm_sched_entity_push_job(). See the
kerneldoc for drm_sched_entity_push_job().
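
A minimal sketch of that suggested ordering (queue_lock is purely
illustrative, lima has no such lock today; the other calls are as in
the diff below):

    mutex_lock(&queue_lock);
    drm_sched_job_arm(&task->base);
    fence = dma_fence_get(&task->base.s_fence->finished);
    drm_sched_entity_push_job(&task->base);
    mutex_unlock(&queue_lock);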

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/lima/lima_gem.c   |  4 ++--
 drivers/gpu/drm/lima/lima_sched.c | 21 ---------------------
 drivers/gpu/drm/lima/lima_sched.h |  3 ---
 3 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index c528f40981bb..e54a88d5037a 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -267,7 +267,7 @@ static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo,
 	if (explicit)
 		return 0;
 
-	return drm_gem_fence_array_add_implicit(&task->deps, &bo->base.base, write);
+	return drm_sched_job_await_implicit(&task->base, &bo->base.base, write);
 }
 
 static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
@@ -285,7 +285,7 @@ static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
 		if (err)
 			return err;
 
-		err = drm_gem_fence_array_add(&submit->task->deps, fence);
+		err = drm_sched_job_await_fence(&submit->task->base, fence);
 		if (err) {
 			dma_fence_put(fence);
 			return err;
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index e968b5a8f0b0..99d5f6f1a882 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -134,24 +134,15 @@ int lima_sched_task_init(struct lima_sched_task *task,
 	task->num_bos = num_bos;
 	task->vm = lima_vm_get(vm);
 
-	xa_init_flags(&task->deps, XA_FLAGS_ALLOC);
-
 	return 0;
 }
 
 void lima_sched_task_fini(struct lima_sched_task *task)
 {
-	struct dma_fence *fence;
-	unsigned long index;
 	int i;
 
 	drm_sched_job_cleanup(&task->base);
 
-	xa_for_each(&task->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&task->deps);
-
 	if (task->bos) {
 		for (i = 0; i < task->num_bos; i++)
 			drm_gem_object_put(&task->bos[i]->base.base);
@@ -186,17 +177,6 @@ struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task)
 	return fence;
 }
 
-static struct dma_fence *lima_sched_dependency(struct drm_sched_job *job,
-					       struct drm_sched_entity *entity)
-{
-	struct lima_sched_task *task = to_lima_task(job);
-
-	if (!xa_empty(&task->deps))
-		return xa_erase(&task->deps, task->last_dep++);
-
-	return NULL;
-}
-
 static int lima_pm_busy(struct lima_device *ldev)
 {
 	int ret;
@@ -472,7 +452,6 @@ static void lima_sched_free_job(struct drm_sched_job *job)
 }
 
 static const struct drm_sched_backend_ops lima_sched_ops = {
-	.dependency = lima_sched_dependency,
 	.run_job = lima_sched_run_job,
 	.timedout_job = lima_sched_timedout_job,
 	.free_job = lima_sched_free_job,
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index ac70006b0e26..6a11764d87b3 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -23,9 +23,6 @@ struct lima_sched_task {
 	struct lima_vm *vm;
 	void *frame;
 
-	struct xarray deps;
-	unsigned long last_dep;
-
 	struct lima_bo **bos;
 	int num_bos;
 
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 06/11] drm/v3d: Move drm_sched_job_init to v3d_job_init
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
                   ` (4 preceding siblings ...)
  2021-07-02 21:38 ` [PATCH v2 05/11] drm/lima: " Daniel Vetter
@ 2021-07-02 21:38 ` Daniel Vetter
  2021-07-02 21:38 ` [PATCH v2 07/11] drm/v3d: Use scheduler dependency handling Daniel Vetter
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development; +Cc: Daniel Vetter, Emma Anholt, Daniel Vetter

Prep work for using the scheduler dependency handling: we need to
call drm_sched_job_init() earlier so that the new drm_sched_job_await*
functions can be used to track dependencies here.

v2: Slightly better commit message and rebase to include the
drm_sched_job_arm() call (Emma).

v3: Cleanup jobs under construction correctly (Emma)
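
Condensed from the TFU ioctl below to show the cleanup rule (error
handling elided): once drm_sched_job_init() has run inside
v3d_job_init(), later failure paths must use the new v3d_job_cleanup()
instead of a bare v3d_job_put():

    ret = v3d_job_init(v3d, file_priv, &job->base,
                       v3d_job_free, args->in_sync, V3D_TFU);
    if (ret) {
        kfree(job);     /* v3d_job_init() undid its own setup */
        return ret;
    }
    ...
    mutex_lock(&v3d->sched_lock);
    v3d_push_job(&job->base);   /* now void, can no longer fail */
    mutex_unlock(&v3d->sched_lock);
    ...
    fail:
    v3d_job_cleanup(&job->base);        /* not a bare v3d_job_put() */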

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Emma Anholt <emma@anholt.net>
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  1 +
 drivers/gpu/drm/v3d/v3d_gem.c   | 88 ++++++++++++++-------------------
 drivers/gpu/drm/v3d/v3d_sched.c | 15 +++---
 3 files changed, 44 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 8a390738d65b..1d870261eaac 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -332,6 +332,7 @@ int v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
 		      struct drm_file *file_priv);
+void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
 void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 69ac20e11b09..5eccd3658938 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -392,6 +392,12 @@ v3d_render_job_free(struct kref *ref)
 	v3d_job_free(ref);
 }
 
+void v3d_job_cleanup(struct v3d_job *job)
+{
+	drm_sched_job_cleanup(&job->base);
+	v3d_job_put(job);
+}
+
 void v3d_job_put(struct v3d_job *job)
 {
 	kref_put(&job->refcount, job->free);
@@ -433,9 +439,10 @@ v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	     struct v3d_job *job, void (*free)(struct kref *ref),
-	     u32 in_sync)
+	     u32 in_sync, enum v3d_queue queue)
 {
 	struct dma_fence *in_fence = NULL;
+	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	int ret;
 
 	job->v3d = v3d;
@@ -446,35 +453,33 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 		return ret;
 
 	xa_init_flags(&job->deps, XA_FLAGS_ALLOC);
+	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
+				 v3d_priv);
+	if (ret)
+		goto fail;
 
 	ret = drm_syncobj_find_fence(file_priv, in_sync, 0, 0, &in_fence);
 	if (ret == -EINVAL)
-		goto fail;
+		goto fail_job;
 
 	ret = drm_gem_fence_array_add(&job->deps, in_fence);
 	if (ret)
-		goto fail;
+		goto fail_job;
 
 	kref_init(&job->refcount);
 
 	return 0;
+fail_job:
+	drm_sched_job_cleanup(&job->base);
 fail:
 	xa_destroy(&job->deps);
 	pm_runtime_put_autosuspend(v3d->drm.dev);
 	return ret;
 }
 
-static int
-v3d_push_job(struct v3d_file_priv *v3d_priv,
-	     struct v3d_job *job, enum v3d_queue queue)
+static void
+v3d_push_job(struct v3d_job *job)
 {
-	int ret;
-
-	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
-				 v3d_priv);
-	if (ret)
-		return ret;
-
 	drm_sched_job_arm(&job->base);
 
 	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
@@ -483,8 +488,6 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
 	kref_get(&job->refcount);
 
 	drm_sched_entity_push_job(&job->base);
-
-	return 0;
 }
 
 static void
@@ -530,7 +533,6 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 		    struct drm_file *file_priv)
 {
 	struct v3d_dev *v3d = to_v3d_dev(dev);
-	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct drm_v3d_submit_cl *args = data;
 	struct v3d_bin_job *bin = NULL;
 	struct v3d_render_job *render;
@@ -556,7 +558,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	INIT_LIST_HEAD(&render->unref_list);
 
 	ret = v3d_job_init(v3d, file_priv, &render->base,
-			   v3d_render_job_free, args->in_sync_rcl);
+			   v3d_render_job_free, args->in_sync_rcl, V3D_RENDER);
 	if (ret) {
 		kfree(render);
 		return ret;
@@ -570,7 +572,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 		}
 
 		ret = v3d_job_init(v3d, file_priv, &bin->base,
-				   v3d_job_free, args->in_sync_bcl);
+				   v3d_job_free, args->in_sync_bcl, V3D_BIN);
 		if (ret) {
 			v3d_job_put(&render->base);
 			kfree(bin);
@@ -592,7 +594,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 			goto fail;
 		}
 
-		ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0);
+		ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0, V3D_CACHE_CLEAN);
 		if (ret) {
 			kfree(clean_job);
 			clean_job = NULL;
@@ -615,9 +617,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
 	mutex_lock(&v3d->sched_lock);
 	if (bin) {
-		ret = v3d_push_job(v3d_priv, &bin->base, V3D_BIN);
-		if (ret)
-			goto fail_unreserve;
+		v3d_push_job(&bin->base);
 
 		ret = drm_gem_fence_array_add(&render->base.deps,
 					      dma_fence_get(bin->base.done_fence));
@@ -625,9 +625,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 			goto fail_unreserve;
 	}
 
-	ret = v3d_push_job(v3d_priv, &render->base, V3D_RENDER);
-	if (ret)
-		goto fail_unreserve;
+	v3d_push_job(&render->base);
 
 	if (clean_job) {
 		struct dma_fence *render_fence =
@@ -635,9 +633,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 		ret = drm_gem_fence_array_add(&clean_job->deps, render_fence);
 		if (ret)
 			goto fail_unreserve;
-		ret = v3d_push_job(v3d_priv, clean_job, V3D_CACHE_CLEAN);
-		if (ret)
-			goto fail_unreserve;
+		v3d_push_job(clean_job);
 	}
 
 	mutex_unlock(&v3d->sched_lock);
@@ -662,10 +658,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 				    last_job->bo_count, &acquire_ctx);
 fail:
 	if (bin)
-		v3d_job_put(&bin->base);
-	v3d_job_put(&render->base);
+		v3d_job_cleanup(&bin->base);
+	v3d_job_cleanup(&render->base);
 	if (clean_job)
-		v3d_job_put(clean_job);
+		v3d_job_cleanup(clean_job);
 
 	return ret;
 }
@@ -684,7 +680,6 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *file_priv)
 {
 	struct v3d_dev *v3d = to_v3d_dev(dev);
-	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct drm_v3d_submit_tfu *args = data;
 	struct v3d_tfu_job *job;
 	struct ww_acquire_ctx acquire_ctx;
@@ -697,7 +692,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 		return -ENOMEM;
 
 	ret = v3d_job_init(v3d, file_priv, &job->base,
-			   v3d_job_free, args->in_sync);
+			   v3d_job_free, args->in_sync, V3D_TFU);
 	if (ret) {
 		kfree(job);
 		return ret;
@@ -741,9 +736,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 		goto fail;
 
 	mutex_lock(&v3d->sched_lock);
-	ret = v3d_push_job(v3d_priv, &job->base, V3D_TFU);
-	if (ret)
-		goto fail_unreserve;
+	v3d_push_job(&job->base);
 	mutex_unlock(&v3d->sched_lock);
 
 	v3d_attach_fences_and_unlock_reservation(file_priv,
@@ -755,12 +748,8 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 
 	return 0;
 
-fail_unreserve:
-	mutex_unlock(&v3d->sched_lock);
-	drm_gem_unlock_reservations(job->base.bo, job->base.bo_count,
-				    &acquire_ctx);
 fail:
-	v3d_job_put(&job->base);
+	v3d_job_cleanup(&job->base);
 
 	return ret;
 }
@@ -779,7 +768,6 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *file_priv)
 {
 	struct v3d_dev *v3d = to_v3d_dev(dev);
-	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct drm_v3d_submit_csd *args = data;
 	struct v3d_csd_job *job;
 	struct v3d_job *clean_job;
@@ -798,7 +786,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 		return -ENOMEM;
 
 	ret = v3d_job_init(v3d, file_priv, &job->base,
-			   v3d_job_free, args->in_sync);
+			   v3d_job_free, args->in_sync, V3D_CSD);
 	if (ret) {
 		kfree(job);
 		return ret;
@@ -811,7 +799,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 		return -ENOMEM;
 	}
 
-	ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0);
+	ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0, V3D_CACHE_CLEAN);
 	if (ret) {
 		v3d_job_put(&job->base);
 		kfree(clean_job);
@@ -830,18 +818,14 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 		goto fail;
 
 	mutex_lock(&v3d->sched_lock);
-	ret = v3d_push_job(v3d_priv, &job->base, V3D_CSD);
-	if (ret)
-		goto fail_unreserve;
+	v3d_push_job(&job->base);
 
 	ret = drm_gem_fence_array_add(&clean_job->deps,
 				      dma_fence_get(job->base.done_fence));
 	if (ret)
 		goto fail_unreserve;
 
-	ret = v3d_push_job(v3d_priv, clean_job, V3D_CACHE_CLEAN);
-	if (ret)
-		goto fail_unreserve;
+	v3d_push_job(clean_job);
 	mutex_unlock(&v3d->sched_lock);
 
 	v3d_attach_fences_and_unlock_reservation(file_priv,
@@ -860,8 +844,8 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
 				    &acquire_ctx);
 fail:
-	v3d_job_put(&job->base);
-	v3d_job_put(clean_job);
+	v3d_job_cleanup(&job->base);
+	v3d_job_cleanup(clean_job);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index a39bdd5cfc4f..3f352d73af9c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -55,12 +55,11 @@ to_csd_job(struct drm_sched_job *sched_job)
 }
 
 static void
-v3d_job_free(struct drm_sched_job *sched_job)
+v3d_sched_job_free(struct drm_sched_job *sched_job)
 {
 	struct v3d_job *job = to_v3d_job(sched_job);
 
-	drm_sched_job_cleanup(sched_job);
-	v3d_job_put(job);
+	v3d_job_cleanup(job);
 }
 
 /*
@@ -360,35 +359,35 @@ static const struct drm_sched_backend_ops v3d_bin_sched_ops = {
 	.dependency = v3d_job_dependency,
 	.run_job = v3d_bin_job_run,
 	.timedout_job = v3d_bin_job_timedout,
-	.free_job = v3d_job_free,
+	.free_job = v3d_sched_job_free,
 };
 
 static const struct drm_sched_backend_ops v3d_render_sched_ops = {
 	.dependency = v3d_job_dependency,
 	.run_job = v3d_render_job_run,
 	.timedout_job = v3d_render_job_timedout,
-	.free_job = v3d_job_free,
+	.free_job = v3d_sched_job_free,
 };
 
 static const struct drm_sched_backend_ops v3d_tfu_sched_ops = {
 	.dependency = v3d_job_dependency,
 	.run_job = v3d_tfu_job_run,
 	.timedout_job = v3d_generic_job_timedout,
-	.free_job = v3d_job_free,
+	.free_job = v3d_sched_job_free,
 };
 
 static const struct drm_sched_backend_ops v3d_csd_sched_ops = {
 	.dependency = v3d_job_dependency,
 	.run_job = v3d_csd_job_run,
 	.timedout_job = v3d_csd_job_timedout,
-	.free_job = v3d_job_free
+	.free_job = v3d_sched_job_free
 };
 
 static const struct drm_sched_backend_ops v3d_cache_clean_sched_ops = {
 	.dependency = v3d_job_dependency,
 	.run_job = v3d_cache_clean_job_run,
 	.timedout_job = v3d_generic_job_timedout,
-	.free_job = v3d_job_free
+	.free_job = v3d_sched_job_free
 };
 
 int
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 07/11] drm/v3d: Use scheduler dependency handling
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
                   ` (5 preceding siblings ...)
  2021-07-02 21:38 ` [PATCH v2 06/11] drm/v3d: Move drm_sched_job_init to v3d_job_init Daniel Vetter
@ 2021-07-02 21:38 ` Daniel Vetter
  2021-07-02 21:38   ` Daniel Vetter
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development; +Cc: Daniel Vetter, Daniel Vetter

With the prep work out of the way, this isn't tricky anymore.

Aside: The chaining of the various jobs is a bit awkward, with the
possibility of failure in bad places. I think with the
drm_sched_job_init/arm split and maybe preloading the
job->dependencies xarray this should be fixable.
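
For reference, the bin -> render -> clean chaining this switches over,
condensed from v3d_submit_cl_ioctl() below (error handling elided):

    v3d_push_job(&bin->base);
    ret = drm_sched_job_await_fence(&render->base.base,
                                    dma_fence_get(bin->base.done_fence));
    ...
    v3d_push_job(&render->base);
    ret = drm_sched_job_await_fence(&clean_job->base,
                                    dma_fence_get(render->base.done_fence));
    ...
    v3d_push_job(clean_job);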

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h   |  5 -----
 drivers/gpu/drm/v3d/v3d_gem.c   | 25 ++++++++-----------------
 drivers/gpu/drm/v3d/v3d_sched.c | 29 +----------------------------
 3 files changed, 9 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1d870261eaac..f80f4ff1f7aa 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -192,11 +192,6 @@ struct v3d_job {
 	struct drm_gem_object **bo;
 	u32 bo_count;
 
-	/* Array of struct dma_fence * to block on before submitting this job.
-	 */
-	struct xarray deps;
-	unsigned long last_dep;
-
 	/* v3d fence to be signaled by IRQ handler when the job is complete. */
 	struct dma_fence *irq_fence;
 
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 5eccd3658938..42b07ffbea5e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -257,8 +257,8 @@ v3d_lock_bo_reservations(struct v3d_job *job,
 		return ret;
 
 	for (i = 0; i < job->bo_count; i++) {
-		ret = drm_gem_fence_array_add_implicit(&job->deps,
-						       job->bo[i], true);
+		ret = drm_sched_job_await_implicit(&job->base,
+						   job->bo[i], true);
 		if (ret) {
 			drm_gem_unlock_reservations(job->bo, job->bo_count,
 						    acquire_ctx);
@@ -354,8 +354,6 @@ static void
 v3d_job_free(struct kref *ref)
 {
 	struct v3d_job *job = container_of(ref, struct v3d_job, refcount);
-	unsigned long index;
-	struct dma_fence *fence;
 	int i;
 
 	for (i = 0; i < job->bo_count; i++) {
@@ -364,11 +362,6 @@ v3d_job_free(struct kref *ref)
 	}
 	kvfree(job->bo);
 
-	xa_for_each(&job->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&job->deps);
-
 	dma_fence_put(job->irq_fence);
 	dma_fence_put(job->done_fence);
 
@@ -452,7 +445,6 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	if (ret < 0)
 		return ret;
 
-	xa_init_flags(&job->deps, XA_FLAGS_ALLOC);
 	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
 				 v3d_priv);
 	if (ret)
@@ -462,7 +454,7 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	if (ret == -EINVAL)
 		goto fail_job;
 
-	ret = drm_gem_fence_array_add(&job->deps, in_fence);
+	ret = drm_sched_job_await_fence(&job->base, in_fence);
 	if (ret)
 		goto fail_job;
 
@@ -472,7 +464,6 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 fail_job:
 	drm_sched_job_cleanup(&job->base);
 fail:
-	xa_destroy(&job->deps);
 	pm_runtime_put_autosuspend(v3d->drm.dev);
 	return ret;
 }
@@ -619,8 +610,8 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	if (bin) {
 		v3d_push_job(&bin->base);
 
-		ret = drm_gem_fence_array_add(&render->base.deps,
-					      dma_fence_get(bin->base.done_fence));
+		ret = drm_sched_job_await_fence(&render->base.base,
+						dma_fence_get(bin->base.done_fence));
 		if (ret)
 			goto fail_unreserve;
 	}
@@ -630,7 +621,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	if (clean_job) {
 		struct dma_fence *render_fence =
 			dma_fence_get(render->base.done_fence);
-		ret = drm_gem_fence_array_add(&clean_job->deps, render_fence);
+		ret = drm_sched_job_await_fence(&clean_job->base, render_fence);
 		if (ret)
 			goto fail_unreserve;
 		v3d_push_job(clean_job);
@@ -820,8 +811,8 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	mutex_lock(&v3d->sched_lock);
 	v3d_push_job(&job->base);
 
-	ret = drm_gem_fence_array_add(&clean_job->deps,
-				      dma_fence_get(job->base.done_fence));
+	ret = drm_sched_job_await_fence(&clean_job->base,
+					dma_fence_get(job->base.done_fence));
 	if (ret)
 		goto fail_unreserve;
 
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 3f352d73af9c..f0de584f452c 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -13,7 +13,7 @@
  * jobs when bulk background jobs are queued up, we submit a new job
  * to the HW only when it has completed the last one, instead of
  * filling up the CT[01]Q FIFOs with jobs.  Similarly, we use
- * v3d_job_dependency() to manage the dependency between bin and
+ * drm_sched_job_await_fence() to manage the dependency between bin and
  * render, instead of having the clients submit jobs using the HW's
  * semaphores to interlock between them.
  */
@@ -62,28 +62,6 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
 	v3d_job_cleanup(job);
 }
 
-/*
- * Returns the fences that the job depends on, one by one.
- *
- * If placed in the scheduler's .dependency method, the corresponding
- * .run_job won't be called until all of them have been signaled.
- */
-static struct dma_fence *
-v3d_job_dependency(struct drm_sched_job *sched_job,
-		   struct drm_sched_entity *s_entity)
-{
-	struct v3d_job *job = to_v3d_job(sched_job);
-
-	/* XXX: Wait on a fence for switching the GMP if necessary,
-	 * and then do so.
-	 */
-
-	if (!xa_empty(&job->deps))
-		return xa_erase(&job->deps, job->last_dep++);
-
-	return NULL;
-}
-
 static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_bin_job *job = to_bin_job(sched_job);
@@ -356,35 +334,30 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
 }
 
 static const struct drm_sched_backend_ops v3d_bin_sched_ops = {
-	.dependency = v3d_job_dependency,
 	.run_job = v3d_bin_job_run,
 	.timedout_job = v3d_bin_job_timedout,
 	.free_job = v3d_sched_job_free,
 };
 
 static const struct drm_sched_backend_ops v3d_render_sched_ops = {
-	.dependency = v3d_job_dependency,
 	.run_job = v3d_render_job_run,
 	.timedout_job = v3d_render_job_timedout,
 	.free_job = v3d_sched_job_free,
 };
 
 static const struct drm_sched_backend_ops v3d_tfu_sched_ops = {
-	.dependency = v3d_job_dependency,
 	.run_job = v3d_tfu_job_run,
 	.timedout_job = v3d_generic_job_timedout,
 	.free_job = v3d_sched_job_free,
 };
 
 static const struct drm_sched_backend_ops v3d_csd_sched_ops = {
-	.dependency = v3d_job_dependency,
 	.run_job = v3d_csd_job_run,
 	.timedout_job = v3d_csd_job_timedout,
 	.free_job = v3d_sched_job_free
 };
 
 static const struct drm_sched_backend_ops v3d_cache_clean_sched_ops = {
-	.dependency = v3d_job_dependency,
 	.run_job = v3d_cache_clean_job_run,
 	.timedout_job = v3d_generic_job_timedout,
 	.free_job = v3d_sched_job_free
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
@ 2021-07-02 21:38   ` Daniel Vetter
  2021-07-02 21:38   ` Daniel Vetter
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Daniel Vetter, Daniel Vetter, Lucas Stach, Russell King,
	Christian Gmeiner, Sumit Semwal, Christian König, etnaviv,
	linux-media, linaro-mm-sig

We need to pull drm_sched_job_init() much earlier, but that's very
minor surgery.
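
Condensed, the new per-BO implicit sync and sync-file in-fence
handling boils down to (error handling elided, see the diff below):

    /* one helper call replaces the hand-rolled excl/shared
     * fence bookkeeping per BO
     */
    ret = drm_sched_job_await_implicit(&submit->sched_job,
                                       &bo->obj->base,
                                       bo->flags & ETNA_SUBMIT_BO_WRITE);
    ...
    /* the explicit sync-file in-fence goes the same way */
    ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);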

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: etnaviv@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
 drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
 drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
 4 files changed, 20 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
index 98e60df882b6..63688e6e4580 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
@@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
 	u64 va;
 	struct etnaviv_gem_object *obj;
 	struct etnaviv_vram_mapping *mapping;
-	struct dma_fence *excl;
-	unsigned int nr_shared;
-	struct dma_fence **shared;
 };
 
 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
@@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
 	struct etnaviv_file_private *ctx;
 	struct etnaviv_gpu *gpu;
 	struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
-	struct dma_fence *out_fence, *in_fence;
+	struct dma_fence *out_fence;
 	int out_fence_id;
 	struct list_head node; /* GPU active submit list */
 	struct etnaviv_cmdbuf cmdbuf;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 4dd7d9d541c0..92478a50a580 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
 			continue;
 
-		if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
-			ret = dma_resv_get_fences(robj, &bo->excl,
-						  &bo->nr_shared,
-						  &bo->shared);
-			if (ret)
-				return ret;
-		} else {
-			bo->excl = dma_resv_get_excl_unlocked(robj);
-		}
-
+		ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
+						   bo->flags & ETNA_SUBMIT_BO_WRITE);
+		if (ret)
+			return ret;
 	}
 
 	return ret;
@@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
 
 	wake_up_all(&submit->gpu->fence_event);
 
-	if (submit->in_fence)
-		dma_fence_put(submit->in_fence);
 	if (submit->out_fence) {
 		/* first remove from IDR, so fence can not be found anymore */
 		mutex_lock(&submit->gpu->fence_lock);
@@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	submit->exec_state = args->exec_state;
 	submit->flags = args->flags;
 
+	ret = drm_sched_job_init(&submit->sched_job,
+				 &ctx->sched_entity[args->pipe],
+				 submit->ctx);
+	if (ret)
+		goto err_submit_objects;
+
 	ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
 	if (ret)
 		goto err_submit_objects;
@@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
-		submit->in_fence = sync_file_get_fence(args->fence_fd);
-		if (!submit->in_fence) {
+		struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
+		if (!in_fence) {
 			ret = -EINVAL;
 			goto err_submit_objects;
 		}
+
+		ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
+		if (ret)
+			goto err_submit_objects;
 	}
 
 	ret = submit_pin_objects(submit);
@@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	if (ret)
 		goto err_submit_objects;
 
-	ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
+	ret = etnaviv_sched_push_job(submit);
 	if (ret)
 		goto err_submit_objects;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 180bb633d5c5..c98d67320be3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
 static int etnaviv_hw_jobs_limit = 4;
 module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
 
-static struct dma_fence *
-etnaviv_sched_dependency(struct drm_sched_job *sched_job,
-			 struct drm_sched_entity *entity)
-{
-	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
-	struct dma_fence *fence;
-	int i;
-
-	if (unlikely(submit->in_fence)) {
-		fence = submit->in_fence;
-		submit->in_fence = NULL;
-
-		if (!dma_fence_is_signaled(fence))
-			return fence;
-
-		dma_fence_put(fence);
-	}
-
-	for (i = 0; i < submit->nr_bos; i++) {
-		struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
-		int j;
-
-		if (bo->excl) {
-			fence = bo->excl;
-			bo->excl = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-
-		for (j = 0; j < bo->nr_shared; j++) {
-			if (!bo->shared[j])
-				continue;
-
-			fence = bo->shared[j];
-			bo->shared[j] = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-		kfree(bo->shared);
-		bo->nr_shared = 0;
-		bo->shared = NULL;
-	}
-
-	return NULL;
-}
-
 static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
@@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
 }
 
 static const struct drm_sched_backend_ops etnaviv_sched_ops = {
-	.dependency = etnaviv_sched_dependency,
 	.run_job = etnaviv_sched_run_job,
 	.timedout_job = etnaviv_sched_timedout_job,
 	.free_job = etnaviv_sched_free_job,
 };
 
-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit)
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
 {
 	int ret = 0;
 
@@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
 	 */
 	mutex_lock(&submit->gpu->fence_lock);
 
-	ret = drm_sched_job_init(&submit->sched_job, sched_entity,
-				 submit->ctx);
-	if (ret)
-		goto out_unlock;
-
 	drm_sched_job_arm(&submit->sched_job);
 
 	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
index c0a6796e22c9..baebfa069afc 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
@@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
 
 int etnaviv_sched_init(struct etnaviv_gpu *gpu);
 void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit);
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
 
 #endif /* __ETNAVIV_SCHED_H__ */
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 09/11] drm/gem: Delete gem array fencing helpers
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
@ 2021-07-02 21:38   ` Daniel Vetter
  2021-07-02 21:38   ` Daniel Vetter
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Daniel Vetter, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Sumit Semwal,
	Christian König, linux-media, linaro-mm-sig

Integrated into the scheduler now and all users converted over.
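
For reference, the conversion in the drivers follows this rough pattern
(sketch only; the job->deps and job->base names vary per driver):

	/* before: collect the implicit fences into an xarray by hand */
	ret = drm_gem_fence_array_add_implicit(&job->deps, obj, write);

	/* after: let the scheduler track the dependency directly */
	ret = drm_sched_job_await_implicit(&job->base, obj, write);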

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/gpu/drm/drm_gem.c | 96 ---------------------------------------
 include/drm/drm_gem.h     |  5 --
 2 files changed, 101 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 68deb1de8235..24d49a2636e0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1294,99 +1294,3 @@ drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
 	ww_acquire_fini(acquire_ctx);
 }
 EXPORT_SYMBOL(drm_gem_unlock_reservations);
-
-/**
- * drm_gem_fence_array_add - Adds the fence to an array of fences to be
- * waited on, deduplicating fences from the same context.
- *
- * @fence_array: array of dma_fence * for the job to block on.
- * @fence: the dma_fence to add to the list of dependencies.
- *
- * This functions consumes the reference for @fence both on success and error
- * cases.
- *
- * Returns:
- * 0 on success, or an error on failing to expand the array.
- */
-int drm_gem_fence_array_add(struct xarray *fence_array,
-			    struct dma_fence *fence)
-{
-	struct dma_fence *entry;
-	unsigned long index;
-	u32 id = 0;
-	int ret;
-
-	if (!fence)
-		return 0;
-
-	/* Deduplicate if we already depend on a fence from the same context.
-	 * This lets the size of the array of deps scale with the number of
-	 * engines involved, rather than the number of BOs.
-	 */
-	xa_for_each(fence_array, index, entry) {
-		if (entry->context != fence->context)
-			continue;
-
-		if (dma_fence_is_later(fence, entry)) {
-			dma_fence_put(entry);
-			xa_store(fence_array, index, fence, GFP_KERNEL);
-		} else {
-			dma_fence_put(fence);
-		}
-		return 0;
-	}
-
-	ret = xa_alloc(fence_array, &id, fence, xa_limit_32b, GFP_KERNEL);
-	if (ret != 0)
-		dma_fence_put(fence);
-
-	return ret;
-}
-EXPORT_SYMBOL(drm_gem_fence_array_add);
-
-/**
- * drm_gem_fence_array_add_implicit - Adds the implicit dependencies tracked
- * in the GEM object's reservation object to an array of dma_fences for use in
- * scheduling a rendering job.
- *
- * This should be called after drm_gem_lock_reservations() on your array of
- * GEM objects used in the job but before updating the reservations with your
- * own fences.
- *
- * @fence_array: array of dma_fence * for the job to block on.
- * @obj: the gem object to add new dependencies from.
- * @write: whether the job might write the object (so we need to depend on
- * shared fences in the reservation object).
- */
-int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
-				     struct drm_gem_object *obj,
-				     bool write)
-{
-	int ret;
-	struct dma_fence **fences;
-	unsigned int i, fence_count;
-
-	if (!write) {
-		struct dma_fence *fence =
-			dma_resv_get_excl_unlocked(obj->resv);
-
-		return drm_gem_fence_array_add(fence_array, fence);
-	}
-
-	ret = dma_resv_get_fences(obj->resv, NULL,
-						&fence_count, &fences);
-	if (ret || !fence_count)
-		return ret;
-
-	for (i = 0; i < fence_count; i++) {
-		ret = drm_gem_fence_array_add(fence_array, fences[i]);
-		if (ret)
-			break;
-	}
-
-	for (; i < fence_count; i++)
-		dma_fence_put(fences[i]);
-	kfree(fences);
-	return ret;
-}
-EXPORT_SYMBOL(drm_gem_fence_array_add_implicit);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 240049566592..6d5e33b89074 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -409,11 +409,6 @@ int drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
 			      struct ww_acquire_ctx *acquire_ctx);
 void drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
 				 struct ww_acquire_ctx *acquire_ctx);
-int drm_gem_fence_array_add(struct xarray *fence_array,
-			    struct dma_fence *fence);
-int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
-				     struct drm_gem_object *obj,
-				     bool write);
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);
 
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 10/11] drm/sched: Don't store self-dependencies
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
                   ` (8 preceding siblings ...)
  2021-07-02 21:38   ` Daniel Vetter
@ 2021-07-02 21:38 ` Daniel Vetter
  2021-07-02 21:38 ` [PATCH v2 11/11] drm/sched: Check locking in drm_sched_job_await_implicit Daniel Vetter
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Jack Zhang, Daniel Vetter, Luben Tuikov, Alex Deucher,
	Daniel Vetter, Christian König

This is essentially part of drm_sched_dependency_optimized(), which
only amdgpu seems to make use of. Use it a bit more: fences with the
entity's fence_context or fence_context + 1 are the entity's own
scheduled/finished fences, which are guaranteed to be ordered before
the job we're adding dependencies to, so there's no point in storing
them.

This would mean that as-is amdgpu can't use the dependency helpers, at
least not with the current approach amdgpu has for deciding whether a
vm_flush is needed. Since amdgpu also has very special rules around
implicit fencing it can't use those helpers either, and adding a
drm_sched_job_await_fence_always or similar for amdgpu wouldn't be too
onerous. That way the special-case handling for amdgpu stands out even
more, and we have higher chances that reviewers who go across all
drivers won't miss it.
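
Such a variant could be as simple as this (hypothetical sketch, not
part of this series):

	/*
	 * Like drm_sched_job_await_fence(), but also stores fences from
	 * the job's own entity (no self-dependency elision), for drivers
	 * like amdgpu that need to see them.
	 */
	int drm_sched_job_await_fence_always(struct drm_sched_job *job,
					     struct dma_fence *fence);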

Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 12d533486518..de76f7e14e0d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -651,6 +651,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
 	if (!fence)
 		return 0;
 
+	/* if it's a fence from us it's guaranteed to be earlier */
+	if (fence->context == job->entity->fence_context ||
+	    fence->context == job->entity->fence_context + 1) {
+		dma_fence_put(fence);
+		return 0;
+	}
+
 	/* Deduplicate if we already depend on a fence from the same context.
 	 * This lets the size of the array of deps scale with the number of
 	 * engines involved, rather than the number of BOs.
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v2 11/11] drm/sched: Check locking in drm_sched_job_await_implicit
  2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
                   ` (9 preceding siblings ...)
  2021-07-02 21:38 ` [PATCH v2 10/11] drm/sched: Don't store self-dependencies Daniel Vetter
@ 2021-07-02 21:38 ` Daniel Vetter
  10 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-02 21:38 UTC (permalink / raw)
  To: DRI Development
  Cc: Jack Zhang, Daniel Vetter, Luben Tuikov, Alex Deucher,
	Daniel Vetter, Christian König

You really need to hold the reservation here, or all kinds of funny
things can happen between grabbing the dependencies and inserting the
new fences.
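
The expected calling pattern is roughly this (sketch only; assumes the
drm_gem_lock_reservations() helpers and a per-job "write" flag):

	ret = drm_gem_lock_reservations(objs, nr_objs, &acquire_ctx);
	if (ret)
		return ret;

	for (i = 0; i < nr_objs; i++) {
		/* the assert holds because we took all the locks above */
		ret = drm_sched_job_await_implicit(&job->base, objs[i], write);
		if (ret)
			break;
	}

	/* ... on success, install the job's own fences, then ... */
	drm_gem_unlock_reservations(objs, nr_objs, &acquire_ctx);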

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jack Zhang <Jack.Zhang1@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index de76f7e14e0d..47f869aff335 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -705,6 +705,8 @@ int drm_sched_job_await_implicit(struct drm_sched_job *job,
 	struct dma_fence **fences;
 	unsigned int i, fence_count;
 
+	dma_resv_assert_held(obj->resv);
+
 	if (!write) {
 		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
 
-- 
2.32.0.rc2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling
  2021-07-02 21:38   ` Daniel Vetter
@ 2021-07-07  9:08     ` Lucas Stach
  -1 siblings, 0 replies; 58+ messages in thread
From: Lucas Stach @ 2021-07-07  9:08 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Daniel Vetter, Russell King, Christian Gmeiner, Sumit Semwal,
	Christian König, etnaviv, linux-media, linaro-mm-sig

On Friday, 2021-07-02 at 23:38 +0200, Daniel Vetter wrote:
> We need to pull the drm_sched_job_init much earlier, but that's very
> minor surgery.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: etnaviv@lists.freedesktop.org
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
>  drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
>  4 files changed, 20 insertions(+), 81 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> index 98e60df882b6..63688e6e4580 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> @@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
>  	u64 va;
>  	struct etnaviv_gem_object *obj;
>  	struct etnaviv_vram_mapping *mapping;
> -	struct dma_fence *excl;
> -	unsigned int nr_shared;
> -	struct dma_fence **shared;
>  };
>  
>  /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
> @@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
>  	struct etnaviv_file_private *ctx;
>  	struct etnaviv_gpu *gpu;
>  	struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
> -	struct dma_fence *out_fence, *in_fence;
> +	struct dma_fence *out_fence;
>  	int out_fence_id;
>  	struct list_head node; /* GPU active submit list */
>  	struct etnaviv_cmdbuf cmdbuf;
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> index 4dd7d9d541c0..92478a50a580 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> @@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
>  		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
>  			continue;
>  
> -		if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> -			ret = dma_resv_get_fences(robj, &bo->excl,
> -						  &bo->nr_shared,
> -						  &bo->shared);
> -			if (ret)
> -				return ret;
> -		} else {
> -			bo->excl = dma_resv_get_excl_unlocked(robj);
> -		}
> -
> +		ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
> +						   bo->flags & ETNA_SUBMIT_BO_WRITE);
> +		if (ret)
> +			return ret;
>  	}
>  
>  	return ret;
> @@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
>  
>  	wake_up_all(&submit->gpu->fence_event);
>  
> -	if (submit->in_fence)
> -		dma_fence_put(submit->in_fence);
>  	if (submit->out_fence) {
>  		/* first remove from IDR, so fence can not be found anymore */
>  		mutex_lock(&submit->gpu->fence_lock);
> @@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
>  	submit->exec_state = args->exec_state;
>  	submit->flags = args->flags;
>  
> +	ret = drm_sched_job_init(&submit->sched_job,
> +				 &ctx->sched_entity[args->pipe],
> +				 submit->ctx);
> +	if (ret)
> +		goto err_submit_objects;
> +

With the init moved here, you also need to move the
drm_sched_job_cleanup() call from etnaviv_sched_free_job() into
submit_cleanup() to avoid a potential memory leak when we bail out
before pushing the job to the scheduler.
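
I.e. something like this (untested sketch, just to illustrate the move):

	static void submit_cleanup(struct kref *kref)
	{
		struct etnaviv_gem_submit *submit =
				container_of(kref, struct etnaviv_gem_submit, refcount);

		/*
		 * Pairs with the drm_sched_job_init() in
		 * etnaviv_ioctl_gem_submit() and also runs when we bail out
		 * before etnaviv_sched_push_job().
		 */
		drm_sched_job_cleanup(&submit->sched_job);

		/* ... rest of the existing cleanup stays unchanged ... */
	}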

Regards,
Lucas

>  	ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
>  	if (ret)
>  		goto err_submit_objects;
> @@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
>  	}
>  
>  	if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
> -		submit->in_fence = sync_file_get_fence(args->fence_fd);
> -		if (!submit->in_fence) {
> +		struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
> +		if (!in_fence) {
>  			ret = -EINVAL;
>  			goto err_submit_objects;
>  		}
> +
> +		ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
> +		if (ret)
> +			goto err_submit_objects;
>  	}
>  
>  	ret = submit_pin_objects(submit);
> @@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
>  	if (ret)
>  		goto err_submit_objects;
>  
> -	ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
> +	ret = etnaviv_sched_push_job(submit);
>  	if (ret)
>  		goto err_submit_objects;
>  
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 180bb633d5c5..c98d67320be3 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
>  static int etnaviv_hw_jobs_limit = 4;
>  module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
>  
> -static struct dma_fence *
> -etnaviv_sched_dependency(struct drm_sched_job *sched_job,
> -			 struct drm_sched_entity *entity)
> -{
> -	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> -	struct dma_fence *fence;
> -	int i;
> -
> -	if (unlikely(submit->in_fence)) {
> -		fence = submit->in_fence;
> -		submit->in_fence = NULL;
> -
> -		if (!dma_fence_is_signaled(fence))
> -			return fence;
> -
> -		dma_fence_put(fence);
> -	}
> -
> -	for (i = 0; i < submit->nr_bos; i++) {
> -		struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
> -		int j;
> -
> -		if (bo->excl) {
> -			fence = bo->excl;
> -			bo->excl = NULL;
> -
> -			if (!dma_fence_is_signaled(fence))
> -				return fence;
> -
> -			dma_fence_put(fence);
> -		}
> -
> -		for (j = 0; j < bo->nr_shared; j++) {
> -			if (!bo->shared[j])
> -				continue;
> -
> -			fence = bo->shared[j];
> -			bo->shared[j] = NULL;
> -
> -			if (!dma_fence_is_signaled(fence))
> -				return fence;
> -
> -			dma_fence_put(fence);
> -		}
> -		kfree(bo->shared);
> -		bo->nr_shared = 0;
> -		bo->shared = NULL;
> -	}
> -
> -	return NULL;
> -}
> -
>  static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
>  {
>  	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> @@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
>  }
>  
>  static const struct drm_sched_backend_ops etnaviv_sched_ops = {
> -	.dependency = etnaviv_sched_dependency,
>  	.run_job = etnaviv_sched_run_job,
>  	.timedout_job = etnaviv_sched_timedout_job,
>  	.free_job = etnaviv_sched_free_job,
>  };
>  
> -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> -			   struct etnaviv_gem_submit *submit)
> +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
>  {
>  	int ret = 0;
>  
> @@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>  	 */
>  	mutex_lock(&submit->gpu->fence_lock);
>  
> -	ret = drm_sched_job_init(&submit->sched_job, sched_entity,
> -				 submit->ctx);
> -	if (ret)
> -		goto out_unlock;
> -
>  	drm_sched_job_arm(&submit->sched_job);
>  
>  	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> index c0a6796e22c9..baebfa069afc 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> @@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
>  
>  int etnaviv_sched_init(struct etnaviv_gpu *gpu);
>  void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
> -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> -			   struct etnaviv_gem_submit *submit);
> +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
>  
>  #endif /* __ETNAVIV_SCHED_H__ */



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v2 02/11] drm/sched: Add dependency tracking
  2021-07-02 21:38   ` Daniel Vetter
@ 2021-07-07  9:26     ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-07  9:26 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Andrey Grodzovsky, Jack Zhang, Christian König,
	David Airlie, Steven Price, linaro-mm-sig, Boris Brezillon,
	Daniel Vetter, Alex Deucher, Daniel Vetter, linux-media,
	Lee Jones, Luben Tuikov, Nirmoy Das

On 02.07.21 at 23:38, Daniel Vetter wrote:
> Instead of just a callback we can just glue in the gem helpers that
> panfrost, v3d and lima currently use. There's really not that many
> ways to skin this cat.
>
> On the naming bikeshed: The idea for using _await_ to denote adding
> dependencies to a job comes from i915, where that's used quite
> extensively all over the place, in lots of datastructures.
>
> v2: Rebased.
>
> Reviewed-by: Steven Price <steven.price@arm.com> (v1)
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c |  18 +++-
>   drivers/gpu/drm/scheduler/sched_main.c   | 103 +++++++++++++++++++++++
>   include/drm/gpu_scheduler.h              |  31 ++++++-
>   3 files changed, 146 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index f7347c284886..b6f72fafd504 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -211,6 +211,19 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>   	job->sched->ops->free_job(job);
>   }
>   
> +static struct dma_fence *
> +drm_sched_job_dependency(struct drm_sched_job *job,
> +			 struct drm_sched_entity *entity)
> +{
> +	if (!xa_empty(&job->dependencies))
> +		return xa_erase(&job->dependencies, job->last_dependency++);
> +
> +	if (job->sched->ops->dependency)
> +		return job->sched->ops->dependency(job, entity);
> +
> +	return NULL;
> +}
> +
>   /**
>    * drm_sched_entity_kill_jobs - Make sure all remaining jobs are killed
>    *
> @@ -229,7 +242,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
>   		struct drm_sched_fence *s_fence = job->s_fence;
>   
>   		/* Wait for all dependencies to avoid data corruptions */
> -		while ((f = job->sched->ops->dependency(job, entity)))
> +		while ((f = drm_sched_job_dependency(job, entity)))
>   			dma_fence_wait(f, false);
>   
>   		drm_sched_fence_scheduled(s_fence);
> @@ -419,7 +432,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>    */
>   struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   {
> -	struct drm_gpu_scheduler *sched = entity->rq->sched;
>   	struct drm_sched_job *sched_job;
>   
>   	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> @@ -427,7 +439,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   		return NULL;
>   
>   	while ((entity->dependency =
> -			sched->ops->dependency(sched_job, entity))) {
> +			drm_sched_job_dependency(sched_job, entity))) {
>   		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
>   
>   		if (drm_sched_entity_add_dependency_cb(entity))
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 5e84e1500c32..12d533486518 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -605,6 +605,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   
>   	INIT_LIST_HEAD(&job->list);
>   
> +	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> +
>   	return 0;
>   }
>   EXPORT_SYMBOL(drm_sched_job_init);
> @@ -628,6 +630,98 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>   }
>   EXPORT_SYMBOL(drm_sched_job_arm);
>   
> +/**
> + * drm_sched_job_await_fence - adds the fence as a job dependency
> + * @job: scheduler job to add the dependencies to
> + * @fence: the dma_fence to add to the list of dependencies.
> + *
> + * Note that @fence is consumed in both the success and error cases.
> + *
> + * Returns:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_await_fence(struct drm_sched_job *job,
> +			      struct dma_fence *fence)

I'm still not very keen on the naming "await"; can't we just call
this _add_dependency() and _remove_dependency()?
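
E.g. (signatures only, to illustrate the suggestion):

	int drm_sched_job_add_dependency(struct drm_sched_job *job,
					 struct dma_fence *fence);
	int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
						    struct drm_gem_object *obj,
						    bool write);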

Christian.

> +{
> +	struct dma_fence *entry;
> +	unsigned long index;
> +	u32 id = 0;
> +	int ret;
> +
> +	if (!fence)
> +		return 0;
> +
> +	/* Deduplicate if we already depend on a fence from the same context.
> +	 * This lets the size of the array of deps scale with the number of
> +	 * engines involved, rather than the number of BOs.
> +	 */
> +	xa_for_each(&job->dependencies, index, entry) {
> +		if (entry->context != fence->context)
> +			continue;
> +
> +		if (dma_fence_is_later(fence, entry)) {
> +			dma_fence_put(entry);
> +			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> +		} else {
> +			dma_fence_put(fence);
> +		}
> +		return 0;
> +	}
> +
> +	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> +	if (ret != 0)
> +		dma_fence_put(fence);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_sched_job_await_fence);
> +
> +/**
> + * drm_sched_job_await_implicit - adds implicit dependencies as job dependencies
> + * @job: scheduler job to add the dependencies to
> + * @obj: the gem object to add new dependencies from.
> + * @write: whether the job might write the object (so we need to depend on
> + * shared fences in the reservation object).
> + *
> + * This should be called after drm_gem_lock_reservations() on your array of
> + * GEM objects used in the job but before updating the reservations with your
> + * own fences.
> + *
> + * Returns:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_await_implicit(struct drm_sched_job *job,
> +				 struct drm_gem_object *obj,
> +				 bool write)
> +{
> +	int ret;
> +	struct dma_fence **fences;
> +	unsigned int i, fence_count;
> +
> +	if (!write) {
> +		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
> +
> +		return drm_sched_job_await_fence(job, fence);
> +	}
> +
> +	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
> +	if (ret || !fence_count)
> +		return ret;
> +
> +	for (i = 0; i < fence_count; i++) {
> +		ret = drm_sched_job_await_fence(job, fences[i]);
> +		if (ret)
> +			break;
> +	}
> +
> +	for (; i < fence_count; i++)
> +		dma_fence_put(fences[i]);
> +	kfree(fences);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_sched_job_await_implicit);
> +
> +
>   /**
>    * drm_sched_job_cleanup - clean up scheduler job resources
>    * @job: scheduler job to clean up
> @@ -643,6 +737,9 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>    */
>   void drm_sched_job_cleanup(struct drm_sched_job *job)
>   {
> +	struct dma_fence *fence;
> +	unsigned long index;
> +
>   	if (!kref_read(&job->s_fence->finished.refcount)) {
>   		/* drm_sched_job_arm() has been called */
>   		dma_fence_put(&job->s_fence->finished);
> @@ -652,6 +749,12 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
>   	}
>   
>   	job->s_fence = NULL;
> +
> +	xa_for_each(&job->dependencies, index, fence) {
> +		dma_fence_put(fence);
> +	}
> +	xa_destroy(&job->dependencies);
> +
>   }
>   EXPORT_SYMBOL(drm_sched_job_cleanup);
>   
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 83afc3aa8e2f..74fb321dbc44 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -27,9 +27,12 @@
>   #include <drm/spsc_queue.h>
>   #include <linux/dma-fence.h>
>   #include <linux/completion.h>
> +#include <linux/xarray.h>
>   
>   #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
>   
> +struct drm_gem_object;
> +
>   struct drm_gpu_scheduler;
>   struct drm_sched_rq;
>   
> @@ -198,6 +201,16 @@ struct drm_sched_job {
>   	enum drm_sched_priority		s_priority;
>   	struct drm_sched_entity         *entity;
>   	struct dma_fence_cb		cb;
> +	/**
> +	 * @dependencies:
> +	 *
> +	 * Contains the dependencies as struct dma_fence for this job, see
> +	 * drm_sched_job_await_fence() and drm_sched_job_await_implicit().
> +	 */
> +	struct xarray			dependencies;
> +
> +	/** @last_dependency: tracks @dependencies as they signal */
> +	unsigned long			last_dependency;
>   };
>   
>   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
> @@ -220,9 +233,14 @@ enum drm_gpu_sched_stat {
>    */
>   struct drm_sched_backend_ops {
>   	/**
> -         * @dependency: Called when the scheduler is considering scheduling
> -         * this job next, to get another struct dma_fence for this job to
> -	 * block on.  Once it returns NULL, run_job() may be called.
> +	 * @dependency:
> +	 *
> +	 * Called when the scheduler is considering scheduling this job next, to
> +	 * get another struct dma_fence for this job to block on.  Once it
> +	 * returns NULL, run_job() may be called.
> +	 *
> +	 * If a driver exclusively uses drm_sched_job_await_fence() and
> +	 * drm_sched_job_await_implicit() this can be ommitted and left as NULL.
>   	 */
>   	struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
>   					struct drm_sched_entity *s_entity);
> @@ -349,6 +367,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		       struct drm_sched_entity *entity,
>   		       void *owner);
>   void drm_sched_job_arm(struct drm_sched_job *job);
> +int drm_sched_job_await_fence(struct drm_sched_job *job,
> +			      struct dma_fence *fence);
> +int drm_sched_job_await_implicit(struct drm_sched_job *job,
> +				 struct drm_gem_object *obj,
> +				 bool write);
> +
> +
>   void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>   				    struct drm_gpu_scheduler **sched_list,
>                                      unsigned int num_sched_list);


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v2 02/11] drm/sched: Add dependency tracking
@ 2021-07-07  9:26     ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-07  9:26 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Jack Zhang, David Airlie, Steven Price, linaro-mm-sig,
	Boris Brezillon, Alex Deucher, Daniel Vetter, Nirmoy Das,
	Lee Jones, Christian König, Luben Tuikov, linux-media

Am 02.07.21 um 23:38 schrieb Daniel Vetter:
> Instead of just a callback we can just glue in the gem helpers that
> panfrost, v3d and lima currently use. There's really not that many
> ways to skin this cat.
>
> On the naming bikeshed: The idea for using _await_ to denote adding
> dependencies to a job comes from i915, where that's used quite
> extensively all over the place, in lots of datastructures.
>
> v2: Rebased.
>
> Reviewed-by: Steven Price <steven.price@arm.com> (v1)
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c |  18 +++-
>   drivers/gpu/drm/scheduler/sched_main.c   | 103 +++++++++++++++++++++++
>   include/drm/gpu_scheduler.h              |  31 ++++++-
>   3 files changed, 146 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index f7347c284886..b6f72fafd504 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -211,6 +211,19 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>   	job->sched->ops->free_job(job);
>   }
>   
> +static struct dma_fence *
> +drm_sched_job_dependency(struct drm_sched_job *job,
> +			 struct drm_sched_entity *entity)
> +{
> +	if (!xa_empty(&job->dependencies))
> +		return xa_erase(&job->dependencies, job->last_dependency++);
> +
> +	if (job->sched->ops->dependency)
> +		return job->sched->ops->dependency(job, entity);
> +
> +	return NULL;
> +}
> +
>   /**
>    * drm_sched_entity_kill_jobs - Make sure all remaining jobs are killed
>    *
> @@ -229,7 +242,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
>   		struct drm_sched_fence *s_fence = job->s_fence;
>   
>   		/* Wait for all dependencies to avoid data corruptions */
> -		while ((f = job->sched->ops->dependency(job, entity)))
> +		while ((f = drm_sched_job_dependency(job, entity)))
>   			dma_fence_wait(f, false);
>   
>   		drm_sched_fence_scheduled(s_fence);
> @@ -419,7 +432,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>    */
>   struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   {
> -	struct drm_gpu_scheduler *sched = entity->rq->sched;
>   	struct drm_sched_job *sched_job;
>   
>   	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> @@ -427,7 +439,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   		return NULL;
>   
>   	while ((entity->dependency =
> -			sched->ops->dependency(sched_job, entity))) {
> +			drm_sched_job_dependency(sched_job, entity))) {
>   		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
>   
>   		if (drm_sched_entity_add_dependency_cb(entity))
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 5e84e1500c32..12d533486518 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -605,6 +605,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   
>   	INIT_LIST_HEAD(&job->list);
>   
> +	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> +
>   	return 0;
>   }
>   EXPORT_SYMBOL(drm_sched_job_init);
> @@ -628,6 +630,98 @@ void drm_sched_job_arm(struct drm_sched_job *job)
>   }
>   EXPORT_SYMBOL(drm_sched_job_arm);
>   
> +/**
> + * drm_sched_job_await_fence - adds the fence as a job dependency
> + * @job: scheduler job to add the dependencies to
> + * @fence: the dma_fence to add to the list of dependencies.
> + *
> + * Note that @fence is consumed in both the success and error cases.
> + *
> + * Returns:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_await_fence(struct drm_sched_job *job,
> +			      struct dma_fence *fence)

I'm still not very keen about the naming "await", can't we just call 
this _add_dependency? and _remove_dependency() ?

Christian.

> +{
> +	struct dma_fence *entry;
> +	unsigned long index;
> +	u32 id = 0;
> +	int ret;
> +
> +	if (!fence)
> +		return 0;
> +
> +	/* Deduplicate if we already depend on a fence from the same context.
> +	 * This lets the size of the array of deps scale with the number of
> +	 * engines involved, rather than the number of BOs.
> +	 */
> +	xa_for_each(&job->dependencies, index, entry) {
> +		if (entry->context != fence->context)
> +			continue;
> +
> +		if (dma_fence_is_later(fence, entry)) {
> +			dma_fence_put(entry);
> +			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> +		} else {
> +			dma_fence_put(fence);
> +		}
> +		return 0;
> +	}
> +
> +	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> +	if (ret != 0)
> +		dma_fence_put(fence);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_sched_job_await_fence);
> +
> +/**
> + * drm_sched_job_await_implicit - adds implicit dependencies as job dependencies
> + * @job: scheduler job to add the dependencies to
> + * @obj: the gem object to add new dependencies from.
> + * @write: whether the job might write the object (so we need to depend on
> + * shared fences in the reservation object).
> + *
> + * This should be called after drm_gem_lock_reservations() on your array of
> + * GEM objects used in the job but before updating the reservations with your
> + * own fences.
> + *
> + * Returns:
> + * 0 on success, or an error on failing to expand the array.
> + */
> +int drm_sched_job_await_implicit(struct drm_sched_job *job,
> +				 struct drm_gem_object *obj,
> +				 bool write)
> +{
> +	int ret;
> +	struct dma_fence **fences;
> +	unsigned int i, fence_count;
> +
> +	if (!write) {
> +		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
> +
> +		return drm_sched_job_await_fence(job, fence);
> +	}
> +
> +	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
> +	if (ret || !fence_count)
> +		return ret;
> +
> +	for (i = 0; i < fence_count; i++) {
> +		ret = drm_sched_job_await_fence(job, fences[i]);
> +		if (ret)
> +			break;
> +	}
> +
> +	for (; i < fence_count; i++)
> +		dma_fence_put(fences[i]);
> +	kfree(fences);
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_sched_job_await_implicit);
> +
> +
>   /**
>    * drm_sched_job_cleanup - clean up scheduler job resources
>    * @job: scheduler job to clean up
> @@ -643,6 +737,9 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>    */
>   void drm_sched_job_cleanup(struct drm_sched_job *job)
>   {
> +	struct dma_fence *fence;
> +	unsigned long index;
> +
>   	if (!kref_read(&job->s_fence->finished.refcount)) {
>   		/* drm_sched_job_arm() has been called */
>   		dma_fence_put(&job->s_fence->finished);
> @@ -652,6 +749,12 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
>   	}
>   
>   	job->s_fence = NULL;
> +
> +	xa_for_each(&job->dependencies, index, fence) {
> +		dma_fence_put(fence);
> +	}
> +	xa_destroy(&job->dependencies);
> +
>   }
>   EXPORT_SYMBOL(drm_sched_job_cleanup);
>   
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 83afc3aa8e2f..74fb321dbc44 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -27,9 +27,12 @@
>   #include <drm/spsc_queue.h>
>   #include <linux/dma-fence.h>
>   #include <linux/completion.h>
> +#include <linux/xarray.h>
>   
>   #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
>   
> +struct drm_gem_object;
> +
>   struct drm_gpu_scheduler;
>   struct drm_sched_rq;
>   
> @@ -198,6 +201,16 @@ struct drm_sched_job {
>   	enum drm_sched_priority		s_priority;
>   	struct drm_sched_entity         *entity;
>   	struct dma_fence_cb		cb;
> +	/**
> +	 * @dependencies:
> +	 *
> +	 * Contains the dependencies as struct dma_fence for this job, see
> +	 * drm_sched_job_await_fence() and drm_sched_job_await_implicit().
> +	 */
> +	struct xarray			dependencies;
> +
> +	/** @last_dependency: tracks @dependencies as they signal */
> +	unsigned long			last_dependency;
>   };
>   
>   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
> @@ -220,9 +233,14 @@ enum drm_gpu_sched_stat {
>    */
>   struct drm_sched_backend_ops {
>   	/**
> -         * @dependency: Called when the scheduler is considering scheduling
> -         * this job next, to get another struct dma_fence for this job to
> -	 * block on.  Once it returns NULL, run_job() may be called.
> +	 * @dependency:
> +	 *
> +	 * Called when the scheduler is considering scheduling this job next, to
> +	 * get another struct dma_fence for this job to block on.  Once it
> +	 * returns NULL, run_job() may be called.
> +	 *
> +	 * If a driver exclusively uses drm_sched_job_await_fence() and
> +	 * drm_sched_job_await_implicit() this can be omitted and left as NULL.
>   	 */
>   	struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
>   					struct drm_sched_entity *s_entity);
> @@ -349,6 +367,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		       struct drm_sched_entity *entity,
>   		       void *owner);
>   void drm_sched_job_arm(struct drm_sched_job *job);
> +int drm_sched_job_await_fence(struct drm_sched_job *job,
> +			      struct dma_fence *fence);
> +int drm_sched_job_await_implicit(struct drm_sched_job *job,
> +				 struct drm_gem_object *obj,
> +				 bool write);
> +
> +
>   void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>   				    struct drm_gpu_scheduler **sched_list,
>                                      unsigned int num_sched_list);


^ permalink raw reply	[flat|nested] 58+ messages in thread
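
To make the intended call flow concrete, here is a minimal sketch of a
driver submit path using these helpers, following the ordering the
kerneldoc above prescribes: lock the reservations, gather the implicit
fences, then arm and push. The my_job/my_submit names are made up for
illustration and are not part of the series.

#include <linux/ww_mutex.h>
#include <drm/drm_gem.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical driver job; only the fields this sketch needs. */
struct my_job {
	struct drm_sched_job base;
	struct ww_acquire_ctx ticket;
};

static int my_submit(struct my_job *job, struct drm_sched_entity *entity,
		     struct drm_gem_object **bos, int bo_count, bool write)
{
	int i, ret;

	ret = drm_sched_job_init(&job->base, entity, job);
	if (ret)
		return ret;

	ret = drm_gem_lock_reservations(bos, bo_count, &job->ticket);
	if (ret)
		goto err_cleanup;

	/* Collect the implicit fences before publishing our own. */
	for (i = 0; i < bo_count; i++) {
		ret = drm_sched_job_await_implicit(&job->base, bos[i], write);
		if (ret)
			goto err_unlock;
	}

	/* Point of no return: arm the job; after this the finished fence
	 * can be installed into the BOs' reservations and the job pushed. */
	drm_sched_job_arm(&job->base);
	drm_sched_entity_push_job(&job->base, entity);

	drm_gem_unlock_reservations(bos, bo_count, &job->ticket);
	return 0;

err_unlock:
	drm_gem_unlock_reservations(bos, bo_count, &job->ticket);
err_cleanup:
	drm_sched_job_cleanup(&job->base);
	return ret;
}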

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-02 21:38   ` Daniel Vetter
@ 2021-07-07  9:29     ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-07  9:29 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Steven Price, Daniel Vetter, Lucas Stach, Russell King,
	Christian Gmeiner, Qiang Yu, Rob Herring, Tomeu Vizoso,
	Alyssa Rosenzweig, David Airlie, Daniel Vetter, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, etnaviv, lima,
	linux-media, linaro-mm-sig, Emma Anholt

On 02.07.21 at 23:38, Daniel Vetter wrote:
> This is a very confusingly named function, because not just does it
> init an object, it arms it and provides a point of no return for
> pushing a job into the scheduler. It would be nice if that's a bit
> clearer in the interface.
>
> But the real reason is that I want to push the dependency tracking
> helpers into the scheduler code, and that means drm_sched_job_init
> must be called a lot earlier, without arming the job.
>
> v2:
> - don't change .gitignore (Steven)
> - don't forget v3d (Emma)
>
> v3: Emma noticed that I leak the memory allocated in
> drm_sched_job_init if we bail out before the point of no return in
> subsequent driver patches. To be able to fix this change
> drm_sched_job_cleanup() so it can handle being called both before and
> after drm_sched_job_arm().

Thinking more about this, I'm not sure if this really works.

See, drm_sched_job_init() was also calling drm_sched_entity_select_rq() 
to update the entity->rq association.

And that can only be done later on when we arm the fence as well.

Christian.

>
> Also improve the kerneldoc for this.
>
> Acked-by: Steven Price <steven.price@arm.com> (v2)
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> Cc: Qiang Yu <yuq825@gmail.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Masahiro Yamada <masahiroy@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Adam Borowski <kilobyte@angband.pl>
> Cc: Nick Terrell <terrelln@fb.com>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Nirmoy Das <nirmoy.das@amd.com>
> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> Cc: Lee Jones <lee.jones@linaro.org>
> Cc: Kevin Wang <kevin1.wang@amd.com>
> Cc: Chen Li <chenli@uniontech.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: "Marek Olšák" <marek.olsak@amd.com>
> Cc: Dennis Li <Dennis.Li@amd.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Sonny Jiang <sonny.jiang@amd.com>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Tian Tao <tiantao6@hisilicon.com>
> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> Cc: etnaviv@lists.freedesktop.org
> Cc: lima@lists.freedesktop.org
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> Cc: Emma Anholt <emma@anholt.net>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>   drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>   drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>   drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>   drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>   drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>   drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>   include/drm/gpu_scheduler.h              |  7 +++-
>   10 files changed, 74 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index c5386d13eb4a..a4ec092af9a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	if (r)
>   		goto error_unlock;
>   
> +	drm_sched_job_arm(&job->base);
> +
>   	/* No memory allocation is allowed while holding the notifier lock.
>   	 * The lock is held until amdgpu_cs_submit is finished and fence is
>   	 * added to BOs.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index d33e6d97cc89..5ddb955d2315 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>   	if (r)
>   		return r;
>   
> +	drm_sched_job_arm(&job->base);
> +
>   	*f = dma_fence_get(&job->base.s_fence->finished);
>   	amdgpu_job_free_resources(job);
>   	drm_sched_entity_push_job(&job->base, entity);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index feb6da1b6ceb..05f412204118 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>   	if (ret)
>   		goto out_unlock;
>   
> +	drm_sched_job_arm(&submit->sched_job);
> +
>   	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>   	submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>   						submit->out_fence, 0,
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index dba8329937a3..38f755580507 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>   		return err;
>   	}
>   
> +	drm_sched_job_arm(&task->base);
> +
>   	task->num_bos = num_bos;
>   	task->vm = lima_vm_get(vm);
>   
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 71a72fb50e6b..2992dc85325f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>   		goto unlock;
>   	}
>   
> +	drm_sched_job_arm(&job->base);
> +
>   	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>   
>   	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 79554aa4dbb1..f7347c284886 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>    * @sched_job: job to submit
>    * @entity: scheduler entity
>    *
> - * Note: To guarantee that the order of insertion to queue matches
> - * the job's fence sequence number this function should be
> - * called with drm_sched_job_init under common lock.
> + * Note: To guarantee that the order of insertion to queue matches the job's
> + * fence sequence number this function should be called with drm_sched_job_arm()
> + * under common lock.
>    *
>    * Returns 0 for success, negative error code otherwise.
>    */
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 69de2c76731f..c451ee9a30d7 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>    *
>    * Free up the fence memory after the RCU grace period.
>    */
> -static void drm_sched_fence_free(struct rcu_head *rcu)
> +void drm_sched_fence_free(struct rcu_head *rcu)
>   {
>   	struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>   	struct drm_sched_fence *fence = to_drm_sched_fence(f);
> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>   }
>   EXPORT_SYMBOL(to_drm_sched_fence);
>   
> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> -					       void *owner)
> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> +					      void *owner)
>   {
>   	struct drm_sched_fence *fence = NULL;
> -	unsigned seq;
>   
>   	fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>   	if (fence == NULL)
> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>   	fence->sched = entity->rq->sched;
>   	spin_lock_init(&fence->lock);
>   
> +	return fence;
> +}
> +
> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> +			  struct drm_sched_entity *entity)
> +{
> +	unsigned seq;
> +
>   	seq = atomic_inc_return(&entity->fence_seq);
>   	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>   		       &fence->lock, entity->fence_context, seq);
>   	dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>   		       &fence->lock, entity->fence_context + 1, seq);
> -
> -	return fence;
>   }
>   
>   module_init(drm_sched_fence_slab_init);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 33c414d55fab..5e84e1500c32 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -48,9 +48,11 @@
>   #include <linux/wait.h>
>   #include <linux/sched.h>
>   #include <linux/completion.h>
> +#include <linux/dma-resv.h>
>   #include <uapi/linux/sched/types.h>
>   
>   #include <drm/drm_print.h>
> +#include <drm/drm_gem.h>
>   #include <drm/gpu_scheduler.h>
>   #include <drm/spsc_queue.h>
>   
> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>   
>   /**
>    * drm_sched_job_init - init a scheduler job
> - *
>    * @job: scheduler job to init
>    * @entity: scheduler entity to use
>    * @owner: job owner for debugging
> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>    * Refer to drm_sched_entity_push_job() documentation
>    * for locking considerations.
>    *
> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
> + * successfully, even when @job is aborted before drm_sched_job_arm() is called.
> + *
>    * Returns 0 for success, negative error code otherwise.
>    */
>   int drm_sched_job_init(struct drm_sched_job *job,
> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   	job->sched = sched;
>   	job->entity = entity;
>   	job->s_priority = entity->rq - sched->sched_rq;
> -	job->s_fence = drm_sched_fence_create(entity, owner);
> +	job->s_fence = drm_sched_fence_alloc(entity, owner);
>   	if (!job->s_fence)
>   		return -ENOMEM;
>   	job->id = atomic64_inc_return(&sched->job_id_count);
> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   EXPORT_SYMBOL(drm_sched_job_init);
>   
>   /**
> - * drm_sched_job_cleanup - clean up scheduler job resources
> + * drm_sched_job_arm - arm a scheduler job for execution
> + * @job: scheduler job to arm
> + *
> + * This arms a scheduler job for execution. Specifically it initializes the
> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> + * or other places that need to track the completion of this job.
> + *
> + * Refer to drm_sched_entity_push_job() documentation for locking
> + * considerations.
>    *
> + * This can only be called if drm_sched_job_init() succeeded.
> + */
> +void drm_sched_job_arm(struct drm_sched_job *job)
> +{
> +	drm_sched_fence_init(job->s_fence, job->entity);
> +}
> +EXPORT_SYMBOL(drm_sched_job_arm);
> +
> +/**
> + * drm_sched_job_cleanup - clean up scheduler job resources
>    * @job: scheduler job to clean up
> + *
> + * Cleans up the resources allocated with drm_sched_job_init().
> + *
> + * Drivers should call this from their error unwind code if @job is aborted
> + * before drm_sched_job_arm() is called.
> + *
> + * After that point of no return @job is committed to be executed by the
> + * scheduler, and this function should be called from the
> + * &drm_sched_backend_ops.free_job callback.
>    */
>   void drm_sched_job_cleanup(struct drm_sched_job *job)
>   {
> -	dma_fence_put(&job->s_fence->finished);
> +	if (kref_read(&job->s_fence->finished.refcount)) {
> +		/* drm_sched_job_arm() has been called */
> +		dma_fence_put(&job->s_fence->finished);
> +	} else {
> +		/* aborted job before committing to run it */
> +		drm_sched_fence_free(&job->s_fence->finished.rcu);
> +	}
> +
>   	job->s_fence = NULL;
>   }
>   EXPORT_SYMBOL(drm_sched_job_cleanup);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index 4eb354226972..5c3a99027ecd 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>   	if (ret)
>   		return ret;
>   
> +	drm_sched_job_arm(&job->base);
> +
>   	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>   
>   	/* put by scheduler job completion */
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 88ae7f331bb1..83afc3aa8e2f 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>   int drm_sched_job_init(struct drm_sched_job *job,
>   		       struct drm_sched_entity *entity,
>   		       void *owner);
> +void drm_sched_job_arm(struct drm_sched_job *job);
>   void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>   				    struct drm_gpu_scheduler **sched_list,
>                                      unsigned int num_sched_list);
> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>   				   enum drm_sched_priority priority);
>   bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>   
> -struct drm_sched_fence *drm_sched_fence_create(
> +struct drm_sched_fence *drm_sched_fence_alloc(
>   	struct drm_sched_entity *s_entity, void *owner);
> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> +			  struct drm_sched_entity *entity);
> +void drm_sched_fence_free(struct rcu_head *rcu);
> +
>   void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>   void drm_sched_fence_finished(struct drm_sched_fence *fence);
>   


^ permalink raw reply	[flat|nested] 58+ messages in thread
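
The before/after-arm distinction that the v3 note above relies on works
because drm_sched_fence_alloc() takes the fence from kmem_cache_zalloc(),
so the embedded dma_fence refcounts stay at zero until
drm_sched_fence_init() runs dma_fence_init() during drm_sched_job_arm().
Condensed from the patch, the cleanup path tells the two states apart
like this:

#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

void drm_sched_job_cleanup(struct drm_sched_job *job)
{
	if (kref_read(&job->s_fence->finished.refcount)) {
		/* refcount >= 1: drm_sched_job_arm() has initialized the
		 * fence, so drop our reference like any other holder. */
		dma_fence_put(&job->s_fence->finished);
	} else {
		/* refcount == 0: the job was aborted before arming; the
		 * fence was never published anywhere, so it is safe to
		 * free the memory directly without an RCU grace period. */
		drm_sched_fence_free(&job->s_fence->finished.rcu);
	}

	job->s_fence = NULL;
}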

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-07  9:29     ` Christian König
@ 2021-07-07 11:14       ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 11:14 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Wed, Jul 7, 2021 at 11:30 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 02.07.21 at 23:38, Daniel Vetter wrote:
> > This is a very confusingly named function, because not just does it
> > init an object, it arms it and provides a point of no return for
> > pushing a job into the scheduler. It would be nice if that's a bit
> > clearer in the interface.
> >
> > But the real reason is that I want to push the dependency tracking
> > helpers into the scheduler code, and that means drm_sched_job_init
> > must be called a lot earlier, without arming the job.
> >
> > v2:
> > - don't change .gitignore (Steven)
> > - don't forget v3d (Emma)
> >
> > v3: Emma noticed that I leak the memory allocated in
> > drm_sched_job_init if we bail out before the point of no return in
> > subsequent driver patches. To be able to fix this change
> > drm_sched_job_cleanup() so it can handle being called both before and
> > after drm_sched_job_arm().
>
> Thinking more about this, I'm not sure if this really works.
>
> See, drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> to update the entity->rq association.
>
> And that can only be done later on when we arm the fence as well.

Hm yeah, but that's a bug in the existing code I think: we already
fail to clean up if we fail to allocate the fences. So I think the
right thing to do here is to split the checks into job_init, and do
the actual arming/rq selection in job_arm? I'm not entirely sure
what's all going on there; the first check looks a bit like trying to
schedule before the entity is set up, which is a driver bug and should
have a WARN_ON?

The 2nd check, around last_scheduled, I honestly have no idea what it's
even trying to do.
-Daniel

>
> Christian.
>
> >
> > Also improve the kerneldoc for this.
> >
> > Acked-by: Steven Price <steven.price@arm.com> (v2)
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Lucas Stach <l.stach@pengutronix.de>
> > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > Cc: Qiang Yu <yuq825@gmail.com>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Masahiro Yamada <masahiroy@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Adam Borowski <kilobyte@angband.pl>
> > Cc: Nick Terrell <terrelln@fb.com>
> > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> > Cc: Sami Tolvanen <samitolvanen@google.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Nirmoy Das <nirmoy.das@amd.com>
> > Cc: Deepak R Varma <mh12gx2825@gmail.com>
> > Cc: Lee Jones <lee.jones@linaro.org>
> > Cc: Kevin Wang <kevin1.wang@amd.com>
> > Cc: Chen Li <chenli@uniontech.com>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: "Marek Olšák" <marek.olsak@amd.com>
> > Cc: Dennis Li <Dennis.Li@amd.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > Cc: Sonny Jiang <sonny.jiang@amd.com>
> > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Tian Tao <tiantao6@hisilicon.com>
> > Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > Cc: etnaviv@lists.freedesktop.org
> > Cc: lima@lists.freedesktop.org
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Cc: Emma Anholt <emma@anholt.net>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> >   drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> >   drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> >   drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> >   drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> >   drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> >   drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> >   include/drm/gpu_scheduler.h              |  7 +++-
> >   10 files changed, 74 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index c5386d13eb4a..a4ec092af9a7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> >       if (r)
> >               goto error_unlock;
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       /* No memory allocation is allowed while holding the notifier lock.
> >        * The lock is held until amdgpu_cs_submit is finished and fence is
> >        * added to BOs.
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index d33e6d97cc89..5ddb955d2315 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> >       if (r)
> >               return r;
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       *f = dma_fence_get(&job->base.s_fence->finished);
> >       amdgpu_job_free_resources(job);
> >       drm_sched_entity_push_job(&job->base, entity);
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index feb6da1b6ceb..05f412204118 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >       if (ret)
> >               goto out_unlock;
> >
> > +     drm_sched_job_arm(&submit->sched_job);
> > +
> >       submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> >       submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> >                                               submit->out_fence, 0,
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index dba8329937a3..38f755580507 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> >               return err;
> >       }
> >
> > +     drm_sched_job_arm(&task->base);
> > +
> >       task->num_bos = num_bos;
> >       task->vm = lima_vm_get(vm);
> >
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index 71a72fb50e6b..2992dc85325f 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> >               goto unlock;
> >       }
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> >
> >       ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 79554aa4dbb1..f7347c284886 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >    * @sched_job: job to submit
> >    * @entity: scheduler entity
> >    *
> > - * Note: To guarantee that the order of insertion to queue matches
> > - * the job's fence sequence number this function should be
> > - * called with drm_sched_job_init under common lock.
> > + * Note: To guarantee that the order of insertion to queue matches the job's
> > + * fence sequence number this function should be called with drm_sched_job_arm()
> > + * under common lock.
> >    *
> >    * Returns 0 for success, negative error code otherwise.
> >    */
> > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > index 69de2c76731f..c451ee9a30d7 100644
> > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> >    *
> >    * Free up the fence memory after the RCU grace period.
> >    */
> > -static void drm_sched_fence_free(struct rcu_head *rcu)
> > +void drm_sched_fence_free(struct rcu_head *rcu)
> >   {
> >       struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >       struct drm_sched_fence *fence = to_drm_sched_fence(f);
> > @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> >   }
> >   EXPORT_SYMBOL(to_drm_sched_fence);
> >
> > -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > -                                            void *owner)
> > +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> > +                                           void *owner)
> >   {
> >       struct drm_sched_fence *fence = NULL;
> > -     unsigned seq;
> >
> >       fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> >       if (fence == NULL)
> > @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >       fence->sched = entity->rq->sched;
> >       spin_lock_init(&fence->lock);
> >
> > +     return fence;
> > +}
> > +
> > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > +                       struct drm_sched_entity *entity)
> > +{
> > +     unsigned seq;
> > +
> >       seq = atomic_inc_return(&entity->fence_seq);
> >       dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >                      &fence->lock, entity->fence_context, seq);
> >       dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> >                      &fence->lock, entity->fence_context + 1, seq);
> > -
> > -     return fence;
> >   }
> >
> >   module_init(drm_sched_fence_slab_init);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 33c414d55fab..5e84e1500c32 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -48,9 +48,11 @@
> >   #include <linux/wait.h>
> >   #include <linux/sched.h>
> >   #include <linux/completion.h>
> > +#include <linux/dma-resv.h>
> >   #include <uapi/linux/sched/types.h>
> >
> >   #include <drm/drm_print.h>
> > +#include <drm/drm_gem.h>
> >   #include <drm/gpu_scheduler.h>
> >   #include <drm/spsc_queue.h>
> >
> > @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >
> >   /**
> >    * drm_sched_job_init - init a scheduler job
> > - *
> >    * @job: scheduler job to init
> >    * @entity: scheduler entity to use
> >    * @owner: job owner for debugging
> > @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >    * Refer to drm_sched_entity_push_job() documentation
> >    * for locking considerations.
> >    *
> > + * Drivers must make sure to call drm_sched_job_cleanup() if this function
> > + * successfully, even when @job is aborted before drm_sched_job_arm() is called.
> > + *
> >    * Returns 0 for success, negative error code otherwise.
> >    */
> >   int drm_sched_job_init(struct drm_sched_job *job,
> > @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >       job->sched = sched;
> >       job->entity = entity;
> >       job->s_priority = entity->rq - sched->sched_rq;
> > -     job->s_fence = drm_sched_fence_create(entity, owner);
> > +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> >       if (!job->s_fence)
> >               return -ENOMEM;
> >       job->id = atomic64_inc_return(&sched->job_id_count);
> > @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >   EXPORT_SYMBOL(drm_sched_job_init);
> >
> >   /**
> > - * drm_sched_job_cleanup - clean up scheduler job resources
> > + * drm_sched_job_arm - arm a scheduler job for execution
> > + * @job: scheduler job to arm
> > + *
> > + * This arms a scheduler job for execution. Specifically it initializes the
> > + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> > + * or other places that need to track the completion of this job.
> > + *
> > + * Refer to drm_sched_entity_push_job() documentation for locking
> > + * considerations.
> >    *
> > + * This can only be called if drm_sched_job_init() succeeded.
> > + */
> > +void drm_sched_job_arm(struct drm_sched_job *job)
> > +{
> > +     drm_sched_fence_init(job->s_fence, job->entity);
> > +}
> > +EXPORT_SYMBOL(drm_sched_job_arm);
> > +
> > +/**
> > + * drm_sched_job_cleanup - clean up scheduler job resources
> >    * @job: scheduler job to clean up
> > + *
> > + * Cleans up the resources allocated with drm_sched_job_init().
> > + *
> > + * Drivers should call this from their error unwind code if @job is aborted
> > + * before drm_sched_job_arm() is called.
> > + *
> > + * After that point of no return @job is committed to be executed by the
> > + * scheduler, and this function should be called from the
> > + * &drm_sched_backend_ops.free_job callback.
> >    */
> >   void drm_sched_job_cleanup(struct drm_sched_job *job)
> >   {
> > -     dma_fence_put(&job->s_fence->finished);
> > +     if (kref_read(&job->s_fence->finished.refcount)) {
> > +             /* drm_sched_job_arm() has been called */
> > +             dma_fence_put(&job->s_fence->finished);
> > +     } else {
> > +             /* aborted job before committing to run it */
> > +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> > +     }
> > +
> >       job->s_fence = NULL;
> >   }
> >   EXPORT_SYMBOL(drm_sched_job_cleanup);
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> > index 4eb354226972..5c3a99027ecd 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> >       if (ret)
> >               return ret;
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> >
> >       /* put by scheduler job completion */
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 88ae7f331bb1..83afc3aa8e2f 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >   int drm_sched_job_init(struct drm_sched_job *job,
> >                      struct drm_sched_entity *entity,
> >                      void *owner);
> > +void drm_sched_job_arm(struct drm_sched_job *job);
> >   void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >                                   struct drm_gpu_scheduler **sched_list,
> >                                      unsigned int num_sched_list);
> > @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >                                  enum drm_sched_priority priority);
> >   bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> >
> > -struct drm_sched_fence *drm_sched_fence_create(
> > +struct drm_sched_fence *drm_sched_fence_alloc(
> >       struct drm_sched_entity *s_entity, void *owner);
> > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > +                       struct drm_sched_entity *entity);
> > +void drm_sched_fence_free(struct rcu_head *rcu);
> > +
> >   void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> >   void drm_sched_fence_finished(struct drm_sched_fence *fence);
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread
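
For reference, the direction proposed in this reply, keeping
drm_sched_job_init() to allocation plus validation and doing the rq
selection together with the fence arming, would make drm_sched_job_arm()
look roughly like the sketch below. This is an illustration of the
proposal, not code from the posted series.

#include <drm/gpu_scheduler.h>

void drm_sched_job_arm(struct drm_sched_job *job)
{
	struct drm_sched_entity *entity = job->entity;

	/* A job must have been initialized against an entity. */
	BUG_ON(!entity);

	/* Only now, at the point of no return, pick the run queue and
	 * thereby the scheduler; this has to happen under the same lock
	 * that orders the fence sequence numbers against queue insertion. */
	drm_sched_entity_select_rq(entity);
	job->sched = entity->rq->sched;

	drm_sched_fence_init(job->s_fence, entity);
}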

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
@ 2021-07-07 11:14       ` Daniel Vetter
  0 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 11:14 UTC (permalink / raw)
  To: Christian König
  Cc: Emma Anholt, Adam Borowski, David Airlie, Viresh Kumar,
	DRI Development, Sonny Jiang, Nirmoy Das, Daniel Vetter,
	Lee Jones, Jack Zhang, lima, Mauro Carvalho Chehab,
	Masahiro Yamada, Steven Price, Luben Tuikov, Alyssa Rosenzweig,
	Sami Tolvanen, Russell King, Dave Airlie, Dennis Li, Chen Li,
	Paul Menzel, Kees Cook, Marek Olšák, Kevin Wang,
	The etnaviv authors, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Nick Terrell, Deepak R Varma, Tomeu Vizoso, Boris Brezillon,
	Qiang Yu, Alex Deucher, Tian Tao,
	open list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jul 7, 2021 at 11:30 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 02.07.21 um 23:38 schrieb Daniel Vetter:
> > This is a very confusingly named function, because not just does it
> > init an object, it arms it and provides a point of no return for
> > pushing a job into the scheduler. It would be nice if that's a bit
> > clearer in the interface.
> >
> > But the real reason is that I want to push the dependency tracking
> > helpers into the scheduler code, and that means drm_sched_job_init
> > must be called a lot earlier, without arming the job.
> >
> > v2:
> > - don't change .gitignore (Steven)
> > - don't forget v3d (Emma)
> >
> > v3: Emma noticed that I leak the memory allocated in
> > drm_sched_job_init if we bail out before the point of no return in
> > subsequent driver patches. To be able to fix this change
> > drm_sched_job_cleanup() so it can handle being called both before and
> > after drm_sched_job_arm().
>
> Thinking more about this, I'm not sure if this really works.
>
> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> to update the entity->rq association.
>
> And that can only be done later on when we arm the fence as well.

Hm yeah, but that's a bug in the existing code I think: We already
fail to clean up if we fail to allocate the fences. So I think the
right thing to do here is to split the checks into job_init, and do
the actual arming/rq selection in job_arm? I'm not entirely sure
what's all going on there, the first check looks a bit like trying to
schedule before the entity is set up, which is a driver bug and should
have a WARN_ON?

The 2nd check around last_scheduled I have honeslty no idea what it's
even trying to do.
-Daniel

>
> Christian.
>
> >
> > Also improve the kerneldoc for this.
> >
> > Acked-by: Steven Price <steven.price@arm.com> (v2)
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Lucas Stach <l.stach@pengutronix.de>
> > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > Cc: Qiang Yu <yuq825@gmail.com>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Masahiro Yamada <masahiroy@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Adam Borowski <kilobyte@angband.pl>
> > Cc: Nick Terrell <terrelln@fb.com>
> > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> > Cc: Sami Tolvanen <samitolvanen@google.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Nirmoy Das <nirmoy.das@amd.com>
> > Cc: Deepak R Varma <mh12gx2825@gmail.com>
> > Cc: Lee Jones <lee.jones@linaro.org>
> > Cc: Kevin Wang <kevin1.wang@amd.com>
> > Cc: Chen Li <chenli@uniontech.com>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: "Marek Olšák" <marek.olsak@amd.com>
> > Cc: Dennis Li <Dennis.Li@amd.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > Cc: Sonny Jiang <sonny.jiang@amd.com>
> > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Tian Tao <tiantao6@hisilicon.com>
> > Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > Cc: etnaviv@lists.freedesktop.org
> > Cc: lima@lists.freedesktop.org
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Cc: Emma Anholt <emma@anholt.net>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> >   drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> >   drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> >   drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> >   drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> >   drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> >   drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> >   include/drm/gpu_scheduler.h              |  7 +++-
> >   10 files changed, 74 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index c5386d13eb4a..a4ec092af9a7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> >       if (r)
> >               goto error_unlock;
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       /* No memory allocation is allowed while holding the notifier lock.
> >        * The lock is held until amdgpu_cs_submit is finished and fence is
> >        * added to BOs.
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index d33e6d97cc89..5ddb955d2315 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> >       if (r)
> >               return r;
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       *f = dma_fence_get(&job->base.s_fence->finished);
> >       amdgpu_job_free_resources(job);
> >       drm_sched_entity_push_job(&job->base, entity);
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index feb6da1b6ceb..05f412204118 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >       if (ret)
> >               goto out_unlock;
> >
> > +     drm_sched_job_arm(&submit->sched_job);
> > +
> >       submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> >       submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> >                                               submit->out_fence, 0,
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index dba8329937a3..38f755580507 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> >               return err;
> >       }
> >
> > +     drm_sched_job_arm(&task->base);
> > +
> >       task->num_bos = num_bos;
> >       task->vm = lima_vm_get(vm);
> >
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index 71a72fb50e6b..2992dc85325f 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> >               goto unlock;
> >       }
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> >
> >       ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 79554aa4dbb1..f7347c284886 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >    * @sched_job: job to submit
> >    * @entity: scheduler entity
> >    *
> > - * Note: To guarantee that the order of insertion to queue matches
> > - * the job's fence sequence number this function should be
> > - * called with drm_sched_job_init under common lock.
> > + * Note: To guarantee that the order of insertion into the queue matches the
> > + * job's fence sequence number, this function should be called with
> > + * drm_sched_job_arm() under a common lock.
> >    *
> >    * Returns 0 for success, negative error code otherwise.
> >    */
> > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > index 69de2c76731f..c451ee9a30d7 100644
> > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> >    *
> >    * Free up the fence memory after the RCU grace period.
> >    */
> > -static void drm_sched_fence_free(struct rcu_head *rcu)
> > +void drm_sched_fence_free(struct rcu_head *rcu)
> >   {
> >       struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >       struct drm_sched_fence *fence = to_drm_sched_fence(f);
> > @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> >   }
> >   EXPORT_SYMBOL(to_drm_sched_fence);
> >
> > -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > -                                            void *owner)
> > +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> > +                                           void *owner)
> >   {
> >       struct drm_sched_fence *fence = NULL;
> > -     unsigned seq;
> >
> >       fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> >       if (fence == NULL)
> > @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >       fence->sched = entity->rq->sched;
> >       spin_lock_init(&fence->lock);
> >
> > +     return fence;
> > +}
> > +
> > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > +                       struct drm_sched_entity *entity)
> > +{
> > +     unsigned seq;
> > +
> >       seq = atomic_inc_return(&entity->fence_seq);
> >       dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >                      &fence->lock, entity->fence_context, seq);
> >       dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> >                      &fence->lock, entity->fence_context + 1, seq);
> > -
> > -     return fence;
> >   }
> >
> >   module_init(drm_sched_fence_slab_init);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 33c414d55fab..5e84e1500c32 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -48,9 +48,11 @@
> >   #include <linux/wait.h>
> >   #include <linux/sched.h>
> >   #include <linux/completion.h>
> > +#include <linux/dma-resv.h>
> >   #include <uapi/linux/sched/types.h>
> >
> >   #include <drm/drm_print.h>
> > +#include <drm/drm_gem.h>
> >   #include <drm/gpu_scheduler.h>
> >   #include <drm/spsc_queue.h>
> >
> > @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >
> >   /**
> >    * drm_sched_job_init - init a scheduler job
> > - *
> >    * @job: scheduler job to init
> >    * @entity: scheduler entity to use
> >    * @owner: job owner for debugging
> > @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >    * Refer to drm_sched_entity_push_job() documentation
> >    * for locking considerations.
> >    *
> > + * Drivers must make sure drm_sched_job_cleanup() is called if this function
> > + * returns successfully, even if @job is aborted before drm_sched_job_arm().
> > + *
> >    * Returns 0 for success, negative error code otherwise.
> >    */
> >   int drm_sched_job_init(struct drm_sched_job *job,
> > @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >       job->sched = sched;
> >       job->entity = entity;
> >       job->s_priority = entity->rq - sched->sched_rq;
> > -     job->s_fence = drm_sched_fence_create(entity, owner);
> > +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> >       if (!job->s_fence)
> >               return -ENOMEM;
> >       job->id = atomic64_inc_return(&sched->job_id_count);
> > @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >   EXPORT_SYMBOL(drm_sched_job_init);
> >
> >   /**
> > - * drm_sched_job_cleanup - clean up scheduler job resources
> > + * drm_sched_job_arm - arm a scheduler job for execution
> > + * @job: scheduler job to arm
> > + *
> > + * This arms a scheduler job for execution. Specifically it initializes the
> > + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> > + * or other places that need to track the completion of this job.
> > + *
> > + * Refer to drm_sched_entity_push_job() documentation for locking
> > + * considerations.
> >    *
> > + * This can only be called if drm_sched_job_init() succeeded.
> > + */
> > +void drm_sched_job_arm(struct drm_sched_job *job)
> > +{
> > +     drm_sched_fence_init(job->s_fence, job->entity);
> > +}
> > +EXPORT_SYMBOL(drm_sched_job_arm);
> > +
> > +/**
> > + * drm_sched_job_cleanup - clean up scheduler job resources
> >    * @job: scheduler job to clean up
> > + *
> > + * Cleans up the resources allocated with drm_sched_job_init().
> > + *
> > + * Drivers should call this from their error unwind code if @job is aborted
> > + * before drm_sched_job_arm() is called.
> > + *
> > + * After that point of no return @job is committed to be executed by the
> > + * scheduler, and this function should be called from the
> > + * &drm_sched_backend_ops.free_job callback.
> >    */
> >   void drm_sched_job_cleanup(struct drm_sched_job *job)
> >   {
> > -     dma_fence_put(&job->s_fence->finished);
> > +     if (kref_read(&job->s_fence->finished.refcount)) {
> > +             /* drm_sched_job_arm() has been called */
> > +             dma_fence_put(&job->s_fence->finished);
> > +     } else {
> > +             /* aborted job before committing to run it */
> > +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> > +     }
> > +
> >       job->s_fence = NULL;
> >   }
> >   EXPORT_SYMBOL(drm_sched_job_cleanup);
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> > index 4eb354226972..5c3a99027ecd 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> >       if (ret)
> >               return ret;
> >
> > +     drm_sched_job_arm(&job->base);
> > +
> >       job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> >
> >       /* put by scheduler job completion */
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 88ae7f331bb1..83afc3aa8e2f 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >   int drm_sched_job_init(struct drm_sched_job *job,
> >                      struct drm_sched_entity *entity,
> >                      void *owner);
> > +void drm_sched_job_arm(struct drm_sched_job *job);
> >   void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >                                   struct drm_gpu_scheduler **sched_list,
> >                                      unsigned int num_sched_list);
> > @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >                                  enum drm_sched_priority priority);
> >   bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> >
> > -struct drm_sched_fence *drm_sched_fence_create(
> > +struct drm_sched_fence *drm_sched_fence_alloc(
> >       struct drm_sched_entity *s_entity, void *owner);
> > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > +                       struct drm_sched_entity *entity);
> > +void drm_sched_fence_free(struct rcu_head *rcu);
> > +
> >   void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> >   void drm_sched_fence_finished(struct drm_sched_fence *fence);
> >
>
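
For reference, the driver-side contract after this split, as a minimal
sketch (the example_* names are made up, the drm_sched_* calls are as
in this patch):

struct example_job {
        struct drm_sched_job base;
        struct dma_fence *done_fence;
};

int example_submit(struct example_job *job, struct drm_sched_entity *entity)
{
        int ret;

        /* Allocates the scheduler fence, but assigns no seqno yet. */
        ret = drm_sched_job_init(&job->base, entity, NULL);
        if (ret)
                return ret;

        /* Everything that can still fail goes here, e.g. locking BOs. */
        ret = example_prepare(job);     /* hypothetical driver code */
        if (ret)
                goto err_cleanup;

        /* Point of no return, the fence seqno is assigned here ... */
        drm_sched_job_arm(&job->base);

        /* ... so only now may the finished fence be handed out. */
        job->done_fence = dma_fence_get(&job->base.s_fence->finished);

        drm_sched_entity_push_job(&job->base, entity);
        return 0;

err_cleanup:
        /* Before drm_sched_job_arm() this just frees the unarmed fence. */
        drm_sched_job_cleanup(&job->base);
        return ret;
}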


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Linaro-mm-sig] [PATCH v2 02/11] drm/sched: Add dependency tracking
  2021-07-07  9:26     ` Christian König
@ 2021-07-07 11:23       ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 11:23 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Andrey Grodzovsky, Jack Zhang,
	Christian König, David Airlie, Steven Price,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Boris Brezillon,
	Alex Deucher, Daniel Vetter,
	open list:DMA BUFFER SHARING FRAMEWORK, Lee Jones, Luben Tuikov,
	Nirmoy Das

On Wed, Jul 7, 2021 at 11:26 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 02.07.21 at 23:38, Daniel Vetter wrote:
> > Instead of just a callback, we can glue in the gem helpers that
> > panfrost, v3d and lima currently use. There really aren't that many
> > ways to skin this cat.
> >
> > On the naming bikeshed: The idea for using _await_ to denote adding
> > dependencies to a job comes from i915, where that's used quite
> > extensively all over the place, in lots of datastructures.
> >
> > v2: Rebased.
> >
> > Reviewed-by: Steven Price <steven.price@arm.com> (v1)
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > Cc: Lee Jones <lee.jones@linaro.org>
> > Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
> > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > ---
> >   drivers/gpu/drm/scheduler/sched_entity.c |  18 +++-
> >   drivers/gpu/drm/scheduler/sched_main.c   | 103 +++++++++++++++++++++++
> >   include/drm/gpu_scheduler.h              |  31 ++++++-
> >   3 files changed, 146 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index f7347c284886..b6f72fafd504 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -211,6 +211,19 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
> >       job->sched->ops->free_job(job);
> >   }
> >
> > +static struct dma_fence *
> > +drm_sched_job_dependency(struct drm_sched_job *job,
> > +                      struct drm_sched_entity *entity)
> > +{
> > +     if (!xa_empty(&job->dependencies))
> > +             return xa_erase(&job->dependencies, job->last_dependency++);
> > +
> > +     if (job->sched->ops->dependency)
> > +             return job->sched->ops->dependency(job, entity);
> > +
> > +     return NULL;
> > +}
> > +
> >   /**
> >    * drm_sched_entity_kill_jobs - Make sure all remaining jobs are killed
> >    *
> > @@ -229,7 +242,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
> >               struct drm_sched_fence *s_fence = job->s_fence;
> >
> >               /* Wait for all dependencies to avoid data corruptions */
> > -             while ((f = job->sched->ops->dependency(job, entity)))
> > +             while ((f = drm_sched_job_dependency(job, entity)))
> >                       dma_fence_wait(f, false);
> >
> >               drm_sched_fence_scheduled(s_fence);
> > @@ -419,7 +432,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> >    */
> >   struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >   {
> > -     struct drm_gpu_scheduler *sched = entity->rq->sched;
> >       struct drm_sched_job *sched_job;
> >
> >       sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > @@ -427,7 +439,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >               return NULL;
> >
> >       while ((entity->dependency =
> > -                     sched->ops->dependency(sched_job, entity))) {
> > +                     drm_sched_job_dependency(sched_job, entity))) {
> >               trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
> >
> >               if (drm_sched_entity_add_dependency_cb(entity))
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 5e84e1500c32..12d533486518 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -605,6 +605,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >
> >       INIT_LIST_HEAD(&job->list);
> >
> > +     xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
> > +
> >       return 0;
> >   }
> >   EXPORT_SYMBOL(drm_sched_job_init);
> > @@ -628,6 +630,98 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> >   }
> >   EXPORT_SYMBOL(drm_sched_job_arm);
> >
> > +/**
> > + * drm_sched_job_await_fence - adds the fence as a job dependency
> > + * @job: scheduler job to add the dependencies to
> > + * @fence: the dma_fence to add to the list of dependencies.
> > + *
> > + * Note that @fence is consumed in both the success and error cases.
> > + *
> > + * Returns:
> > + * 0 on success, or an error on failing to expand the array.
> > + */
> > +int drm_sched_job_await_fence(struct drm_sched_job *job,
> > +                           struct dma_fence *fence)
>
> I'm still not very keen on the naming "await"; can't we just call
> this _add_dependency() and _remove_dependency()?

I frankly never care about bikesheds if there's a consensus; await is
just a bit less typing. We're not removing dependencies anywhere, but
there are still two functions here: one for explicit fences, one for
the implicit sync stuff. So I need two names, and ideally someone else
to ack the new naming scheme so I don't have to rename twice :-)

Then I'll happily oblige.
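
To illustrate, here are both entry points next to each other in a
made-up submit path (example_* is invented, the helpers are as in this
patch, and the caller holds the BO reservations):

static int example_gather_deps(struct drm_sched_job *job,
                               struct dma_fence *in_fence,
                               struct drm_gem_object **bos,
                               unsigned int nr_bos, bool write)
{
        unsigned int i;
        int ret;

        /* Explicit dependency, e.g. an in-fence from a sync_file. The
         * fence reference is consumed even on error. */
        ret = drm_sched_job_await_fence(job, in_fence);
        if (ret)
                return ret;

        /* Implicit dependencies from the dma_resv of each BO. Fences
         * from the same context are deduplicated internally, so this
         * scales with the number of engines, not the number of BOs. */
        for (i = 0; i < nr_bos; i++) {
                ret = drm_sched_job_await_implicit(job, bos[i], write);
                if (ret)
                        return ret;
        }

        return 0;
}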

Cheers, Daniel

>
> Christian.
>
> > +{
> > +     struct dma_fence *entry;
> > +     unsigned long index;
> > +     u32 id = 0;
> > +     int ret;
> > +
> > +     if (!fence)
> > +             return 0;
> > +
> > +     /* Deduplicate if we already depend on a fence from the same context.
> > +      * This lets the size of the array of deps scale with the number of
> > +      * engines involved, rather than the number of BOs.
> > +      */
> > +     xa_for_each(&job->dependencies, index, entry) {
> > +             if (entry->context != fence->context)
> > +                     continue;
> > +
> > +             if (dma_fence_is_later(fence, entry)) {
> > +                     dma_fence_put(entry);
> > +                     xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > +             } else {
> > +                     dma_fence_put(fence);
> > +             }
> > +             return 0;
> > +     }
> > +
> > +     ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> > +     if (ret != 0)
> > +             dma_fence_put(fence);
> > +
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL(drm_sched_job_await_fence);
> > +
> > +/**
> > + * drm_sched_job_await_implicit - adds implicit dependencies as job dependencies
> > + * @job: scheduler job to add the dependencies to
> > + * @obj: the gem object to add new dependencies from.
> > + * @write: whether the job might write the object (so we need to depend on
> > + * shared fences in the reservation object).
> > + *
> > + * This should be called after drm_gem_lock_reservations() on your array of
> > + * GEM objects used in the job but before updating the reservations with your
> > + * own fences.
> > + *
> > + * Returns:
> > + * 0 on success, or an error on failing to expand the array.
> > + */
> > +int drm_sched_job_await_implicit(struct drm_sched_job *job,
> > +                              struct drm_gem_object *obj,
> > +                              bool write)
> > +{
> > +     int ret;
> > +     struct dma_fence **fences;
> > +     unsigned int i, fence_count;
> > +
> > +     if (!write) {
> > +             struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
> > +
> > +             return drm_sched_job_await_fence(job, fence);
> > +     }
> > +
> > +     ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
> > +     if (ret || !fence_count)
> > +             return ret;
> > +
> > +     for (i = 0; i < fence_count; i++) {
> > +             ret = drm_sched_job_await_fence(job, fences[i]);
> > +             if (ret)
> > +                     break;
> > +     }
> > +
> > +     for (; i < fence_count; i++)
> > +             dma_fence_put(fences[i]);
> > +     kfree(fences);
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL(drm_sched_job_await_implicit);
> > +
> > +
> >   /**
> >    * drm_sched_job_cleanup - clean up scheduler job resources
> >    * @job: scheduler job to clean up
> > @@ -643,6 +737,9 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> >    */
> >   void drm_sched_job_cleanup(struct drm_sched_job *job)
> >   {
> > +     struct dma_fence *fence;
> > +     unsigned long index;
> > +
> >       if (kref_read(&job->s_fence->finished.refcount)) {
> >               /* drm_sched_job_arm() has been called */
> >               dma_fence_put(&job->s_fence->finished);
> > @@ -652,6 +749,12 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
> >       }
> >
> >       job->s_fence = NULL;
> > +
> > +     xa_for_each(&job->dependencies, index, fence) {
> > +             dma_fence_put(fence);
> > +     }
> > +     xa_destroy(&job->dependencies);
> > +
> >   }
> >   EXPORT_SYMBOL(drm_sched_job_cleanup);
> >
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 83afc3aa8e2f..74fb321dbc44 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -27,9 +27,12 @@
> >   #include <drm/spsc_queue.h>
> >   #include <linux/dma-fence.h>
> >   #include <linux/completion.h>
> > +#include <linux/xarray.h>
> >
> >   #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
> >
> > +struct drm_gem_object;
> > +
> >   struct drm_gpu_scheduler;
> >   struct drm_sched_rq;
> >
> > @@ -198,6 +201,16 @@ struct drm_sched_job {
> >       enum drm_sched_priority         s_priority;
> >       struct drm_sched_entity         *entity;
> >       struct dma_fence_cb             cb;
> > +     /**
> > +      * @dependencies:
> > +      *
> > +      * Contains the dependencies as struct dma_fence for this job, see
> > +      * drm_sched_job_await_fence() and drm_sched_job_await_implicit().
> > +      */
> > +     struct xarray                   dependencies;
> > +
> > +     /** @last_dependency: tracks @dependencies as they signal */
> > +     unsigned long                   last_dependency;
> >   };
> >
> >   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
> > @@ -220,9 +233,14 @@ enum drm_gpu_sched_stat {
> >    */
> >   struct drm_sched_backend_ops {
> >       /**
> > -         * @dependency: Called when the scheduler is considering scheduling
> > -         * this job next, to get another struct dma_fence for this job to
> > -      * block on.  Once it returns NULL, run_job() may be called.
> > +      * @dependency:
> > +      *
> > +      * Called when the scheduler is considering scheduling this job next, to
> > +      * get another struct dma_fence for this job to block on.  Once it
> > +      * returns NULL, run_job() may be called.
> > +      *
> > +      * If a driver exclusively uses drm_sched_job_await_fence() and
> > +      * drm_sched_job_await_implicit() this can be omitted and left as NULL.
> >        */
> >       struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
> >                                       struct drm_sched_entity *s_entity);
> > @@ -349,6 +367,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >                      struct drm_sched_entity *entity,
> >                      void *owner);
> >   void drm_sched_job_arm(struct drm_sched_job *job);
> > +int drm_sched_job_await_fence(struct drm_sched_job *job,
> > +                           struct dma_fence *fence);
> > +int drm_sched_job_await_implicit(struct drm_sched_job *job,
> > +                              struct drm_gem_object *obj,
> > +                              bool write);
> > +
> > +
> >   void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >                                   struct drm_gpu_scheduler **sched_list,
> >                                      unsigned int num_sched_list);
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling
  2021-07-07  9:08     ` Lucas Stach
@ 2021-07-07 11:26       ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 11:26 UTC (permalink / raw)
  To: Lucas Stach
  Cc: DRI Development, Daniel Vetter, Russell King, Christian Gmeiner,
	Sumit Semwal, Christian König, The etnaviv authors,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jul 7, 2021 at 11:08 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> On Friday, 02.07.2021 at 23:38 +0200, Daniel Vetter wrote:
> > We need to pull the drm_sched_job_init much earlier, but that's very
> > minor surgery.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Lucas Stach <l.stach@pengutronix.de>
> > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: etnaviv@lists.freedesktop.org
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > ---
> >  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
> >  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
> >  drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
> >  drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
> >  4 files changed, 20 insertions(+), 81 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > index 98e60df882b6..63688e6e4580 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > @@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
> >       u64 va;
> >       struct etnaviv_gem_object *obj;
> >       struct etnaviv_vram_mapping *mapping;
> > -     struct dma_fence *excl;
> > -     unsigned int nr_shared;
> > -     struct dma_fence **shared;
> >  };
> >
> >  /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
> > @@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
> >       struct etnaviv_file_private *ctx;
> >       struct etnaviv_gpu *gpu;
> >       struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
> > -     struct dma_fence *out_fence, *in_fence;
> > +     struct dma_fence *out_fence;
> >       int out_fence_id;
> >       struct list_head node; /* GPU active submit list */
> >       struct etnaviv_cmdbuf cmdbuf;
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > index 4dd7d9d541c0..92478a50a580 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > @@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
> >               if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
> >                       continue;
> >
> > -             if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> > -                     ret = dma_resv_get_fences(robj, &bo->excl,
> > -                                               &bo->nr_shared,
> > -                                               &bo->shared);
> > -                     if (ret)
> > -                             return ret;
> > -             } else {
> > -                     bo->excl = dma_resv_get_excl_unlocked(robj);
> > -             }
> > -
> > +             ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
> > +                                                bo->flags & ETNA_SUBMIT_BO_WRITE);
> > +             if (ret)
> > +                     return ret;
> >       }
> >
> >       return ret;
> > @@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
> >
> >       wake_up_all(&submit->gpu->fence_event);
> >
> > -     if (submit->in_fence)
> > -             dma_fence_put(submit->in_fence);
> >       if (submit->out_fence) {
> >               /* first remove from IDR, so fence can not be found anymore */
> >               mutex_lock(&submit->gpu->fence_lock);
> > @@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> >       submit->exec_state = args->exec_state;
> >       submit->flags = args->flags;
> >
> > +     ret = drm_sched_job_init(&submit->sched_job,
> > +                              &ctx->sched_entity[args->pipe],
> > +                              submit->ctx);
> > +     if (ret)
> > +             goto err_submit_objects;
> > +
>
> With the init moved here you also need to move the
> drm_sched_job_cleanup call from etnaviv_sched_free_job into
> submit_cleanup to avoid the potential memory leak when we bail out
> before pushing the job to the scheduler.

Uh, apologies for missing this again; the entire point of v2 was to
fix this across all drivers, but somehow the fixup for etnaviv got
lost. I'll do it now for v3.

Thanks, Daniel

>
> Regards,
> Lucas
>
> >       ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
> >       if (ret)
> >               goto err_submit_objects;
> > @@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> >       }
> >
> >       if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
> > -             submit->in_fence = sync_file_get_fence(args->fence_fd);
> > -             if (!submit->in_fence) {
> > +             struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
> > +             if (!in_fence) {
> >                       ret = -EINVAL;
> >                       goto err_submit_objects;
> >               }
> > +
> > +             ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
> > +             if (ret)
> > +                     goto err_submit_objects;
> >       }
> >
> >       ret = submit_pin_objects(submit);
> > @@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> >       if (ret)
> >               goto err_submit_objects;
> >
> > -     ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
> > +     ret = etnaviv_sched_push_job(submit);
> >       if (ret)
> >               goto err_submit_objects;
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index 180bb633d5c5..c98d67320be3 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
> >  static int etnaviv_hw_jobs_limit = 4;
> >  module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
> >
> > -static struct dma_fence *
> > -etnaviv_sched_dependency(struct drm_sched_job *sched_job,
> > -                      struct drm_sched_entity *entity)
> > -{
> > -     struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > -     struct dma_fence *fence;
> > -     int i;
> > -
> > -     if (unlikely(submit->in_fence)) {
> > -             fence = submit->in_fence;
> > -             submit->in_fence = NULL;
> > -
> > -             if (!dma_fence_is_signaled(fence))
> > -                     return fence;
> > -
> > -             dma_fence_put(fence);
> > -     }
> > -
> > -     for (i = 0; i < submit->nr_bos; i++) {
> > -             struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
> > -             int j;
> > -
> > -             if (bo->excl) {
> > -                     fence = bo->excl;
> > -                     bo->excl = NULL;
> > -
> > -                     if (!dma_fence_is_signaled(fence))
> > -                             return fence;
> > -
> > -                     dma_fence_put(fence);
> > -             }
> > -
> > -             for (j = 0; j < bo->nr_shared; j++) {
> > -                     if (!bo->shared[j])
> > -                             continue;
> > -
> > -                     fence = bo->shared[j];
> > -                     bo->shared[j] = NULL;
> > -
> > -                     if (!dma_fence_is_signaled(fence))
> > -                             return fence;
> > -
> > -                     dma_fence_put(fence);
> > -             }
> > -             kfree(bo->shared);
> > -             bo->nr_shared = 0;
> > -             bo->shared = NULL;
> > -     }
> > -
> > -     return NULL;
> > -}
> > -
> >  static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
> >  {
> >       struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > @@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
> >  }
> >
> >  static const struct drm_sched_backend_ops etnaviv_sched_ops = {
> > -     .dependency = etnaviv_sched_dependency,
> >       .run_job = etnaviv_sched_run_job,
> >       .timedout_job = etnaviv_sched_timedout_job,
> >       .free_job = etnaviv_sched_free_job,
> >  };
> >
> > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > -                        struct etnaviv_gem_submit *submit)
> > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
> >  {
> >       int ret = 0;
> >
> > @@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >        */
> >       mutex_lock(&submit->gpu->fence_lock);
> >
> > -     ret = drm_sched_job_init(&submit->sched_job, sched_entity,
> > -                              submit->ctx);
> > -     if (ret)
> > -             goto out_unlock;
> > -
> >       drm_sched_job_arm(&submit->sched_job);
> >
> >       submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > index c0a6796e22c9..baebfa069afc 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > @@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
> >
> >  int etnaviv_sched_init(struct etnaviv_gpu *gpu);
> >  void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
> > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > -                        struct etnaviv_gem_submit *submit);
> > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
> >
> >  #endif /* __ETNAVIV_SCHED_H__ */
>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling
  2021-07-07 11:26       ` Daniel Vetter
@ 2021-07-07 11:32         ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 11:32 UTC (permalink / raw)
  To: Lucas Stach
  Cc: DRI Development, Daniel Vetter, Russell King, Christian Gmeiner,
	Sumit Semwal, Christian König, The etnaviv authors,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jul 7, 2021 at 1:26 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Wed, Jul 7, 2021 at 11:08 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > On Friday, 02.07.2021 at 23:38 +0200, Daniel Vetter wrote:
> > > We need to pull the drm_sched_job_init much earlier, but that's very
> > > minor surgery.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Lucas Stach <l.stach@pengutronix.de>
> > > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > Cc: "Christian König" <christian.koenig@amd.com>
> > > Cc: etnaviv@lists.freedesktop.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > > ---
> > >  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
> > >  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
> > >  drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
> > >  drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
> > >  4 files changed, 20 insertions(+), 81 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > index 98e60df882b6..63688e6e4580 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > @@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
> > >       u64 va;
> > >       struct etnaviv_gem_object *obj;
> > >       struct etnaviv_vram_mapping *mapping;
> > > -     struct dma_fence *excl;
> > > -     unsigned int nr_shared;
> > > -     struct dma_fence **shared;
> > >  };
> > >
> > >  /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
> > > @@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
> > >       struct etnaviv_file_private *ctx;
> > >       struct etnaviv_gpu *gpu;
> > >       struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
> > > -     struct dma_fence *out_fence, *in_fence;
> > > +     struct dma_fence *out_fence;
> > >       int out_fence_id;
> > >       struct list_head node; /* GPU active submit list */
> > >       struct etnaviv_cmdbuf cmdbuf;
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > index 4dd7d9d541c0..92478a50a580 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > @@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
> > >               if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
> > >                       continue;
> > >
> > > -             if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> > > -                     ret = dma_resv_get_fences(robj, &bo->excl,
> > > -                                               &bo->nr_shared,
> > > -                                               &bo->shared);
> > > -                     if (ret)
> > > -                             return ret;
> > > -             } else {
> > > -                     bo->excl = dma_resv_get_excl_unlocked(robj);
> > > -             }
> > > -
> > > +             ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
> > > +                                                bo->flags & ETNA_SUBMIT_BO_WRITE);
> > > +             if (ret)
> > > +                     return ret;
> > >       }
> > >
> > >       return ret;
> > > @@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
> > >
> > >       wake_up_all(&submit->gpu->fence_event);
> > >
> > > -     if (submit->in_fence)
> > > -             dma_fence_put(submit->in_fence);
> > >       if (submit->out_fence) {
> > >               /* first remove from IDR, so fence can not be found anymore */
> > >               mutex_lock(&submit->gpu->fence_lock);
> > > @@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >       submit->exec_state = args->exec_state;
> > >       submit->flags = args->flags;
> > >
> > > +     ret = drm_sched_job_init(&submit->sched_job,
> > > +                              &ctx->sched_entity[args->pipe],
> > > +                              submit->ctx);
> > > +     if (ret)
> > > +             goto err_submit_objects;
> > > +
> >
> > With the init moved here you also need to move the
> > drm_sched_job_cleanup call from etnaviv_sched_free_job into
> > submit_cleanup to avoid the potential memory leak when we bail out
> > before pushing the job to the scheduler.
>
> Uh apologies for missing this again, the entire point of v2 was to fix
> this across all drivers. But somehow the fixup for etnaviv got lost.
> I'll do it now for v3.

To clarify, in case you meant I should put it into submit_cleanup():
That doesn't work, because for some of the paths we shouldn't call it
yet, so I think it's better to be explicit here (like I've done with
the other drivers) - drm_sched_job_cleanup() handles being called both
before and after drm_sched_job_arm(), but it doesn't cope well with
being called before drm_sched_job_init() :-)
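
Concretely, the unwind ordering I have in mind looks roughly like this
(just a sketch, the my_* helpers are made up and stand in for the
driver-specific lookup/pin/await steps, it's not the actual etnaviv
code):

static int my_submit(struct drm_sched_job *job,
                     struct drm_sched_entity *entity, void *owner)
{
        int ret;

        ret = drm_sched_job_init(job, entity, owner);
        if (ret)
                return ret;             /* nothing to unwind yet */

        ret = my_prepare_job(job);      /* hypothetical driver step */
        if (ret)
                goto err_cleanup;       /* after init, before arm: ok */

        drm_sched_job_arm(job);         /* point of no return */
        my_push_job(job);               /* from here on free_job cleans up */

        return 0;

err_cleanup:
        drm_sched_job_cleanup(job);     /* fine before arm, not before init */
        return ret;
}
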
-Daniel

>
> Thanks, Daniel
>
> >
> > Regards,
> > Lucas
> >
> > >       ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
> > >       if (ret)
> > >               goto err_submit_objects;
> > > @@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >       }
> > >
> > >       if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
> > > -             submit->in_fence = sync_file_get_fence(args->fence_fd);
> > > -             if (!submit->in_fence) {
> > > +             struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
> > > +             if (!in_fence) {
> > >                       ret = -EINVAL;
> > >                       goto err_submit_objects;
> > >               }
> > > +
> > > +             ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
> > > +             if (ret)
> > > +                     goto err_submit_objects;
> > >       }
> > >
> > >       ret = submit_pin_objects(submit);
> > > @@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >       if (ret)
> > >               goto err_submit_objects;
> > >
> > > -     ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
> > > +     ret = etnaviv_sched_push_job(submit);
> > >       if (ret)
> > >               goto err_submit_objects;
> > >
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > index 180bb633d5c5..c98d67320be3 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > @@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
> > >  static int etnaviv_hw_jobs_limit = 4;
> > >  module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
> > >
> > > -static struct dma_fence *
> > > -etnaviv_sched_dependency(struct drm_sched_job *sched_job,
> > > -                      struct drm_sched_entity *entity)
> > > -{
> > > -     struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > > -     struct dma_fence *fence;
> > > -     int i;
> > > -
> > > -     if (unlikely(submit->in_fence)) {
> > > -             fence = submit->in_fence;
> > > -             submit->in_fence = NULL;
> > > -
> > > -             if (!dma_fence_is_signaled(fence))
> > > -                     return fence;
> > > -
> > > -             dma_fence_put(fence);
> > > -     }
> > > -
> > > -     for (i = 0; i < submit->nr_bos; i++) {
> > > -             struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
> > > -             int j;
> > > -
> > > -             if (bo->excl) {
> > > -                     fence = bo->excl;
> > > -                     bo->excl = NULL;
> > > -
> > > -                     if (!dma_fence_is_signaled(fence))
> > > -                             return fence;
> > > -
> > > -                     dma_fence_put(fence);
> > > -             }
> > > -
> > > -             for (j = 0; j < bo->nr_shared; j++) {
> > > -                     if (!bo->shared[j])
> > > -                             continue;
> > > -
> > > -                     fence = bo->shared[j];
> > > -                     bo->shared[j] = NULL;
> > > -
> > > -                     if (!dma_fence_is_signaled(fence))
> > > -                             return fence;
> > > -
> > > -                     dma_fence_put(fence);
> > > -             }
> > > -             kfree(bo->shared);
> > > -             bo->nr_shared = 0;
> > > -             bo->shared = NULL;
> > > -     }
> > > -
> > > -     return NULL;
> > > -}
> > > -
> > >  static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
> > >  {
> > >       struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > > @@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
> > >  }
> > >
> > >  static const struct drm_sched_backend_ops etnaviv_sched_ops = {
> > > -     .dependency = etnaviv_sched_dependency,
> > >       .run_job = etnaviv_sched_run_job,
> > >       .timedout_job = etnaviv_sched_timedout_job,
> > >       .free_job = etnaviv_sched_free_job,
> > >  };
> > >
> > > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > -                        struct etnaviv_gem_submit *submit)
> > > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
> > >  {
> > >       int ret = 0;
> > >
> > > @@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > >        */
> > >       mutex_lock(&submit->gpu->fence_lock);
> > >
> > > -     ret = drm_sched_job_init(&submit->sched_job, sched_entity,
> > > -                              submit->ctx);
> > > -     if (ret)
> > > -             goto out_unlock;
> > > -
> > >       drm_sched_job_arm(&submit->sched_job);
> > >
> > >       submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > index c0a6796e22c9..baebfa069afc 100644
> > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > @@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
> > >
> > >  int etnaviv_sched_init(struct etnaviv_gpu *gpu);
> > >  void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
> > > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > -                        struct etnaviv_gem_submit *submit);
> > > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
> > >
> > >  #endif /* __ETNAVIV_SCHED_H__ */
> >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-07 11:14       ` Daniel Vetter
@ 2021-07-07 11:57         ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-07 11:57 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On 07.07.21 at 13:14, Daniel Vetter wrote:
> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> <christian.koenig@amd.com> wrote:
>> On 02.07.21 at 23:38, Daniel Vetter wrote:
>>> This is a very confusingly named function, because not just does it
>>> init an object, it arms it and provides a point of no return for
>>> pushing a job into the scheduler. It would be nice if that's a bit
>>> clearer in the interface.
>>>
>>> But the real reason is that I want to push the dependency tracking
>>> helpers into the scheduler code, and that means drm_sched_job_init
>>> must be called a lot earlier, without arming the job.
>>>
>>> v2:
>>> - don't change .gitignore (Steven)
>>> - don't forget v3d (Emma)
>>>
>>> v3: Emma noticed that I leak the memory allocated in
>>> drm_sched_job_init if we bail out before the point of no return in
>>> subsequent driver patches. To be able to fix this change
>>> drm_sched_job_cleanup() so it can handle being called both before and
>>> after drm_sched_job_arm().
>> Thinking more about this, I'm not sure if this really works.
>>
>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>> to update the entity->rq association.
>>
>> And that can only be done later on when we arm the fence as well.
> Hm yeah, but that's a bug in the existing code I think: We already
> fail to clean up if we fail to allocate the fences. So I think the
> right thing to do here is to split the checks into job_init, and do
> the actual arming/rq selection in job_arm? I'm not entirely sure
> what's all going on there, the first check looks a bit like trying to
> schedule before the entity is set up, which is a driver bug and should
> have a WARN_ON?

No, you misunderstood me; the problem is something else.

You asked previously why the call to drm_sched_job_init() was so late in 
the CS.

The reason for this was not only the scheduler fence init, but also the 
call to drm_sched_entity_select_rq().

> The 2nd check around last_scheduled I honestly have no idea what it's
> even trying to do.

You mean that here?

         fence = READ_ONCE(entity->last_scheduled);
         if (fence && !dma_fence_is_signaled(fence))
                 return;

This makes sure that load balancing is not moving the entity to a 
different scheduler while there are still jobs running from this entity 
on the hardware.
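
For reference, that check sits roughly here in
drm_sched_entity_select_rq() (simplified sketch of the current
drivers/gpu/drm/scheduler/sched_entity.c code, from memory, not
verbatim):

void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
{
        struct dma_fence *fence;
        struct drm_gpu_scheduler *sched;
        struct drm_sched_rq *rq;

        /* Nothing to balance with queued jobs or a single scheduler. */
        if (spsc_queue_count(&entity->job_queue) ||
            entity->num_sched_list <= 1)
                return;

        /* The check in question: don't move the entity while jobs from
         * it may still be running on the current hardware. */
        fence = READ_ONCE(entity->last_scheduled);
        if (fence && !dma_fence_is_signaled(fence))
                return;

        spin_lock(&entity->rq_lock);
        sched = drm_sched_pick_best(entity->sched_list,
                                    entity->num_sched_list);
        rq = sched ? &sched->sched_rq[entity->priority] : NULL;
        if (rq != entity->rq) {
                drm_sched_rq_remove_entity(entity->rq, entity);
                entity->rq = rq;
        }
        spin_unlock(&entity->rq_lock);
}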

Regards
Christian.

> -Daniel
>
>> Christian.
>>
>>> Also improve the kerneldoc for this.
>>>
>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>> Cc: Qiang Yu <yuq825@gmail.com>
>>> Cc: Rob Herring <robh@kernel.org>
>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>> Cc: Steven Price <steven.price@arm.com>
>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>> Cc: David Airlie <airlied@linux.ie>
>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>> Cc: Nick Terrell <terrelln@fb.com>
>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Dave Airlie <airlied@redhat.com>
>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>> Cc: Lee Jones <lee.jones@linaro.org>
>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>> Cc: Chen Li <chenli@uniontech.com>
>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>> Cc: etnaviv@lists.freedesktop.org
>>> Cc: lima@lists.freedesktop.org
>>> Cc: linux-media@vger.kernel.org
>>> Cc: linaro-mm-sig@lists.linaro.org
>>> Cc: Emma Anholt <emma@anholt.net>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>    drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>    drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>    drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>    drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>    drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>    drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>    include/drm/gpu_scheduler.h              |  7 +++-
>>>    10 files changed, 74 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index c5386d13eb4a..a4ec092af9a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>        if (r)
>>>                goto error_unlock;
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        /* No memory allocation is allowed while holding the notifier lock.
>>>         * The lock is held until amdgpu_cs_submit is finished and fence is
>>>         * added to BOs.
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index d33e6d97cc89..5ddb955d2315 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>        if (r)
>>>                return r;
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        *f = dma_fence_get(&job->base.s_fence->finished);
>>>        amdgpu_job_free_resources(job);
>>>        drm_sched_entity_push_job(&job->base, entity);
>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> index feb6da1b6ceb..05f412204118 100644
>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>        if (ret)
>>>                goto out_unlock;
>>>
>>> +     drm_sched_job_arm(&submit->sched_job);
>>> +
>>>        submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>        submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>                                                submit->out_fence, 0,
>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>> index dba8329937a3..38f755580507 100644
>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>                return err;
>>>        }
>>>
>>> +     drm_sched_job_arm(&task->base);
>>> +
>>>        task->num_bos = num_bos;
>>>        task->vm = lima_vm_get(vm);
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> index 71a72fb50e6b..2992dc85325f 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>                goto unlock;
>>>        }
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>
>>>        ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 79554aa4dbb1..f7347c284886 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>     * @sched_job: job to submit
>>>     * @entity: scheduler entity
>>>     *
>>> - * Note: To guarantee that the order of insertion to queue matches
>>> - * the job's fence sequence number this function should be
>>> - * called with drm_sched_job_init under common lock.
>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>> + * under common lock.
>>>     *
>>>     * Returns 0 for success, negative error code otherwise.
>>>     */
>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>> index 69de2c76731f..c451ee9a30d7 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>     *
>>>     * Free up the fence memory after the RCU grace period.
>>>     */
>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>    {
>>>        struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>        struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>    }
>>>    EXPORT_SYMBOL(to_drm_sched_fence);
>>>
>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>> -                                            void *owner)
>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>> +                                           void *owner)
>>>    {
>>>        struct drm_sched_fence *fence = NULL;
>>> -     unsigned seq;
>>>
>>>        fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>        if (fence == NULL)
>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>        fence->sched = entity->rq->sched;
>>>        spin_lock_init(&fence->lock);
>>>
>>> +     return fence;
>>> +}
>>> +
>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> +                       struct drm_sched_entity *entity)
>>> +{
>>> +     unsigned seq;
>>> +
>>>        seq = atomic_inc_return(&entity->fence_seq);
>>>        dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>                       &fence->lock, entity->fence_context, seq);
>>>        dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>                       &fence->lock, entity->fence_context + 1, seq);
>>> -
>>> -     return fence;
>>>    }
>>>
>>>    module_init(drm_sched_fence_slab_init);
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 33c414d55fab..5e84e1500c32 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -48,9 +48,11 @@
>>>    #include <linux/wait.h>
>>>    #include <linux/sched.h>
>>>    #include <linux/completion.h>
>>> +#include <linux/dma-resv.h>
>>>    #include <uapi/linux/sched/types.h>
>>>
>>>    #include <drm/drm_print.h>
>>> +#include <drm/drm_gem.h>
>>>    #include <drm/gpu_scheduler.h>
>>>    #include <drm/spsc_queue.h>
>>>
>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>
>>>    /**
>>>     * drm_sched_job_init - init a scheduler job
>>> - *
>>>     * @job: scheduler job to init
>>>     * @entity: scheduler entity to use
>>>     * @owner: job owner for debugging
>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>     * Refer to drm_sched_entity_push_job() documentation
>>>     * for locking considerations.
>>>     *
>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>> + *
>>>     * Returns 0 for success, negative error code otherwise.
>>>     */
>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>        job->sched = sched;
>>>        job->entity = entity;
>>>        job->s_priority = entity->rq - sched->sched_rq;
>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>        if (!job->s_fence)
>>>                return -ENOMEM;
>>>        job->id = atomic64_inc_return(&sched->job_id_count);
>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>    EXPORT_SYMBOL(drm_sched_job_init);
>>>
>>>    /**
>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>> + * @job: scheduler job to arm
>>> + *
>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>> + * or other places that need to track the completion of this job.
>>> + *
>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>> + * considerations.
>>>     *
>>> + * This can only be called if drm_sched_job_init() succeeded.
>>> + */
>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>> +{
>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>> +
>>> +/**
>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>     * @job: scheduler job to clean up
>>> + *
>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>> + *
>>> + * Drivers should call this from their error unwind code if @job is aborted
>>> + * before drm_sched_job_arm() is called.
>>> + *
>>> + * After that point of no return @job is committed to be executed by the
>>> + * scheduler, and this function should be called from the
>>> + * &drm_sched_backend_ops.free_job callback.
>>>     */
>>>    void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>    {
>>> -     dma_fence_put(&job->s_fence->finished);
>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>> +             /* drm_sched_job_arm() has been called */
>>> +             dma_fence_put(&job->s_fence->finished);
>>> +     } else {
>>> +             /* aborted job before committing to run it */
>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>> +     }
>>> +
>>>        job->s_fence = NULL;
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_job_cleanup);
>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>> index 4eb354226972..5c3a99027ecd 100644
>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>        if (ret)
>>>                return ret;
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>
>>>        /* put by scheduler job completion */
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>>                       struct drm_sched_entity *entity,
>>>                       void *owner);
>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>    void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>                                    struct drm_gpu_scheduler **sched_list,
>>>                                       unsigned int num_sched_list);
>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>                                   enum drm_sched_priority priority);
>>>    bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>
>>> -struct drm_sched_fence *drm_sched_fence_create(
>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>        struct drm_sched_entity *s_entity, void *owner);
>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> +                       struct drm_sched_entity *entity);
>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>> +
>>>    void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>    void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
@ 2021-07-07 11:57         ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-07 11:57 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Emma Anholt, Adam Borowski, David Airlie, Viresh Kumar,
	DRI Development, Sonny Jiang, Nirmoy Das, Daniel Vetter,
	Lee Jones, Jack Zhang, lima, Mauro Carvalho Chehab,
	Masahiro Yamada, Steven Price, Luben Tuikov, Alyssa Rosenzweig,
	Sami Tolvanen, Russell King, Dave Airlie, Dennis Li, Chen Li,
	Paul Menzel, Kees Cook, Marek Olšák, Kevin Wang,
	The etnaviv authors, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Nick Terrell, Deepak R Varma, Tomeu Vizoso, Boris Brezillon,
	Qiang Yu, Alex Deucher, Tian Tao,
	open list:DMA BUFFER SHARING FRAMEWORK

Am 07.07.21 um 13:14 schrieb Daniel Vetter:
> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 02.07.21 um 23:38 schrieb Daniel Vetter:
>>> This is a very confusingly named function, because not just does it
>>> init an object, it arms it and provides a point of no return for
>>> pushing a job into the scheduler. It would be nice if that's a bit
>>> clearer in the interface.
>>>
>>> But the real reason is that I want to push the dependency tracking
>>> helpers into the scheduler code, and that means drm_sched_job_init
>>> must be called a lot earlier, without arming the job.
>>>
>>> v2:
>>> - don't change .gitignore (Steven)
>>> - don't forget v3d (Emma)
>>>
>>> v3: Emma noticed that I leak the memory allocated in
>>> drm_sched_job_init if we bail out before the point of no return in
>>> subsequent driver patches. To be able to fix this change
>>> drm_sched_job_cleanup() so it can handle being called both before and
>>> after drm_sched_job_arm().
>> Thinking more about this, I'm not sure if this really works.
>>
>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>> to update the entity->rq association.
>>
>> And that can only be done later on when we arm the fence as well.
> Hm yeah, but that's a bug in the existing code I think: We already
> fail to clean up if we fail to allocate the fences. So I think the
> right thing to do here is to split the checks into job_init, and do
> the actual arming/rq selection in job_arm? I'm not entirely sure
> what's all going on there, the first check looks a bit like trying to
> schedule before the entity is set up, which is a driver bug and should
> have a WARN_ON?

No you misunderstood me, the problem is something else.

You asked previously why the call to drm_sched_job_init() was so late in 
the CS.

The reason for this was not alone the scheduler fence init, but also the 
call to drm_sched_entity_select_rq().

> The 2nd check around last_scheduled I have honeslty no idea what it's
> even trying to do.

You mean that here?

         fence = READ_ONCE(entity->last_scheduled);
         if (fence && !dma_fence_is_signaled(fence))
                 return;

This makes sure that load balancing is not moving the entity to a 
different scheduler while there are still jobs running from this entity 
on the hardware,

Regards
Christian.

> -Daniel
>
>> Christian.
>>
>>> Also improve the kerneldoc for this.
>>>
>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>> Cc: Qiang Yu <yuq825@gmail.com>
>>> Cc: Rob Herring <robh@kernel.org>
>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>> Cc: Steven Price <steven.price@arm.com>
>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>> Cc: David Airlie <airlied@linux.ie>
>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>> Cc: Nick Terrell <terrelln@fb.com>
>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Dave Airlie <airlied@redhat.com>
>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>> Cc: Lee Jones <lee.jones@linaro.org>
>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>> Cc: Chen Li <chenli@uniontech.com>
>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>> Cc: etnaviv@lists.freedesktop.org
>>> Cc: lima@lists.freedesktop.org
>>> Cc: linux-media@vger.kernel.org
>>> Cc: linaro-mm-sig@lists.linaro.org
>>> Cc: Emma Anholt <emma@anholt.net>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>    drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>    drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>    drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>    drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>    drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>    drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>    include/drm/gpu_scheduler.h              |  7 +++-
>>>    10 files changed, 74 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index c5386d13eb4a..a4ec092af9a7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>        if (r)
>>>                goto error_unlock;
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        /* No memory allocation is allowed while holding the notifier lock.
>>>         * The lock is held until amdgpu_cs_submit is finished and fence is
>>>         * added to BOs.
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index d33e6d97cc89..5ddb955d2315 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>        if (r)
>>>                return r;
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        *f = dma_fence_get(&job->base.s_fence->finished);
>>>        amdgpu_job_free_resources(job);
>>>        drm_sched_entity_push_job(&job->base, entity);
>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> index feb6da1b6ceb..05f412204118 100644
>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>        if (ret)
>>>                goto out_unlock;
>>>
>>> +     drm_sched_job_arm(&submit->sched_job);
>>> +
>>>        submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>        submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>                                                submit->out_fence, 0,
>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>> index dba8329937a3..38f755580507 100644
>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>                return err;
>>>        }
>>>
>>> +     drm_sched_job_arm(&task->base);
>>> +
>>>        task->num_bos = num_bos;
>>>        task->vm = lima_vm_get(vm);
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> index 71a72fb50e6b..2992dc85325f 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>                goto unlock;
>>>        }
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>
>>>        ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 79554aa4dbb1..f7347c284886 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>     * @sched_job: job to submit
>>>     * @entity: scheduler entity
>>>     *
>>> - * Note: To guarantee that the order of insertion to queue matches
>>> - * the job's fence sequence number this function should be
>>> - * called with drm_sched_job_init under common lock.
>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>> + * under common lock.
>>>     *
>>>     * Returns 0 for success, negative error code otherwise.
>>>     */
>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>> index 69de2c76731f..c451ee9a30d7 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>     *
>>>     * Free up the fence memory after the RCU grace period.
>>>     */
>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>    {
>>>        struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>        struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>    }
>>>    EXPORT_SYMBOL(to_drm_sched_fence);
>>>
>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>> -                                            void *owner)
>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>> +                                           void *owner)
>>>    {
>>>        struct drm_sched_fence *fence = NULL;
>>> -     unsigned seq;
>>>
>>>        fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>        if (fence == NULL)
>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>        fence->sched = entity->rq->sched;
>>>        spin_lock_init(&fence->lock);
>>>
>>> +     return fence;
>>> +}
>>> +
>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> +                       struct drm_sched_entity *entity)
>>> +{
>>> +     unsigned seq;
>>> +
>>>        seq = atomic_inc_return(&entity->fence_seq);
>>>        dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>                       &fence->lock, entity->fence_context, seq);
>>>        dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>                       &fence->lock, entity->fence_context + 1, seq);
>>> -
>>> -     return fence;
>>>    }
>>>
>>>    module_init(drm_sched_fence_slab_init);
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 33c414d55fab..5e84e1500c32 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -48,9 +48,11 @@
>>>    #include <linux/wait.h>
>>>    #include <linux/sched.h>
>>>    #include <linux/completion.h>
>>> +#include <linux/dma-resv.h>
>>>    #include <uapi/linux/sched/types.h>
>>>
>>>    #include <drm/drm_print.h>
>>> +#include <drm/drm_gem.h>
>>>    #include <drm/gpu_scheduler.h>
>>>    #include <drm/spsc_queue.h>
>>>
>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>
>>>    /**
>>>     * drm_sched_job_init - init a scheduler job
>>> - *
>>>     * @job: scheduler job to init
>>>     * @entity: scheduler entity to use
>>>     * @owner: job owner for debugging
>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>     * Refer to drm_sched_entity_push_job() documentation
>>>     * for locking considerations.
>>>     *
>>> + * Drivers must make sure drm_sched_job_cleanup() if this function returns
>>> + * successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>> + *
>>>     * Returns 0 for success, negative error code otherwise.
>>>     */
>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>        job->sched = sched;
>>>        job->entity = entity;
>>>        job->s_priority = entity->rq - sched->sched_rq;
>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>        if (!job->s_fence)
>>>                return -ENOMEM;
>>>        job->id = atomic64_inc_return(&sched->job_id_count);
>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>    EXPORT_SYMBOL(drm_sched_job_init);
>>>
>>>    /**
>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>> + * @job: scheduler job to arm
>>> + *
>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>> + * or other places that need to track the completion of this job.
>>> + *
>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>> + * considerations.
>>>     *
>>> + * This can only be called if drm_sched_job_init() succeeded.
>>> + */
>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>> +{
>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>> +}
>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>> +
>>> +/**
>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>     * @job: scheduler job to clean up
>>> + *
>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>> + *
>>> + * Drivers should call this from their error unwind code if @job is aborted
>>> + * before drm_sched_job_arm() is called.
>>> + *
>>> + * After that point of no return @job is committed to be executed by the
>>> + * scheduler, and this function should be called from the
>>> + * &drm_sched_backend_ops.free_job callback.
>>>     */
>>>    void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>    {
>>> -     dma_fence_put(&job->s_fence->finished);
>>> +     if (!kref_read(&job->s_fence->finished.refcount)) {
>>> +             /* drm_sched_job_arm() has been called */
>>> +             dma_fence_put(&job->s_fence->finished);
>>> +     } else {
>>> +             /* aborted job before committing to run it */
>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>> +     }
>>> +
>>>        job->s_fence = NULL;
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_job_cleanup);
>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>> index 4eb354226972..5c3a99027ecd 100644
>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>        if (ret)
>>>                return ret;
>>>
>>> +     drm_sched_job_arm(&job->base);
>>> +
>>>        job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>
>>>        /* put by scheduler job completion */
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>>                       struct drm_sched_entity *entity,
>>>                       void *owner);
>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>    void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>                                    struct drm_gpu_scheduler **sched_list,
>>>                                       unsigned int num_sched_list);
>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>                                   enum drm_sched_priority priority);
>>>    bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>
>>> -struct drm_sched_fence *drm_sched_fence_create(
>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>        struct drm_sched_entity *s_entity, void *owner);
>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>> +                       struct drm_sched_entity *entity);
>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>> +
>>>    void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>    void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>
>
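
For reference, the driver-side flow this split is aiming at would then
be roughly the below (just a sketch, using the names from this series):

	ret = drm_sched_job_init(&job->base, entity, owner);
	if (ret)
		return ret;

	/* everything that can still fail goes here: awaiting
	 * dependencies, memory allocations, ... - on error call
	 * drm_sched_job_cleanup() and bail out
	 */

	drm_sched_job_arm(&job->base);		/* point of no return */

	fence = dma_fence_get(&job->base.s_fence->finished);
	drm_sched_entity_push_job(&job->base, entity);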


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-07 11:57         ` Christian König
@ 2021-07-07 12:13           ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 12:13 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> On 07.07.21 at 13:14, Daniel Vetter wrote:
> > On Wed, Jul 7, 2021 at 11:30 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> On 02.07.21 at 23:38, Daniel Vetter wrote:
> >>> This is a very confusingly named function, because not just does it
> >>> init an object, it arms it and provides a point of no return for
> >>> pushing a job into the scheduler. It would be nice if that's a bit
> >>> clearer in the interface.
> >>>
> >>> But the real reason is that I want to push the dependency tracking
> >>> helpers into the scheduler code, and that means drm_sched_job_init
> >>> must be called a lot earlier, without arming the job.
> >>>
> >>> v2:
> >>> - don't change .gitignore (Steven)
> >>> - don't forget v3d (Emma)
> >>>
> >>> v3: Emma noticed that I leak the memory allocated in
> >>> drm_sched_job_init if we bail out before the point of no return in
> >>> subsequent driver patches. To be able to fix this change
> >>> drm_sched_job_cleanup() so it can handle being called both before and
> >>> after drm_sched_job_arm().
> >> Thinking more about this, I'm not sure if this really works.
> >>
> >> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> >> to update the entity->rq association.
> >>
> >> And that can only be done later on when we arm the fence as well.
> > Hm yeah, but that's a bug in the existing code I think: We already
> > fail to clean up if we fail to allocate the fences. So I think the
> > right thing to do here is to split the checks into job_init, and do
> > the actual arming/rq selection in job_arm? I'm not entirely sure
> > what's all going on there, the first check looks a bit like trying to
> > schedule before the entity is set up, which is a driver bug and should
> > have a WARN_ON?
>
> No you misunderstood me, the problem is something else.
>
> You asked previously why the call to drm_sched_job_init() was so late in
> the CS.
>
> The reason for this was not alone the scheduler fence init, but also the
> call to drm_sched_entity_select_rq().

Ah ok, I think I can fix that. Needs a prep patch to first make
drm_sched_entity_select_rq() infallible, then this should be easy to do.
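
Completely untested sketch of what I have in mind, assuming
drm_sched_entity_select_rq() can indeed be made infallible first:

	void drm_sched_job_arm(struct drm_sched_job *job)
	{
		struct drm_sched_entity *entity = job->entity;
		struct drm_gpu_scheduler *sched;

		/* pick the final rq only now, at the point of no return */
		drm_sched_entity_select_rq(entity);
		sched = entity->rq->sched;

		job->sched = sched;
		job->s_priority = entity->rq - sched->sched_rq;
		job->id = atomic64_inc_return(&sched->job_id_count);

		drm_sched_fence_init(job->s_fence, entity);
	}

i.e. drm_sched_job_init() would only validate the entity and allocate
the fence, and everything that depends on the final rq moves here.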

> > The 2nd check around last_scheduled I have honestly no idea what it's
> > even trying to do.
>
> You mean that here?
>
>          fence = READ_ONCE(entity->last_scheduled);
>          if (fence && !dma_fence_is_signaled(fence))
>                  return;
>
> This makes sure that load balancing is not moving the entity to a
> different scheduler while there are still jobs running from this entity
> on the hardware.

Yeah, after a nap that idea crossed my mind too. But now I have locking
questions: afaiui the scheduler thread updates this without taking any
locks - entity dequeuing is lockless. And here we read the fence and
then seem to yolo-check whether it's signalled? What's preventing a
use-after-free here? There's no rcu or anything going on here at all,
and it's outside of the spinlock section, which starts a bit further
down.
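
If we want to keep this lockless I'd expect we need something along
these lines (just a sketch, assuming entity->last_scheduled gets
annotated __rcu - the fences themselves are already rcu-freed):

	struct dma_fence *fence;

	rcu_read_lock();
	fence = dma_fence_get_rcu_safe(&entity->last_scheduled);
	rcu_read_unlock();

	if (fence && !dma_fence_is_signaled(fence)) {
		dma_fence_put(fence);
		return;
	}
	dma_fence_put(fence);	/* dma_fence_put(NULL) is a no-op */
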
-Daniel

>
> Regards
> Christian.
>
> > -Daniel
> >
> >> Christian.
> >>
> >>> Also improve the kerneldoc for this.
> >>>
> >>> Acked-by: Steven Price <steven.price@arm.com> (v2)
> >>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>> Cc: Lucas Stach <l.stach@pengutronix.de>
> >>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> >>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> >>> Cc: Qiang Yu <yuq825@gmail.com>
> >>> Cc: Rob Herring <robh@kernel.org>
> >>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> >>> Cc: Steven Price <steven.price@arm.com>
> >>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> >>> Cc: David Airlie <airlied@linux.ie>
> >>> Cc: Daniel Vetter <daniel@ffwll.ch>
> >>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> >>> Cc: "Christian König" <christian.koenig@amd.com>
> >>> Cc: Masahiro Yamada <masahiroy@kernel.org>
> >>> Cc: Kees Cook <keescook@chromium.org>
> >>> Cc: Adam Borowski <kilobyte@angband.pl>
> >>> Cc: Nick Terrell <terrelln@fb.com>
> >>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> >>> Cc: Sami Tolvanen <samitolvanen@google.com>
> >>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>> Cc: Dave Airlie <airlied@redhat.com>
> >>> Cc: Nirmoy Das <nirmoy.das@amd.com>
> >>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> >>> Cc: Lee Jones <lee.jones@linaro.org>
> >>> Cc: Kevin Wang <kevin1.wang@amd.com>
> >>> Cc: Chen Li <chenli@uniontech.com>
> >>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> >>> Cc: "Marek Olšák" <marek.olsak@amd.com>
> >>> Cc: Dennis Li <Dennis.Li@amd.com>
> >>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>> Cc: Sonny Jiang <sonny.jiang@amd.com>
> >>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> >>> Cc: Tian Tao <tiantao6@hisilicon.com>
> >>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> >>> Cc: etnaviv@lists.freedesktop.org
> >>> Cc: lima@lists.freedesktop.org
> >>> Cc: linux-media@vger.kernel.org
> >>> Cc: linaro-mm-sig@lists.linaro.org
> >>> Cc: Emma Anholt <emma@anholt.net>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> >>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> >>>    drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> >>>    drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> >>>    drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> >>>    drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> >>>    drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> >>>    drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> >>>    include/drm/gpu_scheduler.h              |  7 +++-
> >>>    10 files changed, 74 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> index c5386d13eb4a..a4ec092af9a7 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> >>>        if (r)
> >>>                goto error_unlock;
> >>>
> >>> +     drm_sched_job_arm(&job->base);
> >>> +
> >>>        /* No memory allocation is allowed while holding the notifier lock.
> >>>         * The lock is held until amdgpu_cs_submit is finished and fence is
> >>>         * added to BOs.
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> index d33e6d97cc89..5ddb955d2315 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> >>>        if (r)
> >>>                return r;
> >>>
> >>> +     drm_sched_job_arm(&job->base);
> >>> +
> >>>        *f = dma_fence_get(&job->base.s_fence->finished);
> >>>        amdgpu_job_free_resources(job);
> >>>        drm_sched_entity_push_job(&job->base, entity);
> >>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>> index feb6da1b6ceb..05f412204118 100644
> >>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >>>        if (ret)
> >>>                goto out_unlock;
> >>>
> >>> +     drm_sched_job_arm(&submit->sched_job);
> >>> +
> >>>        submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> >>>        submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> >>>                                                submit->out_fence, 0,
> >>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> >>> index dba8329937a3..38f755580507 100644
> >>> --- a/drivers/gpu/drm/lima/lima_sched.c
> >>> +++ b/drivers/gpu/drm/lima/lima_sched.c
> >>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> >>>                return err;
> >>>        }
> >>>
> >>> +     drm_sched_job_arm(&task->base);
> >>> +
> >>>        task->num_bos = num_bos;
> >>>        task->vm = lima_vm_get(vm);
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>> index 71a72fb50e6b..2992dc85325f 100644
> >>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> >>>                goto unlock;
> >>>        }
> >>>
> >>> +     drm_sched_job_arm(&job->base);
> >>> +
> >>>        job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>
> >>>        ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> >>> index 79554aa4dbb1..f7347c284886 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> >>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >>>     * @sched_job: job to submit
> >>>     * @entity: scheduler entity
> >>>     *
> >>> - * Note: To guarantee that the order of insertion to queue matches
> >>> - * the job's fence sequence number this function should be
> >>> - * called with drm_sched_job_init under common lock.
> >>> + * Note: To guarantee that the order of insertion to queue matches the job's
> >>> + * fence sequence number this function should be called with drm_sched_job_arm()
> >>> + * under common lock.
> >>>     *
> >>>     * Returns 0 for success, negative error code otherwise.
> >>>     */
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> >>> index 69de2c76731f..c451ee9a30d7 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> >>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> >>>     *
> >>>     * Free up the fence memory after the RCU grace period.
> >>>     */
> >>> -static void drm_sched_fence_free(struct rcu_head *rcu)
> >>> +void drm_sched_fence_free(struct rcu_head *rcu)
> >>>    {
> >>>        struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >>>        struct drm_sched_fence *fence = to_drm_sched_fence(f);
> >>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> >>>    }
> >>>    EXPORT_SYMBOL(to_drm_sched_fence);
> >>>
> >>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>> -                                            void *owner)
> >>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> >>> +                                           void *owner)
> >>>    {
> >>>        struct drm_sched_fence *fence = NULL;
> >>> -     unsigned seq;
> >>>
> >>>        fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> >>>        if (fence == NULL)
> >>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>        fence->sched = entity->rq->sched;
> >>>        spin_lock_init(&fence->lock);
> >>>
> >>> +     return fence;
> >>> +}
> >>> +
> >>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>> +                       struct drm_sched_entity *entity)
> >>> +{
> >>> +     unsigned seq;
> >>> +
> >>>        seq = atomic_inc_return(&entity->fence_seq);
> >>>        dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >>>                       &fence->lock, entity->fence_context, seq);
> >>>        dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> >>>                       &fence->lock, entity->fence_context + 1, seq);
> >>> -
> >>> -     return fence;
> >>>    }
> >>>
> >>>    module_init(drm_sched_fence_slab_init);
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index 33c414d55fab..5e84e1500c32 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -48,9 +48,11 @@
> >>>    #include <linux/wait.h>
> >>>    #include <linux/sched.h>
> >>>    #include <linux/completion.h>
> >>> +#include <linux/dma-resv.h>
> >>>    #include <uapi/linux/sched/types.h>
> >>>
> >>>    #include <drm/drm_print.h>
> >>> +#include <drm/drm_gem.h>
> >>>    #include <drm/gpu_scheduler.h>
> >>>    #include <drm/spsc_queue.h>
> >>>
> >>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>
> >>>    /**
> >>>     * drm_sched_job_init - init a scheduler job
> >>> - *
> >>>     * @job: scheduler job to init
> >>>     * @entity: scheduler entity to use
> >>>     * @owner: job owner for debugging
> >>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>     * Refer to drm_sched_entity_push_job() documentation
> >>>     * for locking considerations.
> >>>     *
> >>> + * Drivers must make sure drm_sched_job_cleanup() is called if this function
> >>> + * returns successfully, even when @job is aborted before drm_sched_job_arm()
> >>> + * is called.
> >>> + *
> >>>     * Returns 0 for success, negative error code otherwise.
> >>>     */
> >>>    int drm_sched_job_init(struct drm_sched_job *job,
> >>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>        job->sched = sched;
> >>>        job->entity = entity;
> >>>        job->s_priority = entity->rq - sched->sched_rq;
> >>> -     job->s_fence = drm_sched_fence_create(entity, owner);
> >>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> >>>        if (!job->s_fence)
> >>>                return -ENOMEM;
> >>>        job->id = atomic64_inc_return(&sched->job_id_count);
> >>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>    EXPORT_SYMBOL(drm_sched_job_init);
> >>>
> >>>    /**
> >>> - * drm_sched_job_cleanup - clean up scheduler job resources
> >>> + * drm_sched_job_arm - arm a scheduler job for execution
> >>> + * @job: scheduler job to arm
> >>> + *
> >>> + * This arms a scheduler job for execution. Specifically it initializes the
> >>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> >>> + * or other places that need to track the completion of this job.
> >>> + *
> >>> + * Refer to drm_sched_entity_push_job() documentation for locking
> >>> + * considerations.
> >>>     *
> >>> + * This can only be called if drm_sched_job_init() succeeded.
> >>> + */
> >>> +void drm_sched_job_arm(struct drm_sched_job *job)
> >>> +{
> >>> +     drm_sched_fence_init(job->s_fence, job->entity);
> >>> +}
> >>> +EXPORT_SYMBOL(drm_sched_job_arm);
> >>> +
> >>> +/**
> >>> + * drm_sched_job_cleanup - clean up scheduler job resources
> >>>     * @job: scheduler job to clean up
> >>> + *
> >>> + * Cleans up the resources allocated with drm_sched_job_init().
> >>> + *
> >>> + * Drivers should call this from their error unwind code if @job is aborted
> >>> + * before drm_sched_job_arm() is called.
> >>> + *
> >>> + * After that point of no return @job is committed to be executed by the
> >>> + * scheduler, and this function should be called from the
> >>> + * &drm_sched_backend_ops.free_job callback.
> >>>     */
> >>>    void drm_sched_job_cleanup(struct drm_sched_job *job)
> >>>    {
> >>> -     dma_fence_put(&job->s_fence->finished);
> >>> +     if (kref_read(&job->s_fence->finished.refcount)) {
> >>> +             /* drm_sched_job_arm() has been called */
> >>> +             dma_fence_put(&job->s_fence->finished);
> >>> +     } else {
> >>> +             /* aborted job before committing to run it */
> >>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> >>> +     }
> >>> +
> >>>        job->s_fence = NULL;
> >>>    }
> >>>    EXPORT_SYMBOL(drm_sched_job_cleanup);
> >>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> >>> index 4eb354226972..5c3a99027ecd 100644
> >>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> >>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> >>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> >>>        if (ret)
> >>>                return ret;
> >>>
> >>> +     drm_sched_job_arm(&job->base);
> >>> +
> >>>        job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>
> >>>        /* put by scheduler job completion */
> >>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> >>> index 88ae7f331bb1..83afc3aa8e2f 100644
> >>> --- a/include/drm/gpu_scheduler.h
> >>> +++ b/include/drm/gpu_scheduler.h
> >>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >>>    int drm_sched_job_init(struct drm_sched_job *job,
> >>>                       struct drm_sched_entity *entity,
> >>>                       void *owner);
> >>> +void drm_sched_job_arm(struct drm_sched_job *job);
> >>>    void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >>>                                    struct drm_gpu_scheduler **sched_list,
> >>>                                       unsigned int num_sched_list);
> >>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >>>                                   enum drm_sched_priority priority);
> >>>    bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> >>>
> >>> -struct drm_sched_fence *drm_sched_fence_create(
> >>> +struct drm_sched_fence *drm_sched_fence_alloc(
> >>>        struct drm_sched_entity *s_entity, void *owner);
> >>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>> +                       struct drm_sched_entity *entity);
> >>> +void drm_sched_fence_free(struct rcu_head *rcu);
> >>> +
> >>>    void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> >>>    void drm_sched_fence_finished(struct drm_sched_fence *fence);
> >>>
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling
  2021-07-07 11:32         ` Daniel Vetter
@ 2021-07-07 12:34           ` Lucas Stach
  -1 siblings, 0 replies; 58+ messages in thread
From: Lucas Stach @ 2021-07-07 12:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Daniel Vetter, Russell King, Christian Gmeiner,
	Sumit Semwal, Christian König, The etnaviv authors,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Wednesday, 07.07.2021 at 13:32 +0200, Daniel Vetter wrote:
> On Wed, Jul 7, 2021 at 1:26 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Wed, Jul 7, 2021 at 11:08 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > > On Friday, 02.07.2021 at 23:38 +0200, Daniel Vetter wrote:
> > > > We need to pull the drm_sched_job_init much earlier, but that's very
> > > > minor surgery.
> > > > 
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Lucas Stach <l.stach@pengutronix.de>
> > > > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > > > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > Cc: "Christian König" <christian.koenig@amd.com>
> > > > Cc: etnaviv@lists.freedesktop.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > ---
> > > >  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
> > > >  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
> > > >  drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
> > > >  drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
> > > >  4 files changed, 20 insertions(+), 81 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > > index 98e60df882b6..63688e6e4580 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > > @@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
> > > >       u64 va;
> > > >       struct etnaviv_gem_object *obj;
> > > >       struct etnaviv_vram_mapping *mapping;
> > > > -     struct dma_fence *excl;
> > > > -     unsigned int nr_shared;
> > > > -     struct dma_fence **shared;
> > > >  };
> > > > 
> > > >  /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
> > > > @@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
> > > >       struct etnaviv_file_private *ctx;
> > > >       struct etnaviv_gpu *gpu;
> > > >       struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
> > > > -     struct dma_fence *out_fence, *in_fence;
> > > > +     struct dma_fence *out_fence;
> > > >       int out_fence_id;
> > > >       struct list_head node; /* GPU active submit list */
> > > >       struct etnaviv_cmdbuf cmdbuf;
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > > index 4dd7d9d541c0..92478a50a580 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > > @@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
> > > >               if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
> > > >                       continue;
> > > > 
> > > > -             if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> > > > -                     ret = dma_resv_get_fences(robj, &bo->excl,
> > > > -                                               &bo->nr_shared,
> > > > -                                               &bo->shared);
> > > > -                     if (ret)
> > > > -                             return ret;
> > > > -             } else {
> > > > -                     bo->excl = dma_resv_get_excl_unlocked(robj);
> > > > -             }
> > > > -
> > > > +             ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
> > > > +                                                bo->flags & ETNA_SUBMIT_BO_WRITE);
> > > > +             if (ret)
> > > > +                     return ret;
> > > >       }
> > > > 
> > > >       return ret;
> > > > @@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
> > > > 
> > > >       wake_up_all(&submit->gpu->fence_event);
> > > > 
> > > > -     if (submit->in_fence)
> > > > -             dma_fence_put(submit->in_fence);
> > > >       if (submit->out_fence) {
> > > >               /* first remove from IDR, so fence can not be found anymore */
> > > >               mutex_lock(&submit->gpu->fence_lock);
> > > > @@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       submit->exec_state = args->exec_state;
> > > >       submit->flags = args->flags;
> > > > 
> > > > +     ret = drm_sched_job_init(&submit->sched_job,
> > > > +                              &ctx->sched_entity[args->pipe],
> > > > +                              submit->ctx);
> > > > +     if (ret)
> > > > +             goto err_submit_objects;
> > > > +
> > > 
> > > With the init moved here you also need to move the
> > > drm_sched_job_cleanup call from etnaviv_sched_free_job into
> > > submit_cleanup to avoid the potential memory leak when we bail out
> > > before pushing the job to the scheduler.
> > 
> > Uh apologies for missing this again, the entire point of v2 was to fix
> > this across all drivers. But somehow the fixup for etnaviv got lost.
> > I'll do it now for v3.
> 
> To clarify, in case you meant I should put it into submit_cleanup():
> That doesn't work, because for some of the paths we shouldn't call it
> yet, so I think it's better to be explicit here (like I've done with
> other drivers) - drm_sched_job_cleanup handles being called
> before/after drm_sched_job_arm, but it doesn't cope well with being
> called before drm_sched_job_init :-)

Yes, that was just my first idea to make sure it's always called. If
this is problematic in some cases I don't care if your solution looks
different, all I care about is that drm_sched_job_cleanup is called
when needed. :)
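
I.e. every driver should end up with roughly this pattern (sketch only,
driver_specific_setup() and push_job() standing in for whatever the
driver actually does between init and push):

	ret = drm_sched_job_init(&job->base, entity, ctx);
	if (ret)
		return ret;	/* init failed, nothing to clean up */

	ret = driver_specific_setup(job);	/* deps, pinning, ... */
	if (ret)
		goto err_cleanup;

	drm_sched_job_arm(&job->base);
	/* point of no return: cleanup now happens via .free_job */
	return push_job(job);

err_cleanup:
	drm_sched_job_cleanup(&job->base);	/* safe before arm too */
	return ret;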

Regards,
Lucas

> -Daniel
> 
> > 
> > Thanks, Daniel
> > 
> > > 
> > > Regards,
> > > Lucas
> > > 
> > > >       ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
> > > >       if (ret)
> > > >               goto err_submit_objects;
> > > > @@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       }
> > > > 
> > > >       if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
> > > > -             submit->in_fence = sync_file_get_fence(args->fence_fd);
> > > > -             if (!submit->in_fence) {
> > > > +             struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
> > > > +             if (!in_fence) {
> > > >                       ret = -EINVAL;
> > > >                       goto err_submit_objects;
> > > >               }
> > > > +
> > > > +             ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
> > > > +             if (ret)
> > > > +                     goto err_submit_objects;
> > > >       }
> > > > 
> > > >       ret = submit_pin_objects(submit);
> > > > @@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       if (ret)
> > > >               goto err_submit_objects;
> > > > 
> > > > -     ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
> > > > +     ret = etnaviv_sched_push_job(submit);
> > > >       if (ret)
> > > >               goto err_submit_objects;
> > > > 
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > index 180bb633d5c5..c98d67320be3 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > @@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
> > > >  static int etnaviv_hw_jobs_limit = 4;
> > > >  module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
> > > > 
> > > > -static struct dma_fence *
> > > > -etnaviv_sched_dependency(struct drm_sched_job *sched_job,
> > > > -                      struct drm_sched_entity *entity)
> > > > -{
> > > > -     struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > > > -     struct dma_fence *fence;
> > > > -     int i;
> > > > -
> > > > -     if (unlikely(submit->in_fence)) {
> > > > -             fence = submit->in_fence;
> > > > -             submit->in_fence = NULL;
> > > > -
> > > > -             if (!dma_fence_is_signaled(fence))
> > > > -                     return fence;
> > > > -
> > > > -             dma_fence_put(fence);
> > > > -     }
> > > > -
> > > > -     for (i = 0; i < submit->nr_bos; i++) {
> > > > -             struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
> > > > -             int j;
> > > > -
> > > > -             if (bo->excl) {
> > > > -                     fence = bo->excl;
> > > > -                     bo->excl = NULL;
> > > > -
> > > > -                     if (!dma_fence_is_signaled(fence))
> > > > -                             return fence;
> > > > -
> > > > -                     dma_fence_put(fence);
> > > > -             }
> > > > -
> > > > -             for (j = 0; j < bo->nr_shared; j++) {
> > > > -                     if (!bo->shared[j])
> > > > -                             continue;
> > > > -
> > > > -                     fence = bo->shared[j];
> > > > -                     bo->shared[j] = NULL;
> > > > -
> > > > -                     if (!dma_fence_is_signaled(fence))
> > > > -                             return fence;
> > > > -
> > > > -                     dma_fence_put(fence);
> > > > -             }
> > > > -             kfree(bo->shared);
> > > > -             bo->nr_shared = 0;
> > > > -             bo->shared = NULL;
> > > > -     }
> > > > -
> > > > -     return NULL;
> > > > -}
> > > > -
> > > >  static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
> > > >  {
> > > >       struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > > > @@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
> > > >  }
> > > > 
> > > >  static const struct drm_sched_backend_ops etnaviv_sched_ops = {
> > > > -     .dependency = etnaviv_sched_dependency,
> > > >       .run_job = etnaviv_sched_run_job,
> > > >       .timedout_job = etnaviv_sched_timedout_job,
> > > >       .free_job = etnaviv_sched_free_job,
> > > >  };
> > > > 
> > > > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > > -                        struct etnaviv_gem_submit *submit)
> > > > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
> > > >  {
> > > >       int ret = 0;
> > > > 
> > > > @@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > >        */
> > > >       mutex_lock(&submit->gpu->fence_lock);
> > > > 
> > > > -     ret = drm_sched_job_init(&submit->sched_job, sched_entity,
> > > > -                              submit->ctx);
> > > > -     if (ret)
> > > > -             goto out_unlock;
> > > > -
> > > >       drm_sched_job_arm(&submit->sched_job);
> > > > 
> > > >       submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > > index c0a6796e22c9..baebfa069afc 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > > @@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
> > > > 
> > > >  int etnaviv_sched_init(struct etnaviv_gpu *gpu);
> > > >  void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
> > > > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > > -                        struct etnaviv_gem_submit *submit);
> > > > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
> > > > 
> > > >  #endif /* __ETNAVIV_SCHED_H__ */
> > > 
> > > 
> > 
> > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> 
> 



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling
@ 2021-07-07 12:34           ` Lucas Stach
  0 siblings, 0 replies; 58+ messages in thread
From: Lucas Stach @ 2021-07-07 12:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: The etnaviv authors, DRI Development,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Russell King,
	Daniel Vetter, Christian König,
	open list:DMA BUFFER SHARING FRAMEWORK

On Wednesday, 07.07.2021 at 13:32 +0200, Daniel Vetter wrote:
> On Wed, Jul 7, 2021 at 1:26 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Wed, Jul 7, 2021 at 11:08 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > > On Friday, 02.07.2021 at 23:38 +0200, Daniel Vetter wrote:
> > > > We need to pull the drm_sched_job_init much earlier, but that's very
> > > > minor surgery.
> > > > 
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Lucas Stach <l.stach@pengutronix.de>
> > > > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > > > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > Cc: "Christian König" <christian.koenig@amd.com>
> > > > Cc: etnaviv@lists.freedesktop.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > ---
> > > >  drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
> > > >  drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
> > > >  drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
> > > >  drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
> > > >  4 files changed, 20 insertions(+), 81 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > > index 98e60df882b6..63688e6e4580 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
> > > > @@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
> > > >       u64 va;
> > > >       struct etnaviv_gem_object *obj;
> > > >       struct etnaviv_vram_mapping *mapping;
> > > > -     struct dma_fence *excl;
> > > > -     unsigned int nr_shared;
> > > > -     struct dma_fence **shared;
> > > >  };
> > > > 
> > > >  /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
> > > > @@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
> > > >       struct etnaviv_file_private *ctx;
> > > >       struct etnaviv_gpu *gpu;
> > > >       struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
> > > > -     struct dma_fence *out_fence, *in_fence;
> > > > +     struct dma_fence *out_fence;
> > > >       int out_fence_id;
> > > >       struct list_head node; /* GPU active submit list */
> > > >       struct etnaviv_cmdbuf cmdbuf;
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > > index 4dd7d9d541c0..92478a50a580 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
> > > > @@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
> > > >               if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
> > > >                       continue;
> > > > 
> > > > -             if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
> > > > -                     ret = dma_resv_get_fences(robj, &bo->excl,
> > > > -                                               &bo->nr_shared,
> > > > -                                               &bo->shared);
> > > > -                     if (ret)
> > > > -                             return ret;
> > > > -             } else {
> > > > -                     bo->excl = dma_resv_get_excl_unlocked(robj);
> > > > -             }
> > > > -
> > > > +             ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
> > > > +                                                bo->flags & ETNA_SUBMIT_BO_WRITE);
> > > > +             if (ret)
> > > > +                     return ret;
> > > >       }
> > > > 
> > > >       return ret;
> > > > @@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
> > > > 
> > > >       wake_up_all(&submit->gpu->fence_event);
> > > > 
> > > > -     if (submit->in_fence)
> > > > -             dma_fence_put(submit->in_fence);
> > > >       if (submit->out_fence) {
> > > >               /* first remove from IDR, so fence can not be found anymore */
> > > >               mutex_lock(&submit->gpu->fence_lock);
> > > > @@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       submit->exec_state = args->exec_state;
> > > >       submit->flags = args->flags;
> > > > 
> > > > +     ret = drm_sched_job_init(&submit->sched_job,
> > > > +                              &ctx->sched_entity[args->pipe],
> > > > +                              submit->ctx);
> > > > +     if (ret)
> > > > +             goto err_submit_objects;
> > > > +
> > > 
> > > With the init moved here you also need to move the
> > > drm_sched_job_cleanup call from etnaviv_sched_free_job into
> > > submit_cleanup to avoid the potential memory leak when we bail out
> > > before pushing the job to the scheduler.
> > 
> > Uh apologies for missing this again, the entire point of v2 was to fix
> > this across all drivers. But somehow the fixup for etnaviv got lost.
> > I'll do it now for v3.
> 
> To clarify, in case you meant I should put it into submit_cleanup():
> That doesn't work, because for some of the paths we shouldn't call it
> yet, so I think it's better to be explicit here (like I've done with
> other drivers) - drm_sched_job_cleanup handles being called
> before/after drm_sched_job_arm, but it doesn't cope well with being
> called before drm_sched_job_init :-)

Yes, that was just my first idea to make sure it's always called. If
this is problematic in some cases I don't care if your solution looks
different, all I care about is that drm_sched_job_cleanup is called
when needed. :)

Regards,
Lucas

> -Daniel
> 
> > 
> > Thanks, Daniel
> > 
> > > 
> > > Regards,
> > > Lucas
> > > 
> > > >       ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
> > > >       if (ret)
> > > >               goto err_submit_objects;
> > > > @@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       }
> > > > 
> > > >       if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
> > > > -             submit->in_fence = sync_file_get_fence(args->fence_fd);
> > > > -             if (!submit->in_fence) {
> > > > +             struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
> > > > +             if (!in_fence) {
> > > >                       ret = -EINVAL;
> > > >                       goto err_submit_objects;
> > > >               }
> > > > +
> > > > +             ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
> > > > +             if (ret)
> > > > +                     goto err_submit_objects;
> > > >       }
> > > > 
> > > >       ret = submit_pin_objects(submit);
> > > > @@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       if (ret)
> > > >               goto err_submit_objects;
> > > > 
> > > > -     ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
> > > > +     ret = etnaviv_sched_push_job(submit);
> > > >       if (ret)
> > > >               goto err_submit_objects;
> > > > 
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > index 180bb633d5c5..c98d67320be3 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > @@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
> > > >  static int etnaviv_hw_jobs_limit = 4;
> > > >  module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
> > > > 
> > > > -static struct dma_fence *
> > > > -etnaviv_sched_dependency(struct drm_sched_job *sched_job,
> > > > -                      struct drm_sched_entity *entity)
> > > > -{
> > > > -     struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > > > -     struct dma_fence *fence;
> > > > -     int i;
> > > > -
> > > > -     if (unlikely(submit->in_fence)) {
> > > > -             fence = submit->in_fence;
> > > > -             submit->in_fence = NULL;
> > > > -
> > > > -             if (!dma_fence_is_signaled(fence))
> > > > -                     return fence;
> > > > -
> > > > -             dma_fence_put(fence);
> > > > -     }
> > > > -
> > > > -     for (i = 0; i < submit->nr_bos; i++) {
> > > > -             struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
> > > > -             int j;
> > > > -
> > > > -             if (bo->excl) {
> > > > -                     fence = bo->excl;
> > > > -                     bo->excl = NULL;
> > > > -
> > > > -                     if (!dma_fence_is_signaled(fence))
> > > > -                             return fence;
> > > > -
> > > > -                     dma_fence_put(fence);
> > > > -             }
> > > > -
> > > > -             for (j = 0; j < bo->nr_shared; j++) {
> > > > -                     if (!bo->shared[j])
> > > > -                             continue;
> > > > -
> > > > -                     fence = bo->shared[j];
> > > > -                     bo->shared[j] = NULL;
> > > > -
> > > > -                     if (!dma_fence_is_signaled(fence))
> > > > -                             return fence;
> > > > -
> > > > -                     dma_fence_put(fence);
> > > > -             }
> > > > -             kfree(bo->shared);
> > > > -             bo->nr_shared = 0;
> > > > -             bo->shared = NULL;
> > > > -     }
> > > > -
> > > > -     return NULL;
> > > > -}
> > > > -
> > > >  static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
> > > >  {
> > > >       struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
> > > > @@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
> > > >  }
> > > > 
> > > >  static const struct drm_sched_backend_ops etnaviv_sched_ops = {
> > > > -     .dependency = etnaviv_sched_dependency,
> > > >       .run_job = etnaviv_sched_run_job,
> > > >       .timedout_job = etnaviv_sched_timedout_job,
> > > >       .free_job = etnaviv_sched_free_job,
> > > >  };
> > > > 
> > > > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > > -                        struct etnaviv_gem_submit *submit)
> > > > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
> > > >  {
> > > >       int ret = 0;
> > > > 
> > > > @@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > >        */
> > > >       mutex_lock(&submit->gpu->fence_lock);
> > > > 
> > > > -     ret = drm_sched_job_init(&submit->sched_job, sched_entity,
> > > > -                              submit->ctx);
> > > > -     if (ret)
> > > > -             goto out_unlock;
> > > > -
> > > >       drm_sched_job_arm(&submit->sched_job);
> > > > 
> > > >       submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > > index c0a6796e22c9..baebfa069afc 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
> > > > @@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
> > > > 
> > > >  int etnaviv_sched_init(struct etnaviv_gpu *gpu);
> > > >  void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
> > > > -int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > > -                        struct etnaviv_gem_submit *submit);
> > > > +int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
> > > > 
> > > >  #endif /* __ETNAVIV_SCHED_H__ */
> > > 
> > > 
> > 
> > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> 
> 



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-07 12:13           ` Daniel Vetter
@ 2021-07-07 12:58             ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-07 12:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On 07.07.21 at 14:13, Daniel Vetter wrote:
> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
>> On 07.07.21 at 13:14, Daniel Vetter wrote:
>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
>>>>> This is a very confusingly named function, because not just does it
>>>>> init an object, it arms it and provides a point of no return for
>>>>> pushing a job into the scheduler. It would be nice if that's a bit
>>>>> clearer in the interface.
>>>>>
>>>>> But the real reason is that I want to push the dependency tracking
>>>>> helpers into the scheduler code, and that means drm_sched_job_init
>>>>> must be called a lot earlier, without arming the job.
>>>>>
>>>>> v2:
>>>>> - don't change .gitignore (Steven)
>>>>> - don't forget v3d (Emma)
>>>>>
>>>>> v3: Emma noticed that I leak the memory allocated in
>>>>> drm_sched_job_init if we bail out before the point of no return in
>>>>> subsequent driver patches. To be able to fix this change
>>>>> drm_sched_job_cleanup() so it can handle being called both before and
>>>>> after drm_sched_job_arm().
>>>> Thinking more about this, I'm not sure if this really works.
>>>>
>>>> See, drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>>>> to update the entity->rq association.
>>>>
>>>> And that can only be done later on when we arm the fence as well.
>>> Hm yeah, but that's a bug in the existing code I think: We already
>>> fail to clean up if we fail to allocate the fences. So I think the
>>> right thing to do here is to split the checks into job_init, and do
>>> the actual arming/rq selection in job_arm? I'm not entirely sure
>>> what's all going on there, the first check looks a bit like trying to
>>> schedule before the entity is set up, which is a driver bug and should
>>> have a WARN_ON?
>> No you misunderstood me, the problem is something else.
>>
>> You asked previously why the call to drm_sched_job_init() was so late in
>> the CS.
>>
>> The reason for this was not only the scheduler fence init, but also the
>> call to drm_sched_entity_select_rq().
> Ah ok, I think I can fix that. Needs a prep patch to first make
> drm_sched_entity_select_rq() infallible, then it should be easy to do.
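>
> Roughly the split I have in mind (untested sketch, not the actual
> patch):
>
> 	int drm_sched_job_init(struct drm_sched_job *job,
> 			       struct drm_sched_entity *entity,
> 			       void *owner)
> 	{
> 		/* only allocations and checks that are allowed to fail */
> 		job->entity = entity;
> 		job->s_fence = drm_sched_fence_alloc(entity, owner);
> 		if (!job->s_fence)
> 			return -ENOMEM;
> 		return 0;
> 	}
>
> 	void drm_sched_job_arm(struct drm_sched_job *job)
> 	{
> 		/* infallible: rq selection and fence init move in here */
> 		drm_sched_entity_select_rq(job->entity);
> 		job->sched = job->entity->rq->sched;
> 		drm_sched_fence_init(job->s_fence, job->entity);
> 	}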
>
>>> The 2nd check around last_scheduled I honestly have no idea what it's
>>> even trying to do.
>> You mean that here?
>>
>>           fence = READ_ONCE(entity->last_scheduled);
>>           if (fence && !dma_fence_is_signaled(fence))
>>                   return;
>>
>> This makes sure that load balancing is not moving the entity to a
>> different scheduler while there are still jobs running from this entity
>> on the hardware.
> Yeah after a nap that idea crossed my mind too. But now I have locking
> questions, afaiui the scheduler thread updates this, without taking
> any locks - entity dequeuing is lockless. And here we read the fence
> and then seem to yolo check whether it's signalled? What's preventing
> a use-after-free here? There's no rcu or anything going on here at
> all, and it's outside of the spinlock section, which starts a bit
> further down.

The last_scheduled fence of an entity can only change when there are 
jobs queued on the entity, and we have just ruled that out in the
check before.
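
To be explicit, the ordering this relies on is roughly (from memory,
the real checks live in drm_sched_entity_select_rq()):

	/* 1. bail out while the entity still has queued jobs */
	if (spsc_queue_count(&entity->job_queue))
		return;

	/* 2. only then inspect the last scheduled fence */
	fence = READ_ONCE(entity->last_scheduled);
	if (fence && !dma_fence_is_signaled(fence))
		return;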

Christian.


> -Daniel
>
>> Regards
>> Christian.
>>
>>> -Daniel
>>>
>>>> Christian.
>>>>
>>>>> Also improve the kerneldoc for this.
>>>>>
>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>>>> Cc: Qiang Yu <yuq825@gmail.com>
>>>>> Cc: Rob Herring <robh@kernel.org>
>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>>>> Cc: Steven Price <steven.price@arm.com>
>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>>>> Cc: David Airlie <airlied@linux.ie>
>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>>>> Cc: "Christian König" <christian.koenig@amd.com>
>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>>>> Cc: Nick Terrell <terrelln@fb.com>
>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>> Cc: Dave Airlie <airlied@redhat.com>
>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>>>> Cc: Lee Jones <lee.jones@linaro.org>
>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>>>> Cc: Chen Li <chenli@uniontech.com>
>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>>>> Cc: etnaviv@lists.freedesktop.org
>>>>> Cc: lima@lists.freedesktop.org
>>>>> Cc: linux-media@vger.kernel.org
>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>> Cc: Emma Anholt <emma@anholt.net>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>>>     drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>>>     drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>>>     drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>>>     drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>>>     drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>>>     include/drm/gpu_scheduler.h              |  7 +++-
>>>>>     10 files changed, 74 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index c5386d13eb4a..a4ec092af9a7 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>         if (r)
>>>>>                 goto error_unlock;
>>>>>
>>>>> +     drm_sched_job_arm(&job->base);
>>>>> +
>>>>>         /* No memory allocation is allowed while holding the notifier lock.
>>>>>          * The lock is held until amdgpu_cs_submit is finished and fence is
>>>>>          * added to BOs.
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> index d33e6d97cc89..5ddb955d2315 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>>>         if (r)
>>>>>                 return r;
>>>>>
>>>>> +     drm_sched_job_arm(&job->base);
>>>>> +
>>>>>         *f = dma_fence_get(&job->base.s_fence->finished);
>>>>>         amdgpu_job_free_resources(job);
>>>>>         drm_sched_entity_push_job(&job->base, entity);
>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> index feb6da1b6ceb..05f412204118 100644
>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>>>         if (ret)
>>>>>                 goto out_unlock;
>>>>>
>>>>> +     drm_sched_job_arm(&submit->sched_job);
>>>>> +
>>>>>         submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>>>         submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>>>                                                 submit->out_fence, 0,
>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>>>> index dba8329937a3..38f755580507 100644
>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>>>                 return err;
>>>>>         }
>>>>>
>>>>> +     drm_sched_job_arm(&task->base);
>>>>> +
>>>>>         task->num_bos = num_bos;
>>>>>         task->vm = lima_vm_get(vm);
>>>>>
>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> index 71a72fb50e6b..2992dc85325f 100644
>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>>>                 goto unlock;
>>>>>         }
>>>>>
>>>>> +     drm_sched_job_arm(&job->base);
>>>>> +
>>>>>         job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>
>>>>>         ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> index 79554aa4dbb1..f7347c284886 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>>      * @sched_job: job to submit
>>>>>      * @entity: scheduler entity
>>>>>      *
>>>>> - * Note: To guarantee that the order of insertion to queue matches
>>>>> - * the job's fence sequence number this function should be
>>>>> - * called with drm_sched_job_init under common lock.
>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>>>> + * under common lock.
>>>>>      *
>>>>>      * Returns 0 for success, negative error code otherwise.
>>>>>      */
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> index 69de2c76731f..c451ee9a30d7 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>>>      *
>>>>>      * Free up the fence memory after the RCU grace period.
>>>>>      */
>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>     {
>>>>>         struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>>>         struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>>>     }
>>>>>     EXPORT_SYMBOL(to_drm_sched_fence);
>>>>>
>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>> -                                            void *owner)
>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>> +                                           void *owner)
>>>>>     {
>>>>>         struct drm_sched_fence *fence = NULL;
>>>>> -     unsigned seq;
>>>>>
>>>>>         fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>>>         if (fence == NULL)
>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>         fence->sched = entity->rq->sched;
>>>>>         spin_lock_init(&fence->lock);
>>>>>
>>>>> +     return fence;
>>>>> +}
>>>>> +
>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>> +                       struct drm_sched_entity *entity)
>>>>> +{
>>>>> +     unsigned seq;
>>>>> +
>>>>>         seq = atomic_inc_return(&entity->fence_seq);
>>>>>         dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>                        &fence->lock, entity->fence_context, seq);
>>>>>         dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>                        &fence->lock, entity->fence_context + 1, seq);
>>>>> -
>>>>> -     return fence;
>>>>>     }
>>>>>
>>>>>     module_init(drm_sched_fence_slab_init);
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 33c414d55fab..5e84e1500c32 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -48,9 +48,11 @@
>>>>>     #include <linux/wait.h>
>>>>>     #include <linux/sched.h>
>>>>>     #include <linux/completion.h>
>>>>> +#include <linux/dma-resv.h>
>>>>>     #include <uapi/linux/sched/types.h>
>>>>>
>>>>>     #include <drm/drm_print.h>
>>>>> +#include <drm/drm_gem.h>
>>>>>     #include <drm/gpu_scheduler.h>
>>>>>     #include <drm/spsc_queue.h>
>>>>>
>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>
>>>>>     /**
>>>>>      * drm_sched_job_init - init a scheduler job
>>>>> - *
>>>>>      * @job: scheduler job to init
>>>>>      * @entity: scheduler entity to use
>>>>>      * @owner: job owner for debugging
>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>      * Refer to drm_sched_entity_push_job() documentation
>>>>>      * for locking considerations.
>>>>>      *
>>>>> + * Drivers must make sure drm_sched_job_cleanup() is called if this function
>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>>>> + *
>>>>>      * Returns 0 for success, negative error code otherwise.
>>>>>      */
>>>>>     int drm_sched_job_init(struct drm_sched_job *job,
>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>         job->sched = sched;
>>>>>         job->entity = entity;
>>>>>         job->s_priority = entity->rq - sched->sched_rq;
>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>         if (!job->s_fence)
>>>>>                 return -ENOMEM;
>>>>>         job->id = atomic64_inc_return(&sched->job_id_count);
>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>     EXPORT_SYMBOL(drm_sched_job_init);
>>>>>
>>>>>     /**
>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>> + * @job: scheduler job to arm
>>>>> + *
>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>> + * or other places that need to track the completion of this job.
>>>>> + *
>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>> + * considerations.
>>>>>      *
>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>> + */
>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>> +{
>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>> +
>>>>> +/**
>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>      * @job: scheduler job to clean up
>>>>> + *
>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>> + *
>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>> + * before drm_sched_job_arm() is called.
>>>>> + *
>>>>> + * After that point of no return @job is committed to be executed by the
>>>>> + * scheduler, and this function should be called from the
>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>      */
>>>>>     void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>     {
>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>>>> +             /* drm_sched_job_arm() has been called */
>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>> +     } else {
>>>>> +             /* aborted job before committing to run it */
>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>> +     }
>>>>> +
>>>>>         job->s_fence = NULL;
>>>>>     }
>>>>>     EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>         if (ret)
>>>>>                 return ret;
>>>>>
>>>>> +     drm_sched_job_arm(&job->base);
>>>>> +
>>>>>         job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>
>>>>>         /* put by scheduler job completion */
>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>     int drm_sched_job_init(struct drm_sched_job *job,
>>>>>                        struct drm_sched_entity *entity,
>>>>>                        void *owner);
>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>     void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>                                     struct drm_gpu_scheduler **sched_list,
>>>>>                                        unsigned int num_sched_list);
>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>                                    enum drm_sched_priority priority);
>>>>>     bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>
>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>         struct drm_sched_entity *s_entity, void *owner);
>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>> +                       struct drm_sched_entity *entity);
>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>> +
>>>>>     void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>     void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-07 12:58             ` Christian König
@ 2021-07-07 16:32               ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-07 16:32 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> On 07.07.21 at 14:13, Daniel Vetter wrote:
> > On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> >> On 07.07.21 at 13:14, Daniel Vetter wrote:
> >>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> >>> <christian.koenig@amd.com> wrote:
> >>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
> >>>>> This is a very confusingly named function, because not just does it
> >>>>> init an object, it arms it and provides a point of no return for
> >>>>> pushing a job into the scheduler. It would be nice if that's a bit
> >>>>> clearer in the interface.
> >>>>>
> >>>>> But the real reason is that I want to push the dependency tracking
> >>>>> helpers into the scheduler code, and that means drm_sched_job_init
> >>>>> must be called a lot earlier, without arming the job.
> >>>>>
> >>>>> v2:
> >>>>> - don't change .gitignore (Steven)
> >>>>> - don't forget v3d (Emma)
> >>>>>
> >>>>> v3: Emma noticed that I leak the memory allocated in
> >>>>> drm_sched_job_init if we bail out before the point of no return in
> >>>>> subsequent driver patches. To be able to fix this change
> >>>>> drm_sched_job_cleanup() so it can handle being called both before and
> >>>>> after drm_sched_job_arm().
> >>>> Thinking more about this, I'm not sure if this really works.
> >>>>
> >>>> See, drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> >>>> to update the entity->rq association.
> >>>>
> >>>> And that can only be done later on when we arm the fence as well.
> >>> Hm yeah, but that's a bug in the existing code I think: We already
> >>> fail to clean up if we fail to allocate the fences. So I think the
> >>> right thing to do here is to split the checks into job_init, and do
> >>> the actual arming/rq selection in job_arm? I'm not entirely sure
> >>> what's all going on there, the first check looks a bit like trying to
> >>> schedule before the entity is set up, which is a driver bug and should
> >>> have a WARN_ON?
> >> No you misunderstood me, the problem is something else.
> >>
> >> You asked previously why the call to drm_sched_job_init() was so late in
> >> the CS.
> >>
> >> The reason for this was not only the scheduler fence init, but also the
> >> call to drm_sched_entity_select_rq().
> > Ah ok, I think I can fix that. Needs a prep patch to first make
> > drm_sched_entity_select_rq() infallible, then it should be easy to do.
> >
> >>> The 2nd check around last_scheduled I honestly have no idea what it's
> >>> even trying to do.
> >> You mean that here?
> >>
> >>           fence = READ_ONCE(entity->last_scheduled);
> >>           if (fence && !dma_fence_is_signaled(fence))
> >>                   return;
> >>
> >> This makes sure that load balancing is not moving the entity to a
> >> different scheduler while there are still jobs running from this entity
> >> on the hardware.
> > Yeah after a nap that idea crossed my mind too. But now I have locking
> > questions, afaiui the scheduler thread updates this, without taking
> > any locks - entity dequeuing is lockless. And here we read the fence
> > and then seem to yolo check whether it's signalled? What's preventing
> > a use-after-free here? There's no rcu or anything going on here at
> > all, and it's outside of the spinlock section, which starts a bit
> > further down.
>
> The last_scheduled fence of an entity can only change when there are
> jobs queued on the entity, and we have just ruled that out in the
> check before.

There aren't any barriers, so the cpu could easily run the two checks
the other way round. I'll ponder this and figure out where exactly we
need docs for the constraint and/or barriers to make this work as
intended. As-is I'm not seeing how it does ...
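
Concretely, the reordering I'm worried about (sketch, assuming the
checks in drm_sched_entity_select_rq() boil down to these two):

	if (spsc_queue_count(&entity->job_queue))	/* load A */
		return;
	fence = READ_ONCE(entity->last_scheduled);	/* load B */

Nothing orders load B against load A, so on a weakly ordered machine
load B can effectively happen before load A, and then "last_scheduled
only changes while jobs are queued" no longer protects the fence read.
I'd expect this needs an smp_rmb() between the two loads, paired with
a barrier on the side that updates last_scheduled.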
-Daniel

> Christian.
>
>
> > -Daniel
> >
> >> Regards
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>> Christian.
> >>>>
> >>>>> Also improve the kerneldoc for this.
> >>>>>
> >>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
> >>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
> >>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> >>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> >>>>> Cc: Qiang Yu <yuq825@gmail.com>
> >>>>> Cc: Rob Herring <robh@kernel.org>
> >>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> >>>>> Cc: Steven Price <steven.price@arm.com>
> >>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> >>>>> Cc: David Airlie <airlied@linux.ie>
> >>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
> >>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> >>>>> Cc: "Christian König" <christian.koenig@amd.com>
> >>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
> >>>>> Cc: Kees Cook <keescook@chromium.org>
> >>>>> Cc: Adam Borowski <kilobyte@angband.pl>
> >>>>> Cc: Nick Terrell <terrelln@fb.com>
> >>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> >>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
> >>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>> Cc: Dave Airlie <airlied@redhat.com>
> >>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
> >>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> >>>>> Cc: Lee Jones <lee.jones@linaro.org>
> >>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
> >>>>> Cc: Chen Li <chenli@uniontech.com>
> >>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> >>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
> >>>>> Cc: Dennis Li <Dennis.Li@amd.com>
> >>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
> >>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> >>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
> >>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> >>>>> Cc: etnaviv@lists.freedesktop.org
> >>>>> Cc: lima@lists.freedesktop.org
> >>>>> Cc: linux-media@vger.kernel.org
> >>>>> Cc: linaro-mm-sig@lists.linaro.org
> >>>>> Cc: Emma Anholt <emma@anholt.net>
> >>>>> ---
> >>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> >>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> >>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> >>>>>     drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> >>>>>     drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> >>>>>     drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> >>>>>     drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> >>>>>     drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> >>>>>     drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> >>>>>     include/drm/gpu_scheduler.h              |  7 +++-
> >>>>>     10 files changed, 74 insertions(+), 14 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>> index c5386d13eb4a..a4ec092af9a7 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> >>>>>         if (r)
> >>>>>                 goto error_unlock;
> >>>>>
> >>>>> +     drm_sched_job_arm(&job->base);
> >>>>> +
> >>>>>         /* No memory allocation is allowed while holding the notifier lock.
> >>>>>          * The lock is held until amdgpu_cs_submit is finished and fence is
> >>>>>          * added to BOs.
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>> index d33e6d97cc89..5ddb955d2315 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> >>>>>         if (r)
> >>>>>                 return r;
> >>>>>
> >>>>> +     drm_sched_job_arm(&job->base);
> >>>>> +
> >>>>>         *f = dma_fence_get(&job->base.s_fence->finished);
> >>>>>         amdgpu_job_free_resources(job);
> >>>>>         drm_sched_entity_push_job(&job->base, entity);
> >>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>> index feb6da1b6ceb..05f412204118 100644
> >>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >>>>>         if (ret)
> >>>>>                 goto out_unlock;
> >>>>>
> >>>>> +     drm_sched_job_arm(&submit->sched_job);
> >>>>> +
> >>>>>         submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> >>>>>         submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> >>>>>                                                 submit->out_fence, 0,
> >>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> >>>>> index dba8329937a3..38f755580507 100644
> >>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
> >>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
> >>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> >>>>>                 return err;
> >>>>>         }
> >>>>>
> >>>>> +     drm_sched_job_arm(&task->base);
> >>>>> +
> >>>>>         task->num_bos = num_bos;
> >>>>>         task->vm = lima_vm_get(vm);
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>> index 71a72fb50e6b..2992dc85325f 100644
> >>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> >>>>>                 goto unlock;
> >>>>>         }
> >>>>>
> >>>>> +     drm_sched_job_arm(&job->base);
> >>>>> +
> >>>>>         job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>>>
> >>>>>         ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>> index 79554aa4dbb1..f7347c284886 100644
> >>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >>>>>      * @sched_job: job to submit
> >>>>>      * @entity: scheduler entity
> >>>>>      *
> >>>>> - * Note: To guarantee that the order of insertion to queue matches
> >>>>> - * the job's fence sequence number this function should be
> >>>>> - * called with drm_sched_job_init under common lock.
> >>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
> >>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
> >>>>> + * under common lock.
> >>>>>      *
> >>>>>      * Returns 0 for success, negative error code otherwise.
> >>>>>      */
> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>> index 69de2c76731f..c451ee9a30d7 100644
> >>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> >>>>>      *
> >>>>>      * Free up the fence memory after the RCU grace period.
> >>>>>      */
> >>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
> >>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
> >>>>>     {
> >>>>>         struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >>>>>         struct drm_sched_fence *fence = to_drm_sched_fence(f);
> >>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> >>>>>     }
> >>>>>     EXPORT_SYMBOL(to_drm_sched_fence);
> >>>>>
> >>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>>> -                                            void *owner)
> >>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> >>>>> +                                           void *owner)
> >>>>>     {
> >>>>>         struct drm_sched_fence *fence = NULL;
> >>>>> -     unsigned seq;
> >>>>>
> >>>>>         fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> >>>>>         if (fence == NULL)
> >>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>>>         fence->sched = entity->rq->sched;
> >>>>>         spin_lock_init(&fence->lock);
> >>>>>
> >>>>> +     return fence;
> >>>>> +}
> >>>>> +
> >>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>>>> +                       struct drm_sched_entity *entity)
> >>>>> +{
> >>>>> +     unsigned seq;
> >>>>> +
> >>>>>         seq = atomic_inc_return(&entity->fence_seq);
> >>>>>         dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >>>>>                        &fence->lock, entity->fence_context, seq);
> >>>>>         dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> >>>>>                        &fence->lock, entity->fence_context + 1, seq);
> >>>>> -
> >>>>> -     return fence;
> >>>>>     }
> >>>>>
> >>>>>     module_init(drm_sched_fence_slab_init);
> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> index 33c414d55fab..5e84e1500c32 100644
> >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> @@ -48,9 +48,11 @@
> >>>>>     #include <linux/wait.h>
> >>>>>     #include <linux/sched.h>
> >>>>>     #include <linux/completion.h>
> >>>>> +#include <linux/dma-resv.h>
> >>>>>     #include <uapi/linux/sched/types.h>
> >>>>>
> >>>>>     #include <drm/drm_print.h>
> >>>>> +#include <drm/drm_gem.h>
> >>>>>     #include <drm/gpu_scheduler.h>
> >>>>>     #include <drm/spsc_queue.h>
> >>>>>
> >>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>>>
> >>>>>     /**
> >>>>>      * drm_sched_job_init - init a scheduler job
> >>>>> - *
> >>>>>      * @job: scheduler job to init
> >>>>>      * @entity: scheduler entity to use
> >>>>>      * @owner: job owner for debugging
> >>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>>>      * Refer to drm_sched_entity_push_job() documentation
> >>>>>      * for locking considerations.
> >>>>>      *
> >>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
> >>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
> >>>>> + *
> >>>>>      * Returns 0 for success, negative error code otherwise.
> >>>>>      */
> >>>>>     int drm_sched_job_init(struct drm_sched_job *job,
> >>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>         job->sched = sched;
> >>>>>         job->entity = entity;
> >>>>>         job->s_priority = entity->rq - sched->sched_rq;
> >>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
> >>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> >>>>>         if (!job->s_fence)
> >>>>>                 return -ENOMEM;
> >>>>>         job->id = atomic64_inc_return(&sched->job_id_count);
> >>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>     EXPORT_SYMBOL(drm_sched_job_init);
> >>>>>
> >>>>>     /**
> >>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
> >>>>> + * drm_sched_job_arm - arm a scheduler job for execution
> >>>>> + * @job: scheduler job to arm
> >>>>> + *
> >>>>> + * This arms a scheduler job for execution. Specifically it initializes the
> >>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> >>>>> + * or other places that need to track the completion of this job.
> >>>>> + *
> >>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
> >>>>> + * considerations.
> >>>>>      *
> >>>>> + * This can only be called if drm_sched_job_init() succeeded.
> >>>>> + */
> >>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
> >>>>> +{
> >>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
> >>>>> +
> >>>>> +/**
> >>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
> >>>>>      * @job: scheduler job to clean up
> >>>>> + *
> >>>>> + * Cleans up the resources allocated with drm_sched_job_init().
> >>>>> + *
> >>>>> + * Drivers should call this from their error unwind code if @job is aborted
> >>>>> + * before drm_sched_job_arm() is called.
> >>>>> + *
> >>>>> + * After that point of no return @job is committed to be executed by the
> >>>>> + * scheduler, and this function should be called from the
> >>>>> + * &drm_sched_backend_ops.free_job callback.
> >>>>>      */
> >>>>>     void drm_sched_job_cleanup(struct drm_sched_job *job)
> >>>>>     {
> >>>>> -     dma_fence_put(&job->s_fence->finished);
> >>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
> >>>>> +             /* drm_sched_job_arm() has been called */
> >>>>> +             dma_fence_put(&job->s_fence->finished);
> >>>>> +     } else {
> >>>>> +             /* aborted job before committing to run it */
> >>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> >>>>> +     }
> >>>>> +
> >>>>>         job->s_fence = NULL;
> >>>>>     }
> >>>>>     EXPORT_SYMBOL(drm_sched_job_cleanup);
> >>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>> index 4eb354226972..5c3a99027ecd 100644
> >>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> >>>>>         if (ret)
> >>>>>                 return ret;
> >>>>>
> >>>>> +     drm_sched_job_arm(&job->base);
> >>>>> +
> >>>>>         job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>>>
> >>>>>         /* put by scheduler job completion */
> >>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> >>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
> >>>>> --- a/include/drm/gpu_scheduler.h
> >>>>> +++ b/include/drm/gpu_scheduler.h
> >>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >>>>>     int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>                        struct drm_sched_entity *entity,
> >>>>>                        void *owner);
> >>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
> >>>>>     void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >>>>>                                     struct drm_gpu_scheduler **sched_list,
> >>>>>                                        unsigned int num_sched_list);
> >>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >>>>>                                    enum drm_sched_priority priority);
> >>>>>     bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> >>>>>
> >>>>> -struct drm_sched_fence *drm_sched_fence_create(
> >>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
> >>>>>         struct drm_sched_entity *s_entity, void *owner);
> >>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>>>> +                       struct drm_sched_entity *entity);
> >>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
> >>>>> +
> >>>>>     void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> >>>>>     void drm_sched_fence_finished(struct drm_sched_fence *fence);
> >>>>>
> >
>
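
For reference, the intended driver-side flow with this split is roughly the
following (a sketch only; driver_add_deps() is a placeholder for whatever a
driver does between init and arm, e.g. dependency tracking):

	ret = drm_sched_job_init(&job->base, entity, owner);
	if (ret)
		return ret;

	ret = driver_add_deps(job);
	if (ret)
		goto err_cleanup;	/* before arm, unwinding is still allowed */

	drm_sched_job_arm(&job->base);	/* point of no return */

	fence = dma_fence_get(&job->base.s_fence->finished);
	drm_sched_entity_push_job(&job->base, entity);
	return 0;

err_cleanup:
	drm_sched_job_cleanup(&job->base);	/* frees the never-armed s_fence */
	return ret;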


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-07 16:32               ` Daniel Vetter
@ 2021-07-08  6:56                 ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-08  6:56 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On 07.07.21 at 18:32, Daniel Vetter wrote:
> On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
>> On 07.07.21 at 14:13, Daniel Vetter wrote:
>>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
>>>> On 07.07.21 at 13:14, Daniel Vetter wrote:
>>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
>>>>> <christian.koenig@amd.com> wrote:
>>>>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
>>>>>>> This is a very confusingly named function, because not just does it
>>>>>>> init an object, it arms it and provides a point of no return for
>>>>>>> pushing a job into the scheduler. It would be nice if that's a bit
>>>>>>> clearer in the interface.
>>>>>>>
>>>>>>> But the real reason is that I want to push the dependency tracking
>>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
>>>>>>> must be called a lot earlier, without arming the job.
>>>>>>>
>>>>>>> v2:
>>>>>>> - don't change .gitignore (Steven)
>>>>>>> - don't forget v3d (Emma)
>>>>>>>
>>>>>>> v3: Emma noticed that I leak the memory allocated in
>>>>>>> drm_sched_job_init if we bail out before the point of no return in
>>>>>>> subsequent driver patches. To be able to fix this, change
>>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
>>>>>>> after drm_sched_job_arm().
>>>>>> Thinking more about this, I'm not sure if this really works.
>>>>>>
>>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>>>>>> to update the entity->rq association.
>>>>>>
>>>>>> And that can only be done later on when we arm the fence as well.
>>>>> Hm yeah, but that's a bug in the existing code I think: We already
>>>>> fail to clean up if we fail to allocate the fences. So I think the
>>>>> right thing to do here is to split the checks into job_init, and do
>>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
>>>>> what's all going on there; the first check looks a bit like trying to
>>>>> schedule before the entity is set up, which is a driver bug and should
>>>>> have a WARN_ON?
>>>> No you misunderstood me, the problem is something else.
>>>>
>>>> You asked previously why the call to drm_sched_job_init() was so late in
>>>> the CS.
>>>>
>>>> The reason for this was not alone the scheduler fence init, but also the
>>>> call to drm_sched_entity_select_rq().
>>> Ah ok, I think I can fix that. Needs a prep patch to first make
>>> drm_sched_entity_select_rq() infallible, then it should be easy to do.
>>>
>>>>> The 2nd check around last_scheduled I honestly have no idea what it's
>>>>> even trying to do.
>>>> You mean that here?
>>>>
>>>>            fence = READ_ONCE(entity->last_scheduled);
>>>>            if (fence && !dma_fence_is_signaled(fence))
>>>>                    return;
>>>>
>>>> This makes sure that load balancing is not moving the entity to a
>>>> different scheduler while there are still jobs running from this entity
>>>> on the hardware.
>>> Yeah after a nap that idea crossed my mind too. But now I have locking
>>> questions, afaiui the scheduler thread updates this, without taking
>>> any locks - entity dequeuing is lockless. And here we read the fence
>>> and then seem to yolo check whether it's signalled? What's preventing
> a use-after-free here? There's no RCU or anything going on here at
>>> all, and it's outside of the spinlock section, which starts a bit
>>> further down.
>> The last_scheduled fence of an entity can only change when there are
>> jobs queued on the entity, and we have just ruled that out in the
>> check before.
> There aren't any barriers, so the CPU could easily run the two checks
> the other way round. I'll ponder this and figure out where exactly we
> need docs for the constraint and/or barriers to make this work as
> intended. As-is I'm not seeing how it does ...

spsc_queue_count() provides the necessary barrier with the atomic_read().

But yes, a comment would be really nice here. I had to think for a while
about why we don't need this as well.
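
Something like this as a sketch of the ordering (simplified, not the exact
code from drm_sched_entity_select_rq()):

	/* entity->last_scheduled can only change while jobs are queued
	 * on the entity, so the queue-empty check has to come first; the
	 * atomic_read() in spsc_queue_count() orders it against the
	 * fence read below.
	 */
	if (spsc_queue_count(&entity->job_queue))
		return;

	fence = READ_ONCE(entity->last_scheduled);
	if (fence && !dma_fence_is_signaled(fence))
		return;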

Christian.

> -Daniel
>
>> Christian.
>>
>>
>>> -Daniel
>>>
>>>> Regards
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Also improve the kerneldoc for this.
>>>>>>>
>>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
>>>>>>> Cc: Rob Herring <robh@kernel.org>
>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>>>>>> Cc: Steven Price <steven.price@arm.com>
>>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>>>>>> Cc: David Airlie <airlied@linux.ie>
>>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
>>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>>>>>> Cc: Nick Terrell <terrelln@fb.com>
>>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>> Cc: Dave Airlie <airlied@redhat.com>
>>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
>>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>>>>>> Cc: Chen Li <chenli@uniontech.com>
>>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>>>>>> Cc: etnaviv@lists.freedesktop.org
>>>>>>> Cc: lima@lists.freedesktop.org
>>>>>>> Cc: linux-media@vger.kernel.org
>>>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>>>> Cc: Emma Anholt <emma@anholt.net>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>>>>>      drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>>>>>      drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>>>>>      drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>>>>>      drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>>>>>      drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>>>>>      drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>>>>>      drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>>>>>      include/drm/gpu_scheduler.h              |  7 +++-
>>>>>>>      10 files changed, 74 insertions(+), 14 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>>>          if (r)
>>>>>>>                  goto error_unlock;
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>> +
>>>>>>>          /* No memory allocation is allowed while holding the notifier lock.
>>>>>>>           * The lock is held until amdgpu_cs_submit is finished and fence is
>>>>>>>           * added to BOs.
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>> index d33e6d97cc89..5ddb955d2315 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>>>>>          if (r)
>>>>>>>                  return r;
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>> +
>>>>>>>          *f = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>          amdgpu_job_free_resources(job);
>>>>>>>          drm_sched_entity_push_job(&job->base, entity);
>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> index feb6da1b6ceb..05f412204118 100644
>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>>>>>          if (ret)
>>>>>>>                  goto out_unlock;
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&submit->sched_job);
>>>>>>> +
>>>>>>>          submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>>>>>          submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>>>>>                                                  submit->out_fence, 0,
>>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> index dba8329937a3..38f755580507 100644
>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>>>>>                  return err;
>>>>>>>          }
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&task->base);
>>>>>>> +
>>>>>>>          task->num_bos = num_bos;
>>>>>>>          task->vm = lima_vm_get(vm);
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> index 71a72fb50e6b..2992dc85325f 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>>>>>                  goto unlock;
>>>>>>>          }
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>> +
>>>>>>>          job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>
>>>>>>>          ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>> index 79554aa4dbb1..f7347c284886 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>>>>       * @sched_job: job to submit
>>>>>>>       * @entity: scheduler entity
>>>>>>>       *
>>>>>>> - * Note: To guarantee that the order of insertion to queue matches
>>>>>>> - * the job's fence sequence number this function should be
>>>>>>> - * called with drm_sched_job_init under common lock.
>>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>>>>>> + * under common lock.
>>>>>>>       *
>>>>>>>       * Returns 0 for success, negative error code otherwise.
>>>>>>>       */
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>> index 69de2c76731f..c451ee9a30d7 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>>>>>       *
>>>>>>>       * Free up the fence memory after the RCU grace period.
>>>>>>>       */
>>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>      {
>>>>>>>          struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>>>>>          struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>>>>>      }
>>>>>>>      EXPORT_SYMBOL(to_drm_sched_fence);
>>>>>>>
>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>> -                                            void *owner)
>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>>>> +                                           void *owner)
>>>>>>>      {
>>>>>>>          struct drm_sched_fence *fence = NULL;
>>>>>>> -     unsigned seq;
>>>>>>>
>>>>>>>          fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>>>>>          if (fence == NULL)
>>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>          fence->sched = entity->rq->sched;
>>>>>>>          spin_lock_init(&fence->lock);
>>>>>>>
>>>>>>> +     return fence;
>>>>>>> +}
>>>>>>> +
>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>> +                       struct drm_sched_entity *entity)
>>>>>>> +{
>>>>>>> +     unsigned seq;
>>>>>>> +
>>>>>>>          seq = atomic_inc_return(&entity->fence_seq);
>>>>>>>          dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>>>                         &fence->lock, entity->fence_context, seq);
>>>>>>>          dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>>>                         &fence->lock, entity->fence_context + 1, seq);
>>>>>>> -
>>>>>>> -     return fence;
>>>>>>>      }
>>>>>>>
>>>>>>>      module_init(drm_sched_fence_slab_init);
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index 33c414d55fab..5e84e1500c32 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -48,9 +48,11 @@
>>>>>>>      #include <linux/wait.h>
>>>>>>>      #include <linux/sched.h>
>>>>>>>      #include <linux/completion.h>
>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>      #include <uapi/linux/sched/types.h>
>>>>>>>
>>>>>>>      #include <drm/drm_print.h>
>>>>>>> +#include <drm/drm_gem.h>
>>>>>>>      #include <drm/gpu_scheduler.h>
>>>>>>>      #include <drm/spsc_queue.h>
>>>>>>>
>>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>
>>>>>>>      /**
>>>>>>>       * drm_sched_job_init - init a scheduler job
>>>>>>> - *
>>>>>>>       * @job: scheduler job to init
>>>>>>>       * @entity: scheduler entity to use
>>>>>>>       * @owner: job owner for debugging
>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>       * Refer to drm_sched_entity_push_job() documentation
>>>>>>>       * for locking considerations.
>>>>>>>       *
>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
>>>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>>>>>> + *
>>>>>>>       * Returns 0 for success, negative error code otherwise.
>>>>>>>       */
>>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>          job->sched = sched;
>>>>>>>          job->entity = entity;
>>>>>>>          job->s_priority = entity->rq - sched->sched_rq;
>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>>>          if (!job->s_fence)
>>>>>>>                  return -ENOMEM;
>>>>>>>          job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>      EXPORT_SYMBOL(drm_sched_job_init);
>>>>>>>
>>>>>>>      /**
>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>>>> + * @job: scheduler job to arm
>>>>>>> + *
>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>>>> + * or other places that need to track the completion of this job.
>>>>>>> + *
>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>>>> + * considerations.
>>>>>>>       *
>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>>>> + */
>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>>> +{
>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>       * @job: scheduler job to clean up
>>>>>>> + *
>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>>>> + *
>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>>>> + * before drm_sched_job_arm() is called.
>>>>>>> + *
>>>>>>> + * After that point of no return @job is committed to be executed by the
>>>>>>> + * scheduler, and this function should be called from the
>>>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>>>       */
>>>>>>>      void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>>>      {
>>>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>>>>>> +             /* drm_sched_job_arm() has been called */
>>>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>>>> +     } else {
>>>>>>> +             /* aborted job before committing to run it */
>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>>>> +     }
>>>>>>> +
>>>>>>>          job->s_fence = NULL;
>>>>>>>      }
>>>>>>>      EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>>>          if (ret)
>>>>>>>                  return ret;
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>> +
>>>>>>>          job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>
>>>>>>>          /* put by scheduler job completion */
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>                         struct drm_sched_entity *entity,
>>>>>>>                         void *owner);
>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>>>      void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>>>                                      struct drm_gpu_scheduler **sched_list,
>>>>>>>                                         unsigned int num_sched_list);
>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>>                                     enum drm_sched_priority priority);
>>>>>>>      bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>>>
>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>>>          struct drm_sched_entity *s_entity, void *owner);
>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>> +                       struct drm_sched_entity *entity);
>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>>>> +
>>>>>>>      void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>>>      void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>>>
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

>>>>>>> index dba8329937a3..38f755580507 100644
>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>>>>>                  return err;
>>>>>>>          }
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&task->base);
>>>>>>> +
>>>>>>>          task->num_bos = num_bos;
>>>>>>>          task->vm = lima_vm_get(vm);
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> index 71a72fb50e6b..2992dc85325f 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>>>>>                  goto unlock;
>>>>>>>          }
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>> +
>>>>>>>          job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>
>>>>>>>          ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>> index 79554aa4dbb1..f7347c284886 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>>>>       * @sched_job: job to submit
>>>>>>>       * @entity: scheduler entity
>>>>>>>       *
>>>>>>> - * Note: To guarantee that the order of insertion to queue matches
>>>>>>> - * the job's fence sequence number this function should be
>>>>>>> - * called with drm_sched_job_init under common lock.
>>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>>>>>> + * under common lock.
>>>>>>>       *
>>>>>>>       * Returns 0 for success, negative error code otherwise.
>>>>>>>       */
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>> index 69de2c76731f..c451ee9a30d7 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>>>>>       *
>>>>>>>       * Free up the fence memory after the RCU grace period.
>>>>>>>       */
>>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>      {
>>>>>>>          struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>>>>>          struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>>>>>      }
>>>>>>>      EXPORT_SYMBOL(to_drm_sched_fence);
>>>>>>>
>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>> -                                            void *owner)
>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>>>> +                                           void *owner)
>>>>>>>      {
>>>>>>>          struct drm_sched_fence *fence = NULL;
>>>>>>> -     unsigned seq;
>>>>>>>
>>>>>>>          fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>>>>>          if (fence == NULL)
>>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>          fence->sched = entity->rq->sched;
>>>>>>>          spin_lock_init(&fence->lock);
>>>>>>>
>>>>>>> +     return fence;
>>>>>>> +}
>>>>>>> +
>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>> +                       struct drm_sched_entity *entity)
>>>>>>> +{
>>>>>>> +     unsigned seq;
>>>>>>> +
>>>>>>>          seq = atomic_inc_return(&entity->fence_seq);
>>>>>>>          dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>>>                         &fence->lock, entity->fence_context, seq);
>>>>>>>          dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>>>                         &fence->lock, entity->fence_context + 1, seq);
>>>>>>> -
>>>>>>> -     return fence;
>>>>>>>      }
>>>>>>>
>>>>>>>      module_init(drm_sched_fence_slab_init);
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index 33c414d55fab..5e84e1500c32 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -48,9 +48,11 @@
>>>>>>>      #include <linux/wait.h>
>>>>>>>      #include <linux/sched.h>
>>>>>>>      #include <linux/completion.h>
>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>      #include <uapi/linux/sched/types.h>
>>>>>>>
>>>>>>>      #include <drm/drm_print.h>
>>>>>>> +#include <drm/drm_gem.h>
>>>>>>>      #include <drm/gpu_scheduler.h>
>>>>>>>      #include <drm/spsc_queue.h>
>>>>>>>
>>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>
>>>>>>>      /**
>>>>>>>       * drm_sched_job_init - init a scheduler job
>>>>>>> - *
>>>>>>>       * @job: scheduler job to init
>>>>>>>       * @entity: scheduler entity to use
>>>>>>>       * @owner: job owner for debugging
>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>       * Refer to drm_sched_entity_push_job() documentation
>>>>>>>       * for locking considerations.
>>>>>>>       *
>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
>>>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>>>>>> + *
>>>>>>>       * Returns 0 for success, negative error code otherwise.
>>>>>>>       */
>>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>          job->sched = sched;
>>>>>>>          job->entity = entity;
>>>>>>>          job->s_priority = entity->rq - sched->sched_rq;
>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>>>          if (!job->s_fence)
>>>>>>>                  return -ENOMEM;
>>>>>>>          job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>      EXPORT_SYMBOL(drm_sched_job_init);
>>>>>>>
>>>>>>>      /**
>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>>>> + * @job: scheduler job to arm
>>>>>>> + *
>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>>>> + * or other places that need to track the completion of this job.
>>>>>>> + *
>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>>>> + * considerations.
>>>>>>>       *
>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>>>> + */
>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>>> +{
>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>       * @job: scheduler job to clean up
>>>>>>> + *
>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>>>> + *
>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>>>> + * before drm_sched_job_arm() is called.
>>>>>>> + *
>>>>>>> + * After that point of no return @job is committed to be executed by the
>>>>>>> + * scheduler, and this function should be called from the
>>>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>>>       */
>>>>>>>      void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>>>      {
>>>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>>>>>> +             /* drm_sched_job_arm() has been called */
>>>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>>>> +     } else {
>>>>>>> +             /* aborted job before committing to run it */
>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>>>> +     }
>>>>>>> +
>>>>>>>          job->s_fence = NULL;
>>>>>>>      }
>>>>>>>      EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>>>          if (ret)
>>>>>>>                  return ret;
>>>>>>>
>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>> +
>>>>>>>          job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>
>>>>>>>          /* put by scheduler job completion */
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>                         struct drm_sched_entity *entity,
>>>>>>>                         void *owner);
>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>>>      void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>>>                                      struct drm_gpu_scheduler **sched_list,
>>>>>>>                                         unsigned int num_sched_list);
>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>>                                     enum drm_sched_priority priority);
>>>>>>>      bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>>>
>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>>>          struct drm_sched_entity *s_entity, void *owner);
>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>> +                       struct drm_sched_entity *entity);
>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>>>> +
>>>>>>>      void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>>>      void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>>>
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08  6:56                 ` Christian König
@ 2021-07-08  7:09                   ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-08  7:09 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 07.07.21 um 18:32 schrieb Daniel Vetter:
> > On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> >> Am 07.07.21 um 14:13 schrieb Daniel Vetter:
> >>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> >>>> Am 07.07.21 um 13:14 schrieb Daniel Vetter:
> >>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> >>>>> <christian.koenig@amd.com> wrote:
> >>>>>> Am 02.07.21 um 23:38 schrieb Daniel Vetter:
> >>>>>>> This is a very confusingly named function, because not only does it
> >>>>>>> init an object, it also arms it and provides a point of no return for
> >>>>>>> pushing a job into the scheduler. It would be nice if that's a bit
> >>>>>>> clearer in the interface.
> >>>>>>>
> >>>>>>> But the real reason is that I want to push the dependency tracking
> >>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
> >>>>>>> must be called a lot earlier, without arming the job.
> >>>>>>>
> >>>>>>> v2:
> >>>>>>> - don't change .gitignore (Steven)
> >>>>>>> - don't forget v3d (Emma)
> >>>>>>>
> >>>>>>> v3: Emma noticed that I leak the memory allocated in
> >>>>>>> drm_sched_job_init if we bail out before the point of no return in
> >>>>>>> subsequent driver patches. To be able to fix this change
> >>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
> >>>>>>> after drm_sched_job_arm().
> >>>>>> Thinking more about this, I'm not sure if this really works.
> >>>>>>
> >>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> >>>>>> to update the entity->rq association.
> >>>>>>
> >>>>>> And that can only be done later on when we arm the fence as well.
> >>>>> Hm yeah, but that's a bug in the existing code I think: We already
> >>>>> fail to clean up if we fail to allocate the fences. So I think the
> >>>>> right thing to do here is to split the checks into job_init, and do
> >>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
> >>>>> what's all going on there, the first check looks a bit like trying to
> >>>>> schedule before the entity is set up, which is a driver bug and should
> >>>>> have a WARN_ON?
> >>>> No you misunderstood me, the problem is something else.
> >>>>
> >>>> You asked previously why the call to drm_sched_job_init() was so late in
> >>>> the CS.
> >>>>
> >>>> The reason for this was not alone the scheduler fence init, but also the
> >>>> call to drm_sched_entity_select_rq().
> >>> Ah ok, I think I can fix that. Needs a prep patch to first make
> >>> drm_sched_entity_select_rq() infallible, then it should be easy to do.
> >>>
> >>>>> The 2nd check around last_scheduled I honestly have no idea what it's
> >>>>> even trying to do.
> >>>> You mean that here?
> >>>>
> >>>>            fence = READ_ONCE(entity->last_scheduled);
> >>>>            if (fence && !dma_fence_is_signaled(fence))
> >>>>                    return;
> >>>>
> >>>> This makes sure that load balancing is not moving the entity to a
> >>>> different scheduler while there are still jobs running from this entity
> >>>> on the hardware.
> >>> Yeah after a nap that idea crossed my mind too. But now I have locking
> >>> questions, afaiui the scheduler thread updates this, without taking
> >>> any locks - entity dequeuing is lockless. And here we read the fence
> >>> and then seem to yolo check whether it's signalled? What's preventing
> >>> a use-after-free here? There's no rcu or anything going on here at
> >>> all, and it's outside of the spinlock section, which starts a bit
> >>> further down.
> >> The last_scheduled fence of an entity can only change when there are
> >> jobs queued on the entity, and we have just ruled that out in the
> >> check before.
> > There aren't any barriers, so the cpu could easily run the two checks
> > the other way round. I'll ponder this and figure out where exactly we
> > need docs for the constraint and/or barriers to make this work as
> > intended. As-is I'm not seeing how it does ...
>
> spsc_queue_count() provides the necessary barrier with the atomic_read().

atomic_t ops are fully unordered, except for value-returning
read-modify-write ops, which imply a full barrier. So yeah you need
more here. But also, since you only need a read barrier on one side
and a write barrier on the other, you don't actually need CPU
barriers on x86. And READ_ONCE() gives you the compiler barrier on
one side at least; I haven't found it on the writer side yet.
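
To make the ordering question concrete, here is a simplified sketch of
the two sides (condensed from sched_entity.c and spsc_queue.h, not the
literal code):

    /* scheduler thread (writer), popping the last queued job: */
    entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
    atomic_dec(&queue->job_count);      /* void RMW op, no implied barrier */

    /* submit path (reader), in drm_sched_entity_select_rq(): */
    if (spsc_queue_count(&entity->job_queue))   /* atomic_read(), no barrier */
            return;
    fence = READ_ONCE(entity->last_scheduled);  /* nothing orders this load
                                                 * against the count check
                                                 * above */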

> But yes, a comment would be really nice here. I had to think for a while
> why we don't need this as well.

I'm typing a patch, which after a night's sleep I realized has the
wrong barriers. And now I'm also typing some doc improvements for
drm_sched_entity and related functions.

>
> Christian.
>
> > -Daniel
> >
> >> Christian.
> >>
> >>
> >>> -Daniel
> >>>
> >>>> Regards
> >>>> Christian.
> >>>>
> >>>>> -Daniel
> >>>>>
> >>>>>> Christian.
> >>>>>>
> >>>>>>> Also improve the kerneldoc for this.
> >>>>>>>
> >>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
> >>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
> >>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> >>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> >>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
> >>>>>>> Cc: Rob Herring <robh@kernel.org>
> >>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> >>>>>>> Cc: Steven Price <steven.price@arm.com>
> >>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> >>>>>>> Cc: David Airlie <airlied@linux.ie>
> >>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
> >>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> >>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
> >>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
> >>>>>>> Cc: Kees Cook <keescook@chromium.org>
> >>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
> >>>>>>> Cc: Nick Terrell <terrelln@fb.com>
> >>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> >>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
> >>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>> Cc: Dave Airlie <airlied@redhat.com>
> >>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
> >>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> >>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
> >>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
> >>>>>>> Cc: Chen Li <chenli@uniontech.com>
> >>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> >>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
> >>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
> >>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
> >>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> >>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
> >>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> >>>>>>> Cc: etnaviv@lists.freedesktop.org
> >>>>>>> Cc: lima@lists.freedesktop.org
> >>>>>>> Cc: linux-media@vger.kernel.org
> >>>>>>> Cc: linaro-mm-sig@lists.linaro.org
> >>>>>>> Cc: Emma Anholt <emma@anholt.net>
> >>>>>>> ---
> >>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> >>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> >>>>>>>      drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> >>>>>>>      drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> >>>>>>>      drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> >>>>>>>      drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> >>>>>>>      drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> >>>>>>>      drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> >>>>>>>      drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> >>>>>>>      include/drm/gpu_scheduler.h              |  7 +++-
> >>>>>>>      10 files changed, 74 insertions(+), 14 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
> >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> >>>>>>>          if (r)
> >>>>>>>                  goto error_unlock;
> >>>>>>>
> >>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>> +
> >>>>>>>          /* No memory allocation is allowed while holding the notifier lock.
> >>>>>>>           * The lock is held until amdgpu_cs_submit is finished and fence is
> >>>>>>>           * added to BOs.
> >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>>> index d33e6d97cc89..5ddb955d2315 100644
> >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> >>>>>>>          if (r)
> >>>>>>>                  return r;
> >>>>>>>
> >>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>> +
> >>>>>>>          *f = dma_fence_get(&job->base.s_fence->finished);
> >>>>>>>          amdgpu_job_free_resources(job);
> >>>>>>>          drm_sched_entity_push_job(&job->base, entity);
> >>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>>>> index feb6da1b6ceb..05f412204118 100644
> >>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >>>>>>>          if (ret)
> >>>>>>>                  goto out_unlock;
> >>>>>>>
> >>>>>>> +     drm_sched_job_arm(&submit->sched_job);
> >>>>>>> +
> >>>>>>>          submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> >>>>>>>          submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> >>>>>>>                                                  submit->out_fence, 0,
> >>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> >>>>>>> index dba8329937a3..38f755580507 100644
> >>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
> >>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
> >>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> >>>>>>>                  return err;
> >>>>>>>          }
> >>>>>>>
> >>>>>>> +     drm_sched_job_arm(&task->base);
> >>>>>>> +
> >>>>>>>          task->num_bos = num_bos;
> >>>>>>>          task->vm = lima_vm_get(vm);
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>>>> index 71a72fb50e6b..2992dc85325f 100644
> >>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> >>>>>>>                  goto unlock;
> >>>>>>>          }
> >>>>>>>
> >>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>> +
> >>>>>>>          job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>>>>>
> >>>>>>>          ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>>>> index 79554aa4dbb1..f7347c284886 100644
> >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >>>>>>>       * @sched_job: job to submit
> >>>>>>>       * @entity: scheduler entity
> >>>>>>>       *
> >>>>>>> - * Note: To guarantee that the order of insertion to queue matches
> >>>>>>> - * the job's fence sequence number this function should be
> >>>>>>> - * called with drm_sched_job_init under common lock.
> >>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
> >>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
> >>>>>>> + * under common lock.
> >>>>>>>       *
> >>>>>>>       * Returns 0 for success, negative error code otherwise.
> >>>>>>>       */
> >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>>>> index 69de2c76731f..c451ee9a30d7 100644
> >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> >>>>>>>       *
> >>>>>>>       * Free up the fence memory after the RCU grace period.
> >>>>>>>       */
> >>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
> >>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
> >>>>>>>      {
> >>>>>>>          struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >>>>>>>          struct drm_sched_fence *fence = to_drm_sched_fence(f);
> >>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> >>>>>>>      }
> >>>>>>>      EXPORT_SYMBOL(to_drm_sched_fence);
> >>>>>>>
> >>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>>>>> -                                            void *owner)
> >>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> >>>>>>> +                                           void *owner)
> >>>>>>>      {
> >>>>>>>          struct drm_sched_fence *fence = NULL;
> >>>>>>> -     unsigned seq;
> >>>>>>>
> >>>>>>>          fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> >>>>>>>          if (fence == NULL)
> >>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>>>>>          fence->sched = entity->rq->sched;
> >>>>>>>          spin_lock_init(&fence->lock);
> >>>>>>>
> >>>>>>> +     return fence;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>>>>>> +                       struct drm_sched_entity *entity)
> >>>>>>> +{
> >>>>>>> +     unsigned seq;
> >>>>>>> +
> >>>>>>>          seq = atomic_inc_return(&entity->fence_seq);
> >>>>>>>          dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >>>>>>>                         &fence->lock, entity->fence_context, seq);
> >>>>>>>          dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> >>>>>>>                         &fence->lock, entity->fence_context + 1, seq);
> >>>>>>> -
> >>>>>>> -     return fence;
> >>>>>>>      }
> >>>>>>>
> >>>>>>>      module_init(drm_sched_fence_slab_init);
> >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>>> index 33c414d55fab..5e84e1500c32 100644
> >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>>> @@ -48,9 +48,11 @@
> >>>>>>>      #include <linux/wait.h>
> >>>>>>>      #include <linux/sched.h>
> >>>>>>>      #include <linux/completion.h>
> >>>>>>> +#include <linux/dma-resv.h>
> >>>>>>>      #include <uapi/linux/sched/types.h>
> >>>>>>>
> >>>>>>>      #include <drm/drm_print.h>
> >>>>>>> +#include <drm/drm_gem.h>
> >>>>>>>      #include <drm/gpu_scheduler.h>
> >>>>>>>      #include <drm/spsc_queue.h>
> >>>>>>>
> >>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>>>>>
> >>>>>>>      /**
> >>>>>>>       * drm_sched_job_init - init a scheduler job
> >>>>>>> - *
> >>>>>>>       * @job: scheduler job to init
> >>>>>>>       * @entity: scheduler entity to use
> >>>>>>>       * @owner: job owner for debugging
> >>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>>>>>       * Refer to drm_sched_entity_push_job() documentation
> >>>>>>>       * for locking considerations.
> >>>>>>>       *
> >>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
> >>>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
> >>>>>>> + *
> >>>>>>>       * Returns 0 for success, negative error code otherwise.
> >>>>>>>       */
> >>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>          job->sched = sched;
> >>>>>>>          job->entity = entity;
> >>>>>>>          job->s_priority = entity->rq - sched->sched_rq;
> >>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
> >>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> >>>>>>>          if (!job->s_fence)
> >>>>>>>                  return -ENOMEM;
> >>>>>>>          job->id = atomic64_inc_return(&sched->job_id_count);
> >>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>      EXPORT_SYMBOL(drm_sched_job_init);
> >>>>>>>
> >>>>>>>      /**
> >>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
> >>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
> >>>>>>> + * @job: scheduler job to arm
> >>>>>>> + *
> >>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
> >>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> >>>>>>> + * or other places that need to track the completion of this job.
> >>>>>>> + *
> >>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
> >>>>>>> + * considerations.
> >>>>>>>       *
> >>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
> >>>>>>> + */
> >>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
> >>>>>>> +{
> >>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
> >>>>>>> +}
> >>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
> >>>>>>>       * @job: scheduler job to clean up
> >>>>>>> + *
> >>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
> >>>>>>> + *
> >>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
> >>>>>>> + * before drm_sched_job_arm() is called.
> >>>>>>> + *
> >>>>>>> + * After that point of no return @job is committed to be executed by the
> >>>>>>> + * scheduler, and this function should be called from the
> >>>>>>> + * &drm_sched_backend_ops.free_job callback.
> >>>>>>>       */
> >>>>>>>      void drm_sched_job_cleanup(struct drm_sched_job *job)
> >>>>>>>      {
> >>>>>>> -     dma_fence_put(&job->s_fence->finished);
> >>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
> >>>>>>> +             /* drm_sched_job_arm() has been called */
> >>>>>>> +             dma_fence_put(&job->s_fence->finished);
> >>>>>>> +     } else {
> >>>>>>> +             /* aborted job before committing to run it */
> >>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> >>>>>>> +     }
> >>>>>>> +
> >>>>>>>          job->s_fence = NULL;
> >>>>>>>      }
> >>>>>>>      EXPORT_SYMBOL(drm_sched_job_cleanup);
> >>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>>>> index 4eb354226972..5c3a99027ecd 100644
> >>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> >>>>>>>          if (ret)
> >>>>>>>                  return ret;
> >>>>>>>
> >>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>> +
> >>>>>>>          job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>>>>>
> >>>>>>>          /* put by scheduler job completion */
> >>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> >>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
> >>>>>>> --- a/include/drm/gpu_scheduler.h
> >>>>>>> +++ b/include/drm/gpu_scheduler.h
> >>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>                         struct drm_sched_entity *entity,
> >>>>>>>                         void *owner);
> >>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
> >>>>>>>      void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >>>>>>>                                      struct drm_gpu_scheduler **sched_list,
> >>>>>>>                                         unsigned int num_sched_list);
> >>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >>>>>>>                                     enum drm_sched_priority priority);
> >>>>>>>      bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> >>>>>>>
> >>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
> >>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
> >>>>>>>          struct drm_sched_entity *s_entity, void *owner);
> >>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>>>>>> +                       struct drm_sched_entity *entity);
> >>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
> >>>>>>> +
> >>>>>>>      void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> >>>>>>>      void drm_sched_fence_finished(struct drm_sched_fence *fence);
> >>>>>>>
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08  7:09                   ` Daniel Vetter
@ 2021-07-08  7:19                     ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-08  7:19 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
> > Am 07.07.21 um 18:32 schrieb Daniel Vetter:
> > > On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> > >> Am 07.07.21 um 14:13 schrieb Daniel Vetter:
> > >>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> > >>>> Am 07.07.21 um 13:14 schrieb Daniel Vetter:
> > >>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> > >>>>> <christian.koenig@amd.com> wrote:
> > >>>>>> Am 02.07.21 um 23:38 schrieb Daniel Vetter:
> > >>>>>>> This is a very confusingly named function, because not only does it
> > >>>>>>> init an object, it also arms it and provides the point of no return
> > >>>>>>> for pushing a job into the scheduler. It would be nice if that were
> > >>>>>>> a bit clearer in the interface.
> > >>>>>>>
> > >>>>>>> But the real reason is that I want to push the dependency tracking
> > >>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
> > >>>>>>> must be called a lot earlier, without arming the job.
> > >>>>>>>
> > >>>>>>> v2:
> > >>>>>>> - don't change .gitignore (Steven)
> > >>>>>>> - don't forget v3d (Emma)
> > >>>>>>>
> > >>>>>>> v3: Emma noticed that I leak the memory allocated in
> > >>>>>>> drm_sched_job_init if we bail out before the point of no return in
> > >>>>>>> subsequent driver patches. To be able to fix this, change
> > >>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
> > >>>>>>> after drm_sched_job_arm().
> > >>>>>> Thinking more about this, I'm not sure if this really works.
> > >>>>>>
> > >>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> > >>>>>> to update the entity->rq association.
> > >>>>>>
> > >>>>>> And that can only be done later on when we arm the fence as well.
> > >>>>> Hm yeah, but that's a bug in the existing code I think: We already
> > >>>>> fail to clean up if we fail to allocate the fences. So I think the
> > >>>>> right thing to do here is to split the checks into job_init, and do
> > >>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
> > >>>>> what's all going on there, the first check looks a bit like trying to
> > >>>>> schedule before the entity is set up, which is a driver bug and should
> > >>>>> have a WARN_ON?
> > >>>> No you misunderstood me, the problem is something else.
> > >>>>
> > >>>> You asked previously why the call to drm_sched_job_init() was so late in
> > >>>> the CS.
> > >>>>
> > >>>> The reason for this was not only the scheduler fence init, but also the
> > >>>> call to drm_sched_entity_select_rq().
> > >>> Ah ok, I think I can fix that. Needs a prep patch to first make
> > >>> drm_sched_entity_select infallible, then should be easy to do.
> > >>>
> > >>>>> As for the 2nd check around last_scheduled, I honestly have no idea
> > >>>>> what it's even trying to do.
> > >>>> You mean that here?
> > >>>>
> > >>>>            fence = READ_ONCE(entity->last_scheduled);
> > >>>>            if (fence && !dma_fence_is_signaled(fence))
> > >>>>                    return;
> > >>>>
> > >>>> This makes sure that load balancing is not moving the entity to a
> > >>>> different scheduler while there are still jobs running from this entity
> > >>>> on the hardware,
> > >>> Yeah after a nap that idea crossed my mind too. But now I have locking
> > >>> questions, afaiui the scheduler thread updates this, without taking
> > >>> any locks - entity dequeuing is lockless. And here we read the fence
> > >>> and then seem to yolo check whether it's signalled? What's preventing
> > >>> a use-after-free here? There's no rcu or anything going on here at
> > >>> all, and it's outside of the spinlock section, which starts a bit
> > >>> further down.
> > >> The last_scheduled fence of an entity can only change when there are
> > >> jobs on the entities queued, and we have just ruled that out in the
> > >> check before.
> > > There aren't any barriers, so the cpu could easily run the two checks
> > > the other way round. I'll ponder this and figure out where exactly we
> > > need docs for the constraint and/or barriers to make this work as
> > > intended. As-is I'm not seeing how it does ...
> >
> > spsc_queue_count() provides the necessary barrier with the atomic_read().
>
> atomic_t is fully unordered, except when it's a read-modify-write

Wasn't awake yet; I think the rule is that read-modify-write ops which
return the previous value give you a full barrier. So stuff like
cmpxchg, but also a few others. See atomic_t.txt under the ORDERING
heading (yes, that maintainer refuses to accept .rst so I can't just
link you to the right section, it's silly). get/set and even RMW atomic
ops that don't return anything are all fully unordered.
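
For illustration, roughly (a hypothetical snippet of mine against the
kernel atomic API; atomic_t.txt has the authoritative rules):

	#include <linux/atomic.h>

	static atomic_t v = ATOMIC_INIT(0);

	static void ordering_examples(void)
	{
		atomic_set(&v, 1);		/* no ordering implied */
		atomic_inc(&v);			/* RMW, no return value: unordered */
		(void)atomic_inc_return(&v);	/* RMW returning a value: full barrier */
		(void)atomic_fetch_add(1, &v);	/* fetch ops are fully ordered too */
	}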
-Daniel


> atomic op, then it's a full barrier. So yeah you need more here. But
> also since you only need a read barrier on one side, and a write
> barrier on the other, you don't actually need CPU barriers on x86.
> And READ_ONCE gives you the compiler barrier on one side at least, I
> haven't found it on the writer side yet.
>
> > But yes a comment would be really nice here. I had to think for a while
> > why we don't need this as well.
>
> I'm typing a patch, which after a night's sleep I realized has the
> wrong barriers. And now I'm also typing some doc improvements for
> drm_sched_entity and related functions.
>
> >
> > Christian.
> >
> > > -Daniel
> > >
> > >> Christian.
> > >>
> > >>
> > >>> -Daniel
> > >>>
> > >>>> Regards
> > >>>> Christian.
> > >>>>
> > >>>>> -Daniel
> > >>>>>
> > >>>>>> Christian.
> > >>>>>>
> > >>>>>>> Also improve the kerneldoc for this.
> > >>>>>>>
> > >>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
> > >>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > >>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
> > >>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > >>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > >>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
> > >>>>>>> Cc: Rob Herring <robh@kernel.org>
> > >>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > >>>>>>> Cc: Steven Price <steven.price@arm.com>
> > >>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> > >>>>>>> Cc: David Airlie <airlied@linux.ie>
> > >>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
> > >>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > >>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
> > >>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
> > >>>>>>> Cc: Kees Cook <keescook@chromium.org>
> > >>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
> > >>>>>>> Cc: Nick Terrell <terrelln@fb.com>
> > >>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > >>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> > >>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
> > >>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > >>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> > >>>>>>> Cc: Dave Airlie <airlied@redhat.com>
> > >>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
> > >>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> > >>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
> > >>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
> > >>>>>>> Cc: Chen Li <chenli@uniontech.com>
> > >>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> > >>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
> > >>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
> > >>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > >>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > >>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
> > >>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > >>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
> > >>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > >>>>>>> Cc: etnaviv@lists.freedesktop.org
> > >>>>>>> Cc: lima@lists.freedesktop.org
> > >>>>>>> Cc: linux-media@vger.kernel.org
> > >>>>>>> Cc: linaro-mm-sig@lists.linaro.org
> > >>>>>>> Cc: Emma Anholt <emma@anholt.net>
> > >>>>>>> ---
> > >>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> > >>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> > >>>>>>>      drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> > >>>>>>>      drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> > >>>>>>>      drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> > >>>>>>>      drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> > >>>>>>>      drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> > >>>>>>>      drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> > >>>>>>>      drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> > >>>>>>>      include/drm/gpu_scheduler.h              |  7 +++-
> > >>>>>>>      10 files changed, 74 insertions(+), 14 deletions(-)
> > >>>>>>>
> > >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > >>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
> > >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > >>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > >>>>>>>          if (r)
> > >>>>>>>                  goto error_unlock;
> > >>>>>>>
> > >>>>>>> +     drm_sched_job_arm(&job->base);
> > >>>>>>> +
> > >>>>>>>          /* No memory allocation is allowed while holding the notifier lock.
> > >>>>>>>           * The lock is held until amdgpu_cs_submit is finished and fence is
> > >>>>>>>           * added to BOs.
> > >>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > >>>>>>> index d33e6d97cc89..5ddb955d2315 100644
> > >>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > >>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > >>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> > >>>>>>>          if (r)
> > >>>>>>>                  return r;
> > >>>>>>>
> > >>>>>>> +     drm_sched_job_arm(&job->base);
> > >>>>>>> +
> > >>>>>>>          *f = dma_fence_get(&job->base.s_fence->finished);
> > >>>>>>>          amdgpu_job_free_resources(job);
> > >>>>>>>          drm_sched_entity_push_job(&job->base, entity);
> > >>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > >>>>>>> index feb6da1b6ceb..05f412204118 100644
> > >>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > >>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > >>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > >>>>>>>          if (ret)
> > >>>>>>>                  goto out_unlock;
> > >>>>>>>
> > >>>>>>> +     drm_sched_job_arm(&submit->sched_job);
> > >>>>>>> +
> > >>>>>>>          submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > >>>>>>>          submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> > >>>>>>>                                                  submit->out_fence, 0,
> > >>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > >>>>>>> index dba8329937a3..38f755580507 100644
> > >>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
> > >>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
> > >>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> > >>>>>>>                  return err;
> > >>>>>>>          }
> > >>>>>>>
> > >>>>>>> +     drm_sched_job_arm(&task->base);
> > >>>>>>> +
> > >>>>>>>          task->num_bos = num_bos;
> > >>>>>>>          task->vm = lima_vm_get(vm);
> > >>>>>>>
> > >>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > >>>>>>> index 71a72fb50e6b..2992dc85325f 100644
> > >>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > >>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > >>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> > >>>>>>>                  goto unlock;
> > >>>>>>>          }
> > >>>>>>>
> > >>>>>>> +     drm_sched_job_arm(&job->base);
> > >>>>>>> +
> > >>>>>>>          job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> > >>>>>>>
> > >>>>>>>          ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> > >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > >>>>>>> index 79554aa4dbb1..f7347c284886 100644
> > >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > >>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> > >>>>>>>       * @sched_job: job to submit
> > >>>>>>>       * @entity: scheduler entity
> > >>>>>>>       *
> > >>>>>>> - * Note: To guarantee that the order of insertion to queue matches
> > >>>>>>> - * the job's fence sequence number this function should be
> > >>>>>>> - * called with drm_sched_job_init under common lock.
> > >>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
> > >>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
> > >>>>>>> + * under common lock.
> > >>>>>>>       *
> > >>>>>>>       * Returns 0 for success, negative error code otherwise.
> > >>>>>>>       */
> > >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > >>>>>>> index 69de2c76731f..c451ee9a30d7 100644
> > >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > >>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> > >>>>>>>       *
> > >>>>>>>       * Free up the fence memory after the RCU grace period.
> > >>>>>>>       */
> > >>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
> > >>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
> > >>>>>>>      {
> > >>>>>>>          struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> > >>>>>>>          struct drm_sched_fence *fence = to_drm_sched_fence(f);
> > >>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> > >>>>>>>      }
> > >>>>>>>      EXPORT_SYMBOL(to_drm_sched_fence);
> > >>>>>>>
> > >>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > >>>>>>> -                                            void *owner)
> > >>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> > >>>>>>> +                                           void *owner)
> > >>>>>>>      {
> > >>>>>>>          struct drm_sched_fence *fence = NULL;
> > >>>>>>> -     unsigned seq;
> > >>>>>>>
> > >>>>>>>          fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> > >>>>>>>          if (fence == NULL)
> > >>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > >>>>>>>          fence->sched = entity->rq->sched;
> > >>>>>>>          spin_lock_init(&fence->lock);
> > >>>>>>>
> > >>>>>>> +     return fence;
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > >>>>>>> +                       struct drm_sched_entity *entity)
> > >>>>>>> +{
> > >>>>>>> +     unsigned seq;
> > >>>>>>> +
> > >>>>>>>          seq = atomic_inc_return(&entity->fence_seq);
> > >>>>>>>          dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> > >>>>>>>                         &fence->lock, entity->fence_context, seq);
> > >>>>>>>          dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> > >>>>>>>                         &fence->lock, entity->fence_context + 1, seq);
> > >>>>>>> -
> > >>>>>>> -     return fence;
> > >>>>>>>      }
> > >>>>>>>
> > >>>>>>>      module_init(drm_sched_fence_slab_init);
> > >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > >>>>>>> index 33c414d55fab..5e84e1500c32 100644
> > >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> > >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > >>>>>>> @@ -48,9 +48,11 @@
> > >>>>>>>      #include <linux/wait.h>
> > >>>>>>>      #include <linux/sched.h>
> > >>>>>>>      #include <linux/completion.h>
> > >>>>>>> +#include <linux/dma-resv.h>
> > >>>>>>>      #include <uapi/linux/sched/types.h>
> > >>>>>>>
> > >>>>>>>      #include <drm/drm_print.h>
> > >>>>>>> +#include <drm/drm_gem.h>
> > >>>>>>>      #include <drm/gpu_scheduler.h>
> > >>>>>>>      #include <drm/spsc_queue.h>
> > >>>>>>>
> > >>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> > >>>>>>>
> > >>>>>>>      /**
> > >>>>>>>       * drm_sched_job_init - init a scheduler job
> > >>>>>>> - *
> > >>>>>>>       * @job: scheduler job to init
> > >>>>>>>       * @entity: scheduler entity to use
> > >>>>>>>       * @owner: job owner for debugging
> > >>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> > >>>>>>>       * Refer to drm_sched_entity_push_job() documentation
> > >>>>>>>       * for locking considerations.
> > >>>>>>>       *
> > >>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
> > >>>>>>> + * returns successfully, even when @job is aborted before
> > >>>>>>> + * drm_sched_job_arm() is called.
> > >>>>>>> + *
> > >>>>>>>       * Returns 0 for success, negative error code otherwise.
> > >>>>>>>       */
> > >>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
> > >>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > >>>>>>>          job->sched = sched;
> > >>>>>>>          job->entity = entity;
> > >>>>>>>          job->s_priority = entity->rq - sched->sched_rq;
> > >>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
> > >>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> > >>>>>>>          if (!job->s_fence)
> > >>>>>>>                  return -ENOMEM;
> > >>>>>>>          job->id = atomic64_inc_return(&sched->job_id_count);
> > >>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > >>>>>>>      EXPORT_SYMBOL(drm_sched_job_init);
> > >>>>>>>
> > >>>>>>>      /**
> > >>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
> > >>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
> > >>>>>>> + * @job: scheduler job to arm
> > >>>>>>> + *
> > >>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
> > >>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> > >>>>>>> + * or other places that need to track the completion of this job.
> > >>>>>>> + *
> > >>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
> > >>>>>>> + * considerations.
> > >>>>>>>       *
> > >>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
> > >>>>>>> + */
> > >>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
> > >>>>>>> +{
> > >>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
> > >>>>>>> +}
> > >>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
> > >>>>>>> +
> > >>>>>>> +/**
> > >>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
> > >>>>>>>       * @job: scheduler job to clean up
> > >>>>>>> + *
> > >>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
> > >>>>>>> + *
> > >>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
> > >>>>>>> + * before drm_sched_job_arm() is called.
> > >>>>>>> + *
> > >>>>>>> + * After that point of no return @job is committed to be executed by the
> > >>>>>>> + * scheduler, and this function should be called from the
> > >>>>>>> + * &drm_sched_backend_ops.free_job callback.
> > >>>>>>>       */
> > >>>>>>>      void drm_sched_job_cleanup(struct drm_sched_job *job)
> > >>>>>>>      {
> > >>>>>>> -     dma_fence_put(&job->s_fence->finished);
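> > >>>>>>> +     /* The fence was zalloc'ed with a zero refcount; dma_fence_init()
> > >>>>>>> +      * in drm_sched_job_arm() takes the first reference.
> > >>>>>>> +      */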
> > >>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
> > >>>>>>> +             /* drm_sched_job_arm() has been called */
> > >>>>>>> +             dma_fence_put(&job->s_fence->finished);
> > >>>>>>> +     } else {
> > >>>>>>> +             /* aborted job before committing to run it */
> > >>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> > >>>>>>> +     }
> > >>>>>>> +
> > >>>>>>>          job->s_fence = NULL;
> > >>>>>>>      }
> > >>>>>>>      EXPORT_SYMBOL(drm_sched_job_cleanup);
> > >>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> > >>>>>>> index 4eb354226972..5c3a99027ecd 100644
> > >>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > >>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > >>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> > >>>>>>>          if (ret)
> > >>>>>>>                  return ret;
> > >>>>>>>
> > >>>>>>> +     drm_sched_job_arm(&job->base);
> > >>>>>>> +
> > >>>>>>>          job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> > >>>>>>>
> > >>>>>>>          /* put by scheduler job completion */
> > >>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > >>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
> > >>>>>>> --- a/include/drm/gpu_scheduler.h
> > >>>>>>> +++ b/include/drm/gpu_scheduler.h
> > >>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> > >>>>>>>      int drm_sched_job_init(struct drm_sched_job *job,
> > >>>>>>>                         struct drm_sched_entity *entity,
> > >>>>>>>                         void *owner);
> > >>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
> > >>>>>>>      void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > >>>>>>>                                      struct drm_gpu_scheduler **sched_list,
> > >>>>>>>                                         unsigned int num_sched_list);
> > >>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> > >>>>>>>                                     enum drm_sched_priority priority);
> > >>>>>>>      bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> > >>>>>>>
> > >>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
> > >>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
> > >>>>>>>          struct drm_sched_entity *s_entity, void *owner);
> > >>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > >>>>>>> +                       struct drm_sched_entity *entity);
> > >>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
> > >>>>>>> +
> > >>>>>>>      void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> > >>>>>>>      void drm_sched_fence_finished(struct drm_sched_fence *fence);
> > >>>>>>>
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08  7:19                     ` Daniel Vetter
@ 2021-07-08  7:53                       ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-08  7:53 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

Am 08.07.21 um 09:19 schrieb Daniel Vetter:
> On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
>>> Am 07.07.21 um 18:32 schrieb Daniel Vetter:
>>>> On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
>>>>> Am 07.07.21 um 14:13 schrieb Daniel Vetter:
>>>>>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
>>>>>>> Am 07.07.21 um 13:14 schrieb Daniel Vetter:
>>>>>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
>>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>>> Am 02.07.21 um 23:38 schrieb Daniel Vetter:
>>>>>>>>>> This is a very confusingly named function, because not only does it
>>>>>>>>>> init an object, it also arms it and provides the point of no return
>>>>>>>>>> for pushing a job into the scheduler. It would be nice if that were
>>>>>>>>>> a bit clearer in the interface.
>>>>>>>>>>
>>>>>>>>>> But the real reason is that I want to push the dependency tracking
>>>>>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
>>>>>>>>>> must be called a lot earlier, without arming the job.
>>>>>>>>>>
>>>>>>>>>> v2:
>>>>>>>>>> - don't change .gitignore (Steven)
>>>>>>>>>> - don't forget v3d (Emma)
>>>>>>>>>>
>>>>>>>>>> v3: Emma noticed that I leak the memory allocated in
>>>>>>>>>> drm_sched_job_init if we bail out before the point of no return in
>>>>>>>>>> subsequent driver patches. To be able to fix this, change
>>>>>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
>>>>>>>>>> after drm_sched_job_arm().
>>>>>>>>> Thinking more about this, I'm not sure if this really works.
>>>>>>>>>
>>>>>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>>>>>>>>> to update the entity->rq association.
>>>>>>>>>
>>>>>>>>> And that can only be done later on when we arm the fence as well.
>>>>>>>> Hm yeah, but that's a bug in the existing code I think: We already
>>>>>>>> fail to clean up if we fail to allocate the fences. So I think the
>>>>>>>> right thing to do here is to split the checks into job_init, and do
>>>>>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
>>>>>>>> what's all going on there, the first check looks a bit like trying to
>>>>>>>> schedule before the entity is set up, which is a driver bug and should
>>>>>>>> have a WARN_ON?
>>>>>>> No you misunderstood me, the problem is something else.
>>>>>>>
>>>>>>> You asked previously why the call to drm_sched_job_init() was so late in
>>>>>>> the CS.
>>>>>>>
>>>>>>> The reason for this was not only the scheduler fence init, but also the
>>>>>>> call to drm_sched_entity_select_rq().
>>>>>> Ah ok, I think I can fix that. Needs a prep patch to first make
>>>>>> drm_sched_entity_select infallible, then should be easy to do.
>>>>>>
>>>>>>>> As for the 2nd check around last_scheduled, I honestly have no idea
>>>>>>>> what it's even trying to do.
>>>>>>> You mean that here?
>>>>>>>
>>>>>>>             fence = READ_ONCE(entity->last_scheduled);
>>>>>>>             if (fence && !dma_fence_is_signaled(fence))
>>>>>>>                     return;
>>>>>>>
>>>>>>> This makes sure that load balancing is not moving the entity to a
>>>>>>> different scheduler while there are still jobs running from this entity
>>>>>>> on the hardware,
>>>>>> Yeah after a nap that idea crossed my mind too. But now I have locking
>>>>>> questions, afaiui the scheduler thread updates this, without taking
>>>>>> any locks - entity dequeuing is lockless. And here we read the fence
>>>>>> and then seem to yolo check whether it's signalled? What's preventing
>>>>>> a use-after-free here? There's no rcu or anything going on here at
>>>>>> all, and it's outside of the spinlock section, which starts a bit
>>>>>> further down.
>>>>> The last_scheduled fence of an entity can only change when there are
>>>>> jobs on the entities queued, and we have just ruled that out in the
>>>>> check before.
>>>> There aren't any barriers, so the cpu could easily run the two checks
>>>> the other way round. I'll ponder this and figure out where exactly we
>>>> need docs for the constraint and/or barriers to make this work as
>>>> intended. As-is I'm not seeing how it does ...
>>> spsc_queue_count() provides the necessary barrier with the atomic_read().
>> atomic_t is fully unordered, except when it's a read-modify-write
> Wasn't awake yet; I think the rule is that read-modify-write ops which
> return the previous value give you a full barrier. So stuff like
> cmpxchg, but also a few others. See atomic_t.txt under the ORDERING
> heading (yes, that maintainer refuses to accept .rst so I can't just
> link you to the right section, it's silly). get/set and even RMW atomic
> ops that don't return anything are all fully unordered.

As far as I know that's not completely correct. The rules around
atomics I once learned are:

1. Everything which modifies something is a write barrier.
2. Everything which returns something is a read barrier.

And I know a whole bunch of use cases where this is relied upon in the 
core kernel, so I'm pretty sure that's correct.

In this case the write barrier is the atomic_dec() in spsc_queue_pop() 
and the read barrier is the atomic_read() in spsc_queue_count().

The READ_ONCE() is actually not even necessary as far as I can see.
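
In code the pairing looks roughly like this (simplified sketch from
memory, not the exact code):

	/* scheduler thread, drm_sched_entity_pop_job(), simplified: */
	dma_fence_put(entity->last_scheduled);
	entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
	spsc_queue_pop(&entity->job_queue);	/* atomic_dec(): write barrier by rule 1 */

	/* drm_sched_entity_select_rq(), simplified: */
	if (spsc_queue_count(&entity->job_queue))	/* atomic_read(): read barrier by rule 2 */
		return;		/* jobs still queued, don't move the entity */

	fence = READ_ONCE(entity->last_scheduled);
	if (fence && !dma_fence_is_signaled(fence))
		return;		/* hardware still busy with this entity */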

Christian.

> -Daniel
>
>
>> atomic op, then it's a full barrier. So yeah you need more here. But
>> also since you only need a read barrier on one side, and a write
>> barrier on the other, you don't actually need any cpu barriers on x86.
>> And READ_ONCE gives you the compiler barrier on one side at least, I
>> haven't found it on the writer side yet.
>>
>>> But yes a comment would be really nice here. I had to think for a while
>>> why we don't need this as well.
>> I'm typing a patch, which after a night's sleep I realized has the
>> wrong barriers. And now I'm also typing some doc improvements for
>> drm_sched_entity and related functions.
>>
>>> Christian.
>>>
>>>> -Daniel
>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Regards
>>>>>>> Christian.
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Also improve the kerneldoc for this.
>>>>>>>>>>
>>>>>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>>>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>>>>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>>>>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
>>>>>>>>>> Cc: Rob Herring <robh@kernel.org>
>>>>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>>>>>>>>> Cc: Steven Price <steven.price@arm.com>
>>>>>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>>>>>>>>> Cc: David Airlie <airlied@linux.ie>
>>>>>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>>>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>>>>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
>>>>>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>>>>>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>>>>>>>>> Cc: Nick Terrell <terrelln@fb.com>
>>>>>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>>>>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>>>>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>>>>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>> Cc: Dave Airlie <airlied@redhat.com>
>>>>>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>>>>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>>>>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
>>>>>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>>>>>>>>> Cc: Chen Li <chenli@uniontech.com>
>>>>>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>>>>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>>>>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>>>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>>>>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>>>>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>>>>>>>>> Cc: etnaviv@lists.freedesktop.org
>>>>>>>>>> Cc: lima@lists.freedesktop.org
>>>>>>>>>> Cc: linux-media@vger.kernel.org
>>>>>>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>>>>>>> Cc: Emma Anholt <emma@anholt.net>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>>>>>>>>       drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>>>>>>>>       drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>>>>>>>>       drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>>>>>>>>       drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>>>>>>>>       drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>>>>>>>>       drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>>>>>>>>       drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>>>>>>>>       include/drm/gpu_scheduler.h              |  7 +++-
>>>>>>>>>>       10 files changed, 74 insertions(+), 14 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>>>>>>           if (r)
>>>>>>>>>>                   goto error_unlock;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           /* No memory allocation is allowed while holding the notifier lock.
>>>>>>>>>>            * The lock is held until amdgpu_cs_submit is finished and fence is
>>>>>>>>>>            * added to BOs.
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>> index d33e6d97cc89..5ddb955d2315 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>>>>>>>>           if (r)
>>>>>>>>>>                   return r;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           *f = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>           amdgpu_job_free_resources(job);
>>>>>>>>>>           drm_sched_entity_push_job(&job->base, entity);
>>>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>> index feb6da1b6ceb..05f412204118 100644
>>>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>>>>>>>>           if (ret)
>>>>>>>>>>                   goto out_unlock;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&submit->sched_job);
>>>>>>>>>> +
>>>>>>>>>>           submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>>>>>>>>           submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>>>>>>>>                                                   submit->out_fence, 0,
>>>>>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>> index dba8329937a3..38f755580507 100644
>>>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>>>>>>>>                   return err;
>>>>>>>>>>           }
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&task->base);
>>>>>>>>>> +
>>>>>>>>>>           task->num_bos = num_bos;
>>>>>>>>>>           task->vm = lima_vm_get(vm);
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>> index 71a72fb50e6b..2992dc85325f 100644
>>>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>>>>>>>>                   goto unlock;
>>>>>>>>>>           }
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>
>>>>>>>>>>           ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>> index 79554aa4dbb1..f7347c284886 100644
>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>>>>>>>        * @sched_job: job to submit
>>>>>>>>>>        * @entity: scheduler entity
>>>>>>>>>>        *
>>>>>>>>>> - * Note: To guarantee that the order of insertion to queue matches
>>>>>>>>>> - * the job's fence sequence number this function should be
>>>>>>>>>> - * called with drm_sched_job_init under common lock.
>>>>>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>>>>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>>>>>>>>> + * under common lock.
>>>>>>>>>>        *
>>>>>>>>>>        * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>        */
>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>> index 69de2c76731f..c451ee9a30d7 100644
>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>>>>>>>>        *
>>>>>>>>>>        * Free up the fence memory after the RCU grace period.
>>>>>>>>>>        */
>>>>>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>>>>       {
>>>>>>>>>>           struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>>>>>>>>           struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>>>>>>>>       }
>>>>>>>>>>       EXPORT_SYMBOL(to_drm_sched_fence);
>>>>>>>>>>
>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>>>> -                                            void *owner)
>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>>>>>>> +                                           void *owner)
>>>>>>>>>>       {
>>>>>>>>>>           struct drm_sched_fence *fence = NULL;
>>>>>>>>>> -     unsigned seq;
>>>>>>>>>>
>>>>>>>>>>           fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>>>>>>>>           if (fence == NULL)
>>>>>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>>>>           fence->sched = entity->rq->sched;
>>>>>>>>>>           spin_lock_init(&fence->lock);
>>>>>>>>>>
>>>>>>>>>> +     return fence;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>> +                       struct drm_sched_entity *entity)
>>>>>>>>>> +{
>>>>>>>>>> +     unsigned seq;
>>>>>>>>>> +
>>>>>>>>>>           seq = atomic_inc_return(&entity->fence_seq);
>>>>>>>>>>           dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>>>>>>                          &fence->lock, entity->fence_context, seq);
>>>>>>>>>>           dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>>>>>>                          &fence->lock, entity->fence_context + 1, seq);
>>>>>>>>>> -
>>>>>>>>>> -     return fence;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>>       module_init(drm_sched_fence_slab_init);
>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>> index 33c414d55fab..5e84e1500c32 100644
>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>> @@ -48,9 +48,11 @@
>>>>>>>>>>       #include <linux/wait.h>
>>>>>>>>>>       #include <linux/sched.h>
>>>>>>>>>>       #include <linux/completion.h>
>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>       #include <uapi/linux/sched/types.h>
>>>>>>>>>>
>>>>>>>>>>       #include <drm/drm_print.h>
>>>>>>>>>> +#include <drm/drm_gem.h>
>>>>>>>>>>       #include <drm/gpu_scheduler.h>
>>>>>>>>>>       #include <drm/spsc_queue.h>
>>>>>>>>>>
>>>>>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>
>>>>>>>>>>       /**
>>>>>>>>>>        * drm_sched_job_init - init a scheduler job
>>>>>>>>>> - *
>>>>>>>>>>        * @job: scheduler job to init
>>>>>>>>>>        * @entity: scheduler entity to use
>>>>>>>>>>        * @owner: job owner for debugging
>>>>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>        * Refer to drm_sched_entity_push_job() documentation
>>>>>>>>>>        * for locking considerations.
>>>>>>>>>>        *
>>>>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function returns
>>>>>>>>>> + * successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>>>>>>>>> + *
>>>>>>>>>>        * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>        */
>>>>>>>>>>       int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>           job->sched = sched;
>>>>>>>>>>           job->entity = entity;
>>>>>>>>>>           job->s_priority = entity->rq - sched->sched_rq;
>>>>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>>>>>>           if (!job->s_fence)
>>>>>>>>>>                   return -ENOMEM;
>>>>>>>>>>           job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>       EXPORT_SYMBOL(drm_sched_job_init);
>>>>>>>>>>
>>>>>>>>>>       /**
>>>>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>>>>>>> + * @job: scheduler job to arm
>>>>>>>>>> + *
>>>>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>>>>>>> + * or other places that need to track the completion of this job.
>>>>>>>>>> + *
>>>>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>>>>>>> + * considerations.
>>>>>>>>>>        *
>>>>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>>>>>>> + */
>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>>>>>> +{
>>>>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>>        * @job: scheduler job to clean up
>>>>>>>>>> + *
>>>>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>>>>>>> + *
>>>>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>>>>>>> + * before drm_sched_job_arm() is called.
>>>>>>>>>> + *
>>>>>>>>>> + * After that point of no return @job is committed to be executed by the
>>>>>>>>>> + * scheduler, and this function should be called from the
>>>>>>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>>>>>>        */
>>>>>>>>>>       void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>>>>>>       {
>>>>>>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>>>>>>>>> +             /* drm_sched_job_arm() has been called */
>>>>>>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>>>>>>> +     } else {
>>>>>>>>>> +             /* aborted job before committing to run it */
>>>>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>>           job->s_fence = NULL;
>>>>>>>>>>       }
>>>>>>>>>>       EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>>>>>>           if (ret)
>>>>>>>>>>                   return ret;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>
>>>>>>>>>>           /* put by scheduler job completion */
>>>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>>>       int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>                          struct drm_sched_entity *entity,
>>>>>>>>>>                          void *owner);
>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>>>>>>       void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>>>>>>                                       struct drm_gpu_scheduler **sched_list,
>>>>>>>>>>                                          unsigned int num_sched_list);
>>>>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>>>>>                                      enum drm_sched_priority priority);
>>>>>>>>>>       bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>>>>>>
>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>>>>>>           struct drm_sched_entity *s_entity, void *owner);
>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>> +                       struct drm_sched_entity *entity);
>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>>>>>>> +
>>>>>>>>>>       void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>>>>>>       void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>>>>>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
@ 2021-07-08  7:53                       ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-08  7:53 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Emma Anholt, Adam Borowski, David Airlie, Viresh Kumar,
	DRI Development, Sonny Jiang, Nirmoy Das, Daniel Vetter,
	Lee Jones, Jack Zhang, lima, Mauro Carvalho Chehab,
	Masahiro Yamada, Steven Price, Luben Tuikov, Alyssa Rosenzweig,
	Sami Tolvanen, Russell King, Dave Airlie, Dennis Li, Chen Li,
	Paul Menzel, Kees Cook, Marek Olšák, Kevin Wang,
	The etnaviv authors, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Nick Terrell, Deepak R Varma, Tomeu Vizoso, Boris Brezillon,
	Qiang Yu, Alex Deucher, Tian Tao,
	open list:DMA BUFFER SHARING FRAMEWORK

On 08.07.21 at 09:19, Daniel Vetter wrote:
> On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
>>> On 07.07.21 at 18:32, Daniel Vetter wrote:
>>>> On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
>>>>> On 07.07.21 at 14:13, Daniel Vetter wrote:
>>>>>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
>>>>>>> On 07.07.21 at 13:14, Daniel Vetter wrote:
>>>>>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
>>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
>>>>>>>>>> This is a very confusingly named function, because not just does it
>>>>>>>>>> init an object, it arms it and provides a point of no return for
>>>>>>>>>> pushing a job into the scheduler. It would be nice if that's a bit
>>>>>>>>>> clearer in the interface.
>>>>>>>>>>
>>>>>>>>>> But the real reason is that I want to push the dependency tracking
>>>>>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
>>>>>>>>>> must be called a lot earlier, without arming the job.
>>>>>>>>>>
>>>>>>>>>> v2:
>>>>>>>>>> - don't change .gitignore (Steven)
>>>>>>>>>> - don't forget v3d (Emma)
>>>>>>>>>>
>>>>>>>>>> v3: Emma noticed that I leak the memory allocated in
>>>>>>>>>> drm_sched_job_init if we bail out before the point of no return in
>>>>>>>>>> subsequent driver patches. To be able to fix this change
>>>>>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
>>>>>>>>>> after drm_sched_job_arm().
>>>>>>>>> Thinking more about this, I'm not sure if this really works.
>>>>>>>>>
>>>>>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>>>>>>>>> to update the entity->rq association.
>>>>>>>>>
>>>>>>>>> And that can only be done later on when we arm the fence as well.
>>>>>>>> Hm yeah, but that's a bug in the existing code I think: We already
>>>>>>>> fail to clean up if we fail to allocate the fences. So I think the
>>>>>>>> right thing to do here is to split the checks into job_init, and do
>>>>>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
>>>>>>>> what's all going on there, the first check looks a bit like trying to
>>>>>>>> schedule before the entity is set up, which is a driver bug and should
>>>>>>>> have a WARN_ON?
>>>>>>> No you misunderstood me, the problem is something else.
>>>>>>>
>>>>>>> You asked previously why the call to drm_sched_job_init() was so late in
>>>>>>> the CS.
>>>>>>>
>>>>>>> The reason for this was not only the scheduler fence init, but also the
>>>>>>> call to drm_sched_entity_select_rq().
>>>>>> Ah ok, I think I can fix that. Needs a prep patch to first make
>>>>>> drm_sched_entity_select_rq() infallible, then it should be easy to do.
>>>>>>
>>>>>>>> The 2nd check around last_scheduled I have honestly no idea what it's
>>>>>>>> even trying to do.
>>>>>>> You mean that here?
>>>>>>>
>>>>>>>             fence = READ_ONCE(entity->last_scheduled);
>>>>>>>             if (fence && !dma_fence_is_signaled(fence))
>>>>>>>                     return;
>>>>>>>
>>>>>>> This makes sure that load balancing is not moving the entity to a
>>>>>>> different scheduler while there are still jobs running from this entity
>>>>>>> on the hardware.
>>>>>> Yeah after a nap that idea crossed my mind too. But now I have locking
>>>>>> questions, afaiui the scheduler thread updates this, without taking
>>>>>> any locks - entity dequeuing is lockless. And here we read the fence
>>>>>> and then seem to yolo check whether it's signalled? What's preventing
>>>>>> a use-after-free here? There's no rcu or anything going on here at
>>>>>> all, and it's outside of the spinlock section, which starts a bit
>>>>>> further down.
>>>>> The last_scheduled fence of an entity can only change when there are
>>>>> jobs queued on the entity, and we have just ruled that out in the
>>>>> check before.
>>>> There aren't any barriers, so the cpu could easily run the two checks
>>>> the other way round. I'll ponder this and figure out where exactly we
>>>> need docs for the constraint and/or barriers to make this work as
>>>> intended. As-is I'm not seeing how it does ...
>>> spsc_queue_count() provides the necessary barrier with the atomic_read().
>> atomic_t is fully unordered, except when it's a read-modify-write
> Wasn't awake yet, I think the rule is read-modify-write and return
> previous value gives you a full barrier. So stuff like cmpxchg, but also
> a few others. See atomic_t.txt under the ORDERING heading (yes that
> maintainer refuses to accept .rst so I can't just link you to the
> right section, it's silly). get/set and even RMW atomic ops that don't
> return anything are all fully unordered.

As far as I know that's not completely correct. The rules around atomics I
once learned are:

1. Everything which modifies something is a write barrier.
2. Everything which returns something is a read barrier.

And I know a whole bunch of use cases where this is relied upon in the 
core kernel, so I'm pretty sure that's correct.

In this case the write barrier is the atomic_dec() in spsc_queue_pop() 
and the read barrier is the atomic_read() in spsc_queue_count().

The READ_ONCE() is actually not even necessary as far as I can see.
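
As a litmus test the pairing looks like this (names shortened, everything
else elided; A/B run on the scheduler thread, C/D in the CS ioctl):

  CPU0 (scheduler thread)               CPU1 (CS ioctl)

  A: entity->last_scheduled = fence;    C: r1 = atomic_read(&queue->job_count);
  B: atomic_dec(&queue->job_count);     D: r2 = READ_ONCE(entity->last_scheduled);

If A is ordered before B (write barrier) and C before D (read barrier),
then r1 == 0 implies that r2 is the fence of the last job pushed to this
entity, and the dma_fence_is_signaled() check cannot race with a job that
is still in flight.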

Christian.

> -Daniel
>
>
>> atomic op, then it's a full barrier. So yeah you need more here. But
>> also since you only need a read barrier on one side, and a write
>> barrier on the other, you don't actually need any cpu barriers on x86.
>> And READ_ONCE gives you the compiler barrier on one side at least, I
>> haven't found it on the writer side yet.
>>
>>> But yes a comment would be really nice here. I had to think for a while
>>> why we don't need this as well.
>> I'm typing a patch, which after a night's sleep I realized has the
>> wrong barriers. And now I'm also typing some doc improvements for
>> drm_sched_entity and related functions.
>>
>>> Christian.
>>>
>>>> -Daniel
>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Regards
>>>>>>> Christian.
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Also improve the kerneldoc for this.
>>>>>>>>>>
>>>>>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>>>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>>>>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>>>>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
>>>>>>>>>> Cc: Rob Herring <robh@kernel.org>
>>>>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>>>>>>>>> Cc: Steven Price <steven.price@arm.com>
>>>>>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>>>>>>>>> Cc: David Airlie <airlied@linux.ie>
>>>>>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>>>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>>>>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
>>>>>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>>>>>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>>>>>>>>> Cc: Nick Terrell <terrelln@fb.com>
>>>>>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>>>>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>>>>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>>>>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>> Cc: Dave Airlie <airlied@redhat.com>
>>>>>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>>>>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>>>>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
>>>>>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>>>>>>>>> Cc: Chen Li <chenli@uniontech.com>
>>>>>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>>>>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>>>>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>>>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>>>>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>>>>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>>>>>>>>> Cc: etnaviv@lists.freedesktop.org
>>>>>>>>>> Cc: lima@lists.freedesktop.org
>>>>>>>>>> Cc: linux-media@vger.kernel.org
>>>>>>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>>>>>>> Cc: Emma Anholt <emma@anholt.net>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>>>>>>>>       drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>>>>>>>>       drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>>>>>>>>       drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>>>>>>>>       drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>>>>>>>>       drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>>>>>>>>       drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>>>>>>>>       drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>>>>>>>>       include/drm/gpu_scheduler.h              |  7 +++-
>>>>>>>>>>       10 files changed, 74 insertions(+), 14 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>>>>>>           if (r)
>>>>>>>>>>                   goto error_unlock;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           /* No memory allocation is allowed while holding the notifier lock.
>>>>>>>>>>            * The lock is held until amdgpu_cs_submit is finished and fence is
>>>>>>>>>>            * added to BOs.
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>> index d33e6d97cc89..5ddb955d2315 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>>>>>>>>           if (r)
>>>>>>>>>>                   return r;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           *f = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>           amdgpu_job_free_resources(job);
>>>>>>>>>>           drm_sched_entity_push_job(&job->base, entity);
>>>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>> index feb6da1b6ceb..05f412204118 100644
>>>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>>>>>>>>           if (ret)
>>>>>>>>>>                   goto out_unlock;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&submit->sched_job);
>>>>>>>>>> +
>>>>>>>>>>           submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>>>>>>>>           submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>>>>>>>>                                                   submit->out_fence, 0,
>>>>>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>> index dba8329937a3..38f755580507 100644
>>>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>>>>>>>>                   return err;
>>>>>>>>>>           }
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&task->base);
>>>>>>>>>> +
>>>>>>>>>>           task->num_bos = num_bos;
>>>>>>>>>>           task->vm = lima_vm_get(vm);
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>> index 71a72fb50e6b..2992dc85325f 100644
>>>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>>>>>>>>                   goto unlock;
>>>>>>>>>>           }
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>
>>>>>>>>>>           ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>> index 79554aa4dbb1..f7347c284886 100644
>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>>>>>>>        * @sched_job: job to submit
>>>>>>>>>>        * @entity: scheduler entity
>>>>>>>>>>        *
>>>>>>>>>> - * Note: To guarantee that the order of insertion to queue matches
>>>>>>>>>> - * the job's fence sequence number this function should be
>>>>>>>>>> - * called with drm_sched_job_init under common lock.
>>>>>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>>>>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>>>>>>>>> + * under common lock.
>>>>>>>>>>        *
>>>>>>>>>>        * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>        */
>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>> index 69de2c76731f..c451ee9a30d7 100644
>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>>>>>>>>        *
>>>>>>>>>>        * Free up the fence memory after the RCU grace period.
>>>>>>>>>>        */
>>>>>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>>>>       {
>>>>>>>>>>           struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>>>>>>>>           struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>>>>>>>>       }
>>>>>>>>>>       EXPORT_SYMBOL(to_drm_sched_fence);
>>>>>>>>>>
>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>>>> -                                            void *owner)
>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>>>>>>> +                                           void *owner)
>>>>>>>>>>       {
>>>>>>>>>>           struct drm_sched_fence *fence = NULL;
>>>>>>>>>> -     unsigned seq;
>>>>>>>>>>
>>>>>>>>>>           fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>>>>>>>>           if (fence == NULL)
>>>>>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>>>>           fence->sched = entity->rq->sched;
>>>>>>>>>>           spin_lock_init(&fence->lock);
>>>>>>>>>>
>>>>>>>>>> +     return fence;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>> +                       struct drm_sched_entity *entity)
>>>>>>>>>> +{
>>>>>>>>>> +     unsigned seq;
>>>>>>>>>> +
>>>>>>>>>>           seq = atomic_inc_return(&entity->fence_seq);
>>>>>>>>>>           dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>>>>>>                          &fence->lock, entity->fence_context, seq);
>>>>>>>>>>           dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>>>>>>                          &fence->lock, entity->fence_context + 1, seq);
>>>>>>>>>> -
>>>>>>>>>> -     return fence;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>>       module_init(drm_sched_fence_slab_init);
>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>> index 33c414d55fab..5e84e1500c32 100644
>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>> @@ -48,9 +48,11 @@
>>>>>>>>>>       #include <linux/wait.h>
>>>>>>>>>>       #include <linux/sched.h>
>>>>>>>>>>       #include <linux/completion.h>
>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>       #include <uapi/linux/sched/types.h>
>>>>>>>>>>
>>>>>>>>>>       #include <drm/drm_print.h>
>>>>>>>>>> +#include <drm/drm_gem.h>
>>>>>>>>>>       #include <drm/gpu_scheduler.h>
>>>>>>>>>>       #include <drm/spsc_queue.h>
>>>>>>>>>>
>>>>>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>
>>>>>>>>>>       /**
>>>>>>>>>>        * drm_sched_job_init - init a scheduler job
>>>>>>>>>> - *
>>>>>>>>>>        * @job: scheduler job to init
>>>>>>>>>>        * @entity: scheduler entity to use
>>>>>>>>>>        * @owner: job owner for debugging
>>>>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>        * Refer to drm_sched_entity_push_job() documentation
>>>>>>>>>>        * for locking considerations.
>>>>>>>>>>        *
>>>>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function returns
>>>>>>>>>> + * successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>>>>>>>>> + *
>>>>>>>>>>        * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>        */
>>>>>>>>>>       int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>           job->sched = sched;
>>>>>>>>>>           job->entity = entity;
>>>>>>>>>>           job->s_priority = entity->rq - sched->sched_rq;
>>>>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>>>>>>           if (!job->s_fence)
>>>>>>>>>>                   return -ENOMEM;
>>>>>>>>>>           job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>       EXPORT_SYMBOL(drm_sched_job_init);
>>>>>>>>>>
>>>>>>>>>>       /**
>>>>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>>>>>>> + * @job: scheduler job to arm
>>>>>>>>>> + *
>>>>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>>>>>>> + * or other places that need to track the completion of this job.
>>>>>>>>>> + *
>>>>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>>>>>>> + * considerations.
>>>>>>>>>>        *
>>>>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>>>>>>> + */
>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>>>>>> +{
>>>>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>>>>>>> +}
>>>>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>>        * @job: scheduler job to clean up
>>>>>>>>>> + *
>>>>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>>>>>>> + *
>>>>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>>>>>>> + * before drm_sched_job_arm() is called.
>>>>>>>>>> + *
>>>>>>>>>> + * After that point of no return @job is committed to be executed by the
>>>>>>>>>> + * scheduler, and this function should be called from the
>>>>>>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>>>>>>        */
>>>>>>>>>>       void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>>>>>>       {
>>>>>>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>>>>>>>>> +             /* drm_sched_job_arm() has been called */
>>>>>>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>>>>>>> +     } else {
>>>>>>>>>> +             /* aborted job before committing to run it */
>>>>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>>           job->s_fence = NULL;
>>>>>>>>>>       }
>>>>>>>>>>       EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>>>>>>           if (ret)
>>>>>>>>>>                   return ret;
>>>>>>>>>>
>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>> +
>>>>>>>>>>           job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>
>>>>>>>>>>           /* put by scheduler job completion */
>>>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>>>       int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>                          struct drm_sched_entity *entity,
>>>>>>>>>>                          void *owner);
>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>>>>>>       void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>>>>>>                                       struct drm_gpu_scheduler **sched_list,
>>>>>>>>>>                                          unsigned int num_sched_list);
>>>>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>>>>>                                      enum drm_sched_priority priority);
>>>>>>>>>>       bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>>>>>>
>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>>>>>>           struct drm_sched_entity *s_entity, void *owner);
>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>> +                       struct drm_sched_entity *entity);
>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>>>>>>> +
>>>>>>>>>>       void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>>>>>>       void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>>>>>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08  7:53                       ` Christian König
@ 2021-07-08 10:02                         ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-08 10:02 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, DRI Development, Steven Price, Daniel Vetter,
	Lucas Stach, Russell King, Christian Gmeiner, Qiang Yu,
	Rob Herring, Tomeu Vizoso, Alyssa Rosenzweig, David Airlie,
	Sumit Semwal, Masahiro Yamada, Kees Cook, Adam Borowski,
	Nick Terrell, Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen,
	Viresh Kumar, Alex Deucher, Dave Airlie, Nirmoy Das,
	Deepak R Varma, Lee Jones, Kevin Wang, Chen Li, Luben Tuikov,
	Marek Olšák, Dennis Li, Maarten Lankhorst,
	Andrey Grodzovsky, Sonny Jiang, Boris Brezillon, Tian Tao,
	Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Thu, Jul 08, 2021 at 09:53:00AM +0200, Christian König wrote:
> On 08.07.21 at 09:19, Daniel Vetter wrote:
> > On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
> > > > On 07.07.21 at 18:32, Daniel Vetter wrote:
> > > > > On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> > > > > > On 07.07.21 at 14:13, Daniel Vetter wrote:
> > > > > > > On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> > > > > > > > On 07.07.21 at 13:14, Daniel Vetter wrote:
> > > > > > > > > On Wed, Jul 7, 2021 at 11:30 AM Christian König
> > > > > > > > > <christian.koenig@amd.com> wrote:
> > > > > > > > > > On 02.07.21 at 23:38, Daniel Vetter wrote:
> > > > > > > > > > > This is a very confusingly named function, because not just does it
> > > > > > > > > > > init an object, it arms it and provides a point of no return for
> > > > > > > > > > > pushing a job into the scheduler. It would be nice if that's a bit
> > > > > > > > > > > clearer in the interface.
> > > > > > > > > > > 
> > > > > > > > > > > But the real reason is that I want to push the dependency tracking
> > > > > > > > > > > helpers into the scheduler code, and that means drm_sched_job_init
> > > > > > > > > > > must be called a lot earlier, without arming the job.
> > > > > > > > > > > 
> > > > > > > > > > > v2:
> > > > > > > > > > > - don't change .gitignore (Steven)
> > > > > > > > > > > - don't forget v3d (Emma)
> > > > > > > > > > > 
> > > > > > > > > > > v3: Emma noticed that I leak the memory allocated in
> > > > > > > > > > > drm_sched_job_init if we bail out before the point of no return in
> > > > > > > > > > > subsequent driver patches. To be able to fix this change
> > > > > > > > > > > drm_sched_job_cleanup() so it can handle being called both before and
> > > > > > > > > > > after drm_sched_job_arm().
> > > > > > > > > > Thinking more about this, I'm not sure if this really works.
> > > > > > > > > > 
> > > > > > > > > > See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> > > > > > > > > > to update the entity->rq association.
> > > > > > > > > > 
> > > > > > > > > > And that can only be done later on when we arm the fence as well.
> > > > > > > > > Hm yeah, but that's a bug in the existing code I think: We already
> > > > > > > > > fail to clean up if we fail to allocate the fences. So I think the
> > > > > > > > > right thing to do here is to split the checks into job_init, and do
> > > > > > > > > the actual arming/rq selection in job_arm? I'm not entirely sure
> > > > > > > > > what's all going on there, the first check looks a bit like trying to
> > > > > > > > > schedule before the entity is set up, which is a driver bug and should
> > > > > > > > > have a WARN_ON?
> > > > > > > > No you misunderstood me, the problem is something else.
> > > > > > > > 
> > > > > > > > You asked previously why the call to drm_sched_job_init() was so late in
> > > > > > > > the CS.
> > > > > > > > 
> > > > > > > > The reason for this was not only the scheduler fence init, but also the
> > > > > > > > call to drm_sched_entity_select_rq().
> > > > > > > Ah ok, I think I can fix that. Needs a prep patch to first make
> > > > > > > drm_sched_entity_select_rq() infallible, then it should be easy to do.
> > > > > > > 
> > > > > > > > > The 2nd check around last_scheduled I have honestly no idea what it's
> > > > > > > > > even trying to do.
> > > > > > > > You mean that here?
> > > > > > > > 
> > > > > > > >             fence = READ_ONCE(entity->last_scheduled);
> > > > > > > >             if (fence && !dma_fence_is_signaled(fence))
> > > > > > > >                     return;
> > > > > > > > 
> > > > > > > > This makes sure that load balancing is not moving the entity to a
> > > > > > > > different scheduler while there are still jobs running from this entity
> > > > > > > > on the hardware.
> > > > > > > Yeah after a nap that idea crossed my mind too. But now I have locking
> > > > > > > questions, afaiui the scheduler thread updates this, without taking
> > > > > > > any locks - entity dequeuing is lockless. And here we read the fence
> > > > > > > and then seem to yolo check whether it's signalled? What's preventing
> > > > > > > a use-after-free here? There's no rcu or anything going on here at
> > > > > > > all, and it's outside of the spinlock section, which starts a bit
> > > > > > > further down.
> > > > > > The last_scheduled fence of an entity can only change when there are
> > > > > > jobs queued on the entity, and we have just ruled that out in the
> > > > > > check before.
> > > > > There aren't any barriers, so the cpu could easily run the two checks
> > > > > the other way round. I'll ponder this and figure out where exactly we
> > > > > need docs for the constraint and/or barriers to make this work as
> > > > > intended. As-is I'm not seeing how it does ...
> > > > spsc_queue_count() provides the necessary barrier with the atomic_read().
> > > atomic_t is fully unordered, except when it's a read-modify-write
> > Wasn't awake yet, I think the rule is read-modify-write and return
> > previous value gives you a full barrier. So stuff like cmpxchg, but also
> > a few others. See atomic_t.txt under the ORDERING heading (yes that
> > maintainer refuses to accept .rst so I can't just link you to the
> > right section, it's silly). get/set and even RMW atomic ops that don't
> > return anything are all fully unordered.
> 
> As far as I know that's not completely correct. The rules around atomics I
> once learned are:
> 
> 1. Everything which modifies something is a write barrier.
> 2. Everything which returns something is a read barrier.
> 
> And I know a whole bunch of use cases where this is relied upon in the core
> kernel, so I'm pretty sure that's correct.

That's against what the doc says, and also it would mean stuff like
atomic_read_acquire or smp_mb__after/before_atomic is completely pointless.

On x86 you're right, anywhere else where there's no total store ordering I
think you're wrong.

If there's code that relies on this it needs to be fixed and properly
documented. I did go through the spsc_queue code a bit, and it might be
better to just replace this with a core data structure.
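
Concretely, if we want to keep the lockless fast path I think the pairing
has to be made explicit, roughly like this (just a sketch, not even
compile tested, and the open-coded job_count access would want a proper
spsc_queue helper):

/* scheduler thread, in drm_sched_entity_pop_job(): */
entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);

/* order the last_scheduled store before the count drop; pairs with the
 * atomic_read_acquire() on the CS side */
smp_mb__before_atomic();
atomic_dec(&entity->job_queue.job_count);

/* CS side, in drm_sched_entity_select_rq(): */
if (atomic_read_acquire(&entity->job_queue.job_count))
	return;

fence = READ_ONCE(entity->last_scheduled);
if (fence && !dma_fence_is_signaled(fence))
	return;

The acquire read guarantees the subsequent load of last_scheduled can't be
hoisted above the count check, and the full barrier before the atomic_dec()
gives us the release ordering on the other side.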
-Daniel

> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
> the read barrier is the atomic_read() in spsc_queue_count().
> 
> The READ_ONCE() is actually not even necessary as far as I can see.
> 
> Christian.
> 
> > -Daniel
> > 
> > 
> > > atomic op, then it's a full barrier. So yeah you need more here. But
> > > also since you only need a read barrier on one side, and a write
> > > barrier on the other, you don't actually need any cpu barriers on x86.
> > > And READ_ONCE gives you the compiler barrier on one side at least, I
> > > haven't found it on the writer side yet.
> > > 
> > > > But yes a comment would be really nice here. I had to think for a while
> > > > why we don't need this as well.
> > > I'm typing a patch, which after a night's sleep I realized has the
> > > wrong barriers. And now I'm also typing some doc improvements for
> > > drm_sched_entity and related functions.
> > > 
> > > > Christian.
> > > > 
> > > > > -Daniel
> > > > > 
> > > > > > Christian.
> > > > > > 
> > > > > > 
> > > > > > > -Daniel
> > > > > > > 
> > > > > > > > Regards
> > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > > -Daniel
> > > > > > > > > 
> > > > > > > > > > Christian.
> > > > > > > > > > 
> > > > > > > > > > > Also improve the kerneldoc for this.
> > > > > > > > > > > 
> > > > > > > > > > > Acked-by: Steven Price <steven.price@arm.com> (v2)
> > > > > > > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > > > > > > > > Cc: Lucas Stach <l.stach@pengutronix.de>
> > > > > > > > > > > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > > > > > > > > > > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > > > > > > > > > > Cc: Qiang Yu <yuq825@gmail.com>
> > > > > > > > > > > Cc: Rob Herring <robh@kernel.org>
> > > > > > > > > > > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > > > > > > > > > > Cc: Steven Price <steven.price@arm.com>
> > > > > > > > > > > Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> > > > > > > > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > > > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > > > > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > > > > > > > > Cc: "Christian König" <christian.koenig@amd.com>
> > > > > > > > > > > Cc: Masahiro Yamada <masahiroy@kernel.org>
> > > > > > > > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > > > > > > > Cc: Adam Borowski <kilobyte@angband.pl>
> > > > > > > > > > > Cc: Nick Terrell <terrelln@fb.com>
> > > > > > > > > > > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > > > > > > > Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > > > > > > > > Cc: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > > > > > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > > > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > > > > > > > > Cc: Nirmoy Das <nirmoy.das@amd.com>
> > > > > > > > > > > Cc: Deepak R Varma <mh12gx2825@gmail.com>
> > > > > > > > > > > Cc: Lee Jones <lee.jones@linaro.org>
> > > > > > > > > > > Cc: Kevin Wang <kevin1.wang@amd.com>
> > > > > > > > > > > Cc: Chen Li <chenli@uniontech.com>
> > > > > > > > > > > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > > > > > > > > > > Cc: "Marek Olšák" <marek.olsak@amd.com>
> > > > > > > > > > > Cc: Dennis Li <Dennis.Li@amd.com>
> > > > > > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > > > > > > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > > > > > > > Cc: Sonny Jiang <sonny.jiang@amd.com>
> > > > > > > > > > > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > > > > > > > Cc: Tian Tao <tiantao6@hisilicon.com>
> > > > > > > > > > > Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > > > > > > > > > > Cc: etnaviv@lists.freedesktop.org
> > > > > > > > > > > Cc: lima@lists.freedesktop.org
> > > > > > > > > > > Cc: linux-media@vger.kernel.org
> > > > > > > > > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > > > > > > > > Cc: Emma Anholt <emma@anholt.net>
> > > > > > > > > > > ---
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> > > > > > > > > > >       drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> > > > > > > > > > >       include/drm/gpu_scheduler.h              |  7 +++-
> > > > > > > > > > >       10 files changed, 74 insertions(+), 14 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > > > index c5386d13eb4a..a4ec092af9a7 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > > > @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > > > > > > > > > >           if (r)
> > > > > > > > > > >                   goto error_unlock;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           /* No memory allocation is allowed while holding the notifier lock.
> > > > > > > > > > >            * The lock is held until amdgpu_cs_submit is finished and fence is
> > > > > > > > > > >            * added to BOs.
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > index d33e6d97cc89..5ddb955d2315 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> > > > > > > > > > >           if (r)
> > > > > > > > > > >                   return r;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           *f = dma_fence_get(&job->base.s_fence->finished);
> > > > > > > > > > >           amdgpu_job_free_resources(job);
> > > > > > > > > > >           drm_sched_entity_push_job(&job->base, entity);
> > > > > > > > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > > > > > > > > index feb6da1b6ceb..05f412204118 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > > > > > > > > @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > > > > > > > > >           if (ret)
> > > > > > > > > > >                   goto out_unlock;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&submit->sched_job);
> > > > > > > > > > > +
> > > > > > > > > > >           submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > > > > > > > > > >           submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> > > > > > > > > > >                                                   submit->out_fence, 0,
> > > > > > > > > > > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > > > > > > > > > > index dba8329937a3..38f755580507 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/lima/lima_sched.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > > > > > > > > > > @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> > > > > > > > > > >                   return err;
> > > > > > > > > > >           }
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&task->base);
> > > > > > > > > > > +
> > > > > > > > > > >           task->num_bos = num_bos;
> > > > > > > > > > >           task->vm = lima_vm_get(vm);
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > > > > > > > > > > index 71a72fb50e6b..2992dc85325f 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > > > > > > > > > > @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> > > > > > > > > > >                   goto unlock;
> > > > > > > > > > >           }
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> > > > > > > > > > > 
> > > > > > > > > > >           ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > > > > > > index 79554aa4dbb1..f7347c284886 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > > > > > > @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> > > > > > > > > > >        * @sched_job: job to submit
> > > > > > > > > > >        * @entity: scheduler entity
> > > > > > > > > > >        *
> > > > > > > > > > > - * Note: To guarantee that the order of insertion to queue matches
> > > > > > > > > > > - * the job's fence sequence number this function should be
> > > > > > > > > > > - * called with drm_sched_job_init under common lock.
> > > > > > > > > > > + * Note: To guarantee that the order of insertion to queue matches the job's
> > > > > > > > > > > + * fence sequence number this function should be called with drm_sched_job_arm()
> > > > > > > > > > > + * under common lock.
> > > > > > > > > > >        *
> > > > > > > > > > >        * Returns 0 for success, negative error code otherwise.
> > > > > > > > > > >        */
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > > > > > > > > > > index 69de2c76731f..c451ee9a30d7 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > > > > > > > > > > @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> > > > > > > > > > >        *
> > > > > > > > > > >        * Free up the fence memory after the RCU grace period.
> > > > > > > > > > >        */
> > > > > > > > > > > -static void drm_sched_fence_free(struct rcu_head *rcu)
> > > > > > > > > > > +void drm_sched_fence_free(struct rcu_head *rcu)
> > > > > > > > > > >       {
> > > > > > > > > > >           struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> > > > > > > > > > >           struct drm_sched_fence *fence = to_drm_sched_fence(f);
> > > > > > > > > > > @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> > > > > > > > > > >       }
> > > > > > > > > > >       EXPORT_SYMBOL(to_drm_sched_fence);
> > > > > > > > > > > 
> > > > > > > > > > > -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > > > > > > > > > > -                                            void *owner)
> > > > > > > > > > > +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> > > > > > > > > > > +                                           void *owner)
> > > > > > > > > > >       {
> > > > > > > > > > >           struct drm_sched_fence *fence = NULL;
> > > > > > > > > > > -     unsigned seq;
> > > > > > > > > > > 
> > > > > > > > > > >           fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> > > > > > > > > > >           if (fence == NULL)
> > > > > > > > > > > @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > > > > > > > > > >           fence->sched = entity->rq->sched;
> > > > > > > > > > >           spin_lock_init(&fence->lock);
> > > > > > > > > > > 
> > > > > > > > > > > +     return fence;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > > > > > > > > > > +                       struct drm_sched_entity *entity)
> > > > > > > > > > > +{
> > > > > > > > > > > +     unsigned seq;
> > > > > > > > > > > +
> > > > > > > > > > >           seq = atomic_inc_return(&entity->fence_seq);
> > > > > > > > > > >           dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> > > > > > > > > > >                          &fence->lock, entity->fence_context, seq);
> > > > > > > > > > >           dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> > > > > > > > > > >                          &fence->lock, entity->fence_context + 1, seq);
> > > > > > > > > > > -
> > > > > > > > > > > -     return fence;
> > > > > > > > > > >       }
> > > > > > > > > > > 
> > > > > > > > > > >       module_init(drm_sched_fence_slab_init);
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > index 33c414d55fab..5e84e1500c32 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > @@ -48,9 +48,11 @@
> > > > > > > > > > >       #include <linux/wait.h>
> > > > > > > > > > >       #include <linux/sched.h>
> > > > > > > > > > >       #include <linux/completion.h>
> > > > > > > > > > > +#include <linux/dma-resv.h>
> > > > > > > > > > >       #include <uapi/linux/sched/types.h>
> > > > > > > > > > > 
> > > > > > > > > > >       #include <drm/drm_print.h>
> > > > > > > > > > > +#include <drm/drm_gem.h>
> > > > > > > > > > >       #include <drm/gpu_scheduler.h>
> > > > > > > > > > >       #include <drm/spsc_queue.h>
> > > > > > > > > > > 
> > > > > > > > > > > @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> > > > > > > > > > > 
> > > > > > > > > > >       /**
> > > > > > > > > > >        * drm_sched_job_init - init a scheduler job
> > > > > > > > > > > - *
> > > > > > > > > > >        * @job: scheduler job to init
> > > > > > > > > > >        * @entity: scheduler entity to use
> > > > > > > > > > >        * @owner: job owner for debugging
> > > > > > > > > > > @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> > > > > > > > > > >        * Refer to drm_sched_entity_push_job() documentation
> > > > > > > > > > >        * for locking considerations.
> > > > > > > > > > >        *
> > > > > > > > > > > + * Drivers must make sure to call drm_sched_job_cleanup() if this function
> > > > > > > > > > > + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
> > > > > > > > > > > + *
> > > > > > > > > > >        * Returns 0 for success, negative error code otherwise.
> > > > > > > > > > >        */
> > > > > > > > > > >       int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > > @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > >           job->sched = sched;
> > > > > > > > > > >           job->entity = entity;
> > > > > > > > > > >           job->s_priority = entity->rq - sched->sched_rq;
> > > > > > > > > > > -     job->s_fence = drm_sched_fence_create(entity, owner);
> > > > > > > > > > > +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> > > > > > > > > > >           if (!job->s_fence)
> > > > > > > > > > >                   return -ENOMEM;
> > > > > > > > > > >           job->id = atomic64_inc_return(&sched->job_id_count);
> > > > > > > > > > > @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > >       EXPORT_SYMBOL(drm_sched_job_init);
> > > > > > > > > > > 
> > > > > > > > > > >       /**
> > > > > > > > > > > - * drm_sched_job_cleanup - clean up scheduler job resources
> > > > > > > > > > > + * drm_sched_job_arm - arm a scheduler job for execution
> > > > > > > > > > > + * @job: scheduler job to arm
> > > > > > > > > > > + *
> > > > > > > > > > > + * This arms a scheduler job for execution. Specifically it initializes the
> > > > > > > > > > > + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> > > > > > > > > > > + * or other places that need to track the completion of this job.
> > > > > > > > > > > + *
> > > > > > > > > > > + * Refer to drm_sched_entity_push_job() documentation for locking
> > > > > > > > > > > + * considerations.
> > > > > > > > > > >        *
> > > > > > > > > > > + * This can only be called if drm_sched_job_init() succeeded.
> > > > > > > > > > > + */
> > > > > > > > > > > +void drm_sched_job_arm(struct drm_sched_job *job)
> > > > > > > > > > > +{
> > > > > > > > > > > +     drm_sched_fence_init(job->s_fence, job->entity);
> > > > > > > > > > > +}
> > > > > > > > > > > +EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > > > > > > > +
> > > > > > > > > > > +/**
> > > > > > > > > > > + * drm_sched_job_cleanup - clean up scheduler job resources
> > > > > > > > > > >        * @job: scheduler job to clean up
> > > > > > > > > > > + *
> > > > > > > > > > > + * Cleans up the resources allocated with drm_sched_job_init().
> > > > > > > > > > > + *
> > > > > > > > > > > + * Drivers should call this from their error unwind code if @job is aborted
> > > > > > > > > > > + * before drm_sched_job_arm() is called.
> > > > > > > > > > > + *
> > > > > > > > > > > + * After that point of no return @job is committed to be executed by the
> > > > > > > > > > > + * scheduler, and this function should be called from the
> > > > > > > > > > > + * &drm_sched_backend_ops.free_job callback.
> > > > > > > > > > >        */
> > > > > > > > > > >       void drm_sched_job_cleanup(struct drm_sched_job *job)
> > > > > > > > > > >       {
> > > > > > > > > > > -     dma_fence_put(&job->s_fence->finished);
> > > > > > > > > > > +     if (kref_read(&job->s_fence->finished.refcount)) {
> > > > > > > > > > > +             /* drm_sched_job_arm() has been called */
> > > > > > > > > > > +             dma_fence_put(&job->s_fence->finished);
> > > > > > > > > > > +     } else {
> > > > > > > > > > > +             /* aborted job before committing to run it */
> > > > > > > > > > > +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> > > > > > > > > > > +     }
> > > > > > > > > > > +
> > > > > > > > > > >           job->s_fence = NULL;
> > > > > > > > > > >       }
> > > > > > > > > > >       EXPORT_SYMBOL(drm_sched_job_cleanup);
> > > > > > > > > > > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> > > > > > > > > > > index 4eb354226972..5c3a99027ecd 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > > > > > > > > > > @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> > > > > > > > > > >           if (ret)
> > > > > > > > > > >                   return ret;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> > > > > > > > > > > 
> > > > > > > > > > >           /* put by scheduler job completion */
> > > > > > > > > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > > > > > > > > index 88ae7f331bb1..83afc3aa8e2f 100644
> > > > > > > > > > > --- a/include/drm/gpu_scheduler.h
> > > > > > > > > > > +++ b/include/drm/gpu_scheduler.h
> > > > > > > > > > > @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> > > > > > > > > > >       int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > >                          struct drm_sched_entity *entity,
> > > > > > > > > > >                          void *owner);
> > > > > > > > > > > +void drm_sched_job_arm(struct drm_sched_job *job);
> > > > > > > > > > >       void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > > > > > > > > > >                                       struct drm_gpu_scheduler **sched_list,
> > > > > > > > > > >                                          unsigned int num_sched_list);
> > > > > > > > > > > @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> > > > > > > > > > >                                      enum drm_sched_priority priority);
> > > > > > > > > > >       bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> > > > > > > > > > > 
> > > > > > > > > > > -struct drm_sched_fence *drm_sched_fence_create(
> > > > > > > > > > > +struct drm_sched_fence *drm_sched_fence_alloc(
> > > > > > > > > > >           struct drm_sched_entity *s_entity, void *owner);
> > > > > > > > > > > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > > > > > > > > > > +                       struct drm_sched_entity *entity);
> > > > > > > > > > > +void drm_sched_fence_free(struct rcu_head *rcu);
> > > > > > > > > > > +
> > > > > > > > > > >       void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> > > > > > > > > > >       void drm_sched_fence_finished(struct drm_sched_fence *fence);
> > > > > > > > > > > 
> > > 
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> > 
> > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread
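
For reference, the API contract in the quoted patch maps onto a driver-side
pattern roughly like the sketch below: drm_sched_job_init() may fail and must
be unwound with drm_sched_job_cleanup(), drm_sched_job_arm() is the point of
no return, and arm + push happen under one lock so that fence seqno order
matches queue insertion order. This is only an illustration; all foo_* names
are hypothetical placeholders, and only the drm_sched_* calls and their
ordering come from the patch itself.

#include <drm/gpu_scheduler.h>

struct foo_device {
	struct mutex submit_lock;	/* serializes arm + push */
};

struct foo_job {
	struct drm_sched_job base;
};

int foo_validate(struct foo_job *job);	/* placeholder step that may fail */

static int foo_submit(struct foo_device *fdev, struct foo_job *job,
		      struct drm_sched_entity *entity)
{
	int ret;

	ret = drm_sched_job_init(&job->base, entity, fdev);
	if (ret)
		return ret;

	ret = foo_validate(job);
	if (ret)
		goto err_cleanup;

	mutex_lock(&fdev->submit_lock);
	drm_sched_job_arm(&job->base);	/* point of no return */
	/* s_fence is now initialized and may be exposed, e.g. as an out-fence */
	drm_sched_entity_push_job(&job->base, entity);
	mutex_unlock(&fdev->submit_lock);

	return 0;

err_cleanup:
	/*
	 * Aborted before drm_sched_job_arm(): the un-armed fence still has
	 * a zero refcount, so drm_sched_job_cleanup() frees it directly
	 * instead of dropping a reference.
	 */
	drm_sched_job_cleanup(&job->base);
	return ret;
}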

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
@ 2021-07-08 10:02                         ` Daniel Vetter
  0 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-08 10:02 UTC (permalink / raw)
  To: Christian König
  Cc: Emma Anholt, Adam Borowski, David Airlie, Daniel Vetter,
	DRI Development, Sonny Jiang, Nirmoy Das, Daniel Vetter,
	Lee Jones, Jack Zhang, lima, Mauro Carvalho Chehab,
	Masahiro Yamada, Steven Price, Luben Tuikov, Alyssa Rosenzweig,
	Sami Tolvanen, Viresh Kumar, Dave Airlie, Dennis Li, Chen Li,
	Paul Menzel, Kevin Wang, Kees Cook, Marek Olšák,
	Russell King, The etnaviv authors,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Nick Terrell,
	Deepak R Varma, Tomeu Vizoso, Boris Brezillon, Qiang Yu,
	Alex Deucher, Tian Tao, open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jul 08, 2021 at 09:53:00AM +0200, Christian König wrote:
> On 08.07.21 at 09:19, Daniel Vetter wrote:
> > On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
> > > > On 07.07.21 at 18:32, Daniel Vetter wrote:
> > > > > On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> > > > > > On 07.07.21 at 14:13, Daniel Vetter wrote:
> > > > > > > On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> > > > > > > > On 07.07.21 at 13:14, Daniel Vetter wrote:
> > > > > > > > > On Wed, Jul 7, 2021 at 11:30 AM Christian König
> > > > > > > > > <christian.koenig@amd.com> wrote:
> > > > > > > > > > On 02.07.21 at 23:38, Daniel Vetter wrote:
> > > > > > > > > > > This is a very confusingly named function, because not just does it
> > > > > > > > > > > init an object, it arms it and provides a point of no return for
> > > > > > > > > > > pushing a job into the scheduler. It would be nice if that's a bit
> > > > > > > > > > > clearer in the interface.
> > > > > > > > > > > 
> > > > > > > > > > > But the real reason is that I want to push the dependency tracking
> > > > > > > > > > > helpers into the scheduler code, and that means drm_sched_job_init
> > > > > > > > > > > must be called a lot earlier, without arming the job.
> > > > > > > > > > > 
> > > > > > > > > > > v2:
> > > > > > > > > > > - don't change .gitignore (Steven)
> > > > > > > > > > > - don't forget v3d (Emma)
> > > > > > > > > > > 
> > > > > > > > > > > v3: Emma noticed that I leak the memory allocated in
> > > > > > > > > > > drm_sched_job_init if we bail out before the point of no return in
> > > > > > > > > > > subsequent driver patches. To be able to fix this change
> > > > > > > > > > > drm_sched_job_cleanup() so it can handle being called both before and
> > > > > > > > > > > after drm_sched_job_arm().
> > > > > > > > > > Thinking more about this, I'm not sure if this really works.
> > > > > > > > > > 
> > > > > > > > > > See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> > > > > > > > > > to update the entity->rq association.
> > > > > > > > > > 
> > > > > > > > > > And that can only be done later on when we arm the fence as well.
> > > > > > > > > Hm yeah, but that's a bug in the existing code I think: We already
> > > > > > > > > fail to clean up if we fail to allocate the fences. So I think the
> > > > > > > > > right thing to do here is to split the checks into job_init, and do
> > > > > > > > > the actual arming/rq selection in job_arm? I'm not entirely sure
> > > > > > > > > what's all going on there, the first check looks a bit like trying to
> > > > > > > > > schedule before the entity is set up, which is a driver bug and should
> > > > > > > > > have a WARN_ON?
> > > > > > > > No you misunderstood me, the problem is something else.
> > > > > > > > 
> > > > > > > > You asked previously why the call to drm_sched_job_init() was so late in
> > > > > > > > the CS.
> > > > > > > > 
> > > > > > > > The reason for this was not alone the scheduler fence init, but also the
> > > > > > > > call to drm_sched_entity_select_rq().
> > > > > > > Ah ok, I think I can fix that. Needs a prep patch to first make
> > > > > > > drm_sched_entity_select infallible, then should be easy to do.
> > > > > > > 
> > > > > > > > > > The 2nd check around last_scheduled I have honestly no idea what it's
> > > > > > > > > even trying to do.
> > > > > > > > You mean that here?
> > > > > > > > 
> > > > > > > >             fence = READ_ONCE(entity->last_scheduled);
> > > > > > > >             if (fence && !dma_fence_is_signaled(fence))
> > > > > > > >                     return;
> > > > > > > > 
> > > > > > > > This makes sure that load balancing is not moving the entity to a
> > > > > > > > different scheduler while there are still jobs running from this entity
> > > > > > > > on the hardware,
> > > > > > > Yeah after a nap that idea crossed my mind too. But now I have locking
> > > > > > > questions, afaiui the scheduler thread updates this, without taking
> > > > > > > any locks - entity dequeuing is lockless. And here we read the fence
> > > > > > > and then seem to yolo check whether it's signalled? What's preventing
> > > > > > > a use-after-free here? There's no rcu or anything going on here at
> > > > > > > all, and it's outside of the spinlock section, which starts a bit
> > > > > > > further down.
> > > > > > The last_scheduled fence of an entity can only change when there are
> > > > > > jobs queued on the entity, and we have just ruled that out in the
> > > > > > check before.
> > > > > There aren't any barriers, so the cpu could easily run the two checks
> > > > > the other way round. I'll ponder this and figure out where exactly we
> > > > > need docs for the constraint and/or barriers to make this work as
> > > > > intended. As-is I'm not seeing how it does ...
> > > > spsc_queue_count() provides the necessary barrier with the atomic_read().
> > > atomic_t is fully unordered, except when it's a read-modify-write
> > Wasn't awake yet, I think the rule is read-modify-write ops that return the
> > previous value give you a full barrier. So stuff like cmpxchg, but also
> > a few others. See atomic_t.txt under the ORDERING heading (yes that
> > maintainer refuses to accept .rst so I can't just link you to the
> > right section, it's silly). get/set and even RMW atomic ops that don't
> > return anything are all fully unordered.
> 
> As far as I know that's not completely correct. The rules around atomics I
> once learned are:
> 
> 1. Everything which modifies something is a write barrier.
> 2. Everything which returns something is a read barrier.
> 
> And I know a whole bunch of use cases where this is relied upon in the core
> kernel, so I'm pretty sure that's correct.

That's against what the doc says, and also it would mean stuff like
atomic_read_acquire or smp_mb__after/before_atomic is completely pointless.

On x86 you're right; anywhere else, where there's no total store ordering,
you're wrong.

If there's code that relies on this it needs to be fixed and properly
documented. I did go through the spsc queue code a bit, and it might be
better to just replace this with a core data structure.
-Daniel

> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
> the read barrier is the atomic_read() in spsc_queue_count().
> 
> The READ_ONCE() is actually not even necessary as far as I can see.
> 
> Christian.
> 
> > -Daniel
> > 
> > 
> > > atomic op, then it's a full barrier. So yeah you need more here. But
> > > also since you only need a read barrier on one side, and a write
> > > barrier on the other, you don't actually need CPU barriers on x86.
> > > And READ_ONCE gives you the compiler barrier on one side at least, I
> > > haven't found it on the writer side yet.
> > > 
> > > > But yes a comment would be really nice here. I had to think for a while
> > > > why we don't need this as well.
> > > I'm typing a patch, which after a night's sleep I realized has the
> > > wrong barriers. And now I'm also typing some doc improvements for
> > > drm_sched_entity and related functions.
> > > 
> > > > Christian.
> > > > 
> > > > > -Daniel
> > > > > 
> > > > > > Christian.
> > > > > > 
> > > > > > 
> > > > > > > -Daniel
> > > > > > > 
> > > > > > > > Regards
> > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > > -Daniel
> > > > > > > > > 
> > > > > > > > > > Christian.
> > > > > > > > > > 
> > > > > > > > > > > Also improve the kerneldoc for this.
> > > > > > > > > > > 
> > > > > > > > > > > Acked-by: Steven Price <steven.price@arm.com> (v2)
> > > > > > > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > > > > > > > > Cc: Lucas Stach <l.stach@pengutronix.de>
> > > > > > > > > > > Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> > > > > > > > > > > Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> > > > > > > > > > > Cc: Qiang Yu <yuq825@gmail.com>
> > > > > > > > > > > Cc: Rob Herring <robh@kernel.org>
> > > > > > > > > > > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > > > > > > > > > > Cc: Steven Price <steven.price@arm.com>
> > > > > > > > > > > Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> > > > > > > > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > > > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > > > > > > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > > > > > > > > > > Cc: "Christian König" <christian.koenig@amd.com>
> > > > > > > > > > > Cc: Masahiro Yamada <masahiroy@kernel.org>
> > > > > > > > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > > > > > > > Cc: Adam Borowski <kilobyte@angband.pl>
> > > > > > > > > > > Cc: Nick Terrell <terrelln@fb.com>
> > > > > > > > > > > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > > > > > > > Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> > > > > > > > > > > Cc: Sami Tolvanen <samitolvanen@google.com>
> > > > > > > > > > > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > > > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > > > > > > > > Cc: Nirmoy Das <nirmoy.das@amd.com>
> > > > > > > > > > > Cc: Deepak R Varma <mh12gx2825@gmail.com>
> > > > > > > > > > > Cc: Lee Jones <lee.jones@linaro.org>
> > > > > > > > > > > Cc: Kevin Wang <kevin1.wang@amd.com>
> > > > > > > > > > > Cc: Chen Li <chenli@uniontech.com>
> > > > > > > > > > > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > > > > > > > > > > Cc: "Marek Olšák" <marek.olsak@amd.com>
> > > > > > > > > > > Cc: Dennis Li <Dennis.Li@amd.com>
> > > > > > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > > > > > > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > > > > > > > Cc: Sonny Jiang <sonny.jiang@amd.com>
> > > > > > > > > > > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > > > > > > > Cc: Tian Tao <tiantao6@hisilicon.com>
> > > > > > > > > > > Cc: Jack Zhang <Jack.Zhang1@amd.com>
> > > > > > > > > > > Cc: etnaviv@lists.freedesktop.org
> > > > > > > > > > > Cc: lima@lists.freedesktop.org
> > > > > > > > > > > Cc: linux-media@vger.kernel.org
> > > > > > > > > > > Cc: linaro-mm-sig@lists.linaro.org
> > > > > > > > > > > Cc: Emma Anholt <emma@anholt.net>
> > > > > > > > > > > ---
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> > > > > > > > > > >       drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> > > > > > > > > > >       drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> > > > > > > > > > >       include/drm/gpu_scheduler.h              |  7 +++-
> > > > > > > > > > >       10 files changed, 74 insertions(+), 14 deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > > > index c5386d13eb4a..a4ec092af9a7 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > > > @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > > > > > > > > > >           if (r)
> > > > > > > > > > >                   goto error_unlock;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           /* No memory allocation is allowed while holding the notifier lock.
> > > > > > > > > > >            * The lock is held until amdgpu_cs_submit is finished and fence is
> > > > > > > > > > >            * added to BOs.
> > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > index d33e6d97cc89..5ddb955d2315 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > > > > > > > @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> > > > > > > > > > >           if (r)
> > > > > > > > > > >                   return r;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           *f = dma_fence_get(&job->base.s_fence->finished);
> > > > > > > > > > >           amdgpu_job_free_resources(job);
> > > > > > > > > > >           drm_sched_entity_push_job(&job->base, entity);
> > > > > > > > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > > > > > > > > index feb6da1b6ceb..05f412204118 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > > > > > > > > > > @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> > > > > > > > > > >           if (ret)
> > > > > > > > > > >                   goto out_unlock;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&submit->sched_job);
> > > > > > > > > > > +
> > > > > > > > > > >           submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> > > > > > > > > > >           submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> > > > > > > > > > >                                                   submit->out_fence, 0,
> > > > > > > > > > > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > > > > > > > > > > index dba8329937a3..38f755580507 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/lima/lima_sched.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > > > > > > > > > > @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> > > > > > > > > > >                   return err;
> > > > > > > > > > >           }
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&task->base);
> > > > > > > > > > > +
> > > > > > > > > > >           task->num_bos = num_bos;
> > > > > > > > > > >           task->vm = lima_vm_get(vm);
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > > > > > > > > > > index 71a72fb50e6b..2992dc85325f 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > > > > > > > > > > @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> > > > > > > > > > >                   goto unlock;
> > > > > > > > > > >           }
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> > > > > > > > > > > 
> > > > > > > > > > >           ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > > > > > > index 79554aa4dbb1..f7347c284886 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > > > > > > > @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> > > > > > > > > > >        * @sched_job: job to submit
> > > > > > > > > > >        * @entity: scheduler entity
> > > > > > > > > > >        *
> > > > > > > > > > > - * Note: To guarantee that the order of insertion to queue matches
> > > > > > > > > > > - * the job's fence sequence number this function should be
> > > > > > > > > > > - * called with drm_sched_job_init under common lock.
> > > > > > > > > > > + * Note: To guarantee that the order of insertion to queue matches the job's
> > > > > > > > > > > + * fence sequence number this function should be called with drm_sched_job_arm()
> > > > > > > > > > > + * under common lock.
> > > > > > > > > > >        *
> > > > > > > > > > >        * Returns 0 for success, negative error code otherwise.
> > > > > > > > > > >        */
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> > > > > > > > > > > index 69de2c76731f..c451ee9a30d7 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_fence.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> > > > > > > > > > > @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> > > > > > > > > > >        *
> > > > > > > > > > >        * Free up the fence memory after the RCU grace period.
> > > > > > > > > > >        */
> > > > > > > > > > > -static void drm_sched_fence_free(struct rcu_head *rcu)
> > > > > > > > > > > +void drm_sched_fence_free(struct rcu_head *rcu)
> > > > > > > > > > >       {
> > > > > > > > > > >           struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> > > > > > > > > > >           struct drm_sched_fence *fence = to_drm_sched_fence(f);
> > > > > > > > > > > @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> > > > > > > > > > >       }
> > > > > > > > > > >       EXPORT_SYMBOL(to_drm_sched_fence);
> > > > > > > > > > > 
> > > > > > > > > > > -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > > > > > > > > > > -                                            void *owner)
> > > > > > > > > > > +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> > > > > > > > > > > +                                           void *owner)
> > > > > > > > > > >       {
> > > > > > > > > > >           struct drm_sched_fence *fence = NULL;
> > > > > > > > > > > -     unsigned seq;
> > > > > > > > > > > 
> > > > > > > > > > >           fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> > > > > > > > > > >           if (fence == NULL)
> > > > > > > > > > > @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> > > > > > > > > > >           fence->sched = entity->rq->sched;
> > > > > > > > > > >           spin_lock_init(&fence->lock);
> > > > > > > > > > > 
> > > > > > > > > > > +     return fence;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > > > > > > > > > > +                       struct drm_sched_entity *entity)
> > > > > > > > > > > +{
> > > > > > > > > > > +     unsigned seq;
> > > > > > > > > > > +
> > > > > > > > > > >           seq = atomic_inc_return(&entity->fence_seq);
> > > > > > > > > > >           dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> > > > > > > > > > >                          &fence->lock, entity->fence_context, seq);
> > > > > > > > > > >           dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> > > > > > > > > > >                          &fence->lock, entity->fence_context + 1, seq);
> > > > > > > > > > > -
> > > > > > > > > > > -     return fence;
> > > > > > > > > > >       }
> > > > > > > > > > > 
> > > > > > > > > > >       module_init(drm_sched_fence_slab_init);
> > > > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > index 33c414d55fab..5e84e1500c32 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > > > @@ -48,9 +48,11 @@
> > > > > > > > > > >       #include <linux/wait.h>
> > > > > > > > > > >       #include <linux/sched.h>
> > > > > > > > > > >       #include <linux/completion.h>
> > > > > > > > > > > +#include <linux/dma-resv.h>
> > > > > > > > > > >       #include <uapi/linux/sched/types.h>
> > > > > > > > > > > 
> > > > > > > > > > >       #include <drm/drm_print.h>
> > > > > > > > > > > +#include <drm/drm_gem.h>
> > > > > > > > > > >       #include <drm/gpu_scheduler.h>
> > > > > > > > > > >       #include <drm/spsc_queue.h>
> > > > > > > > > > > 
> > > > > > > > > > > @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> > > > > > > > > > > 
> > > > > > > > > > >       /**
> > > > > > > > > > >        * drm_sched_job_init - init a scheduler job
> > > > > > > > > > > - *
> > > > > > > > > > >        * @job: scheduler job to init
> > > > > > > > > > >        * @entity: scheduler entity to use
> > > > > > > > > > >        * @owner: job owner for debugging
> > > > > > > > > > > @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> > > > > > > > > > >        * Refer to drm_sched_entity_push_job() documentation
> > > > > > > > > > >        * for locking considerations.
> > > > > > > > > > >        *
>>>>>>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
>>>>>>>>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
> > > > > > > > > > > + *
> > > > > > > > > > >        * Returns 0 for success, negative error code otherwise.
> > > > > > > > > > >        */
> > > > > > > > > > >       int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > > @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > >           job->sched = sched;
> > > > > > > > > > >           job->entity = entity;
> > > > > > > > > > >           job->s_priority = entity->rq - sched->sched_rq;
> > > > > > > > > > > -     job->s_fence = drm_sched_fence_create(entity, owner);
> > > > > > > > > > > +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> > > > > > > > > > >           if (!job->s_fence)
> > > > > > > > > > >                   return -ENOMEM;
> > > > > > > > > > >           job->id = atomic64_inc_return(&sched->job_id_count);
> > > > > > > > > > > @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > >       EXPORT_SYMBOL(drm_sched_job_init);
> > > > > > > > > > > 
> > > > > > > > > > >       /**
> > > > > > > > > > > - * drm_sched_job_cleanup - clean up scheduler job resources
> > > > > > > > > > > + * drm_sched_job_arm - arm a scheduler job for execution
> > > > > > > > > > > + * @job: scheduler job to arm
> > > > > > > > > > > + *
> > > > > > > > > > > + * This arms a scheduler job for execution. Specifically it initializes the
> > > > > > > > > > > + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> > > > > > > > > > > + * or other places that need to track the completion of this job.
> > > > > > > > > > > + *
> > > > > > > > > > > + * Refer to drm_sched_entity_push_job() documentation for locking
> > > > > > > > > > > + * considerations.
> > > > > > > > > > >        *
> > > > > > > > > > > + * This can only be called if drm_sched_job_init() succeeded.
> > > > > > > > > > > + */
> > > > > > > > > > > +void drm_sched_job_arm(struct drm_sched_job *job)
> > > > > > > > > > > +{
> > > > > > > > > > > +     drm_sched_fence_init(job->s_fence, job->entity);
> > > > > > > > > > > +}
> > > > > > > > > > > +EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > > > > > > > +
> > > > > > > > > > > +/**
> > > > > > > > > > > + * drm_sched_job_cleanup - clean up scheduler job resources
> > > > > > > > > > >        * @job: scheduler job to clean up
> > > > > > > > > > > + *
> > > > > > > > > > > + * Cleans up the resources allocated with drm_sched_job_init().
> > > > > > > > > > > + *
> > > > > > > > > > > + * Drivers should call this from their error unwind code if @job is aborted
> > > > > > > > > > > + * before drm_sched_job_arm() is called.
> > > > > > > > > > > + *
> > > > > > > > > > > + * After that point of no return @job is committed to be executed by the
> > > > > > > > > > > + * scheduler, and this function should be called from the
> > > > > > > > > > > + * &drm_sched_backend_ops.free_job callback.
> > > > > > > > > > >        */
> > > > > > > > > > >       void drm_sched_job_cleanup(struct drm_sched_job *job)
> > > > > > > > > > >       {
> > > > > > > > > > > -     dma_fence_put(&job->s_fence->finished);
>>>>>>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
> > > > > > > > > > > +             /* drm_sched_job_arm() has been called */
> > > > > > > > > > > +             dma_fence_put(&job->s_fence->finished);
> > > > > > > > > > > +     } else {
> > > > > > > > > > > +             /* aborted job before committing to run it */
> > > > > > > > > > > +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> > > > > > > > > > > +     }
> > > > > > > > > > > +
> > > > > > > > > > >           job->s_fence = NULL;
> > > > > > > > > > >       }
> > > > > > > > > > >       EXPORT_SYMBOL(drm_sched_job_cleanup);
> > > > > > > > > > > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> > > > > > > > > > > index 4eb354226972..5c3a99027ecd 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > > > > > > > > > > @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> > > > > > > > > > >           if (ret)
> > > > > > > > > > >                   return ret;
> > > > > > > > > > > 
> > > > > > > > > > > +     drm_sched_job_arm(&job->base);
> > > > > > > > > > > +
> > > > > > > > > > >           job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> > > > > > > > > > > 
> > > > > > > > > > >           /* put by scheduler job completion */
> > > > > > > > > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > > > > > > > > index 88ae7f331bb1..83afc3aa8e2f 100644
> > > > > > > > > > > --- a/include/drm/gpu_scheduler.h
> > > > > > > > > > > +++ b/include/drm/gpu_scheduler.h
> > > > > > > > > > > @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> > > > > > > > > > >       int drm_sched_job_init(struct drm_sched_job *job,
> > > > > > > > > > >                          struct drm_sched_entity *entity,
> > > > > > > > > > >                          void *owner);
> > > > > > > > > > > +void drm_sched_job_arm(struct drm_sched_job *job);
> > > > > > > > > > >       void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > > > > > > > > > >                                       struct drm_gpu_scheduler **sched_list,
> > > > > > > > > > >                                          unsigned int num_sched_list);
> > > > > > > > > > > @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> > > > > > > > > > >                                      enum drm_sched_priority priority);
> > > > > > > > > > >       bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> > > > > > > > > > > 
> > > > > > > > > > > -struct drm_sched_fence *drm_sched_fence_create(
> > > > > > > > > > > +struct drm_sched_fence *drm_sched_fence_alloc(
> > > > > > > > > > >           struct drm_sched_entity *s_entity, void *owner);
> > > > > > > > > > > +void drm_sched_fence_init(struct drm_sched_fence *fence,
> > > > > > > > > > > +                       struct drm_sched_entity *entity);
> > > > > > > > > > > +void drm_sched_fence_free(struct rcu_head *rcu);
> > > > > > > > > > > +
> > > > > > > > > > >       void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> > > > > > > > > > >       void drm_sched_fence_finished(struct drm_sched_fence *fence);
> > > > > > > > > > > 
> > > 
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> > 
> > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread
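
The disagreement above comes down to barrier pairing: spsc_queue_pop()'s
atomic_dec() and spsc_queue_count()'s atomic_read() are unordered on their
own, so the scheduler side needs a write barrier and the checking side a
read barrier before entity->last_scheduled can be trusted. Below is a sketch
of that pairing, assuming the producer publishes last_scheduled before the
count drops — purely an illustration of the barrier placement being argued
about, not the fix that eventually landed:

/*
 * Producer (scheduler thread): publish last_scheduled, then let the
 * queue count drop. The smp_wmb() orders the two stores.
 */
static void sched_publish_and_pop(struct drm_sched_entity *entity,
				  struct dma_fence *finished)
{
	WRITE_ONCE(entity->last_scheduled, finished);
	smp_wmb();			/* pairs with smp_rmb() below */
	atomic_dec(&entity->job_queue.job_count);
}

/*
 * Consumer (the drm_sched_entity_select_rq() side): only look at
 * last_scheduled after observing an empty queue, with the paired
 * read barrier.
 */
static bool entity_idle_and_signaled(struct drm_sched_entity *entity)
{
	struct dma_fence *fence;

	if (spsc_queue_count(&entity->job_queue) != 0)
		return false;

	smp_rmb();			/* pairs with smp_wmb() above */

	fence = READ_ONCE(entity->last_scheduled);
	return !fence || dma_fence_is_signaled(fence);
}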

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08 10:02                         ` Daniel Vetter
@ 2021-07-08 10:54                           ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-08 10:54 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, DRI Development, Steven Price, Daniel Vetter,
	Lucas Stach, Russell King, Christian Gmeiner, Qiang Yu,
	Rob Herring, Tomeu Vizoso, Alyssa Rosenzweig, David Airlie,
	Sumit Semwal, Masahiro Yamada, Kees Cook, Adam Borowski,
	Nick Terrell, Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen,
	Viresh Kumar, Alex Deucher, Dave Airlie, Nirmoy Das,
	Deepak R Varma, Lee Jones, Kevin Wang, Chen Li, Luben Tuikov,
	Marek Olšák, Dennis Li, Maarten Lankhorst,
	Andrey Grodzovsky, Sonny Jiang, Boris Brezillon, Tian Tao,
	Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On 08.07.21 at 12:02, Daniel Vetter wrote:
> On Thu, Jul 08, 2021 at 09:53:00AM +0200, Christian König wrote:
>> On 08.07.21 at 09:19, Daniel Vetter wrote:
>>> On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>> On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
>>>>> On 07.07.21 at 18:32, Daniel Vetter wrote:
>>>>>> On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
>>>>>>> On 07.07.21 at 14:13, Daniel Vetter wrote:
>>>>>>>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
>>>>>>>>> On 07.07.21 at 13:14, Daniel Vetter wrote:
>>>>>>>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
>>>>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>>>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
>>>>>>>>>>>> This is a very confusingly named function, because not just does it
>>>>>>>>>>>> init an object, it arms it and provides a point of no return for
>>>>>>>>>>>> pushing a job into the scheduler. It would be nice if that's a bit
>>>>>>>>>>>> clearer in the interface.
>>>>>>>>>>>>
>>>>>>>>>>>> But the real reason is that I want to push the dependency tracking
>>>>>>>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
>>>>>>>>>>>> must be called a lot earlier, without arming the job.
>>>>>>>>>>>>
>>>>>>>>>>>> v2:
>>>>>>>>>>>> - don't change .gitignore (Steven)
>>>>>>>>>>>> - don't forget v3d (Emma)
>>>>>>>>>>>>
>>>>>>>>>>>> v3: Emma noticed that I leak the memory allocated in
>>>>>>>>>>>> drm_sched_job_init if we bail out before the point of no return in
>>>>>>>>>>>> subsequent driver patches. To be able to fix this change
>>>>>>>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
>>>>>>>>>>>> after drm_sched_job_arm().
>>>>>>>>>>> Thinking more about this, I'm not sure if this really works.
>>>>>>>>>>>
>>>>>>>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
>>>>>>>>>>> to update the entity->rq association.
>>>>>>>>>>>
>>>>>>>>>>> And that can only be done later on when we arm the fence as well.
>>>>>>>>>> Hm yeah, but that's a bug in the existing code I think: We already
>>>>>>>>>> fail to clean up if we fail to allocate the fences. So I think the
>>>>>>>>>> right thing to do here is to split the checks into job_init, and do
>>>>>>>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
>>>>>>>>>> what's all going on there, the first check looks a bit like trying to
>>>>>>>>>> schedule before the entity is set up, which is a driver bug and should
>>>>>>>>>> have a WARN_ON?
>>>>>>>>> No you misunderstood me, the problem is something else.
>>>>>>>>>
>>>>>>>>> You asked previously why the call to drm_sched_job_init() was so late in
>>>>>>>>> the CS.
>>>>>>>>>
>>>>>>>>> The reason for this was not alone the scheduler fence init, but also the
>>>>>>>>> call to drm_sched_entity_select_rq().
>>>>>>>> Ah ok, I think I can fix that. Needs a prep patch to first make
>>>>>>>> drm_sched_entity_select infallible, then should be easy to do.
>>>>>>>>
>>>>>>>>>> The 2nd check around last_scheduled I have honestly no idea what it's
>>>>>>>>>> even trying to do.
>>>>>>>>> You mean that here?
>>>>>>>>>
>>>>>>>>>              fence = READ_ONCE(entity->last_scheduled);
>>>>>>>>>              if (fence && !dma_fence_is_signaled(fence))
>>>>>>>>>                      return;
>>>>>>>>>
>>>>>>>>> This makes sure that load balancing is not moving the entity to a
>>>>>>>>> different scheduler while there are still jobs running from this entity
>>>>>>>>> on the hardware,
>>>>>>>> Yeah after a nap that idea crossed my mind too. But now I have locking
>>>>>>>> questions, afaiui the scheduler thread updates this, without taking
>>>>>>>> any locks - entity dequeuing is lockless. And here we read the fence
>>>>>>>> and then seem to yolo check whether it's signalled? What's preventing
>>>>>>>> a use-after-free here? There's no rcu or anything going on here at
>>>>>>>> all, and it's outside of the spinlock section, which starts a bit
>>>>>>>> further down.
>>>>>>> The last_scheduled fence of an entity can only change when there are
>>>>>>> jobs queued on the entity, and we have just ruled that out in the
>>>>>>> check before.
>>>>>> There aren't any barriers, so the cpu could easily run the two checks
>>>>>> the other way round. I'll ponder this and figure out where exactly we
>>>>>> need docs for the constraint and/or barriers to make this work as
>>>>>> intended. As-is I'm not seeing how it does ...
>>>>> spsc_queue_count() provides the necessary barrier with the atomic_read().
>>>> atomic_t is fully unordered, except when it's a read-modify-write
>>> Wasn't awake yet, I think the rule is read-modify-write ops that return the
>>> previous value give you a full barrier. So stuff like cmpxchg, but also
>>> a few others. See atomic_t.txt under the ORDERING heading (yes that
>>> maintainer refuses to accept .rst so I can't just link you to the
>>> right section, it's silly). get/set and even RMW atomic ops that don't
>>> return anything are all fully unordered.
>> As far as I know that's not completely correct. The rules around atomics I
>> once learned are:
>>
>> 1. Everything which modifies something is a write barrier.
>> 2. Everything which returns something is a read barrier.
>>
>> And I know a whole bunch of use cases where this is relied upon in the core
>> kernel, so I'm pretty sure that's correct.
> That's against what the doc says, and also it would mean stuff like
> atomic_read_acquire or smp_mb__after/before_atomic is completely pointless.
>
> On x86 you're right; anywhere else, where there's no total store ordering,
> you're wrong.

Good to know. I always thought that atomic_read_acquire() was just for 
documentation purposes.



> If there's code that relies on this it needs to be fixed and properly
> documented. I did go through the spsc queue code a bit, and it might be
> better to just replace this with a core data structure.

Well the spsc was especially crafted for this use case and performed 
quite a bit better than a doubly linked list.

Or what core data structure do you have in mind?

Christian.
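
For reference, Documentation/atomic_t.txt condenses the ordering rules this
exchange turns on as follows; the snippet below merely restates them as
annotated calls on an example variable:

atomic_t v;

atomic_set(&v, 1);		/* non-RMW store: no ordering implied */
atomic_read(&v);		/* non-RMW load: no ordering implied */
atomic_inc(&v);			/* RMW without return value: no ordering */
atomic_inc_return(&v);		/* value-returning RMW: fully ordered */
atomic_cmpxchg(&v, 1, 2);	/* value-returning RMW: fully ordered,
				 * though only on success */
atomic_read_acquire(&v);	/* explicitly ACQUIRE-ordered load */
atomic_set_release(&v, 0);	/* explicitly RELEASE-ordered store */

smp_mb__before_atomic();	/* order earlier accesses before the dec */
atomic_dec(&v);
smp_mb__after_atomic();		/* order the dec before later accesses */

So per the document, atomic_inc() modifies memory without being a write
barrier, and atomic_read() returns a value without being a read barrier;
only the value-returning RMW ops imply full barriers.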

> -Daniel
>
>> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
>> the read barrier is the atomic_read() in spsc_queue_count().
>>
>> The READ_ONCE() is actually not even necessary as far as I can see.
>>
>> Christian.
>>
>>> -Daniel
>>>
>>>
>>>> atomic op, then it's a full barrier. So yeah you need more here. But
>>>> also since you only need a read barrier on one side, and a write
>>>> barrier on the other, you don't actually need CPU barriers on x86.
>>>> And READ_ONCE gives you the compiler barrier on one side at least, I
>>>> haven't found it on the writer side yet.
>>>>
>>>>> But yes a comment would be really nice here. I had to think for a while
>>>>> why we don't need this as well.
>>>> I'm typing a patch, which after a night's sleep I realized has the
>>>> wrong barriers. And now I'm also typing some doc improvements for
>>>> drm_sched_entity and related functions.
>>>>
>>>>> Christian.
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> Also improve the kerneldoc for this.
>>>>>>>>>>>>
>>>>>>>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>>>>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>>>>>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
>>>>>>>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>>>>>>>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
>>>>>>>>>>>> Cc: Rob Herring <robh@kernel.org>
>>>>>>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>>>>>>>>>>> Cc: Steven Price <steven.price@arm.com>
>>>>>>>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
>>>>>>>>>>>> Cc: David Airlie <airlied@linux.ie>
>>>>>>>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>>>>>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>>>>>>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
>>>>>>>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
>>>>>>>>>>>> Cc: Kees Cook <keescook@chromium.org>
>>>>>>>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
>>>>>>>>>>>> Cc: Nick Terrell <terrelln@fb.com>
>>>>>>>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>>>>>>>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
>>>>>>>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
>>>>>>>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>>> Cc: Dave Airlie <airlied@redhat.com>
>>>>>>>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
>>>>>>>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
>>>>>>>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
>>>>>>>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
>>>>>>>>>>>> Cc: Chen Li <chenli@uniontech.com>
>>>>>>>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
>>>>>>>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
>>>>>>>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
>>>>>>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>>>>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
>>>>>>>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>>>>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
>>>>>>>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
>>>>>>>>>>>> Cc: etnaviv@lists.freedesktop.org
>>>>>>>>>>>> Cc: lima@lists.freedesktop.org
>>>>>>>>>>>> Cc: linux-media@vger.kernel.org
>>>>>>>>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>>>>>>>>> Cc: Emma Anholt <emma@anholt.net>
>>>>>>>>>>>> ---
>>>>>>>>>>>>        drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
>>>>>>>>>>>>        drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
>>>>>>>>>>>>        drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
>>>>>>>>>>>>        drivers/gpu/drm/lima/lima_sched.c        |  2 ++
>>>>>>>>>>>>        drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
>>>>>>>>>>>>        drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
>>>>>>>>>>>>        drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
>>>>>>>>>>>>        drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
>>>>>>>>>>>>        drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
>>>>>>>>>>>>        include/drm/gpu_scheduler.h              |  7 +++-
>>>>>>>>>>>>        10 files changed, 74 insertions(+), 14 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>>>>>>>>            if (r)
>>>>>>>>>>>>                    goto error_unlock;
>>>>>>>>>>>>
>>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>>>> +
>>>>>>>>>>>>            /* No memory allocation is allowed while holding the notifier lock.
>>>>>>>>>>>>             * The lock is held until amdgpu_cs_submit is finished and fence is
>>>>>>>>>>>>             * added to BOs.
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>>>> index d33e6d97cc89..5ddb955d2315 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
>>>>>>>>>>>>            if (r)
>>>>>>>>>>>>                    return r;
>>>>>>>>>>>>
>>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>>>> +
>>>>>>>>>>>>            *f = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>>>            amdgpu_job_free_resources(job);
>>>>>>>>>>>>            drm_sched_entity_push_job(&job->base, entity);
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>>>> index feb6da1b6ceb..05f412204118 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
>>>>>>>>>>>>            if (ret)
>>>>>>>>>>>>                    goto out_unlock;
>>>>>>>>>>>>
>>>>>>>>>>>> +     drm_sched_job_arm(&submit->sched_job);
>>>>>>>>>>>> +
>>>>>>>>>>>>            submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
>>>>>>>>>>>>            submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
>>>>>>>>>>>>                                                    submit->out_fence, 0,
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>>>> index dba8329937a3..38f755580507 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
>>>>>>>>>>>>                    return err;
>>>>>>>>>>>>            }
>>>>>>>>>>>>
>>>>>>>>>>>> +     drm_sched_job_arm(&task->base);
>>>>>>>>>>>> +
>>>>>>>>>>>>            task->num_bos = num_bos;
>>>>>>>>>>>>            task->vm = lima_vm_get(vm);
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>>>> index 71a72fb50e6b..2992dc85325f 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
>>>>>>>>>>>>                    goto unlock;
>>>>>>>>>>>>            }
>>>>>>>>>>>>
>>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>>>> +
>>>>>>>>>>>>            job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>>>
>>>>>>>>>>>>            ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>>>> index 79554aa4dbb1..f7347c284886 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>>>>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>>>>>>>>>         * @sched_job: job to submit
>>>>>>>>>>>>         * @entity: scheduler entity
>>>>>>>>>>>>         *
>>>>>>>>>>>> - * Note: To guarantee that the order of insertion to queue matches
>>>>>>>>>>>> - * the job's fence sequence number this function should be
>>>>>>>>>>>> - * called with drm_sched_job_init under common lock.
>>>>>>>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
>>>>>>>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
>>>>>>>>>>>> + * under common lock.
>>>>>>>>>>>>         *
>>>>>>>>>>>>         * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>>>         */
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>>>> index 69de2c76731f..c451ee9a30d7 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>>>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>>>>>>>>>>>>         *
>>>>>>>>>>>>         * Free up the fence memory after the RCU grace period.
>>>>>>>>>>>>         */
>>>>>>>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
>>>>>>>>>>>>        {
>>>>>>>>>>>>            struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>>>>>>>>>>>>            struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>>>>>>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
>>>>>>>>>>>>        }
>>>>>>>>>>>>        EXPORT_SYMBOL(to_drm_sched_fence);
>>>>>>>>>>>>
>>>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>>>>>> -                                            void *owner)
>>>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>>>>>>>>> +                                           void *owner)
>>>>>>>>>>>>        {
>>>>>>>>>>>>            struct drm_sched_fence *fence = NULL;
>>>>>>>>>>>> -     unsigned seq;
>>>>>>>>>>>>
>>>>>>>>>>>>            fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
>>>>>>>>>>>>            if (fence == NULL)
>>>>>>>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
>>>>>>>>>>>>            fence->sched = entity->rq->sched;
>>>>>>>>>>>>            spin_lock_init(&fence->lock);
>>>>>>>>>>>>
>>>>>>>>>>>> +     return fence;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>>>> +                       struct drm_sched_entity *entity)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +     unsigned seq;
>>>>>>>>>>>> +
>>>>>>>>>>>>            seq = atomic_inc_return(&entity->fence_seq);
>>>>>>>>>>>>            dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>>>>>>>>                           &fence->lock, entity->fence_context, seq);
>>>>>>>>>>>>            dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>>>>>>>>                           &fence->lock, entity->fence_context + 1, seq);
>>>>>>>>>>>> -
>>>>>>>>>>>> -     return fence;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>>        module_init(drm_sched_fence_slab_init);
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>> index 33c414d55fab..5e84e1500c32 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>> @@ -48,9 +48,11 @@
>>>>>>>>>>>>        #include <linux/wait.h>
>>>>>>>>>>>>        #include <linux/sched.h>
>>>>>>>>>>>>        #include <linux/completion.h>
>>>>>>>>>>>> +#include <linux/dma-resv.h>
>>>>>>>>>>>>        #include <uapi/linux/sched/types.h>
>>>>>>>>>>>>
>>>>>>>>>>>>        #include <drm/drm_print.h>
>>>>>>>>>>>> +#include <drm/drm_gem.h>
>>>>>>>>>>>>        #include <drm/gpu_scheduler.h>
>>>>>>>>>>>>        #include <drm/spsc_queue.h>
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>>>
>>>>>>>>>>>>        /**
>>>>>>>>>>>>         * drm_sched_job_init - init a scheduler job
>>>>>>>>>>>> - *
>>>>>>>>>>>>         * @job: scheduler job to init
>>>>>>>>>>>>         * @entity: scheduler entity to use
>>>>>>>>>>>>         * @owner: job owner for debugging
>>>>>>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>>>         * Refer to drm_sched_entity_push_job() documentation
>>>>>>>>>>>>         * for locking considerations.
>>>>>>>>>>>>         *
>>>>>>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function
>>>>>>>>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm() is called.
>>>>>>>>>>>> + *
>>>>>>>>>>>>         * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>>>         */
>>>>>>>>>>>>        int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>            job->sched = sched;
>>>>>>>>>>>>            job->entity = entity;
>>>>>>>>>>>>            job->s_priority = entity->rq - sched->sched_rq;
>>>>>>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>>>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>>>>>>>>            if (!job->s_fence)
>>>>>>>>>>>>                    return -ENOMEM;
>>>>>>>>>>>>            job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>        EXPORT_SYMBOL(drm_sched_job_init);
>>>>>>>>>>>>
>>>>>>>>>>>>        /**
>>>>>>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>>>>>>>>> + * @job: scheduler job to arm
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>>>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>>>>>>>>> + * or other places that need to track the completion of this job.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>>>>>>>>> + * considerations.
>>>>>>>>>>>>         *
>>>>>>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>>>>>>>>> + */
>>>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/**
>>>>>>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>>>>         * @job: scheduler job to clean up
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>>>>>>>>> + * before drm_sched_job_arm() is called.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * After that point of no return @job is committed to be executed by the
>>>>>>>>>>>> + * scheduler, and this function should be called from the
>>>>>>>>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>>>>>>>>         */
>>>>>>>>>>>>        void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>>>>>>>>        {
>>>>>>>>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>>>>>>>>> +     if (!kref_read(&job->s_fence->finished.refcount)) {
>>>>>>>>>>>> +             /* drm_sched_job_arm() has been called */
>>>>>>>>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>>>>>>>>> +     } else {
>>>>>>>>>>>> +             /* aborted job before committing to run it */
>>>>>>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>>>>>>>>> +     }
>>>>>>>>>>>> +
>>>>>>>>>>>>            job->s_fence = NULL;
>>>>>>>>>>>>        }
>>>>>>>>>>>>        EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>>>>>>>>            if (ret)
>>>>>>>>>>>>                    return ret;
>>>>>>>>>>>>
>>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>>>> +
>>>>>>>>>>>>            job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>>>
>>>>>>>>>>>>            /* put by scheduler job completion */
>>>>>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>>>>>        int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>                           struct drm_sched_entity *entity,
>>>>>>>>>>>>                           void *owner);
>>>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>>>>>>>>        void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>>>>>>>>                                        struct drm_gpu_scheduler **sched_list,
>>>>>>>>>>>>                                           unsigned int num_sched_list);
>>>>>>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>>>>>>>                                       enum drm_sched_priority priority);
>>>>>>>>>>>>        bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>>>>>>>>
>>>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>>>>>>>>            struct drm_sched_entity *s_entity, void *owner);
>>>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>>>> +                       struct drm_sched_entity *entity);
>>>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>>>>>>>>> +
>>>>>>>>>>>>        void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>>>>>>>>        void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>>>>>>>>
>>>> --
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> http://blog.ffwll.ch
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 58+ messages in thread
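For readers following the barrier argument above, here is a stripped-down
sketch of the pop/count pattern being debated. This is an illustration only:
the real implementation lives in include/drm/spsc_queue.h and differs in
detail, and the acquire/release variants named in the comments are one
possible way to make the ordering explicit, not something this series does.

#include <linux/atomic.h>

struct sketch_queue {
	atomic_t job_count;
	/* node pointers elided */
};

/*
 * Pop side: a bare atomic_dec() is fully unordered on weakly ordered
 * architectures, so by itself it does not publish the writes that
 * happened before it.
 */
static void sketch_pop(struct sketch_queue *queue)
{
	/* ... unlink the node ... */
	atomic_dec(&queue->job_count);
}

/*
 * Count side: atomic_read() is likewise unordered, and READ_ONCE()
 * semantics only stop the compiler from tearing or caching the load.
 * If a caller needs "count was observed as zero, therefore
 * last_scheduled is stable", an explicit pairing such as
 * atomic_dec_return_release() on the pop side and atomic_read_acquire()
 * here would both enforce and document that ordering.
 */
static int sketch_count(struct sketch_queue *queue)
{
	return atomic_read(&queue->job_count);
}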

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08 10:54                           ` Christian König
@ 2021-07-08 11:20                             ` Daniel Vetter
  -1 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-08 11:20 UTC (permalink / raw)
  To: Christian König
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt

On Thu, Jul 8, 2021 at 12:54 PM Christian König
<christian.koenig@amd.com> wrote:
>
> On 08.07.21 at 12:02, Daniel Vetter wrote:
> > On Thu, Jul 08, 2021 at 09:53:00AM +0200, Christian König wrote:
> >> On 08.07.21 at 09:19, Daniel Vetter wrote:
> >>> On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >>>> On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
> >>>>> On 07.07.21 at 18:32, Daniel Vetter wrote:
> >>>>>> On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> >>>>>>> On 07.07.21 at 14:13, Daniel Vetter wrote:
> >>>>>>>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> >>>>>>>>> On 07.07.21 at 13:14, Daniel Vetter wrote:
> >>>>>>>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> >>>>>>>>>> <christian.koenig@amd.com> wrote:
> >>>>>>>>>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
> >>>>>>>>>>>> This is a very confusingly named function, because not just does it
> >>>>>>>>>>>> init an object, it arms it and provides a point of no return for
> >>>>>>>>>>>> pushing a job into the scheduler. It would be nice if that's a bit
> >>>>>>>>>>>> clearer in the interface.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But the real reason is that I want to push the dependency tracking
> >>>>>>>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
> >>>>>>>>>>>> must be called a lot earlier, without arming the job.
> >>>>>>>>>>>>
> >>>>>>>>>>>> v2:
> >>>>>>>>>>>> - don't change .gitignore (Steven)
> >>>>>>>>>>>> - don't forget v3d (Emma)
> >>>>>>>>>>>>
> >>>>>>>>>>>> v3: Emma noticed that I leak the memory allocated in
> >>>>>>>>>>>> drm_sched_job_init if we bail out before the point of no return in
> >>>>>>>>>>>> subsequent driver patches. To be able to fix this change
> >>>>>>>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
> >>>>>>>>>>>> after drm_sched_job_arm().
> >>>>>>>>>>> Thinking more about this, I'm not sure if this really works.
> >>>>>>>>>>>
> >>>>>>>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> >>>>>>>>>>> to update the entity->rq association.
> >>>>>>>>>>>
> >>>>>>>>>>> And that can only be done later on when we arm the fence as well.
> >>>>>>>>>> Hm yeah, but that's a bug in the existing code I think: We already
> >>>>>>>>>> fail to clean up if we fail to allocate the fences. So I think the
> >>>>>>>>>> right thing to do here is to split the checks into job_init, and do
> >>>>>>>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
> >>>>>>>>>> what's all going on there, the first check looks a bit like trying to
> >>>>>>>>>> schedule before the entity is set up, which is a driver bug and should
> >>>>>>>>>> have a WARN_ON?
> >>>>>>>>> No you misunderstood me, the problem is something else.
> >>>>>>>>>
> >>>>>>>>> You asked previously why the call to drm_sched_job_init() was so late in
> >>>>>>>>> the CS.
> >>>>>>>>>
> >>>>>>>>> The reason for this was not alone the scheduler fence init, but also the
> >>>>>>>>> call to drm_sched_entity_select_rq().
> >>>>>>>> Ah ok, I think I can fix that. Needs a prep patch to first make
> >>>>>>>> drm_sched_entity_select infallible, then should be easy to do.
> >>>>>>>>
> >>>>>>>>>> The 2nd check around last_scheduled I have honeslty no idea what it's
> >>>>>>>>>> even trying to do.
> >>>>>>>>> You mean that here?
> >>>>>>>>>
> >>>>>>>>>              fence = READ_ONCE(entity->last_scheduled);
> >>>>>>>>>              if (fence && !dma_fence_is_signaled(fence))
> >>>>>>>>>                      return;
> >>>>>>>>>
> >>>>>>>>> This makes sure that load balancing is not moving the entity to a
> >>>>>>>>> different scheduler while there are still jobs running from this entity
> >>>>>>>>> on the hardware,
> >>>>>>>> Yeah after a nap that idea crossed my mind too. But now I have locking
> >>>>>>>> questions, afaiui the scheduler thread updates this, without taking
> >>>>>>>> any locks - entity dequeuing is lockless. And here we read the fence
> >>>>>>>> and then seem to yolo check whether it's signalled? What's preventing
> >>>>>>>> a use-after-free here? There's no rcu or anything going on here at
> >>>>>>>> all, and it's outside of the spinlock section, which starts a bit
> >>>>>>>> further down.
> >>>>>>> The last_scheduled fence of an entity can only change when there are
> >>>>>>> jobs on the entities queued, and we have just ruled that out in the
> >>>>>>> check before.
> >>>>>> There aren't any barriers, so the cpu could easily run the two checks
> >>>>>> the other way round. I'll ponder this and figure out where exactly we
> >>>>>> need docs for the constraint and/or barriers to make this work as
> >>>>>> intended. As-is I'm not seeing how it does ...
> >>>>> spsc_queue_count() provides the necessary barrier with the atomic_read().
> >>>> atomic_t is fully unordered, except when it's a read-modify-write
> >>> Wasn't awake yet; I think the rule is that read-modify-write ops which
> >>> return the previous value give you a full barrier. So stuff like cmpxchg,
> >>> but also a few others. See atomic_t.txt under the ORDERING heading (yes,
> >>> that maintainer refuses to accept .rst so I can't just link you to the
> >>> right section, it's silly). get/set and even RMW atomic ops that don't
> >>> return anything are all fully unordered.
> >> As far as I know that's not completely correct. The rules around atomics I
> >> once learned are:
> >>
> >> 1. Everything which modifies something is a write barrier.
> >> 2. Everything which returns something is a read barrier.
> >>
> >> And I know a whole bunch of use cases where this is relied upon in the core
> >> kernel, so I'm pretty sure that's correct.
> > That's against what the doc says, and also it would mean stuff like
> > atomic_read_acquire or smp_mb__after/before_atomic is completely pointless.
> >
> > On x86 you're right; anywhere else, where there's no total store ordering, I
> > think you're wrong.
>
> Good to know. I always thought that atomic_read_acquire() was just for
> documentation purposes.

Maybe you mixed it up with C++ atomics (which I think are now also in
C)? Those are strongly ordered by default (you can get the weakly
ordered kernel-style ones too). It's a bit unfortunate that the default
semantics are exactly opposite between kernel and userspace :-/
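To make the contrast concrete, here is a minimal userspace C11 sketch
(illustrative only, not from the thread; the rough kernel-side equivalents
are named in the comments):

#include <stdatomic.h>

static atomic_int ready;
static int payload;

/* Producer: an explicit release store, roughly the kernel's
 * smp_store_release()/atomic_set_release(). */
void produce(void)
{
	payload = 42;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: a plain atomic_load(&ready) would default to
 * memory_order_seq_cst, the strongest ordering. The kernel's
 * atomic_read() sits at the opposite extreme, fully unordered, which
 * is why the explicitly ordered atomic_read_acquire() exists at all. */
int consume(void)
{
	if (atomic_load_explicit(&ready, memory_order_acquire))
		return payload; /* guaranteed to observe 42 */
	return -1;
}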

> > If there's code that relies on this it needs to be fixed and properly
> > documented. I did go through the spsc_queue code a bit, and it might be
> > better to just replace this with a core data structure.
>
> Well the spsc queue was especially crafted for this use case and performed
> quite a bit better than a doubly linked list.

Yeah, a doubly linked list is awful.

> Or what core data structure do you have in mind?

Hm, I thought there was a ready-made queue primitive, but there's just
llist.h, which I think is roughly what the scheduler queue also does,
minus the atomic_t for counting how many entries there are. And aside from
the tracepoints I don't think we use that count anywhere, we just check
for is_empty in the code (from a quick look only).
-Daniel
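A rough sketch of what that llist.h shape could look like for a job queue
(hypothetical names; an assumption about the direction, not code from this
series):

#include <linux/llist.h>

struct sketch_job {
	struct llist_node node;
	/* payload elided */
};

static LLIST_HEAD(job_list);

/* Producers: llist_add() may be called concurrently from any number of
 * CPUs without a lock. */
static void sketch_queue_job(struct sketch_job *job)
{
	llist_add(&job->node, &job_list);
}

/* Single consumer: llist_del_all() detaches the whole list atomically
 * and hands it back newest-first, so reverse it to process jobs in
 * submission order. llist_empty() covers the is_empty check mentioned
 * above; a job count would have to be tracked separately. */
static void sketch_run_jobs(void)
{
	struct llist_node *batch = llist_del_all(&job_list);
	struct sketch_job *job, *tmp;

	batch = llist_reverse_order(batch);
	llist_for_each_entry_safe(job, tmp, batch, node) {
		/* run and free the job here */
	}
}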

>
> Christian.
>
> > -Daniel
> >
> >> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
> >> the read barrier is the atomic_read() in spsc_queue_count().
> >>
> >> The READ_ONCE() is actually not even necessary as far as I can see.
> >>
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>
> >>>> atomic op, then it's a full barrier. So yeah you need more here. But
> >>>> also since you only need a read barrier on one side, and a write
> >>>> barrier on the other, you don't actually need cpu barriers on x86.
> >>>> And READ_ONCE gives you the compiler barrier on one side at least, I
> >>>> haven't found it on the writer side yet.
> >>>>
> >>>>> But yes a comment would be really nice here. I had to think for a while
> >>>>> why we don't need this as well.
> >>>> I'm typing a patch, which after a night's sleep I realized has the
> >>>> wrong barriers. And now I'm also typing some doc improvements for
> >>>> drm_sched_entity and related functions.
> >>>>
> >>>>> Christian.
> >>>>>
> >>>>>> -Daniel
> >>>>>>
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>>
> >>>>>>>> -Daniel
> >>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>> Christian.
> >>>>>>>>>
> >>>>>>>>>> -Daniel
> >>>>>>>>>>
> >>>>>>>>>>> Christian.
> >>>>>>>>>>>
> >>>>>>>>>>>> Also improve the kerneldoc for this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
> >>>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >>>>>>>>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
> >>>>>>>>>>>> Cc: Russell King <linux+etnaviv@armlinux.org.uk>
> >>>>>>>>>>>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> >>>>>>>>>>>> Cc: Qiang Yu <yuq825@gmail.com>
> >>>>>>>>>>>> Cc: Rob Herring <robh@kernel.org>
> >>>>>>>>>>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> >>>>>>>>>>>> Cc: Steven Price <steven.price@arm.com>
> >>>>>>>>>>>> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
> >>>>>>>>>>>> Cc: David Airlie <airlied@linux.ie>
> >>>>>>>>>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
> >>>>>>>>>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> >>>>>>>>>>>> Cc: "Christian König" <christian.koenig@amd.com>
> >>>>>>>>>>>> Cc: Masahiro Yamada <masahiroy@kernel.org>
> >>>>>>>>>>>> Cc: Kees Cook <keescook@chromium.org>
> >>>>>>>>>>>> Cc: Adam Borowski <kilobyte@angband.pl>
> >>>>>>>>>>>> Cc: Nick Terrell <terrelln@fb.com>
> >>>>>>>>>>>> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >>>>>>>>>>>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>
> >>>>>>>>>>>> Cc: Sami Tolvanen <samitolvanen@google.com>
> >>>>>>>>>>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>>>>> Cc: Dave Airlie <airlied@redhat.com>
> >>>>>>>>>>>> Cc: Nirmoy Das <nirmoy.das@amd.com>
> >>>>>>>>>>>> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> >>>>>>>>>>>> Cc: Lee Jones <lee.jones@linaro.org>
> >>>>>>>>>>>> Cc: Kevin Wang <kevin1.wang@amd.com>
> >>>>>>>>>>>> Cc: Chen Li <chenli@uniontech.com>
> >>>>>>>>>>>> Cc: Luben Tuikov <luben.tuikov@amd.com>
> >>>>>>>>>>>> Cc: "Marek Olšák" <marek.olsak@amd.com>
> >>>>>>>>>>>> Cc: Dennis Li <Dennis.Li@amd.com>
> >>>>>>>>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>>>>>>>>>> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>>>>>>>>> Cc: Sonny Jiang <sonny.jiang@amd.com>
> >>>>>>>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> >>>>>>>>>>>> Cc: Tian Tao <tiantao6@hisilicon.com>
> >>>>>>>>>>>> Cc: Jack Zhang <Jack.Zhang1@amd.com>
> >>>>>>>>>>>> Cc: etnaviv@lists.freedesktop.org
> >>>>>>>>>>>> Cc: lima@lists.freedesktop.org
> >>>>>>>>>>>> Cc: linux-media@vger.kernel.org
> >>>>>>>>>>>> Cc: linaro-mm-sig@lists.linaro.org
> >>>>>>>>>>>> Cc: Emma Anholt <emma@anholt.net>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>        drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
> >>>>>>>>>>>>        drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 ++
> >>>>>>>>>>>>        drivers/gpu/drm/etnaviv/etnaviv_sched.c  |  2 ++
> >>>>>>>>>>>>        drivers/gpu/drm/lima/lima_sched.c        |  2 ++
> >>>>>>>>>>>>        drivers/gpu/drm/panfrost/panfrost_job.c  |  2 ++
> >>>>>>>>>>>>        drivers/gpu/drm/scheduler/sched_entity.c |  6 ++--
> >>>>>>>>>>>>        drivers/gpu/drm/scheduler/sched_fence.c  | 17 +++++----
> >>>>>>>>>>>>        drivers/gpu/drm/scheduler/sched_main.c   | 46 +++++++++++++++++++++---
> >>>>>>>>>>>>        drivers/gpu/drm/v3d/v3d_gem.c            |  2 ++
> >>>>>>>>>>>>        include/drm/gpu_scheduler.h              |  7 +++-
> >>>>>>>>>>>>        10 files changed, 74 insertions(+), 14 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>>>>>>> index c5386d13eb4a..a4ec092af9a7 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>>>>>>>>>>> @@ -1226,6 +1226,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> >>>>>>>>>>>>            if (r)
> >>>>>>>>>>>>                    goto error_unlock;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            /* No memory allocation is allowed while holding the notifier lock.
> >>>>>>>>>>>>             * The lock is held until amdgpu_cs_submit is finished and fence is
> >>>>>>>>>>>>             * added to BOs.
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>>>>>>>> index d33e6d97cc89..5ddb955d2315 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>>>>>>>> @@ -170,6 +170,8 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> >>>>>>>>>>>>            if (r)
> >>>>>>>>>>>>                    return r;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            *f = dma_fence_get(&job->base.s_fence->finished);
> >>>>>>>>>>>>            amdgpu_job_free_resources(job);
> >>>>>>>>>>>>            drm_sched_entity_push_job(&job->base, entity);
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>>>>>>>>> index feb6da1b6ceb..05f412204118 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> >>>>>>>>>>>> @@ -163,6 +163,8 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
> >>>>>>>>>>>>            if (ret)
> >>>>>>>>>>>>                    goto out_unlock;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     drm_sched_job_arm(&submit->sched_job);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
> >>>>>>>>>>>>            submit->out_fence_id = idr_alloc_cyclic(&submit->gpu->fence_idr,
> >>>>>>>>>>>>                                                    submit->out_fence, 0,
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> >>>>>>>>>>>> index dba8329937a3..38f755580507 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
> >>>>>>>>>>>> @@ -129,6 +129,8 @@ int lima_sched_task_init(struct lima_sched_task *task,
> >>>>>>>>>>>>                    return err;
> >>>>>>>>>>>>            }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     drm_sched_job_arm(&task->base);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            task->num_bos = num_bos;
> >>>>>>>>>>>>            task->vm = lima_vm_get(vm);
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>>>>>>>>> index 71a72fb50e6b..2992dc85325f 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> >>>>>>>>>>>> @@ -288,6 +288,8 @@ int panfrost_job_push(struct panfrost_job *job)
> >>>>>>>>>>>>                    goto unlock;
> >>>>>>>>>>>>            }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>>>>>>>>>>
> >>>>>>>>>>>>            ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>>>>>>>>> index 79554aa4dbb1..f7347c284886 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> >>>>>>>>>>>> @@ -485,9 +485,9 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >>>>>>>>>>>>         * @sched_job: job to submit
> >>>>>>>>>>>>         * @entity: scheduler entity
> >>>>>>>>>>>>         *
> >>>>>>>>>>>> - * Note: To guarantee that the order of insertion to queue matches
> >>>>>>>>>>>> - * the job's fence sequence number this function should be
> >>>>>>>>>>>> - * called with drm_sched_job_init under common lock.
> >>>>>>>>>>>> + * Note: To guarantee that the order of insertion to queue matches the job's
> >>>>>>>>>>>> + * fence sequence number this function should be called with drm_sched_job_arm()
> >>>>>>>>>>>> + * under common lock.
> >>>>>>>>>>>>         *
> >>>>>>>>>>>>         * Returns 0 for success, negative error code otherwise.
> >>>>>>>>>>>>         */
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>>>>>>>>> index 69de2c76731f..c451ee9a30d7 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> >>>>>>>>>>>> @@ -90,7 +90,7 @@ static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
> >>>>>>>>>>>>         *
> >>>>>>>>>>>>         * Free up the fence memory after the RCU grace period.
> >>>>>>>>>>>>         */
> >>>>>>>>>>>> -static void drm_sched_fence_free(struct rcu_head *rcu)
> >>>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu)
> >>>>>>>>>>>>        {
> >>>>>>>>>>>>            struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> >>>>>>>>>>>>            struct drm_sched_fence *fence = to_drm_sched_fence(f);
> >>>>>>>>>>>> @@ -152,11 +152,10 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> >>>>>>>>>>>>        }
> >>>>>>>>>>>>        EXPORT_SYMBOL(to_drm_sched_fence);
> >>>>>>>>>>>>
> >>>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>>>>>>>>>> -                                            void *owner)
> >>>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
> >>>>>>>>>>>> +                                           void *owner)
> >>>>>>>>>>>>        {
> >>>>>>>>>>>>            struct drm_sched_fence *fence = NULL;
> >>>>>>>>>>>> -     unsigned seq;
> >>>>>>>>>>>>
> >>>>>>>>>>>>            fence = kmem_cache_zalloc(sched_fence_slab, GFP_KERNEL);
> >>>>>>>>>>>>            if (fence == NULL)
> >>>>>>>>>>>> @@ -166,13 +165,19 @@ struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity *entity,
> >>>>>>>>>>>>            fence->sched = entity->rq->sched;
> >>>>>>>>>>>>            spin_lock_init(&fence->lock);
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     return fence;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>>>>>>>>>>> +                       struct drm_sched_entity *entity)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +     unsigned seq;
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            seq = atomic_inc_return(&entity->fence_seq);
> >>>>>>>>>>>>            dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> >>>>>>>>>>>>                           &fence->lock, entity->fence_context, seq);
> >>>>>>>>>>>>            dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
> >>>>>>>>>>>>                           &fence->lock, entity->fence_context + 1, seq);
> >>>>>>>>>>>> -
> >>>>>>>>>>>> -     return fence;
> >>>>>>>>>>>>        }
> >>>>>>>>>>>>
> >>>>>>>>>>>>        module_init(drm_sched_fence_slab_init);
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>>>>>>>> index 33c414d55fab..5e84e1500c32 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>>>>>>>> @@ -48,9 +48,11 @@
> >>>>>>>>>>>>        #include <linux/wait.h>
> >>>>>>>>>>>>        #include <linux/sched.h>
> >>>>>>>>>>>>        #include <linux/completion.h>
> >>>>>>>>>>>> +#include <linux/dma-resv.h>
> >>>>>>>>>>>>        #include <uapi/linux/sched/types.h>
> >>>>>>>>>>>>
> >>>>>>>>>>>>        #include <drm/drm_print.h>
> >>>>>>>>>>>> +#include <drm/drm_gem.h>
> >>>>>>>>>>>>        #include <drm/gpu_scheduler.h>
> >>>>>>>>>>>>        #include <drm/spsc_queue.h>
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -569,7 +571,6 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>>>>>>>>>>
> >>>>>>>>>>>>        /**
> >>>>>>>>>>>>         * drm_sched_job_init - init a scheduler job
> >>>>>>>>>>>> - *
> >>>>>>>>>>>>         * @job: scheduler job to init
> >>>>>>>>>>>>         * @entity: scheduler entity to use
> >>>>>>>>>>>>         * @owner: job owner for debugging
> >>>>>>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
> >>>>>>>>>>>>         * Refer to drm_sched_entity_push_job() documentation
> >>>>>>>>>>>>         * for locking considerations.
> >>>>>>>>>>>>         *
> >>>>>>>>>>>> + * Drivers must make sure to call drm_sched_job_cleanup() if this function returns
> >>>>>>>>>>>> + * successfully, even when @job is aborted before drm_sched_job_arm() is called.
> >>>>>>>>>>>> + *
> >>>>>>>>>>>>         * Returns 0 for success, negative error code otherwise.
> >>>>>>>>>>>>         */
> >>>>>>>>>>>>        int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>>>>>>            job->sched = sched;
> >>>>>>>>>>>>            job->entity = entity;
> >>>>>>>>>>>>            job->s_priority = entity->rq - sched->sched_rq;
> >>>>>>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
> >>>>>>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
> >>>>>>>>>>>>            if (!job->s_fence)
> >>>>>>>>>>>>                    return -ENOMEM;
> >>>>>>>>>>>>            job->id = atomic64_inc_return(&sched->job_id_count);
> >>>>>>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>>>>>>        EXPORT_SYMBOL(drm_sched_job_init);
> >>>>>>>>>>>>
> >>>>>>>>>>>>        /**
> >>>>>>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
> >>>>>>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
> >>>>>>>>>>>> + * @job: scheduler job to arm
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
> >>>>>>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
> >>>>>>>>>>>> + * or other places that need to track the completion of this job.
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
> >>>>>>>>>>>> + * considerations.
> >>>>>>>>>>>>         *
> >>>>>>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
> >>>>>>>>>>>> + */
> >>>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +/**
> >>>>>>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
> >>>>>>>>>>>>         * @job: scheduler job to clean up
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
> >>>>>>>>>>>> + * before drm_sched_job_arm() is called.
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * After that point of no return @job is committed to be executed by the
> >>>>>>>>>>>> + * scheduler, and this function should be called from the
> >>>>>>>>>>>> + * &drm_sched_backend_ops.free_job callback.
> >>>>>>>>>>>>         */
> >>>>>>>>>>>>        void drm_sched_job_cleanup(struct drm_sched_job *job)
> >>>>>>>>>>>>        {
> >>>>>>>>>>>> -     dma_fence_put(&job->s_fence->finished);
> >>>>>>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
> >>>>>>>>>>>> +             /* drm_sched_job_arm() has been called */
> >>>>>>>>>>>> +             dma_fence_put(&job->s_fence->finished);
> >>>>>>>>>>>> +     } else {
> >>>>>>>>>>>> +             /* aborted job before committing to run it */
> >>>>>>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
> >>>>>>>>>>>> +     }
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            job->s_fence = NULL;
> >>>>>>>>>>>>        }
> >>>>>>>>>>>>        EXPORT_SYMBOL(drm_sched_job_cleanup);
> >>>>>>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>>>>>>>>> index 4eb354226972..5c3a99027ecd 100644
> >>>>>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> >>>>>>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
> >>>>>>>>>>>>            if (ret)
> >>>>>>>>>>>>                    return ret;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> >>>>>>>>>>>>
> >>>>>>>>>>>>            /* put by scheduler job completion */
> >>>>>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> >>>>>>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
> >>>>>>>>>>>> --- a/include/drm/gpu_scheduler.h
> >>>>>>>>>>>> +++ b/include/drm/gpu_scheduler.h
> >>>>>>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
> >>>>>>>>>>>>        int drm_sched_job_init(struct drm_sched_job *job,
> >>>>>>>>>>>>                           struct drm_sched_entity *entity,
> >>>>>>>>>>>>                           void *owner);
> >>>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
> >>>>>>>>>>>>        void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >>>>>>>>>>>>                                        struct drm_gpu_scheduler **sched_list,
> >>>>>>>>>>>>                                           unsigned int num_sched_list);
> >>>>>>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
> >>>>>>>>>>>>                                       enum drm_sched_priority priority);
> >>>>>>>>>>>>        bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
> >>>>>>>>>>>>
> >>>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
> >>>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
> >>>>>>>>>>>>            struct drm_sched_entity *s_entity, void *owner);
> >>>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
> >>>>>>>>>>>> +                       struct drm_sched_entity *entity);
> >>>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>        void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
> >>>>>>>>>>>>        void drm_sched_fence_finished(struct drm_sched_fence *fence);
> >>>>>>>>>>>>
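
To make the resulting flow concrete, a driver submit path built on the
split API would look roughly like this. This is only a sketch; the foo_*
struct and helper are made up for illustration and not taken from this
series:

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

struct foo_job {
	struct drm_sched_job base;
	struct dma_fence *done_fence;
};

int foo_job_submit(struct foo_job *job, struct drm_sched_entity *entity)
{
	int ret;

	/* Allocates the job resources; can still be unwound after this. */
	ret = drm_sched_job_init(&job->base, entity, NULL);
	if (ret)
		return ret;

	/* All fallible setup goes here, before the point of no return. */
	ret = foo_prepare_bos(job);
	if (ret)
		goto err_cleanup;

	/* Point of no return: initializes the scheduled/finished fences. */
	drm_sched_job_arm(&job->base);
	job->done_fence = dma_fence_get(&job->base.s_fence->finished);

	drm_sched_entity_push_job(&job->base, entity);
	return 0;

err_cleanup:
	/* With the v3 change this is safe before drm_sched_job_arm() too. */
	drm_sched_job_cleanup(&job->base);
	return ret;
}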
> >>>> --
> >>>> Daniel Vetter
> >>>> Software Engineer, Intel Corporation
> >>>> http://blog.ffwll.ch
> >>>
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> http://blog.ffwll.ch
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
@ 2021-07-08 11:20                             ` Daniel Vetter
  0 siblings, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2021-07-08 11:20 UTC (permalink / raw)
  To: Christian König
  Cc: Emma Anholt, Adam Borowski, David Airlie, Viresh Kumar,
	DRI Development, Sonny Jiang, Nirmoy Das, Daniel Vetter,
	Lee Jones, Jack Zhang, lima, Mauro Carvalho Chehab,
	Masahiro Yamada, Steven Price, Luben Tuikov, Alyssa Rosenzweig,
	Sami Tolvanen, Russell King, Dave Airlie, Dennis Li, Chen Li,
	Paul Menzel, Kees Cook, Marek Olšák, Kevin Wang,
	The etnaviv authors, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Nick Terrell, Deepak R Varma, Tomeu Vizoso, Boris Brezillon,
	Qiang Yu, Alex Deucher, Tian Tao,
	open list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jul 8, 2021 at 12:54 PM Christian König
<christian.koenig@amd.com> wrote:
>
> > On 08.07.21 at 12:02, Daniel Vetter wrote:
> > On Thu, Jul 08, 2021 at 09:53:00AM +0200, Christian König wrote:
> >> On 08.07.21 at 09:19, Daniel Vetter wrote:
> >>> On Thu, Jul 8, 2021 at 9:09 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >>>> On Thu, Jul 8, 2021 at 8:56 AM Christian König <christian.koenig@amd.com> wrote:
> >>>>> On 07.07.21 at 18:32, Daniel Vetter wrote:
> >>>>>> On Wed, Jul 7, 2021 at 2:58 PM Christian König <christian.koenig@amd.com> wrote:
> >>>>>>> On 07.07.21 at 14:13, Daniel Vetter wrote:
> >>>>>>>> On Wed, Jul 7, 2021 at 1:57 PM Christian König <christian.koenig@amd.com> wrote:
> >>>>>>>>> On 07.07.21 at 13:14, Daniel Vetter wrote:
> >>>>>>>>>> On Wed, Jul 7, 2021 at 11:30 AM Christian König
> >>>>>>>>>> <christian.koenig@amd.com> wrote:
> >>>>>>>>>>> On 02.07.21 at 23:38, Daniel Vetter wrote:
> >>>>>>>>>>>> This is a very confusingly named function, because not only does it
> >>>>>>>>>>>> init an object, it arms it and provides a point of no return for
> >>>>>>>>>>>> pushing a job into the scheduler. It would be nice if that's a bit
> >>>>>>>>>>>> clearer in the interface.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But the real reason is that I want to push the dependency tracking
> >>>>>>>>>>>> helpers into the scheduler code, and that means drm_sched_job_init
> >>>>>>>>>>>> must be called a lot earlier, without arming the job.
> >>>>>>>>>>>>
> >>>>>>>>>>>> v2:
> >>>>>>>>>>>> - don't change .gitignore (Steven)
> >>>>>>>>>>>> - don't forget v3d (Emma)
> >>>>>>>>>>>>
> >>>>>>>>>>>> v3: Emma noticed that I leak the memory allocated in
> >>>>>>>>>>>> drm_sched_job_init if we bail out before the point of no return in
> >>>>>>>>>>>> subsequent driver patches. To be able to fix this, change
> >>>>>>>>>>>> drm_sched_job_cleanup() so it can handle being called both before and
> >>>>>>>>>>>> after drm_sched_job_arm().
> >>>>>>>>>>> Thinking more about this, I'm not sure if this really works.
> >>>>>>>>>>>
> >>>>>>>>>>> See drm_sched_job_init() was also calling drm_sched_entity_select_rq()
> >>>>>>>>>>> to update the entity->rq association.
> >>>>>>>>>>>
> >>>>>>>>>>> And that can only be done later on when we arm the fence as well.
> >>>>>>>>>> Hm yeah, but that's a bug in the existing code I think: We already
> >>>>>>>>>> fail to clean up if we fail to allocate the fences. So I think the
> >>>>>>>>>> right thing to do here is to split the checks into job_init, and do
> >>>>>>>>>> the actual arming/rq selection in job_arm? I'm not entirely sure
> >>>>>>>>>> what's all going on there, the first check looks a bit like trying to
> >>>>>>>>>> schedule before the entity is set up, which is a driver bug and should
> >>>>>>>>>> have a WARN_ON?
> >>>>>>>>> No you misunderstood me, the problem is something else.
> >>>>>>>>>
> >>>>>>>>> You asked previously why the call to drm_sched_job_init() was so late in
> >>>>>>>>> the CS.
> >>>>>>>>>
> >>>>>>>>> The reason for this was not only the scheduler fence init, but also the
> >>>>>>>>> call to drm_sched_entity_select_rq().
> >>>>>>>> Ah ok, I think I can fix that. Needs a prep patch to first make
> >>>>>>>> drm_sched_entity_select infallible, then should be easy to do.
> >>>>>>>>
> >>>>>>>>>> The 2nd check around last_scheduled I honestly have no idea what it's
> >>>>>>>>>> even trying to do.
> >>>>>>>>> You mean that here?
> >>>>>>>>>
> >>>>>>>>>              fence = READ_ONCE(entity->last_scheduled);
> >>>>>>>>>              if (fence && !dma_fence_is_signaled(fence))
> >>>>>>>>>                      return;
> >>>>>>>>>
> >>>>>>>>> This makes sure that load balancing is not moving the entity to a
> >>>>>>>>> different scheduler while there are still jobs running from this entity
> >>>>>>>>> on the hardware,
> >>>>>>>> Yeah after a nap that idea crossed my mind too. But now I have locking
> >>>>>>>> questions, afaiui the scheduler thread updates this, without taking
> >>>>>>>> any locks - entity dequeuing is lockless. And here we read the fence
> >>>>>>>> and then seem to yolo check whether it's signalled? What's preventing
> >>>>>>>> a use-after-free here? There's no rcu or anything going on here at
> >>>>>>>> all, and it's outside of the spinlock section, which starts a bit
> >>>>>>>> further down.
> >>>>>>> The last_scheduled fence of an entity can only change when there are
> >>>>>>> jobs queued on the entity, and we have just ruled that out in the
> >>>>>>> check before.
> >>>>>> There aren't any barriers, so the cpu could easily run the two checks
> >>>>>> the other way round. I'll ponder this and figure out where exactly we
> >>>>>> need docs for the constraint and/or barriers to make this work as
> >>>>>> intended. As-is I'm not seeing how it does ...
> >>>>> spsc_queue_count() provides the necessary barrier with the atomic_read().
> >>>> atomic_t is fully unordered, except when it's a read-modify-write
> >>> Wasn't awake yet; I think the rule is that a read-modify-write op which
> >>> returns the previous value gives you a full barrier. So stuff like
> >>> cmpxchg, but also a few others. See atomic_t.txt under the ORDERING
> >>> heading (yes that
> >>> maintainer refuses to accept .rst so I can't just link you to the
> >>> right section, it's silly). get/set and even RMW atomic ops that don't
> >>> return anything are all fully unordered.
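
Spelled out as code, those rules from atomic_t.txt look like this (just
a cheat sheet, not from this thread):

	atomic_t v = ATOMIC_INIT(0);
	int old;

	atomic_inc(&v);                /* RMW without return value: fully unordered */
	old = atomic_read(&v);         /* non-RMW read: unordered, like READ_ONCE() */
	old = atomic_inc_return(&v);   /* RMW returning a value: full barrier */
	old = atomic_read_acquire(&v); /* explicitly ordered variants for the rest */
	atomic_set_release(&v, 0);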
> >> As far as I know that's not completely correct. The rules around atomics I
> >> once learned are:
> >>
> >> 1. Everything which modifies something is a write barrier.
> >> 2. Everything which returns something is a read barrier.
> >>
> >> And I know a whole bunch of use cases where this is relied upon in the core
> >> kernel, so I'm pretty sure that's correct.
> > That's against what the doc says, and also it would mean stuff like
> > atomic_read_acquire or smp_mb__after/before_atomic is completely pointless.
> >
> > On x86 you're right; anywhere else, where there's no total store ordering,
> > you're wrong.
>
> Good to know. I always thought that atomic_read_acquire() was just for
> documentation purposes.

Maybe you mixed it up with C++ atomics (which I think are now also in
C)? Those are strongly ordered by default (you can get the weakly
ordered kernel-style one too). It's a bit unfortunate that the default
semantics are exactly opposite between kernel and userspace :-/
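
For comparison, a userspace sketch of that difference with C11
<stdatomic.h> (illustration only):

#include <stdatomic.h>

atomic_int v;

int load_default(void)
{
	/* a plain load defaults to memory_order_seq_cst */
	return atomic_load(&v);
}

int load_kernel_style(void)
{
	/* roughly what the kernel's atomic_read() gives you */
	return atomic_load_explicit(&v, memory_order_relaxed);
}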

> > If there's code that relies on this it needs to be fixed and properly
> > documented. I did go through the spsc queue code a bit, and it might be
> > better to just replace this with a core data structure.
>
> Well the spsc queue was especially crafted for this use case and performed
> quite a bit better than a doubly linked list.

Yeah, a doubly linked list is awful.

> Or what core data structure do you have in mind?

Hm, I thought there was a ready-made queue primitive, but there's just
llist.h, which I think is roughly what the scheduler queue also does,
minus the atomic_t for counting how many entries there are. Aside from
the tracepoints I don't think we use that count anywhere; we just check
for is_empty in the code (from a quick look only).
-Daniel

>
> Christian.
>
> > -Daniel
> >
> >> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
> >> the read barrier is the atomic_read() in spsc_queue_count().
> >>
> >> The READ_ONCE() is actually not even necessary as far as I can see.
> >>
> >> Christian.
> >>
> >>> -Daniel
> >>>
> >>>
> >>>> atomic op, then it's a full barrier. So yeah you need more here. But
> >>>> also since you only need a read barrier on one side, and a write
> >>>> barrier on the other, you don't actually need CPU barriers on x86.
> >>>> And READ_ONCE gives you the compiler barrier on one side at least, I
> >>>> haven't found it on the writer side yet.
> >>>>
> >>>>> But yes a comment would be really nice here. I had to think for a while
> >>>>> why we don't need this as well.
> >>>> I'm typing a patch, which after a night's sleep I realized has the
> >>>> wrong barriers. And now I'm also typing some doc improvements for
> >>>> drm_sched_entity and related functions.
> >>>>
> >>>>> Christian.
> >>>>>
> >>>>>> -Daniel
> >>>>>>
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>>
> >>>>>>>> -Daniel
> >>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>> Christian.
> >>>>>>>>>
> >>>>>>>>>> -Daniel
> >>>>>>>>>>
> >>>>>>>>>>> Christian.
> >>>>>>>>>>>
> >>>>>>>>>>>> Also improve the kerneldoc for this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [SNIP]
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
  2021-07-08 11:20                             ` Daniel Vetter
@ 2021-07-08 11:28                               ` Christian König
  -1 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-08 11:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Steven Price, Daniel Vetter, Lucas Stach,
	Russell King, Christian Gmeiner, Qiang Yu, Rob Herring,
	Tomeu Vizoso, Alyssa Rosenzweig, David Airlie, Sumit Semwal,
	Masahiro Yamada, Kees Cook, Adam Borowski, Nick Terrell,
	Mauro Carvalho Chehab, Paul Menzel, Sami Tolvanen, Viresh Kumar,
	Alex Deucher, Dave Airlie, Nirmoy Das, Deepak R Varma, Lee Jones,
	Kevin Wang, Chen Li, Luben Tuikov, Marek Olšák,
	Dennis Li, Maarten Lankhorst, Andrey Grodzovsky, Sonny Jiang,
	Boris Brezillon, Tian Tao, Jack Zhang, The etnaviv authors, lima,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Emma Anholt



On 08.07.21 at 13:20, Daniel Vetter wrote:
> On Thu, Jul 8, 2021 at 12:54 PM Christian König
> <christian.koenig@amd.com> wrote:
>> [SNIP]
>>>> As far as I know that's not completely correct. The rules around atomics I
>>>> once learned are:
>>>>
>>>> 1. Everything which modifies something is a write barrier.
>>>> 2. Everything which returns something is a read barrier.
>>>>
>>>> And I know a whole bunch of use cases where this is relied upon in the core
>>>> kernel, so I'm pretty sure that's correct.
>>> That's against what the doc says, and also it would mean stuff like
>>> atomic_read_acquire or smp_mb__after/before_atomic is completely pointless.
>>>
>>> On x86 you're right; anywhere else, where there's no total store ordering,
>>> you're wrong.
>> Good to know. I always thought that atomic_read_acquire() was just for
>> documentation purposes.
> Maybe you mixed it up with C++ atomics (which I think are now also in
> C)? Those are strongly ordered by default (you can get the weakly
> ordered kernel-style one too). It's a bit unfortunate that the default
> semantics are exactly opposite between kernel and userspace :-/

Yeah, that's most likely it.

>>> If there's code that relies on this it needs to be fixed and properly
>>> documented. I did go through the spsc queue code a bit, and it might be
>>> better to just replace this with a core data structure.
>> Well the spsc queue was especially crafted for this use case and performed
>> quite a bit better than a doubly linked list.
> Yeah, a doubly linked list is awful.
>
>> Or what core data structure do you have in mind?
> Hm, I thought there was a ready-made queue primitive, but there's just
> llist.h, which I think is roughly what the scheduler queue also does,
> minus the atomic_t for counting how many entries there are. Aside from
> the tracepoints I don't think we use that count anywhere; we just check
> for is_empty in the code (from a quick look only).

I think we just need to replace the atomic_read() with 
atomic_read_acquire() and the atomic_dec() with atomic_dec_return_release().
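
As a sketch against include/drm/spsc_queue.h (untested, just to show the
intended pairing):

static inline int spsc_queue_count(struct spsc_queue *queue)
{
	/* Pairs with the release in spsc_queue_pop(): a reader that
	 * observes the decremented count also observes the stores the
	 * consumer did before popping.
	 */
	return atomic_read_acquire(&queue->job_count);
}

/* ... and in spsc_queue_pop(), instead of the plain atomic_dec(): */
atomic_dec_return_release(&queue->job_count);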

Apart from that everything should be working as far as I can see. And
yes, llist.h doesn't really do much differently; it just doesn't keep a
tail pointer.
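
And that missing tail pointer matters for ordering: llist_add() pushes
at the head, so a consumer that wants FIFO order has to reverse the
batch on the way out, something like this (illustration only, the
foo_job container is made up):

#include <linux/llist.h>

struct foo_job {
	struct llist_node node;
};

LLIST_HEAD(queue);

/* producer side */
llist_add(&job->node, &queue);

/* single consumer: grab the whole batch, then restore FIFO order */
struct llist_node *batch = llist_del_all(&queue);

batch = llist_reverse_order(batch);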

Christian.

> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
>>>> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
>>>> the read barrier is the atomic_read() in spsc_queue_count().
>>>>
>>>> The READ_ONCE() is actually not even necessary as far as I can see.
>>>>
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>
>>>>>> atomic op, then it's a full barrier. So yeah you need more here. But
>>>>>> also since you only need a read barrier on one side, and a write
>>>>>> barrier on the other, you don't actually need CPU barriers on x86.
>>>>>> And READ_ONCE gives you the compiler barrier on one side at least, I
>>>>>> haven't found it on the writer side yet.
>>>>>>
>>>>>>> But yes a comment would be really nice here. I had to think for a while
>>>>>>> why we don't need this as well.
>>>>>> I'm typing a patch, which after a night's sleep I realized has the
>>>>>> wrong barriers. And now I'm also typing some doc improvements for
>>>>>> drm_sched_entity and related functions.
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also improve the kerneldoc for this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [SNIP]
>>>>>>>>>>>>>> - *
>>>>>>>>>>>>>>          * @job: scheduler job to init
>>>>>>>>>>>>>>          * @entity: scheduler entity to use
>>>>>>>>>>>>>>          * @owner: job owner for debugging
>>>>>>>>>>>>>> @@ -577,6 +578,9 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs_ext);
>>>>>>>>>>>>>>          * Refer to drm_sched_entity_push_job() documentation
>>>>>>>>>>>>>>          * for locking considerations.
>>>>>>>>>>>>>>          *
>>>>>>>>>>>>>> + * Drivers must make sure drm_sched_job_cleanup() is called if this function
>>>>>>>>>>>>>> + * returns successfully, even when @job is aborted before drm_sched_job_arm()
>>>>>>>>>>>>>> + * is called.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>>          * Returns 0 for success, negative error code otherwise.
>>>>>>>>>>>>>>          */
>>>>>>>>>>>>>>         int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>>> @@ -594,7 +598,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>>>             job->sched = sched;
>>>>>>>>>>>>>>             job->entity = entity;
>>>>>>>>>>>>>>             job->s_priority = entity->rq - sched->sched_rq;
>>>>>>>>>>>>>> -     job->s_fence = drm_sched_fence_create(entity, owner);
>>>>>>>>>>>>>> +     job->s_fence = drm_sched_fence_alloc(entity, owner);
>>>>>>>>>>>>>>             if (!job->s_fence)
>>>>>>>>>>>>>>                     return -ENOMEM;
>>>>>>>>>>>>>>             job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>>>>>>>>> @@ -606,13 +610,47 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>>>         EXPORT_SYMBOL(drm_sched_job_init);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         /**
>>>>>>>>>>>>>> - * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>>>>>> + * drm_sched_job_arm - arm a scheduler job for execution
>>>>>>>>>>>>>> + * @job: scheduler job to arm
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * This arms a scheduler job for execution. Specifically it initializes the
>>>>>>>>>>>>>> + * &drm_sched_job.s_fence of @job, so that it can be attached to struct dma_resv
>>>>>>>>>>>>>> + * or other places that need to track the completion of this job.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Refer to drm_sched_entity_push_job() documentation for locking
>>>>>>>>>>>>>> + * considerations.
>>>>>>>>>>>>>>          *
>>>>>>>>>>>>>> + * This can only be called if drm_sched_job_init() succeeded.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +     drm_sched_fence_init(job->s_fence, job->entity);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +EXPORT_SYMBOL(drm_sched_job_arm);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * drm_sched_job_cleanup - clean up scheduler job resources
>>>>>>>>>>>>>>          * @job: scheduler job to clean up
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Cleans up the resources allocated with drm_sched_job_init().
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * Drivers should call this from their error unwind code if @job is aborted
>>>>>>>>>>>>>> + * before drm_sched_job_arm() is called.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * After that point of no return @job is committed to be executed by the
>>>>>>>>>>>>>> + * scheduler, and this function should be called from the
>>>>>>>>>>>>>> + * &drm_sched_backend_ops.free_job callback.
>>>>>>>>>>>>>>          */
>>>>>>>>>>>>>>         void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>> -     dma_fence_put(&job->s_fence->finished);
>>>>>>>>>>>>>> +     if (kref_read(&job->s_fence->finished.refcount)) {
>>>>>>>>>>>>>> +             /* drm_sched_job_arm() has been called */
>>>>>>>>>>>>>> +             dma_fence_put(&job->s_fence->finished);
>>>>>>>>>>>>>> +     } else {
>>>>>>>>>>>>>> +             /* aborted job before committing to run it */
>>>>>>>>>>>>>> +             drm_sched_fence_free(&job->s_fence->finished.rcu);
>>>>>>>>>>>>>> +     }
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>             job->s_fence = NULL;
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>         EXPORT_SYMBOL(drm_sched_job_cleanup);
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>>>>>> index 4eb354226972..5c3a99027ecd 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>>>>>>>>>>>>>> @@ -475,6 +475,8 @@ v3d_push_job(struct v3d_file_priv *v3d_priv,
>>>>>>>>>>>>>>             if (ret)
>>>>>>>>>>>>>>                     return ret;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +     drm_sched_job_arm(&job->base);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>             job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>             /* put by scheduler job completion */
>>>>>>>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>>>>>>>> index 88ae7f331bb1..83afc3aa8e2f 100644
>>>>>>>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>>>>>>>> @@ -348,6 +348,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>>>>>>>         int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>>>>>>>>                            struct drm_sched_entity *entity,
>>>>>>>>>>>>>>                            void *owner);
>>>>>>>>>>>>>> +void drm_sched_job_arm(struct drm_sched_job *job);
>>>>>>>>>>>>>>         void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>>>>>>>>>>>>>>                                         struct drm_gpu_scheduler **sched_list,
>>>>>>>>>>>>>>                                            unsigned int num_sched_list);
>>>>>>>>>>>>>> @@ -387,8 +388,12 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>>>>>>>>>                                        enum drm_sched_priority priority);
>>>>>>>>>>>>>>         bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -struct drm_sched_fence *drm_sched_fence_create(
>>>>>>>>>>>>>> +struct drm_sched_fence *drm_sched_fence_alloc(
>>>>>>>>>>>>>>             struct drm_sched_entity *s_entity, void *owner);
>>>>>>>>>>>>>> +void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>>>>>>>>>> +                       struct drm_sched_entity *entity);
>>>>>>>>>>>>>> +void drm_sched_fence_free(struct rcu_head *rcu);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>         void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>>>>>>>>>>>>>>         void drm_sched_fence_finished(struct drm_sched_fence *fence);
>>>>>>>>>>>>>>
>>>>>> --
>>>>>> Daniel Vetter
>>>>>> Software Engineer, Intel Corporation
>>>>>> http://blog.ffwll.ch
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>


^ permalink raw reply	[flat|nested] 58+ messages in thread
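
For illustration, the init/arm/cleanup contract documented in the patch above
maps onto a driver submit path roughly like the following sketch. my_job,
my_driver_prepare() and done_fence are hypothetical stand-ins, not from the
series; the drm_sched_* calls match the signatures in this patch.

        static int my_driver_submit(struct my_job *job,
                                    struct drm_sched_entity *entity, void *owner)
        {
                int ret;

                ret = drm_sched_job_init(&job->base, entity, owner);
                if (ret)
                        return ret;

                /* Hypothetical driver setup that may still fail. */
                ret = my_driver_prepare(job);
                if (ret)
                        goto err_cleanup;

                /* Point of no return: this initializes the finished fence. */
                drm_sched_job_arm(&job->base);

                job->done_fence = dma_fence_get(&job->base.s_fence->finished);

                drm_sched_entity_push_job(&job->base, entity);
                return 0;

        err_cleanup:
                /*
                 * Undoes drm_sched_job_init(). Once drm_sched_job_arm() has
                 * run, cleanup must instead happen from the
                 * &drm_sched_backend_ops.free_job callback.
                 */
                drm_sched_job_cleanup(&job->base);
                return ret;
        }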

* Re: [PATCH v2 01/11] drm/sched: Split drm_sched_job_init
@ 2021-07-08 11:28                               ` Christian König
  0 siblings, 0 replies; 58+ messages in thread
From: Christian König @ 2021-07-08 11:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Emma Anholt, Adam Borowski, David Airlie, Viresh Kumar,
	DRI Development, Sonny Jiang, Nirmoy Das, Daniel Vetter,
	Lee Jones, Jack Zhang, lima, Mauro Carvalho Chehab,
	Masahiro Yamada, Steven Price, Luben Tuikov, Alyssa Rosenzweig,
	Sami Tolvanen, Russell King, Dave Airlie, Dennis Li, Chen Li,
	Paul Menzel, Kees Cook, Marek Olšák, Kevin Wang,
	The etnaviv authors, moderated list:DMA BUFFER SHARING FRAMEWORK,
	Nick Terrell, Deepak R Varma, Tomeu Vizoso, Boris Brezillon,
	Qiang Yu, Alex Deucher, Tian Tao,
	open list:DMA BUFFER SHARING FRAMEWORK



Am 08.07.21 um 13:20 schrieb Daniel Vetter:
> On Thu, Jul 8, 2021 at 12:54 PM Christian König
> <christian.koenig@amd.com> wrote:
>> [SNIP]
>>>> As far as I know that's not completely correct. The rules around atomics I
>>>> once learned are:
>>>>
>>>> 1. Everything which modifies something is a write barrier.
>>>> 2. Everything which returns something is a read barrier.
>>>>
>>>> And I know a whole bunch of use cases where this is relied upon in the core
>>>> kernel, so I'm pretty sure that's correct.
>>> That's against what the doc says, and it would also mean that stuff like
>>> atomic_read_acquire() or smp_mb__after/before_atomic() would be completely
>>> pointless.
>>>
>>> On x86 you're right; anywhere else, where there's no total store ordering,
>>> I think you're wrong.
>> Good to know. I always thought that atomic_read_acquire() was just for
>> documentation purposes.
> Maybe you mixed it up with C++ atomics (which I think are now also in
> C)? Those are strongly ordered by default (you can get the weakly
> ordered kernel-style one too). It's a bit unfortunate that the default
> semantics are exactly opposite between kernel and userspace :-/

Yeah, that's most likely it.
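
For illustration of those opposite defaults, a minimal C11 sketch (not from
the thread): <stdatomic.h> loads are sequentially consistent unless a weaker
ordering is requested explicitly, while the kernel's atomic_read() is relaxed
unless an _acquire variant is used.

        #include <stdatomic.h>

        atomic_int ready;

        int load_default(void)
        {
                /* C11/C++: a plain atomic load is memory_order_seq_cst. */
                return atomic_load(&ready);
        }

        int load_relaxed(void)
        {
                /*
                 * Weak ordering is the explicit opt-in here. The kernel is
                 * the other way around: atomic_read() is relaxed by default
                 * and atomic_read_acquire() is the ordered opt-in.
                 */
                return atomic_load_explicit(&ready, memory_order_relaxed);
        }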

>>> If there's code that relies on this it needs to be fixed and properly
>>> documented. I did go through the spsc_queue code a bit, and it might be
>>> better to just replace this with a core data structure.
>> Well, the spsc_queue was specially crafted for this use case and performed
>> quite a bit better than a doubly linked list.
> Yeah, a doubly linked list is awful.
>
>> Or what core data structure do you have in mind?
> Hm, I thought there was a ready-made queue primitive, but there's just
> llist.h, which I think does roughly what the scheduler queue also does,
> minus the atomic_t for counting how many entries there are. Aside from
> the tracepoints I don't think we use that count anywhere; we just check
> for is_empty in the code (from a quick look only).

I think we just need to replace the atomic_read() with 
atomic_read_acquire() and the atomic_dec() with atomic_dec_return_release().

Apart from that everything should be working as far as I can see. And
yes, llist.h doesn't really do anything much different; it just doesn't
keep a tail pointer.

Christian.
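
For illustration, the change proposed above would look roughly like the
following sketch against include/drm/spsc_queue.h (struct spsc_queue and
job_count are from that header; the single-consumer unlink logic in
spsc_queue_pop() is elided and unchanged):

        static __always_inline u32 spsc_queue_count(struct spsc_queue *queue)
        {
                /* Acquire: pairs with the release in spsc_queue_pop(). */
                return atomic_read_acquire(&queue->job_count);
        }

        static inline struct spsc_node *spsc_queue_pop(struct spsc_queue *queue)
        {
                struct spsc_node *node;

                /* ... existing head/tail unlink of @node, unchanged ... */

                /*
                 * Release: orders the unlink writes above before the
                 * decrement becomes visible to spsc_queue_count() readers.
                 */
                atomic_dec_return_release(&queue->job_count);

                return node;
        }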

> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
>>>> In this case the write barrier is the atomic_dec() in spsc_queue_pop() and
>>>> the read barrier is the atomic_read() in spsc_queue_count().
>>>>
>>>> The READ_ONCE() is actually not even necessary as far as I can see.
>>>>
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>
>>>>>> atomic op, then it's a full barrier. So yeah you need more here. But
>>>>>> also since you only need a read barrier on one side, and a write
>>>>>> barrier on the other, you don't actually need a cpu barriers on x86.
>>>>>> And READ_ONCE gives you the compiler barrier on one side at least, I
>>>>>> haven't found it on the writer side yet.
>>>>>>
>>>>>>> But yes a comment would be really nice here. I had to think for a while
>>>>>>> why we don't need this as well.
>>>>>> I'm typing a patch, which after a night's sleep I realized has the
>>>>>> wrong barriers. And now I'm also typing some doc improvements for
>>>>>> drm_sched_entity and related functions.
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also improve the kerneldoc for this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Acked-by: Steven Price <steven.price@arm.com> (v2)
>>>>>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>>>>>>>>>>>> [SNIP - Cc list and patch body, already quoted in full above]


^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2021-07-08 11:28 UTC | newest]

Thread overview: 58+ messages
-- links below jump to the message on this page --
2021-07-02 21:38 [PATCH v2 00/11] drm/scheduler dependency tracking Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 01/11] drm/sched: Split drm_sched_job_init Daniel Vetter
2021-07-02 21:38   ` Daniel Vetter
2021-07-07  9:29   ` Christian König
2021-07-07  9:29     ` Christian König
2021-07-07 11:14     ` Daniel Vetter
2021-07-07 11:14       ` Daniel Vetter
2021-07-07 11:57       ` Christian König
2021-07-07 11:57         ` Christian König
2021-07-07 12:13         ` Daniel Vetter
2021-07-07 12:13           ` Daniel Vetter
2021-07-07 12:58           ` Christian König
2021-07-07 12:58             ` Christian König
2021-07-07 16:32             ` Daniel Vetter
2021-07-07 16:32               ` Daniel Vetter
2021-07-08  6:56               ` Christian König
2021-07-08  6:56                 ` Christian König
2021-07-08  7:09                 ` Daniel Vetter
2021-07-08  7:09                   ` Daniel Vetter
2021-07-08  7:19                   ` Daniel Vetter
2021-07-08  7:19                     ` Daniel Vetter
2021-07-08  7:53                     ` Christian König
2021-07-08  7:53                       ` Christian König
2021-07-08 10:02                       ` Daniel Vetter
2021-07-08 10:02                         ` Daniel Vetter
2021-07-08 10:54                         ` Christian König
2021-07-08 10:54                           ` Christian König
2021-07-08 11:20                           ` Daniel Vetter
2021-07-08 11:20                             ` Daniel Vetter
2021-07-08 11:28                             ` Christian König
2021-07-08 11:28                               ` Christian König
2021-07-02 21:38 ` [PATCH v2 02/11] drm/sched: Add dependency tracking Daniel Vetter
2021-07-02 21:38   ` Daniel Vetter
2021-07-07  9:26   ` [Linaro-mm-sig] " Christian König
2021-07-07  9:26     ` Christian König
2021-07-07 11:23     ` Daniel Vetter
2021-07-07 11:23       ` Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 03/11] drm/sched: drop entity parameter from drm_sched_push_job Daniel Vetter
2021-07-02 21:38   ` Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 04/11] drm/panfrost: use scheduler dependency tracking Daniel Vetter
2021-07-02 21:38   ` Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 05/11] drm/lima: " Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 06/11] drm/v3d: Move drm_sched_job_init to v3d_job_init Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 07/11] drm/v3d: Use scheduler dependency handling Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 08/11] drm/etnaviv: " Daniel Vetter
2021-07-02 21:38   ` Daniel Vetter
2021-07-07  9:08   ` Lucas Stach
2021-07-07  9:08     ` Lucas Stach
2021-07-07 11:26     ` Daniel Vetter
2021-07-07 11:26       ` Daniel Vetter
2021-07-07 11:32       ` Daniel Vetter
2021-07-07 11:32         ` Daniel Vetter
2021-07-07 12:34         ` Lucas Stach
2021-07-07 12:34           ` Lucas Stach
2021-07-02 21:38 ` [PATCH v2 09/11] drm/gem: Delete gem array fencing helpers Daniel Vetter
2021-07-02 21:38   ` Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 10/11] drm/sched: Don't store self-dependencies Daniel Vetter
2021-07-02 21:38 ` [PATCH v2 11/11] drm/sched: Check locking in drm_sched_job_await_implicit Daniel Vetter
