* [PATCH v4 1/5] drm/amd/display: wait for fence without holding reservation lock
@ 2019-04-16 18:23 Andrey Grodzovsky
  2019-04-16 18:23 ` [PATCH v4 2/5] drm/amd/display: Use a reasonable timeout for framebuffer fence waits Andrey Grodzovsky
       [not found] ` <1555438986-18303-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 2 replies; 12+ messages in thread
From: Andrey Grodzovsky @ 2019-04-16 18:23 UTC (permalink / raw)
  To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Nicholas.Kazlauskas-5C7GfCeVMHo, Christian König

From: Christian König <ckoenig.leichtzumerken@gmail.com>

Don't block others while waiting for the fences to finish; concurrent
submission is perfectly valid in this case, and holding the lock can
prevent killed applications from terminating.
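
In short, the flip path now waits for the framebuffer fences up front,
without the BO reserved, and only takes the reservation for the short
tiling-flags query. A condensed sketch of the hunk below (declarations
and the surrounding plane loop omitted):

	abo = gem_to_amdgpu_bo(fb->obj[0]);

	/* Wait for all fences on this FB without holding the reservation,
	 * so concurrent submissions and process teardown are not blocked
	 * behind this flip.
	 */
	r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
						MAX_SCHEDULE_TIMEOUT);
	WARN_ON(r < 0);

	/* Reserve the BO only for the short tiling-flags query. */
	r = amdgpu_bo_reserve(abo, true);
	if (unlikely(r != 0)) {
		DRM_ERROR("failed to reserve buffer before flip\n");
		WARN_ON(1);
	}

	amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
	amdgpu_bo_unreserve(abo);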

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 380a7f9..ad4f0e5 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4814,23 +4814,26 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 			continue;
 		}
 
+		abo = gem_to_amdgpu_bo(fb->obj[0]);
+
+		/* Wait for all fences on this FB */
+		r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true,
+							false,
+							MAX_SCHEDULE_TIMEOUT);
+		WARN_ON(r < 0);
+
 		/*
 		 * TODO This might fail and hence better not used, wait
 		 * explicitly on fences instead
 		 * and in general should be called for
 		 * blocking commit to as per framework helpers
 		 */
-		abo = gem_to_amdgpu_bo(fb->obj[0]);
 		r = amdgpu_bo_reserve(abo, true);
 		if (unlikely(r != 0)) {
 			DRM_ERROR("failed to reserve buffer before flip\n");
 			WARN_ON(1);
 		}
 
-		/* Wait for all fences on this FB */
-		WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
-									    MAX_SCHEDULE_TIMEOUT) < 0);
-
 		amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
 
 		amdgpu_bo_unreserve(abo);
-- 
2.7.4



* [PATCH v4 2/5] drm/amd/display: Use a reasonable timeout for framebuffer fence waits
  2019-04-16 18:23 [PATCH v4 1/5] drm/amd/display: wait for fence without holding reservation lock Andrey Grodzovsky
@ 2019-04-16 18:23 ` Andrey Grodzovsky
       [not found] ` <1555438986-18303-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Andrey Grodzovsky @ 2019-04-16 18:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken
  Cc: Nicholas.Kazlauskas

Patch '5edb0c9b Fix deadlock with display during hanged ring recovery'
was accidentally removed during one of DAL's code merges.

v4: Update description.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ad4f0e5..88e42ad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4816,11 +4816,16 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 
 		abo = gem_to_amdgpu_bo(fb->obj[0]);
 
-		/* Wait for all fences on this FB */
+		/*
+		 * Wait for all fences on this FB. Do limited wait to avoid
+		 * deadlock during GPU reset when this fence will not signal
+		 * but we hold reservation lock for the BO.
+		 */
 		r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true,
 							false,
-							MAX_SCHEDULE_TIMEOUT);
-		WARN_ON(r < 0);
+							msecs_to_jiffies(5000));
+		if (unlikely(r <= 0))
+			DRM_ERROR("Waiting for fences timed out or interrupted!");
 
 		/*
 		 * TODO This might fail and hence better not used, wait
@@ -4829,10 +4834,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 		 * blocking commit to as per framework helpers
 		 */
 		r = amdgpu_bo_reserve(abo, true);
-		if (unlikely(r != 0)) {
+		if (unlikely(r != 0))
 			DRM_ERROR("failed to reserve buffer before flip\n");
-			WARN_ON(1);
-		}
 
 		amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
 
-- 
2.7.4



* [PATCH v4 3/5] drm/scheduler: rework job destruction
       [not found] ` <1555438986-18303-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-16 18:23   ` Andrey Grodzovsky
  2019-04-17 14:36     ` Grodzovsky, Andrey
  2019-04-16 18:23   ` [PATCH v4 4/5] drm/sched: Keep s_fence->parent pointer Andrey Grodzovsky
  2019-04-16 18:23   ` [PATCH v4 5/5] drm/amdgpu: Avoid HW reset if guilty job already signaled Andrey Grodzovsky
  2 siblings, 1 reply; 12+ messages in thread
From: Andrey Grodzovsky @ 2019-04-16 18:23 UTC (permalink / raw)
  To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo, Christian König

From: Christian König <christian.koenig@amd.com>

We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
This avoids holding the lock around the ring mirror list in drm_sched_stop,
which should solve a deadlock reported by a user.
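
The core of the change: the scheduler thread now frees finished jobs
itself, and only when no timeout handler can be looking at them. A
condensed sketch of drm_sched_cleanup_jobs() from the diff below
(local variable declarations omitted):

	/* Don't destroy jobs while the timeout worker is running */
	if (!cancel_delayed_work(&sched->work_tdr))
		return;

	while (!list_empty(&sched->ring_mirror_list)) {
		job = list_first_entry(&sched->ring_mirror_list,
				       struct drm_sched_job, node);
		if (!dma_fence_is_signaled(&job->s_fence->finished))
			break;

		spin_lock_irqsave(&sched->job_list_lock, flags);
		list_del_init(&job->node);
		spin_unlock_irqrestore(&sched->job_list_lock, flags);

		sched->ops->free_job(job);
	}

	/* queue timeout for next job */
	spin_lock_irqsave(&sched->job_list_lock, flags);
	drm_sched_start_timeout(sched);
	spin_unlock_irqrestore(&sched->job_list_lock, flags);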

v2: Remove unused variable.
v4: Move guilty job free into sched code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
 drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
 drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
 drivers/gpu/drm/lima/lima_sched.c          |   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
 drivers/gpu/drm/scheduler/sched_main.c     | 145 +++++++++++++++++------------
 drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
 include/drm/gpu_scheduler.h                |   6 +-
 8 files changed, 94 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7cee269..a0e165c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		if (!ring || !ring->sched.thread)
 			continue;
 
-		drm_sched_stop(&ring->sched);
+		drm_sched_stop(&ring->sched, &job->base);
 
 		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
 		amdgpu_fence_driver_force_completion(ring);
@@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 	if(job)
 		drm_sched_increase_karma(&job->base);
 
-
-
 	if (!amdgpu_sriov_vf(adev)) {
 
 		if (!need_full_reset)
@@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 	return r;
 }
 
-static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
-					  struct amdgpu_job *job)
+static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
 {
 	int i;
 
@@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 	/* Post ASIC reset for all devs .*/
 	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
-		amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
+		amdgpu_device_post_asic_reset(tmp_adev);
 
 		if (r) {
 			/* bad news, how to tell it to userspace ? */
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
index 33854c9..5778d9c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
@@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
 		    mmu_size + gpu->buffer.size;
 
 	/* Add in the active command buffers */
-	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
 	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
 		submit = to_etnaviv_submit(s_job);
 		file_size += submit->cmdbuf.size;
 		n_obj++;
 	}
-	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
 
 	/* Add in the active buffer objects */
 	list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
@@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
 			      gpu->buffer.size,
 			      etnaviv_cmdbuf_get_va(&gpu->buffer));
 
-	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
 	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
 		submit = to_etnaviv_submit(s_job);
 		etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
 				      submit->cmdbuf.vaddr, submit->cmdbuf.size,
 				      etnaviv_cmdbuf_get_va(&submit->cmdbuf));
 	}
-	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
 
 	/* Reserve space for the bomap */
 	if (n_bomap_pages) {
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 6d24fea..a813c82 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 	}
 
 	/* block scheduler */
-	drm_sched_stop(&gpu->sched);
+	drm_sched_stop(&gpu->sched, sched_job);
 
 	if(sched_job)
 		drm_sched_increase_karma(sched_job);
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 97bd9c1..df98931 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
 static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe,
 					 struct lima_sched_task *task)
 {
-	drm_sched_stop(&pipe->base);
+	drm_sched_stop(&pipe->base, &task->base);
 
 	if (task)
 		drm_sched_increase_karma(&task->base);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 0a7ed04..c6336b7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
 		sched_job);
 
 	for (i = 0; i < NUM_JOB_SLOTS; i++)
-		drm_sched_stop(&pfdev->js->queue[i].sched);
+		drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
 
 	if (sched_job)
 		drm_sched_increase_karma(sched_job);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 19fc601..21e8734 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
 }
 EXPORT_SYMBOL(drm_sched_resume_timeout);
 
-/* job_finish is called after hw fence signaled
- */
-static void drm_sched_job_finish(struct work_struct *work)
-{
-	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
-						   finish_work);
-	struct drm_gpu_scheduler *sched = s_job->sched;
-	unsigned long flags;
-
-	/*
-	 * Canceling the timeout without removing our job from the ring mirror
-	 * list is safe, as we will only end up in this worker if our jobs
-	 * finished fence has been signaled. So even if some another worker
-	 * manages to find this job as the next job in the list, the fence
-	 * signaled check below will prevent the timeout to be restarted.
-	 */
-	cancel_delayed_work_sync(&sched->work_tdr);
-
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	/* queue TDR for next job */
-	drm_sched_start_timeout(sched);
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-	sched->ops->free_job(s_job);
-}
-
 static void drm_sched_job_begin(struct drm_sched_job *s_job)
 {
 	struct drm_gpu_scheduler *sched = s_job->sched;
@@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	if (job)
 		job->sched->ops->timedout_job(job);
 
+	/*
+	 * Guilty job did complete and hence needs to be manually removed
+	 * See drm_sched_stop doc.
+	 */
+	if (list_empty(&job->node))
+		job->sched->ops->free_job(job);
+
 	spin_lock_irqsave(&sched->job_list_lock, flags);
 	drm_sched_start_timeout(sched);
 	spin_unlock_irqrestore(&sched->job_list_lock, flags);
@@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
  * @sched: scheduler instance
  * @bad: bad scheduler job
  *
+ * Stop the scheduler and also remove and free all completed jobs.
+ * Note: bad job will not be freed as it might be used later and so it's
+ * the caller's responsibility to release it manually if it's not part of
+ * the mirror list any more.
+ *
  */
-void drm_sched_stop(struct drm_gpu_scheduler *sched)
+void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 {
-	struct drm_sched_job *s_job;
+	struct drm_sched_job *s_job, *tmp;
 	unsigned long flags;
-	struct dma_fence *last_fence =  NULL;
 
 	kthread_park(sched->thread);
 
 	/*
-	 * Verify all the signaled jobs in mirror list are removed from the ring
-	 * by waiting for the latest job to enter the list. This should insure that
-	 * also all the previous jobs that were in flight also already singaled
-	 * and removed from the list.
+	 * Iterate the job list from later to earlier one and either deactivate
+	 * their HW callbacks or remove them from the mirror list if they
+	 * already signaled.
+	 * This iteration is thread safe as the sched thread is stopped.
 	 */
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
+	list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
 		if (s_job->s_fence->parent &&
 		    dma_fence_remove_callback(s_job->s_fence->parent,
 					      &s_job->cb)) {
@@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched)
 			s_job->s_fence->parent = NULL;
 			atomic_dec(&sched->hw_rq_count);
 		} else {
-			 last_fence = dma_fence_get(&s_job->s_fence->finished);
-			 break;
+			/*
+			 * remove job from ring_mirror_list.
+			 * Locking here is for concurrent resume timeout
+			 */
+			spin_lock_irqsave(&sched->job_list_lock, flags);
+			list_del_init(&s_job->node);
+			spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
+			/*
+			 * Wait for job's HW fence callback to finish using s_job
+			 * before releasing it.
+			 *
+			 * Job is still alive so fence refcount at least 1
+			 */
+			dma_fence_wait(&s_job->s_fence->finished, false);
+
+			/*
+			 * We must keep bad job alive for later use during
+			 * recovery by some of the drivers
+			 */
+			if (bad != s_job)
+				sched->ops->free_job(s_job);
 		}
 	}
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-	if (last_fence) {
-		dma_fence_wait(last_fence, false);
-		dma_fence_put(last_fence);
-	}
 }
 
 EXPORT_SYMBOL(drm_sched_stop);
@@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
 void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 {
 	struct drm_sched_job *s_job, *tmp;
+	unsigned long flags;
 	int r;
 
 	if (!full_recovery)
@@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 
 	/*
 	 * Locking the list is not required here as the sched thread is parked
-	 * so no new jobs are being pushed in to HW and in drm_sched_stop we
-	 * flushed all the jobs who were still in mirror list but who already
-	 * signaled and removed them self from the list. Also concurrent
+	 * so no new jobs are being inserted or removed. Also concurrent
 	 * GPU recovers can't run in parallel.
 	 */
 	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
@@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 			drm_sched_process_job(NULL, &s_job->cb);
 	}
 
+	spin_lock_irqsave(&sched->job_list_lock, flags);
 	drm_sched_start_timeout(sched);
+	spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
 unpark:
 	kthread_unpark(sched->thread);
@@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
 	uint64_t guilty_context;
 	bool found_guilty = false;
 
-	/*TODO DO we need spinlock here ? */
 	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
 		struct drm_sched_fence *s_fence = s_job->s_fence;
 
@@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		return -ENOMEM;
 	job->id = atomic64_inc_return(&sched->job_id_count);
 
-	INIT_WORK(&job->finish_work, drm_sched_job_finish);
 	INIT_LIST_HEAD(&job->node);
 
 	return 0;
@@ -597,24 +594,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
 	struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
 	struct drm_sched_fence *s_fence = s_job->s_fence;
 	struct drm_gpu_scheduler *sched = s_fence->sched;
-	unsigned long flags;
-
-	cancel_delayed_work(&sched->work_tdr);
 
 	atomic_dec(&sched->hw_rq_count);
 	atomic_dec(&sched->num_jobs);
 
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	/* remove job from ring_mirror_list */
-	list_del_init(&s_job->node);
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
+	trace_drm_sched_process_job(s_fence);
 
 	drm_sched_fence_finished(s_fence);
-
-	trace_drm_sched_process_job(s_fence);
 	wake_up_interruptible(&sched->wake_up_worker);
+}
+
+/**
+ * drm_sched_cleanup_jobs - destroy finished jobs
+ *
+ * @sched: scheduler instance
+ *
+ * Remove all finished jobs from the mirror list and destroy them.
+ */
+static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+{
+	unsigned long flags;
+
+	/* Don't destroy jobs while the timeout worker is running */
+	if (!cancel_delayed_work(&sched->work_tdr))
+		return;
+
+
+	while (!list_empty(&sched->ring_mirror_list)) {
+		struct drm_sched_job *job;
+
+		job = list_first_entry(&sched->ring_mirror_list,
+				       struct drm_sched_job, node);
+		if (!dma_fence_is_signaled(&job->s_fence->finished))
+			break;
+
+		spin_lock_irqsave(&sched->job_list_lock, flags);
+		/* remove job from ring_mirror_list */
+		list_del_init(&job->node);
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
+		sched->ops->free_job(job);
+	}
+
+	/* queue timeout for next job */
+	spin_lock_irqsave(&sched->job_list_lock, flags);
+	drm_sched_start_timeout(sched);
+	spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
-	schedule_work(&s_job->finish_work);
 }
 
 /**
@@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
 		struct dma_fence *fence;
 
 		wait_event_interruptible(sched->wake_up_worker,
+					 (drm_sched_cleanup_jobs(sched),
 					 (!drm_sched_blocked(sched) &&
 					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop());
+					 kthread_should_stop()));
 
 		if (!entity)
 			continue;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e740f3b..1a4abe7 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 
 	/* block scheduler */
 	for (q = 0; q < V3D_MAX_QUEUES; q++)
-		drm_sched_stop(&v3d->queue[q].sched);
+		drm_sched_stop(&v3d->queue[q].sched, sched_job);
 
 	if (sched_job)
 		drm_sched_increase_karma(sched_job);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0daca4d..9ee0f27 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * @sched: the scheduler instance on which this job is scheduled.
  * @s_fence: contains the fences for the scheduling of job.
  * @finish_cb: the callback for the finished fence.
- * @finish_work: schedules the function @drm_sched_job_finish once the job has
- *               finished to remove the job from the
- *               @drm_gpu_scheduler.ring_mirror_list.
  * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
  * @id: a unique id assigned to each job scheduled on the scheduler.
  * @karma: increment on every hang caused by this job. If this exceeds the hang
@@ -188,7 +185,6 @@ struct drm_sched_job {
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 	struct dma_fence_cb		finish_cb;
-	struct work_struct		finish_work;
 	struct list_head		node;
 	uint64_t			id;
 	atomic_t			karma;
@@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       void *owner);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
-void drm_sched_stop(struct drm_gpu_scheduler *sched);
+void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
 void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
 void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
 void drm_sched_increase_karma(struct drm_sched_job *bad);
-- 
2.7.4



* [PATCH v4 4/5] drm/sched: Keep s_fence->parent pointer
       [not found] ` <1555438986-18303-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  2019-04-16 18:23   ` [PATCH v4 3/5] drm/scheduler: rework job destruction Andrey Grodzovsky
@ 2019-04-16 18:23   ` Andrey Grodzovsky
  2019-04-16 18:23   ` [PATCH v4 5/5] drm/amdgpu: Avoid HW reset if guilty job already signaled Andrey Grodzovsky
  2 siblings, 0 replies; 12+ messages in thread
From: Andrey Grodzovsky @ 2019-04-16 18:23 UTC (permalink / raw)
  To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo

For the driver's later reference, to see if the fence is signaled.
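
This lets the driver check during recovery whether the guilty job's HW
fence already signaled before deciding on a full HW reset, along the
lines of (a minimal sketch mirroring the use in patch 5):

	if (job && job->base.s_fence->parent &&
	    dma_fence_is_signaled(job->base.s_fence->parent))
		job_signaled = true;	/* HW finished it, skip the reset */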

v2: Move parent fence put to resubmit jobs.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 21e8734..04645bb 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 		if (s_job->s_fence->parent &&
 		    dma_fence_remove_callback(s_job->s_fence->parent,
 					      &s_job->cb)) {
-			dma_fence_put(s_job->s_fence->parent);
-			s_job->s_fence->parent = NULL;
 			atomic_dec(&sched->hw_rq_count);
 		} else {
 			/*
@@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 				sched->ops->free_job(s_job);
 		}
 	}
+
+	/*
+	 * Stop the pending timer in flight as we rearm it in drm_sched_start.
+	 * This prevents timeout work already in progress from firing right
+	 * after this TDR finishes and before the newly restarted jobs have
+	 * had a chance to complete.
+	 */
+	cancel_delayed_work(&sched->work_tdr);
 }
 
 EXPORT_SYMBOL(drm_sched_stop);
@@ -474,6 +480,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
 		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
 			dma_fence_set_error(&s_fence->finished, -ECANCELED);
 
+		dma_fence_put(s_job->s_fence->parent);
 		s_job->s_fence->parent = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);
 	}
-- 
2.7.4



* [PATCH v4 5/5] drm/amdgpu: Avoid HW reset if guilty job already signaled.
       [not found] ` <1555438986-18303-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  2019-04-16 18:23   ` [PATCH v4 3/5] drm/scheduler: rework job destruction Andrey Grodzovsky
  2019-04-16 18:23   ` [PATCH v4 4/5] drm/sched: Keep s_fence->parent pointer Andrey Grodzovsky
@ 2019-04-16 18:23   ` Andrey Grodzovsky
  2 siblings, 0 replies; 12+ messages in thread
From: Andrey Grodzovsky @ 2019-04-16 18:23 UTC (permalink / raw)
  To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo

Also reject TDRs if another one is already running.

v2:
Stop all schedulers across the device and the entire XGMI hive before
force-signaling HW fences.
Avoid passing job_signaled to helper functions to keep all the decision
making about skipping HW reset in one place.

v3:
Fix SW scheduler hang after a non-HW reset. sched.hw_rq_count has to be
balanced against its decrement in drm_sched_stop in the non-HW reset case
(see the sketch below).
v4: rebase
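
The v3 balancing mentioned above looks like this in the post-reset loop
(condensed from the hunk below):

	/* hw_rq_count was decremented in drm_sched_stop for each job still
	 * on the mirror list; since we skip the HW reset these jobs are not
	 * resubmitted, so re-account them before restarting the scheduler.
	 */
	if (job_signaled) {
		list_for_each_entry(s_job, &ring->sched.ring_mirror_list, node)
			atomic_inc(&ring->sched.hw_rq_count);
	}

	drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res);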

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 151 ++++++++++++++++++++---------
 1 file changed, 103 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a0e165c..bfb6ab0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		if (!ring || !ring->sched.thread)
 			continue;
 
-		drm_sched_stop(&ring->sched, &job->base);
-
 		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
 		amdgpu_fence_driver_force_completion(ring);
 	}
@@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 	if(job)
 		drm_sched_increase_karma(&job->base);
 
+	/* Don't suspend on bare metal if we are not going to HW reset the ASIC */
 	if (!amdgpu_sriov_vf(adev)) {
 
 		if (!need_full_reset)
@@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 	return r;
 }
 
-static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
+static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock)
 {
-	int i;
-
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
-		struct amdgpu_ring *ring = adev->rings[i];
+	if (trylock) {
+		if (!mutex_trylock(&adev->lock_reset))
+			return false;
+	} else
+		mutex_lock(&adev->lock_reset);
 
-		if (!ring || !ring->sched.thread)
-			continue;
-
-		if (!adev->asic_reset_res)
-			drm_sched_resubmit_jobs(&ring->sched);
-
-		drm_sched_start(&ring->sched, !adev->asic_reset_res);
-	}
-
-	if (!amdgpu_device_has_dc_support(adev)) {
-		drm_helper_resume_force_mode(adev->ddev);
-	}
-
-	adev->asic_reset_res = 0;
-}
-
-static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
-{
-	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
 	/* Block kfd: SRIOV would do it separately */
 	if (!amdgpu_sriov_vf(adev))
                 amdgpu_amdkfd_pre_reset(adev);
+
+	return true;
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
@@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			      struct amdgpu_job *job)
 {
-	int r;
+	struct list_head device_list, *device_list_handle =  NULL;
+	bool need_full_reset, job_signaled;
 	struct amdgpu_hive_info *hive = NULL;
-	bool need_full_reset = false;
 	struct amdgpu_device *tmp_adev = NULL;
-	struct list_head device_list, *device_list_handle =  NULL;
+	int i, r = 0;
 
+	need_full_reset = job_signaled = false;
 	INIT_LIST_HEAD(&device_list);
 
 	dev_info(adev->dev, "GPU reset begin!\n");
 
+	hive = amdgpu_get_xgmi_hive(adev, false);
+
 	/*
-	 * In case of XGMI hive disallow concurrent resets to be triggered
-	 * by different nodes. No point also since the one node already executing
-	 * reset will also reset all the other nodes in the hive.
+	 * Here we trylock to avoid a chain of resets executing, triggered
+	 * either by jobs on different adevs in an XGMI hive or by jobs on
+	 * different schedulers for the same device, while this TO handler runs.
+	 * We always reset all schedulers for a device and all devices in an
+	 * XGMI hive, so that should take care of them too.
 	 */
-	hive = amdgpu_get_xgmi_hive(adev, 0);
-	if (hive && adev->gmc.xgmi.num_physical_nodes > 1 &&
-	    !mutex_trylock(&hive->reset_lock))
+
+	if (hive && !mutex_trylock(&hive->reset_lock)) {
+		DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
+			 job->base.id, hive->hive_id);
 		return 0;
+	}
 
 	/* Start with adev pre asic reset first for soft reset check.*/
-	amdgpu_device_lock_adev(adev);
-	r = amdgpu_device_pre_asic_reset(adev,
-					 job,
-					 &need_full_reset);
-	if (r) {
-		/*TODO Should we stop ?*/
-		DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ",
-			  r, adev->ddev->unique);
-		adev->asic_reset_res = r;
+	if (!amdgpu_device_lock_adev(adev, !hive)) {
+		DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress",
+					 job->base.id);
+		return 0;
 	}
 
 	/* Build list of devices to reset */
-	if  (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) {
+	if  (adev->gmc.xgmi.num_physical_nodes > 1) {
 		if (!hive) {
 			amdgpu_device_unlock_adev(adev);
 			return -ENODEV;
@@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		device_list_handle = &device_list;
 	}
 
+	/* block all schedulers and reset given job's ring */
+	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
+		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+			struct amdgpu_ring *ring = tmp_adev->rings[i];
+
+			if (!ring || !ring->sched.thread)
+				continue;
+
+			drm_sched_stop(&ring->sched, &job->base);
+		}
+	}
+
+
+	/*
+	 * Must check guilty signal here since after this point all old
+	 * HW fences are force signaled.
+	 *
+	 * job->base holds a reference to parent fence
+	 */
+	if (job && job->base.s_fence->parent &&
+	    dma_fence_is_signaled(job->base.s_fence->parent))
+		job_signaled = true;
+
+	if (!amdgpu_device_ip_need_full_reset(adev))
+		device_list_handle = &device_list;
+
+	if (job_signaled) {
+		dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
+		goto skip_hw_reset;
+	}
+
+
+	/* Guilty job will be freed after this*/
+	r = amdgpu_device_pre_asic_reset(adev,
+					 job,
+					 &need_full_reset);
+	if (r) {
+		/*TODO Should we stop ?*/
+		DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ",
+			  r, adev->ddev->unique);
+		adev->asic_reset_res = r;
+	}
+
 retry:	/* Rest of adevs pre asic reset from XGMI hive. */
 	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
 
 		if (tmp_adev == adev)
 			continue;
 
-		amdgpu_device_lock_adev(tmp_adev);
+		amdgpu_device_lock_adev(tmp_adev, false);
 		r = amdgpu_device_pre_asic_reset(tmp_adev,
 						 NULL,
 						 &need_full_reset);
@@ -3618,9 +3646,36 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			goto retry;
 	}
 
+skip_hw_reset:
+
 	/* Post ASIC reset for all devs .*/
 	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
-		amdgpu_device_post_asic_reset(tmp_adev);
+		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+			struct drm_sched_job *s_job;
+			struct amdgpu_ring *ring = tmp_adev->rings[i];
+
+			if (!ring || !ring->sched.thread)
+				continue;
+
+			/* No point to resubmit jobs if we didn't HW reset*/
+			if (!tmp_adev->asic_reset_res && !job_signaled)
+				drm_sched_resubmit_jobs(&ring->sched);
+
+			/* hw_rq_count was decremented in drm_sched_stop */
+			if (job_signaled) {
+				list_for_each_entry(s_job, &ring->sched.ring_mirror_list, node) {
+					atomic_inc(&ring->sched.hw_rq_count);
+				}
+			}
+
+			drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res);
+		}
+
+		if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) {
+			drm_helper_resume_force_mode(tmp_adev->ddev);
+		}
+
+		tmp_adev->asic_reset_res = 0;
 
 		if (r) {
 			/* bad news, how to tell it to userspace ? */
@@ -3633,7 +3688,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		amdgpu_device_unlock_adev(tmp_adev);
 	}
 
-	if (hive && adev->gmc.xgmi.num_physical_nodes > 1)
+	if (hive)
 		mutex_unlock(&hive->reset_lock);
 
 	if (r)
-- 
2.7.4



* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
  2019-04-16 18:23   ` [PATCH v4 3/5] drm/scheduler: rework job destruction Andrey Grodzovsky
@ 2019-04-17 14:36     ` Grodzovsky, Andrey
  2019-04-17 17:17       ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-17 14:36 UTC (permalink / raw)
  To: dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken
  Cc: Kazlauskas, Nicholas, Koenig, Christian

Ping on this patch and patch 5. The rest already RBed.

Andrey

On 4/16/19 2:23 PM, Andrey Grodzovsky wrote:
> From: Christian König <christian.koenig@amd.com>
>
> We now destroy finished jobs from the worker thread to make sure that
> we never destroy a job currently in timeout processing.
> By this we avoid holding lock around ring mirror list in drm_sched_stop
> which should solve a deadlock reported by a user.
>
> v2: Remove unused variable.
> v4: Move guilty job free into sched code.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>   drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
>   drivers/gpu/drm/lima/lima_sched.c          |   2 +-
>   drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
>   drivers/gpu/drm/scheduler/sched_main.c     | 145 +++++++++++++++++------------
>   drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
>   include/drm/gpu_scheduler.h                |   6 +-
>   8 files changed, 94 insertions(+), 78 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7cee269..a0e165c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>   		if (!ring || !ring->sched.thread)
>   			continue;
>   
> -		drm_sched_stop(&ring->sched);
> +		drm_sched_stop(&ring->sched, &job->base);
>   
>   		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
>   		amdgpu_fence_driver_force_completion(ring);
> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>   	if(job)
>   		drm_sched_increase_karma(&job->base);
>   
> -
> -
>   	if (!amdgpu_sriov_vf(adev)) {
>   
>   		if (!need_full_reset)
> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
>   	return r;
>   }
>   
> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
> -					  struct amdgpu_job *job)
> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>   {
>   	int i;
>   
> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   
>   	/* Post ASIC reset for all devs .*/
>   	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
> -		amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
> +		amdgpu_device_post_asic_reset(tmp_adev);
>   
>   		if (r) {
>   			/* bad news, how to tell it to userspace ? */
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> index 33854c9..5778d9c 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>   		    mmu_size + gpu->buffer.size;
>   
>   	/* Add in the active command buffers */
> -	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>   	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
>   		submit = to_etnaviv_submit(s_job);
>   		file_size += submit->cmdbuf.size;
>   		n_obj++;
>   	}
> -	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>   
>   	/* Add in the active buffer objects */
>   	list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>   			      gpu->buffer.size,
>   			      etnaviv_cmdbuf_get_va(&gpu->buffer));
>   
> -	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>   	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
>   		submit = to_etnaviv_submit(s_job);
>   		etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
>   				      submit->cmdbuf.vaddr, submit->cmdbuf.size,
>   				      etnaviv_cmdbuf_get_va(&submit->cmdbuf));
>   	}
> -	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>   
>   	/* Reserve space for the bomap */
>   	if (n_bomap_pages) {
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 6d24fea..a813c82 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>   	}
>   
>   	/* block scheduler */
> -	drm_sched_stop(&gpu->sched);
> +	drm_sched_stop(&gpu->sched, sched_job);
>   
>   	if(sched_job)
>   		drm_sched_increase_karma(sched_job);
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index 97bd9c1..df98931 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
>   static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe,
>   					 struct lima_sched_task *task)
>   {
> -	drm_sched_stop(&pipe->base);
> +	drm_sched_stop(&pipe->base, &task->base);
>   
>   	if (task)
>   		drm_sched_increase_karma(&task->base);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 0a7ed04..c6336b7 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
>   		sched_job);
>   
>   	for (i = 0; i < NUM_JOB_SLOTS; i++)
> -		drm_sched_stop(&pfdev->js->queue[i].sched);
> +		drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
>   
>   	if (sched_job)
>   		drm_sched_increase_karma(sched_job);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 19fc601..21e8734 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
>   }
>   EXPORT_SYMBOL(drm_sched_resume_timeout);
>   
> -/* job_finish is called after hw fence signaled
> - */
> -static void drm_sched_job_finish(struct work_struct *work)
> -{
> -	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
> -						   finish_work);
> -	struct drm_gpu_scheduler *sched = s_job->sched;
> -	unsigned long flags;
> -
> -	/*
> -	 * Canceling the timeout without removing our job from the ring mirror
> -	 * list is safe, as we will only end up in this worker if our jobs
> -	 * finished fence has been signaled. So even if some another worker
> -	 * manages to find this job as the next job in the list, the fence
> -	 * signaled check below will prevent the timeout to be restarted.
> -	 */
> -	cancel_delayed_work_sync(&sched->work_tdr);
> -
> -	spin_lock_irqsave(&sched->job_list_lock, flags);
> -	/* queue TDR for next job */
> -	drm_sched_start_timeout(sched);
> -	spin_unlock_irqrestore(&sched->job_list_lock, flags);
> -
> -	sched->ops->free_job(s_job);
> -}
> -
>   static void drm_sched_job_begin(struct drm_sched_job *s_job)
>   {
>   	struct drm_gpu_scheduler *sched = s_job->sched;
> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   	if (job)
>   		job->sched->ops->timedout_job(job);
>   
> +	/*
> +	 * Guilty job did complete and hence needs to be manually removed
> +	 * See drm_sched_stop doc.
> +	 */
> +	if (list_empty(&job->node))
> +		job->sched->ops->free_job(job);
> +
>   	spin_lock_irqsave(&sched->job_list_lock, flags);
>   	drm_sched_start_timeout(sched);
>   	spin_unlock_irqrestore(&sched->job_list_lock, flags);
> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
>    * @sched: scheduler instance
>    * @bad: bad scheduler job
>    *
> + * Stop the scheduler and also removes and frees all completed jobs.
> + * Note: bad job will not be freed as it might be used later and so it's
> + * callers responsibility to release it manually if it's not part of the
> + * mirror list any more.
> + *
>    */
> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>   {
> -	struct drm_sched_job *s_job;
> +	struct drm_sched_job *s_job, *tmp;
>   	unsigned long flags;
> -	struct dma_fence *last_fence =  NULL;
>   
>   	kthread_park(sched->thread);
>   
>   	/*
> -	 * Verify all the signaled jobs in mirror list are removed from the ring
> -	 * by waiting for the latest job to enter the list. This should insure that
> -	 * also all the previous jobs that were in flight also already singaled
> -	 * and removed from the list.
> +	 * Iterate the job list from later to  earlier one and either deactive
> +	 * their HW callbacks or remove them from mirror list if they already
> +	 * signaled.
> +	 * This iteration is thread safe as sched thread is stopped.
>   	 */
> -	spin_lock_irqsave(&sched->job_list_lock, flags);
> -	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
> +	list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
>   		if (s_job->s_fence->parent &&
>   		    dma_fence_remove_callback(s_job->s_fence->parent,
>   					      &s_job->cb)) {
> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched)
>   			s_job->s_fence->parent = NULL;
>   			atomic_dec(&sched->hw_rq_count);
>   		} else {
> -			 last_fence = dma_fence_get(&s_job->s_fence->finished);
> -			 break;
> +			/*
> +			 * remove job from ring_mirror_list.
> +			 * Locking here is for concurrent resume timeout
> +			 */
> +			spin_lock_irqsave(&sched->job_list_lock, flags);
> +			list_del_init(&s_job->node);
> +			spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +
> +			/*
> +			 * Wait for job's HW fence callback to finish using s_job
> +			 * before releasing it.
> +			 *
> +			 * Job is still alive so fence refcount at least 1
> +			 */
> +			dma_fence_wait(&s_job->s_fence->finished, false);
> +
> +			/*
> +			 * We must keep bad job alive for later use during
> +			 * recovery by some of the drivers
> +			 */
> +			if (bad != s_job)
> +				sched->ops->free_job(s_job);
>   		}
>   	}
> -	spin_unlock_irqrestore(&sched->job_list_lock, flags);
> -
> -	if (last_fence) {
> -		dma_fence_wait(last_fence, false);
> -		dma_fence_put(last_fence);
> -	}
>   }
>   
>   EXPORT_SYMBOL(drm_sched_stop);
> @@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
>   void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>   {
>   	struct drm_sched_job *s_job, *tmp;
> +	unsigned long flags;
>   	int r;
>   
>   	if (!full_recovery)
> @@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>   
>   	/*
>   	 * Locking the list is not required here as the sched thread is parked
> -	 * so no new jobs are being pushed in to HW and in drm_sched_stop we
> -	 * flushed all the jobs who were still in mirror list but who already
> -	 * signaled and removed them self from the list. Also concurrent
> +	 * so no new jobs are being inserted or removed. Also concurrent
>   	 * GPU recovers can't run in parallel.
>   	 */
>   	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
> @@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>   			drm_sched_process_job(NULL, &s_job->cb);
>   	}
>   
> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>   	drm_sched_start_timeout(sched);
> +	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   
>   unpark:
>   	kthread_unpark(sched->thread);
> @@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>   	uint64_t guilty_context;
>   	bool found_guilty = false;
>   
> -	/*TODO DO we need spinlock here ? */
>   	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>   		struct drm_sched_fence *s_fence = s_job->s_fence;
>   
> @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		return -ENOMEM;
>   	job->id = atomic64_inc_return(&sched->job_id_count);
>   
> -	INIT_WORK(&job->finish_work, drm_sched_job_finish);
>   	INIT_LIST_HEAD(&job->node);
>   
>   	return 0;
> @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>   	struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
>   	struct drm_sched_fence *s_fence = s_job->s_fence;
>   	struct drm_gpu_scheduler *sched = s_fence->sched;
> -	unsigned long flags;
> -
> -	cancel_delayed_work(&sched->work_tdr);
>   
>   	atomic_dec(&sched->hw_rq_count);
>   	atomic_dec(&sched->num_jobs);
>   
> -	spin_lock_irqsave(&sched->job_list_lock, flags);
> -	/* remove job from ring_mirror_list */
> -	list_del_init(&s_job->node);
> -	spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +	trace_drm_sched_process_job(s_fence);
>   
>   	drm_sched_fence_finished(s_fence);
> -
> -	trace_drm_sched_process_job(s_fence);
>   	wake_up_interruptible(&sched->wake_up_worker);
> +}
> +
> +/**
> + * drm_sched_cleanup_jobs - destroy finished jobs
> + *
> + * @sched: scheduler instance
> + *
> + * Remove all finished jobs from the mirror list and destroy them.
> + */
> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> +{
> +	unsigned long flags;
> +
> +	/* Don't destroy jobs while the timeout worker is running */
> +	if (!cancel_delayed_work(&sched->work_tdr))
> +		return;
> +
> +
> +	while (!list_empty(&sched->ring_mirror_list)) {
> +		struct drm_sched_job *job;
> +
> +		job = list_first_entry(&sched->ring_mirror_list,
> +				       struct drm_sched_job, node);
> +		if (!dma_fence_is_signaled(&job->s_fence->finished))
> +			break;
> +
> +		spin_lock_irqsave(&sched->job_list_lock, flags);
> +		/* remove job from ring_mirror_list */
> +		list_del_init(&job->node);
> +		spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +
> +		sched->ops->free_job(job);
> +	}
> +
> +	/* queue timeout for next job */
> +	spin_lock_irqsave(&sched->job_list_lock, flags);
> +	drm_sched_start_timeout(sched);
> +	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   
> -	schedule_work(&s_job->finish_work);
>   }
>   
>   /**
> @@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
>   		struct dma_fence *fence;
>   
>   		wait_event_interruptible(sched->wake_up_worker,
> +					 (drm_sched_cleanup_jobs(sched),
>   					 (!drm_sched_blocked(sched) &&
>   					  (entity = drm_sched_select_entity(sched))) ||
> -					 kthread_should_stop());
> +					 kthread_should_stop()));
>   
>   		if (!entity)
>   			continue;
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index e740f3b..1a4abe7 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
>   
>   	/* block scheduler */
>   	for (q = 0; q < V3D_MAX_QUEUES; q++)
> -		drm_sched_stop(&v3d->queue[q].sched);
> +		drm_sched_stop(&v3d->queue[q].sched, sched_job);
>   
>   	if (sched_job)
>   		drm_sched_increase_karma(sched_job);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 0daca4d..9ee0f27 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>    * @sched: the scheduler instance on which this job is scheduled.
>    * @s_fence: contains the fences for the scheduling of job.
>    * @finish_cb: the callback for the finished fence.
> - * @finish_work: schedules the function @drm_sched_job_finish once the job has
> - *               finished to remove the job from the
> - *               @drm_gpu_scheduler.ring_mirror_list.
>    * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
>    * @id: a unique id assigned to each job scheduled on the scheduler.
>    * @karma: increment on every hang caused by this job. If this exceeds the hang
> @@ -188,7 +185,6 @@ struct drm_sched_job {
>   	struct drm_gpu_scheduler	*sched;
>   	struct drm_sched_fence		*s_fence;
>   	struct dma_fence_cb		finish_cb;
> -	struct work_struct		finish_work;
>   	struct list_head		node;
>   	uint64_t			id;
>   	atomic_t			karma;
> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		       void *owner);
>   void drm_sched_job_cleanup(struct drm_sched_job *job);
>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
>   void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
>   void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
>   void drm_sched_increase_karma(struct drm_sched_job *bad);


* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
  2019-04-17 14:36     ` Grodzovsky, Andrey
@ 2019-04-17 17:17       ` Christian König
       [not found]         ` <a7766a6f-e22f-daf8-99f5-e46a7eb8e679-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2019-04-17 17:17 UTC (permalink / raw)
  To: Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv
  Cc: Kazlauskas, Nicholas, Koenig, Christian

I can't review this patch, since I'm one of the authors of it, but in 
general your changes look good to me now.

For patch #5 I think it might be cleaner if we move the incrementing of
hw_rq_count into drm_sched_start(), when the scheduler is started again.
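
Something along these lines, i.e. doing the accounting in drm_sched_start()
itself (an untested sketch of the idea, reusing the restart loop from
patch 3; not part of the posted series):

	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
		struct dma_fence *fence = s_job->s_fence->parent;

		/* re-account the job here instead of in the driver */
		atomic_inc(&sched->hw_rq_count);

		if (!full_recovery)
			continue;

		if (fence) {
			r = dma_fence_add_callback(fence, &s_job->cb,
						   drm_sched_process_job);
			if (r == -ENOENT)
				drm_sched_process_job(fence, &s_job->cb);
			else if (r)
				DRM_ERROR("fence add callback failed (%d)\n",
					  r);
		} else
			drm_sched_process_job(NULL, &s_job->cb);
	}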

Regards,
Christian.

Am 17.04.19 um 16:36 schrieb Grodzovsky, Andrey:
> Ping on this patch and patch 5. The rest already RBed.
>
> Andrey
>
> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote:
>> From: Christian König <christian.koenig@amd.com>
>>
>> We now destroy finished jobs from the worker thread to make sure that
>> we never destroy a job currently in timeout processing.
>> By this we avoid holding lock around ring mirror list in drm_sched_stop
>> which should solve a deadlock reported by a user.
>>
>> v2: Remove unused variable.
>> v4: Move guilty job free into sched code.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>>    drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
>>    drivers/gpu/drm/lima/lima_sched.c          |   2 +-
>>    drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
>>    drivers/gpu/drm/scheduler/sched_main.c     | 145 +++++++++++++++++------------
>>    drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
>>    include/drm/gpu_scheduler.h                |   6 +-
>>    8 files changed, 94 insertions(+), 78 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 7cee269..a0e165c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>    		if (!ring || !ring->sched.thread)
>>    			continue;
>>    
>> -		drm_sched_stop(&ring->sched);
>> +		drm_sched_stop(&ring->sched, &job->base);
>>    
>>    		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
>>    		amdgpu_fence_driver_force_completion(ring);
>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>    	if(job)
>>    		drm_sched_increase_karma(&job->base);
>>    
>> -
>> -
>>    	if (!amdgpu_sriov_vf(adev)) {
>>    
>>    		if (!need_full_reset)
>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
>>    	return r;
>>    }
>>    
>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> -					  struct amdgpu_job *job)
>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>>    {
>>    	int i;
>>    
>> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>    
>>    	/* Post ASIC reset for all devs .*/
>>    	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>> -		amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
>> +		amdgpu_device_post_asic_reset(tmp_adev);
>>    
>>    		if (r) {
>>    			/* bad news, how to tell it to userspace ? */
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>> index 33854c9..5778d9c 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>    		    mmu_size + gpu->buffer.size;
>>    
>>    	/* Add in the active command buffers */
>> -	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>    	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
>>    		submit = to_etnaviv_submit(s_job);
>>    		file_size += submit->cmdbuf.size;
>>    		n_obj++;
>>    	}
>> -	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>    
>>    	/* Add in the active buffer objects */
>>    	list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
>> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>    			      gpu->buffer.size,
>>    			      etnaviv_cmdbuf_get_va(&gpu->buffer));
>>    
>> -	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>    	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
>>    		submit = to_etnaviv_submit(s_job);
>>    		etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
>>    				      submit->cmdbuf.vaddr, submit->cmdbuf.size,
>>    				      etnaviv_cmdbuf_get_va(&submit->cmdbuf));
>>    	}
>> -	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>    
>>    	/* Reserve space for the bomap */
>>    	if (n_bomap_pages) {
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index 6d24fea..a813c82 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>>    	}
>>    
>>    	/* block scheduler */
>> -	drm_sched_stop(&gpu->sched);
>> +	drm_sched_stop(&gpu->sched, sched_job);
>>    
>>    	if(sched_job)
>>    		drm_sched_increase_karma(sched_job);
>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>> index 97bd9c1..df98931 100644
>> --- a/drivers/gpu/drm/lima/lima_sched.c
>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>> @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
>>    static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe,
>>    					 struct lima_sched_task *task)
>>    {
>> -	drm_sched_stop(&pipe->base);
>> +	drm_sched_stop(&pipe->base, &task->base);
>>    
>>    	if (task)
>>    		drm_sched_increase_karma(&task->base);
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 0a7ed04..c6336b7 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
>>    		sched_job);
>>    
>>    	for (i = 0; i < NUM_JOB_SLOTS; i++)
>> -		drm_sched_stop(&pfdev->js->queue[i].sched);
>> +		drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
>>    
>>    	if (sched_job)
>>    		drm_sched_increase_karma(sched_job);
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 19fc601..21e8734 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
>>    }
>>    EXPORT_SYMBOL(drm_sched_resume_timeout);
>>    
>> -/* job_finish is called after hw fence signaled
>> - */
>> -static void drm_sched_job_finish(struct work_struct *work)
>> -{
>> -	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
>> -						   finish_work);
>> -	struct drm_gpu_scheduler *sched = s_job->sched;
>> -	unsigned long flags;
>> -
>> -	/*
>> -	 * Canceling the timeout without removing our job from the ring mirror
>> -	 * list is safe, as we will only end up in this worker if our jobs
>> -	 * finished fence has been signaled. So even if some another worker
>> -	 * manages to find this job as the next job in the list, the fence
>> -	 * signaled check below will prevent the timeout to be restarted.
>> -	 */
>> -	cancel_delayed_work_sync(&sched->work_tdr);
>> -
>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>> -	/* queue TDR for next job */
>> -	drm_sched_start_timeout(sched);
>> -	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> -	sched->ops->free_job(s_job);
>> -}
>> -
>>    static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>    {
>>    	struct drm_gpu_scheduler *sched = s_job->sched;
>> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>    	if (job)
>>    		job->sched->ops->timedout_job(job);
>>    
>> +	/*
>> +	 * Guilty job did complete and hence needs to be manually removed
>> +	 * See drm_sched_stop doc.
>> +	 */
>> +	if (list_empty(&job->node))
>> +		job->sched->ops->free_job(job);
>> +
>>    	spin_lock_irqsave(&sched->job_list_lock, flags);
>>    	drm_sched_start_timeout(sched);
>>    	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
>>     * @sched: scheduler instance
>>     * @bad: bad scheduler job
>>     *
>> + * Stop the scheduler, and also remove and free all completed jobs.
>> + * Note: the bad job will not be freed as it might be used later, so it is
>> + * the caller's responsibility to release it manually if it is no longer
>> + * part of the mirror list.
>> + *
>>     */
>> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>    {
>> -	struct drm_sched_job *s_job;
>> +	struct drm_sched_job *s_job, *tmp;
>>    	unsigned long flags;
>> -	struct dma_fence *last_fence =  NULL;
>>    
>>    	kthread_park(sched->thread);
>>    
>>    	/*
>> -	 * Verify all the signaled jobs in mirror list are removed from the ring
>> -	 * by waiting for the latest job to enter the list. This should insure that
>> -	 * also all the previous jobs that were in flight also already singaled
>> -	 * and removed from the list.
>> +	 * Iterate the job list from later to earlier one and either deactivate
>> +	 * their HW callbacks or remove them from the mirror list if they already
>> +	 * signaled.
>> +	 * This iteration is thread safe as sched thread is stopped.
>>    	 */
>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>> -	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
>> +	list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
>>    		if (s_job->s_fence->parent &&
>>    		    dma_fence_remove_callback(s_job->s_fence->parent,
>>    					      &s_job->cb)) {
>> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched)
>>    			s_job->s_fence->parent = NULL;
>>    			atomic_dec(&sched->hw_rq_count);
>>    		} else {
>> -			 last_fence = dma_fence_get(&s_job->s_fence->finished);
>> -			 break;
>> +			/*
>> +			 * remove job from ring_mirror_list.
>> +			 * Locking here is for concurrent resume timeout
>> +			 */
>> +			spin_lock_irqsave(&sched->job_list_lock, flags);
>> +			list_del_init(&s_job->node);
>> +			spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> +
>> +			/*
>> +			 * Wait for job's HW fence callback to finish using s_job
>> +			 * before releasing it.
>> +			 *
>> +			 * Job is still alive so fence refcount at least 1
>> +			 */
>> +			dma_fence_wait(&s_job->s_fence->finished, false);
>> +
>> +			/*
>> +			 * We must keep bad job alive for later use during
>> +			 * recovery by some of the drivers
>> +			 */
>> +			if (bad != s_job)
>> +				sched->ops->free_job(s_job);
>>    		}
>>    	}
>> -	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> -	if (last_fence) {
>> -		dma_fence_wait(last_fence, false);
>> -		dma_fence_put(last_fence);
>> -	}
>>    }
>>    
>>    EXPORT_SYMBOL(drm_sched_stop);
>> @@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
>>    void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>    {
>>    	struct drm_sched_job *s_job, *tmp;
>> +	unsigned long flags;
>>    	int r;
>>    
>>    	if (!full_recovery)
>> @@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>    
>>    	/*
>>    	 * Locking the list is not required here as the sched thread is parked
>> -	 * so no new jobs are being pushed in to HW and in drm_sched_stop we
>> -	 * flushed all the jobs who were still in mirror list but who already
>> -	 * signaled and removed them self from the list. Also concurrent
>> +	 * so no new jobs are being inserted or removed. Also concurrent
>>    	 * GPU recovers can't run in parallel.
>>    	 */
>>    	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>> @@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>    			drm_sched_process_job(NULL, &s_job->cb);
>>    	}
>>    
>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>    	drm_sched_start_timeout(sched);
>> +	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>    
>>    unpark:
>>    	kthread_unpark(sched->thread);
>> @@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>    	uint64_t guilty_context;
>>    	bool found_guilty = false;
>>    
>> -	/*TODO DO we need spinlock here ? */
>>    	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>    		struct drm_sched_fence *s_fence = s_job->s_fence;
>>    
>> @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>    		return -ENOMEM;
>>    	job->id = atomic64_inc_return(&sched->job_id_count);
>>    
>> -	INIT_WORK(&job->finish_work, drm_sched_job_finish);
>>    	INIT_LIST_HEAD(&job->node);
>>    
>>    	return 0;
>> @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>    	struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
>>    	struct drm_sched_fence *s_fence = s_job->s_fence;
>>    	struct drm_gpu_scheduler *sched = s_fence->sched;
>> -	unsigned long flags;
>> -
>> -	cancel_delayed_work(&sched->work_tdr);
>>    
>>    	atomic_dec(&sched->hw_rq_count);
>>    	atomic_dec(&sched->num_jobs);
>>    
>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>> -	/* remove job from ring_mirror_list */
>> -	list_del_init(&s_job->node);
>> -	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> +	trace_drm_sched_process_job(s_fence);
>>    
>>    	drm_sched_fence_finished(s_fence);
>> -
>> -	trace_drm_sched_process_job(s_fence);
>>    	wake_up_interruptible(&sched->wake_up_worker);
>> +}
>> +
>> +/**
>> + * drm_sched_cleanup_jobs - destroy finished jobs
>> + *
>> + * @sched: scheduler instance
>> + *
>> + * Remove all finished jobs from the mirror list and destroy them.
>> + */
>> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +{
>> +	unsigned long flags;
>> +
>> +	/* Don't destroy jobs while the timeout worker is running */
>> +	if (!cancel_delayed_work(&sched->work_tdr))
>> +		return;
>> +
>> +
>> +	while (!list_empty(&sched->ring_mirror_list)) {
>> +		struct drm_sched_job *job;
>> +
>> +		job = list_first_entry(&sched->ring_mirror_list,
>> +				       struct drm_sched_job, node);
>> +		if (!dma_fence_is_signaled(&job->s_fence->finished))
>> +			break;
>> +
>> +		spin_lock_irqsave(&sched->job_list_lock, flags);
>> +		/* remove job from ring_mirror_list */
>> +		list_del_init(&job->node);
>> +		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> +
>> +		sched->ops->free_job(job);
>> +	}
>> +
>> +	/* queue timeout for next job */
>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>> +	drm_sched_start_timeout(sched);
>> +	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>    
>> -	schedule_work(&s_job->finish_work);
>>    }
>>    
>>    /**
>> @@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
>>    		struct dma_fence *fence;
>>    
>>    		wait_event_interruptible(sched->wake_up_worker,
>> +					 (drm_sched_cleanup_jobs(sched),
>>    					 (!drm_sched_blocked(sched) &&
>>    					  (entity = drm_sched_select_entity(sched))) ||
>> -					 kthread_should_stop());
>> +					 kthread_should_stop()));
>>    
>>    		if (!entity)
>>    			continue;
>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>> index e740f3b..1a4abe7 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
>>    
>>    	/* block scheduler */
>>    	for (q = 0; q < V3D_MAX_QUEUES; q++)
>> -		drm_sched_stop(&v3d->queue[q].sched);
>> +		drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>    
>>    	if (sched_job)
>>    		drm_sched_increase_karma(sched_job);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 0daca4d..9ee0f27 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>>     * @sched: the scheduler instance on which this job is scheduled.
>>     * @s_fence: contains the fences for the scheduling of job.
>>     * @finish_cb: the callback for the finished fence.
>> - * @finish_work: schedules the function @drm_sched_job_finish once the job has
>> - *               finished to remove the job from the
>> - *               @drm_gpu_scheduler.ring_mirror_list.
>>     * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
>>     * @id: a unique id assigned to each job scheduled on the scheduler.
>>     * @karma: increment on every hang caused by this job. If this exceeds the hang
>> @@ -188,7 +185,6 @@ struct drm_sched_job {
>>    	struct drm_gpu_scheduler	*sched;
>>    	struct drm_sched_fence		*s_fence;
>>    	struct dma_fence_cb		finish_cb;
>> -	struct work_struct		finish_work;
>>    	struct list_head		node;
>>    	uint64_t			id;
>>    	atomic_t			karma;
>> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>    		       void *owner);
>>    void drm_sched_job_cleanup(struct drm_sched_job *job);
>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
>>    void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
>>    void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
>>    void drm_sched_increase_karma(struct drm_sched_job *bad);

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
       [not found]         ` <a7766a6f-e22f-daf8-99f5-e46a7eb8e679-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-04-17 17:53           ` Grodzovsky, Andrey
       [not found]             ` <a883b69d-a84e-d498-658c-7bc1ad10bbef-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-17 17:53 UTC (permalink / raw)
  To: Koenig, Christian, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Kazlauskas, Nicholas


On 4/17/19 1:17 PM, Christian König wrote:
> I can't review this patch, since I'm one of the authors of it, but in 
> general your changes look good to me now.
>
> For patch #5 I think it might be cleaner if we move the incrementing of
> hw_rq_count to where we start the scheduler again.


But the increment of hw_rq_count is conditional on whether the guilty job
was signaled, so moving it into drm_sched_start would also force me to pass
a 'job_signaled' flag into drm_sched_start, which goes against your original
comment that we don't want to pass this logic around helper functions and
should keep it all in one place, namely amdgpu_device_gpu_recover.
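
To make the trade-off concrete, here is a rough sketch of what that would
look like (hypothetical signature, not code from this series):

/*
 * Hypothetical sketch only: if the conditional re-increment moved out of
 * amdgpu_device_gpu_recover(), drm_sched_start() would have to be told
 * whether the guilty job had already signaled.
 */
void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery,
		     bool job_signaled)
{
	struct drm_sched_job *s_job, *tmp;

	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
		/* Only re-account jobs that will actually run on the HW again */
		if (!job_signaled)
			atomic_inc(&sched->hw_rq_count);

		/* ... rest as in the current drm_sched_start() ... */
	}
}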

Andrey


>
> Regards,
> Christian.
>
> On 17.04.19 at 16:36, Grodzovsky, Andrey wrote:
>> Ping on this patch and patch 5. The rest already RBed.
>>
>> Andrey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
       [not found]             ` <a883b69d-a84e-d498-658c-7bc1ad10bbef-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-17 17:59               ` Christian König
  2019-04-17 18:01                 ` Koenig, Christian
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2019-04-17 17:59 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Koenig, Christian,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Kazlauskas, Nicholas

On 17.04.19 at 19:53, Grodzovsky, Andrey wrote:
> On 4/17/19 1:17 PM, Christian König wrote:
>> I can't review this patch, since I'm one of the authors of it, but in
>> general your changes look good to me now.
>>
>> For patch #5 I think it might be cleaner if we move the incrementing of
>> hw_rq_count to where we start the scheduler again.
>
> But the increment of hw_rq_count is conditional on whether the guilty job
> was signaled, so moving it into drm_sched_start would also force me to pass
> a 'job_signaled' flag into drm_sched_start, which goes against your original
> comment that we don't want to pass this logic around helper functions and
> should keep it all in one place, namely amdgpu_device_gpu_recover.

Well, I hope that incrementing hw_rq_count is conditional for signaled
jobs anyway; otherwise we would seriously mess up the counter.

E.g. in drm_sched_stop() we also only decrement it when we were able to
remove the callback.
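
For reference, this is the relevant part of the reworked drm_sched_stop()
from patch #3, trimmed down here (the else branch is summarized in a
comment):

	list_for_each_entry_safe_reverse(s_job, tmp,
					 &sched->ring_mirror_list, node) {
		if (s_job->s_fence->parent &&
		    dma_fence_remove_callback(s_job->s_fence->parent,
					      &s_job->cb)) {
			/* Callback removed: the job was still pending in HW */
			s_job->s_fence->parent = NULL;
			atomic_dec(&sched->hw_rq_count);
		} else {
			/*
			 * Already signaled: the job is taken off the mirror
			 * list and, unless it is the bad job, freed here.
			 */
		}
	}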

Christian.

>
> Andrey
>
>
>> Regards,
>> Christian.
>>
>> On 17.04.19 at 16:36, Grodzovsky, Andrey wrote:
>>> Ping on this patch and patch 5. The rest already RBed.
>>>
>>> Andrey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
  2019-04-17 17:59               ` Christian König
@ 2019-04-17 18:01                 ` Koenig, Christian
  2019-04-17 18:29                   ` Grodzovsky, Andrey
  0 siblings, 1 reply; 12+ messages in thread
From: Koenig, Christian @ 2019-04-17 18:01 UTC (permalink / raw)
  To: Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv
  Cc: Kazlauskas, Nicholas

On 17.04.19 at 19:59, Christian König wrote:
> On 17.04.19 at 19:53, Grodzovsky, Andrey wrote:
>> On 4/17/19 1:17 PM, Christian König wrote:
>>> I can't review this patch, since I'm one of the authors of it, but in
>>> general your changes look good to me now.
>>>
>>> For patch #5 I think it might be cleaner if we move the incrementing of
>>> hw_rq_count to where we start the scheduler again.
>>
>> But the increment of hw_rq_count is conditional on whether the guilty job
>> was signaled, so moving it into drm_sched_start would also force me to pass
>> a 'job_signaled' flag into drm_sched_start, which goes against your original
>> comment that we don't want to pass this logic around helper functions and
>> should keep it all in one place, namely amdgpu_device_gpu_recover.
>
> Well, I hope that incrementing hw_rq_count is conditional for signaled
> jobs anyway; otherwise we would seriously mess up the counter.
>
> E.g. in drm_sched_stop() we also only decrement it when we were able to
> remove the callback.

Ok, checking the code again, we don't need any special handling here 
since all signaled jobs are already removed from the mirror_list.
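
(For reference, this is the signaled-job path of the reworked drm_sched_stop() 
from the hunk quoted above, condensed here with comments added; if removing 
the HW fence callback fails the fence already signaled, so the job is taken 
off the mirror list right away:)

	list_for_each_entry_safe_reverse(s_job, tmp,
					 &sched->ring_mirror_list, node) {
		if (s_job->s_fence->parent &&
		    dma_fence_remove_callback(s_job->s_fence->parent,
					      &s_job->cb)) {
			/* still pending on the HW: stays on the mirror list */
			s_job->s_fence->parent = NULL;
			atomic_dec(&sched->hw_rq_count);
		} else {
			/* already signaled: drop it from the mirror list and
			 * free it, unless it is the bad job the caller still
			 * needs for recovery
			 */
			spin_lock_irqsave(&sched->job_list_lock, flags);
			list_del_init(&s_job->node);
			spin_unlock_irqrestore(&sched->job_list_lock, flags);

			dma_fence_wait(&s_job->s_fence->finished, false);

			if (bad != s_job)
				sched->ops->free_job(s_job);
		}
	}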

Christian.

>
> Christian.
>
>>
>> Andrey
>>
>>
>>> Regards,
>>> Christian.
>>>
>>> On 17.04.19 at 16:36, Grodzovsky, Andrey wrote:
>>>> Ping on this patch and patch 5. The rest already RBed.
>>>>
>>>> Andrey
>>>>
>>>> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote:
>>>>> From: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> We now destroy finished jobs from the worker thread to make sure that
>>>>> we never destroy a job currently in timeout processing.
>>>>> By this we avoid holding lock around ring mirror list in 
>>>>> drm_sched_stop
>>>>> which should solve a deadlock reported by a user.
>>>>>
>>>>> v2: Remove unused variable.
>>>>> v4: Move guilty job free into sched code.
>>>>>
>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>>>>>     drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
>>>>>     drivers/gpu/drm/lima/lima_sched.c          |   2 +-
>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
>>>>>     drivers/gpu/drm/scheduler/sched_main.c     | 145
>>>>> +++++++++++++++++------------
>>>>>     drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
>>>>>     include/drm/gpu_scheduler.h                |   6 +-
>>>>>     8 files changed, 94 insertions(+), 78 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index 7cee269..a0e165c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct
>>>>> amdgpu_device *adev,
>>>>>             if (!ring || !ring->sched.thread)
>>>>>                 continue;
>>>>>     -        drm_sched_stop(&ring->sched);
>>>>> +        drm_sched_stop(&ring->sched, &job->base);
>>>>>                /* after all hw jobs are reset, hw fence is
>>>>> meaningless, so force_completion */
>>>>>             amdgpu_fence_driver_force_completion(ring);
>>>>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct
>>>>> amdgpu_device *adev,
>>>>>         if(job)
>>>>>             drm_sched_increase_karma(&job->base);
>>>>>     -
>>>>> -
>>>>>         if (!amdgpu_sriov_vf(adev)) {
>>>>>                if (!need_full_reset)
>>>>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct
>>>>> amdgpu_hive_info *hive,
>>>>>         return r;
>>>>>     }
>>>>>     -static void amdgpu_device_post_asic_reset(struct amdgpu_device
>>>>> *adev,
>>>>> -                      struct amdgpu_job *job)
>>>>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device 
>>>>> *adev)
>>>>>     {
>>>>>         int i;
>>>>>     @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct
>>>>> amdgpu_device *adev,
>>>>>            /* Post ASIC reset for all devs .*/
>>>>>         list_for_each_entry(tmp_adev, device_list_handle,
>>>>> gmc.xgmi.head) {
>>>>> -        amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ?
>>>>> job : NULL);
>>>>> +        amdgpu_device_post_asic_reset(tmp_adev);
>>>>>                if (r) {
>>>>>                 /* bad news, how to tell it to userspace ? */
>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>> index 33854c9..5778d9c 100644
>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>>>>                 mmu_size + gpu->buffer.size;
>>>>>            /* Add in the active command buffers */
>>>>> -    spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>>>>         list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, 
>>>>> node) {
>>>>>             submit = to_etnaviv_submit(s_job);
>>>>>             file_size += submit->cmdbuf.size;
>>>>>             n_obj++;
>>>>>         }
>>>>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>>>>            /* Add in the active buffer objects */
>>>>>         list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
>>>>> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>>>>                       gpu->buffer.size,
>>>>> etnaviv_cmdbuf_get_va(&gpu->buffer));
>>>>>     - spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>>>>         list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, 
>>>>> node) {
>>>>>             submit = to_etnaviv_submit(s_job);
>>>>>             etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
>>>>>                           submit->cmdbuf.vaddr, submit->cmdbuf.size,
>>>>> etnaviv_cmdbuf_get_va(&submit->cmdbuf));
>>>>>         }
>>>>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>>>>            /* Reserve space for the bomap */
>>>>>         if (n_bomap_pages) {
>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> index 6d24fea..a813c82 100644
>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct
>>>>> drm_sched_job *sched_job)
>>>>>         }
>>>>>            /* block scheduler */
>>>>> -    drm_sched_stop(&gpu->sched);
>>>>> +    drm_sched_stop(&gpu->sched, sched_job);
>>>>>            if(sched_job)
>>>>>             drm_sched_increase_karma(sched_job);
>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>> index 97bd9c1..df98931 100644
>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>> @@ -300,7 +300,7 @@ static struct dma_fence
>>>>> *lima_sched_run_job(struct drm_sched_job *job)
>>>>>     static void lima_sched_handle_error_task(struct lima_sched_pipe
>>>>> *pipe,
>>>>>                          struct lima_sched_task *task)
>>>>>     {
>>>>> -    drm_sched_stop(&pipe->base);
>>>>> +    drm_sched_stop(&pipe->base, &task->base);
>>>>>            if (task)
>>>>>             drm_sched_increase_karma(&task->base);
>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> index 0a7ed04..c6336b7 100644
>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct
>>>>> drm_sched_job *sched_job)
>>>>>             sched_job);
>>>>>            for (i = 0; i < NUM_JOB_SLOTS; i++)
>>>>> - drm_sched_stop(&pfdev->js->queue[i].sched);
>>>>> + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
>>>>>            if (sched_job)
>>>>>             drm_sched_increase_karma(sched_job);
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 19fc601..21e8734 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct
>>>>> drm_gpu_scheduler *sched,
>>>>>     }
>>>>>     EXPORT_SYMBOL(drm_sched_resume_timeout);
>>>>>     -/* job_finish is called after hw fence signaled
>>>>> - */
>>>>> -static void drm_sched_job_finish(struct work_struct *work)
>>>>> -{
>>>>> -    struct drm_sched_job *s_job = container_of(work, struct
>>>>> drm_sched_job,
>>>>> -                           finish_work);
>>>>> -    struct drm_gpu_scheduler *sched = s_job->sched;
>>>>> -    unsigned long flags;
>>>>> -
>>>>> -    /*
>>>>> -     * Canceling the timeout without removing our job from the ring
>>>>> mirror
>>>>> -     * list is safe, as we will only end up in this worker if our 
>>>>> jobs
>>>>> -     * finished fence has been signaled. So even if some another
>>>>> worker
>>>>> -     * manages to find this job as the next job in the list, the 
>>>>> fence
>>>>> -     * signaled check below will prevent the timeout to be 
>>>>> restarted.
>>>>> -     */
>>>>> -    cancel_delayed_work_sync(&sched->work_tdr);
>>>>> -
>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> -    /* queue TDR for next job */
>>>>> -    drm_sched_start_timeout(sched);
>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> -
>>>>> -    sched->ops->free_job(s_job);
>>>>> -}
>>>>> -
>>>>>     static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>>>     {
>>>>>         struct drm_gpu_scheduler *sched = s_job->sched;
>>>>> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct
>>>>> work_struct *work)
>>>>>         if (job)
>>>>>             job->sched->ops->timedout_job(job);
>>>>>     +    /*
>>>>> +     * Guilty job did complete and hence needs to be manually 
>>>>> removed
>>>>> +     * See drm_sched_stop doc.
>>>>> +     */
>>>>> +    if (list_empty(&job->node))
>>>>> +        job->sched->ops->free_job(job);
>>>>> +
>>>>>         spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>         drm_sched_start_timeout(sched);
>>>>> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
>>>>>      * @sched: scheduler instance
>>>>>      * @bad: bad scheduler job
>>>>>      *
>>>>> + * Stop the scheduler and also removes and frees all completed jobs.
>>>>> + * Note: bad job will not be freed as it might be used later and so
>>>>> it's
>>>>> + * callers responsibility to release it manually if it's not part
>>>>> of the
>>>>> + * mirror list any more.
>>>>> + *
>>>>>      */
>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>>>> drm_sched_job *bad)
>>>>>     {
>>>>> -    struct drm_sched_job *s_job;
>>>>> +    struct drm_sched_job *s_job, *tmp;
>>>>>         unsigned long flags;
>>>>> -    struct dma_fence *last_fence =  NULL;
>>>>>            kthread_park(sched->thread);
>>>>>            /*
>>>>> -     * Verify all the signaled jobs in mirror list are removed from
>>>>> the ring
>>>>> -     * by waiting for the latest job to enter the list. This should
>>>>> insure that
>>>>> -     * also all the previous jobs that were in flight also already
>>>>> singaled
>>>>> -     * and removed from the list.
>>>>> +     * Iterate the job list from later to  earlier one and either
>>>>> deactive
>>>>> +     * their HW callbacks or remove them from mirror list if they
>>>>> already
>>>>> +     * signaled.
>>>>> +     * This iteration is thread safe as sched thread is stopped.
>>>>>          */
>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> -    list_for_each_entry_reverse(s_job, &sched->ring_mirror_list,
>>>>> node) {
>>>>> +    list_for_each_entry_safe_reverse(s_job, tmp,
>>>>> &sched->ring_mirror_list, node) {
>>>>>             if (s_job->s_fence->parent &&
>>>>> dma_fence_remove_callback(s_job->s_fence->parent,
>>>>>                               &s_job->cb)) {
>>>>> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>>>> *sched)
>>>>>                 s_job->s_fence->parent = NULL;
>>>>>                 atomic_dec(&sched->hw_rq_count);
>>>>>             } else {
>>>>> -             last_fence = dma_fence_get(&s_job->s_fence->finished);
>>>>> -             break;
>>>>> +            /*
>>>>> +             * remove job from ring_mirror_list.
>>>>> +             * Locking here is for concurrent resume timeout
>>>>> +             */
>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> +            list_del_init(&s_job->node);
>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> +
>>>>> +            /*
>>>>> +             * Wait for job's HW fence callback to finish using 
>>>>> s_job
>>>>> +             * before releasing it.
>>>>> +             *
>>>>> +             * Job is still alive so fence refcount at least 1
>>>>> +             */
>>>>> + dma_fence_wait(&s_job->s_fence->finished, false);
>>>>> +
>>>>> +            /*
>>>>> +             * We must keep bad job alive for later use during
>>>>> +             * recovery by some of the drivers
>>>>> +             */
>>>>> +            if (bad != s_job)
>>>>> +                sched->ops->free_job(s_job);
>>>>>             }
>>>>>         }
>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> -
>>>>> -    if (last_fence) {
>>>>> -        dma_fence_wait(last_fence, false);
>>>>> -        dma_fence_put(last_fence);
>>>>> -    }
>>>>>     }
>>>>>        EXPORT_SYMBOL(drm_sched_stop);
>>>>> @@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
>>>>>     void drm_sched_start(struct drm_gpu_scheduler *sched, bool
>>>>> full_recovery)
>>>>>     {
>>>>>         struct drm_sched_job *s_job, *tmp;
>>>>> +    unsigned long flags;
>>>>>         int r;
>>>>>            if (!full_recovery)
>>>>> @@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler
>>>>> *sched, bool full_recovery)
>>>>>            /*
>>>>>          * Locking the list is not required here as the sched thread
>>>>> is parked
>>>>> -     * so no new jobs are being pushed in to HW and in
>>>>> drm_sched_stop we
>>>>> -     * flushed all the jobs who were still in mirror list but who
>>>>> already
>>>>> -     * signaled and removed them self from the list. Also concurrent
>>>>> +     * so no new jobs are being inserted or removed. Also concurrent
>>>>>          * GPU recovers can't run in parallel.
>>>>>          */
>>>>>         list_for_each_entry_safe(s_job, tmp,
>>>>> &sched->ring_mirror_list, node) {
>>>>> @@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler
>>>>> *sched, bool full_recovery)
>>>>>                 drm_sched_process_job(NULL, &s_job->cb);
>>>>>         }
>>>>>     +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>         drm_sched_start_timeout(sched);
>>>>> +    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>        unpark:
>>>>>         kthread_unpark(sched->thread);
>>>>> @@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct
>>>>> drm_gpu_scheduler *sched)
>>>>>         uint64_t guilty_context;
>>>>>         bool found_guilty = false;
>>>>>     -    /*TODO DO we need spinlock here ? */
>>>>>         list_for_each_entry_safe(s_job, tmp,
>>>>> &sched->ring_mirror_list, node) {
>>>>>             struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>>     @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job
>>>>> *job,
>>>>>             return -ENOMEM;
>>>>>         job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>     -    INIT_WORK(&job->finish_work, drm_sched_job_finish);
>>>>>         INIT_LIST_HEAD(&job->node);
>>>>>            return 0;
>>>>> @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct
>>>>> dma_fence *f, struct dma_fence_cb *cb)
>>>>>         struct drm_sched_job *s_job = container_of(cb, struct
>>>>> drm_sched_job, cb);
>>>>>         struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>>         struct drm_gpu_scheduler *sched = s_fence->sched;
>>>>> -    unsigned long flags;
>>>>> -
>>>>> -    cancel_delayed_work(&sched->work_tdr);
>>>>>            atomic_dec(&sched->hw_rq_count);
>>>>>         atomic_dec(&sched->num_jobs);
>>>>>     -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> -    /* remove job from ring_mirror_list */
>>>>> -    list_del_init(&s_job->node);
>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> +    trace_drm_sched_process_job(s_fence);
>>>>>            drm_sched_fence_finished(s_fence);
>>>>> -
>>>>> -    trace_drm_sched_process_job(s_fence);
>>>>> wake_up_interruptible(&sched->wake_up_worker);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * drm_sched_cleanup_jobs - destroy finished jobs
>>>>> + *
>>>>> + * @sched: scheduler instance
>>>>> + *
>>>>> + * Remove all finished jobs from the mirror list and destroy them.
>>>>> + */
>>>>> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>> +{
>>>>> +    unsigned long flags;
>>>>> +
>>>>> +    /* Don't destroy jobs while the timeout worker is running */
>>>>> +    if (!cancel_delayed_work(&sched->work_tdr))
>>>>> +        return;
>>>>> +
>>>>> +
>>>>> +    while (!list_empty(&sched->ring_mirror_list)) {
>>>>> +        struct drm_sched_job *job;
>>>>> +
>>>>> +        job = list_first_entry(&sched->ring_mirror_list,
>>>>> +                       struct drm_sched_job, node);
>>>>> +        if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>> +            break;
>>>>> +
>>>>> +        spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> +        /* remove job from ring_mirror_list */
>>>>> +        list_del_init(&job->node);
>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> +
>>>>> +        sched->ops->free_job(job);
>>>>> +    }
>>>>> +
>>>>> +    /* queue timeout for next job */
>>>>> +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> +    drm_sched_start_timeout(sched);
>>>>> +    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>     -    schedule_work(&s_job->finish_work);
>>>>>     }
>>>>>        /**
>>>>> @@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
>>>>>             struct dma_fence *fence;
>>>>> wait_event_interruptible(sched->wake_up_worker,
>>>>> +                     (drm_sched_cleanup_jobs(sched),
>>>>>                          (!drm_sched_blocked(sched) &&
>>>>>                           (entity = 
>>>>> drm_sched_select_entity(sched))) ||
>>>>> -                     kthread_should_stop());
>>>>> +                     kthread_should_stop()));
>>>>>                if (!entity)
>>>>>                 continue;
>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> index e740f3b..1a4abe7 100644
>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d,
>>>>> struct drm_sched_job *sched_job)
>>>>>            /* block scheduler */
>>>>>         for (q = 0; q < V3D_MAX_QUEUES; q++)
>>>>> -        drm_sched_stop(&v3d->queue[q].sched);
>>>>> +        drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>>>>            if (sched_job)
>>>>>             drm_sched_increase_karma(sched_job);
>>>>> diff --git a/include/drm/gpu_scheduler.h 
>>>>> b/include/drm/gpu_scheduler.h
>>>>> index 0daca4d..9ee0f27 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -167,9 +167,6 @@ struct drm_sched_fence
>>>>> *to_drm_sched_fence(struct dma_fence *f);
>>>>>      * @sched: the scheduler instance on which this job is scheduled.
>>>>>      * @s_fence: contains the fences for the scheduling of job.
>>>>>      * @finish_cb: the callback for the finished fence.
>>>>> - * @finish_work: schedules the function @drm_sched_job_finish once
>>>>> the job has
>>>>> - *               finished to remove the job from the
>>>>> - *               @drm_gpu_scheduler.ring_mirror_list.
>>>>>      * @node: used to append this struct to the
>>>>> @drm_gpu_scheduler.ring_mirror_list.
>>>>>      * @id: a unique id assigned to each job scheduled on the 
>>>>> scheduler.
>>>>>      * @karma: increment on every hang caused by this job. If this
>>>>> exceeds the hang
>>>>> @@ -188,7 +185,6 @@ struct drm_sched_job {
>>>>>         struct drm_gpu_scheduler    *sched;
>>>>>         struct drm_sched_fence        *s_fence;
>>>>>         struct dma_fence_cb        finish_cb;
>>>>> -    struct work_struct        finish_work;
>>>>>         struct list_head        node;
>>>>>         uint64_t            id;
>>>>>         atomic_t            karma;
>>>>> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>                    void *owner);
>>>>>     void drm_sched_job_cleanup(struct drm_sched_job *job);
>>>>>     void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>>>> drm_sched_job *bad);
>>>>>     void drm_sched_start(struct drm_gpu_scheduler *sched, bool
>>>>> full_recovery);
>>>>>     void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
>>>>>     void drm_sched_increase_karma(struct drm_sched_job *bad);
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
  2019-04-17 18:01                 ` Koenig, Christian
@ 2019-04-17 18:29                   ` Grodzovsky, Andrey
  2019-04-17 19:08                     ` Koenig, Christian
  0 siblings, 1 reply; 12+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-17 18:29 UTC (permalink / raw)
  To: Koenig, Christian, dri-devel, amd-gfx, eric, etnaviv; +Cc: Kazlauskas, Nicholas


On 4/17/19 2:01 PM, Koenig, Christian wrote:
> On 17.04.19 at 19:59, Christian König wrote:
>> On 17.04.19 at 19:53, Grodzovsky, Andrey wrote:
>>> On 4/17/19 1:17 PM, Christian König wrote:
>>>> I can't review this patch, since I'm one of the authors of it, but in
>>>> general your changes look good to me now.
>>>>
>>>> For patch #5 I think it might be cleaner if we move the incrementing of
>>>> hw_rq_count to where we start the scheduler again.
>>> But the increment of hw_rq_count is conditional on whether the guilty job
>>> was signaled; moving it into drm_sched_start will also force me to pass a
>>> 'job_signaled' flag into drm_sched_start, which is against your original
>>> comment that we don't want to pass this logic around helper functions
>>> and keep it all in one place, which is amdgpu_device_gpu_recover.
>> Well I hope that incrementing hw_rq_count is conditional for signaled
>> jobs anyway, or otherwise we would seriously mess up the counter.
>>
>> E.g. in drm_sched_stop() we also only decrement it when we were able
>> to remove the callback.
> Ok, checking the code again, we don't need any special handling here
> since all signaled jobs are already removed from the mirror_list.
>
> Christian.

We decrement in drm_sched_stop and then later, if the guilty job is found
to be signaled, we skip drm_sched_resubmit_jobs and so never increment the
counter back; the count then goes 'negative' when the fence signals, and I
hit a bug. But now I think what I need is to simply move the
atomic_inc(&sched->hw_rq_count) from drm_sched_resubmit_jobs into
drm_sched_start, and that way I can get rid of the conditional re-increment
I am doing now. Agree?
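
Something like the following (just an untested sketch on top of the
drm_sched_start() loop from the v4 patch quoted earlier in the thread; it
also means the !full_recovery early-out has to move into the loop so the
counter is still fixed up in that case):

	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
		struct dma_fence *fence = s_job->s_fence->parent;

		/* re-account every job still on the mirror list */
		atomic_inc(&sched->hw_rq_count);

		if (!full_recovery)
			continue;

		if (fence) {
			r = dma_fence_add_callback(fence, &s_job->cb,
						   drm_sched_process_job);
			if (r == -ENOENT)
				drm_sched_process_job(fence, &s_job->cb);
			else if (r)
				DRM_ERROR("fence add callback failed (%d)\n",
					  r);
		} else
			drm_sched_process_job(NULL, &s_job->cb);
	}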

Andrey

>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 17.04.19 at 16:36, Grodzovsky, Andrey wrote:
>>>>> Ping on this patch and patch 5. The rest already RBed.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote:
>>>>>> From: Christian König <christian.koenig@amd.com>
>>>>>>
>>>>>> We now destroy finished jobs from the worker thread to make sure that
>>>>>> we never destroy a job currently in timeout processing.
>>>>>> By this we avoid holding lock around ring mirror list in
>>>>>> drm_sched_stop
>>>>>> which should solve a deadlock reported by a user.
>>>>>>
>>>>>> v2: Remove unused variable.
>>>>>> v4: Move guilty job free into sched code.
>>>>>>
>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>>>>>>      drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>>>>>      drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
>>>>>>      drivers/gpu/drm/lima/lima_sched.c          |   2 +-
>>>>>>      drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
>>>>>>      drivers/gpu/drm/scheduler/sched_main.c     | 145
>>>>>> +++++++++++++++++------------
>>>>>>      drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
>>>>>>      include/drm/gpu_scheduler.h                |   6 +-
>>>>>>      8 files changed, 94 insertions(+), 78 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index 7cee269..a0e165c 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct
>>>>>> amdgpu_device *adev,
>>>>>>              if (!ring || !ring->sched.thread)
>>>>>>                  continue;
>>>>>>      -        drm_sched_stop(&ring->sched);
>>>>>> +        drm_sched_stop(&ring->sched, &job->base);
>>>>>>                 /* after all hw jobs are reset, hw fence is
>>>>>> meaningless, so force_completion */
>>>>>>              amdgpu_fence_driver_force_completion(ring);
>>>>>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct
>>>>>> amdgpu_device *adev,
>>>>>>          if(job)
>>>>>>              drm_sched_increase_karma(&job->base);
>>>>>>      -
>>>>>> -
>>>>>>          if (!amdgpu_sriov_vf(adev)) {
>>>>>>                 if (!need_full_reset)
>>>>>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct
>>>>>> amdgpu_hive_info *hive,
>>>>>>          return r;
>>>>>>      }
>>>>>>      -static void amdgpu_device_post_asic_reset(struct amdgpu_device
>>>>>> *adev,
>>>>>> -                      struct amdgpu_job *job)
>>>>>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device
>>>>>> *adev)
>>>>>>      {
>>>>>>          int i;
>>>>>>      @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct
>>>>>> amdgpu_device *adev,
>>>>>>             /* Post ASIC reset for all devs .*/
>>>>>>          list_for_each_entry(tmp_adev, device_list_handle,
>>>>>> gmc.xgmi.head) {
>>>>>> -        amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ?
>>>>>> job : NULL);
>>>>>> +        amdgpu_device_post_asic_reset(tmp_adev);
>>>>>>                 if (r) {
>>>>>>                  /* bad news, how to tell it to userspace ? */
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>> index 33854c9..5778d9c 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>>>>>                  mmu_size + gpu->buffer.size;
>>>>>>             /* Add in the active command buffers */
>>>>>> -    spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>>>>>          list_for_each_entry(s_job, &gpu->sched.ring_mirror_list,
>>>>>> node) {
>>>>>>              submit = to_etnaviv_submit(s_job);
>>>>>>              file_size += submit->cmdbuf.size;
>>>>>>              n_obj++;
>>>>>>          }
>>>>>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>>>>>             /* Add in the active buffer objects */
>>>>>>          list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
>>>>>> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>>>>>                        gpu->buffer.size,
>>>>>> etnaviv_cmdbuf_get_va(&gpu->buffer));
>>>>>>      - spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>>>>>          list_for_each_entry(s_job, &gpu->sched.ring_mirror_list,
>>>>>> node) {
>>>>>>              submit = to_etnaviv_submit(s_job);
>>>>>>              etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
>>>>>>                            submit->cmdbuf.vaddr, submit->cmdbuf.size,
>>>>>> etnaviv_cmdbuf_get_va(&submit->cmdbuf));
>>>>>>          }
>>>>>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>>>>>             /* Reserve space for the bomap */
>>>>>>          if (n_bomap_pages) {
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> index 6d24fea..a813c82 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct
>>>>>> drm_sched_job *sched_job)
>>>>>>          }
>>>>>>             /* block scheduler */
>>>>>> -    drm_sched_stop(&gpu->sched);
>>>>>> +    drm_sched_stop(&gpu->sched, sched_job);
>>>>>>             if(sched_job)
>>>>>>              drm_sched_increase_karma(sched_job);
>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>> index 97bd9c1..df98931 100644
>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>> @@ -300,7 +300,7 @@ static struct dma_fence
>>>>>> *lima_sched_run_job(struct drm_sched_job *job)
>>>>>>      static void lima_sched_handle_error_task(struct lima_sched_pipe
>>>>>> *pipe,
>>>>>>                           struct lima_sched_task *task)
>>>>>>      {
>>>>>> -    drm_sched_stop(&pipe->base);
>>>>>> +    drm_sched_stop(&pipe->base, &task->base);
>>>>>>             if (task)
>>>>>>              drm_sched_increase_karma(&task->base);
>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> index 0a7ed04..c6336b7 100644
>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct
>>>>>> drm_sched_job *sched_job)
>>>>>>              sched_job);
>>>>>>             for (i = 0; i < NUM_JOB_SLOTS; i++)
>>>>>> - drm_sched_stop(&pfdev->js->queue[i].sched);
>>>>>> + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
>>>>>>             if (sched_job)
>>>>>>              drm_sched_increase_karma(sched_job);
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 19fc601..21e8734 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct
>>>>>> drm_gpu_scheduler *sched,
>>>>>>      }
>>>>>>      EXPORT_SYMBOL(drm_sched_resume_timeout);
>>>>>>      -/* job_finish is called after hw fence signaled
>>>>>> - */
>>>>>> -static void drm_sched_job_finish(struct work_struct *work)
>>>>>> -{
>>>>>> -    struct drm_sched_job *s_job = container_of(work, struct
>>>>>> drm_sched_job,
>>>>>> -                           finish_work);
>>>>>> -    struct drm_gpu_scheduler *sched = s_job->sched;
>>>>>> -    unsigned long flags;
>>>>>> -
>>>>>> -    /*
>>>>>> -     * Canceling the timeout without removing our job from the ring
>>>>>> mirror
>>>>>> -     * list is safe, as we will only end up in this worker if our
>>>>>> jobs
>>>>>> -     * finished fence has been signaled. So even if some another
>>>>>> worker
>>>>>> -     * manages to find this job as the next job in the list, the
>>>>>> fence
>>>>>> -     * signaled check below will prevent the timeout to be
>>>>>> restarted.
>>>>>> -     */
>>>>>> -    cancel_delayed_work_sync(&sched->work_tdr);
>>>>>> -
>>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> -    /* queue TDR for next job */
>>>>>> -    drm_sched_start_timeout(sched);
>>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> -
>>>>>> -    sched->ops->free_job(s_job);
>>>>>> -}
>>>>>> -
>>>>>>      static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>>>>      {
>>>>>>          struct drm_gpu_scheduler *sched = s_job->sched;
>>>>>> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct
>>>>>> work_struct *work)
>>>>>>          if (job)
>>>>>>              job->sched->ops->timedout_job(job);
>>>>>>      +    /*
>>>>>> +     * Guilty job did complete and hence needs to be manually
>>>>>> removed
>>>>>> +     * See drm_sched_stop doc.
>>>>>> +     */
>>>>>> +    if (list_empty(&job->node))
>>>>>> +        job->sched->ops->free_job(job);
>>>>>> +
>>>>>>          spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>          drm_sched_start_timeout(sched);
>>>>>> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
>>>>>>       * @sched: scheduler instance
>>>>>>       * @bad: bad scheduler job
>>>>>>       *
>>>>>> + * Stop the scheduler and also removes and frees all completed jobs.
>>>>>> + * Note: bad job will not be freed as it might be used later and so
>>>>>> it's
>>>>>> + * callers responsibility to release it manually if it's not part
>>>>>> of the
>>>>>> + * mirror list any more.
>>>>>> + *
>>>>>>       */
>>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
>>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>>>>> drm_sched_job *bad)
>>>>>>      {
>>>>>> -    struct drm_sched_job *s_job;
>>>>>> +    struct drm_sched_job *s_job, *tmp;
>>>>>>          unsigned long flags;
>>>>>> -    struct dma_fence *last_fence =  NULL;
>>>>>>             kthread_park(sched->thread);
>>>>>>             /*
>>>>>> -     * Verify all the signaled jobs in mirror list are removed from
>>>>>> the ring
>>>>>> -     * by waiting for the latest job to enter the list. This should
>>>>>> insure that
>>>>>> -     * also all the previous jobs that were in flight also already
>>>>>> singaled
>>>>>> -     * and removed from the list.
>>>>>> +     * Iterate the job list from later to  earlier one and either
>>>>>> deactive
>>>>>> +     * their HW callbacks or remove them from mirror list if they
>>>>>> already
>>>>>> +     * signaled.
>>>>>> +     * This iteration is thread safe as sched thread is stopped.
>>>>>>           */
>>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> -    list_for_each_entry_reverse(s_job, &sched->ring_mirror_list,
>>>>>> node) {
>>>>>> +    list_for_each_entry_safe_reverse(s_job, tmp,
>>>>>> &sched->ring_mirror_list, node) {
>>>>>>              if (s_job->s_fence->parent &&
>>>>>> dma_fence_remove_callback(s_job->s_fence->parent,
>>>>>>                                &s_job->cb)) {
>>>>>> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>>>>> *sched)
>>>>>>                  s_job->s_fence->parent = NULL;
>>>>>>                  atomic_dec(&sched->hw_rq_count);
>>>>>>              } else {
>>>>>> -             last_fence = dma_fence_get(&s_job->s_fence->finished);
>>>>>> -             break;
>>>>>> +            /*
>>>>>> +             * remove job from ring_mirror_list.
>>>>>> +             * Locking here is for concurrent resume timeout
>>>>>> +             */
>>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +            list_del_init(&s_job->node);
>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> +
>>>>>> +            /*
>>>>>> +             * Wait for job's HW fence callback to finish using
>>>>>> s_job
>>>>>> +             * before releasing it.
>>>>>> +             *
>>>>>> +             * Job is still alive so fence refcount at least 1
>>>>>> +             */
>>>>>> + dma_fence_wait(&s_job->s_fence->finished, false);
>>>>>> +
>>>>>> +            /*
>>>>>> +             * We must keep bad job alive for later use during
>>>>>> +             * recovery by some of the drivers
>>>>>> +             */
>>>>>> +            if (bad != s_job)
>>>>>> +                sched->ops->free_job(s_job);
>>>>>>              }
>>>>>>          }
>>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> -
>>>>>> -    if (last_fence) {
>>>>>> -        dma_fence_wait(last_fence, false);
>>>>>> -        dma_fence_put(last_fence);
>>>>>> -    }
>>>>>>      }
>>>>>>         EXPORT_SYMBOL(drm_sched_stop);
>>>>>> @@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
>>>>>>      void drm_sched_start(struct drm_gpu_scheduler *sched, bool
>>>>>> full_recovery)
>>>>>>      {
>>>>>>          struct drm_sched_job *s_job, *tmp;
>>>>>> +    unsigned long flags;
>>>>>>          int r;
>>>>>>             if (!full_recovery)
>>>>>> @@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler
>>>>>> *sched, bool full_recovery)
>>>>>>             /*
>>>>>>           * Locking the list is not required here as the sched thread
>>>>>> is parked
>>>>>> -     * so no new jobs are being pushed in to HW and in
>>>>>> drm_sched_stop we
>>>>>> -     * flushed all the jobs who were still in mirror list but who
>>>>>> already
>>>>>> -     * signaled and removed them self from the list. Also concurrent
>>>>>> +     * so no new jobs are being inserted or removed. Also concurrent
>>>>>>           * GPU recovers can't run in parallel.
>>>>>>           */
>>>>>>          list_for_each_entry_safe(s_job, tmp,
>>>>>> &sched->ring_mirror_list, node) {
>>>>>> @@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler
>>>>>> *sched, bool full_recovery)
>>>>>>                  drm_sched_process_job(NULL, &s_job->cb);
>>>>>>          }
>>>>>>      +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>          drm_sched_start_timeout(sched);
>>>>>> +    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>         unpark:
>>>>>>          kthread_unpark(sched->thread);
>>>>>> @@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct
>>>>>> drm_gpu_scheduler *sched)
>>>>>>          uint64_t guilty_context;
>>>>>>          bool found_guilty = false;
>>>>>>      -    /*TODO DO we need spinlock here ? */
>>>>>>          list_for_each_entry_safe(s_job, tmp,
>>>>>> &sched->ring_mirror_list, node) {
>>>>>>              struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>>>      @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job
>>>>>> *job,
>>>>>>              return -ENOMEM;
>>>>>>          job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>      -    INIT_WORK(&job->finish_work, drm_sched_job_finish);
>>>>>>          INIT_LIST_HEAD(&job->node);
>>>>>>             return 0;
>>>>>> @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct
>>>>>> dma_fence *f, struct dma_fence_cb *cb)
>>>>>>          struct drm_sched_job *s_job = container_of(cb, struct
>>>>>> drm_sched_job, cb);
>>>>>>          struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>>>          struct drm_gpu_scheduler *sched = s_fence->sched;
>>>>>> -    unsigned long flags;
>>>>>> -
>>>>>> -    cancel_delayed_work(&sched->work_tdr);
>>>>>>             atomic_dec(&sched->hw_rq_count);
>>>>>>          atomic_dec(&sched->num_jobs);
>>>>>>      -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> -    /* remove job from ring_mirror_list */
>>>>>> -    list_del_init(&s_job->node);
>>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> +    trace_drm_sched_process_job(s_fence);
>>>>>>             drm_sched_fence_finished(s_fence);
>>>>>> -
>>>>>> -    trace_drm_sched_process_job(s_fence);
>>>>>> wake_up_interruptible(&sched->wake_up_worker);
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * drm_sched_cleanup_jobs - destroy finished jobs
>>>>>> + *
>>>>>> + * @sched: scheduler instance
>>>>>> + *
>>>>>> + * Remove all finished jobs from the mirror list and destroy them.
>>>>>> + */
>>>>>> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>>> +{
>>>>>> +    unsigned long flags;
>>>>>> +
>>>>>> +    /* Don't destroy jobs while the timeout worker is running */
>>>>>> +    if (!cancel_delayed_work(&sched->work_tdr))
>>>>>> +        return;
>>>>>> +
>>>>>> +
>>>>>> +    while (!list_empty(&sched->ring_mirror_list)) {
>>>>>> +        struct drm_sched_job *job;
>>>>>> +
>>>>>> +        job = list_first_entry(&sched->ring_mirror_list,
>>>>>> +                       struct drm_sched_job, node);
>>>>>> +        if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>>> +            break;
>>>>>> +
>>>>>> +        spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +        /* remove job from ring_mirror_list */
>>>>>> +        list_del_init(&job->node);
>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>> +
>>>>>> +        sched->ops->free_job(job);
>>>>>> +    }
>>>>>> +
>>>>>> +    /* queue timeout for next job */
>>>>>> +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>> +    drm_sched_start_timeout(sched);
>>>>>> +    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>      -    schedule_work(&s_job->finish_work);
>>>>>>      }
>>>>>>         /**
>>>>>> @@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
>>>>>>              struct dma_fence *fence;
>>>>>> wait_event_interruptible(sched->wake_up_worker,
>>>>>> +                     (drm_sched_cleanup_jobs(sched),
>>>>>>                           (!drm_sched_blocked(sched) &&
>>>>>>                            (entity =
>>>>>> drm_sched_select_entity(sched))) ||
>>>>>> -                     kthread_should_stop());
>>>>>> +                     kthread_should_stop()));
>>>>>>                 if (!entity)
>>>>>>                  continue;
>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> index e740f3b..1a4abe7 100644
>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d,
>>>>>> struct drm_sched_job *sched_job)
>>>>>>             /* block scheduler */
>>>>>>          for (q = 0; q < V3D_MAX_QUEUES; q++)
>>>>>> -        drm_sched_stop(&v3d->queue[q].sched);
>>>>>> +        drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>>>>>             if (sched_job)
>>>>>>              drm_sched_increase_karma(sched_job);
>>>>>> diff --git a/include/drm/gpu_scheduler.h
>>>>>> b/include/drm/gpu_scheduler.h
>>>>>> index 0daca4d..9ee0f27 100644
>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>> @@ -167,9 +167,6 @@ struct drm_sched_fence
>>>>>> *to_drm_sched_fence(struct dma_fence *f);
>>>>>>       * @sched: the scheduler instance on which this job is scheduled.
>>>>>>       * @s_fence: contains the fences for the scheduling of job.
>>>>>>       * @finish_cb: the callback for the finished fence.
>>>>>> - * @finish_work: schedules the function @drm_sched_job_finish once
>>>>>> the job has
>>>>>> - *               finished to remove the job from the
>>>>>> - *               @drm_gpu_scheduler.ring_mirror_list.
>>>>>>       * @node: used to append this struct to the
>>>>>> @drm_gpu_scheduler.ring_mirror_list.
>>>>>>       * @id: a unique id assigned to each job scheduled on the
>>>>>> scheduler.
>>>>>>       * @karma: increment on every hang caused by this job. If this
>>>>>> exceeds the hang
>>>>>> @@ -188,7 +185,6 @@ struct drm_sched_job {
>>>>>>          struct drm_gpu_scheduler    *sched;
>>>>>>          struct drm_sched_fence        *s_fence;
>>>>>>          struct dma_fence_cb        finish_cb;
>>>>>> -    struct work_struct        finish_work;
>>>>>>          struct list_head        node;
>>>>>>          uint64_t            id;
>>>>>>          atomic_t            karma;
>>>>>> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>                     void *owner);
>>>>>>      void drm_sched_job_cleanup(struct drm_sched_job *job);
>>>>>>      void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
>>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>>>>> drm_sched_job *bad);
>>>>>>      void drm_sched_start(struct drm_gpu_scheduler *sched, bool
>>>>>> full_recovery);
>>>>>>      void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
>>>>>>      void drm_sched_increase_karma(struct drm_sched_job *bad);
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
  2019-04-17 18:29                   ` Grodzovsky, Andrey
@ 2019-04-17 19:08                     ` Koenig, Christian
  0 siblings, 0 replies; 12+ messages in thread
From: Koenig, Christian @ 2019-04-17 19:08 UTC (permalink / raw)
  To: Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv
  Cc: Kazlauskas, Nicholas

On 17.04.19 at 20:29, Grodzovsky, Andrey wrote:
> On 4/17/19 2:01 PM, Koenig, Christian wrote:
>> On 17.04.19 at 19:59, Christian König wrote:
>>> On 17.04.19 at 19:53, Grodzovsky, Andrey wrote:
>>>> On 4/17/19 1:17 PM, Christian König wrote:
>>>>> I can't review this patch, since I'm one of the authors of it, but in
>>>>> general your changes look good to me now.
>>>>>
>>>>> For patch #5 I think it might be cleaner if we move the incrementing of
>>>>> hw_rq_count to where we start the scheduler again.
>>>> But the increment of hw_rq_count is conditional on whether the guilty job
>>>> was signaled; moving it into drm_sched_start will also force me to pass a
>>>> 'job_signaled' flag into drm_sched_start, which is against your original
>>>> comment that we don't want to pass this logic around helper functions
>>>> and keep it all in one place, which is amdgpu_device_gpu_recover.
>>> Well I hope that incrementing hw_rq_count is conditional for signaled
>>> jobs anyway, or otherwise we would seriously mess up the counter.
>>>
>>> E.g. in drm_sched_stop() we also only decrement it when we were able
>>> to remove the callback.
>> Ok, checking the code again, we don't need any special handling here
>> since all signaled jobs are already removed from the mirror_list.
>>
>> Christian.
> We decrement in drm_sched_stop and then later, if the guilty job is found
> to be signaled, we skip drm_sched_resubmit_jobs and so never increment the
> counter back; the count then goes 'negative' when the fence signals, and I
> hit a bug. But now I think what I need is to simply move the
> atomic_inc(&sched->hw_rq_count) from drm_sched_resubmit_jobs into
> drm_sched_start, and that way I can get rid of the conditional re-increment
> I am doing now. Agree?

Yes, exactly what I had in mind after checking the code once more as well.

Christian.

>
> Andrey
>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 17.04.19 at 16:36, Grodzovsky, Andrey wrote:
>>>>>> Ping on this patch and patch 5. The rest already RBed.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote:
>>>>>>> From: Christian König <christian.koenig@amd.com>
>>>>>>>
>>>>>>> We now destroy finished jobs from the worker thread to make sure that
>>>>>>> we never destroy a job currently in timeout processing.
>>>>>>> By this we avoid holding lock around ring mirror list in
>>>>>>> drm_sched_stop
>>>>>>> which should solve a deadlock reported by a user.
>>>>>>>
>>>>>>> v2: Remove unused variable.
>>>>>>> v4: Move guilty job free into sched code.
>>>>>>>
>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>> ---
>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>>>>>>>       drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>>>>>>       drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
>>>>>>>       drivers/gpu/drm/lima/lima_sched.c          |   2 +-
>>>>>>>       drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
>>>>>>>       drivers/gpu/drm/scheduler/sched_main.c     | 145
>>>>>>> +++++++++++++++++------------
>>>>>>>       drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
>>>>>>>       include/drm/gpu_scheduler.h                |   6 +-
>>>>>>>       8 files changed, 94 insertions(+), 78 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> index 7cee269..a0e165c 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct
>>>>>>> amdgpu_device *adev,
>>>>>>>               if (!ring || !ring->sched.thread)
>>>>>>>                   continue;
>>>>>>>       -        drm_sched_stop(&ring->sched);
>>>>>>> +        drm_sched_stop(&ring->sched, &job->base);
>>>>>>>                  /* after all hw jobs are reset, hw fence is
>>>>>>> meaningless, so force_completion */
>>>>>>>               amdgpu_fence_driver_force_completion(ring);
>>>>>>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct
>>>>>>> amdgpu_device *adev,
>>>>>>>           if(job)
>>>>>>>               drm_sched_increase_karma(&job->base);
>>>>>>>       -
>>>>>>> -
>>>>>>>           if (!amdgpu_sriov_vf(adev)) {
>>>>>>>                  if (!need_full_reset)
>>>>>>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct
>>>>>>> amdgpu_hive_info *hive,
>>>>>>>           return r;
>>>>>>>       }
>>>>>>>       -static void amdgpu_device_post_asic_reset(struct amdgpu_device
>>>>>>> *adev,
>>>>>>> -                      struct amdgpu_job *job)
>>>>>>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device
>>>>>>> *adev)
>>>>>>>       {
>>>>>>>           int i;
>>>>>>>       @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct
>>>>>>> amdgpu_device *adev,
>>>>>>>              /* Post ASIC reset for all devs .*/
>>>>>>>           list_for_each_entry(tmp_adev, device_list_handle,
>>>>>>> gmc.xgmi.head) {
>>>>>>> -        amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ?
>>>>>>> job : NULL);
>>>>>>> +        amdgpu_device_post_asic_reset(tmp_adev);
>>>>>>>                  if (r) {
>>>>>>>                   /* bad news, how to tell it to userspace ? */
>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>>> index 33854c9..5778d9c 100644
>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>>>>>>> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>>>>>>                   mmu_size + gpu->buffer.size;
>>>>>>>              /* Add in the active command buffers */
>>>>>>> -    spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>>>>>>           list_for_each_entry(s_job, &gpu->sched.ring_mirror_list,
>>>>>>> node) {
>>>>>>>               submit = to_etnaviv_submit(s_job);
>>>>>>>               file_size += submit->cmdbuf.size;
>>>>>>>               n_obj++;
>>>>>>>           }
>>>>>>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>>>>>>              /* Add in the active buffer objects */
>>>>>>>           list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
>>>>>>> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>>>>>>                         gpu->buffer.size,
>>>>>>> etnaviv_cmdbuf_get_va(&gpu->buffer));
>>>>>>>       - spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>>>>>>           list_for_each_entry(s_job, &gpu->sched.ring_mirror_list,
>>>>>>> node) {
>>>>>>>               submit = to_etnaviv_submit(s_job);
>>>>>>>               etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
>>>>>>>                             submit->cmdbuf.vaddr, submit->cmdbuf.size,
>>>>>>> etnaviv_cmdbuf_get_va(&submit->cmdbuf));
>>>>>>>           }
>>>>>>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>>>>>>              /* Reserve space for the bomap */
>>>>>>>           if (n_bomap_pages) {
>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> index 6d24fea..a813c82 100644
>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct
>>>>>>> drm_sched_job *sched_job)
>>>>>>>           }
>>>>>>>              /* block scheduler */
>>>>>>> -    drm_sched_stop(&gpu->sched);
>>>>>>> +    drm_sched_stop(&gpu->sched, sched_job);
>>>>>>>              if(sched_job)
>>>>>>>               drm_sched_increase_karma(sched_job);
>>>>>>> diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> index 97bd9c1..df98931 100644
>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> @@ -300,7 +300,7 @@ static struct dma_fence
>>>>>>> *lima_sched_run_job(struct drm_sched_job *job)
>>>>>>>       static void lima_sched_handle_error_task(struct lima_sched_pipe
>>>>>>> *pipe,
>>>>>>>                            struct lima_sched_task *task)
>>>>>>>       {
>>>>>>> -    drm_sched_stop(&pipe->base);
>>>>>>> +    drm_sched_stop(&pipe->base, &task->base);
>>>>>>>              if (task)
>>>>>>>               drm_sched_increase_karma(&task->base);
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> index 0a7ed04..c6336b7 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct
>>>>>>> drm_sched_job *sched_job)
>>>>>>>               sched_job);
>>>>>>>              for (i = 0; i < NUM_JOB_SLOTS; i++)
>>>>>>> - drm_sched_stop(&pfdev->js->queue[i].sched);
>>>>>>> + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
>>>>>>>              if (sched_job)
>>>>>>>               drm_sched_increase_karma(sched_job);
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index 19fc601..21e8734 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct
>>>>>>> drm_gpu_scheduler *sched,
>>>>>>>       }
>>>>>>>       EXPORT_SYMBOL(drm_sched_resume_timeout);
>>>>>>>       -/* job_finish is called after hw fence signaled
>>>>>>> - */
>>>>>>> -static void drm_sched_job_finish(struct work_struct *work)
>>>>>>> -{
>>>>>>> -    struct drm_sched_job *s_job = container_of(work, struct
>>>>>>> drm_sched_job,
>>>>>>> -                           finish_work);
>>>>>>> -    struct drm_gpu_scheduler *sched = s_job->sched;
>>>>>>> -    unsigned long flags;
>>>>>>> -
>>>>>>> -    /*
>>>>>>> -     * Canceling the timeout without removing our job from the ring
>>>>>>> mirror
>>>>>>> -     * list is safe, as we will only end up in this worker if our
>>>>>>> jobs
>>>>>>> -     * finished fence has been signaled. So even if some another
>>>>>>> worker
>>>>>>> -     * manages to find this job as the next job in the list, the
>>>>>>> fence
>>>>>>> -     * signaled check below will prevent the timeout to be
>>>>>>> restarted.
>>>>>>> -     */
>>>>>>> -    cancel_delayed_work_sync(&sched->work_tdr);
>>>>>>> -
>>>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>> -    /* queue TDR for next job */
>>>>>>> -    drm_sched_start_timeout(sched);
>>>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>> -
>>>>>>> -    sched->ops->free_job(s_job);
>>>>>>> -}
>>>>>>> -
>>>>>>>       static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>>>>>       {
>>>>>>>           struct drm_gpu_scheduler *sched = s_job->sched;
>>>>>>> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>           if (job)
>>>>>>>               job->sched->ops->timedout_job(job);
>>>>>>>       +    /*
>>>>>>> +     * Guilty job did complete and hence needs to be manually
>>>>>>> removed
>>>>>>> +     * See drm_sched_stop doc.
>>>>>>> +     */
>>>>>>> +    if (list_empty(&job->node))
>>>>>>> +        job->sched->ops->free_job(job);
>>>>>>> +
>>>>>>>           spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>>           drm_sched_start_timeout(sched);
>>>>>>> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
>>>>>>>        * @sched: scheduler instance
>>>>>>>        * @bad: bad scheduler job
>>>>>>>        *
>>>>>>> + * Stop the scheduler and also removes and frees all completed jobs.
>>>>>>> + * Note: bad job will not be freed as it might be used later and so
>>>>>>> it's
>>>>>>> + * callers responsibility to release it manually if it's not part
>>>>>>> of the
>>>>>>> + * mirror list any more.
>>>>>>> + *
>>>>>>>        */
>>>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
>>>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>>>>>> drm_sched_job *bad)
>>>>>>>       {
>>>>>>> -    struct drm_sched_job *s_job;
>>>>>>> +    struct drm_sched_job *s_job, *tmp;
>>>>>>>           unsigned long flags;
>>>>>>> -    struct dma_fence *last_fence =  NULL;
>>>>>>>              kthread_park(sched->thread);
>>>>>>>              /*
>>>>>>> -     * Verify all the signaled jobs in mirror list are removed from
>>>>>>> the ring
>>>>>>> -     * by waiting for the latest job to enter the list. This should
>>>>>>> insure that
>>>>>>> -     * also all the previous jobs that were in flight also already
>>>>>>> singaled
>>>>>>> -     * and removed from the list.
>>>>>>> +     * Iterate the job list from newest to oldest and either deactivate
>>>>>>> +     * each job's HW callback or remove the job from the mirror list if
>>>>>>> +     * it has already signaled.
>>>>>>> +     * This iteration is thread safe because the scheduler thread is parked.
>>>>>>>            */
>>>>>>> -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>> -    list_for_each_entry_reverse(s_job, &sched->ring_mirror_list,
>>>>>>> node) {
>>>>>>> +    list_for_each_entry_safe_reverse(s_job, tmp,
>>>>>>> &sched->ring_mirror_list, node) {
>>>>>>>               if (s_job->s_fence->parent &&
>>>>>>> dma_fence_remove_callback(s_job->s_fence->parent,
>>>>>>>                                 &s_job->cb)) {
>>>>>>> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>>>>>> *sched)
>>>>>>>                   s_job->s_fence->parent = NULL;
>>>>>>>                   atomic_dec(&sched->hw_rq_count);
>>>>>>>               } else {
>>>>>>> -             last_fence = dma_fence_get(&s_job->s_fence->finished);
>>>>>>> -             break;
>>>>>>> +            /*
>>>>>>> +             * Remove the job from ring_mirror_list. Locking here
>>>>>>> +             * guards against a concurrent timeout resume.
>>>>>>> +             */
>>>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>> +            list_del_init(&s_job->node);
>>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>> +
>>>>>>> +            /*
>>>>>>> +             * Wait for the job's HW fence callback to finish using
>>>>>>> +             * s_job before releasing it.
>>>>>>> +             *
>>>>>>> +             * The job is still alive here, so the fence refcount is
>>>>>>> +             * at least 1.
>>>>>>> +             */
>>>>>>> + dma_fence_wait(&s_job->s_fence->finished, false);
>>>>>>> +
>>>>>>> +            /*
>>>>>>> +             * We must keep the bad job alive because some drivers
>>>>>>> +             * still need it later during recovery.
>>>>>>> +             */
>>>>>>> +            if (bad != s_job)
>>>>>>> +                sched->ops->free_job(s_job);
>>>>>>>               }
>>>>>>>           }
>>>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>> -
>>>>>>> -    if (last_fence) {
>>>>>>> -        dma_fence_wait(last_fence, false);
>>>>>>> -        dma_fence_put(last_fence);
>>>>>>> -    }
>>>>>>>       }
>>>>>>>          EXPORT_SYMBOL(drm_sched_stop);
>>>>>>> @@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
>>>>>>>       void drm_sched_start(struct drm_gpu_scheduler *sched, bool
>>>>>>> full_recovery)
>>>>>>>       {
>>>>>>>           struct drm_sched_job *s_job, *tmp;
>>>>>>> +    unsigned long flags;
>>>>>>>           int r;
>>>>>>>              if (!full_recovery)
>>>>>>> @@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler
>>>>>>> *sched, bool full_recovery)
>>>>>>>              /*
>>>>>>>            * Locking the list is not required here as the sched thread
>>>>>>> is parked
>>>>>>> -     * so no new jobs are being pushed in to HW and in
>>>>>>> drm_sched_stop we
>>>>>>> -     * flushed all the jobs who were still in mirror list but who
>>>>>>> already
>>>>>>> -     * signaled and removed them self from the list. Also concurrent
>>>>>>> +     * so no new jobs are being inserted or removed. Also concurrent
>>>>>>>            * GPU recovers can't run in parallel.
>>>>>>>            */
>>>>>>>           list_for_each_entry_safe(s_job, tmp,
>>>>>>> &sched->ring_mirror_list, node) {
>>>>>>> @@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler
>>>>>>> *sched, bool full_recovery)
>>>>>>>                   drm_sched_process_job(NULL, &s_job->cb);
>>>>>>>           }
>>>>>>>       +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>>           drm_sched_start_timeout(sched);
>>>>>>> +    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>>          unpark:
>>>>>>>           kthread_unpark(sched->thread);
>>>>>>> @@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct
>>>>>>> drm_gpu_scheduler *sched)
>>>>>>>           uint64_t guilty_context;
>>>>>>>           bool found_guilty = false;
>>>>>>>       -    /*TODO DO we need spinlock here ? */
>>>>>>>           list_for_each_entry_safe(s_job, tmp,
>>>>>>> &sched->ring_mirror_list, node) {
>>>>>>>               struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>>>>       @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job
>>>>>>> *job,
>>>>>>>               return -ENOMEM;
>>>>>>>           job->id = atomic64_inc_return(&sched->job_id_count);
>>>>>>>       -    INIT_WORK(&job->finish_work, drm_sched_job_finish);
>>>>>>>           INIT_LIST_HEAD(&job->node);
>>>>>>>              return 0;
>>>>>>> @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct
>>>>>>> dma_fence *f, struct dma_fence_cb *cb)
>>>>>>>           struct drm_sched_job *s_job = container_of(cb, struct
>>>>>>> drm_sched_job, cb);
>>>>>>>           struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>>>>           struct drm_gpu_scheduler *sched = s_fence->sched;
>>>>>>> -    unsigned long flags;
>>>>>>> -
>>>>>>> -    cancel_delayed_work(&sched->work_tdr);
>>>>>>>              atomic_dec(&sched->hw_rq_count);
>>>>>>>           atomic_dec(&sched->num_jobs);
>>>>>>>       -    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>> -    /* remove job from ring_mirror_list */
>>>>>>> -    list_del_init(&s_job->node);
>>>>>>> -    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>> +    trace_drm_sched_process_job(s_fence);
>>>>>>>              drm_sched_fence_finished(s_fence);
>>>>>>> -
>>>>>>> -    trace_drm_sched_process_job(s_fence);
>>>>>>> wake_up_interruptible(&sched->wake_up_worker);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_sched_cleanup_jobs - destroy finished jobs
>>>>>>> + *
>>>>>>> + * @sched: scheduler instance
>>>>>>> + *
>>>>>>> + * Remove all finished jobs from the mirror list and destroy them.
>>>>>>> + */
>>>>>>> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>>>> +{
>>>>>>> +    unsigned long flags;
>>>>>>> +
>>>>>>> +    /* Don't destroy jobs while the timeout worker is running */
>>>>>>> +    if (!cancel_delayed_work(&sched->work_tdr))
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +
>>>>>>> +    while (!list_empty(&sched->ring_mirror_list)) {
>>>>>>> +        struct drm_sched_job *job;
>>>>>>> +
>>>>>>> +        job = list_first_entry(&sched->ring_mirror_list,
>>>>>>> +                       struct drm_sched_job, node);
>>>>>>> +        if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>>>> +            break;
>>>>>>> +
>>>>>>> +        spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>> +        /* remove job from ring_mirror_list */
>>>>>>> +        list_del_init(&job->node);
>>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>> +
>>>>>>> +        sched->ops->free_job(job);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /* queue timeout for next job */
>>>>>>> +    spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>>> +    drm_sched_start_timeout(sched);
>>>>>>> +    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>>>       -    schedule_work(&s_job->finish_work);
>>>>>>>       }
>>>>>>>          /**
>>>>>>> @@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
>>>>>>>               struct dma_fence *fence;
>>>>>>> wait_event_interruptible(sched->wake_up_worker,
>>>>>>> +                     (drm_sched_cleanup_jobs(sched),
>>>>>>>                            (!drm_sched_blocked(sched) &&
>>>>>>>                             (entity =
>>>>>>> drm_sched_select_entity(sched))) ||
>>>>>>> -                     kthread_should_stop());
>>>>>>> +                     kthread_should_stop()));
>>>>>>>                  if (!entity)
>>>>>>>                   continue;
>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> index e740f3b..1a4abe7 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d,
>>>>>>> struct drm_sched_job *sched_job)
>>>>>>>              /* block scheduler */
>>>>>>>           for (q = 0; q < V3D_MAX_QUEUES; q++)
>>>>>>> -        drm_sched_stop(&v3d->queue[q].sched);
>>>>>>> +        drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>>>>>>              if (sched_job)
>>>>>>>               drm_sched_increase_karma(sched_job);
>>>>>>> diff --git a/include/drm/gpu_scheduler.h
>>>>>>> b/include/drm/gpu_scheduler.h
>>>>>>> index 0daca4d..9ee0f27 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -167,9 +167,6 @@ struct drm_sched_fence
>>>>>>> *to_drm_sched_fence(struct dma_fence *f);
>>>>>>>        * @sched: the scheduler instance on which this job is scheduled.
>>>>>>>        * @s_fence: contains the fences for the scheduling of job.
>>>>>>>        * @finish_cb: the callback for the finished fence.
>>>>>>> - * @finish_work: schedules the function @drm_sched_job_finish once
>>>>>>> the job has
>>>>>>> - *               finished to remove the job from the
>>>>>>> - *               @drm_gpu_scheduler.ring_mirror_list.
>>>>>>>        * @node: used to append this struct to the
>>>>>>> @drm_gpu_scheduler.ring_mirror_list.
>>>>>>>        * @id: a unique id assigned to each job scheduled on the
>>>>>>> scheduler.
>>>>>>>        * @karma: increment on every hang caused by this job. If this
>>>>>>> exceeds the hang
>>>>>>> @@ -188,7 +185,6 @@ struct drm_sched_job {
>>>>>>>           struct drm_gpu_scheduler    *sched;
>>>>>>>           struct drm_sched_fence        *s_fence;
>>>>>>>           struct dma_fence_cb        finish_cb;
>>>>>>> -    struct work_struct        finish_work;
>>>>>>>           struct list_head        node;
>>>>>>>           uint64_t            id;
>>>>>>>           atomic_t            karma;
>>>>>>> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>>>>>>                      void *owner);
>>>>>>>       void drm_sched_job_cleanup(struct drm_sched_job *job);
>>>>>>>       void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>>>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
>>>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct
>>>>>>> drm_sched_job *bad);
>>>>>>>       void drm_sched_start(struct drm_gpu_scheduler *sched, bool
>>>>>>> full_recovery);
>>>>>>>       void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
>>>>>>>       void drm_sched_increase_karma(struct drm_sched_job *bad);

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
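
As a rough illustration of the driver-side flow this rework assumes, below is a minimal, hypothetical sketch of a timeout handler built only on the exported calls shown in the patch (drm_sched_stop(), drm_sched_increase_karma(), drm_sched_resubmit_jobs() and drm_sched_start()). The my_dev structure and my_dev_reset_hw() helper are invented placeholders rather than part of any real driver, and error handling and multi-ring iteration are omitted.

#include <linux/kernel.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical device wrapper embedding a single scheduler instance. */
struct my_dev {
        struct drm_gpu_scheduler sched;
};

/* Hypothetical hardware reset; a real driver resets its engine here. */
static void my_dev_reset_hw(struct my_dev *mdev)
{
}

static void my_job_timedout(struct drm_sched_job *sched_job)
{
        struct drm_gpu_scheduler *sched = sched_job->sched;
        struct my_dev *mdev = container_of(sched, struct my_dev, sched);

        /*
         * Park the scheduler thread and prune completed jobs; passing the
         * bad job keeps it alive even if it has already signaled.
         */
        drm_sched_stop(sched, sched_job);

        drm_sched_increase_karma(sched_job);

        my_dev_reset_hw(mdev);

        /* Push the remaining unfinished jobs back to the hardware... */
        drm_sched_resubmit_jobs(sched);

        /* ...then unpark the scheduler thread and re-arm the timeout. */
        drm_sched_start(sched, true);

        /*
         * If drm_sched_stop() removed the already-signaled bad job from
         * the mirror list, the scheduler core frees it after this callback
         * returns, so nothing more is needed here.
         */
}

The stop/karma ordering matches the panfrost and v3d hunks above; the resubmit/start tail is the full-recovery path a driver takes after it has actually reset the hardware.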


end of thread, other threads:[~2019-04-17 19:08 UTC | newest]

Thread overview: 12+ messages
2019-04-16 18:23 [PATCH v4 1/5] drm/amd/display: wait for fence without holding reservation lock Andrey Grodzovsky
2019-04-16 18:23 ` [PATCH v4 2/5] drm/amd/display: Use a reasonable timeout for framebuffer fence waits Andrey Grodzovsky
     [not found] ` <1555438986-18303-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
2019-04-16 18:23   ` [PATCH v4 3/5] drm/scheduler: rework job destruction Andrey Grodzovsky
2019-04-17 14:36     ` Grodzovsky, Andrey
2019-04-17 17:17       ` Christian König
     [not found]         ` <a7766a6f-e22f-daf8-99f5-e46a7eb8e679-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-17 17:53           ` Grodzovsky, Andrey
     [not found]             ` <a883b69d-a84e-d498-658c-7bc1ad10bbef-5C7GfCeVMHo@public.gmane.org>
2019-04-17 17:59               ` Christian König
2019-04-17 18:01                 ` Koenig, Christian
2019-04-17 18:29                   ` Grodzovsky, Andrey
2019-04-17 19:08                     ` Koenig, Christian
2019-04-16 18:23   ` [PATCH v4 4/5] drm/sched: Keep s_fence->parent pointer Andrey Grodzovsky
2019-04-16 18:23   ` [PATCH v4 5/5] drm/amdgpu: Avoid HW reset if guilty job already signaled Andrey Grodzovsky
