* [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock @ 2019-04-18 15:00 Andrey Grodzovsky [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Andrey Grodzovsky @ 2019-04-18 15:00 UTC (permalink / raw) To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Nicholas.Kazlauskas-5C7GfCeVMHo, Christian König From: Christian König <ckoenig.leichtzumerken@gmail.com> Don't block others while waiting for the fences to finish, concurrent submission is perfectly valid in this case and holding the lock can prevent killed applications from terminating. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 380a7f9..ad4f0e5 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -4814,23 +4814,26 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, continue; } + abo = gem_to_amdgpu_bo(fb->obj[0]); + + /* Wait for all fences on this FB */ + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, + false, + MAX_SCHEDULE_TIMEOUT); + WARN_ON(r < 0); + /* * TODO This might fail and hence better not used, wait * explicitly on fences instead * and in general should be called for * blocking commit to as per framework helpers */ - abo = gem_to_amdgpu_bo(fb->obj[0]); r = amdgpu_bo_reserve(abo, true); if (unlikely(r != 0)) { DRM_ERROR("failed to reserve buffer before flip\n"); WARN_ON(1); } - /* Wait for all fences on this FB 
*/ - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false, - MAX_SCHEDULE_TIMEOUT) < 0); - amdgpu_bo_get_tiling_flags(abo, &tiling_flags); amdgpu_bo_unreserve(abo); -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 31+ messages in thread
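[Editor's note] The core of the patch above is a lock-ordering change: the potentially long fence wait is hoisted out of the BO reservation lock, so the wait no longer serializes concurrent submitters (or the teardown of a killed application). The same discipline can be sketched as a hypothetical userspace analogy; a pthread mutex stands in for the reservation lock and a condvar-backed flag for the dma fence. None of these names (`struct fence`, `struct buffer`, `query_tiling`) come from the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects: a "fence" is a
 * flag that signals completion, the "reservation lock" is a plain
 * mutex guarding buffer metadata. */
struct fence {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
};

struct buffer {
    pthread_mutex_t resv;   /* analogue of the BO reservation lock */
    struct fence    fence;  /* last pending GPU work on the buffer */
    unsigned long   tiling_flags;
};

static void fence_wait(struct fence *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->signaled)
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
}

/* Order matters: wait for the fence FIRST, while the reservation
 * lock is still free, so other threads are not blocked behind a
 * long wait; then take the lock only for the short metadata read. */
static unsigned long query_tiling(struct buffer *bo)
{
    unsigned long flags;

    fence_wait(&bo->fence);          /* long wait, lock not held */

    pthread_mutex_lock(&bo->resv);   /* short critical section */
    flags = bo->tiling_flags;
    pthread_mutex_unlock(&bo->resv);
    return flags;
}
```

This mirrors the diff's move of the `reservation_object_wait_timeout_rcu()` call to before `amdgpu_bo_reserve()`: same work, different ordering relative to the lock.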
* [PATCH v5 2/6] drm/amd/display: Use a reasonable timeout for framebuffer fence waits [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> @ 2019-04-18 15:00 ` Andrey Grodzovsky 2019-04-18 15:00 ` [PATCH v5 3/6] drm/scheduler: rework job destruction Andrey Grodzovsky ` (4 subsequent siblings) 5 siblings, 0 replies; 31+ messages in thread From: Andrey Grodzovsky @ 2019-04-18 15:00 UTC (permalink / raw) To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo Patch '5edb0c9b Fix deadlock with display during hanged ring recovery' was accidentaly removed during one of DALs code merges. v4: Update description. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index ad4f0e5..88e42ad 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -4816,11 +4816,16 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, abo = gem_to_amdgpu_bo(fb->obj[0]); - /* Wait for all fences on this FB */ + /* + * Wait for all fences on this FB. Do limited wait to avoid + * deadlock during GPU reset when this fence will not signal + * but we hold reservation lock for the BO. 
+ */ r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false, - MAX_SCHEDULE_TIMEOUT); - WARN_ON(r < 0); + msecs_to_jiffies(5000)); + if (unlikely(r <= 0)) + DRM_ERROR("Waiting for fences timed out or interrupted!"); /* * TODO This might fail and hence better not used, wait @@ -4829,10 +4834,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, * blocking commit to as per framework helpers */ r = amdgpu_bo_reserve(abo, true); - if (unlikely(r != 0)) { + if (unlikely(r != 0)) DRM_ERROR("failed to reserve buffer before flip\n"); - WARN_ON(1); - } amdgpu_bo_get_tiling_flags(abo, &tiling_flags); -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v5 3/6] drm/scheduler: rework job destruction [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-18 15:00 ` [PATCH v5 2/6] drm/amd/display: Use a reasonable timeout for framebuffer fence waits Andrey Grodzovsky @ 2019-04-18 15:00 ` Andrey Grodzovsky 2019-04-22 12:48 ` Chunming Zhou 2019-05-29 10:02 ` Daniel Vetter 2019-04-18 15:00 ` [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer Andrey Grodzovsky ` (3 subsequent siblings) 5 siblings, 2 replies; 31+ messages in thread From: Andrey Grodzovsky @ 2019-04-18 15:00 UTC (permalink / raw) To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo, Christian König From: Christian König <christian.koenig@amd.com> We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding lock around ring mirror list in drm_sched_stop which should solve a deadlock reported by a user. v2: Remove unused variable. v4: Move guilty job free into sched code. v5: Move sched->hw_rq_count to drm_sched_start to account for counter decrement in drm_sched_stop even when we don't call resubmit jobs if guily job did signal. 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- drivers/gpu/drm/lima/lima_sched.c | 2 +- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/scheduler/sched_main.c | 159 +++++++++++++++++------------ drivers/gpu/drm/v3d/v3d_sched.c | 2 +- include/drm/gpu_scheduler.h | 6 +- 8 files changed, 102 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 7cee269..a0e165c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if (!ring || !ring->sched.thread) continue; - drm_sched_stop(&ring->sched); + drm_sched_stop(&ring->sched, &job->base); /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ amdgpu_fence_driver_force_completion(ring); @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if(job) drm_sched_increase_karma(&job->base); - - if (!amdgpu_sriov_vf(adev)) { if (!need_full_reset) @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, return r; } -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, - struct amdgpu_job *job) +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) { int i; @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, /* Post ASIC reset for all devs .*/ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL); + amdgpu_device_post_asic_reset(tmp_adev); if (r) { /* bad news, how to tell it to userspace ? 
*/ diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c index 33854c9..5778d9c 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) mmu_size + gpu->buffer.size; /* Add in the active command buffers */ - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { submit = to_etnaviv_submit(s_job); file_size += submit->cmdbuf.size; n_obj++; } - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); /* Add in the active buffer objects */ list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) gpu->buffer.size, etnaviv_cmdbuf_get_va(&gpu->buffer)); - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { submit = to_etnaviv_submit(s_job); etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, submit->cmdbuf.vaddr, submit->cmdbuf.size, etnaviv_cmdbuf_get_va(&submit->cmdbuf)); } - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); /* Reserve space for the bomap */ if (n_bomap_pages) { diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 6d24fea..a813c82 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) } /* block scheduler */ - drm_sched_stop(&gpu->sched); + drm_sched_stop(&gpu->sched, sched_job); if(sched_job) drm_sched_increase_karma(sched_job); diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index 97bd9c1..df98931 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job) static void 
lima_sched_handle_error_task(struct lima_sched_pipe *pipe, struct lima_sched_task *task) { - drm_sched_stop(&pipe->base); + drm_sched_stop(&pipe->base, &task->base); if (task) drm_sched_increase_karma(&task->base); diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 0a7ed04..c6336b7 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) sched_job); for (i = 0; i < NUM_JOB_SLOTS; i++) - drm_sched_stop(&pfdev->js->queue[i].sched); + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); if (sched_job) drm_sched_increase_karma(sched_job); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 19fc601..7816de7 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, } EXPORT_SYMBOL(drm_sched_resume_timeout); -/* job_finish is called after hw fence signaled - */ -static void drm_sched_job_finish(struct work_struct *work) -{ - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, - finish_work); - struct drm_gpu_scheduler *sched = s_job->sched; - unsigned long flags; - - /* - * Canceling the timeout without removing our job from the ring mirror - * list is safe, as we will only end up in this worker if our jobs - * finished fence has been signaled. So even if some another worker - * manages to find this job as the next job in the list, the fence - * signaled check below will prevent the timeout to be restarted. 
- */ - cancel_delayed_work_sync(&sched->work_tdr); - - spin_lock_irqsave(&sched->job_list_lock, flags); - /* queue TDR for next job */ - drm_sched_start_timeout(sched); - spin_unlock_irqrestore(&sched->job_list_lock, flags); - - sched->ops->free_job(s_job); -} - static void drm_sched_job_begin(struct drm_sched_job *s_job) { struct drm_gpu_scheduler *sched = s_job->sched; @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work) if (job) job->sched->ops->timedout_job(job); + /* + * Guilty job did complete and hence needs to be manually removed + * See drm_sched_stop doc. + */ + if (list_empty(&job->node)) + job->sched->ops->free_job(job); + spin_lock_irqsave(&sched->job_list_lock, flags); drm_sched_start_timeout(sched); spin_unlock_irqrestore(&sched->job_list_lock, flags); @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma); * @sched: scheduler instance * @bad: bad scheduler job * + * Stop the scheduler and also removes and frees all completed jobs. + * Note: bad job will not be freed as it might be used later and so it's + * callers responsibility to release it manually if it's not part of the + * mirror list any more. + * */ -void drm_sched_stop(struct drm_gpu_scheduler *sched) +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) { - struct drm_sched_job *s_job; + struct drm_sched_job *s_job, *tmp; unsigned long flags; - struct dma_fence *last_fence = NULL; kthread_park(sched->thread); /* - * Verify all the signaled jobs in mirror list are removed from the ring - * by waiting for the latest job to enter the list. This should insure that - * also all the previous jobs that were in flight also already singaled - * and removed from the list. + * Iterate the job list from later to earlier one and either deactive + * their HW callbacks or remove them from mirror list if they already + * signaled. + * This iteration is thread safe as sched thread is stopped. 
*/ - spin_lock_irqsave(&sched->job_list_lock, flags); - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) { + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) { if (s_job->s_fence->parent && dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) { @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched) s_job->s_fence->parent = NULL; atomic_dec(&sched->hw_rq_count); } else { - last_fence = dma_fence_get(&s_job->s_fence->finished); - break; + /* + * remove job from ring_mirror_list. + * Locking here is for concurrent resume timeout + */ + spin_lock_irqsave(&sched->job_list_lock, flags); + list_del_init(&s_job->node); + spin_unlock_irqrestore(&sched->job_list_lock, flags); + + /* + * Wait for job's HW fence callback to finish using s_job + * before releasing it. + * + * Job is still alive so fence refcount at least 1 + */ + dma_fence_wait(&s_job->s_fence->finished, false); + + /* + * We must keep bad job alive for later use during + * recovery by some of the drivers + */ + if (bad != s_job) + sched->ops->free_job(s_job); } } - spin_unlock_irqrestore(&sched->job_list_lock, flags); - - if (last_fence) { - dma_fence_wait(last_fence, false); - dma_fence_put(last_fence); - } } EXPORT_SYMBOL(drm_sched_stop); @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop); void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) { struct drm_sched_job *s_job, *tmp; + unsigned long flags; int r; - if (!full_recovery) - goto unpark; - /* * Locking the list is not required here as the sched thread is parked - * so no new jobs are being pushed in to HW and in drm_sched_stop we - * flushed all the jobs who were still in mirror list but who already - * signaled and removed them self from the list. Also concurrent + * so no new jobs are being inserted or removed. Also concurrent * GPU recovers can't run in parallel. 
*/ list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { struct dma_fence *fence = s_job->s_fence->parent; + atomic_inc(&sched->hw_rq_count); + + if (!full_recovery) + continue; + if (fence) { r = dma_fence_add_callback(fence, &s_job->cb, drm_sched_process_job); @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) drm_sched_process_job(NULL, &s_job->cb); } - drm_sched_start_timeout(sched); + if (full_recovery) { + spin_lock_irqsave(&sched->job_list_lock, flags); + drm_sched_start_timeout(sched); + spin_unlock_irqrestore(&sched->job_list_lock, flags); + } -unpark: kthread_unpark(sched->thread); } EXPORT_SYMBOL(drm_sched_start); @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) uint64_t guilty_context; bool found_guilty = false; - /*TODO DO we need spinlock here ? */ list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { struct drm_sched_fence *s_fence = s_job->s_fence; @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) dma_fence_set_error(&s_fence->finished, -ECANCELED); s_job->s_fence->parent = sched->ops->run_job(s_job); - atomic_inc(&sched->hw_rq_count); } } EXPORT_SYMBOL(drm_sched_resubmit_jobs); @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job, return -ENOMEM; job->id = atomic64_inc_return(&sched->job_id_count); - INIT_WORK(&job->finish_work, drm_sched_job_finish); INIT_LIST_HEAD(&job->node); return 0; @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); struct drm_sched_fence *s_fence = s_job->s_fence; struct drm_gpu_scheduler *sched = s_fence->sched; - unsigned long flags; - - cancel_delayed_work(&sched->work_tdr); atomic_dec(&sched->hw_rq_count); atomic_dec(&sched->num_jobs); - spin_lock_irqsave(&sched->job_list_lock, flags); - /* remove job from ring_mirror_list */ - 
list_del_init(&s_job->node); - spin_unlock_irqrestore(&sched->job_list_lock, flags); + trace_drm_sched_process_job(s_fence); drm_sched_fence_finished(s_fence); - - trace_drm_sched_process_job(s_fence); wake_up_interruptible(&sched->wake_up_worker); +} + +/** + * drm_sched_cleanup_jobs - destroy finished jobs + * + * @sched: scheduler instance + * + * Remove all finished jobs from the mirror list and destroy them. + */ +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) +{ + unsigned long flags; + + /* Don't destroy jobs while the timeout worker is running */ + if (!cancel_delayed_work(&sched->work_tdr)) + return; + + + while (!list_empty(&sched->ring_mirror_list)) { + struct drm_sched_job *job; + + job = list_first_entry(&sched->ring_mirror_list, + struct drm_sched_job, node); + if (!dma_fence_is_signaled(&job->s_fence->finished)) + break; + + spin_lock_irqsave(&sched->job_list_lock, flags); + /* remove job from ring_mirror_list */ + list_del_init(&job->node); + spin_unlock_irqrestore(&sched->job_list_lock, flags); + + sched->ops->free_job(job); + } + + /* queue timeout for next job */ + spin_lock_irqsave(&sched->job_list_lock, flags); + drm_sched_start_timeout(sched); + spin_unlock_irqrestore(&sched->job_list_lock, flags); - schedule_work(&s_job->finish_work); } /** @@ -656,9 +684,10 @@ static int drm_sched_main(void *param) struct dma_fence *fence; wait_event_interruptible(sched->wake_up_worker, + (drm_sched_cleanup_jobs(sched), (!drm_sched_blocked(sched) && (entity = drm_sched_select_entity(sched))) || - kthread_should_stop()); + kthread_should_stop())); if (!entity) continue; diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index e740f3b..1a4abe7 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) /* block scheduler */ for (q = 0; q < V3D_MAX_QUEUES; q++) - 
drm_sched_stop(&v3d->queue[q].sched); + drm_sched_stop(&v3d->queue[q].sched, sched_job); if (sched_job) drm_sched_increase_karma(sched_job); diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 0daca4d..9ee0f27 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); * @sched: the scheduler instance on which this job is scheduled. * @s_fence: contains the fences for the scheduling of job. * @finish_cb: the callback for the finished fence. - * @finish_work: schedules the function @drm_sched_job_finish once the job has - * finished to remove the job from the - * @drm_gpu_scheduler.ring_mirror_list. * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. * @id: a unique id assigned to each job scheduled on the scheduler. * @karma: increment on every hang caused by this job. If this exceeds the hang @@ -188,7 +185,6 @@ struct drm_sched_job { struct drm_gpu_scheduler *sched; struct drm_sched_fence *s_fence; struct dma_fence_cb finish_cb; - struct work_struct finish_work; struct list_head node; uint64_t id; atomic_t karma; @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, void *owner); void drm_sched_job_cleanup(struct drm_sched_job *job); void drm_sched_wakeup(struct drm_gpu_scheduler *sched); -void drm_sched_stop(struct drm_gpu_scheduler *sched); +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); void drm_sched_increase_karma(struct drm_sched_job *bad); -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 31+ messages in thread
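[Editor's note] The new `drm_sched_cleanup_jobs()` in patch 3 relies on jobs completing in submission order: finished jobs accumulate at the head of `ring_mirror_list`, so cleanup can pop from the head and stop at the first job whose fence has not signaled. That shape, sketched on a plain malloc'd singly linked list (`struct job` and `cleanup_jobs` are illustrative names, not scheduler API, and the real code additionally skips cleanup while the timeout worker runs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the scheduler's mirror list.  Jobs
 * finish in submission order, so completed jobs sit at the head. */
struct job {
    bool        finished;   /* stands in for the signaled s_fence */
    struct job *next;
};

/* Pop and free finished jobs from the head; stop at the first
 * unfinished one.  Returns how many jobs were destroyed. */
static int cleanup_jobs(struct job **head)
{
    int freed = 0;

    while (*head && (*head)->finished) {
        struct job *done = *head;

        *head = done->next;   /* list_del_init() analogue */
        free(done);           /* sched->ops->free_job() analogue */
        freed++;
    }
    return freed;
}
```

Running this from the scheduler thread (the patch calls it inside `wait_event_interruptible()`) is what lets the series drop `finish_work` and the per-signal `list_del_init()` from interrupt context.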
* Re: [PATCH v5 3/6] drm/scheduler: rework job destruction 2019-04-18 15:00 ` [PATCH v5 3/6] drm/scheduler: rework job destruction Andrey Grodzovsky @ 2019-04-22 12:48 ` Chunming Zhou [not found] ` <9f7112b1-0348-b4f6-374d-e44c0d448112-5C7GfCeVMHo@public.gmane.org> 2019-05-29 10:02 ` Daniel Vetter 1 sibling, 1 reply; 31+ messages in thread From: Chunming Zhou @ 2019-04-22 12:48 UTC (permalink / raw) To: Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken Cc: Kazlauskas, Nicholas, Koenig, Christian Hi Andrey, static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) { ... spin_lock_irqsave(&sched->job_list_lock, flags); /* remove job from ring_mirror_list */ list_del_init(&s_job->node); spin_unlock_irqrestore(&sched->job_list_lock, flags); [David] How about just remove above to worker from irq process? Any problem? Maybe I missed previous your discussion, but I think removing lock for list is a risk for future maintenance although you make sure thread safe currently. -David ... schedule_work(&s_job->finish_work); } 在 2019/4/18 23:00, Andrey Grodzovsky 写道: > From: Christian König <christian.koenig@amd.com> > > We now destroy finished jobs from the worker thread to make sure that > we never destroy a job currently in timeout processing. > By this we avoid holding lock around ring mirror list in drm_sched_stop > which should solve a deadlock reported by a user. > > v2: Remove unused variable. > v4: Move guilty job free into sched code. > v5: > Move sched->hw_rq_count to drm_sched_start to account for counter > decrement in drm_sched_stop even when we don't call resubmit jobs > if guily job did signal. 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 > > Signed-off-by: Christian König <christian.koenig@amd.com> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- > drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- > drivers/gpu/drm/lima/lima_sched.c | 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 159 +++++++++++++++++------------ > drivers/gpu/drm/v3d/v3d_sched.c | 2 +- > include/drm/gpu_scheduler.h | 6 +- > 8 files changed, 102 insertions(+), 84 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 7cee269..a0e165c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if (!ring || !ring->sched.thread) > continue; > > - drm_sched_stop(&ring->sched); > + drm_sched_stop(&ring->sched, &job->base); > > /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ > amdgpu_fence_driver_force_completion(ring); > @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if(job) > drm_sched_increase_karma(&job->base); > > - > - > if (!amdgpu_sriov_vf(adev)) { > > if (!need_full_reset) > @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, > return r; > } > > -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, > - struct amdgpu_job *job) > +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) > { > int i; > > @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > > /* Post ASIC reset for all devs .*/ > list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? 
job : NULL); > + amdgpu_device_post_asic_reset(tmp_adev); > > if (r) { > /* bad news, how to tell it to userspace ? */ > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > index 33854c9..5778d9c 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > mmu_size + gpu->buffer.size; > > /* Add in the active command buffers */ > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > file_size += submit->cmdbuf.size; > n_obj++; > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Add in the active buffer objects */ > list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { > @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > gpu->buffer.size, > etnaviv_cmdbuf_get_va(&gpu->buffer)); > > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, > submit->cmdbuf.vaddr, submit->cmdbuf.size, > etnaviv_cmdbuf_get_va(&submit->cmdbuf)); > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Reserve space for the bomap */ > if (n_bomap_pages) { > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 6d24fea..a813c82 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) > } > > /* block scheduler */ > - drm_sched_stop(&gpu->sched); > + drm_sched_stop(&gpu->sched, sched_job); > > if(sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c > index 
97bd9c1..df98931 100644 > --- a/drivers/gpu/drm/lima/lima_sched.c > +++ b/drivers/gpu/drm/lima/lima_sched.c > @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job) > static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe, > struct lima_sched_task *task) > { > - drm_sched_stop(&pipe->base); > + drm_sched_stop(&pipe->base, &task->base); > > if (task) > drm_sched_increase_karma(&task->base); > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c > index 0a7ed04..c6336b7 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) > sched_job); > > for (i = 0; i < NUM_JOB_SLOTS; i++) > - drm_sched_stop(&pfdev->js->queue[i].sched); > + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); > > if (sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 19fc601..7816de7 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, > } > EXPORT_SYMBOL(drm_sched_resume_timeout); > > -/* job_finish is called after hw fence signaled > - */ > -static void drm_sched_job_finish(struct work_struct *work) > -{ > - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, > - finish_work); > - struct drm_gpu_scheduler *sched = s_job->sched; > - unsigned long flags; > - > - /* > - * Canceling the timeout without removing our job from the ring mirror > - * list is safe, as we will only end up in this worker if our jobs > - * finished fence has been signaled. So even if some another worker > - * manages to find this job as the next job in the list, the fence > - * signaled check below will prevent the timeout to be restarted. 
> - */
> - cancel_delayed_work_sync(&sched->work_tdr);
> -
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> - /* queue TDR for next job */
> - drm_sched_start_timeout(sched);
> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
> -
> - sched->ops->free_job(s_job);
> -}
> -
> static void drm_sched_job_begin(struct drm_sched_job *s_job)
> {
> struct drm_gpu_scheduler *sched = s_job->sched;
> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
> if (job)
> job->sched->ops->timedout_job(job);
>
> + /*
> + * The guilty job did complete and hence needs to be manually removed.
> + * See the drm_sched_stop() documentation.
> + */
> + if (list_empty(&job->node))
> + job->sched->ops->free_job(job);
> +
> spin_lock_irqsave(&sched->job_list_lock, flags);
> drm_sched_start_timeout(sched);
> spin_unlock_irqrestore(&sched->job_list_lock, flags);
> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
> * @sched: scheduler instance
> * @bad: bad scheduler job
> *
> + * Stop the scheduler, and also remove and free all completed jobs.
> + * Note: the bad job will not be freed as it might be used later, so it is
> + * the caller's responsibility to release it manually if it is no longer
> + * part of the mirror list.
> + *
> */
> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> {
> - struct drm_sched_job *s_job;
> + struct drm_sched_job *s_job, *tmp;
> unsigned long flags;
> - struct dma_fence *last_fence = NULL;
>
> kthread_park(sched->thread);
>
> /*
> - * Verify all the signaled jobs in mirror list are removed from the ring
> - * by waiting for the latest job to enter the list. This should insure that
> - * also all the previous jobs that were in flight also already singaled
> - * and removed from the list.
> + * Iterate the job list from the last to the first entry and either
> + * deactivate their HW callbacks or remove them from the mirror list
> + * if they have already signaled.
> + * This iteration is thread safe as the sched thread is stopped.
> */
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
> + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
> if (s_job->s_fence->parent &&
> dma_fence_remove_callback(s_job->s_fence->parent,
> &s_job->cb)) {
> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched)
> s_job->s_fence->parent = NULL;
> atomic_dec(&sched->hw_rq_count);
> } else {
> - last_fence = dma_fence_get(&s_job->s_fence->finished);
> - break;
> + /*
> + * Remove the job from the ring_mirror_list.
> + * Locking here guards against a concurrent timeout resume.
> + */
> + spin_lock_irqsave(&sched->job_list_lock, flags);
> + list_del_init(&s_job->node);
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +
> + /*
> + * Wait for the job's HW fence callback to finish using s_job
> + * before releasing it.
> + *
> + * The job is still alive, so the fence refcount is at least 1.
> + */
> + dma_fence_wait(&s_job->s_fence->finished, false);
> +
> + /*
> + * We must keep the bad job alive for later use during
> + * recovery by some of the drivers.
> + */
> + if (bad != s_job)
> + sched->ops->free_job(s_job);
> }
> }
> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
> -
> - if (last_fence) {
> - dma_fence_wait(last_fence, false);
> - dma_fence_put(last_fence);
> - }
> }
>
> EXPORT_SYMBOL(drm_sched_stop);
> @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop);
> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
> {
> struct drm_sched_job *s_job, *tmp;
> + unsigned long flags;
> int r;
>
> - if (!full_recovery)
> - goto unpark;
> -
> /*
> * Locking the list is not required here as the sched thread is parked
> - * so no new jobs are being pushed in to HW and in drm_sched_stop we
> - * flushed all the jobs who were still in mirror list but who already
> - * signaled and removed them self from the list. Also concurrent
> + * so no new jobs are being inserted or removed. Also concurrent
> * GPU recovers can't run in parallel.
> */
> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
> struct dma_fence *fence = s_job->s_fence->parent;
>
> + atomic_inc(&sched->hw_rq_count);
> +
> + if (!full_recovery)
> + continue;
> +
> if (fence) {
> r = dma_fence_add_callback(fence, &s_job->cb,
> drm_sched_process_job);
> @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
> drm_sched_process_job(NULL, &s_job->cb);
> }
>
> - drm_sched_start_timeout(sched);
> + if (full_recovery) {
> + spin_lock_irqsave(&sched->job_list_lock, flags);
> + drm_sched_start_timeout(sched);
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
> + }
>
> -unpark:
> kthread_unpark(sched->thread);
> }
> EXPORT_SYMBOL(drm_sched_start);
> @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
> uint64_t guilty_context;
> bool found_guilty = false;
>
> - /*TODO DO we need spinlock here ? */
> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
> struct drm_sched_fence *s_fence = s_job->s_fence;
>
> @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
> dma_fence_set_error(&s_fence->finished, -ECANCELED);
>
> s_job->s_fence->parent = sched->ops->run_job(s_job);
> - atomic_inc(&sched->hw_rq_count);
> }
> }
> EXPORT_SYMBOL(drm_sched_resubmit_jobs);
> @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
> return -ENOMEM;
> job->id = atomic64_inc_return(&sched->job_id_count);
>
> - INIT_WORK(&job->finish_work, drm_sched_job_finish);
> INIT_LIST_HEAD(&job->node);
>
> return 0;
> @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
> struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
> struct drm_sched_fence *s_fence = s_job->s_fence;
> struct drm_gpu_scheduler *sched = s_fence->sched;
> - unsigned long flags;
> -
> - cancel_delayed_work(&sched->work_tdr);
>
> atomic_dec(&sched->hw_rq_count);
> atomic_dec(&sched->num_jobs);
>
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> - /* remove job from ring_mirror_list */
> - list_del_init(&s_job->node);
> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
> + trace_drm_sched_process_job(s_fence);
>
> drm_sched_fence_finished(s_fence);
> -
> - trace_drm_sched_process_job(s_fence);
> wake_up_interruptible(&sched->wake_up_worker);
> +}
> +
> +/**
> + * drm_sched_cleanup_jobs - destroy finished jobs
> + *
> + * @sched: scheduler instance
> + *
> + * Remove all finished jobs from the mirror list and destroy them.
> + */
> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> +{
> + unsigned long flags;
> +
> + /* Don't destroy jobs while the timeout worker is running */
> + if (!cancel_delayed_work(&sched->work_tdr))
> + return;
> +
> +
> + while (!list_empty(&sched->ring_mirror_list)) {
> + struct drm_sched_job *job;
> +
> + job = list_first_entry(&sched->ring_mirror_list,
> + struct drm_sched_job, node);
> + if (!dma_fence_is_signaled(&job->s_fence->finished))
> + break;
> +
> + spin_lock_irqsave(&sched->job_list_lock, flags);
> + /* remove job from ring_mirror_list */
> + list_del_init(&job->node);
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +
> + sched->ops->free_job(job);
> + }
> +
> + /* queue timeout for next job */
> + spin_lock_irqsave(&sched->job_list_lock, flags);
> + drm_sched_start_timeout(sched);
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>
> - schedule_work(&s_job->finish_work);
> }
>
> /**
> @@ -656,9 +684,10 @@ static int drm_sched_main(void *param)
> struct dma_fence *fence;
>
> wait_event_interruptible(sched->wake_up_worker,
> + (drm_sched_cleanup_jobs(sched),
> (!drm_sched_blocked(sched) &&
> (entity = drm_sched_select_entity(sched))) ||
> - kthread_should_stop());
> + kthread_should_stop()));
>
> if (!entity)
> continue;
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index e740f3b..1a4abe7 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
>
> /* block scheduler */
> for (q = 0; q < V3D_MAX_QUEUES; q++)
> - drm_sched_stop(&v3d->queue[q].sched);
> + drm_sched_stop(&v3d->queue[q].sched, sched_job);
>
> if (sched_job)
> drm_sched_increase_karma(sched_job);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 0daca4d..9ee0f27 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
> * @sched: the scheduler instance on which this job is scheduled.
> * @s_fence: contains the fences for the scheduling of job.
> * @finish_cb: the callback for the finished fence.
> - * @finish_work: schedules the function @drm_sched_job_finish once the job has
> - * finished to remove the job from the
> - * @drm_gpu_scheduler.ring_mirror_list.
> * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
> * @id: a unique id assigned to each job scheduled on the scheduler.
> * @karma: increment on every hang caused by this job. If this exceeds the hang
> @@ -188,7 +185,6 @@ struct drm_sched_job {
> struct drm_gpu_scheduler *sched;
> struct drm_sched_fence *s_fence;
> struct dma_fence_cb finish_cb;
> - struct work_struct finish_work;
> struct list_head node;
> uint64_t id;
> atomic_t karma;
> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> void *owner);
> void drm_sched_job_cleanup(struct drm_sched_job *job);
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
> void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
> void drm_sched_increase_karma(struct drm_sched_job *bad);

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread
[parent not found: <9f7112b1-0348-b4f6-374d-e44c0d448112-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
[not found] ` <9f7112b1-0348-b4f6-374d-e44c0d448112-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-23 14:26 ` Grodzovsky, Andrey
2019-04-23 14:44 ` Zhou, David(ChunMing)
0 siblings, 1 reply; 31+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-23 14:26 UTC (permalink / raw)
To: Zhou, David(ChunMing), dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
 amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg,
 etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
 ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
Cc: Kazlauskas, Nicholas, Koenig, Christian

On 4/22/19 8:48 AM, Chunming Zhou wrote:
> Hi Andrey,
>
> static void drm_sched_process_job(struct dma_fence *f, struct
> dma_fence_cb *cb)
> {
> ...
> spin_lock_irqsave(&sched->job_list_lock, flags);
> /* remove job from ring_mirror_list */
> list_del_init(&s_job->node);
> spin_unlock_irqrestore(&sched->job_list_lock, flags);
> [David] How about just moving the above out of the irq process and into
> the worker? Any problem? Maybe I missed your previous discussion, but I
> think removing the lock for the list is a risk for future maintenance,
> although you make sure it is thread safe currently.
>
> -David

We remove the lock exactly because insertion into and removal from the
list are now done from exactly one thread at any time. So I am not sure
I understand what you mean.

Andrey

>
> ...
>
> schedule_work(&s_job->finish_work);
> }
>
> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> From: Christian König <christian.koenig@amd.com>
>>
>> We now destroy finished jobs from the worker thread to make sure that
>> we never destroy a job currently in timeout processing.
>> By this we avoid holding lock around ring mirror list in drm_sched_stop
>> which should solve a deadlock reported by a user.
>>
>> v2: Remove unused variable.
>> v4: Move guilty job free into sched code.
>> v5: >> Move sched->hw_rq_count to drm_sched_start to account for counter >> decrement in drm_sched_stop even when we don't call resubmit jobs >> if guily job did signal. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- >> drivers/gpu/drm/lima/lima_sched.c | 2 +- >> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- >> drivers/gpu/drm/scheduler/sched_main.c | 159 +++++++++++++++++------------ >> drivers/gpu/drm/v3d/v3d_sched.c | 2 +- >> include/drm/gpu_scheduler.h | 6 +- >> 8 files changed, 102 insertions(+), 84 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 7cee269..a0e165c 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched); >> + drm_sched_stop(&ring->sched, &job->base); >> >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> - >> - >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, >> - struct amdgpu_job *job) >> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> { >> int i; >> >> @@ -3623,7 +3620,7 @@ int 
amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL); >> + amdgpu_device_post_asic_reset(tmp_adev); >> >> if (r) { >> /* bad news, how to tell it to userspace ? */ >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> index 33854c9..5778d9c 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >> mmu_size + gpu->buffer.size; >> >> /* Add in the active command buffers */ >> - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); >> list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { >> submit = to_etnaviv_submit(s_job); >> file_size += submit->cmdbuf.size; >> n_obj++; >> } >> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); >> >> /* Add in the active buffer objects */ >> list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { >> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >> gpu->buffer.size, >> etnaviv_cmdbuf_get_va(&gpu->buffer)); >> >> - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); >> list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { >> submit = to_etnaviv_submit(s_job); >> etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, >> submit->cmdbuf.vaddr, submit->cmdbuf.size, >> etnaviv_cmdbuf_get_va(&submit->cmdbuf)); >> } >> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); >> >> /* Reserve space for the bomap */ >> if (n_bomap_pages) { >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> index 6d24fea..a813c82 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct 
drm_sched_job *sched_job) >> } >> >> /* block scheduler */ >> - drm_sched_stop(&gpu->sched); >> + drm_sched_stop(&gpu->sched, sched_job); >> >> if(sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c >> index 97bd9c1..df98931 100644 >> --- a/drivers/gpu/drm/lima/lima_sched.c >> +++ b/drivers/gpu/drm/lima/lima_sched.c >> @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job) >> static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe, >> struct lima_sched_task *task) >> { >> - drm_sched_stop(&pipe->base); >> + drm_sched_stop(&pipe->base, &task->base); >> >> if (task) >> drm_sched_increase_karma(&task->base); >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c >> index 0a7ed04..c6336b7 100644 >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c >> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) >> sched_job); >> >> for (i = 0; i < NUM_JOB_SLOTS; i++) >> - drm_sched_stop(&pfdev->js->queue[i].sched); >> + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); >> >> if (sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 19fc601..7816de7 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, >> } >> EXPORT_SYMBOL(drm_sched_resume_timeout); >> >> -/* job_finish is called after hw fence signaled >> - */ >> -static void drm_sched_job_finish(struct work_struct *work) >> -{ >> - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, >> - finish_work); >> - struct drm_gpu_scheduler *sched = s_job->sched; >> - unsigned long flags; >> - >> - /* >> - * Canceling the 
timeout without removing our job from the ring mirror >> - * list is safe, as we will only end up in this worker if our jobs >> - * finished fence has been signaled. So even if some another worker >> - * manages to find this job as the next job in the list, the fence >> - * signaled check below will prevent the timeout to be restarted. >> - */ >> - cancel_delayed_work_sync(&sched->work_tdr); >> - >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - /* queue TDR for next job */ >> - drm_sched_start_timeout(sched); >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> - >> - sched->ops->free_job(s_job); >> -} >> - >> static void drm_sched_job_begin(struct drm_sched_job *s_job) >> { >> struct drm_gpu_scheduler *sched = s_job->sched; >> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work) >> if (job) >> job->sched->ops->timedout_job(job); >> >> + /* >> + * Guilty job did complete and hence needs to be manually removed >> + * See drm_sched_stop doc. >> + */ >> + if (list_empty(&job->node)) >> + job->sched->ops->free_job(job); >> + >> spin_lock_irqsave(&sched->job_list_lock, flags); >> drm_sched_start_timeout(sched); >> spin_unlock_irqrestore(&sched->job_list_lock, flags); >> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma); >> * @sched: scheduler instance >> * @bad: bad scheduler job >> * >> + * Stop the scheduler and also removes and frees all completed jobs. >> + * Note: bad job will not be freed as it might be used later and so it's >> + * callers responsibility to release it manually if it's not part of the >> + * mirror list any more. 
>> + * >> */ >> -void drm_sched_stop(struct drm_gpu_scheduler *sched) >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) >> { >> - struct drm_sched_job *s_job; >> + struct drm_sched_job *s_job, *tmp; >> unsigned long flags; >> - struct dma_fence *last_fence = NULL; >> >> kthread_park(sched->thread); >> >> /* >> - * Verify all the signaled jobs in mirror list are removed from the ring >> - * by waiting for the latest job to enter the list. This should insure that >> - * also all the previous jobs that were in flight also already singaled >> - * and removed from the list. >> + * Iterate the job list from later to earlier one and either deactive >> + * their HW callbacks or remove them from mirror list if they already >> + * signaled. >> + * This iteration is thread safe as sched thread is stopped. >> */ >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) { >> + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) { >> if (s_job->s_fence->parent && >> dma_fence_remove_callback(s_job->s_fence->parent, >> &s_job->cb)) { >> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched) >> s_job->s_fence->parent = NULL; >> atomic_dec(&sched->hw_rq_count); >> } else { >> - last_fence = dma_fence_get(&s_job->s_fence->finished); >> - break; >> + /* >> + * remove job from ring_mirror_list. >> + * Locking here is for concurrent resume timeout >> + */ >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + list_del_init(&s_job->node); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + >> + /* >> + * Wait for job's HW fence callback to finish using s_job >> + * before releasing it. 
>> + * >> + * Job is still alive so fence refcount at least 1 >> + */ >> + dma_fence_wait(&s_job->s_fence->finished, false); >> + >> + /* >> + * We must keep bad job alive for later use during >> + * recovery by some of the drivers >> + */ >> + if (bad != s_job) >> + sched->ops->free_job(s_job); >> } >> } >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> - >> - if (last_fence) { >> - dma_fence_wait(last_fence, false); >> - dma_fence_put(last_fence); >> - } >> } >> >> EXPORT_SYMBOL(drm_sched_stop); >> @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop); >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> { >> struct drm_sched_job *s_job, *tmp; >> + unsigned long flags; >> int r; >> >> - if (!full_recovery) >> - goto unpark; >> - >> /* >> * Locking the list is not required here as the sched thread is parked >> - * so no new jobs are being pushed in to HW and in drm_sched_stop we >> - * flushed all the jobs who were still in mirror list but who already >> - * signaled and removed them self from the list. Also concurrent >> + * so no new jobs are being inserted or removed. Also concurrent >> * GPU recovers can't run in parallel. 
>> */ >> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { >> struct dma_fence *fence = s_job->s_fence->parent; >> >> + atomic_inc(&sched->hw_rq_count); >> + >> + if (!full_recovery) >> + continue; >> + >> if (fence) { >> r = dma_fence_add_callback(fence, &s_job->cb, >> drm_sched_process_job); >> @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> drm_sched_process_job(NULL, &s_job->cb); >> } >> >> - drm_sched_start_timeout(sched); >> + if (full_recovery) { >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + drm_sched_start_timeout(sched); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + } >> >> -unpark: >> kthread_unpark(sched->thread); >> } >> EXPORT_SYMBOL(drm_sched_start); >> @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> uint64_t guilty_context; >> bool found_guilty = false; >> >> - /*TODO DO we need spinlock here ? */ >> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { >> struct drm_sched_fence *s_fence = s_job->s_fence; >> >> @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> dma_fence_set_error(&s_fence->finished, -ECANCELED); >> >> s_job->s_fence->parent = sched->ops->run_job(s_job); >> - atomic_inc(&sched->hw_rq_count); >> } >> } >> EXPORT_SYMBOL(drm_sched_resubmit_jobs); >> @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job, >> return -ENOMEM; >> job->id = atomic64_inc_return(&sched->job_id_count); >> >> - INIT_WORK(&job->finish_work, drm_sched_job_finish); >> INIT_LIST_HEAD(&job->node); >> >> return 0; >> @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) >> struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); >> struct drm_sched_fence *s_fence = s_job->s_fence; >> struct drm_gpu_scheduler *sched = s_fence->sched; >> - unsigned long flags; >> - >> - 
cancel_delayed_work(&sched->work_tdr); >> >> atomic_dec(&sched->hw_rq_count); >> atomic_dec(&sched->num_jobs); >> >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - /* remove job from ring_mirror_list */ >> - list_del_init(&s_job->node); >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + trace_drm_sched_process_job(s_fence); >> >> drm_sched_fence_finished(s_fence); >> - >> - trace_drm_sched_process_job(s_fence); >> wake_up_interruptible(&sched->wake_up_worker); >> +} >> + >> +/** >> + * drm_sched_cleanup_jobs - destroy finished jobs >> + * >> + * @sched: scheduler instance >> + * >> + * Remove all finished jobs from the mirror list and destroy them. >> + */ >> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> +{ >> + unsigned long flags; >> + >> + /* Don't destroy jobs while the timeout worker is running */ >> + if (!cancel_delayed_work(&sched->work_tdr)) >> + return; >> + >> + >> + while (!list_empty(&sched->ring_mirror_list)) { >> + struct drm_sched_job *job; >> + >> + job = list_first_entry(&sched->ring_mirror_list, >> + struct drm_sched_job, node); >> + if (!dma_fence_is_signaled(&job->s_fence->finished)) >> + break; >> + >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + /* remove job from ring_mirror_list */ >> + list_del_init(&job->node); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + >> + sched->ops->free_job(job); >> + } >> + >> + /* queue timeout for next job */ >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + drm_sched_start_timeout(sched); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> >> - schedule_work(&s_job->finish_work); >> } >> >> /** >> @@ -656,9 +684,10 @@ static int drm_sched_main(void *param) >> struct dma_fence *fence; >> >> wait_event_interruptible(sched->wake_up_worker, >> + (drm_sched_cleanup_jobs(sched), >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> - kthread_should_stop()); >> + kthread_should_stop())); >> 
>> if (!entity) >> continue; >> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c >> index e740f3b..1a4abe7 100644 >> --- a/drivers/gpu/drm/v3d/v3d_sched.c >> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) >> >> /* block scheduler */ >> for (q = 0; q < V3D_MAX_QUEUES; q++) >> - drm_sched_stop(&v3d->queue[q].sched); >> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >> >> if (sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index 0daca4d..9ee0f27 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); >> * @sched: the scheduler instance on which this job is scheduled. >> * @s_fence: contains the fences for the scheduling of job. >> * @finish_cb: the callback for the finished fence. >> - * @finish_work: schedules the function @drm_sched_job_finish once the job has >> - * finished to remove the job from the >> - * @drm_gpu_scheduler.ring_mirror_list. >> * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. >> * @id: a unique id assigned to each job scheduled on the scheduler. >> * @karma: increment on every hang caused by this job. 
If this exceeds the hang >> @@ -188,7 +185,6 @@ struct drm_sched_job { >> struct drm_gpu_scheduler *sched; >> struct drm_sched_fence *s_fence; >> struct dma_fence_cb finish_cb; >> - struct work_struct finish_work; >> struct list_head node; >> uint64_t id; >> atomic_t karma; >> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, >> void *owner); >> void drm_sched_job_cleanup(struct drm_sched_job *job); >> void drm_sched_wakeup(struct drm_gpu_scheduler *sched); >> -void drm_sched_stop(struct drm_gpu_scheduler *sched); >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); >> void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); >> void drm_sched_increase_karma(struct drm_sched_job *bad); > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
2019-04-23 14:26 ` Grodzovsky, Andrey
@ 2019-04-23 14:44 ` Zhou, David(ChunMing)
2019-04-23 15:01 ` [PATCH " Grodzovsky, Andrey
0 siblings, 1 reply; 31+ messages in thread
From: Zhou, David(ChunMing) @ 2019-04-23 14:44 UTC (permalink / raw)
To: Grodzovsky, Andrey, Zhou, David(ChunMing),
 dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
 amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg,
 etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
 ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
Cc: Kazlauskas, Nicholas, Koenig, Christian

[-- Attachment #1.1: Type: text/plain, Size: 20067 bytes --]

This patch is to fix the deadlock between fence->lock and
sched->job_list_lock, right? So I suggest just moving
list_del_init(&s_job->node) from drm_sched_process_job to the work
thread. That will avoid the deadlock described in the link.

-------- Original Message --------
Subject: Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com
CC: "Kazlauskas, Nicholas" ,"Koenig, Christian"

On 4/22/19 8:48 AM, Chunming Zhou wrote:
> Hi Andrey,
>
> static void drm_sched_process_job(struct dma_fence *f, struct
> dma_fence_cb *cb)
> {
> ...
> spin_lock_irqsave(&sched->job_list_lock, flags);
> /* remove job from ring_mirror_list */
> list_del_init(&s_job->node);
> spin_unlock_irqrestore(&sched->job_list_lock, flags);
> [David] How about just moving the above out of the irq process and into
> the worker? Any problem? Maybe I missed your previous discussion, but I
> think removing the lock for the list is a risk for future maintenance,
> although you make sure it is thread safe currently.
>
> -David

We remove the lock exactly because insertion into and removal from the
list are now done from exactly one thread at any time. So I am not sure
I understand what you mean.
Andrey

> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> [full patch quoted again - snipped]
>> + * >> + * Job is still alive so fence refcount at least 1 >> + */ >> + dma_fence_wait(&s_job->s_fence->finished, false); >> + >> + /* >> + * We must keep bad job alive for later use during >> + * recovery by some of the drivers >> + */ >> + if (bad != s_job) >> + sched->ops->free_job(s_job); >> } >> } >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> - >> - if (last_fence) { >> - dma_fence_wait(last_fence, false); >> - dma_fence_put(last_fence); >> - } >> } >> >> EXPORT_SYMBOL(drm_sched_stop); >> @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop); >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> { >> struct drm_sched_job *s_job, *tmp; >> + unsigned long flags; >> int r; >> >> - if (!full_recovery) >> - goto unpark; >> - >> /* >> * Locking the list is not required here as the sched thread is parked >> - * so no new jobs are being pushed in to HW and in drm_sched_stop we >> - * flushed all the jobs who were still in mirror list but who already >> - * signaled and removed them self from the list. Also concurrent >> + * so no new jobs are being inserted or removed. Also concurrent >> * GPU recovers can't run in parallel. 
>> */ >> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { >> struct dma_fence *fence = s_job->s_fence->parent; >> >> + atomic_inc(&sched->hw_rq_count); >> + >> + if (!full_recovery) >> + continue; >> + >> if (fence) { >> r = dma_fence_add_callback(fence, &s_job->cb, >> drm_sched_process_job); >> @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> drm_sched_process_job(NULL, &s_job->cb); >> } >> >> - drm_sched_start_timeout(sched); >> + if (full_recovery) { >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + drm_sched_start_timeout(sched); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + } >> >> -unpark: >> kthread_unpark(sched->thread); >> } >> EXPORT_SYMBOL(drm_sched_start); >> @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> uint64_t guilty_context; >> bool found_guilty = false; >> >> - /*TODO DO we need spinlock here ? */ >> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { >> struct drm_sched_fence *s_fence = s_job->s_fence; >> >> @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> dma_fence_set_error(&s_fence->finished, -ECANCELED); >> >> s_job->s_fence->parent = sched->ops->run_job(s_job); >> - atomic_inc(&sched->hw_rq_count); >> } >> } >> EXPORT_SYMBOL(drm_sched_resubmit_jobs); >> @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job, >> return -ENOMEM; >> job->id = atomic64_inc_return(&sched->job_id_count); >> >> - INIT_WORK(&job->finish_work, drm_sched_job_finish); >> INIT_LIST_HEAD(&job->node); >> >> return 0; >> @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) >> struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); >> struct drm_sched_fence *s_fence = s_job->s_fence; >> struct drm_gpu_scheduler *sched = s_fence->sched; >> - unsigned long flags; >> - >> - 
cancel_delayed_work(&sched->work_tdr); >> >> atomic_dec(&sched->hw_rq_count); >> atomic_dec(&sched->num_jobs); >> >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - /* remove job from ring_mirror_list */ >> - list_del_init(&s_job->node); >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + trace_drm_sched_process_job(s_fence); >> >> drm_sched_fence_finished(s_fence); >> - >> - trace_drm_sched_process_job(s_fence); >> wake_up_interruptible(&sched->wake_up_worker); >> +} >> + >> +/** >> + * drm_sched_cleanup_jobs - destroy finished jobs >> + * >> + * @sched: scheduler instance >> + * >> + * Remove all finished jobs from the mirror list and destroy them. >> + */ >> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> +{ >> + unsigned long flags; >> + >> + /* Don't destroy jobs while the timeout worker is running */ >> + if (!cancel_delayed_work(&sched->work_tdr)) >> + return; >> + >> + >> + while (!list_empty(&sched->ring_mirror_list)) { >> + struct drm_sched_job *job; >> + >> + job = list_first_entry(&sched->ring_mirror_list, >> + struct drm_sched_job, node); >> + if (!dma_fence_is_signaled(&job->s_fence->finished)) >> + break; >> + >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + /* remove job from ring_mirror_list */ >> + list_del_init(&job->node); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + >> + sched->ops->free_job(job); >> + } >> + >> + /* queue timeout for next job */ >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + drm_sched_start_timeout(sched); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> >> - schedule_work(&s_job->finish_work); >> } >> >> /** >> @@ -656,9 +684,10 @@ static int drm_sched_main(void *param) >> struct dma_fence *fence; >> >> wait_event_interruptible(sched->wake_up_worker, >> + (drm_sched_cleanup_jobs(sched), >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> - kthread_should_stop()); >> + kthread_should_stop())); >> 
>> if (!entity) >> continue; >> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c >> index e740f3b..1a4abe7 100644 >> --- a/drivers/gpu/drm/v3d/v3d_sched.c >> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) >> >> /* block scheduler */ >> for (q = 0; q < V3D_MAX_QUEUES; q++) >> - drm_sched_stop(&v3d->queue[q].sched); >> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >> >> if (sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index 0daca4d..9ee0f27 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); >> * @sched: the scheduler instance on which this job is scheduled. >> * @s_fence: contains the fences for the scheduling of job. >> * @finish_cb: the callback for the finished fence. >> - * @finish_work: schedules the function @drm_sched_job_finish once the job has >> - * finished to remove the job from the >> - * @drm_gpu_scheduler.ring_mirror_list. >> * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. >> * @id: a unique id assigned to each job scheduled on the scheduler. >> * @karma: increment on every hang caused by this job. 
If this exceeds the hang >> @@ -188,7 +185,6 @@ struct drm_sched_job { >> struct drm_gpu_scheduler *sched; >> struct drm_sched_fence *s_fence; >> struct dma_fence_cb finish_cb; >> - struct work_struct finish_work; >> struct list_head node; >> uint64_t id; >> atomic_t karma; >> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, >> void *owner); >> void drm_sched_job_cleanup(struct drm_sched_job *job); >> void drm_sched_wakeup(struct drm_gpu_scheduler *sched); >> -void drm_sched_stop(struct drm_gpu_scheduler *sched); >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); >> void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); >> void drm_sched_increase_karma(struct drm_sched_job *bad); > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
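The drm_sched_stop() rework quoted above boils down to a single-owner sweep over the mirror list: walk it from the newest job backwards, disarm the HW fence callback where possible, and free already-signaled jobs except the "bad" one. As a rough illustration only — using hypothetical userspace stand-ins, not the kernel's list, fence, or scheduler APIs — the control flow looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for drm_sched_job -- not the kernel API. */
struct job {
    struct job *prev, *next;   /* mirror-list links (circular, with sentinel) */
    bool signaled;             /* HW fence already signaled? */
    bool cb_installed;         /* completion callback still armed? */
};

struct sched {
    struct job head;           /* circular list sentinel */
    int hw_rq_count;           /* jobs currently on the hardware */
};

static void list_del_init(struct job *j)
{
    j->prev->next = j->next;
    j->next->prev = j->prev;
    j->next = j->prev = j;
}

/*
 * Sketch of the reworked drm_sched_stop(): iterate from the newest job
 * backwards.  Jobs whose callback can still be removed are deactivated;
 * jobs that already signaled are unlinked and freed -- except the "bad"
 * job, which the driver may still need for recovery.  No list lock is
 * needed for the walk itself because the scheduler thread is parked.
 */
static void sched_stop(struct sched *s, struct job *bad)
{
    struct job *j = s->head.prev, *tmp;

    while (j != &s->head) {
        tmp = j->prev;                 /* capture before possible unlink */
        if (j->cb_installed && !j->signaled) {
            j->cb_installed = false;   /* dma_fence_remove_callback() */
            s->hw_rq_count--;
        } else {
            list_del_init(j);          /* safe: single owner of removal */
            if (j != bad)
                free(j);               /* keep the bad job alive */
        }
        j = tmp;
    }
}
```

The "keep the bad job alive" branch is what the new `bad` parameter in the real drm_sched_stop() exists for.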
* Re: [PATCH v5 3/6] drm/scheduler: rework job destruction 2019-04-23 14:44 ` Zhou, David(ChunMing) @ 2019-04-23 15:01 ` Grodzovsky, Andrey 0 siblings, 0 replies; 31+ messages in thread From: Grodzovsky, Andrey @ 2019-04-23 15:01 UTC (permalink / raw) To: Zhou, David(ChunMing), dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken Cc: Kazlauskas, Nicholas, Koenig, Christian On 4/23/19 10:44 AM, Zhou, David(ChunMing) wrote: This patch is to fix the deadlock between fence->lock and sched->job_list_lock, right? So I suggest just moving list_del_init(&s_job->node) from drm_sched_process_job to the work thread. That will avoid the deadlock described in the link. Do you mean restoring the scheduling work to the HW fence interrupt handler and deleting the job there? Yes, I suggested this as an option (take a look at my comment 9 in https://bugs.freedesktop.org/show_bug.cgi?id=109692), but since we still have to wait for all fences in flight to signal to avoid the problem fixed in '3741540 drm/sched: Rework HW fence processing.', this becomes somewhat complicated, so Christian came up with the core idea in this patch, which is to make all deletions/insertions thread safe by guaranteeing they are always done from one thread. It does simplify the handling. Andrey -------- Original Message -------- Subject: Re: [PATCH v5 3/6] drm/scheduler: rework job destruction From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)", dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, eric@anholt.net, etnaviv@lists.freedesktop.org, ckoenig.leichtzumerken@gmail.com CC: "Kazlauskas, Nicholas", "Koenig, Christian" On 4/22/19 8:48 AM, Chunming Zhou wrote: > Hi Andrey, > > static void drm_sched_process_job(struct dma_fence *f, struct > dma_fence_cb *cb) > { > ...
> spin_lock_irqsave(&sched->job_list_lock, flags); > /* remove job from ring_mirror_list */ > list_del_init(&s_job->node); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > [David] How about just moving the above to the worker, out of the irq process? Any > problem? Maybe I missed your previous discussion, but I think removing the > lock for the list is a risk for future maintenance, although you make sure it is > thread safe currently. > > -David We remove the lock exactly because of the fact that insertion and removal to/from the list will be done from exactly one thread at any time now. So I am not sure I understand what you mean. Andrey > > ... > > schedule_work(&s_job->finish_work); > } > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> From: Christian König <christian.koenig@amd.com> >> >> We now destroy finished jobs from the worker thread to make sure that >> we never destroy a job currently in timeout processing. >> By this we avoid holding lock around ring mirror list in drm_sched_stop >> which should solve a deadlock reported by a user. >> >> v2: Remove unused variable. >> v4: Move guilty job free into sched code. >> v5: >> Move sched->hw_rq_count to drm_sched_start to account for counter >> decrement in drm_sched_stop even when we don't call resubmit jobs >> if guilty job did signal.
>> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- >> drivers/gpu/drm/lima/lima_sched.c | 2 +- >> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- >> drivers/gpu/drm/scheduler/sched_main.c | 159 +++++++++++++++++------------ >> drivers/gpu/drm/v3d/v3d_sched.c | 2 +- >> include/drm/gpu_scheduler.h | 6 +- >> 8 files changed, 102 insertions(+), 84 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 7cee269..a0e165c 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched); >> + drm_sched_stop(&ring->sched, &job->base); >> >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> - >> - >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, >> - struct amdgpu_job *job) >> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> { >> int i; >> >> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> >> /* Post ASIC reset for all devs .*/ >>
list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL); >> + amdgpu_device_post_asic_reset(tmp_adev); >> >> if (r) { >> /* bad news, how to tell it to userspace ? */ >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> index 33854c9..5778d9c 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >> mmu_size + gpu->buffer.size; >> >> /* Add in the active command buffers */ >> - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); >> list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { >> submit = to_etnaviv_submit(s_job); >> file_size += submit->cmdbuf.size; >> n_obj++; >> } >> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); >> >> /* Add in the active buffer objects */ >> list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { >> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >> gpu->buffer.size, >> etnaviv_cmdbuf_get_va(&gpu->buffer)); >> >> - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); >> list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { >> submit = to_etnaviv_submit(s_job); >> etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, >> submit->cmdbuf.vaddr, submit->cmdbuf.size, >> etnaviv_cmdbuf_get_va(&submit->cmdbuf)); >> } >> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); >> >> /* Reserve space for the bomap */ >> if (n_bomap_pages) { >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> index 6d24fea..a813c82 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) >> } >> >> /* block scheduler */ >> - drm_sched_stop(&gpu->sched); >> + 
drm_sched_stop(&gpu->sched, sched_job); >> >> if(sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c >> index 97bd9c1..df98931 100644 >> --- a/drivers/gpu/drm/lima/lima_sched.c >> +++ b/drivers/gpu/drm/lima/lima_sched.c >> @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job) >> static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe, >> struct lima_sched_task *task) >> { >> - drm_sched_stop(&pipe->base); >> + drm_sched_stop(&pipe->base, &task->base); >> >> if (task) >> drm_sched_increase_karma(&task->base); >> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c >> index 0a7ed04..c6336b7 100644 >> --- a/drivers/gpu/drm/panfrost/panfrost_job.c >> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c >> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) >> sched_job); >> >> for (i = 0; i < NUM_JOB_SLOTS; i++) >> - drm_sched_stop(&pfdev->js->queue[i].sched); >> + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); >> >> if (sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 19fc601..7816de7 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, >> } >> EXPORT_SYMBOL(drm_sched_resume_timeout); >> >> -/* job_finish is called after hw fence signaled >> - */ >> -static void drm_sched_job_finish(struct work_struct *work) >> -{ >> - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, >> - finish_work); >> - struct drm_gpu_scheduler *sched = s_job->sched; >> - unsigned long flags; >> - >> - /* >> - * Canceling the timeout without removing our job from the ring mirror >> - * list is safe, as we will only end up in 
this worker if our jobs >> - * finished fence has been signaled. So even if some another worker >> - * manages to find this job as the next job in the list, the fence >> - * signaled check below will prevent the timeout to be restarted. >> - */ >> - cancel_delayed_work_sync(&sched->work_tdr); >> - >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - /* queue TDR for next job */ >> - drm_sched_start_timeout(sched); >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> - >> - sched->ops->free_job(s_job); >> -} >> - >> static void drm_sched_job_begin(struct drm_sched_job *s_job) >> { >> struct drm_gpu_scheduler *sched = s_job->sched; >> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work) >> if (job) >> job->sched->ops->timedout_job(job); >> >> + /* >> + * Guilty job did complete and hence needs to be manually removed >> + * See drm_sched_stop doc. >> + */ >> + if (list_empty(&job->node)) >> + job->sched->ops->free_job(job); >> + >> spin_lock_irqsave(&sched->job_list_lock, flags); >> drm_sched_start_timeout(sched); >> spin_unlock_irqrestore(&sched->job_list_lock, flags); >> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma); >> * @sched: scheduler instance >> * @bad: bad scheduler job >> * >> + * Stop the scheduler and also removes and frees all completed jobs. >> + * Note: bad job will not be freed as it might be used later and so it's >> + * callers responsibility to release it manually if it's not part of the >> + * mirror list any more. >> + * >> */ >> -void drm_sched_stop(struct drm_gpu_scheduler *sched) >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) >> { >> - struct drm_sched_job *s_job; >> + struct drm_sched_job *s_job, *tmp; >> unsigned long flags; >> - struct dma_fence *last_fence = NULL; >> >> kthread_park(sched->thread); >> >> /* >> - * Verify all the signaled jobs in mirror list are removed from the ring >> - * by waiting for the latest job to enter the list. 
This should insure that >> - * also all the previous jobs that were in flight also already singaled >> - * and removed from the list. >> + * Iterate the job list from later to earlier one and either deactive >> + * their HW callbacks or remove them from mirror list if they already >> + * signaled. >> + * This iteration is thread safe as sched thread is stopped. >> */ >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) { >> + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) { >> if (s_job->s_fence->parent && >> dma_fence_remove_callback(s_job->s_fence->parent, >> &s_job->cb)) { >> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched) >> s_job->s_fence->parent = NULL; >> atomic_dec(&sched->hw_rq_count); >> } else { >> - last_fence = dma_fence_get(&s_job->s_fence->finished); >> - break; >> + /* >> + * remove job from ring_mirror_list. >> + * Locking here is for concurrent resume timeout >> + */ >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + list_del_init(&s_job->node); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + >> + /* >> + * Wait for job's HW fence callback to finish using s_job >> + * before releasing it. 
>> + * >> + * Job is still alive so fence refcount at least 1 >> + */ >> + dma_fence_wait(&s_job->s_fence->finished, false); >> + >> + /* >> + * We must keep bad job alive for later use during >> + * recovery by some of the drivers >> + */ >> + if (bad != s_job) >> + sched->ops->free_job(s_job); >> } >> } >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> - >> - if (last_fence) { >> - dma_fence_wait(last_fence, false); >> - dma_fence_put(last_fence); >> - } >> } >> >> EXPORT_SYMBOL(drm_sched_stop); >> @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop); >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> { >> struct drm_sched_job *s_job, *tmp; >> + unsigned long flags; >> int r; >> >> - if (!full_recovery) >> - goto unpark; >> - >> /* >> * Locking the list is not required here as the sched thread is parked >> - * so no new jobs are being pushed in to HW and in drm_sched_stop we >> - * flushed all the jobs who were still in mirror list but who already >> - * signaled and removed them self from the list. Also concurrent >> + * so no new jobs are being inserted or removed. Also concurrent >> * GPU recovers can't run in parallel. 
>> */ >> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { >> struct dma_fence *fence = s_job->s_fence->parent; >> >> + atomic_inc(&sched->hw_rq_count); >> + >> + if (!full_recovery) >> + continue; >> + >> if (fence) { >> r = dma_fence_add_callback(fence, &s_job->cb, >> drm_sched_process_job); >> @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> drm_sched_process_job(NULL, &s_job->cb); >> } >> >> - drm_sched_start_timeout(sched); >> + if (full_recovery) { >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + drm_sched_start_timeout(sched); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + } >> >> -unpark: >> kthread_unpark(sched->thread); >> } >> EXPORT_SYMBOL(drm_sched_start); >> @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> uint64_t guilty_context; >> bool found_guilty = false; >> >> - /*TODO DO we need spinlock here ? */ >> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { >> struct drm_sched_fence *s_fence = s_job->s_fence; >> >> @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> dma_fence_set_error(&s_fence->finished, -ECANCELED); >> >> s_job->s_fence->parent = sched->ops->run_job(s_job); >> - atomic_inc(&sched->hw_rq_count); >> } >> } >> EXPORT_SYMBOL(drm_sched_resubmit_jobs); >> @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job, >> return -ENOMEM; >> job->id = atomic64_inc_return(&sched->job_id_count); >> >> - INIT_WORK(&job->finish_work, drm_sched_job_finish); >> INIT_LIST_HEAD(&job->node); >> >> return 0; >> @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) >> struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); >> struct drm_sched_fence *s_fence = s_job->s_fence; >> struct drm_gpu_scheduler *sched = s_fence->sched; >> - unsigned long flags; >> - >> - 
cancel_delayed_work(&sched->work_tdr); >> >> atomic_dec(&sched->hw_rq_count); >> atomic_dec(&sched->num_jobs); >> >> - spin_lock_irqsave(&sched->job_list_lock, flags); >> - /* remove job from ring_mirror_list */ >> - list_del_init(&s_job->node); >> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + trace_drm_sched_process_job(s_fence); >> >> drm_sched_fence_finished(s_fence); >> - >> - trace_drm_sched_process_job(s_fence); >> wake_up_interruptible(&sched->wake_up_worker); >> +} >> + >> +/** >> + * drm_sched_cleanup_jobs - destroy finished jobs >> + * >> + * @sched: scheduler instance >> + * >> + * Remove all finished jobs from the mirror list and destroy them. >> + */ >> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> +{ >> + unsigned long flags; >> + >> + /* Don't destroy jobs while the timeout worker is running */ >> + if (!cancel_delayed_work(&sched->work_tdr)) >> + return; >> + >> + >> + while (!list_empty(&sched->ring_mirror_list)) { >> + struct drm_sched_job *job; >> + >> + job = list_first_entry(&sched->ring_mirror_list, >> + struct drm_sched_job, node); >> + if (!dma_fence_is_signaled(&job->s_fence->finished)) >> + break; >> + >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + /* remove job from ring_mirror_list */ >> + list_del_init(&job->node); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> + >> + sched->ops->free_job(job); >> + } >> + >> + /* queue timeout for next job */ >> + spin_lock_irqsave(&sched->job_list_lock, flags); >> + drm_sched_start_timeout(sched); >> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >> >> - schedule_work(&s_job->finish_work); >> } >> >> /** >> @@ -656,9 +684,10 @@ static int drm_sched_main(void *param) >> struct dma_fence *fence; >> >> wait_event_interruptible(sched->wake_up_worker, >> + (drm_sched_cleanup_jobs(sched), >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> - kthread_should_stop()); >> + kthread_should_stop())); >> 
>> if (!entity) >> continue; >> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c >> index e740f3b..1a4abe7 100644 >> --- a/drivers/gpu/drm/v3d/v3d_sched.c >> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) >> >> /* block scheduler */ >> for (q = 0; q < V3D_MAX_QUEUES; q++) >> - drm_sched_stop(&v3d->queue[q].sched); >> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >> >> if (sched_job) >> drm_sched_increase_karma(sched_job); >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index 0daca4d..9ee0f27 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); >> * @sched: the scheduler instance on which this job is scheduled. >> * @s_fence: contains the fences for the scheduling of job. >> * @finish_cb: the callback for the finished fence. >> - * @finish_work: schedules the function @drm_sched_job_finish once the job has >> - * finished to remove the job from the >> - * @drm_gpu_scheduler.ring_mirror_list. >> * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. >> * @id: a unique id assigned to each job scheduled on the scheduler. >> * @karma: increment on every hang caused by this job. 
If this exceeds the hang >> @@ -188,7 +185,6 @@ struct drm_sched_job { >> struct drm_gpu_scheduler *sched; >> struct drm_sched_fence *s_fence; >> struct dma_fence_cb finish_cb; >> - struct work_struct finish_work; >> struct list_head node; >> uint64_t id; >> atomic_t karma; >> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, >> void *owner); >> void drm_sched_job_cleanup(struct drm_sched_job *job); >> void drm_sched_wakeup(struct drm_gpu_scheduler *sched); >> -void drm_sched_stop(struct drm_gpu_scheduler *sched); >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); >> void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); >> void drm_sched_increase_karma(struct drm_sched_job *bad); > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
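The companion piece of this rework, drm_sched_cleanup_jobs(), reaps finished jobs from the head of the mirror list in the scheduler thread itself, bailing out whenever the timeout worker could not be cancelled. A rough userspace analog — hypothetical simplified types and a boolean stand-in for the delayed timeout work, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins -- not the kernel's types or API. */
struct job {
    struct job *next;
    bool signaled;          /* has the job's finished fence signaled? */
};

struct sched {
    struct job *head;       /* in-flight jobs, oldest first */
    bool tdr_pending;       /* stand-in for the pending timeout work */
    int freed;              /* jobs destroyed so far (for illustration) */
};

/*
 * Sketch of the drm_sched_cleanup_jobs() idea: only the scheduler thread
 * frees jobs.  If the timeout worker could not be cancelled (it is
 * running), skip the whole pass; otherwise pop signaled jobs from the
 * head and re-arm the timeout for whatever is left.  Jobs signal in
 * submission order, so we can stop at the first unsignaled one.
 */
static void cleanup_jobs(struct sched *s)
{
    if (!s->tdr_pending)        /* cancel_delayed_work() "failed" */
        return;
    s->tdr_pending = false;

    while (s->head && s->head->signaled) {
        struct job *done = s->head;
        s->head = done->next;   /* list_del_init() under job_list_lock */
        free(done);
        s->freed++;
    }

    if (s->head)                /* queue timeout for the next job */
        s->tdr_pending = true;
}
```

Because freeing only ever happens here (and in the stop path while the thread is parked), the per-job finish_work and most of the list locking in the signal path become unnecessary, which is exactly what the patch removes.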
* Re: [PATCH v5 3/6] drm/scheduler: rework job destruction 2019-04-18 15:00 ` [PATCH v5 3/6] drm/scheduler: rework job destruction Andrey Grodzovsky 2019-04-22 12:48 ` Chunming Zhou @ 2019-05-29 10:02 ` Daniel Vetter 1 sibling, 0 replies; 31+ messages in thread From: Daniel Vetter @ 2019-05-29 10:02 UTC (permalink / raw) To: Andrey Grodzovsky Cc: Christian König, The etnaviv authors, amd-gfx list, Christian König, dri-devel, Kazlauskas, Nicholas On Thu, Apr 18, 2019 at 5:00 PM Andrey Grodzovsky <andrey.grodzovsky@amd.com> wrote: > > From: Christian König <christian.koenig@amd.com> > > We now destroy finished jobs from the worker thread to make sure that > we never destroy a job currently in timeout processing. > By this we avoid holding the lock around the ring mirror list in drm_sched_stop, > which should solve a deadlock reported by a user. > > v2: Remove unused variable. > v4: Move guilty job free into sched code. > v5: > Move the sched->hw_rq_count increment to drm_sched_start to account for the counter > decrement in drm_sched_stop even when we don't call resubmit jobs > if the guilty job did signal. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 > > Signed-off-by: Christian König <christian.koenig@amd.com> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> $ make htmldocs ./drivers/gpu/drm/scheduler/sched_main.c:365: warning: Function parameter or member 'bad' not described in 'drm_sched_stop' Please fix, thanks.
-Daniel > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- > drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- > drivers/gpu/drm/lima/lima_sched.c | 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 159 +++++++++++++++++------------ > drivers/gpu/drm/v3d/v3d_sched.c | 2 +- > include/drm/gpu_scheduler.h | 6 +- > 8 files changed, 102 insertions(+), 84 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 7cee269..a0e165c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if (!ring || !ring->sched.thread) > continue; > > - drm_sched_stop(&ring->sched); > + drm_sched_stop(&ring->sched, &job->base); > > /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ > amdgpu_fence_driver_force_completion(ring); > @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if(job) > drm_sched_increase_karma(&job->base); > > - > - > if (!amdgpu_sriov_vf(adev)) { > > if (!need_full_reset) > @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, > return r; > } > > -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, > - struct amdgpu_job *job) > +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) > { > int i; > > @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > > /* Post ASIC reset for all devs .*/ > list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL); > + amdgpu_device_post_asic_reset(tmp_adev); > > if (r) { > /* bad news, how to tell it to userspace ? 
*/ > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > index 33854c9..5778d9c 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > mmu_size + gpu->buffer.size; > > /* Add in the active command buffers */ > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > file_size += submit->cmdbuf.size; > n_obj++; > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Add in the active buffer objects */ > list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { > @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > gpu->buffer.size, > etnaviv_cmdbuf_get_va(&gpu->buffer)); > > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, > submit->cmdbuf.vaddr, submit->cmdbuf.size, > etnaviv_cmdbuf_get_va(&submit->cmdbuf)); > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Reserve space for the bomap */ > if (n_bomap_pages) { > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 6d24fea..a813c82 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) > } > > /* block scheduler */ > - drm_sched_stop(&gpu->sched); > + drm_sched_stop(&gpu->sched, sched_job); > > if(sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c > index 97bd9c1..df98931 100644 > --- a/drivers/gpu/drm/lima/lima_sched.c > +++ b/drivers/gpu/drm/lima/lima_sched.c > @@ 
-300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job) > static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe, > struct lima_sched_task *task) > { > - drm_sched_stop(&pipe->base); > + drm_sched_stop(&pipe->base, &task->base); > > if (task) > drm_sched_increase_karma(&task->base); > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c > index 0a7ed04..c6336b7 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) > sched_job); > > for (i = 0; i < NUM_JOB_SLOTS; i++) > - drm_sched_stop(&pfdev->js->queue[i].sched); > + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); > > if (sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 19fc601..7816de7 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, > } > EXPORT_SYMBOL(drm_sched_resume_timeout); > > -/* job_finish is called after hw fence signaled > - */ > -static void drm_sched_job_finish(struct work_struct *work) > -{ > - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, > - finish_work); > - struct drm_gpu_scheduler *sched = s_job->sched; > - unsigned long flags; > - > - /* > - * Canceling the timeout without removing our job from the ring mirror > - * list is safe, as we will only end up in this worker if our jobs > - * finished fence has been signaled. So even if some another worker > - * manages to find this job as the next job in the list, the fence > - * signaled check below will prevent the timeout to be restarted. 
> - */ > - cancel_delayed_work_sync(&sched->work_tdr); > - > - spin_lock_irqsave(&sched->job_list_lock, flags); > - /* queue TDR for next job */ > - drm_sched_start_timeout(sched); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > - > - sched->ops->free_job(s_job); > -} > - > static void drm_sched_job_begin(struct drm_sched_job *s_job) > { > struct drm_gpu_scheduler *sched = s_job->sched; > @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work) > if (job) > job->sched->ops->timedout_job(job); > > + /* > + * Guilty job did complete and hence needs to be manually removed > + * See drm_sched_stop doc. > + */ > + if (list_empty(&job->node)) > + job->sched->ops->free_job(job); > + > spin_lock_irqsave(&sched->job_list_lock, flags); > drm_sched_start_timeout(sched); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma); > * @sched: scheduler instance > * @bad: bad scheduler job > * > + * Stop the scheduler and also removes and frees all completed jobs. > + * Note: bad job will not be freed as it might be used later and so it's > + * callers responsibility to release it manually if it's not part of the > + * mirror list any more. > + * > */ > -void drm_sched_stop(struct drm_gpu_scheduler *sched) > +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > { > - struct drm_sched_job *s_job; > + struct drm_sched_job *s_job, *tmp; > unsigned long flags; > - struct dma_fence *last_fence = NULL; > > kthread_park(sched->thread); > > /* > - * Verify all the signaled jobs in mirror list are removed from the ring > - * by waiting for the latest job to enter the list. This should insure that > - * also all the previous jobs that were in flight also already singaled > - * and removed from the list. > + * Iterate the job list from later to earlier one and either deactive > + * their HW callbacks or remove them from mirror list if they already > + * signaled. 
> + * This iteration is thread safe as sched thread is stopped. > */ > - spin_lock_irqsave(&sched->job_list_lock, flags); > - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) { > + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) { > if (s_job->s_fence->parent && > dma_fence_remove_callback(s_job->s_fence->parent, > &s_job->cb)) { > @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched) > s_job->s_fence->parent = NULL; > atomic_dec(&sched->hw_rq_count); > } else { > - last_fence = dma_fence_get(&s_job->s_fence->finished); > - break; > + /* > + * remove job from ring_mirror_list. > + * Locking here is for concurrent resume timeout > + */ > + spin_lock_irqsave(&sched->job_list_lock, flags); > + list_del_init(&s_job->node); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > + > + /* > + * Wait for job's HW fence callback to finish using s_job > + * before releasing it. > + * > + * Job is still alive so fence refcount at least 1 > + */ > + dma_fence_wait(&s_job->s_fence->finished, false); > + > + /* > + * We must keep bad job alive for later use during > + * recovery by some of the drivers > + */ > + if (bad != s_job) > + sched->ops->free_job(s_job); > } > } > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > - > - if (last_fence) { > - dma_fence_wait(last_fence, false); > - dma_fence_put(last_fence); > - } > } > > EXPORT_SYMBOL(drm_sched_stop); > @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop); > void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) > { > struct drm_sched_job *s_job, *tmp; > + unsigned long flags; > int r; > > - if (!full_recovery) > - goto unpark; > - > /* > * Locking the list is not required here as the sched thread is parked > - * so no new jobs are being pushed in to HW and in drm_sched_stop we > - * flushed all the jobs who were still in mirror list but who already > - * signaled and removed them self from the list. 
Also concurrent > + * so no new jobs are being inserted or removed. Also concurrent > * GPU recovers can't run in parallel. > */ > list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { > struct dma_fence *fence = s_job->s_fence->parent; > > + atomic_inc(&sched->hw_rq_count); > + > + if (!full_recovery) > + continue; > + > if (fence) { > r = dma_fence_add_callback(fence, &s_job->cb, > drm_sched_process_job); > @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) > drm_sched_process_job(NULL, &s_job->cb); > } > > - drm_sched_start_timeout(sched); > + if (full_recovery) { > + spin_lock_irqsave(&sched->job_list_lock, flags); > + drm_sched_start_timeout(sched); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > + } > > -unpark: > kthread_unpark(sched->thread); > } > EXPORT_SYMBOL(drm_sched_start); > @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) > uint64_t guilty_context; > bool found_guilty = false; > > - /*TODO DO we need spinlock here ? 
*/ > list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { > struct drm_sched_fence *s_fence = s_job->s_fence; > > @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) > dma_fence_set_error(&s_fence->finished, -ECANCELED); > > s_job->s_fence->parent = sched->ops->run_job(s_job); > - atomic_inc(&sched->hw_rq_count); > } > } > EXPORT_SYMBOL(drm_sched_resubmit_jobs); > @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job, > return -ENOMEM; > job->id = atomic64_inc_return(&sched->job_id_count); > > - INIT_WORK(&job->finish_work, drm_sched_job_finish); > INIT_LIST_HEAD(&job->node); > > return 0; > @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) > struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); > struct drm_sched_fence *s_fence = s_job->s_fence; > struct drm_gpu_scheduler *sched = s_fence->sched; > - unsigned long flags; > - > - cancel_delayed_work(&sched->work_tdr); > > atomic_dec(&sched->hw_rq_count); > atomic_dec(&sched->num_jobs); > > - spin_lock_irqsave(&sched->job_list_lock, flags); > - /* remove job from ring_mirror_list */ > - list_del_init(&s_job->node); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > + trace_drm_sched_process_job(s_fence); > > drm_sched_fence_finished(s_fence); > - > - trace_drm_sched_process_job(s_fence); > wake_up_interruptible(&sched->wake_up_worker); > +} > + > +/** > + * drm_sched_cleanup_jobs - destroy finished jobs > + * > + * @sched: scheduler instance > + * > + * Remove all finished jobs from the mirror list and destroy them. 
> + */ > +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > +{ > + unsigned long flags; > + > + /* Don't destroy jobs while the timeout worker is running */ > + if (!cancel_delayed_work(&sched->work_tdr)) > + return; > + > + > + while (!list_empty(&sched->ring_mirror_list)) { > + struct drm_sched_job *job; > + > + job = list_first_entry(&sched->ring_mirror_list, > + struct drm_sched_job, node); > + if (!dma_fence_is_signaled(&job->s_fence->finished)) > + break; > + > + spin_lock_irqsave(&sched->job_list_lock, flags); > + /* remove job from ring_mirror_list */ > + list_del_init(&job->node); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > + > + sched->ops->free_job(job); > + } > + > + /* queue timeout for next job */ > + spin_lock_irqsave(&sched->job_list_lock, flags); > + drm_sched_start_timeout(sched); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > > - schedule_work(&s_job->finish_work); > } > > /** > @@ -656,9 +684,10 @@ static int drm_sched_main(void *param) > struct dma_fence *fence; > > wait_event_interruptible(sched->wake_up_worker, > + (drm_sched_cleanup_jobs(sched), > (!drm_sched_blocked(sched) && > (entity = drm_sched_select_entity(sched))) || > - kthread_should_stop()); > + kthread_should_stop())); > > if (!entity) > continue; > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c > index e740f3b..1a4abe7 100644 > --- a/drivers/gpu/drm/v3d/v3d_sched.c > +++ b/drivers/gpu/drm/v3d/v3d_sched.c > @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) > > /* block scheduler */ > for (q = 0; q < V3D_MAX_QUEUES; q++) > - drm_sched_stop(&v3d->queue[q].sched); > + drm_sched_stop(&v3d->queue[q].sched, sched_job); > > if (sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 0daca4d..9ee0f27 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ 
-167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); > * @sched: the scheduler instance on which this job is scheduled. > * @s_fence: contains the fences for the scheduling of job. > * @finish_cb: the callback for the finished fence. > - * @finish_work: schedules the function @drm_sched_job_finish once the job has > - * finished to remove the job from the > - * @drm_gpu_scheduler.ring_mirror_list. > * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. > * @id: a unique id assigned to each job scheduled on the scheduler. > * @karma: increment on every hang caused by this job. If this exceeds the hang > @@ -188,7 +185,6 @@ struct drm_sched_job { > struct drm_gpu_scheduler *sched; > struct drm_sched_fence *s_fence; > struct dma_fence_cb finish_cb; > - struct work_struct finish_work; > struct list_head node; > uint64_t id; > atomic_t karma; > @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, > void *owner); > void drm_sched_job_cleanup(struct drm_sched_job *job); > void drm_sched_wakeup(struct drm_gpu_scheduler *sched); > -void drm_sched_stop(struct drm_gpu_scheduler *sched); > +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); > void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); > void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); > void drm_sched_increase_karma(struct drm_sched_job *bad); > -- > 2.7.4 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
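The shape of this rework is easier to see outside the kernel: finished jobs are now freed from the scheduler thread itself, but only after the timeout work was successfully cancelled, so a job that is in timeout processing can never be destroyed underneath the handler. A minimal userspace sketch of that pattern follows; all names here (`cleanup_jobs`, `cancel_tdr`, the flag fields) are hypothetical stand-ins for drm_sched_cleanup_jobs(), cancel_delayed_work() and the mirror list, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model of the new cleanup path: jobs are freed from the
 * scheduler thread, but only after the timeout work was successfully
 * cancelled, so a job in timeout processing is never destroyed. */
struct job {
    bool signaled;           /* models dma_fence_is_signaled(finished) */
    struct job *next;
};

struct sched {
    struct job *mirror_head; /* oldest job first, like ring_mirror_list */
    bool tdr_pending;        /* models the queued timeout (TDR) work */
    int freed;
};

/* Models cancel_delayed_work(): returns false if the timeout handler
 * already started (or nothing was queued); then we must not free. */
static bool cancel_tdr(struct sched *s)
{
    if (!s->tdr_pending)
        return false;
    s->tdr_pending = false;
    return true;
}

static void cleanup_jobs(struct sched *s)
{
    /* Don't destroy jobs while the timeout worker may be running. */
    if (!cancel_tdr(s))
        return;

    /* Free finished jobs from the head of the list; stop at the first
     * unfinished one so job order is preserved. */
    while (s->mirror_head && s->mirror_head->signaled) {
        struct job *j = s->mirror_head;
        s->mirror_head = j->next;
        free(j);
        s->freed++;
    }

    /* Re-arm the timeout for the next job (drm_sched_start_timeout). */
    if (s->mirror_head)
        s->tdr_pending = true;
}
```

The cancel-or-bail check is the whole point: if the model's `cancel_tdr()` fails, the timeout handler owns the jobs and cleanup simply retries later, which is exactly the race the patch closes.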
* [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-18 15:00 ` [PATCH v5 2/6] drm/amd/display: Use a reasonable timeout for framebuffer fence waits Andrey Grodzovsky 2019-04-18 15:00 ` [PATCH v5 3/6] drm/scheduler: rework job destruction Andrey Grodzovsky @ 2019-04-18 15:00 ` Andrey Grodzovsky [not found] ` <1555599624-12285-4-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-18 15:00 ` [PATCH v5 5/6] drm/scheduler: Add flag to hint the release of guilty job Andrey Grodzovsky ` (2 subsequent siblings) 5 siblings, 1 reply; 31+ messages in thread From: Andrey Grodzovsky @ 2019-04-18 15:00 UTC (permalink / raw) To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo Keep the parent pointer for the driver's later reference, to see if the fence is signaled. v2: Move the parent fence put to resubmit jobs.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 7816de7..03e6bd8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) if (s_job->s_fence->parent && dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) { - dma_fence_put(s_job->s_fence->parent); - s_job->s_fence->parent = NULL; atomic_dec(&sched->hw_rq_count); } else { /* @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) sched->ops->free_job(s_job); } } + + /* + * Stop pending timer in flight as we rearm it in drm_sched_start. This + * avoids the pending timeout work in progress to fire right away after + * this TDR finished and before the newly restarted jobs had a + * chance to complete. + */ + cancel_delayed_work(&sched->work_tdr); } EXPORT_SYMBOL(drm_sched_stop); @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) dma_fence_set_error(&s_fence->finished, -ECANCELED); + dma_fence_put(s_job->s_fence->parent); s_job->s_fence->parent = sched->ops->run_job(s_job); } } -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 31+ messages in thread
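The lifetime change in this patch can be modelled with plain refcounts: the job's parent (HW) fence reference is no longer dropped in drm_sched_stop(); instead, resubmission puts the old parent only when installing the new one, so recovery code running in between can still inspect the old fence. This is a hypothetical sketch (the `stop_old`/`stop_new`/`resubmit` names and the integer refcount are illustrative, not the dma_fence API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Refcount model: keeping s_fence->parent alive across a reset lets
 * later code ask "did the guilty job's HW fence actually signal?". */
struct fence { int refcount; bool signaled; };

static void fence_put(struct fence *f) { if (f) f->refcount--; }

struct sched_job { struct fence *parent; };

/* Old behaviour: drm_sched_stop() dropped the reference and cleared
 * the pointer, making a later check of the old fence impossible. */
static void stop_old(struct sched_job *j)
{
    fence_put(j->parent);
    j->parent = NULL;
}

/* New behaviour: keep the reference across the reset... */
static void stop_new(struct sched_job *j) { (void)j; }

/* ...and release it only when run_job() hands out a new parent. */
static void resubmit(struct sched_job *j, struct fence *new_parent)
{
    fence_put(j->parent);
    j->parent = new_parent;
}
```

In both paths each fence reference is put exactly once; the patch only moves *where* the put happens, which is what keeps the refcounting balanced.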
[parent not found: <1555599624-12285-4-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer [not found] ` <1555599624-12285-4-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> @ 2019-04-22 12:59 ` Chunming Zhou 2019-04-23 15:14 ` Grodzovsky, Andrey 0 siblings, 1 reply; 31+ messages in thread From: Chunming Zhou @ 2019-04-22 12:59 UTC (permalink / raw) To: Grodzovsky, Andrey, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Kazlauskas, Nicholas, Liu, Monk +Monk to respond to this patch. On 2019/4/18 23:00, Andrey Grodzovsky wrote: > For later driver's reference to see if the fence is signaled. > > v2: Move parent fence put to resubmit jobs. > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Reviewed-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 7816de7..03e6bd8 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > if (s_job->s_fence->parent && > dma_fence_remove_callback(s_job->s_fence->parent, > &s_job->cb)) { > - dma_fence_put(s_job->s_fence->parent); > - s_job->s_fence->parent = NULL; I vaguely remember Monk set parent to be NULL to avoid a potential free problem after callback removal. -David > atomic_dec(&sched->hw_rq_count); > } else { > /* > @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > sched->ops->free_job(s_job); > } > } > + > + /* > + * Stop pending timer in flight as we rearm it in drm_sched_start.
This > + * avoids the pending timeout work in progress to fire right away after > + * this TDR finished and before the newly restarted jobs had a > + * chance to complete. > + */ > + cancel_delayed_work(&sched->work_tdr); > } > > EXPORT_SYMBOL(drm_sched_stop); > @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) > if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) > dma_fence_set_error(&s_fence->finished, -ECANCELED); > > + dma_fence_put(s_job->s_fence->parent); > s_job->s_fence->parent = sched->ops->run_job(s_job); > } > } _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer 2019-04-22 12:59 ` Chunming Zhou @ 2019-04-23 15:14 ` Grodzovsky, Andrey 0 siblings, 0 replies; 31+ messages in thread From: Grodzovsky, Andrey @ 2019-04-23 15:14 UTC (permalink / raw) To: Zhou, David(ChunMing), dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken Cc: Kazlauskas, Nicholas, Liu, Monk On 4/22/19 8:59 AM, Zhou, David(ChunMing) wrote: > +Monk to response this patch. > > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> For later driver's reference to see if the fence is signaled. >> >> v2: Move parent fence put to resubmit jobs. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> Reviewed-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 7816de7..03e6bd8 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) >> if (s_job->s_fence->parent && >> dma_fence_remove_callback(s_job->s_fence->parent, >> &s_job->cb)) { >> - dma_fence_put(s_job->s_fence->parent); >> - s_job->s_fence->parent = NULL; > I vaguely remember Monk set parent to be NULL to avoiod potiential free > problem after callback removal. > > > -David I see, we have to avoid setting it to NULL here as in case the guilty job does signal and we avoid HW reset we are not going to resubmit the jobs and hence stay with the same parent on reattachment of the cb. So I need to know exactly what scenario this set to NULL fixes. 
Andrey > > >> atomic_dec(&sched->hw_rq_count); >> } else { >> /* >> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) >> sched->ops->free_job(s_job); >> } >> } >> + >> + /* >> + * Stop pending timer in flight as we rearm it in drm_sched_start. This >> + * avoids the pending timeout work in progress to fire right away after >> + * this TDR finished and before the newly restarted jobs had a >> + * chance to complete. >> + */ >> + cancel_delayed_work(&sched->work_tdr); >> } >> >> EXPORT_SYMBOL(drm_sched_stop); >> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) >> dma_fence_set_error(&s_fence->finished, -ECANCELED); >> >> + dma_fence_put(s_job->s_fence->parent); >> s_job->s_fence->parent = sched->ops->run_job(s_job); >> } >> } _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
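Andrey's point about reattachment can be sketched in isolation: when the guilty job signaled and the HW reset is skipped, resubmission never runs, so drm_sched_start() reattaches the completion callback to whatever parent pointer drm_sched_stop() left behind, which is why stop() must not clear it. A toy model, with all names (`stop`, `start`, `faked_completion`) purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: without resubmission, start() sees the parent pointer
 * exactly as stop() left it. */
struct fence { bool cb_attached; };

struct job { struct fence *parent; bool faked_completion; };

/* stop: detach the HW fence callback but keep the parent pointer. */
static void stop(struct job *j)
{
    if (j->parent)
        j->parent->cb_attached = false;
}

/* start: reattach the callback to the same parent; with a NULL parent
 * the scheduler has to fake completion instead (the
 * drm_sched_process_job(NULL, &s_job->cb) path). */
static void start(struct job *j)
{
    if (j->parent)
        j->parent->cb_attached = true;
    else
        j->faked_completion = true;
}
```

Had stop() also set `parent = NULL` (the old behaviour), the no-reset path would wrongly fall into the fake-completion branch for a job whose HW fence is perfectly usable.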
* [PATCH v5 5/6] drm/scheduler: Add flag to hint the release of guilty job. [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> ` (2 preceding siblings ...) 2019-04-18 15:00 ` [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer Andrey Grodzovsky @ 2019-04-18 15:00 ` Andrey Grodzovsky 2019-04-18 15:00 ` [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled Andrey Grodzovsky 2019-04-23 2:35 ` [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock Dieter Nützel 5 siblings, 0 replies; 31+ messages in thread From: Andrey Grodzovsky @ 2019-04-18 15:00 UTC (permalink / raw) To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo Problem: Sched thread's cleanup function races against TO handler and removes the guilty job from mirror list and we have no way of differentiating if the job was removed from within the TO handler or from the sched thread's clean-up function. Fix: Add a flag to scheduler to hint the TO handler that the guilty job needs to be explicitly released. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/scheduler/sched_main.c | 9 +++++++-- include/drm/gpu_scheduler.h | 2 ++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 03e6bd8..f8f0e1c 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -293,8 +293,10 @@ static void drm_sched_job_timedout(struct work_struct *work) * Guilty job did complete and hence needs to be manually removed * See drm_sched_stop doc. 
*/ - if (list_empty(&job->node)) + if (sched->free_guilty) { job->sched->ops->free_job(job); + sched->free_guilty = false; + } spin_lock_irqsave(&sched->job_list_lock, flags); drm_sched_start_timeout(sched); @@ -395,10 +397,13 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) /* * We must keep bad job alive for later use during - * recovery by some of the drivers + * recovery by some of the drivers but leave a hint + * that the guilty job must be released. */ if (bad != s_job) sched->ops->free_job(s_job); + else + sched->free_guilty = true; } } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 9ee0f27..fc0b421 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -259,6 +259,7 @@ struct drm_sched_backend_ops { * guilty and it will be considered for scheduling further. * @num_jobs: the number of jobs in queue in the scheduler * @ready: marks if the underlying HW is ready to work + * @free_guilty: A hit to time out handler to free the guilty job. * * One scheduler is implemented for each hardware ring. */ @@ -279,6 +280,7 @@ struct drm_gpu_scheduler { int hang_limit; atomic_t num_jobs; bool ready; + bool free_guilty; }; int drm_sched_init(struct drm_gpu_scheduler *sched, -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 31+ messages in thread
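The free_guilty handshake introduced above is a small ownership-transfer protocol: drm_sched_stop() may not free the bad job because the timeout handler still uses it, but once the job has left the mirror list nobody else would ever free it, so stop() leaves a hint and the handler releases the job exactly once. A hedged userspace sketch (the `sched_stop_one`/`timedout_tail` names are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Model of the free_guilty handshake between drm_sched_stop() and
 * drm_sched_job_timedout(). */
struct job { bool signaled; };

struct sched { bool free_guilty; int freed; };

static void free_job(struct sched *s, struct job *j)
{
    free(j);
    s->freed++;
}

/* drm_sched_stop() per-job step: signaled jobs are freed on the spot,
 * except the guilty one, which only gets flagged for later release. */
static void sched_stop_one(struct sched *s, struct job *j, struct job *bad)
{
    if (!j->signaled)
        return;                 /* unfinished jobs stay on the list */
    if (j != bad)
        free_job(s, j);
    else
        s->free_guilty = true;  /* hint for the timeout handler */
}

/* Tail of drm_sched_job_timedout(): release the guilty job only if
 * stop() actually orphaned it, then clear the hint. */
static void timedout_tail(struct sched *s, struct job *bad)
{
    if (s->free_guilty) {
        free_job(s, bad);
        s->free_guilty = false;
    }
}
```

Clearing the flag inside the handler is what makes the release idempotent: a second pass through the timeout tail is a no-op rather than a double free.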
* [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> ` (3 preceding siblings ...) 2019-04-18 15:00 ` [PATCH v5 5/6] drm/scheduler: Add flag to hint the release of guilty job Andrey Grodzovsky @ 2019-04-18 15:00 ` Andrey Grodzovsky [not found] ` <1555599624-12285-6-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-23 2:35 ` [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock Dieter Nützel 5 siblings, 1 reply; 31+ messages in thread From: Andrey Grodzovsky @ 2019-04-18 15:00 UTC (permalink / raw) To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Andrey Grodzovsky, Nicholas.Kazlauskas-5C7GfCeVMHo Also reject TDRs if another one is already running. v2: Stop all schedulers across device and entire XGMI hive before force signaling HW fences. Avoid passing job_signaled to helper functions to keep all the decision making about skipping HW reset in one place. v3: Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced against its decrement in drm_sched_stop in non HW reset case. v4: rebase v5: Revert v3 as we do it now in scheduler code.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- 1 file changed, 95 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index a0e165c..85f8792 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if (!ring || !ring->sched.thread) continue; - drm_sched_stop(&ring->sched, &job->base); - /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ amdgpu_fence_driver_force_completion(ring); } @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, if(job) drm_sched_increase_karma(&job->base); + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ if (!amdgpu_sriov_vf(adev)) { if (!need_full_reset) @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, return r; } -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) { - int i; - - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { - struct amdgpu_ring *ring = adev->rings[i]; - - if (!ring || !ring->sched.thread) - continue; - - if (!adev->asic_reset_res) - drm_sched_resubmit_jobs(&ring->sched); + if (trylock) { + if (!mutex_trylock(&adev->lock_reset)) + return false; + } else + mutex_lock(&adev->lock_reset); - drm_sched_start(&ring->sched, !adev->asic_reset_res); - } - - if (!amdgpu_device_has_dc_support(adev)) { - drm_helper_resume_force_mode(adev->ddev); - } - - adev->asic_reset_res = 0; -} - -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) -{ - mutex_lock(&adev->lock_reset); atomic_inc(&adev->gpu_reset_counter); adev->in_gpu_reset = 1; /* Block kfd: SRIOV would do it separately */ if 
(!amdgpu_sriov_vf(adev)) amdgpu_amdkfd_pre_reset(adev); + + return true; } static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job) { - int r; + struct list_head device_list, *device_list_handle = NULL; + bool need_full_reset, job_signaled; struct amdgpu_hive_info *hive = NULL; - bool need_full_reset = false; struct amdgpu_device *tmp_adev = NULL; - struct list_head device_list, *device_list_handle = NULL; + int i, r = 0; + need_full_reset = job_signaled = false; INIT_LIST_HEAD(&device_list); dev_info(adev->dev, "GPU reset begin!\n"); + hive = amdgpu_get_xgmi_hive(adev, false); + /* - * In case of XGMI hive disallow concurrent resets to be triggered - * by different nodes. No point also since the one node already executing - * reset will also reset all the other nodes in the hive. + * Here we trylock to avoid chain of resets executing from + * either trigger by jobs on different adevs in XGMI hive or jobs on + * different schedulers for same device while this TO handler is running. + * We always reset all schedulers for device and all devices for XGMI + * hive so that should take care of them too. 
*/ - hive = amdgpu_get_xgmi_hive(adev, 0); - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && - !mutex_trylock(&hive->reset_lock)) + + if (hive && !mutex_trylock(&hive->reset_lock)) { + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", + job->base.id, hive->hive_id); return 0; + } /* Start with adev pre asic reset first for soft reset check.*/ - amdgpu_device_lock_adev(adev); - r = amdgpu_device_pre_asic_reset(adev, - job, - &need_full_reset); - if (r) { - /*TODO Should we stop ?*/ - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", - r, adev->ddev->unique); - adev->asic_reset_res = r; + if (!amdgpu_device_lock_adev(adev, !hive)) { + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", + job->base.id); + return 0; } /* Build list of devices to reset */ - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { + if (adev->gmc.xgmi.num_physical_nodes > 1) { if (!hive) { amdgpu_device_unlock_adev(adev); return -ENODEV; @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, device_list_handle = &device_list; } + /* block all schedulers and reset given job's ring */ + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = tmp_adev->rings[i]; + + if (!ring || !ring->sched.thread) + continue; + + drm_sched_stop(&ring->sched, &job->base); + } + } + + + /* + * Must check guilty signal here since after this point all old + * HW fences are force signaled. 
+ * + * job->base holds a reference to parent fence + */ + if (job && job->base.s_fence->parent && + dma_fence_is_signaled(job->base.s_fence->parent)) + job_signaled = true; + + if (!amdgpu_device_ip_need_full_reset(adev)) + device_list_handle = &device_list; + + if (job_signaled) { + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); + goto skip_hw_reset; + } + + + /* Guilty job will be freed after this*/ + r = amdgpu_device_pre_asic_reset(adev, + job, + &need_full_reset); + if (r) { + /*TODO Should we stop ?*/ + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", + r, adev->ddev->unique); + adev->asic_reset_res = r; + } + retry: /* Rest of adevs pre asic reset from XGMI hive. */ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { if (tmp_adev == adev) continue; - amdgpu_device_lock_adev(tmp_adev); + amdgpu_device_lock_adev(tmp_adev, false); r = amdgpu_device_pre_asic_reset(tmp_adev, NULL, &need_full_reset); @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, goto retry; } +skip_hw_reset: + /* Post ASIC reset for all devs .*/ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { - amdgpu_device_post_asic_reset(tmp_adev); + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { + struct amdgpu_ring *ring = tmp_adev->rings[i]; + + if (!ring || !ring->sched.thread) + continue; + + /* No point to resubmit jobs if we didn't HW reset*/ + if (!tmp_adev->asic_reset_res && !job_signaled) + drm_sched_resubmit_jobs(&ring->sched); + + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); + } + + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { + drm_helper_resume_force_mode(tmp_adev->ddev); + } + + tmp_adev->asic_reset_res = 0; if (r) { /* bad news, how to tell it to userspace ? 
*/ @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, amdgpu_device_unlock_adev(tmp_adev); } - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) + if (hive) mutex_unlock(&hive->reset_lock); if (r) -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <1555599624-12285-6-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> @ 2019-04-22 11:54 ` Grodzovsky, Andrey 2019-04-23 12:32 ` Koenig, Christian 2019-04-22 13:09 ` Chunming Zhou 1 sibling, 1 reply; 31+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-22 11:54 UTC (permalink / raw)
To: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w
Cc: Kazlauskas, Nicholas, Koenig, Christian

Ping for patches 3, new patch 5 and patch 6.

Andrey

On 4/18/19 11:00 AM, Andrey Grodzovsky wrote:
> [patch quoted in full; snipped — see the original message above]
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. 2019-04-22 11:54 ` Grodzovsky, Andrey @ 2019-04-23 12:32 ` Koenig, Christian [not found] ` <9774408b-cc4c-90dd-cbc7-6ef5c6fd8c46-5C7GfCeVMHo@public.gmane.org> 2019-04-23 14:12 ` Grodzovsky, Andrey 0 siblings, 2 replies; 31+ messages in thread
From: Koenig, Christian @ 2019-04-23 12:32 UTC (permalink / raw)
To: Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken
Cc: Kazlauskas, Nicholas

Well you at least have to give me time till after the holidays to get going again :)

Not sure exactly yet why we need patch number 5.

And we should probably commit patch #1 and #2.

Christian.

Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey:
> Ping for patches 3, new patch 5 and patch 6.
>
> Andrey
>
> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote:
>> [patch quoted in full; snipped — see the original message above]

_______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <9774408b-cc4c-90dd-cbc7-6ef5c6fd8c46-5C7GfCeVMHo@public.gmane.org> @ 2019-04-23 13:14 ` Kazlauskas, Nicholas 2019-04-23 14:03 ` Grodzovsky, Andrey 0 siblings, 1 reply; 31+ messages in thread From: Kazlauskas, Nicholas @ 2019-04-23 13:14 UTC (permalink / raw) To: Koenig, Christian, Grodzovsky, Andrey, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Feel free to merge 1+2 since they don't really depend on any other work in the series and they were previously reviewed. Nicholas Kazlauskas On 4/23/19 8:32 AM, Koenig, Christian wrote: > Well you at least have to give me time till after the holidays to get > going again :) > > Not sure exactly jet why we need patch number 5. > > And we should probably commit patch #1 and #2. > > Christian. > > Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey: >> Ping for patches 3, new patch 5 and patch 6. >> >> Andrey >> >> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote: >>> Also reject TDRs if another one already running. >>> >>> v2: >>> Stop all schedulers across device and entire XGMI hive before >>> force signaling HW fences. >>> Avoid passing job_signaled to helper fnctions to keep all the decision >>> making about skipping HW reset in one place. >>> >>> v3: >>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >>> against it's decrement in drm_sched_stop in non HW reset case. >>> v4: rebase >>> v5: Revert v3 as we do it now in sceduler code. 
>>> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >>> 1 file changed, 95 insertions(+), 48 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index a0e165c..85f8792 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>> if (!ring || !ring->sched.thread) >>> continue; >>> >>> - drm_sched_stop(&ring->sched, &job->base); >>> - >>> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >>> amdgpu_fence_driver_force_completion(ring); >>> } >>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>> if(job) >>> drm_sched_increase_karma(&job->base); >>> >>> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >>> if (!amdgpu_sriov_vf(adev)) { >>> >>> if (!need_full_reset) >>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >>> return r; >>> } >>> >>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >>> { >>> - int i; >>> - >>> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> - struct amdgpu_ring *ring = adev->rings[i]; >>> - >>> - if (!ring || !ring->sched.thread) >>> - continue; >>> - >>> - if (!adev->asic_reset_res) >>> - drm_sched_resubmit_jobs(&ring->sched); >>> + if (trylock) { >>> + if (!mutex_trylock(&adev->lock_reset)) >>> + return false; >>> + } else >>> + mutex_lock(&adev->lock_reset); >>> >>> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >>> - } >>> - >>> - if (!amdgpu_device_has_dc_support(adev)) { >>> - drm_helper_resume_force_mode(adev->ddev); >>> - } >>> - >>> - adev->asic_reset_res = 0; 
>>> -} >>> - >>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >>> -{ >>> - mutex_lock(&adev->lock_reset); >>> atomic_inc(&adev->gpu_reset_counter); >>> adev->in_gpu_reset = 1; >>> /* Block kfd: SRIOV would do it separately */ >>> if (!amdgpu_sriov_vf(adev)) >>> amdgpu_amdkfd_pre_reset(adev); >>> + >>> + return true; >>> } >>> >>> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> struct amdgpu_job *job) >>> { >>> - int r; >>> + struct list_head device_list, *device_list_handle = NULL; >>> + bool need_full_reset, job_signaled; >>> struct amdgpu_hive_info *hive = NULL; >>> - bool need_full_reset = false; >>> struct amdgpu_device *tmp_adev = NULL; >>> - struct list_head device_list, *device_list_handle = NULL; >>> + int i, r = 0; >>> >>> + need_full_reset = job_signaled = false; >>> INIT_LIST_HEAD(&device_list); >>> >>> dev_info(adev->dev, "GPU reset begin!\n"); >>> >>> + hive = amdgpu_get_xgmi_hive(adev, false); >>> + >>> /* >>> - * In case of XGMI hive disallow concurrent resets to be triggered >>> - * by different nodes. No point also since the one node already executing >>> - * reset will also reset all the other nodes in the hive. >>> + * Here we trylock to avoid chain of resets executing from >>> + * either trigger by jobs on different adevs in XGMI hive or jobs on >>> + * different schedulers for same device while this TO handler is running. >>> + * We always reset all schedulers for device and all devices for XGMI >>> + * hive so that should take care of them too. 
>>> */ >>> - hive = amdgpu_get_xgmi_hive(adev, 0); >>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >>> - !mutex_trylock(&hive->reset_lock)) >>> + >>> + if (hive && !mutex_trylock(&hive->reset_lock)) { >>> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >>> + job->base.id, hive->hive_id); >>> return 0; >>> + } >>> >>> /* Start with adev pre asic reset first for soft reset check.*/ >>> - amdgpu_device_lock_adev(adev); >>> - r = amdgpu_device_pre_asic_reset(adev, >>> - job, >>> - &need_full_reset); >>> - if (r) { >>> - /*TODO Should we stop ?*/ >>> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>> - r, adev->ddev->unique); >>> - adev->asic_reset_res = r; >>> + if (!amdgpu_device_lock_adev(adev, !hive)) { >>> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >>> + job->base.id); >>> + return 0; >>> } >>> >>> /* Build list of devices to reset */ >>> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >>> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >>> if (!hive) { >>> amdgpu_device_unlock_adev(adev); >>> return -ENODEV; >>> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> device_list_handle = &device_list; >>> } >>> >>> + /* block all schedulers and reset given job's ring */ >>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>> + >>> + if (!ring || !ring->sched.thread) >>> + continue; >>> + >>> + drm_sched_stop(&ring->sched, &job->base); >>> + } >>> + } >>> + >>> + >>> + /* >>> + * Must check guilty signal here since after this point all old >>> + * HW fences are force signaled. 
>>> + * >>> + * job->base holds a reference to parent fence >>> + */ >>> + if (job && job->base.s_fence->parent && >>> + dma_fence_is_signaled(job->base.s_fence->parent)) >>> + job_signaled = true; >>> + >>> + if (!amdgpu_device_ip_need_full_reset(adev)) >>> + device_list_handle = &device_list; >>> + >>> + if (job_signaled) { >>> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >>> + goto skip_hw_reset; >>> + } >>> + >>> + >>> + /* Guilty job will be freed after this*/ >>> + r = amdgpu_device_pre_asic_reset(adev, >>> + job, >>> + &need_full_reset); >>> + if (r) { >>> + /*TODO Should we stop ?*/ >>> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>> + r, adev->ddev->unique); >>> + adev->asic_reset_res = r; >>> + } >>> + >>> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>> >>> if (tmp_adev == adev) >>> continue; >>> >>> - amdgpu_device_lock_adev(tmp_adev); >>> + amdgpu_device_lock_adev(tmp_adev, false); >>> r = amdgpu_device_pre_asic_reset(tmp_adev, >>> NULL, >>> &need_full_reset); >>> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> goto retry; >>> } >>> >>> +skip_hw_reset: >>> + >>> /* Post ASIC reset for all devs .*/ >>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>> - amdgpu_device_post_asic_reset(tmp_adev); >>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>> + >>> + if (!ring || !ring->sched.thread) >>> + continue; >>> + >>> + /* No point to resubmit jobs if we didn't HW reset*/ >>> + if (!tmp_adev->asic_reset_res && !job_signaled) >>> + drm_sched_resubmit_jobs(&ring->sched); >>> + >>> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >>> + } >>> + >>> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >>> + drm_helper_resume_force_mode(tmp_adev->ddev); >>> + } >>> + >>> + tmp_adev->asic_reset_res 
= 0; >>> >>> if (r) { >>> /* bad news, how to tell it to userspace ? */ >>> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> amdgpu_device_unlock_adev(tmp_adev); >>> } >>> >>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >>> + if (hive) >>> mutex_unlock(&hive->reset_lock); >>> >>> if (r) > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
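The core decision in the hunks above is: after stopping all schedulers, skip the HW reset when the guilty job's parent (HW) fence has already signaled. A minimal user-space sketch with stand-in types (the `hw_fence`, `sched_fence` and `sched_job` structs here are illustrative, not the real DRM scheduler structures, which use `dma_fence_is_signaled()` on `job->base.s_fence->parent`):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the DRM scheduler fence chain. */
struct hw_fence { bool signaled; };
struct sched_fence { struct hw_fence *parent; };
struct sched_job { struct sched_fence *s_fence; };

/* True when the "guilty" job's HW fence already signaled, i.e. the hang
 * resolved on its own and a full ASIC reset would only destroy state.
 * As the comment in the patch notes, this must be checked after the
 * schedulers are stopped but before old HW fences are force-signaled,
 * since force-completion makes every old fence look signaled. */
static bool guilty_job_signaled(const struct sched_job *job)
{
	return job && job->s_fence && job->s_fence->parent &&
	       job->s_fence->parent->signaled;
}
```

In the patch this predicate gates a `goto skip_hw_reset`, which also suppresses job resubmission and the forced mode restore, since no HW state was lost.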
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
  2019-04-23 13:14             ` Kazlauskas, Nicholas
@ 2019-04-23 14:03               ` Grodzovsky, Andrey
  0 siblings, 0 replies; 31+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-23 14:03 UTC (permalink / raw)
  To: Kazlauskas, Nicholas, Koenig, Christian, dri-devel, amd-gfx, eric,
	etnaviv, ckoenig.leichtzumerken

OK, I will merge them into amd-staging drm-next.

Andrey

On 4/23/19 9:14 AM, Kazlauskas, Nicholas wrote:
> Feel free to merge 1+2 since they don't really depend on any other work
> in the series and they were previously reviewed.
>
> Nicholas Kazlauskas
>
> On 4/23/19 8:32 AM, Koenig, Christian wrote:
>> Well you at least have to give me time till after the holidays to get
>> going again :)
>>
>> Not sure exactly yet why we need patch number 5.
>>
>> And we should probably commit patch #1 and #2.
>>
>> Christian.
>>
>> Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey:
>>> Ping for patches 3, new patch 5 and patch 6.
>>>
>>> Andrey
>>>
>>> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote:
>>>> Also reject TDRs if another one is already running.
>>>>
>>>> v2:
>>>> Stop all schedulers across device and entire XGMI hive before
>>>> force signaling HW fences.
>>>> Avoid passing job_signaled to helper functions to keep all the decision
>>>> making about skipping HW reset in one place.
>>>>
>>>> v3:
>>>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
>>>> against its decrement in drm_sched_stop in non HW reset case.
>>>> v4: rebase
>>>> v5: Revert v3 as we do it now in scheduler code.
>>>> >>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >>>> 1 file changed, 95 insertions(+), 48 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index a0e165c..85f8792 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>>> if (!ring || !ring->sched.thread) >>>> continue; >>>> >>>> - drm_sched_stop(&ring->sched, &job->base); >>>> - >>>> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >>>> amdgpu_fence_driver_force_completion(ring); >>>> } >>>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>>> if(job) >>>> drm_sched_increase_karma(&job->base); >>>> >>>> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >>>> if (!amdgpu_sriov_vf(adev)) { >>>> >>>> if (!need_full_reset) >>>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >>>> return r; >>>> } >>>> >>>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >>>> { >>>> - int i; >>>> - >>>> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> - struct amdgpu_ring *ring = adev->rings[i]; >>>> - >>>> - if (!ring || !ring->sched.thread) >>>> - continue; >>>> - >>>> - if (!adev->asic_reset_res) >>>> - drm_sched_resubmit_jobs(&ring->sched); >>>> + if (trylock) { >>>> + if (!mutex_trylock(&adev->lock_reset)) >>>> + return false; >>>> + } else >>>> + mutex_lock(&adev->lock_reset); >>>> >>>> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >>>> - } >>>> - >>>> - if (!amdgpu_device_has_dc_support(adev)) { >>>> - 
drm_helper_resume_force_mode(adev->ddev); >>>> - } >>>> - >>>> - adev->asic_reset_res = 0; >>>> -} >>>> - >>>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >>>> -{ >>>> - mutex_lock(&adev->lock_reset); >>>> atomic_inc(&adev->gpu_reset_counter); >>>> adev->in_gpu_reset = 1; >>>> /* Block kfd: SRIOV would do it separately */ >>>> if (!amdgpu_sriov_vf(adev)) >>>> amdgpu_amdkfd_pre_reset(adev); >>>> + >>>> + return true; >>>> } >>>> >>>> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>>> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>>> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> struct amdgpu_job *job) >>>> { >>>> - int r; >>>> + struct list_head device_list, *device_list_handle = NULL; >>>> + bool need_full_reset, job_signaled; >>>> struct amdgpu_hive_info *hive = NULL; >>>> - bool need_full_reset = false; >>>> struct amdgpu_device *tmp_adev = NULL; >>>> - struct list_head device_list, *device_list_handle = NULL; >>>> + int i, r = 0; >>>> >>>> + need_full_reset = job_signaled = false; >>>> INIT_LIST_HEAD(&device_list); >>>> >>>> dev_info(adev->dev, "GPU reset begin!\n"); >>>> >>>> + hive = amdgpu_get_xgmi_hive(adev, false); >>>> + >>>> /* >>>> - * In case of XGMI hive disallow concurrent resets to be triggered >>>> - * by different nodes. No point also since the one node already executing >>>> - * reset will also reset all the other nodes in the hive. >>>> + * Here we trylock to avoid chain of resets executing from >>>> + * either trigger by jobs on different adevs in XGMI hive or jobs on >>>> + * different schedulers for same device while this TO handler is running. >>>> + * We always reset all schedulers for device and all devices for XGMI >>>> + * hive so that should take care of them too. 
>>>> */ >>>> - hive = amdgpu_get_xgmi_hive(adev, 0); >>>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >>>> - !mutex_trylock(&hive->reset_lock)) >>>> + >>>> + if (hive && !mutex_trylock(&hive->reset_lock)) { >>>> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >>>> + job->base.id, hive->hive_id); >>>> return 0; >>>> + } >>>> >>>> /* Start with adev pre asic reset first for soft reset check.*/ >>>> - amdgpu_device_lock_adev(adev); >>>> - r = amdgpu_device_pre_asic_reset(adev, >>>> - job, >>>> - &need_full_reset); >>>> - if (r) { >>>> - /*TODO Should we stop ?*/ >>>> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>>> - r, adev->ddev->unique); >>>> - adev->asic_reset_res = r; >>>> + if (!amdgpu_device_lock_adev(adev, !hive)) { >>>> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >>>> + job->base.id); >>>> + return 0; >>>> } >>>> >>>> /* Build list of devices to reset */ >>>> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >>>> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >>>> if (!hive) { >>>> amdgpu_device_unlock_adev(adev); >>>> return -ENODEV; >>>> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> device_list_handle = &device_list; >>>> } >>>> >>>> + /* block all schedulers and reset given job's ring */ >>>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>>> + >>>> + if (!ring || !ring->sched.thread) >>>> + continue; >>>> + >>>> + drm_sched_stop(&ring->sched, &job->base); >>>> + } >>>> + } >>>> + >>>> + >>>> + /* >>>> + * Must check guilty signal here since after this point all old >>>> + * HW fences are force signaled. 
>>>> + * >>>> + * job->base holds a reference to parent fence >>>> + */ >>>> + if (job && job->base.s_fence->parent && >>>> + dma_fence_is_signaled(job->base.s_fence->parent)) >>>> + job_signaled = true; >>>> + >>>> + if (!amdgpu_device_ip_need_full_reset(adev)) >>>> + device_list_handle = &device_list; >>>> + >>>> + if (job_signaled) { >>>> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >>>> + goto skip_hw_reset; >>>> + } >>>> + >>>> + >>>> + /* Guilty job will be freed after this*/ >>>> + r = amdgpu_device_pre_asic_reset(adev, >>>> + job, >>>> + &need_full_reset); >>>> + if (r) { >>>> + /*TODO Should we stop ?*/ >>>> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>>> + r, adev->ddev->unique); >>>> + adev->asic_reset_res = r; >>>> + } >>>> + >>>> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >>>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>>> >>>> if (tmp_adev == adev) >>>> continue; >>>> >>>> - amdgpu_device_lock_adev(tmp_adev); >>>> + amdgpu_device_lock_adev(tmp_adev, false); >>>> r = amdgpu_device_pre_asic_reset(tmp_adev, >>>> NULL, >>>> &need_full_reset); >>>> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> goto retry; >>>> } >>>> >>>> +skip_hw_reset: >>>> + >>>> /* Post ASIC reset for all devs .*/ >>>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>>> - amdgpu_device_post_asic_reset(tmp_adev); >>>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>>> + >>>> + if (!ring || !ring->sched.thread) >>>> + continue; >>>> + >>>> + /* No point to resubmit jobs if we didn't HW reset*/ >>>> + if (!tmp_adev->asic_reset_res && !job_signaled) >>>> + drm_sched_resubmit_jobs(&ring->sched); >>>> + >>>> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >>>> + } >>>> + >>>> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >>>> + 
drm_helper_resume_force_mode(tmp_adev->ddev); >>>> + } >>>> + >>>> + tmp_adev->asic_reset_res = 0; >>>> >>>> if (r) { >>>> /* bad news, how to tell it to userspace ? */ >>>> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> amdgpu_device_unlock_adev(tmp_adev); >>>> } >>>> >>>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >>>> + if (hive) >>>> mutex_unlock(&hive->reset_lock); >>>> >>>> if (r) >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
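The reworked `amdgpu_device_lock_adev(adev, trylock)` in the patch above takes a flag so that a single device (no XGMI hive serializing resets) rejects a concurrent timeout handler immediately instead of queuing behind the running one. The pattern can be sketched outside the kernel with pthreads; `dev_ctx` is a hypothetical stand-in for the few `amdgpu_device` fields the helper touches:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the amdgpu_device fields the lock helper
 * uses; a sketch of the pattern, not the real driver code. */
struct dev_ctx {
	pthread_mutex_t lock_reset;
	int gpu_reset_counter;
	int in_gpu_reset;
};

/* With trylock set, a concurrent reset handler bails out instead of
 * blocking; with trylock clear (hive members, once the hive-wide
 * reset_lock is held) the caller waits for its turn. */
static bool dev_lock(struct dev_ctx *d, bool trylock)
{
	if (trylock) {
		if (pthread_mutex_trylock(&d->lock_reset) != 0)
			return false;
	} else {
		pthread_mutex_lock(&d->lock_reset);
	}
	d->gpu_reset_counter++;
	d->in_gpu_reset = 1;
	return true;
}

static void dev_unlock(struct dev_ctx *d)
{
	d->in_gpu_reset = 0;
	pthread_mutex_unlock(&d->lock_reset);
}
```

A caller that gets `false` back simply returns 0 from the recovery path, matching the "Bailing on TDR ... as another already in progress" messages in the patch.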
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
  2019-04-23 12:32             ` Koenig, Christian
       [not found]               ` <9774408b-cc4c-90dd-cbc7-6ef5c6fd8c46-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-23 14:12               ` Grodzovsky, Andrey
       [not found]                 ` <a5c97356-66d8-b79e-32ab-a03e4c4d3e39-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-23 14:12 UTC (permalink / raw)
  To: Koenig, Christian, dri-devel, amd-gfx, eric, etnaviv,
	ckoenig.leichtzumerken
  Cc: Kazlauskas, Nicholas

[-- Attachment #1: Type: text/plain, Size: 9850 bytes --]

On 4/23/19 8:32 AM, Koenig, Christian wrote:

> Well you at least have to give me time till after the holidays to get
> going again :)
>
> Not sure exactly yet why we need patch number 5.

Probably you missed the mail where I pointed out a bug I found during
testing - I am reattaching the mail and the KASAN dump.

Andrey

>
> And we should probably commit patch #1 and #2.
>
> Christian.
>
> Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey:
>> Ping for patches 3, new patch 5 and patch 6.
>>
>> Andrey
>>
>> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote:
>>> Also reject TDRs if another one is already running.
>>>
>>> v2:
>>> Stop all schedulers across device and entire XGMI hive before
>>> force signaling HW fences.
>>> Avoid passing job_signaled to helper functions to keep all the decision
>>> making about skipping HW reset in one place.
>>>
>>> v3:
>>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
>>> against its decrement in drm_sched_stop in non HW reset case.
>>> v4: rebase
>>> v5: Revert v3 as we do it now in scheduler code.
>>> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >>> 1 file changed, 95 insertions(+), 48 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index a0e165c..85f8792 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>> if (!ring || !ring->sched.thread) >>> continue; >>> >>> - drm_sched_stop(&ring->sched, &job->base); >>> - >>> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >>> amdgpu_fence_driver_force_completion(ring); >>> } >>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>> if(job) >>> drm_sched_increase_karma(&job->base); >>> >>> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >>> if (!amdgpu_sriov_vf(adev)) { >>> >>> if (!need_full_reset) >>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >>> return r; >>> } >>> >>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >>> { >>> - int i; >>> - >>> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> - struct amdgpu_ring *ring = adev->rings[i]; >>> - >>> - if (!ring || !ring->sched.thread) >>> - continue; >>> - >>> - if (!adev->asic_reset_res) >>> - drm_sched_resubmit_jobs(&ring->sched); >>> + if (trylock) { >>> + if (!mutex_trylock(&adev->lock_reset)) >>> + return false; >>> + } else >>> + mutex_lock(&adev->lock_reset); >>> >>> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >>> - } >>> - >>> - if (!amdgpu_device_has_dc_support(adev)) { >>> - drm_helper_resume_force_mode(adev->ddev); >>> - } >>> - >>> - adev->asic_reset_res = 0; 
>>> -} >>> - >>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >>> -{ >>> - mutex_lock(&adev->lock_reset); >>> atomic_inc(&adev->gpu_reset_counter); >>> adev->in_gpu_reset = 1; >>> /* Block kfd: SRIOV would do it separately */ >>> if (!amdgpu_sriov_vf(adev)) >>> amdgpu_amdkfd_pre_reset(adev); >>> + >>> + return true; >>> } >>> >>> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> struct amdgpu_job *job) >>> { >>> - int r; >>> + struct list_head device_list, *device_list_handle = NULL; >>> + bool need_full_reset, job_signaled; >>> struct amdgpu_hive_info *hive = NULL; >>> - bool need_full_reset = false; >>> struct amdgpu_device *tmp_adev = NULL; >>> - struct list_head device_list, *device_list_handle = NULL; >>> + int i, r = 0; >>> >>> + need_full_reset = job_signaled = false; >>> INIT_LIST_HEAD(&device_list); >>> >>> dev_info(adev->dev, "GPU reset begin!\n"); >>> >>> + hive = amdgpu_get_xgmi_hive(adev, false); >>> + >>> /* >>> - * In case of XGMI hive disallow concurrent resets to be triggered >>> - * by different nodes. No point also since the one node already executing >>> - * reset will also reset all the other nodes in the hive. >>> + * Here we trylock to avoid chain of resets executing from >>> + * either trigger by jobs on different adevs in XGMI hive or jobs on >>> + * different schedulers for same device while this TO handler is running. >>> + * We always reset all schedulers for device and all devices for XGMI >>> + * hive so that should take care of them too. 
>>> */ >>> - hive = amdgpu_get_xgmi_hive(adev, 0); >>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >>> - !mutex_trylock(&hive->reset_lock)) >>> + >>> + if (hive && !mutex_trylock(&hive->reset_lock)) { >>> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >>> + job->base.id, hive->hive_id); >>> return 0; >>> + } >>> >>> /* Start with adev pre asic reset first for soft reset check.*/ >>> - amdgpu_device_lock_adev(adev); >>> - r = amdgpu_device_pre_asic_reset(adev, >>> - job, >>> - &need_full_reset); >>> - if (r) { >>> - /*TODO Should we stop ?*/ >>> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>> - r, adev->ddev->unique); >>> - adev->asic_reset_res = r; >>> + if (!amdgpu_device_lock_adev(adev, !hive)) { >>> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >>> + job->base.id); >>> + return 0; >>> } >>> >>> /* Build list of devices to reset */ >>> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >>> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >>> if (!hive) { >>> amdgpu_device_unlock_adev(adev); >>> return -ENODEV; >>> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> device_list_handle = &device_list; >>> } >>> >>> + /* block all schedulers and reset given job's ring */ >>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>> + >>> + if (!ring || !ring->sched.thread) >>> + continue; >>> + >>> + drm_sched_stop(&ring->sched, &job->base); >>> + } >>> + } >>> + >>> + >>> + /* >>> + * Must check guilty signal here since after this point all old >>> + * HW fences are force signaled. 
>>> + * >>> + * job->base holds a reference to parent fence >>> + */ >>> + if (job && job->base.s_fence->parent && >>> + dma_fence_is_signaled(job->base.s_fence->parent)) >>> + job_signaled = true; >>> + >>> + if (!amdgpu_device_ip_need_full_reset(adev)) >>> + device_list_handle = &device_list; >>> + >>> + if (job_signaled) { >>> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >>> + goto skip_hw_reset; >>> + } >>> + >>> + >>> + /* Guilty job will be freed after this*/ >>> + r = amdgpu_device_pre_asic_reset(adev, >>> + job, >>> + &need_full_reset); >>> + if (r) { >>> + /*TODO Should we stop ?*/ >>> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>> + r, adev->ddev->unique); >>> + adev->asic_reset_res = r; >>> + } >>> + >>> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>> >>> if (tmp_adev == adev) >>> continue; >>> >>> - amdgpu_device_lock_adev(tmp_adev); >>> + amdgpu_device_lock_adev(tmp_adev, false); >>> r = amdgpu_device_pre_asic_reset(tmp_adev, >>> NULL, >>> &need_full_reset); >>> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> goto retry; >>> } >>> >>> +skip_hw_reset: >>> + >>> /* Post ASIC reset for all devs .*/ >>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>> - amdgpu_device_post_asic_reset(tmp_adev); >>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>> + >>> + if (!ring || !ring->sched.thread) >>> + continue; >>> + >>> + /* No point to resubmit jobs if we didn't HW reset*/ >>> + if (!tmp_adev->asic_reset_res && !job_signaled) >>> + drm_sched_resubmit_jobs(&ring->sched); >>> + >>> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >>> + } >>> + >>> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >>> + drm_helper_resume_force_mode(tmp_adev->ddev); >>> + } >>> + >>> + tmp_adev->asic_reset_res 
= 0; >>> >>> if (r) { >>> /* bad news, how to tell it to userspace ? */ >>> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>> amdgpu_device_unlock_adev(tmp_adev); >>> } >>> >>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >>> + if (hive) >>> mutex_unlock(&hive->reset_lock); >>> >>> if (r) > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: kasan.log --] [-- Type: text/x-log; name="kasan.log", Size: 3511 bytes --] 121.189757 < 0.000171>] amdgpu 0000:01:00.0: GPU reset(5) succeeded! passed[ 121.189894 < 0.000137>] ================================================================== [ 121.189951 < 0.000057>] BUG: KASAN: use-after-free in drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] Run Summary: Type Total Ran Passed Failed Inactive suites 8 0 n/a 0 0 tests 39 1 1 0 0 asserts 8 8 8 0 n/a Elapsed time = 0.001 seconds[ 121.189956 < 0.000005>] Read of size 8 at addr ffff88840389a8b0 by task kworker/2:2/1140 [ 121.189969 < 0.000013>] CPU: 2 PID: 1140 Comm: kworker/2:2 Tainted: G OE 5.1.0-rc2-misc+ #1 [ 121.189972 < 0.000003>] Hardware name: System manufacturer System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [ 121.189977 < 0.000005>] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 121.189980 < 0.000003>] Call Trace: [ 121.189985 < 0.000005>] dump_stack+0x9b/0xf5 [ 121.189992 < 0.000007>] print_address_description+0x70/0x290 [ 121.189997 < 0.000005>] ? drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190002 < 0.000005>] kasan_report+0x134/0x191 [ 121.190006 < 0.000004>] ? drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190014 < 0.000008>] ? 
drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190019 < 0.000005>] __asan_load8+0x54/0x90 [ 121.190024 < 0.000005>] drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190034 < 0.000010>] process_one_work+0x466/0xb00 [ 121.190046 < 0.000012>] ? queue_work_node+0x180/0x180 [ 121.190061 < 0.000015>] worker_thread+0x83/0x6c0 [ 121.190075 < 0.000014>] kthread+0x1a9/0x1f0 [ 121.190079 < 0.000004>] ? rescuer_thread+0x760/0x760 [ 121.190081 < 0.000002>] ? kthread_cancel_delayed_work_sync+0x20/0x20 [ 121.190088 < 0.000007>] ret_from_fork+0x3a/0x50 [ 121.190105 < 0.000017>] Allocated by task 1421: [ 121.190110 < 0.000005>] save_stack+0x46/0xd0 [ 121.190112 < 0.000002>] __kasan_kmalloc+0xab/0xe0 [ 121.190115 < 0.000003>] kasan_kmalloc+0xf/0x20 [ 121.190117 < 0.000002>] __kmalloc+0x167/0x390 [ 121.190210 < 0.000093>] amdgpu_job_alloc+0x47/0x170 [amdgpu] [ 121.190289 < 0.000079>] amdgpu_cs_ioctl+0x9bd/0x2e70 [amdgpu] [ 121.190312 < 0.000023>] drm_ioctl_kernel+0x17e/0x1d0 [drm] [ 121.190334 < 0.000022>] drm_ioctl+0x5e1/0x640 [drm] [ 121.190409 < 0.000075>] amdgpu_drm_ioctl+0x78/0xd0 [amdgpu] [ 121.190413 < 0.000004>] do_vfs_ioctl+0x152/0xa30 [ 121.190415 < 0.000002>] ksys_ioctl+0x6d/0x80 [ 121.190418 < 0.000003>] __x64_sys_ioctl+0x43/0x50 [ 121.190425 < 0.000007>] do_syscall_64+0x7d/0x240 [ 121.190430 < 0.000005>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 121.190440 < 0.000010>] Freed by task 1242: [ 121.190448 < 0.000008>] save_stack+0x46/0xd0 [ 121.190453 < 0.000005>] __kasan_slab_free+0x13c/0x1a0 [ 121.190458 < 0.000005>] kasan_slab_free+0xe/0x10 [ 121.190462 < 0.000004>] kfree+0xfa/0x2e0 [ 121.190584 < 0.000122>] amdgpu_job_free_cb+0x7f/0x90 [amdgpu] [ 121.190589 < 0.000005>] drm_sched_cleanup_jobs.part.10+0xcf/0x1a0 [gpu_sched] [ 121.190594 < 0.000005>] drm_sched_main+0x38a/0x430 [gpu_sched] [ 121.190596 < 0.000002>] kthread+0x1a9/0x1f0 [ 121.190599 < 0.000003>] ret_from_fork+0x3a/0x50 [-- Attachment #3: Type: message/rfc822, Size: 12159 bytes --] [-- Attachment 
#3.1.1: Type: text/plain, Size: 6042 bytes --] On 4/16/19 12:00 PM, Koenig, Christian wrote: > Am 16.04.19 um 17:42 schrieb Grodzovsky, Andrey: >> On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote: >>> On 4/16/19 10:43 AM, Koenig, Christian wrote: >>>> Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>>>> On 4/16/19 5:47 AM, Christian König wrote: >>>>>> Am 15.04.19 um 23:17 schrieb Eric Anholt: >>>>>>> Andrey Grodzovsky <andrey.grodzovsky@amd.com> writes: >>>>>>> >>>>>>>> From: Christian König <christian.koenig@amd.com> >>>>>>>> >>>>>>>> We now destroy finished jobs from the worker thread to make sure that >>>>>>>> we never destroy a job currently in timeout processing. >>>>>>>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>>>>>>> which should solve a deadlock reported by a user. >>>>>>>> >>>>>>>> v2: Remove unused variable. >>>>>>>> >>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>>>>>>> >>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com> >>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++-- >>>>>>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>>>>>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +- >>>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 138 >>>>>>>> +++++++++++++++++------------ >>>>>>>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +- >>>>>>> Missing corresponding panfrost and lima updates. You should probably >>>>>>> pull in drm-misc for hacking on the scheduler. 
>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> index ce7c737b..8efb091 100644 >>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, >>>>>>>> struct drm_sched_job *sched_job) >>>>>>>> /* block scheduler */ >>>>>>>> for (q = 0; q < V3D_MAX_QUEUES; q++) >>>>>>>> - drm_sched_stop(&v3d->queue[q].sched); >>>>>>>> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >>>>>>>> if(sched_job) >>>>>>>> drm_sched_increase_karma(sched_job); >>>>>>>> + /* >>>>>>>> + * Guilty job did complete and hence needs to be manually removed >>>>>>>> + * See drm_sched_stop doc. >>>>>>>> + */ >>>>>>>> + if (list_empty(&sched_job->node)) >>>>>>>> + sched_job->sched->ops->free_job(sched_job); >>>>>>> If the if (sched_job) is necessary up above, then this should clearly be >>>>>>> under it. >>>>>>> >>>>>>> But, can we please have a core scheduler thing we call here instead of >>>>>>> drivers all replicating it? >>>>>> Yeah that's also something I noted before. >>>>>> >>>>>> Essential problem is that we remove finished jobs from the mirror list >>>>>> and so need to destruct them because we otherwise leak them. >>>>>> >>>>>> Alternative approach here would be to keep the jobs on the ring mirror >>>>>> list, but not submit them again. >>>>>> >>>>>> Regards, >>>>>> Christian. >>>>> I really prefer to avoid this, it means adding extra flag to sched_job >>>>> to check in each iteration of the ring mirror list. >>>> Mhm, why actually? We just need to check if the scheduler fence is signaled. >>> OK, i see it's equivalent but this still en extra check for all the >>> iterations. 
>>> >>>>> What about changing >>>>> signature of drm_sched_backend_ops.timedout_job to return drm_sched_job* >>>>> instead of void, this way we can return the guilty job back from the >>>>> driver specific handler to the generic drm_sched_job_timedout and >>>>> release it there. >>>> Well the timeout handler already has the job, so returning it doesn't >>>> make much sense. >>>> >>>> The problem is rather that the timeout handler doesn't know if it should >>>> destroy the job or not. >>> But the driver specific handler does, and actually returning back either >>> the pointer to the job or null will give an indication of that. We can >>> even return bool. >>> >>> Andrey >> Thinking a bit more about this - the way this check is done now "if >> (list_empty(&sched_job->node)) then free the sched_job" actually makes >> it possible to just move this as is from driver specific callbacks into >> drm_sched_job_timeout without any other changes. > Oh, well that sounds like a good idea off hand. > > Need to see the final code, but at least the best idea so far. > > Christian. Unfortunately looks like it's not that good idea at the end, take a look at the attached KASAN print - sched thread's cleanup function races against TDR handler and removes the guilty job from mirror list and we have no way of differentiating if the job was removed from within the TDR handler or from the sched. thread's clean-up function. So looks like we either need 'keep the jobs on the ring mirror list, but not submit them again' as you suggested before or add a flag to sched_job to hint to drm_sched_job_timedout that guilty job requires manual removal. Your suggestion implies we will need an extra check in almost every place of traversal of the mirror ring to avoid handling signaled jobs while mine requires extra flag in sched_job struct . I feel that keeping completed jobs in the mirror list when they actually don't belong there any more is confusing and an opening for future bugs. 
Andrey > >> Andrey >> >>>> Christian. >>>> >>>>> Andrey >>>>> >>>>>>>> + >>>>>>>> /* get the GPU back into the init state */ >>>>>>>> v3d_reset(v3d); >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #3.1.2: kasan.log --] [-- Type: text/x-log, Size: 3450 bytes --] 121.189757 < 0.000171>] amdgpu 0000:01:00.0: GPU reset(5) succeeded! passed[ 121.189894 < 0.000137>] ================================================================== [ 121.189951 < 0.000057>] BUG: KASAN: use-after-free in drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] Run Summary: Type Total Ran Passed Failed Inactive suites 8 0 n/a 0 0 tests 39 1 1 0 0 asserts 8 8 8 0 n/a Elapsed time = 0.001 seconds[ 121.189956 < 0.000005>] Read of size 8 at addr ffff88840389a8b0 by task kworker/2:2/1140 [ 121.189969 < 0.000013>] CPU: 2 PID: 1140 Comm: kworker/2:2 Tainted: G OE 5.1.0-rc2-misc+ #1 [ 121.189972 < 0.000003>] Hardware name: System manufacturer System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [ 121.189977 < 0.000005>] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 121.189980 < 0.000003>] Call Trace: [ 121.189985 < 0.000005>] dump_stack+0x9b/0xf5 [ 121.189992 < 0.000007>] print_address_description+0x70/0x290 [ 121.189997 < 0.000005>] ? drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190002 < 0.000005>] kasan_report+0x134/0x191 [ 121.190006 < 0.000004>] ? drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190014 < 0.000008>] ? drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190019 < 0.000005>] __asan_load8+0x54/0x90 [ 121.190024 < 0.000005>] drm_sched_job_timedout+0x7a/0xf0 [gpu_sched] [ 121.190034 < 0.000010>] process_one_work+0x466/0xb00 [ 121.190046 < 0.000012>] ? 
queue_work_node+0x180/0x180 [ 121.190061 < 0.000015>] worker_thread+0x83/0x6c0 [ 121.190075 < 0.000014>] kthread+0x1a9/0x1f0 [ 121.190079 < 0.000004>] ? rescuer_thread+0x760/0x760 [ 121.190081 < 0.000002>] ? kthread_cancel_delayed_work_sync+0x20/0x20 [ 121.190088 < 0.000007>] ret_from_fork+0x3a/0x50 [ 121.190105 < 0.000017>] Allocated by task 1421: [ 121.190110 < 0.000005>] save_stack+0x46/0xd0 [ 121.190112 < 0.000002>] __kasan_kmalloc+0xab/0xe0 [ 121.190115 < 0.000003>] kasan_kmalloc+0xf/0x20 [ 121.190117 < 0.000002>] __kmalloc+0x167/0x390 [ 121.190210 < 0.000093>] amdgpu_job_alloc+0x47/0x170 [amdgpu] [ 121.190289 < 0.000079>] amdgpu_cs_ioctl+0x9bd/0x2e70 [amdgpu] [ 121.190312 < 0.000023>] drm_ioctl_kernel+0x17e/0x1d0 [drm] [ 121.190334 < 0.000022>] drm_ioctl+0x5e1/0x640 [drm] [ 121.190409 < 0.000075>] amdgpu_drm_ioctl+0x78/0xd0 [amdgpu] [ 121.190413 < 0.000004>] do_vfs_ioctl+0x152/0xa30 [ 121.190415 < 0.000002>] ksys_ioctl+0x6d/0x80 [ 121.190418 < 0.000003>] __x64_sys_ioctl+0x43/0x50 [ 121.190425 < 0.000007>] do_syscall_64+0x7d/0x240 [ 121.190430 < 0.000005>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 121.190440 < 0.000010>] Freed by task 1242: [ 121.190448 < 0.000008>] save_stack+0x46/0xd0 [ 121.190453 < 0.000005>] __kasan_slab_free+0x13c/0x1a0 [ 121.190458 < 0.000005>] kasan_slab_free+0xe/0x10 [ 121.190462 < 0.000004>] kfree+0xfa/0x2e0 [ 121.190584 < 0.000122>] amdgpu_job_free_cb+0x7f/0x90 [amdgpu] [ 121.190589 < 0.000005>] drm_sched_cleanup_jobs.part.10+0xcf/0x1a0 [gpu_sched] [ 121.190594 < 0.000005>] drm_sched_main+0x38a/0x430 [gpu_sched] [ 121.190596 < 0.000002>] kthread+0x1a9/0x1f0 [ 121.190599 < 0.000003>] ret_from_fork+0x3a/0x50 [-- Attachment #4: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
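The ambiguity discussed in the attached mail (a job missing from the ring mirror list may have been removed by either the timeout handler, which must free it, or the scheduler's cleanup thread, which already freed it) is easy to see with a toy circular list. `list_node` here is a hypothetical simplification of the kernel's `struct list_head`, not the real API:

```c
#include <stdbool.h>

/* Hypothetical simplification of the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static bool list_empty(const struct list_node *n) { return n->next == n; }

/* Both the TDR path and the cleanup thread unlink a finished job the
 * same way, so afterwards list_empty(&job->node) cannot tell an
 * observer which path ran - the root of the race KASAN flagged when
 * "if (list_empty(...)) free_job(...)" was used as the heuristic. */
static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}
```

Since the unlinked state is identical from either path, the thread settles on keeping the removal and freeing in one place (the worker thread) rather than inferring ownership from list state.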
[parent not found: <a5c97356-66d8-b79e-32ab-a03e4c4d3e39-5C7GfCeVMHo@public.gmane.org>]

* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <a5c97356-66d8-b79e-32ab-a03e4c4d3e39-5C7GfCeVMHo@public.gmane.org> @ 2019-04-23 14:49 ` Christian König 0 siblings, 0 replies; 31+ messages in thread From: Christian König @ 2019-04-23 14:49 UTC (permalink / raw) To: Grodzovsky, Andrey, Koenig, Christian, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Cc: Kazlauskas, Nicholas Am 23.04.19 um 16:12 schrieb Grodzovsky, Andrey: > On 4/23/19 8:32 AM, Koenig, Christian wrote: > >> Well you at least have to give me time till after the holidays to get >> going again :) >> >> Not sure exactly jet why we need patch number 5. > Probably you missed the mail where I pointed out a bug I found during > testing - I am reattaching the mail and the KASAN dump. Ah, so the job is actually resubmitted and we race with finishing and destroying it. Well that is a really ugly problem we have here, but your solution should work. Christian. > > Andrey > > >> And we should probably commit patch #1 and #2. >> >> Christian. >> >> Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey: >>> Ping for patches 3, new patch 5 and patch 6. >>> >>> Andrey >>> >>> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote: >>>> Also reject TDRs if another one already running. >>>> >>>> v2: >>>> Stop all schedulers across device and entire XGMI hive before >>>> force signaling HW fences. >>>> Avoid passing job_signaled to helper fnctions to keep all the decision >>>> making about skipping HW reset in one place. >>>> >>>> v3: >>>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >>>> against it's decrement in drm_sched_stop in non HW reset case. >>>> v4: rebase >>>> v5: Revert v3 as we do it now in sceduler code. 
>>>> >>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >>>> 1 file changed, 95 insertions(+), 48 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index a0e165c..85f8792 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>>> if (!ring || !ring->sched.thread) >>>> continue; >>>> >>>> - drm_sched_stop(&ring->sched, &job->base); >>>> - >>>> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >>>> amdgpu_fence_driver_force_completion(ring); >>>> } >>>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >>>> if(job) >>>> drm_sched_increase_karma(&job->base); >>>> >>>> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >>>> if (!amdgpu_sriov_vf(adev)) { >>>> >>>> if (!need_full_reset) >>>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >>>> return r; >>>> } >>>> >>>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >>>> { >>>> - int i; >>>> - >>>> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> - struct amdgpu_ring *ring = adev->rings[i]; >>>> - >>>> - if (!ring || !ring->sched.thread) >>>> - continue; >>>> - >>>> - if (!adev->asic_reset_res) >>>> - drm_sched_resubmit_jobs(&ring->sched); >>>> + if (trylock) { >>>> + if (!mutex_trylock(&adev->lock_reset)) >>>> + return false; >>>> + } else >>>> + mutex_lock(&adev->lock_reset); >>>> >>>> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >>>> - } >>>> - >>>> - if (!amdgpu_device_has_dc_support(adev)) { >>>> - 
drm_helper_resume_force_mode(adev->ddev); >>>> - } >>>> - >>>> - adev->asic_reset_res = 0; >>>> -} >>>> - >>>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >>>> -{ >>>> - mutex_lock(&adev->lock_reset); >>>> atomic_inc(&adev->gpu_reset_counter); >>>> adev->in_gpu_reset = 1; >>>> /* Block kfd: SRIOV would do it separately */ >>>> if (!amdgpu_sriov_vf(adev)) >>>> amdgpu_amdkfd_pre_reset(adev); >>>> + >>>> + return true; >>>> } >>>> >>>> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>>> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>>> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> struct amdgpu_job *job) >>>> { >>>> - int r; >>>> + struct list_head device_list, *device_list_handle = NULL; >>>> + bool need_full_reset, job_signaled; >>>> struct amdgpu_hive_info *hive = NULL; >>>> - bool need_full_reset = false; >>>> struct amdgpu_device *tmp_adev = NULL; >>>> - struct list_head device_list, *device_list_handle = NULL; >>>> + int i, r = 0; >>>> >>>> + need_full_reset = job_signaled = false; >>>> INIT_LIST_HEAD(&device_list); >>>> >>>> dev_info(adev->dev, "GPU reset begin!\n"); >>>> >>>> + hive = amdgpu_get_xgmi_hive(adev, false); >>>> + >>>> /* >>>> - * In case of XGMI hive disallow concurrent resets to be triggered >>>> - * by different nodes. No point also since the one node already executing >>>> - * reset will also reset all the other nodes in the hive. >>>> + * Here we trylock to avoid chain of resets executing from >>>> + * either trigger by jobs on different adevs in XGMI hive or jobs on >>>> + * different schedulers for same device while this TO handler is running. >>>> + * We always reset all schedulers for device and all devices for XGMI >>>> + * hive so that should take care of them too. 
>>>> */ >>>> - hive = amdgpu_get_xgmi_hive(adev, 0); >>>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >>>> - !mutex_trylock(&hive->reset_lock)) >>>> + >>>> + if (hive && !mutex_trylock(&hive->reset_lock)) { >>>> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >>>> + job->base.id, hive->hive_id); >>>> return 0; >>>> + } >>>> >>>> /* Start with adev pre asic reset first for soft reset check.*/ >>>> - amdgpu_device_lock_adev(adev); >>>> - r = amdgpu_device_pre_asic_reset(adev, >>>> - job, >>>> - &need_full_reset); >>>> - if (r) { >>>> - /*TODO Should we stop ?*/ >>>> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>>> - r, adev->ddev->unique); >>>> - adev->asic_reset_res = r; >>>> + if (!amdgpu_device_lock_adev(adev, !hive)) { >>>> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >>>> + job->base.id); >>>> + return 0; >>>> } >>>> >>>> /* Build list of devices to reset */ >>>> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >>>> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >>>> if (!hive) { >>>> amdgpu_device_unlock_adev(adev); >>>> return -ENODEV; >>>> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> device_list_handle = &device_list; >>>> } >>>> >>>> + /* block all schedulers and reset given job's ring */ >>>> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>>> + >>>> + if (!ring || !ring->sched.thread) >>>> + continue; >>>> + >>>> + drm_sched_stop(&ring->sched, &job->base); >>>> + } >>>> + } >>>> + >>>> + >>>> + /* >>>> + * Must check guilty signal here since after this point all old >>>> + * HW fences are force signaled. 
>>>> + * >>>> + * job->base holds a reference to parent fence >>>> + */ >>>> + if (job && job->base.s_fence->parent && >>>> + dma_fence_is_signaled(job->base.s_fence->parent)) >>>> + job_signaled = true; >>>> + >>>> + if (!amdgpu_device_ip_need_full_reset(adev)) >>>> + device_list_handle = &device_list; >>>> + >>>> + if (job_signaled) { >>>> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >>>> + goto skip_hw_reset; >>>> + } >>>> + >>>> + >>>> + /* Guilty job will be freed after this*/ >>>> + r = amdgpu_device_pre_asic_reset(adev, >>>> + job, >>>> + &need_full_reset); >>>> + if (r) { >>>> + /*TODO Should we stop ?*/ >>>> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >>>> + r, adev->ddev->unique); >>>> + adev->asic_reset_res = r; >>>> + } >>>> + >>>> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >>>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>>> >>>> if (tmp_adev == adev) >>>> continue; >>>> >>>> - amdgpu_device_lock_adev(tmp_adev); >>>> + amdgpu_device_lock_adev(tmp_adev, false); >>>> r = amdgpu_device_pre_asic_reset(tmp_adev, >>>> NULL, >>>> &need_full_reset); >>>> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> goto retry; >>>> } >>>> >>>> +skip_hw_reset: >>>> + >>>> /* Post ASIC reset for all devs .*/ >>>> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >>>> - amdgpu_device_post_asic_reset(tmp_adev); >>>> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >>>> + >>>> + if (!ring || !ring->sched.thread) >>>> + continue; >>>> + >>>> + /* No point to resubmit jobs if we didn't HW reset*/ >>>> + if (!tmp_adev->asic_reset_res && !job_signaled) >>>> + drm_sched_resubmit_jobs(&ring->sched); >>>> + >>>> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >>>> + } >>>> + >>>> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >>>> + 
drm_helper_resume_force_mode(tmp_adev->ddev); >>>> + } >>>> + >>>> + tmp_adev->asic_reset_res = 0; >>>> >>>> if (r) { >>>> /* bad news, how to tell it to userspace ? */ >>>> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>>> amdgpu_device_unlock_adev(tmp_adev); >>>> } >>>> >>>> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >>>> + if (hive) >>>> mutex_unlock(&hive->reset_lock); >>>> >>>> if (r) >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
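[Editor's note] The locking change the patch makes to amdgpu_device_lock_adev (trylock for the single-device TDR path, a blocking lock for the other XGMI hive members) can be sketched in userspace. This is a model, not kernel code: the names mirror the patch, but the flag-based lock stands in for the real mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of amdgpu_device_lock_adev(): with trylock the
 * caller bails out instead of queueing behind a reset that is
 * already in progress; the blocking path waits for its turn. */
struct adev_model {
	atomic_flag lock_reset;
	int gpu_reset_counter;
};

bool lock_adev(struct adev_model *adev, bool trylock)
{
	if (trylock) {
		if (atomic_flag_test_and_set(&adev->lock_reset))
			return false;	/* another TDR already running: bail */
	} else {
		while (atomic_flag_test_and_set(&adev->lock_reset))
			;		/* XGMI member path: wait for the lock */
	}
	adev->gpu_reset_counter++;
	return true;
}

void unlock_adev(struct adev_model *adev)
{
	atomic_flag_clear(&adev->lock_reset);
}
```

The point of the `trylock` parameter is visible here: a second concurrent TDR on the same device returns false immediately (matching the "Bailing on TDR" DRM_INFO in the patch) rather than stacking a redundant reset behind the running one.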
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <1555599624-12285-6-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-22 11:54 ` Grodzovsky, Andrey @ 2019-04-22 13:09 ` Chunming Zhou 2019-04-23 14:51 ` Grodzovsky, Andrey 1 sibling, 1 reply; 31+ messages in thread From: Chunming Zhou @ 2019-04-22 13:09 UTC (permalink / raw) To: Grodzovsky, Andrey, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Kazlauskas, Nicholas, Liu, Monk +Monk. GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. But out of curious, why guilty job can signal more if the job is already set to guilty? set it wrongly? -David 在 2019/4/18 23:00, Andrey Grodzovsky 写道: > Also reject TDRs if another one already running. > > v2: > Stop all schedulers across device and entire XGMI hive before > force signaling HW fences. > Avoid passing job_signaled to helper fnctions to keep all the decision > making about skipping HW reset in one place. > > v3: > Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced > against it's decrement in drm_sched_stop in non HW reset case. > v4: rebase > v5: Revert v3 as we do it now in sceduler code. 
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- > 1 file changed, 95 insertions(+), 48 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index a0e165c..85f8792 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if (!ring || !ring->sched.thread) > continue; > > - drm_sched_stop(&ring->sched, &job->base); > - > /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ > amdgpu_fence_driver_force_completion(ring); > } > @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if(job) > drm_sched_increase_karma(&job->base); > > + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ > if (!amdgpu_sriov_vf(adev)) { > > if (!need_full_reset) > @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, > return r; > } > > -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) > +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) > { > - int i; > - > - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > - struct amdgpu_ring *ring = adev->rings[i]; > - > - if (!ring || !ring->sched.thread) > - continue; > - > - if (!adev->asic_reset_res) > - drm_sched_resubmit_jobs(&ring->sched); > + if (trylock) { > + if (!mutex_trylock(&adev->lock_reset)) > + return false; > + } else > + mutex_lock(&adev->lock_reset); > > - drm_sched_start(&ring->sched, !adev->asic_reset_res); > - } > - > - if (!amdgpu_device_has_dc_support(adev)) { > - drm_helper_resume_force_mode(adev->ddev); > - } > - > - adev->asic_reset_res = 0; > -} > - > -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) > -{ > - mutex_lock(&adev->lock_reset); 
> atomic_inc(&adev->gpu_reset_counter); > adev->in_gpu_reset = 1; > /* Block kfd: SRIOV would do it separately */ > if (!amdgpu_sriov_vf(adev)) > amdgpu_amdkfd_pre_reset(adev); > + > + return true; > } > > static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) > @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) > int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > struct amdgpu_job *job) > { > - int r; > + struct list_head device_list, *device_list_handle = NULL; > + bool need_full_reset, job_signaled; > struct amdgpu_hive_info *hive = NULL; > - bool need_full_reset = false; > struct amdgpu_device *tmp_adev = NULL; > - struct list_head device_list, *device_list_handle = NULL; > + int i, r = 0; > > + need_full_reset = job_signaled = false; > INIT_LIST_HEAD(&device_list); > > dev_info(adev->dev, "GPU reset begin!\n"); > > + hive = amdgpu_get_xgmi_hive(adev, false); > + > /* > - * In case of XGMI hive disallow concurrent resets to be triggered > - * by different nodes. No point also since the one node already executing > - * reset will also reset all the other nodes in the hive. > + * Here we trylock to avoid chain of resets executing from > + * either trigger by jobs on different adevs in XGMI hive or jobs on > + * different schedulers for same device while this TO handler is running. > + * We always reset all schedulers for device and all devices for XGMI > + * hive so that should take care of them too. 
> */ > - hive = amdgpu_get_xgmi_hive(adev, 0); > - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && > - !mutex_trylock(&hive->reset_lock)) > + > + if (hive && !mutex_trylock(&hive->reset_lock)) { > + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", > + job->base.id, hive->hive_id); > return 0; > + } > > /* Start with adev pre asic reset first for soft reset check.*/ > - amdgpu_device_lock_adev(adev); > - r = amdgpu_device_pre_asic_reset(adev, > - job, > - &need_full_reset); > - if (r) { > - /*TODO Should we stop ?*/ > - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", > - r, adev->ddev->unique); > - adev->asic_reset_res = r; > + if (!amdgpu_device_lock_adev(adev, !hive)) { > + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", > + job->base.id); > + return 0; > } > > /* Build list of devices to reset */ > - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { > + if (adev->gmc.xgmi.num_physical_nodes > 1) { > if (!hive) { > amdgpu_device_unlock_adev(adev); > return -ENODEV; > @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > device_list_handle = &device_list; > } > > + /* block all schedulers and reset given job's ring */ > + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > + struct amdgpu_ring *ring = tmp_adev->rings[i]; > + > + if (!ring || !ring->sched.thread) > + continue; > + > + drm_sched_stop(&ring->sched, &job->base); > + } > + } > + > + > + /* > + * Must check guilty signal here since after this point all old > + * HW fences are force signaled. 
> + * > + * job->base holds a reference to parent fence > + */ > + if (job && job->base.s_fence->parent && > + dma_fence_is_signaled(job->base.s_fence->parent)) > + job_signaled = true; > + > + if (!amdgpu_device_ip_need_full_reset(adev)) > + device_list_handle = &device_list; > + > + if (job_signaled) { > + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); > + goto skip_hw_reset; > + } > + > + > + /* Guilty job will be freed after this*/ > + r = amdgpu_device_pre_asic_reset(adev, > + job, > + &need_full_reset); > + if (r) { > + /*TODO Should we stop ?*/ > + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", > + r, adev->ddev->unique); > + adev->asic_reset_res = r; > + } > + > retry: /* Rest of adevs pre asic reset from XGMI hive. */ > list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > > if (tmp_adev == adev) > continue; > > - amdgpu_device_lock_adev(tmp_adev); > + amdgpu_device_lock_adev(tmp_adev, false); > r = amdgpu_device_pre_asic_reset(tmp_adev, > NULL, > &need_full_reset); > @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > goto retry; > } > > +skip_hw_reset: > + > /* Post ASIC reset for all devs .*/ > list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > - amdgpu_device_post_asic_reset(tmp_adev); > + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > + struct amdgpu_ring *ring = tmp_adev->rings[i]; > + > + if (!ring || !ring->sched.thread) > + continue; > + > + /* No point to resubmit jobs if we didn't HW reset*/ > + if (!tmp_adev->asic_reset_res && !job_signaled) > + drm_sched_resubmit_jobs(&ring->sched); > + > + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); > + } > + > + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { > + drm_helper_resume_force_mode(tmp_adev->ddev); > + } > + > + tmp_adev->asic_reset_res = 0; > > if (r) { > /* bad news, how to tell it to userspace ? 
*/ > @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > amdgpu_device_unlock_adev(tmp_adev); > } > > - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) > + if (hive) > mutex_unlock(&hive->reset_lock); > > if (r) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
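[Editor's note] David's question ("why can a guilty job signal more if it is already set to guilty?") comes down to ordering, which Andrey answers below. A toy fence model of the hazard, with hypothetical names: the check for the guilty job's HW fence must run before amdgpu_fence_driver_force_completion(), because force-completion signals every HW fence and would make a genuinely hung job look finished.

```c
#include <stdbool.h>

/* Toy fence: once force-completion runs, signaled-ness no longer
 * distinguishes a job that really finished from one that hung. */
struct fence_model {
	bool signaled;
};

bool job_really_finished(const struct fence_model *f)
{
	return f->signaled;
}

/* Models amdgpu_fence_driver_force_completion(): after a reset all
 * old HW fences are force-signaled regardless of actual completion. */
void force_completion(struct fence_model *f)
{
	f->signaled = true;
}
```

Checking after force-completion would always report "finished"; recording the answer before it preserves the real state, which is why the patch moves the guilty-fence check ahead of the force-signal step.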
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. 2019-04-22 13:09 ` Chunming Zhou @ 2019-04-23 14:51 ` Grodzovsky, Andrey [not found] ` <1b41c4f1-b406-8710-2a7a-e5c54a116fe9-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Grodzovsky, Andrey @ 2019-04-23 14:51 UTC (permalink / raw) To: Zhou, David(ChunMing), dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken Cc: Kazlauskas, Nicholas, Liu, Monk On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. > > But out of curious, why guilty job can signal more if the job is already > set to guilty? set it wrongly? > > > -David It's possible that the job does completes at a later time then it's timeout handler started processing so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity so at the point we check the guilty flag is not set yet. Andrey > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper fnctions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against it's decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in sceduler code. 
>> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >> { >> - int i; >> - >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> - struct amdgpu_ring *ring = adev->rings[i]; >> - >> - if (!ring || !ring->sched.thread) >> - continue; >> - >> - if (!adev->asic_reset_res) >> - drm_sched_resubmit_jobs(&ring->sched); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> - } >> - >> - if (!amdgpu_device_has_dc_support(adev)) { >> - drm_helper_resume_force_mode(adev->ddev); >> - } >> - >> - adev->asic_reset_res = 0; >> -} >> - >> -static void amdgpu_device_lock_adev(struct 
amdgpu_device *adev) >> -{ >> - mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + struct list_head device_list, *device_list_handle = NULL; >> + bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> - struct list_head device_list, *device_list_handle = NULL; >> + int i, r = 0; >> >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow concurrent resets to be triggered >> - * by different nodes. No point also since the one node already executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs in XGMI hive or jobs on >> + * different schedulers for same device while this TO handler is running. >> + * We always reset all schedulers for device and all devices for XGMI >> + * hive so that should take care of them too. 
>> */ >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> - !mutex_trylock(&hive->reset_lock)) >> + >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >> + job->base.id, hive->hive_id); >> return 0; >> + } >> >> /* Start with adev pre asic reset first for soft reset check.*/ >> - amdgpu_device_lock_adev(adev); >> - r = amdgpu_device_pre_asic_reset(adev, >> - job, >> - &need_full_reset); >> - if (r) { >> - /*TODO Should we stop ?*/ >> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> - r, adev->ddev->unique); >> - adev->asic_reset_res = r; >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >> + job->base.id); >> + return 0; >> } >> >> /* Build list of devices to reset */ >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> if (!hive) { >> amdgpu_device_unlock_adev(adev); >> return -ENODEV; >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> device_list_handle = &device_list; >> } >> >> + /* block all schedulers and reset given job's ring */ >> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + drm_sched_stop(&ring->sched, &job->base); >> + } >> + } >> + >> + >> + /* >> + * Must check guilty signal here since after this point all old >> + * HW fences are force signaled. 
>> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job && job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + job_signaled = true; >> + >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> + device_list_handle = &device_list; >> + >> + if (job_signaled) { >> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >> + goto skip_hw_reset; >> + } >> + >> + >> + /* Guilty job will be freed after this*/ >> + r = amdgpu_device_pre_asic_reset(adev, >> + job, >> + &need_full_reset); >> + if (r) { >> + /*TODO Should we stop ?*/ >> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> + r, adev->ddev->unique); >> + adev->asic_reset_res = r; >> + } >> + >> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> >> if (tmp_adev == adev) >> continue; >> >> - amdgpu_device_lock_adev(tmp_adev); >> + amdgpu_device_lock_adev(tmp_adev, false); >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> NULL, >> &need_full_reset); >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> goto retry; >> } >> >> +skip_hw_reset: >> + >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev); >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!tmp_adev->asic_reset_res && !job_signaled) >> + drm_sched_resubmit_jobs(&ring->sched); >> + >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >> + } >> + >> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> + } >> + >> + tmp_adev->asic_reset_res = 0; >> >> if (r) { >> /* bad news, how to tell it to userspace ? 
*/ >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> amdgpu_device_unlock_adev(tmp_adev); >> } >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> + if (hive) >> mutex_unlock(&hive->reset_lock); >> >> if (r) _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
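[Editor's note] The decision logic Andrey describes, recheck the guilty job's parent fence only after all SW schedulers are stopped, then skip both the HW reset and job resubmission if it signaled, can be condensed into two predicates. A sketch under the patch's semantics; the struct is a stand-in for the real job/fence objects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the guilty job: does it have a parent HW fence,
 * and did that fence signal after the schedulers were stopped? */
struct job_model {
	bool has_parent_fence;
	bool parent_signaled;
};

/* Mirrors the v5 check: job && job->base.s_fence->parent &&
 * dma_fence_is_signaled(parent) -> skip the HW reset entirely. */
bool should_skip_hw_reset(const struct job_model *job)
{
	return job && job->has_parent_fence && job->parent_signaled;
}

/* Jobs are only resubmitted when a HW reset actually happened
 * and succeeded; a signaled guilty job means no reset took place. */
bool should_resubmit(bool asic_reset_failed, bool job_signaled)
{
	return !asic_reset_failed && !job_signaled;
}
```

This is why the patch gates drm_sched_resubmit_jobs() on `!job_signaled`: with no HW reset, the old jobs are still in flight on the rings, so resubmitting them would double-execute work.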
[parent not found: <1b41c4f1-b406-8710-2a7a-e5c54a116fe9-5C7GfCeVMHo@public.gmane.org>]

* Re:[PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <1b41c4f1-b406-8710-2a7a-e5c54a116fe9-5C7GfCeVMHo@public.gmane.org> @ 2019-04-23 15:19 ` Zhou, David(ChunMing) [not found] ` <-hyv5g0n8ru25qelb0v-8u6jdi1vp2c7z1m3f5-uygwc1o5ji6s-9zli9v-srreuk-3pvse1en6kx0-6se95l-6jsafd-a6sboi-j814xf-ijgwfc-qewgmm-vnafjgrn2fq0-jgir949hx4yo-i772hz-tn7ial.1556032736536-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Zhou, David(ChunMing) @ 2019-04-23 15:19 UTC (permalink / raw) To: Grodzovsky, Andrey, Zhou, David(ChunMing), dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Kazlauskas, Nicholas, Liu, Monk [-- Attachment #1.1: Type: text/plain, Size: 11499 bytes --] do you mean fence timer? why not stop it as well when stopping sched for the reason of hw reset? -------- Original Message -------- Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,amd-gfx@lists.freedesktop.org,eric-WhKQ6XTQaPysTnJN9+BGXg@public.gmane.org,etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. > > But out of curious, why guilty job can signal more if the job is already > set to guilty? set it wrongly? > > > -David It's possible that the job does completes at a later time then it's timeout handler started processing so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. 
We do it BEFORE marking guilty on the job's sched_entity so at the point we check the guilty flag is not set yet. Andrey > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper fnctions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against it's decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in sceduler code. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct 
amdgpu_device *adev, bool trylock) >> { >> - int i; >> - >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> - struct amdgpu_ring *ring = adev->rings[i]; >> - >> - if (!ring || !ring->sched.thread) >> - continue; >> - >> - if (!adev->asic_reset_res) >> - drm_sched_resubmit_jobs(&ring->sched); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> - } >> - >> - if (!amdgpu_device_has_dc_support(adev)) { >> - drm_helper_resume_force_mode(adev->ddev); >> - } >> - >> - adev->asic_reset_res = 0; >> -} >> - >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> - mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + struct list_head device_list, *device_list_handle = NULL; >> + bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> - struct list_head device_list, *device_list_handle = NULL; >> + int i, r = 0; >> >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow concurrent resets to be triggered >> - * by different nodes. No point also since the one node already executing >> - * reset will also reset all the other nodes in the hive. 
>> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs in XGMI hive or jobs on >> + * different schedulers for same device while this TO handler is running. >> + * We always reset all schedulers for device and all devices for XGMI >> + * hive so that should take care of them too. >> */ >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> - !mutex_trylock(&hive->reset_lock)) >> + >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >> + job->base.id, hive->hive_id); >> return 0; >> + } >> >> /* Start with adev pre asic reset first for soft reset check.*/ >> - amdgpu_device_lock_adev(adev); >> - r = amdgpu_device_pre_asic_reset(adev, >> - job, >> - &need_full_reset); >> - if (r) { >> - /*TODO Should we stop ?*/ >> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> - r, adev->ddev->unique); >> - adev->asic_reset_res = r; >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >> + job->base.id); >> + return 0; >> } >> >> /* Build list of devices to reset */ >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> if (!hive) { >> amdgpu_device_unlock_adev(adev); >> return -ENODEV; >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> device_list_handle = &device_list; >> } >> >> + /* block all schedulers and reset given job's ring */ >> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + drm_sched_stop(&ring->sched, &job->base); >> + } >> + } >> + >> + >> + /* >> + * Must check guilty signal here since 
after this point all old >> + * HW fences are force signaled. >> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job && job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + job_signaled = true; >> + >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> + device_list_handle = &device_list; >> + >> + if (job_signaled) { >> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >> + goto skip_hw_reset; >> + } >> + >> + >> + /* Guilty job will be freed after this*/ >> + r = amdgpu_device_pre_asic_reset(adev, >> + job, >> + &need_full_reset); >> + if (r) { >> + /*TODO Should we stop ?*/ >> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> + r, adev->ddev->unique); >> + adev->asic_reset_res = r; >> + } >> + >> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> >> if (tmp_adev == adev) >> continue; >> >> - amdgpu_device_lock_adev(tmp_adev); >> + amdgpu_device_lock_adev(tmp_adev, false); >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> NULL, >> &need_full_reset); >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> goto retry; >> } >> >> +skip_hw_reset: >> + >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev); >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!tmp_adev->asic_reset_res && !job_signaled) >> + drm_sched_resubmit_jobs(&ring->sched); >> + >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >> + } >> + >> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> + } >> + >> + tmp_adev->asic_reset_res = 
0; >> >> if (r) { >> /* bad news, how to tell it to userspace ? */ >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> amdgpu_device_unlock_adev(tmp_adev); >> } >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> + if (hive) >> mutex_unlock(&hive->reset_lock); >> >> if (r) _______________________________________________ amd-gfx mailing list amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 24573 bytes --] [-- Attachment #2: Type: text/plain, Size: 153 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
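Andrey's explanation in the message above — the job may actually complete after its timeout handler has started, which is why the HW fence is rechecked only once all software schedulers are stopped — can be sketched as a small self-contained model. This is a sketch only; the struct and function names are illustrative stand-ins, not the driver's real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the scheduler fence objects involved. */
struct model_fence {
    bool signaled;                /* set when the job finishes on the HW */
};

struct model_job {
    struct model_fence *parent;   /* HW fence of the submitted job */
};

/*
 * Recheck the guilty job after all SW schedulers have been stopped.
 * Returns true when the job already signaled, in which case the HW
 * reset can be skipped.  The check is only meaningful at this point,
 * because later steps force-signal all old HW fences.
 */
bool model_job_signaled(const struct model_job *job)
{
    return job != NULL && job->parent != NULL && job->parent->signaled;
}
```

In the patch itself this corresponds to the `dma_fence_is_signaled(job->base.s_fence->parent)` check performed after `drm_sched_stop()` and before `amdgpu_device_pre_asic_reset()`.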
[parent not found: <-hyv5g0n8ru25qelb0v-8u6jdi1vp2c7z1m3f5-uygwc1o5ji6s-9zli9v-srreuk-3pvse1en6kx0-6se95l-6jsafd-a6sboi-j814xf-ijgwfc-qewgmm-vnafjgrn2fq0-jgir949hx4yo-i772hz-tn7ial.1556032736536-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <-hyv5g0n8ru25qelb0v-8u6jdi1vp2c7z1m3f5-uygwc1o5ji6s-9zli9v-srreuk-3pvse1en6kx0-6se95l-6jsafd-a6sboi-j814xf-ijgwfc-qewgmm-vnafjgrn2fq0-jgir949hx4yo-i772hz-tn7ial.1556032736536-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org> @ 2019-04-23 15:59 ` Grodzovsky, Andrey 2019-04-24 3:02 ` Zhou, David(ChunMing) 0 siblings, 1 reply; 31+ messages in thread From: Grodzovsky, Andrey @ 2019-04-23 15:59 UTC (permalink / raw) To: Zhou, David(ChunMing), dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w Cc: Kazlauskas, Nicholas, Liu, Monk [-- Attachment #1.1: Type: text/plain, Size: 11699 bytes --] No, i mean the actual HW fence which signals when the job finished execution on the HW. Andrey On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: do you mean fence timer? why not stop it as well when stopping sched for the reason of hw reset? -------- Original Message -------- Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. > > But out of curious, why guilty job can signal more if the job is already > set to guilty? set it wrongly? 
> > > -David It's possible that the job does completes at a later time then it's timeout handler started processing so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity so at the point we check the guilty flag is not set yet. Andrey > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper fnctions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against it's decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in sceduler code. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com><mailto:andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) 
>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >> { >> - int i; >> - >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> - struct amdgpu_ring *ring = adev->rings[i]; >> - >> - if (!ring || !ring->sched.thread) >> - continue; >> - >> - if (!adev->asic_reset_res) >> - drm_sched_resubmit_jobs(&ring->sched); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> - } >> - >> - if (!amdgpu_device_has_dc_support(adev)) { >> - drm_helper_resume_force_mode(adev->ddev); >> - } >> - >> - adev->asic_reset_res = 0; >> -} >> - >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> - mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + struct list_head device_list, *device_list_handle = NULL; >> + bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> - struct list_head device_list, *device_list_handle = NULL; >> + int i, r = 0; >> >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow 
concurrent resets to be triggered >> - * by different nodes. No point also since the one node already executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs in XGMI hive or jobs on >> + * different schedulers for same device while this TO handler is running. >> + * We always reset all schedulers for device and all devices for XGMI >> + * hive so that should take care of them too. >> */ >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> - !mutex_trylock(&hive->reset_lock)) >> + >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >> + job->base.id, hive->hive_id); >> return 0; >> + } >> >> /* Start with adev pre asic reset first for soft reset check.*/ >> - amdgpu_device_lock_adev(adev); >> - r = amdgpu_device_pre_asic_reset(adev, >> - job, >> - &need_full_reset); >> - if (r) { >> - /*TODO Should we stop ?*/ >> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> - r, adev->ddev->unique); >> - adev->asic_reset_res = r; >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >> + job->base.id); >> + return 0; >> } >> >> /* Build list of devices to reset */ >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> if (!hive) { >> amdgpu_device_unlock_adev(adev); >> return -ENODEV; >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> device_list_handle = &device_list; >> } >> >> + /* block all schedulers and reset given job's ring */ >> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + 
if (!ring || !ring->sched.thread) >> + continue; >> + >> + drm_sched_stop(&ring->sched, &job->base); >> + } >> + } >> + >> + >> + /* >> + * Must check guilty signal here since after this point all old >> + * HW fences are force signaled. >> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job && job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + job_signaled = true; >> + >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> + device_list_handle = &device_list; >> + >> + if (job_signaled) { >> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >> + goto skip_hw_reset; >> + } >> + >> + >> + /* Guilty job will be freed after this*/ >> + r = amdgpu_device_pre_asic_reset(adev, >> + job, >> + &need_full_reset); >> + if (r) { >> + /*TODO Should we stop ?*/ >> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> + r, adev->ddev->unique); >> + adev->asic_reset_res = r; >> + } >> + >> retry: /* Rest of adevs pre asic reset from XGMI hive. 
*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> >> if (tmp_adev == adev) >> continue; >> >> - amdgpu_device_lock_adev(tmp_adev); >> + amdgpu_device_lock_adev(tmp_adev, false); >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> NULL, >> &need_full_reset); >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> goto retry; >> } >> >> +skip_hw_reset: >> + >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev); >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!tmp_adev->asic_reset_res && !job_signaled) >> + drm_sched_resubmit_jobs(&ring->sched); >> + >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >> + } >> + >> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> + } >> + >> + tmp_adev->asic_reset_res = 0; >> >> if (r) { >> /* bad news, how to tell it to userspace ? */ >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> amdgpu_device_unlock_adev(tmp_adev); >> } >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> + if (hive) >> mutex_unlock(&hive->reset_lock); >> >> if (r) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 25232 bytes --] [-- Attachment #2: Type: text/plain, Size: 153 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
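The reworked `amdgpu_device_lock_adev(adev, trylock)` in the quoted patch uses a trylock for the device whose TDR handler is running (bailing out if another reset is already in progress) and an unconditional lock for the other devices in the hive. The policy can be modeled in userspace with a plain pthread mutex; the names below are illustrative, not driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative userspace model of adev->lock_reset. */
static pthread_mutex_t model_lock_reset = PTHREAD_MUTEX_INITIALIZER;

/*
 * trylock == true:  bail out (return false) if another reset already
 *                   holds the lock, mirroring the "another TDR already
 *                   in progress" path in the patch.
 * trylock == false: block until the lock is available.
 */
bool model_lock_adev(bool trylock)
{
    if (trylock) {
        if (pthread_mutex_trylock(&model_lock_reset) != 0)
            return false;
    } else {
        pthread_mutex_lock(&model_lock_reset);
    }
    return true;
}

void model_unlock_adev(void)
{
    pthread_mutex_unlock(&model_lock_reset);
}
```

Used like the patch does: `model_lock_adev(true)` for the triggering device (returning early on failure), `model_lock_adev(false)` for the remaining hive members.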
* RE: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. 2019-04-23 15:59 ` [PATCH " Grodzovsky, Andrey @ 2019-04-24 3:02 ` Zhou, David(ChunMing) 2019-04-24 7:09 ` Christian König 0 siblings, 1 reply; 31+ messages in thread From: Zhou, David(ChunMing) @ 2019-04-24 3:02 UTC (permalink / raw) To: Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv, ckoenig.leichtzumerken Cc: Kazlauskas, Nicholas, Liu, Monk [-- Attachment #1.1: Type: text/plain, Size: 12762 bytes --] >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } HW fences are already force-completed here, so I think we can just disable IRQ fence processing and ignore HW fence signals while we are trying to do the GPU reset. Otherwise the logic becomes much more complex. If this situation happens because of long execution times, we can increase the timeout of the reset detection. -David From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey Sent: Wednesday, April 24, 2019 12:00 AM To: Zhou, David(ChunMing) <David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; eric@anholt.net; etnaviv@lists.freedesktop.org; ckoenig.leichtzumerken@gmail.com Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Liu, Monk <Monk.Liu@amd.com> Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. No, I mean the actual HW fence which signals when the job finishes execution on the HW. Andrey On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: Do you mean the fence timer? Why not stop it as well when stopping the sched, for the reason of the HW reset? -------- Original Message -------- Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. > > But out of curious, why guilty job can signal more if the job is already > set to guilty? set it wrongly? > > > -David It's possible that the job does completes at a later time then it's timeout handler started processing so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity so at the point we check the guilty flag is not set yet. Andrey > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper fnctions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against it's decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in sceduler code. 
>> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com><mailto:andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >> { >> - int i; >> - >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> - struct amdgpu_ring *ring = adev->rings[i]; >> - >> - if (!ring || !ring->sched.thread) >> - continue; >> - >> - if (!adev->asic_reset_res) >> - drm_sched_resubmit_jobs(&ring->sched); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> - } >> - >> - if (!amdgpu_device_has_dc_support(adev)) { >> - drm_helper_resume_force_mode(adev->ddev); >> - } >> - >> - adev->asic_reset_res = 0; >> -} >> - >> -static 
void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> - mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + struct list_head device_list, *device_list_handle = NULL; >> + bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> - struct list_head device_list, *device_list_handle = NULL; >> + int i, r = 0; >> >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow concurrent resets to be triggered >> - * by different nodes. No point also since the one node already executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs in XGMI hive or jobs on >> + * different schedulers for same device while this TO handler is running. >> + * We always reset all schedulers for device and all devices for XGMI >> + * hive so that should take care of them too. 
>> */ >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> - !mutex_trylock(&hive->reset_lock)) >> + >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >> + job->base.id, hive->hive_id); >> return 0; >> + } >> >> /* Start with adev pre asic reset first for soft reset check.*/ >> - amdgpu_device_lock_adev(adev); >> - r = amdgpu_device_pre_asic_reset(adev, >> - job, >> - &need_full_reset); >> - if (r) { >> - /*TODO Should we stop ?*/ >> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> - r, adev->ddev->unique); >> - adev->asic_reset_res = r; >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >> + job->base.id); >> + return 0; >> } >> >> /* Build list of devices to reset */ >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> if (!hive) { >> amdgpu_device_unlock_adev(adev); >> return -ENODEV; >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> device_list_handle = &device_list; >> } >> >> + /* block all schedulers and reset given job's ring */ >> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + drm_sched_stop(&ring->sched, &job->base); >> + } >> + } >> + >> + >> + /* >> + * Must check guilty signal here since after this point all old >> + * HW fences are force signaled. 
>> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job && job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + job_signaled = true; >> + >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> + device_list_handle = &device_list; >> + >> + if (job_signaled) { >> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >> + goto skip_hw_reset; >> + } >> + >> + >> + /* Guilty job will be freed after this*/ >> + r = amdgpu_device_pre_asic_reset(adev, >> + job, >> + &need_full_reset); >> + if (r) { >> + /*TODO Should we stop ?*/ >> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> + r, adev->ddev->unique); >> + adev->asic_reset_res = r; >> + } >> + >> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> >> if (tmp_adev == adev) >> continue; >> >> - amdgpu_device_lock_adev(tmp_adev); >> + amdgpu_device_lock_adev(tmp_adev, false); >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> NULL, >> &need_full_reset); >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> goto retry; >> } >> >> +skip_hw_reset: >> + >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev); >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!tmp_adev->asic_reset_res && !job_signaled) >> + drm_sched_resubmit_jobs(&ring->sched); >> + >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >> + } >> + >> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> + } >> + >> + tmp_adev->asic_reset_res = 0; >> >> if (r) { >> /* bad news, how to tell it to userspace ? 
*/ >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> amdgpu_device_unlock_adev(tmp_adev); >> } >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> + if (hive) >> mutex_unlock(&hive->reset_lock); >> >> if (r) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 29432 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
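In the restart loop after `skip_hw_reset:` in the quoted patch, jobs are resubmitted only when a HW reset actually took place: not when the guilty job had already signaled, and not when the ASIC reset failed. That decision reduces to a small predicate (illustrative, not driver code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Resubmit previously pending jobs on scheduler restart only if the
 * ASIC reset succeeded (asic_reset_res == 0) and the HW reset was not
 * skipped because the guilty job had already signaled.
 */
bool model_should_resubmit(int asic_reset_res, bool job_signaled)
{
    return asic_reset_res == 0 && !job_signaled;
}
```

This mirrors the `if (!tmp_adev->asic_reset_res && !job_signaled) drm_sched_resubmit_jobs(...)` condition before `drm_sched_start()` in the patch.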
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. 2019-04-24 3:02 ` Zhou, David(ChunMing) @ 2019-04-24 7:09 ` Christian König [not found] ` <e20d013e-df21-1300-27d1-7f9b829cc067-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 31+ messages in thread From: Christian König @ 2019-04-24 7:09 UTC (permalink / raw) To: Zhou, David(ChunMing), Grodzovsky, Andrey, dri-devel, amd-gfx, eric, etnaviv Cc: Kazlauskas, Nicholas, Liu, Monk [-- Attachment #1.1: Type: text/plain, Size: 15590 bytes --] On 24.04.19 at 05:02, Zhou, David(ChunMing) wrote: > > >> - drm_sched_stop(&ring->sched, &job->base); > >> - > >> /* after all hw jobs are reset, hw fence is > meaningless, so force_completion */ > >> amdgpu_fence_driver_force_completion(ring); > >> } > > HW fences are already force-completed here, so I think we can just disable IRQ > fence processing and ignore HW fence signals while we are trying to do the GPU > reset. Otherwise the logic becomes much more complex. > > If this situation happens because of long execution times, we can > increase the timeout of the reset detection. > You are not thinking broadly enough: forcing the HW fence to complete can trigger other parts of the system to start new activity. We first need to stop everything and make sure that we don't do any processing any more, and only then start with our reset procedure, including forcing all HW fences to complete. Christian. > -David > > *From:*amd-gfx <amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of > *Grodzovsky, Andrey > *Sent:* Wednesday, April 24, 2019 12:00 AM > *To:* Zhou, David(ChunMing) <David1.Zhou@amd.com>; > dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; > eric@anholt.net; etnaviv@lists.freedesktop.org; > ckoenig.leichtzumerken@gmail.com > *Cc:* Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Liu, Monk > <Monk.Liu@amd.com> > *Subject:* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job > already signaled.
> > No, i mean the actual HW fence which signals when the job finished > execution on the HW. > > Andrey > > On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: > > do you mean fence timer? why not stop it as well when stopping > sched for the reason of hw reset? > > -------- Original Message -------- > Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty > job already signaled. > From: "Grodzovsky, Andrey" > To: "Zhou, David(ChunMing)" > ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com > <mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com> > CC: "Kazlauskas, Nicholas" ,"Liu, Monk" > > > On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > > +Monk. > > > > GPU reset is used widely in SRIOV, so need virtulizatino guy > take a look. > > > > But out of curious, why guilty job can signal more if the job is > already > > set to guilty? set it wrongly? > > > > > > -David > > > It's possible that the job does completes at a later time then it's > timeout handler started processing so in this patch we try to protect > against this by rechecking the HW fence after stopping all SW > schedulers. We do it BEFORE marking guilty on the job's > sched_entity so > at the point we check the guilty flag is not set yet. > > Andrey > > > > > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: > >> Also reject TDRs if another one already running. > >> > >> v2: > >> Stop all schedulers across device and entire XGMI hive before > >> force signaling HW fences. > >> Avoid passing job_signaled to helper fnctions to keep all the > decision > >> making about skipping HW reset in one place. > >> > >> v3: > >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to > be balanced > >> against it's decrement in drm_sched_stop in non HW reset case. 
> >> v4: rebase > >> v5: Revert v3 as we do it now in sceduler code. > >> > >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > <mailto:andrey.grodzovsky@amd.com> > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 > +++++++++++++++++++---------- > >> 1 file changed, 95 insertions(+), 48 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> index a0e165c..85f8792 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >> @@ -3334,8 +3334,6 @@ static int > amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > >> if (!ring || !ring->sched.thread) > >> continue; > >> > >> - drm_sched_stop(&ring->sched, &job->base); > >> - > >> /* after all hw jobs are reset, hw fence is > meaningless, so force_completion */ > >> amdgpu_fence_driver_force_completion(ring); > >> } > >> @@ -3343,6 +3341,7 @@ static int > amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > >> if(job) > >> drm_sched_increase_karma(&job->base); > >> > >> + /* Don't suspend on bare metal if we are not going to HW > reset the ASIC */ > >> if (!amdgpu_sriov_vf(adev)) { > >> > >> if (!need_full_reset) > >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct > amdgpu_hive_info *hive, > >> return r; > >> } > >> > >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device > *adev) > >> +static bool amdgpu_device_lock_adev(struct amdgpu_device > *adev, bool trylock) > >> { > >> - int i; > >> - > >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > >> - struct amdgpu_ring *ring = adev->rings[i]; > >> - > >> - if (!ring || !ring->sched.thread) > >> - continue; > >> - > >> - if (!adev->asic_reset_res) > >> - drm_sched_resubmit_jobs(&ring->sched); > >> + if (trylock) { > >> + if (!mutex_trylock(&adev->lock_reset)) > >> + return false; > >> + } else > >> + mutex_lock(&adev->lock_reset); > >> > >> - drm_sched_start(&ring->sched, 
!adev->asic_reset_res); > >> - } > >> - > >> - if (!amdgpu_device_has_dc_support(adev)) { > >> - drm_helper_resume_force_mode(adev->ddev); > >> - } > >> - > >> - adev->asic_reset_res = 0; > >> -} > >> - > >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) > >> -{ > >> - mutex_lock(&adev->lock_reset); > >> atomic_inc(&adev->gpu_reset_counter); > >> adev->in_gpu_reset = 1; > >> /* Block kfd: SRIOV would do it separately */ > >> if (!amdgpu_sriov_vf(adev)) > >> amdgpu_amdkfd_pre_reset(adev); > >> + > >> + return true; > >> } > >> > >> static void amdgpu_device_unlock_adev(struct amdgpu_device > *adev) > >> @@ -3538,40 +3521,42 @@ static void > amdgpu_device_unlock_adev(struct amdgpu_device *adev) > >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > >> struct amdgpu_job *job) > >> { > >> - int r; > >> + struct list_head device_list, *device_list_handle = NULL; > >> + bool need_full_reset, job_signaled; > >> struct amdgpu_hive_info *hive = NULL; > >> - bool need_full_reset = false; > >> struct amdgpu_device *tmp_adev = NULL; > >> - struct list_head device_list, *device_list_handle = NULL; > >> + int i, r = 0; > >> > >> + need_full_reset = job_signaled = false; > >> INIT_LIST_HEAD(&device_list); > >> > >> dev_info(adev->dev, "GPU reset begin!\n"); > >> > >> + hive = amdgpu_get_xgmi_hive(adev, false); > >> + > >> /* > >> - * In case of XGMI hive disallow concurrent resets to be > triggered > >> - * by different nodes. No point also since the one node > already executing > >> - * reset will also reset all the other nodes in the hive. > >> + * Here we trylock to avoid chain of resets executing from > >> + * either trigger by jobs on different adevs in XGMI hive > or jobs on > >> + * different schedulers for same device while this TO > handler is running. > >> + * We always reset all schedulers for device and all > devices for XGMI > >> + * hive so that should take care of them too. 
> >> */ > >> - hive = amdgpu_get_xgmi_hive(adev, 0); > >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && > >> - !mutex_trylock(&hive->reset_lock)) > >> + > >> + if (hive && !mutex_trylock(&hive->reset_lock)) { > >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: > %llx as another already in progress", > >> + job->base.id, hive->hive_id); > >> return 0; > >> + } > >> > >> /* Start with adev pre asic reset first for soft reset > check.*/ > >> - amdgpu_device_lock_adev(adev); > >> - r = amdgpu_device_pre_asic_reset(adev, > >> - job, > >> - &need_full_reset); > >> - if (r) { > >> - /*TODO Should we stop ?*/ > >> - DRM_ERROR("GPU pre asic reset failed with err, %d > for drm dev, %s ", > >> - r, adev->ddev->unique); > >> - adev->asic_reset_res = r; > >> + if (!amdgpu_device_lock_adev(adev, !hive)) { > >> + DRM_INFO("Bailing on TDR for s_job:%llx, as > another already in progress", > >> + job->base.id); > >> + return 0; > >> } > >> > >> /* Build list of devices to reset */ > >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > > 1) { > >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { > >> if (!hive) { > >> amdgpu_device_unlock_adev(adev); > >> return -ENODEV; > >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct > amdgpu_device *adev, > >> device_list_handle = &device_list; > >> } > >> > >> + /* block all schedulers and reset given job's ring */ > >> + list_for_each_entry(tmp_adev, device_list_handle, > gmc.xgmi.head) { > >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; > >> + > >> + if (!ring || !ring->sched.thread) > >> + continue; > >> + > >> + drm_sched_stop(&ring->sched, &job->base); > >> + } > >> + } > >> + > >> + > >> + /* > >> + * Must check guilty signal here since after this point > all old > >> + * HW fences are force signaled. 
> >> + * > >> + * job->base holds a reference to parent fence > >> + */ > >> + if (job && job->base.s_fence->parent && > >> + dma_fence_is_signaled(job->base.s_fence->parent)) > >> + job_signaled = true; > >> + > >> + if (!amdgpu_device_ip_need_full_reset(adev)) > >> + device_list_handle = &device_list; > >> + > >> + if (job_signaled) { > >> + dev_info(adev->dev, "Guilty job already signaled, > skipping HW reset"); > >> + goto skip_hw_reset; > >> + } > >> + > >> + > >> + /* Guilty job will be freed after this*/ > >> + r = amdgpu_device_pre_asic_reset(adev, > >> + job, > >> + &need_full_reset); > >> + if (r) { > >> + /*TODO Should we stop ?*/ > >> + DRM_ERROR("GPU pre asic reset failed with err, %d > for drm dev, %s ", > >> + r, adev->ddev->unique); > >> + adev->asic_reset_res = r; > >> + } > >> + > >> retry: /* Rest of adevs pre asic reset from XGMI hive. */ > >> list_for_each_entry(tmp_adev, device_list_handle, > gmc.xgmi.head) { > >> > >> if (tmp_adev == adev) > >> continue; > >> > >> - amdgpu_device_lock_adev(tmp_adev); > >> + amdgpu_device_lock_adev(tmp_adev, false); > >> r = amdgpu_device_pre_asic_reset(tmp_adev, > >> NULL, > >> &need_full_reset); > >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct > amdgpu_device *adev, > >> goto retry; > >> } > >> > >> +skip_hw_reset: > >> + > >> /* Post ASIC reset for all devs .*/ > >> list_for_each_entry(tmp_adev, device_list_handle, > gmc.xgmi.head) { > >> - amdgpu_device_post_asic_reset(tmp_adev); > >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; > >> + > >> + if (!ring || !ring->sched.thread) > >> + continue; > >> + > >> + /* No point to resubmit jobs if we didn't > HW reset*/ > >> + if (!tmp_adev->asic_reset_res && > !job_signaled) > >> + drm_sched_resubmit_jobs(&ring->sched); > >> + > >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); > >> + } > >> + > >> + if (!amdgpu_device_has_dc_support(tmp_adev) && > !job_signaled) { > >> + 
drm_helper_resume_force_mode(tmp_adev->ddev); > >> + } > >> + > >> + tmp_adev->asic_reset_res = 0; > >> > >> if (r) { > >> /* bad news, how to tell it to userspace ? */ > >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct > amdgpu_device *adev, > >> amdgpu_device_unlock_adev(tmp_adev); > >> } > >> > >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) > >> + if (hive) > >> mutex_unlock(&hive->reset_lock); > >> > >> if (r) > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > [-- Attachment #1.2: Type: text/html, Size: 29928 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
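[Editorial note: the ordering argued for in this message — stop all software schedulers first, only then recheck whether the "guilty" job's HW fence actually signaled, and only force-complete fences and reset the HW if it did not — can be sketched as the following toy model. None of these names are real amdgpu/drm symbols; the boolean flags merely stand in for scheduler and fence state.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the TDR (timeout detection and recovery) ordering
 * discussed above. Illustrative only, not the kernel implementation. */

static bool sched_running = true;  /* stands in for the SW schedulers */
static bool hw_fence_signaled;     /* stands in for job->base.s_fence->parent */
static bool hw_was_reset;

static void stop_all_schedulers(void)
{
    sched_running = false;         /* freeze all SW activity first */
}

/* Returns true when the HW reset is skipped because the "guilty" job
 * turned out to have completed on its own. */
static bool tdr_handler(void)
{
    stop_all_schedulers();         /* step 1: stop everything */

    if (hw_fence_signaled)         /* step 2: recheck AFTER stopping */
        return true;               /* job finished by itself; skip reset */

    hw_was_reset = true;           /* step 3: force-complete fences, reset HW */
    return false;
}
```

Checking the fence only after the schedulers are stopped is the whole point: before that, forcing fences to complete could trigger new activity racing with the reset.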
[parent not found: <e20d013e-df21-1300-27d1-7f9b829cc067-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. [not found] ` <e20d013e-df21-1300-27d1-7f9b829cc067-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2019-04-26 14:08 ` Grodzovsky, Andrey 2019-04-28 2:56 ` Zhou, David(ChunMing) 0 siblings, 1 reply; 31+ messages in thread From: Grodzovsky, Andrey @ 2019-04-26 14:08 UTC (permalink / raw) To: Koenig, Christian, Zhou, David(ChunMing), dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, eric-WhKQ6XTQaPysTnJN9+BGXg, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Cc: Kazlauskas, Nicholas, Liu, Monk [-- Attachment #1.1: Type: text/plain, Size: 13542 bytes --] Ping (mostly David and Monk). Andrey On 4/24/19 3:09 AM, Christian König wrote: Am 24.04.19 um 05:02 schrieb Zhou, David(ChunMing): >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } HW fence are already forced completion, then we can just disable irq fence process and ignore hw fence signal when we are trying to do GPU reset, I think. Otherwise which will make the logic much more complex. If this situation happens because of long time execution, we can increase timeout of reset detection. You are not thinking widely enough, forcing the hw fence to complete can trigger other to start other activity in the system. We first need to stop everything and make sure that we don't do any processing any more and then start with our reset procedure including forcing all hw fences to complete. Christian. 
[-- Attachment #1.2: Type: text/html, Size: 31312 bytes --]

[-- Attachment #2: Type: text/plain, Size: 153 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread
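[Editorial note: the `amdgpu_device_lock_adev(adev, trylock)` rework quoted in this thread boils down to a conditional trylock: the device whose job timed out is locked with a trylock so a second, concurrent TDR bails out early instead of queueing, while the remaining XGMI hive members are locked unconditionally. A hedged pthreads sketch of that pattern follows — plain userspace pthreads, not the kernel mutex API, with illustrative names.]

```c
#include <pthread.h>
#include <stdbool.h>

/* Sketch of the trylock-vs-lock pattern from the patch. */
static pthread_mutex_t lock_reset = PTHREAD_MUTEX_INITIALIZER;

/* Mirrors amdgpu_device_lock_adev(adev, trylock): returns false only
 * on the trylock path, when another reset already holds the lock. */
static bool device_lock(bool trylock)
{
    if (trylock)
        return pthread_mutex_trylock(&lock_reset) == 0;
    pthread_mutex_lock(&lock_reset);
    return true;
}

static void device_unlock(void)
{
    pthread_mutex_unlock(&lock_reset);
}
```

A caller on the trylock path treats `false` as "another TDR is already in progress, bail out" — the behavior the DRM_INFO "Bailing on TDR" messages in the patch report.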
* RE: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
  2019-04-26 14:08             ` Grodzovsky, Andrey
@ 2019-04-28  2:56               ` Zhou, David(ChunMing)
  2019-04-29 14:14                 ` Grodzovsky, Andrey
  0 siblings, 1 reply; 31+ messages in thread
From: Zhou, David(ChunMing) @ 2019-04-28 2:56 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Koenig, Christian, dri-devel, amd-gfx, eric, etnaviv
  Cc: Deng, Emily, Kazlauskas, Nicholas, Liu, Monk

[-- Attachment #1.1: Type: text/plain, Size: 14191 bytes --]

Sorry, I can only put my Acked-by: Chunming Zhou <david1.zhou@amd.com> on patch #3.

I cannot fully judge patches #4, #5 and #6.

-David

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey
Sent: Friday, April 26, 2019 10:09 PM
To: Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; eric@anholt.net; etnaviv@lists.freedesktop.org
Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

Ping (mostly David and Monk).

Andrey

On 4/24/19 3:09 AM, Christian König wrote:
On 24.04.19 05:02, Zhou, David(ChunMing) wrote:
>> -               drm_sched_stop(&ring->sched, &job->base);
>> -
>>                 /* after all hw jobs are reset, hw fence is meaningless, so force_completion */
>>                 amdgpu_fence_driver_force_completion(ring);
>>         }
HW fences are already force-completed, so I think we can just disable the IRQ fence processing and ignore HW fence signals while we are trying to do the GPU reset. Otherwise the logic becomes much more complex.

If this situation happens because of long execution times, we can increase the timeout of the reset detection.

You are not thinking widely enough; forcing the hw fence to complete can trigger others to start other activity in the system.
We first need to stop everything and make sure that we don't do any processing any more and then start with our reset procedure including forcing all hw fences to complete. Christian. -David From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org><mailto:amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey Sent: Wednesday, April 24, 2019 12:00 AM To: Zhou, David(ChunMing) <David1.Zhou@amd.com><mailto:David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; eric@anholt.net<mailto:eric@anholt.net>; etnaviv@lists.freedesktop.org<mailto:etnaviv@lists.freedesktop.org>; ckoenig.leichtzumerken@gmail.com<mailto:ckoenig.leichtzumerken@gmail.com> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com><mailto:Nicholas.Kazlauskas@amd.com>; Liu, Monk <Monk.Liu@amd.com><mailto:Monk.Liu@amd.com> Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. No, i mean the actual HW fence which signals when the job finished execution on the HW. Andrey On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: do you mean fence timer? why not stop it as well when stopping sched for the reason of hw reset? -------- Original Message -------- Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. > > But out of curious, why guilty job can signal more if the job is already > set to guilty? set it wrongly? 
> > > -David It's possible that the job does completes at a later time then it's timeout handler started processing so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity so at the point we check the guilty flag is not set yet. Andrey > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper fnctions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against it's decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in sceduler code. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com><mailto:andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) 
>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >> { >> - int i; >> - >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> - struct amdgpu_ring *ring = adev->rings[i]; >> - >> - if (!ring || !ring->sched.thread) >> - continue; >> - >> - if (!adev->asic_reset_res) >> - drm_sched_resubmit_jobs(&ring->sched); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> - } >> - >> - if (!amdgpu_device_has_dc_support(adev)) { >> - drm_helper_resume_force_mode(adev->ddev); >> - } >> - >> - adev->asic_reset_res = 0; >> -} >> - >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> - mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + struct list_head device_list, *device_list_handle = NULL; >> + bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> - struct list_head device_list, *device_list_handle = NULL; >> + int i, r = 0; >> >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow 
concurrent resets to be triggered >> - * by different nodes. No point also since the one node already executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs in XGMI hive or jobs on >> + * different schedulers for same device while this TO handler is running. >> + * We always reset all schedulers for device and all devices for XGMI >> + * hive so that should take care of them too. >> */ >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> - !mutex_trylock(&hive->reset_lock)) >> + >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >> + job->base.id, hive->hive_id); >> return 0; >> + } >> >> /* Start with adev pre asic reset first for soft reset check.*/ >> - amdgpu_device_lock_adev(adev); >> - r = amdgpu_device_pre_asic_reset(adev, >> - job, >> - &need_full_reset); >> - if (r) { >> - /*TODO Should we stop ?*/ >> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> - r, adev->ddev->unique); >> - adev->asic_reset_res = r; >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >> + job->base.id); >> + return 0; >> } >> >> /* Build list of devices to reset */ >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> if (!hive) { >> amdgpu_device_unlock_adev(adev); >> return -ENODEV; >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> device_list_handle = &device_list; >> } >> >> + /* block all schedulers and reset given job's ring */ >> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + 
if (!ring || !ring->sched.thread) >> + continue; >> + >> + drm_sched_stop(&ring->sched, &job->base); >> + } >> + } >> + >> + >> + /* >> + * Must check guilty signal here since after this point all old >> + * HW fences are force signaled. >> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job && job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + job_signaled = true; >> + >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> + device_list_handle = &device_list; >> + >> + if (job_signaled) { >> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >> + goto skip_hw_reset; >> + } >> + >> + >> + /* Guilty job will be freed after this*/ >> + r = amdgpu_device_pre_asic_reset(adev, >> + job, >> + &need_full_reset); >> + if (r) { >> + /*TODO Should we stop ?*/ >> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> + r, adev->ddev->unique); >> + adev->asic_reset_res = r; >> + } >> + >> retry: /* Rest of adevs pre asic reset from XGMI hive. 
*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> >> if (tmp_adev == adev) >> continue; >> >> - amdgpu_device_lock_adev(tmp_adev); >> + amdgpu_device_lock_adev(tmp_adev, false); >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> NULL, >> &need_full_reset); >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> goto retry; >> } >> >> +skip_hw_reset: >> + >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev); >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!tmp_adev->asic_reset_res && !job_signaled) >> + drm_sched_resubmit_jobs(&ring->sched); >> + >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >> + } >> + >> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> + } >> + >> + tmp_adev->asic_reset_res = 0; >> >> if (r) { >> /* bad news, how to tell it to userspace ? */ >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> amdgpu_device_unlock_adev(tmp_adev); >> } >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> + if (hive) >> mutex_unlock(&hive->reset_lock); >> >> if (r) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 32332 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
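The bail-out that the hunk above adds to amdgpu_device_gpu_recover() (only the first timeout handler to grab the reset lock actually runs the recovery; every concurrent handler returns 0 immediately, since the winner resets all schedulers and devices anyway) can be modeled outside the kernel with a POSIX mutex. This is an illustrative sketch only: try_begin_reset(), end_reset() and the user-space lock are stand-ins, not the real amdgpu code, which uses mutex_trylock() on hive->reset_lock and adev->lock_reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * User-space stand-in for the per-hive/per-device reset lock taken in
 * amdgpu_device_gpu_recover(); not the real kernel API.
 */
static pthread_mutex_t reset_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns true when this caller won the right to run the reset.
 * A concurrent timeout handler sees the trylock fail and bails out,
 * mirroring the "Bailing on TDR ... as another already in progress"
 * path in the patch.
 */
bool try_begin_reset(void)
{
	/* pthread_mutex_trylock() returns 0 on success, EBUSY if held. */
	return pthread_mutex_trylock(&reset_lock) == 0;
}

void end_reset(void)
{
	pthread_mutex_unlock(&reset_lock);
}
```

Build with -lpthread; a normal (non-recursive) POSIX mutex returns EBUSY from trylock even for the owning thread, which is enough to model the "another TDR already in progress" case.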
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
  2019-04-28  2:56 ` Zhou, David(ChunMing)
@ 2019-04-29 14:14 ` Grodzovsky, Andrey
  2019-04-29 19:03 ` Christian König
  0 siblings, 1 reply; 31+ messages in thread
From: Grodzovsky, Andrey @ 2019-04-29 14:14 UTC (permalink / raw)
  To: Zhou, David(ChunMing), Koenig, Christian, dri-devel, amd-gfx, eric, etnaviv
  Cc: Deng, Emily, Kazlauskas, Nicholas, Liu, Monk

Thanks David; with that, only patches 5 and 6 of the series are left to be reviewed.

Christian, any more comments on those patches?

Andrey

On 4/27/19 10:56 PM, Zhou, David(ChunMing) wrote:

Sorry, I can only put my Acked-by: Chunming Zhou <david1.zhou@amd.com> on patch #3.

I cannot fully judge patches #4, #5 and #6.

-David

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey
Sent: Friday, April 26, 2019 10:09 PM
To: Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; eric@anholt.net; etnaviv@lists.freedesktop.org
Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

Ping (mostly David and Monk).
Andrey

On 4/24/19 3:09 AM, Christian König wrote:
On 24.04.19 at 05:02, Zhou, David(ChunMing) wrote:
>> - drm_sched_stop(&ring->sched, &job->base);
>> -
>> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */
>> amdgpu_fence_driver_force_completion(ring);
>> }

The HW fences are already force-completed, so I think we can just disable IRQ fence processing and ignore the HW fence signal while we are trying to do the GPU reset. Otherwise this will make the logic much more complex. If this situation happens because of long execution time, we can increase the timeout of reset detection.

You are not thinking widely enough; forcing the hw fence to complete can trigger others to start other activity in the system.

We first need to stop everything and make sure that we don't do any processing any more, and only then start with our reset procedure, including forcing all hw fences to complete.

Christian.

-David

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey
Sent: Wednesday, April 24, 2019 12:00 AM
To: Zhou, David(ChunMing) <David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; eric@anholt.net; etnaviv@lists.freedesktop.org; ckoenig.leichtzumerken@gmail.com
Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

No, I mean the actual HW fence which signals when the job finishes execution on the HW.

Andrey

On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote:
Do you mean the fence timer? Why not stop it as well when stopping the scheduler for the hw reset?
-------- Original Message --------
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)", dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, eric@anholt.net, etnaviv@lists.freedesktop.org, ckoenig.leichtzumerken@gmail.com
CC: "Kazlauskas, Nicholas", "Liu, Monk"

On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote:
> +Monk.
>
> GPU reset is used widely in SRIOV, so we need a virtualization guy to take a look.
>
> But out of curiosity, why can the guilty job still signal if it is already
> set to guilty? Was it set wrongly?
>
> -David

It's possible that the job completes at a later time than when its timeout handler started processing, so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity, so at the point we check, the guilty flag is not set yet.

Andrey

> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> Also reject TDRs if another one is already running.
>>
>> v2:
>> Stop all schedulers across the device and the entire XGMI hive before
>> force signaling HW fences.
>> Avoid passing job_signaled to helper functions to keep all the decision
>> making about skipping HW reset in one place.
>>
>> v3:
>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
>> against its decrement in drm_sched_stop in the non HW reset case.
>> v4: rebase
>> v5: Revert v3 as we do it now in scheduler code.
>> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com><mailto:andrey.grodzovsky@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 +++++++++++++++++++---------- >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> return r; >> } >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) >> { >> - int i; >> - >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> - struct amdgpu_ring *ring = adev->rings[i]; >> - >> - if (!ring || !ring->sched.thread) >> - continue; >> - >> - if (!adev->asic_reset_res) >> - drm_sched_resubmit_jobs(&ring->sched); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> - } >> - >> - if (!amdgpu_device_has_dc_support(adev)) { >> - drm_helper_resume_force_mode(adev->ddev); >> - } >> - >> - adev->asic_reset_res = 0; >> -} >> - >> -static 
void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> - mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + struct list_head device_list, *device_list_handle = NULL; >> + bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> - struct list_head device_list, *device_list_handle = NULL; >> + int i, r = 0; >> >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow concurrent resets to be triggered >> - * by different nodes. No point also since the one node already executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs in XGMI hive or jobs on >> + * different schedulers for same device while this TO handler is running. >> + * We always reset all schedulers for device and all devices for XGMI >> + * hive so that should take care of them too. 
>> */ >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> - !mutex_trylock(&hive->reset_lock)) >> + >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", >> + job->base.id, hive->hive_id); >> return 0; >> + } >> >> /* Start with adev pre asic reset first for soft reset check.*/ >> - amdgpu_device_lock_adev(adev); >> - r = amdgpu_device_pre_asic_reset(adev, >> - job, >> - &need_full_reset); >> - if (r) { >> - /*TODO Should we stop ?*/ >> - DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> - r, adev->ddev->unique); >> - adev->asic_reset_res = r; >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> + DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", >> + job->base.id); >> + return 0; >> } >> >> /* Build list of devices to reset */ >> - if (need_full_reset && adev->gmc.xgmi.num_physical_nodes > 1) { >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> if (!hive) { >> amdgpu_device_unlock_adev(adev); >> return -ENODEV; >> @@ -3588,13 +3573,56 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> device_list_handle = &device_list; >> } >> >> + /* block all schedulers and reset given job's ring */ >> + list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + drm_sched_stop(&ring->sched, &job->base); >> + } >> + } >> + >> + >> + /* >> + * Must check guilty signal here since after this point all old >> + * HW fences are force signaled. 
>> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job && job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + job_signaled = true; >> + >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> + device_list_handle = &device_list; >> + >> + if (job_signaled) { >> + dev_info(adev->dev, "Guilty job already signaled, skipping HW reset"); >> + goto skip_hw_reset; >> + } >> + >> + >> + /* Guilty job will be freed after this*/ >> + r = amdgpu_device_pre_asic_reset(adev, >> + job, >> + &need_full_reset); >> + if (r) { >> + /*TODO Should we stop ?*/ >> + DRM_ERROR("GPU pre asic reset failed with err, %d for drm dev, %s ", >> + r, adev->ddev->unique); >> + adev->asic_reset_res = r; >> + } >> + >> retry: /* Rest of adevs pre asic reset from XGMI hive. */ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> >> if (tmp_adev == adev) >> continue; >> >> - amdgpu_device_lock_adev(tmp_adev); >> + amdgpu_device_lock_adev(tmp_adev, false); >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> NULL, >> &need_full_reset); >> @@ -3618,9 +3646,28 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> goto retry; >> } >> >> +skip_hw_reset: >> + >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> - amdgpu_device_post_asic_reset(tmp_adev); >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> + struct amdgpu_ring *ring = tmp_adev->rings[i]; >> + >> + if (!ring || !ring->sched.thread) >> + continue; >> + >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!tmp_adev->asic_reset_res && !job_signaled) >> + drm_sched_resubmit_jobs(&ring->sched); >> + >> + drm_sched_start(&ring->sched, !tmp_adev->asic_reset_res); >> + } >> + >> + if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> + } >> + >> + tmp_adev->asic_reset_res = 0; >> >> if (r) { >> /* bad news, how to tell it to userspace ? 
*/ >> @@ -3633,7 +3680,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> amdgpu_device_unlock_adev(tmp_adev); >> } >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> + if (hive) >> mutex_unlock(&hive->reset_lock); >> >> if (r) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 33684 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
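The ordering Andrey describes above (recheck the guilty job's HW fence only after all SW schedulers have been stopped, and before anything force-signals the HW fences) can be captured in a small user-space model. All names here are illustrative stand-ins for the amdgpu and drm_sched entities, under the assumption that sampling a fence after force completion would be meaningless:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device state; not an amdgpu structure. */
struct toy_dev {
	bool sched_running;
	bool hw_fence_signaled;
};

/* Models drm_sched_stop() for the device's rings. */
void toy_sched_stop(struct toy_dev *d)
{
	d->sched_running = false;
}

/* Models amdgpu_fence_driver_force_completion(). */
void toy_force_completion(struct toy_dev *d)
{
	d->hw_fence_signaled = true;
}

/*
 * Returns true when the HW reset can be skipped. The assert encodes the
 * rule from the thread: the fence may only be sampled after the
 * schedulers have been stopped.
 */
bool toy_guilty_already_signaled(const struct toy_dev *d)
{
	assert(!d->sched_running);
	return d->hw_fence_signaled;
}
```

In the real patch, toy_force_completion() corresponds to the force completion done later in amdgpu_device_pre_asic_reset(), which is exactly why the sample has to happen first.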
* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
  2019-04-29 14:14 ` Grodzovsky, Andrey
@ 2019-04-29 19:03 ` Christian König
  0 siblings, 0 replies; 31+ messages in thread
From: Christian König @ 2019-04-29 19:03 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Zhou, David(ChunMing), Koenig, Christian, dri-devel, amd-gfx, eric, etnaviv
  Cc: Deng, Emily, Kazlauskas, Nicholas, Liu, Monk

I would clean them up further, but that's only moving code around, so feel free to add my rb to those.

Christian.

On 29.04.19 at 16:14, Grodzovsky, Andrey wrote:
>
> Thanks David, with that only patches 5 and 6 are left for the series
> to be reviewed.
>
> Christian, any more comments on those patches ?
>
> Andrey
>
> On 4/27/19 10:56 PM, Zhou, David(ChunMing) wrote:
>>
>> Sorry, I only can put my Acked-by: Chunming Zhou
>> <david1.zhou@amd.com> on patch#3.
>>
>> I cannot fully judge patch #4, #5, #6.
>>
>> -David
>>
>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of* Grodzovsky, Andrey
>> *Sent:* Friday, April 26, 2019 10:09 PM
>> *To:* Koenig, Christian <Christian.Koenig@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; eric@anholt.net; etnaviv@lists.freedesktop.org
>> *Cc:* Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Liu, Monk <Monk.Liu@amd.com>
>> *Subject:* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
>>
>> Ping (mostly David and Monk).
>> >> Andrey >> >> On 4/24/19 3:09 AM, Christian König wrote: >> >> Am 24.04.19 um 05:02 schrieb Zhou, David(ChunMing): >> >> >> - drm_sched_stop(&ring->sched, &job->base); >> >> - >> >> /* after all hw jobs are reset, hw fence is >> meaningless, so force_completion */ >> >> amdgpu_fence_driver_force_completion(ring); >> >> } >> >> HW fence are already forced completion, then we can just >> disable irq fence process and ignore hw fence signal when we >> are trying to do GPU reset, I think. Otherwise which will >> make the logic much more complex. >> >> If this situation happens because of long time execution, we >> can increase timeout of reset detection. >> >> >> You are not thinking widely enough, forcing the hw fence to >> complete can trigger other to start other activity in the system. >> >> We first need to stop everything and make sure that we don't do >> any processing any more and then start with our reset procedure >> including forcing all hw fences to complete. >> >> Christian. >> >> >> -David >> >> *From:*amd-gfx <amd-gfx-bounces@lists.freedesktop.org> >> <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of >> *Grodzovsky, Andrey >> *Sent:* Wednesday, April 24, 2019 12:00 AM >> *To:* Zhou, David(ChunMing) <David1.Zhou@amd.com> >> <mailto:David1.Zhou@amd.com>; dri-devel@lists.freedesktop.org >> <mailto:dri-devel@lists.freedesktop.org>; >> amd-gfx@lists.freedesktop.org >> <mailto:amd-gfx@lists.freedesktop.org>; eric@anholt.net >> <mailto:eric@anholt.net>; etnaviv@lists.freedesktop.org >> <mailto:etnaviv@lists.freedesktop.org>; >> ckoenig.leichtzumerken@gmail.com >> <mailto:ckoenig.leichtzumerken@gmail.com> >> *Cc:* Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com> >> <mailto:Nicholas.Kazlauskas@amd.com>; Liu, Monk >> <Monk.Liu@amd.com> <mailto:Monk.Liu@amd.com> >> *Subject:* Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if >> guilty job already signaled. 
>> >> No, i mean the actual HW fence which signals when the job >> finished execution on the HW. >> >> Andrey >> >> On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: >> >> do you mean fence timer? why not stop it as well when >> stopping sched for the reason of hw reset? >> >> -------- Original Message -------- >> Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if >> guilty job already signaled. >> From: "Grodzovsky, Andrey" >> To: "Zhou, David(ChunMing)" >> ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com >> <mailto:dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com> >> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" >> >> >> On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: >> > +Monk. >> > >> > GPU reset is used widely in SRIOV, so need >> virtulizatino guy take a look. >> > >> > But out of curious, why guilty job can signal more if >> the job is already >> > set to guilty? set it wrongly? >> > >> > >> > -David >> >> >> It's possible that the job does completes at a later time >> then it's >> timeout handler started processing so in this patch we >> try to protect >> against this by rechecking the HW fence after stopping >> all SW >> schedulers. We do it BEFORE marking guilty on the job's >> sched_entity so >> at the point we check the guilty flag is not set yet. >> >> Andrey >> >> >> > >> > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> >> Also reject TDRs if another one already running. >> >> >> >> v2: >> >> Stop all schedulers across device and entire XGMI hive >> before >> >> force signaling HW fences. >> >> Avoid passing job_signaled to helper fnctions to keep >> all the decision >> >> making about skipping HW reset in one place. >> >> >> >> v3: >> >> Fix SW sched. hang after non HW reset. 
>> sched.hw_rq_count has to be balanced >> >> against it's decrement in drm_sched_stop in non HW >> reset case. >> >> v4: rebase >> >> v5: Revert v3 as we do it now in sceduler code. >> >> >> >> Signed-off-by: Andrey Grodzovsky >> <andrey.grodzovsky@amd.com> >> <mailto:andrey.grodzovsky@amd.com> >> >> --- >> >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 >> +++++++++++++++++++---------- >> >> 1 file changed, 95 insertions(+), 48 deletions(-) >> >> >> >> diff --git >> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> >> index a0e165c..85f8792 100644 >> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> >> @@ -3334,8 +3334,6 @@ static int >> amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> >> if (!ring || !ring->sched.thread) >> >> continue; >> >> >> >> - drm_sched_stop(&ring->sched, &job->base); >> >> - >> >> /* after all hw jobs are reset, hw fence >> is meaningless, so force_completion */ >> >> amdgpu_fence_driver_force_completion(ring); >> >> } >> >> @@ -3343,6 +3341,7 @@ static int >> amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> >> if(job) >> >> drm_sched_increase_karma(&job->base); >> >> >> >> + /* Don't suspend on bare metal if we are not >> going to HW reset the ASIC */ >> >> if (!amdgpu_sriov_vf(adev)) { >> >> >> >> if (!need_full_reset) >> >> @@ -3480,37 +3479,21 @@ static int >> amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, >> >> return r; >> >> } >> >> >> >> -static void amdgpu_device_post_asic_reset(struct >> amdgpu_device *adev) >> >> +static bool amdgpu_device_lock_adev(struct >> amdgpu_device *adev, bool trylock) >> >> { >> >> - int i; >> >> - >> >> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> >> - struct amdgpu_ring *ring = adev->rings[i]; >> >> - >> >> - if (!ring || !ring->sched.thread) >> >> - continue; >> >> - >> >> - if (!adev->asic_reset_res) >> >> - drm_sched_resubmit_jobs(&ring->sched); >> >> + if 
(trylock) { >> >> + if (!mutex_trylock(&adev->lock_reset)) >> >> + return false; >> >> + } else >> >> + mutex_lock(&adev->lock_reset); >> >> >> >> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >> >> - } >> >> - >> >> - if (!amdgpu_device_has_dc_support(adev)) { >> >> - drm_helper_resume_force_mode(adev->ddev); >> >> - } >> >> - >> >> - adev->asic_reset_res = 0; >> >> -} >> >> - >> >> -static void amdgpu_device_lock_adev(struct >> amdgpu_device *adev) >> >> -{ >> >> - mutex_lock(&adev->lock_reset); >> >> atomic_inc(&adev->gpu_reset_counter); >> >> adev->in_gpu_reset = 1; >> >> /* Block kfd: SRIOV would do it separately */ >> >> if (!amdgpu_sriov_vf(adev)) >> >> amdgpu_amdkfd_pre_reset(adev); >> >> + >> >> + return true; >> >> } >> >> >> >> static void amdgpu_device_unlock_adev(struct >> amdgpu_device *adev) >> >> @@ -3538,40 +3521,42 @@ static void >> amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> >> int amdgpu_device_gpu_recover(struct amdgpu_device >> *adev, >> >> struct amdgpu_job *job) >> >> { >> >> - int r; >> >> + struct list_head device_list, *device_list_handle >> = NULL; >> >> + bool need_full_reset, job_signaled; >> >> struct amdgpu_hive_info *hive = NULL; >> >> - bool need_full_reset = false; >> >> struct amdgpu_device *tmp_adev = NULL; >> >> - struct list_head device_list, *device_list_handle >> = NULL; >> >> + int i, r = 0; >> >> >> >> + need_full_reset = job_signaled = false; >> >> INIT_LIST_HEAD(&device_list); >> >> >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> >> >> + hive = amdgpu_get_xgmi_hive(adev, false); >> >> + >> >> /* >> >> - * In case of XGMI hive disallow concurrent >> resets to be triggered >> >> - * by different nodes. No point also since the >> one node already executing >> >> - * reset will also reset all the other nodes in >> the hive. 
>> >> + * Here we trylock to avoid chain of resets >> executing from >> >> + * either trigger by jobs on different adevs in >> XGMI hive or jobs on >> >> + * different schedulers for same device while >> this TO handler is running. >> >> + * We always reset all schedulers for device and >> all devices for XGMI >> >> + * hive so that should take care of them too. >> >> */ >> >> - hive = amdgpu_get_xgmi_hive(adev, 0); >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && >> >> - !mutex_trylock(&hive->reset_lock)) >> >> + >> >> + if (hive && !mutex_trylock(&hive->reset_lock)) { >> >> + DRM_INFO("Bailing on TDR for s_job:%llx, >> hive: %llx as another already in progress", >> >> + job->base.id, hive->hive_id); >> >> return 0; >> >> + } >> >> >> >> /* Start with adev pre asic reset first for soft >> reset check.*/ >> >> - amdgpu_device_lock_adev(adev); >> >> - r = amdgpu_device_pre_asic_reset(adev, >> >> - job, >> >> - &need_full_reset); >> >> - if (r) { >> >> - /*TODO Should we stop ?*/ >> >> - DRM_ERROR("GPU pre asic reset failed with >> err, %d for drm dev, %s ", >> >> - r, adev->ddev->unique); >> >> - adev->asic_reset_res = r; >> >> + if (!amdgpu_device_lock_adev(adev, !hive)) { >> >> + DRM_INFO("Bailing on TDR for s_job:%llx, >> as another already in progress", >> >> + job->base.id); >> >> + return 0; >> >> } >> >> >> >> /* Build list of devices to reset */ >> >> - if (need_full_reset && >> adev->gmc.xgmi.num_physical_nodes > 1) { >> >> + if (adev->gmc.xgmi.num_physical_nodes > 1) { >> >> if (!hive) { >> >> amdgpu_device_unlock_adev(adev); >> >> return -ENODEV; >> >> @@ -3588,13 +3573,56 @@ int >> amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> >> device_list_handle = &device_list; >> >> } >> >> >> >> + /* block all schedulers and reset given job's ring */ >> >> + list_for_each_entry(tmp_adev, device_list_handle, >> gmc.xgmi.head) { >> >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> >> + struct amdgpu_ring *ring = >> tmp_adev->rings[i]; >> >> + >> 
>> + if (!ring || !ring->sched.thread) >> >> + continue; >> >> + >> >> + drm_sched_stop(&ring->sched, &job->base); >> >> + } >> >> + } >> >> + >> >> + >> >> + /* >> >> + * Must check guilty signal here since after this >> point all old >> >> + * HW fences are force signaled. >> >> + * >> >> + * job->base holds a reference to parent fence >> >> + */ >> >> + if (job && job->base.s_fence->parent && >> >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> >> + job_signaled = true; >> >> + >> >> + if (!amdgpu_device_ip_need_full_reset(adev)) >> >> + device_list_handle = &device_list; >> >> + >> >> + if (job_signaled) { >> >> + dev_info(adev->dev, "Guilty job already >> signaled, skipping HW reset"); >> >> + goto skip_hw_reset; >> >> + } >> >> + >> >> + >> >> + /* Guilty job will be freed after this*/ >> >> + r = amdgpu_device_pre_asic_reset(adev, >> >> + job, >> >> + &need_full_reset); >> >> + if (r) { >> >> + /*TODO Should we stop ?*/ >> >> + DRM_ERROR("GPU pre asic reset failed with >> err, %d for drm dev, %s ", >> >> + r, adev->ddev->unique); >> >> + adev->asic_reset_res = r; >> >> + } >> >> + >> >> retry: /* Rest of adevs pre asic reset from XGMI >> hive. 
*/ >> >> list_for_each_entry(tmp_adev, >> device_list_handle, gmc.xgmi.head) { >> >> >> >> if (tmp_adev == adev) >> >> continue; >> >> >> >> - amdgpu_device_lock_adev(tmp_adev); >> >> + amdgpu_device_lock_adev(tmp_adev, false); >> >> r = amdgpu_device_pre_asic_reset(tmp_adev, >> >> NULL, >> >> &need_full_reset); >> >> @@ -3618,9 +3646,28 @@ int >> amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> >> goto retry; >> >> } >> >> >> >> +skip_hw_reset: >> >> + >> >> /* Post ASIC reset for all devs .*/ >> >> list_for_each_entry(tmp_adev, >> device_list_handle, gmc.xgmi.head) { >> >> - amdgpu_device_post_asic_reset(tmp_adev); >> >> + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> >> + struct amdgpu_ring *ring = >> tmp_adev->rings[i]; >> >> + >> >> + if (!ring || !ring->sched.thread) >> >> + continue; >> >> + >> >> + /* No point to resubmit jobs if >> we didn't HW reset*/ >> >> + if (!tmp_adev->asic_reset_res && >> !job_signaled) >> >> + drm_sched_resubmit_jobs(&ring->sched); >> >> + >> >> + drm_sched_start(&ring->sched, >> !tmp_adev->asic_reset_res); >> >> + } >> >> + >> >> + if >> (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) { >> >> + drm_helper_resume_force_mode(tmp_adev->ddev); >> >> + } >> >> + >> >> + tmp_adev->asic_reset_res = 0; >> >> >> >> if (r) { >> >> /* bad news, how to tell it to >> userspace ? 
*/ >> >> @@ -3633,7 +3680,7 @@ int >> amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> >> amdgpu_device_unlock_adev(tmp_adev); >> >> } >> >> >> >> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) >> >> + if (hive) >> >> mutex_unlock(&hive->reset_lock); >> >> >> >> if (r) >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> <mailto:amd-gfx@lists.freedesktop.org> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel [-- Attachment #1.2: Type: text/html, Size: 39232 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 31+ messages in thread
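The post-reset loop in the patch Christian is acking above can likewise be reduced to a toy model: resubmission only makes sense when a HW reset actually happened (asic_reset_res is zero and the guilty job had not already signaled), while the scheduler restart happens regardless. Names here are hypothetical stand-ins, not the drm_sched API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ring state; not an amdgpu_ring. */
struct toy_ring {
	int resubmitted;
	bool started;
};

/*
 * Models the per-ring body of the "Post ASIC reset for all devs" loop:
 * drm_sched_resubmit_jobs() is skipped when no HW reset took place,
 * drm_sched_start() runs either way.
 */
void toy_post_reset(struct toy_ring *ring, int asic_reset_res,
		    bool job_signaled)
{
	/* No point in resubmitting jobs if we didn't HW reset. */
	if (!asic_reset_res && !job_signaled)
		ring->resubmitted++;

	/* Restart the scheduler either way. */
	ring->started = true;
}
```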
* Re: [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock
  [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org>
  ` (4 preceding siblings ...)
  2019-04-18 15:00 ` [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled Andrey Grodzovsky
@ 2019-04-23  2:35 ` Dieter Nützel
  [not found]   ` <2ddcff29bfaab2408b6e2cbc416322cd-0hun7QTegEsDD4udEopG9Q@public.gmane.org>
  5 siblings, 1 reply; 31+ messages in thread
From: Dieter Nützel @ 2019-04-23  2:35 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w,
	etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Christian König,
	eric-WhKQ6XTQaPysTnJN9+BGXg,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Nicholas.Kazlauskas-5C7GfCeVMHo

Hello Andrey,

this series can't apply (breaks on #3) on top of amd-staging-drm-next.
v2 works (Thu, 11 Apr 2019).

Dieter

On 18.04.2019 17:00, Andrey Grodzovsky wrote:
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>
> Don't block others while waiting for the fences to finish, concurrent
> submission is perfectly valid in this case and holding the lock can
> prevent killed applications from terminating.
> > Signed-off-by: Christian König <christian.koenig@amd.com> > Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 380a7f9..ad4f0e5 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -4814,23 +4814,26 @@ static void amdgpu_dm_commit_planes(struct > drm_atomic_state *state, > continue; > } > > + abo = gem_to_amdgpu_bo(fb->obj[0]); > + > + /* Wait for all fences on this FB */ > + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, > + false, > + MAX_SCHEDULE_TIMEOUT); > + WARN_ON(r < 0); > + > /* > * TODO This might fail and hence better not used, wait > * explicitly on fences instead > * and in general should be called for > * blocking commit to as per framework helpers > */ > - abo = gem_to_amdgpu_bo(fb->obj[0]); > r = amdgpu_bo_reserve(abo, true); > if (unlikely(r != 0)) { > DRM_ERROR("failed to reserve buffer before flip\n"); > WARN_ON(1); > } > > - /* Wait for all fences on this FB */ > - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, > false, > - MAX_SCHEDULE_TIMEOUT) < 0); > - > amdgpu_bo_get_tiling_flags(abo, &tiling_flags); > > amdgpu_bo_unreserve(abo); _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <2ddcff29bfaab2408b6e2cbc416322cd-0hun7QTegEsDD4udEopG9Q@public.gmane.org>]
* Re: [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock [not found] ` <2ddcff29bfaab2408b6e2cbc416322cd-0hun7QTegEsDD4udEopG9Q@public.gmane.org> @ 2019-04-23 14:02 ` Grodzovsky, Andrey 0 siblings, 0 replies; 31+ messages in thread From: Grodzovsky, Andrey @ 2019-04-23 14:02 UTC (permalink / raw) To: Dieter Nützel Cc: ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Koenig, Christian, eric-WhKQ6XTQaPysTnJN9+BGXg, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Kazlauskas, Nicholas This series is on top of drm-misc because of the panfrost and lima drivers, which are missing from amd-staging-drm-next. Once I land it in drm-misc I will merge and push it into drm-next. Andrey On 4/22/19 10:35 PM, Dieter Nützel wrote: > Hello Andrey, > > this series can't apply (breaks on #3) on top of amd-staging-drm-next. > v2 works (Thu, 11 Apr 2019). > > Dieter > > On 18.04.2019 17:00, Andrey Grodzovsky wrote: >> From: Christian König <ckoenig.leichtzumerken@gmail.com> >> >> Don't block others while waiting for the fences to finish, concurrent >> submission is perfectly valid in this case and holding the lock can >> prevent killed applications from terminating. 
>> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> >> --- >> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 ++++++++----- >> 1 file changed, 8 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> index 380a7f9..ad4f0e5 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> @@ -4814,23 +4814,26 @@ static void amdgpu_dm_commit_planes(struct >> drm_atomic_state *state, >> continue; >> } >> >> + abo = gem_to_amdgpu_bo(fb->obj[0]); >> + >> + /* Wait for all fences on this FB */ >> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >> + false, >> + MAX_SCHEDULE_TIMEOUT); >> + WARN_ON(r < 0); >> + >> /* >> * TODO This might fail and hence better not used, wait >> * explicitly on fences instead >> * and in general should be called for >> * blocking commit to as per framework helpers >> */ >> - abo = gem_to_amdgpu_bo(fb->obj[0]); >> r = amdgpu_bo_reserve(abo, true); >> if (unlikely(r != 0)) { >> DRM_ERROR("failed to reserve buffer before flip\n"); >> WARN_ON(1); >> } >> >> - /* Wait for all fences on this FB */ >> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >> false, >> - MAX_SCHEDULE_TIMEOUT) < 0); >> - >> amdgpu_bo_get_tiling_flags(abo, &tiling_flags); >> >> amdgpu_bo_unreserve(abo); _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2019-05-29 10:03 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-04-18 15:00 [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock Andrey Grodzovsky [not found] ` <1555599624-12285-1-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-18 15:00 ` [PATCH v5 2/6] drm/amd/display: Use a reasonable timeout for framebuffer fence waits Andrey Grodzovsky 2019-04-18 15:00 ` [PATCH v5 3/6] drm/scheduler: rework job destruction Andrey Grodzovsky 2019-04-22 12:48 ` Chunming Zhou [not found] ` <9f7112b1-0348-b4f6-374d-e44c0d448112-5C7GfCeVMHo@public.gmane.org> 2019-04-23 14:26 ` Grodzovsky, Andrey 2019-04-23 14:44 ` Zhou, David(ChunMing) 2019-04-23 15:01 ` [PATCH " Grodzovsky, Andrey 2019-05-29 10:02 ` Daniel Vetter 2019-04-18 15:00 ` [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer Andrey Grodzovsky [not found] ` <1555599624-12285-4-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-22 12:59 ` Chunming Zhou 2019-04-23 15:14 ` Grodzovsky, Andrey 2019-04-18 15:00 ` [PATCH v5 5/6] drm/scheduler: Add flag to hint the release of guilty job Andrey Grodzovsky 2019-04-18 15:00 ` [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled Andrey Grodzovsky [not found] ` <1555599624-12285-6-git-send-email-andrey.grodzovsky-5C7GfCeVMHo@public.gmane.org> 2019-04-22 11:54 ` Grodzovsky, Andrey 2019-04-23 12:32 ` Koenig, Christian [not found] ` <9774408b-cc4c-90dd-cbc7-6ef5c6fd8c46-5C7GfCeVMHo@public.gmane.org> 2019-04-23 13:14 ` Kazlauskas, Nicholas 2019-04-23 14:03 ` Grodzovsky, Andrey 2019-04-23 14:12 ` Grodzovsky, Andrey [not found] ` <a5c97356-66d8-b79e-32ab-a03e4c4d3e39-5C7GfCeVMHo@public.gmane.org> 2019-04-23 14:49 ` Christian König 2019-04-22 13:09 ` Chunming Zhou 2019-04-23 14:51 ` Grodzovsky, Andrey [not found] ` 
<1b41c4f1-b406-8710-2a7a-e5c54a116fe9-5C7GfCeVMHo@public.gmane.org> 2019-04-23 15:19 ` Zhou, David(ChunMing) [not found] ` <-hyv5g0n8ru25qelb0v-8u6jdi1vp2c7z1m3f5-uygwc1o5ji6s-9zli9v-srreuk-3pvse1en6kx0-6se95l-6jsafd-a6sboi-j814xf-ijgwfc-qewgmm-vnafjgrn2fq0-jgir949hx4yo-i772hz-tn7ial.1556032736536-2ueSQiBKiTY7tOexoI0I+QC/G2K4zDHf@public.gmane.org> 2019-04-23 15:59 ` [PATCH " Grodzovsky, Andrey 2019-04-24 3:02 ` Zhou, David(ChunMing) 2019-04-24 7:09 ` Christian König [not found] ` <e20d013e-df21-1300-27d1-7f9b829cc067-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2019-04-26 14:08 ` Grodzovsky, Andrey 2019-04-28 2:56 ` Zhou, David(ChunMing) 2019-04-29 14:14 ` Grodzovsky, Andrey 2019-04-29 19:03 ` Christian König 2019-04-23 2:35 ` [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock Dieter Nützel [not found] ` <2ddcff29bfaab2408b6e2cbc416322cd-0hun7QTegEsDD4udEopG9Q@public.gmane.org> 2019-04-23 14:02 ` Grodzovsky, Andrey