From: "Zhou, David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
To: "Grodzovsky,
	Andrey" <Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org>,
	"Zhou,
	David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>,
	"dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"eric-WhKQ6XTQaPysTnJN9+BGXg@public.gmane.org"
	<eric-WhKQ6XTQaPysTnJN9+BGXg@public.gmane.org>,
	"etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: "Kazlauskas,
	Nicholas" <Nicholas.Kazlauskas-5C7GfCeVMHo@public.gmane.org>,
	"Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
Subject: Re:[PATCH v5 3/6] drm/scheduler: rework job destruction
Date: Tue, 23 Apr 2019 14:44:08 +0000	[thread overview]
Message-ID: <eqyp0q-xw3o4t-5si8qg-3jxnvt7nb0s5trrutj-uelqagfwr69wprwk7ty7v2u3s1j0tabcslq-uksg9o3oidy22540hd-kfbkbo-hbl55n-zeoeno-ygqeaw5tea3s-oa7egj-jmt1w8-n5xit0m1pyob.1556030645243@email.android.com> (raw)
In-Reply-To: 2fa87715-ed8e-1a57-db41-3f7ef928165f@amd.com



This patch is to fix the deadlock between fence->lock and sched->job_list_lock, right?
If so, I suggest just moving list_del_init(&s_job->node) from drm_sched_process_job to the work thread. That would avoid the deadlock described in the link.
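Something like the following rough sketch, based on the existing drm_sched_process_job()/drm_sched_job_finish() paths that the patch below removes (untested, and the exact placement of list_del_init() inside the worker is only an assumption on my side):

static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
{
        struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
        struct drm_sched_fence *s_fence = s_job->s_fence;
        struct drm_gpu_scheduler *sched = s_fence->sched;

        cancel_delayed_work(&sched->work_tdr);

        atomic_dec(&sched->hw_rq_count);
        atomic_dec(&sched->num_jobs);

        /* no job_list_lock and no list_del_init() in the fence callback any more */

        drm_sched_fence_finished(s_fence);
        trace_drm_sched_process_job(s_fence);
        wake_up_interruptible(&sched->wake_up_worker);

        schedule_work(&s_job->finish_work);
}

static void drm_sched_job_finish(struct work_struct *work)
{
        struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
                                                   finish_work);
        struct drm_gpu_scheduler *sched = s_job->sched;
        unsigned long flags;

        cancel_delayed_work_sync(&sched->work_tdr);

        spin_lock_irqsave(&sched->job_list_lock, flags);
        /* remove job from ring_mirror_list here, in process context */
        list_del_init(&s_job->node);
        /* queue TDR for next job */
        drm_sched_start_timeout(sched);
        spin_unlock_irqrestore(&sched->job_list_lock, flags);

        sched->ops->free_job(s_job);
}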


-------- Original Message --------
Subject: Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,eric@anholt.net,etnaviv@lists.freedesktop.org,ckoenig.leichtzumerken@gmail.com
CC: "Kazlauskas, Nicholas" ,"Koenig, Christian"


On 4/22/19 8:48 AM, Chunming Zhou wrote:
> Hi Andrey,
>
> static void drm_sched_process_job(struct dma_fence *f, struct
> dma_fence_cb *cb)
> {
> ...
>           spin_lock_irqsave(&sched->job_list_lock, flags);
>           /* remove job from ring_mirror_list */
>           list_del_init(&s_job->node);
>           spin_unlock_irqrestore(&sched->job_list_lock, flags);
> [David] How about just moving the above out of the IRQ callback into a
> worker? Any problem with that? Maybe I missed your earlier discussion,
> but I think removing the lock around the list is a risk for future
> maintenance, even though it is thread safe right now.
>
> -David


We remove the lock exactly because insertion into and removal from the
list are now done from exactly one thread at any time. So I am not sure
I understand what you mean.
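For reference, here is drm_sched_cleanup_jobs() from the patch again, with a note added on top about where the list is touched (the note is only illustrative; the function body is unchanged from the patch below):

/*
 * With this patch ring_mirror_list is only modified from two places,
 * never concurrently:
 *  1. the scheduler thread itself: drm_sched_job_begin() inserts into
 *     the list and drm_sched_cleanup_jobs() (below, called from
 *     drm_sched_main()) removes from it,
 *  2. the recovery path (drm_sched_stop()/drm_sched_resubmit_jobs()/
 *     drm_sched_start()), which only runs after
 *     kthread_park(sched->thread) has parked that same thread.
 */
static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
{
        unsigned long flags;

        /* Don't destroy jobs while the timeout worker is running */
        if (!cancel_delayed_work(&sched->work_tdr))
                return;

        while (!list_empty(&sched->ring_mirror_list)) {
                struct drm_sched_job *job;

                job = list_first_entry(&sched->ring_mirror_list,
                                       struct drm_sched_job, node);
                if (!dma_fence_is_signaled(&job->s_fence->finished))
                        break;

                spin_lock_irqsave(&sched->job_list_lock, flags);
                /* remove job from ring_mirror_list */
                list_del_init(&job->node);
                spin_unlock_irqrestore(&sched->job_list_lock, flags);

                sched->ops->free_job(job);
        }

        /* queue timeout for next job */
        spin_lock_irqsave(&sched->job_list_lock, flags);
        drm_sched_start_timeout(sched);
        spin_unlock_irqrestore(&sched->job_list_lock, flags);
}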

Andrey


>
> ...
>
>           schedule_work(&s_job->finish_work);
> }
>
> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> From: Christian König <christian.koenig@amd.com>
>>
>> We now destroy finished jobs from the worker thread to make sure that
>> we never destroy a job that is currently in timeout processing.
>> This way we avoid holding the lock around the ring mirror list in
>> drm_sched_stop, which should solve a deadlock reported by a user.
>>
>> v2: Remove unused variable.
>> v4: Move guilty job free into sched code.
>> v5:
>> Move the sched->hw_rq_count increment to drm_sched_start to account for
>> the counter decrement in drm_sched_stop even when we don't resubmit
>> jobs because the guilty job already signaled.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>>    drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
>>    drivers/gpu/drm/lima/lima_sched.c          |   2 +-
>>    drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
>>    drivers/gpu/drm/scheduler/sched_main.c     | 159 +++++++++++++++++------------
>>    drivers/gpu/drm/v3d/v3d_sched.c            |   2 +-
>>    include/drm/gpu_scheduler.h                |   6 +-
>>    8 files changed, 102 insertions(+), 84 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 7cee269..a0e165c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>    if (!ring || !ring->sched.thread)
>>    continue;
>>
>> - drm_sched_stop(&ring->sched);
>> + drm_sched_stop(&ring->sched, &job->base);
>>
>>    /* after all hw jobs are reset, hw fence is meaningless, so force_completion */
>>    amdgpu_fence_driver_force_completion(ring);
>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>>    if(job)
>>    drm_sched_increase_karma(&job->base);
>>
>> -
>> -
>>    if (!amdgpu_sriov_vf(adev)) {
>>
>>    if (!need_full_reset)
>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
>>    return r;
>>    }
>>
>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> -   struct amdgpu_job *job)
>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>>    {
>>    int i;
>>
>> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>
>>    /* Post ASIC reset for all devs .*/
>>    list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>> - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
>> + amdgpu_device_post_asic_reset(tmp_adev);
>>
>>    if (r) {
>>    /* bad news, how to tell it to userspace ? */
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>> index 33854c9..5778d9c 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
>> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>        mmu_size + gpu->buffer.size;
>>
>>    /* Add in the active command buffers */
>> - spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>    list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
>>    submit = to_etnaviv_submit(s_job);
>>    file_size += submit->cmdbuf.size;
>>    n_obj++;
>>    }
>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>
>>    /* Add in the active buffer objects */
>>    list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
>> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>>          gpu->buffer.size,
>>          etnaviv_cmdbuf_get_va(&gpu->buffer));
>>
>> - spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
>>    list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
>>    submit = to_etnaviv_submit(s_job);
>>    etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
>>          submit->cmdbuf.vaddr, submit->cmdbuf.size,
>>          etnaviv_cmdbuf_get_va(&submit->cmdbuf));
>>    }
>> - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
>>
>>    /* Reserve space for the bomap */
>>    if (n_bomap_pages) {
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index 6d24fea..a813c82 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
>>    }
>>
>>    /* block scheduler */
>> - drm_sched_stop(&gpu->sched);
>> + drm_sched_stop(&gpu->sched, sched_job);
>>
>>    if(sched_job)
>>    drm_sched_increase_karma(sched_job);
>> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
>> index 97bd9c1..df98931 100644
>> --- a/drivers/gpu/drm/lima/lima_sched.c
>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>> @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
>>    static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe,
>>    struct lima_sched_task *task)
>>    {
>> - drm_sched_stop(&pipe->base);
>> + drm_sched_stop(&pipe->base, &task->base);
>>
>>    if (task)
>>    drm_sched_increase_karma(&task->base);
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 0a7ed04..c6336b7 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
>>    sched_job);
>>
>>    for (i = 0; i < NUM_JOB_SLOTS; i++)
>> - drm_sched_stop(&pfdev->js->queue[i].sched);
>> + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
>>
>>    if (sched_job)
>>    drm_sched_increase_karma(sched_job);
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 19fc601..7816de7 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
>>    }
>>    EXPORT_SYMBOL(drm_sched_resume_timeout);
>>
>> -/* job_finish is called after hw fence signaled
>> - */
>> -static void drm_sched_job_finish(struct work_struct *work)
>> -{
>> - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
>> -    finish_work);
>> - struct drm_gpu_scheduler *sched = s_job->sched;
>> - unsigned long flags;
>> -
>> - /*
>> - * Canceling the timeout without removing our job from the ring mirror
>> - * list is safe, as we will only end up in this worker if our jobs
>> - * finished fence has been signaled. So even if some another worker
>> - * manages to find this job as the next job in the list, the fence
>> - * signaled check below will prevent the timeout to be restarted.
>> - */
>> - cancel_delayed_work_sync(&sched->work_tdr);
>> -
>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>> - /* queue TDR for next job */
>> - drm_sched_start_timeout(sched);
>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> - sched->ops->free_job(s_job);
>> -}
>> -
>>    static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>    {
>>    struct drm_gpu_scheduler *sched = s_job->sched;
>> @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>    if (job)
>>    job->sched->ops->timedout_job(job);
>>
>> + /*
>> + * Guilty job did complete and hence needs to be manually removed
>> + * See drm_sched_stop doc.
>> + */
>> + if (list_empty(&job->node))
>> + job->sched->ops->free_job(job);
>> +
>>    spin_lock_irqsave(&sched->job_list_lock, flags);
>>    drm_sched_start_timeout(sched);
>>    spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
>>     * @sched: scheduler instance
>>     * @bad: bad scheduler job
>>     *
>> + * Stop the scheduler and also remove and free all completed jobs.
>> + * Note: the bad job will not be freed as it might be used later, so it
>> + * is the caller's responsibility to release it manually if it is no
>> + * longer part of the mirror list.
>> + *
>>     */
>> -void drm_sched_stop(struct drm_gpu_scheduler *sched)
>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>    {
>> - struct drm_sched_job *s_job;
>> + struct drm_sched_job *s_job, *tmp;
>>    unsigned long flags;
>> - struct dma_fence *last_fence =  NULL;
>>
>>    kthread_park(sched->thread);
>>
>>    /*
>> - * Verify all the signaled jobs in mirror list are removed from the ring
>> - * by waiting for the latest job to enter the list. This should insure that
>> - * also all the previous jobs that were in flight also already singaled
>> - * and removed from the list.
>> + * Iterate the job list from later to earlier one and either deactivate
>> + * their HW callbacks or remove them from the mirror list if they have
>> + * already signaled.
>> + * This iteration is thread safe as the sched thread is stopped.
>>    */
>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>> - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
>> + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
>>    if (s_job->s_fence->parent &&
>>        dma_fence_remove_callback(s_job->s_fence->parent,
>>          &s_job->cb)) {
>> @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched)
>>    s_job->s_fence->parent = NULL;
>>    atomic_dec(&sched->hw_rq_count);
>>    } else {
>> - last_fence = dma_fence_get(&s_job->s_fence->finished);
>> - break;
>> + /*
>> + * remove job from ring_mirror_list.
>> + * Locking here is for concurrent resume timeout
>> + */
>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>> + list_del_init(&s_job->node);
>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> +
>> + /*
>> + * Wait for job's HW fence callback to finish using s_job
>> + * before releasing it.
>> + *
>> + * Job is still alive so fence refcount at least 1
>> + */
>> + dma_fence_wait(&s_job->s_fence->finished, false);
>> +
>> + /*
>> + * We must keep bad job alive for later use during
>> + * recovery by some of the drivers
>> + */
>> + if (bad != s_job)
>> + sched->ops->free_job(s_job);
>>    }
>>    }
>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> - if (last_fence) {
>> - dma_fence_wait(last_fence, false);
>> - dma_fence_put(last_fence);
>> - }
>>    }
>>
>>    EXPORT_SYMBOL(drm_sched_stop);
>> @@ -418,21 +416,22 @@ EXPORT_SYMBOL(drm_sched_stop);
>>    void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>    {
>>    struct drm_sched_job *s_job, *tmp;
>> + unsigned long flags;
>>    int r;
>>
>> - if (!full_recovery)
>> - goto unpark;
>> -
>>    /*
>>    * Locking the list is not required here as the sched thread is parked
>> - * so no new jobs are being pushed in to HW and in drm_sched_stop we
>> - * flushed all the jobs who were still in mirror list but who already
>> - * signaled and removed them self from the list. Also concurrent
>> + * so no new jobs are being inserted or removed. Also concurrent
>>    * GPU recovers can't run in parallel.
>>    */
>>    list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>    struct dma_fence *fence = s_job->s_fence->parent;
>>
>> + atomic_inc(&sched->hw_rq_count);
>> +
>> + if (!full_recovery)
>> + continue;
>> +
>>    if (fence) {
>>    r = dma_fence_add_callback(fence, &s_job->cb,
>>       drm_sched_process_job);
>> @@ -445,9 +444,12 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>    drm_sched_process_job(NULL, &s_job->cb);
>>    }
>>
>> - drm_sched_start_timeout(sched);
>> + if (full_recovery) {
>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>> + drm_sched_start_timeout(sched);
>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> + }
>>
>> -unpark:
>>    kthread_unpark(sched->thread);
>>    }
>>    EXPORT_SYMBOL(drm_sched_start);
>> @@ -464,7 +466,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>    uint64_t guilty_context;
>>    bool found_guilty = false;
>>
>> - /*TODO DO we need spinlock here ? */
>>    list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>    struct drm_sched_fence *s_fence = s_job->s_fence;
>>
>> @@ -477,7 +478,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>    dma_fence_set_error(&s_fence->finished, -ECANCELED);
>>
>>    s_job->s_fence->parent = sched->ops->run_job(s_job);
>> - atomic_inc(&sched->hw_rq_count);
>>    }
>>    }
>>    EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>> @@ -514,7 +514,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>    return -ENOMEM;
>>    job->id = atomic64_inc_return(&sched->job_id_count);
>>
>> - INIT_WORK(&job->finish_work, drm_sched_job_finish);
>>    INIT_LIST_HEAD(&job->node);
>>
>>    return 0;
>> @@ -597,24 +596,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>    struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
>>    struct drm_sched_fence *s_fence = s_job->s_fence;
>>    struct drm_gpu_scheduler *sched = s_fence->sched;
>> - unsigned long flags;
>> -
>> - cancel_delayed_work(&sched->work_tdr);
>>
>>    atomic_dec(&sched->hw_rq_count);
>>    atomic_dec(&sched->num_jobs);
>>
>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>> - /* remove job from ring_mirror_list */
>> - list_del_init(&s_job->node);
>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> + trace_drm_sched_process_job(s_fence);
>>
>>    drm_sched_fence_finished(s_fence);
>> -
>> - trace_drm_sched_process_job(s_fence);
>>    wake_up_interruptible(&sched->wake_up_worker);
>> +}
>> +
>> +/**
>> + * drm_sched_cleanup_jobs - destroy finished jobs
>> + *
>> + * @sched: scheduler instance
>> + *
>> + * Remove all finished jobs from the mirror list and destroy them.
>> + */
>> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +{
>> + unsigned long flags;
>> +
>> + /* Don't destroy jobs while the timeout worker is running */
>> + if (!cancel_delayed_work(&sched->work_tdr))
>> + return;
>> +
>> +
>> + while (!list_empty(&sched->ring_mirror_list)) {
>> + struct drm_sched_job *job;
>> +
>> + job = list_first_entry(&sched->ring_mirror_list,
>> +        struct drm_sched_job, node);
>> + if (!dma_fence_is_signaled(&job->s_fence->finished))
>> + break;
>> +
>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>> + /* remove job from ring_mirror_list */
>> + list_del_init(&job->node);
>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> +
>> + sched->ops->free_job(job);
>> + }
>> +
>> + /* queue timeout for next job */
>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>> + drm_sched_start_timeout(sched);
>> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>
>> - schedule_work(&s_job->finish_work);
>>    }
>>
>>    /**
>> @@ -656,9 +684,10 @@ static int drm_sched_main(void *param)
>>    struct dma_fence *fence;
>>
>>    wait_event_interruptible(sched->wake_up_worker,
>> + (drm_sched_cleanup_jobs(sched),
>>    (!drm_sched_blocked(sched) &&
>>      (entity = drm_sched_select_entity(sched))) ||
>> - kthread_should_stop());
>> + kthread_should_stop()));
>>
>>    if (!entity)
>>    continue;
>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>> index e740f3b..1a4abe7 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
>>
>>    /* block scheduler */
>>    for (q = 0; q < V3D_MAX_QUEUES; q++)
>> - drm_sched_stop(&v3d->queue[q].sched);
>> + drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>
>>    if (sched_job)
>>    drm_sched_increase_karma(sched_job);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 0daca4d..9ee0f27 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>>     * @sched: the scheduler instance on which this job is scheduled.
>>     * @s_fence: contains the fences for the scheduling of job.
>>     * @finish_cb: the callback for the finished fence.
>> - * @finish_work: schedules the function @drm_sched_job_finish once the job has
>> - *               finished to remove the job from the
>> - *               @drm_gpu_scheduler.ring_mirror_list.
>>     * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
>>     * @id: a unique id assigned to each job scheduled on the scheduler.
>>     * @karma: increment on every hang caused by this job. If this exceeds the hang
>> @@ -188,7 +185,6 @@ struct drm_sched_job {
>>    struct drm_gpu_scheduler *sched;
>>    struct drm_sched_fence *s_fence;
>>    struct dma_fence_cb finish_cb;
>> - struct work_struct finish_work;
>>    struct list_head node;
>>    uint64_t id;
>>    atomic_t karma;
>> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>           void *owner);
>>    void drm_sched_job_cleanup(struct drm_sched_job *job);
>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>> -void drm_sched_stop(struct drm_gpu_scheduler *sched);
>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
>>    void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
>>    void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
>>    void drm_sched_increase_karma(struct drm_sched_job *bad);
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
