All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: Monk Liu <Monk.Liu@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/sched: fix the bug of time out calculation
Date: Tue, 24 Aug 2021 11:01:44 -0400	[thread overview]
Message-ID: <ad3529fd-0a9c-e095-6cce-8063072f43be@amd.com> (raw)
In-Reply-To: <c83ad5a4-0adc-3940-ff86-b9e194235604@amd.com>


On 2021-08-24 10:46 a.m., Andrey Grodzovsky wrote:
>
> On 2021-08-24 5:51 a.m., Monk Liu wrote:
>> the original logic is wrong that the timeout will not be retriggerd
>> after the previous job siganled, and that lead to the scenario that all
>> jobs in the same scheduler shares the same timeout timer from the very
>> begining job in this scheduler which is wrong.
>>
>> we should modify the timer everytime a previous job signaled.
>>
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index a2a9536..fb27025 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -235,6 +235,13 @@ static void drm_sched_start_timeout(struct 
>> drm_gpu_scheduler *sched)
>>           schedule_delayed_work(&sched->work_tdr, sched->timeout);
>>   }
>>   +static void drm_sched_restart_timeout(struct drm_gpu_scheduler 
>> *sched)
>> +{
>> +    if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> +        !list_empty(&sched->pending_list))
>> +        mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout);


3d point - if list empty you need to cancel the timer, let the new job 
coming after that restart it.

Andrey


>> +}
>> +
>>   /**
>>    * drm_sched_fault - immediately start timeout handler
>>    *
>> @@ -693,6 +700,11 @@ drm_sched_get_cleanup_job(struct 
>> drm_gpu_scheduler *sched)
>>       if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>           /* remove job from pending_list */
>>           list_del_init(&job->list);
>> +
>> +        /* once the job deleted from pending list we should restart
>> +         * the timeout calculation for the next job.
>> +         */
>> +        drm_sched_restart_timeout(sched);
>
>
> I think this should work, but 2 points -
>
> 1st you should probably remove this now 
> https://elixir.bootlin.com/linux/v5.14-rc1/source/drivers/gpu/drm/scheduler/sched_main.c#L797
>
> 2nd - if you have two adjacent jobs started very closely you 
> effectively letting the second job to be twice longer hang without TDR 
> because
> you reset TDR timer for it when it's almost expired. If we could have 
> TTL (time to live counter) for each job and then do mod_delayed_work
> to the TTL of the following job instead of just full timer reset then 
> this would be more precise. But this is more of recommendation for 
> improvement.
>
> Andrey
>
>
>>           /* make the scheduled timestamp more accurate */
>>           next = list_first_entry_or_null(&sched->pending_list,
>>                           typeof(*next), list);

  reply	other threads:[~2021-08-24 15:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24  9:51 [PATCH] drm/sched: fix the bug of time out calculation Monk Liu
2021-08-24 14:46 ` Andrey Grodzovsky
2021-08-24 15:01   ` Andrey Grodzovsky [this message]
2021-08-25  4:13     ` Liu, Monk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad3529fd-0a9c-e095-6cce-8063072f43be@amd.com \
    --to=andrey.grodzovsky@amd.com \
    --cc=Monk.Liu@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.