From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A1D3C433F5 for ; Wed, 15 Sep 2021 11:45:46 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2D7086128B for ; Wed, 15 Sep 2021 11:45:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2D7086128B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2B76F6E91C; Wed, 15 Sep 2021 11:45:43 +0000 (UTC) Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by gabe.freedesktop.org (Postfix) with ESMTPS id 47F106E914; Wed, 15 Sep 2021 11:45:41 +0000 (UTC) Received: by mail-wr1-x430.google.com with SMTP id u18so1671432wrg.5; Wed, 15 Sep 2021 04:45:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language; bh=Zh48JfzZnNO3eFLiUCBCArgXr8Y4evHfiZod73CAP08=; b=lb7XFxvvBz+WDuLsqyRTd+XLq/qJzIuYbQnfhIclZ4IRlLqys4Ms+yTCdfcrQft5d6 q7EJ5GQfJEFJmba3Llu1Ibs7CIzXP4b0MIGUY+JZdR1aHy0eci3XvC0LGfFSdIiR9K8u 1/8nGH/ROO/tE/6CR66XPb4QsFRxGPTvfX5cAy1edHr9/qTjQTPh0o9o/TaKntkLY8V8 b4EXT4FbJbB5fSN6y4J9uPThqdmz+ffzjz4KMG/2HVt4umZAFgo/LYu4jaewOZYAqmyw SoJM7K8PnOj79GJ/lLMtEcZYeGeob9V5OlupiV+krkUwbfVRTzI54uky4Dmm+wWF+U93 D85Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=Zh48JfzZnNO3eFLiUCBCArgXr8Y4evHfiZod73CAP08=; b=icdkCPqxjwrg5XdHfqVzE+LMjoZZYW0Eg5mIJRQ87r6bXGPzldlYttT7N3xx7tT3zj z6oaxE12HGViiRpvNql/pf13/eqZp6H7+N7X0E8B60o4SzcGE4EOm9KJs549+M4sYoX2 JiydKlDn/yL0s/2dGIw8BmP8Ju61jfmQCg0C8u7h6hcaVjZEJV8RD86kN0gFVo1nRk09 Hn0rivaqLeAT0cd5ePAtzfmsRI4mB+Al5NRRAvCZGgmQXj70GCqWnnKxRt9EIWfL4AXG QQ3U3EZEOKhVenSoKN1U8DHs51vOW03Quir+kJZREA2on7EhcfHPldLraVQ5o7ZyYmC3 g30w== X-Gm-Message-State: AOAM531tMTRdsYDTmkEtUx0Gv2aac52coAgIz4s9BDQFF5hBG4NaNfM4 Bkx3SfTJCCxoSq5tkefix6lNbVtY5GM= X-Google-Smtp-Source: ABdhPJyQ0lh8msPS3rDvTKdm9hldBPrb7m5J5mO+VvjXwqod3GTxUaDqRdHSB8yza1aCL9dQmlXUeQ== X-Received: by 2002:a05:6000:14d:: with SMTP id r13mr4639348wrx.420.1631706339795; Wed, 15 Sep 2021 04:45:39 -0700 (PDT) Received: from ?IPv6:2a02:908:1252:fb60:d03d:8939:3840:1f95? ([2a02:908:1252:fb60:d03d:8939:3840:1f95]) by smtp.gmail.com with ESMTPSA id c23sm4048630wmb.37.2021.09.15.04.45.38 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 15 Sep 2021 04:45:39 -0700 (PDT) Subject: Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4) To: "Grodzovsky, Andrey" , Alex Deucher Cc: "Liu, Monk" , amd-gfx list , Maling list - DRI developers References: <1630457207-13107-1-git-send-email-Monk.Liu@amd.com> <28709f7f-8a48-40ad-87bb-c2f0dd89da38@gmail.com> From: =?UTF-8?Q?Christian_K=c3=b6nig?= Message-ID: <9fef7287-5d2c-1d25-afa8-1c621be8fe67@gmail.com> Date: Wed, 15 Sep 2021 13:45:38 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/alternative; boundary="------------489A5B73750A51ACB45DB9F1" Content-Language: en-US X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" This is a multi-part message in MIME format. --------------489A5B73750A51ACB45DB9F1 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Yes, I think so as well. Andrey can you push this? Christian. Am 15.09.21 um 00:59 schrieb Grodzovsky, Andrey: > AFAIK this one is independent. > > Christian, can you confirm ? > > Andrey > ------------------------------------------------------------------------ > *From:* amd-gfx on behalf of > Alex Deucher > *Sent:* 14 September 2021 15:33 > *To:* Christian König > *Cc:* Liu, Monk ; amd-gfx list > ; Maling list - DRI developers > > *Subject:* Re: [PATCH 1/2] drm/sched: fix the bug of time out > calculation(v4) > Was this fix independent of the other discussions?  Should this be > applied to drm-misc? > > Alex > > On Wed, Sep 1, 2021 at 4:42 PM Alex Deucher wrote: > > > > On Wed, Sep 1, 2021 at 2:50 AM Christian König > > wrote: > > > > > > Am 01.09.21 um 02:46 schrieb Monk Liu: > > > > issue: > > > > in cleanup_job the cancle_delayed_work will cancel a TO timer > > > > even the its corresponding job is still running. > > > > > > > > fix: > > > > do not cancel the timer in cleanup_job, instead do the cancelling > > > > only when the heading job is signaled, and if there is a "next" job > > > > we start_timeout again. > > > > > > > > v2: > > > > further cleanup the logic, and do the TDR timer cancelling if > the signaled job > > > > is the last one in its scheduler. > > > > > > > > v3: > > > > change the issue description > > > > remove the cancel_delayed_work in the begining of the cleanup_job > > > > recover the implement of drm_sched_job_begin. > > > > > > > > v4: > > > > remove the kthread_should_park() checking in cleanup_job routine, > > > > we should cleanup the signaled job asap > > > > > > > > TODO: > > > > 1)introduce pause/resume scheduler in job_timeout to serial the > handling > > > > of scheduler and job_timeout. > > > > 2)drop the bad job's del and insert in scheduler due to above > serialization > > > > (no race issue anymore with the serialization) > > > > > > > > tested-by: jingwen > > > > Signed-off-by: Monk Liu > > > > > > Reviewed-by: Christian König > > > > > > > Are you planning to push this to drm-misc? > > > > Alex > > > > > > > > --- > > > >   drivers/gpu/drm/scheduler/sched_main.c | 26 > +++++++++----------------- > > > >   1 file changed, 9 insertions(+), 17 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > > > > index a2a9536..3e0bbc7 100644 > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > > @@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct > drm_gpu_scheduler *sched) > > > >   { > > > >       struct drm_sched_job *job, *next; > > > > > > > > -     /* > > > > -      * Don't destroy jobs while the timeout worker is running  > OR thread > > > > -      * is being parked and hence assumed to not touch pending_list > > > > -      */ > > > > -     if ((sched->timeout != MAX_SCHEDULE_TIMEOUT && > > > > - !cancel_delayed_work(&sched->work_tdr)) || > > > > -         kthread_should_park()) > > > > -             return NULL; > > > > - > > > > spin_lock(&sched->job_list_lock); > > > > > > > >       job = list_first_entry_or_null(&sched->pending_list, > > > > @@ -693,17 +684,21 @@ drm_sched_get_cleanup_job(struct > drm_gpu_scheduler *sched) > > > >       if (job && dma_fence_is_signaled(&job->s_fence->finished)) { > > > >               /* remove job from pending_list */ > > > > list_del_init(&job->list); > > > > + > > > > +             /* cancel this job's TO timer */ > > > > + cancel_delayed_work(&sched->work_tdr); > > > >               /* make the scheduled timestamp more accurate */ > > > >               next = list_first_entry_or_null(&sched->pending_list, > > > > typeof(*next), list); > > > > -             if (next) > > > > + > > > > +             if (next) { > > > > next->s_fence->scheduled.timestamp = > > > > job->s_fence->finished.timestamp; > > > > - > > > > +                     /* start TO timer for next job */ > > > > + drm_sched_start_timeout(sched); > > > > +             } > > > >       } else { > > > >               job = NULL; > > > > -             /* queue timeout for next job */ > > > > - drm_sched_start_timeout(sched); > > > >       } > > > > > > > > spin_unlock(&sched->job_list_lock); > > > > @@ -791,11 +786,8 @@ static int drm_sched_main(void *param) > > > > (entity = drm_sched_select_entity(sched))) || > > > > kthread_should_stop()); > > > > > > > > -             if (cleanup_job) { > > > > +             if (cleanup_job) > > > > sched->ops->free_job(cleanup_job); > > > > -                     /* queue timeout for next job */ > > > > - drm_sched_start_timeout(sched); > > > > -             } > > > > > > > >               if (!entity) > > > >                       continue; > > > --------------489A5B73750A51ACB45DB9F1 Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: 8bit Yes, I think so as well. Andrey can you push this?

Christian.

Am 15.09.21 um 00:59 schrieb Grodzovsky, Andrey:
AFAIK this one is independent.

Christian, can you confirm ?

Andrey

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Alex Deucher <alexdeucher@gmail.com>
Sent: 14 September 2021 15:33
To: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: Liu, Monk <Monk.Liu@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)
 
Was this fix independent of the other discussions?  Should this be
applied to drm-misc?

Alex

On Wed, Sep 1, 2021 at 4:42 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Wed, Sep 1, 2021 at 2:50 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Am 01.09.21 um 02:46 schrieb Monk Liu:
> > > issue:
> > > in cleanup_job the cancle_delayed_work will cancel a TO timer
> > > even the its corresponding job is still running.
> > >
> > > fix:
> > > do not cancel the timer in cleanup_job, instead do the cancelling
> > > only when the heading job is signaled, and if there is a "next" job
> > > we start_timeout again.
> > >
> > > v2:
> > > further cleanup the logic, and do the TDR timer cancelling if the signaled job
> > > is the last one in its scheduler.
> > >
> > > v3:
> > > change the issue description
> > > remove the cancel_delayed_work in the begining of the cleanup_job
> > > recover the implement of drm_sched_job_begin.
> > >
> > > v4:
> > > remove the kthread_should_park() checking in cleanup_job routine,
> > > we should cleanup the signaled job asap
> > >
> > > TODO:
> > > 1)introduce pause/resume scheduler in job_timeout to serial the handling
> > > of scheduler and job_timeout.
> > > 2)drop the bad job's del and insert in scheduler due to above serialization
> > > (no race issue anymore with the serialization)
> > >
> > > tested-by: jingwen <jingwen.chen@@amd.com>
> > > Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> >
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> >
>
> Are you planning to push this to drm-misc?
>
> Alex
>
>
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 26 +++++++++-----------------
> > >   1 file changed, 9 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index a2a9536..3e0bbc7 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > >   {
> > >       struct drm_sched_job *job, *next;
> > >
> > > -     /*
> > > -      * Don't destroy jobs while the timeout worker is running  OR thread
> > > -      * is being parked and hence assumed to not touch pending_list
> > > -      */
> > > -     if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > > -         !cancel_delayed_work(&sched->work_tdr)) ||
> > > -         kthread_should_park())
> > > -             return NULL;
> > > -
> > >       spin_lock(&sched->job_list_lock);
> > >
> > >       job = list_first_entry_or_null(&sched->pending_list,
> > > @@ -693,17 +684,21 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > >       if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> > >               /* remove job from pending_list */
> > >               list_del_init(&job->list);
> > > +
> > > +             /* cancel this job's TO timer */
> > > +             cancel_delayed_work(&sched->work_tdr);
> > >               /* make the scheduled timestamp more accurate */
> > >               next = list_first_entry_or_null(&sched->pending_list,
> > >                                               typeof(*next), list);
> > > -             if (next)
> > > +
> > > +             if (next) {
> > >                       next->s_fence->scheduled.timestamp =
> > >                               job->s_fence->finished.timestamp;
> > > -
> > > +                     /* start TO timer for next job */
> > > +                     drm_sched_start_timeout(sched);
> > > +             }
> > >       } else {
> > >               job = NULL;
> > > -             /* queue timeout for next job */
> > > -             drm_sched_start_timeout(sched);
> > >       }
> > >
> > >       spin_unlock(&sched->job_list_lock);
> > > @@ -791,11 +786,8 @@ static int drm_sched_main(void *param)
> > >                                         (entity = drm_sched_select_entity(sched))) ||
> > >                                        kthread_should_stop());
> > >
> > > -             if (cleanup_job) {
> > > +             if (cleanup_job)
> > >                       sched->ops->free_job(cleanup_job);
> > > -                     /* queue timeout for next job */
> > > -                     drm_sched_start_timeout(sched);
> > > -             }
> > >
> > >               if (!entity)
> > >                       continue;
> >

--------------489A5B73750A51ACB45DB9F1--