From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
To: "Zhang, Jack (Jian)", "dri-devel@lists.freedesktop.org", "amd-gfx@lists.freedesktop.org", "Koenig, Christian", "Grodzovsky, Andrey", "Liu, Monk", "Deng, Emily", Rob Herring, Tomeu Vizoso, Steven Price
References: <20210315052036.1113638-1-Jack.Zhang1@amd.com>
From: Christian König
Message-ID: <9b48b715-52dd-e435-2873-2472427dffda@gmail.com>
Date: Wed, 17 Mar 2021 08:43:35 +0100
List-Id: Direct Rendering Infrastructure - Development

I was hoping Andrey would take a look since I'm really busy with other work right now.

Regards,
Christian.

On 17.03.21 at 07:46, Zhang, Jack (Jian) wrote:
> Hi Andrey, Christian, and team,
>
> I haven't received any review feedback from the panfrost maintainers for several days.
> This patch is urgent for my current project.
> Could you please help review it?
>
> Many Thanks,
> Jack
> -----Original Message-----
> From: Zhang, Jack (Jian)
> Sent: Tuesday, March 16, 2021 3:20 PM
> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian; Grodzovsky, Andrey; Liu, Monk; Deng, Emily; Rob Herring; Tomeu Vizoso; Steven Price
> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> [AMD Public Use]
>
> Ping
>
> -----Original Message-----
> From: Zhang, Jack (Jian)
> Sent: Monday, March 15, 2021 1:24 PM
> To: Jack Zhang; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian; Grodzovsky, Andrey; Liu, Monk; Deng, Emily; Rob Herring; Tomeu Vizoso; Steven Price
> Subject: RE: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> [AMD Public Use]
>
> Hi, Rob/Tomeu/Steven,
>
> Would you please help to review this patch for panfrost driver?
>
> Thanks,
> Jack Zhang
>
> -----Original Message-----
> From: Jack Zhang
> Sent: Monday, March 15, 2021 1:21 PM
> To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian; Grodzovsky, Andrey; Liu, Monk; Deng, Emily
> Cc: Zhang, Jack (Jian)
> Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
>
> re-insert Bailing jobs to avoid memory leak.
>
> V2: move re-insert step to drm/scheduler logic
> V3: add panfrost's return value for bailing jobs in case it hits the memleak issue.
>
> Signed-off-by: Jack Zhang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++++++--
>  drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
>  drivers/gpu/drm/scheduler/sched_main.c     | 8 +++++++-
>  include/drm/gpu_scheduler.h                | 1 +
>  5 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 79b9cc73763f..86463b0f936e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4815,8 +4815,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  			job ? job->base.id : -1);
>
>  		/* even we skipped this reset, still need to set the job to guilty */
> -		if (job)
> +		if (job) {
>  			drm_sched_increase_karma(&job->base);
> +			r = DRM_GPU_SCHED_STAT_BAILING;
> +		}
>  		goto skip_recovery;
>  	}
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 759b34799221..41390bdacd9e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -34,6 +34,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>  	struct amdgpu_job *job = to_amdgpu_job(s_job);
>  	struct amdgpu_task_info ti;
>  	struct amdgpu_device *adev = ring->adev;
> +	int ret;
>
>  	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>
> @@ -52,8 +53,11 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>  		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
>
>  	if (amdgpu_device_should_recover_gpu(ring->adev)) {
> -		amdgpu_device_gpu_recover(ring->adev, job);
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		ret = amdgpu_device_gpu_recover(ring->adev, job);
> +		if (ret == DRM_GPU_SCHED_STAT_BAILING)
> +			return DRM_GPU_SCHED_STAT_BAILING;
> +		else
> +			return DRM_GPU_SCHED_STAT_NOMINAL;
>  	} else {
>  		drm_sched_suspend_timeout(&ring->sched);
>  		if (amdgpu_sriov_vf(adev))
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 6003cfeb1322..e2cb4f32dae1 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>  	 * spurious. Bail out.
>  	 */
>  	if (dma_fence_is_signaled(job->done_fence))
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		return DRM_GPU_SCHED_STAT_BAILING;
>
>  	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",
>  		js,
> @@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
>
>  	/* Scheduler is already stopped, nothing to do. */
>  	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
> -		return DRM_GPU_SCHED_STAT_NOMINAL;
> +		return DRM_GPU_SCHED_STAT_BAILING;
>
>  	/* Schedule a reset if there's no reset in progress. */
>  	if (!atomic_xchg(&pfdev->reset.pending, 1))
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 92d8de24d0a1..a44f621fb5c4 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
>  {
>  	struct drm_gpu_scheduler *sched;
>  	struct drm_sched_job *job;
> +	int ret;
>
>  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>
> @@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
>  		list_del_init(&job->list);
>  		spin_unlock(&sched->job_list_lock);
>
> -		job->sched->ops->timedout_job(job);
> +		ret = job->sched->ops->timedout_job(job);
>
> +		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
> +			spin_lock(&sched->job_list_lock);
> +			list_add(&job->node, &sched->ring_mirror_list);
> +			spin_unlock(&sched->job_list_lock);
> +		}
>  		/*
>  		 * Guilty job did complete and hence needs to be manually removed
>  		 * See drm_sched_stop doc.
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 4ea8606d91fe..8093ac2427ef 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
>  	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
>  	DRM_GPU_SCHED_STAT_NOMINAL,
>  	DRM_GPU_SCHED_STAT_ENODEV,
> +	DRM_GPU_SCHED_STAT_BAILING,
>  };
>
>  /**
> --
> 2.25.1
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
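
For readers following the thread, the re-insert idea in the quoted sched_main.c hunk can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: struct job, struct sched, the STAT_* values, handle_timeout() and the two driver callbacks are hypothetical names invented here, a singly linked list stands in for the scheduler's job list, and locking is left out. It shows a timeout handler that detaches a job, asks the driver callback what happened, and puts the job back when the callback reports that it bailed out, so that whatever later walks the list can still find and free the job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, loosely mirroring the enum touched by the patch. */
enum sched_stat { STAT_NONE, STAT_NOMINAL, STAT_ENODEV, STAT_BAILING };

/* Hypothetical job: just an id and a link to the next pending job. */
struct job {
	int id;
	struct job *next;
};

/* Hypothetical scheduler: a pending list plus the driver's timeout callback. */
struct sched {
	struct job *pending;
	enum sched_stat (*timedout_job)(struct job *job);
};

/* Insert a job at the head of the pending list. */
static void pending_push(struct sched *s, struct job *j)
{
	assert(s != NULL && j != NULL);
	j->next = s->pending;
	s->pending = j;
}

/* Detach and return the head of the pending list, or NULL if it is empty. */
static struct job *pending_pop(struct sched *s)
{
	struct job *j = s->pending;

	if (j) {
		s->pending = j->next;
		j->next = NULL;
	}
	return j;
}

/* Count the jobs currently on the pending list. */
static size_t pending_count(const struct sched *s)
{
	size_t n = 0;

	for (const struct job *j = s->pending; j; j = j->next)
		n++;
	return n;
}

/*
 * Timeout path: detach the oldest job and hand it to the driver callback.
 * If the driver bailed out without handling the job, re-insert it so it is
 * not lost. On any other status this sketch assumes the driver's recovery
 * path has taken ownership of the job.
 */
static enum sched_stat handle_timeout(struct sched *s)
{
	struct job *j = pending_pop(s);

	if (!j)
		return STAT_NONE;

	enum sched_stat st = s->timedout_job(j);

	if (st == STAT_BAILING)
		pending_push(s, j);
	return st;
}

/* Hypothetical driver callbacks used to exercise both paths. */
static enum sched_stat driver_bails(struct job *j)
{
	(void)j;
	return STAT_BAILING;
}

static enum sched_stat driver_recovers(struct job *j)
{
	(void)j;
	return STAT_NOMINAL;
}
```

With driver_bails installed as the callback, a job pushed onto the list is still on it after handle_timeout() returns; with driver_recovers it is not. Without the re-insert step, the bailing case would leave the job detached with nothing left to free it, which is the leak the patch description refers to.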