amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Lucas Stach <l.stach@pengutronix.de>
To: "Steven Price" <steven.price@arm.com>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Andrey Grodzovsky" <Andrey.Grodzovsky@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Alexander Deucher" <Alexander.Deucher@amd.com>
Cc: Emily Deng <Emily.Deng@amd.com>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status
Date: Wed, 25 Nov 2020 12:15:48 +0100	[thread overview]
Message-ID: <bec26ed329bf13da8362c9aba225f86089d805d9.camel@pengutronix.de> (raw)
In-Reply-To: <40741fd8-4496-2c12-7850-b68fe8dc8e2a@arm.com>

Am Mittwoch, den 25.11.2020, 11:04 +0000 schrieb Steven Price:
> On 25/11/2020 03:17, Luben Tuikov wrote:
> > The job timeout handler now returns status
> > indicating back to the DRM layer whether the job
> > was successfully cancelled or whether more time
> > should be given to the job to complete.
> 
> I'm not sure I understand in what circumstances you would want to give 
> the job more time to complete. Could you expand on that?

On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.

Regards,
Lucas

> One thing we're missing at the moment in Panfrost is the ability to 
> suspend ("soft stop" is the Mali jargon) a job and pick something else 
> to run. The propitiatory driver stack uses this to avoid timing out long 
> running jobs while still allowing other processes to have time on the 
> GPU. But this interface as it stands doesn't seem to provide that.
> 
> As the kernel test robot has already pointed out - you'll need to at the 
> very least update the other uses of this interface.
> 
> Steve
> 
> > Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 ++++--
> >   include/drm/gpu_scheduler.h             | 13 ++++++++++---
> >   2 files changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index ff48101bab55..81b73790ecc6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -28,7 +28,7 @@
> >   #include "amdgpu.h"
> >   #include "amdgpu_trace.h"
> >   
> > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   {
> >   	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> >   	struct amdgpu_job *job = to_amdgpu_job(s_job);
> > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   	    amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
> >   		DRM_ERROR("ring %s timeout, but soft recovered\n",
> >   			  s_job->sched->name);
> > -		return;
> > +		return 0;
> >   	}
> >   
> >   	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   
> >   	if (amdgpu_device_should_recover_gpu(ring->adev)) {
> >   		amdgpu_device_gpu_recover(ring->adev, job);
> > +		return 0;
> >   	} else {
> >   		drm_sched_suspend_timeout(&ring->sched);
> >   		if (amdgpu_sriov_vf(adev))
> >   			adev->virt.tdr_debug = true;
> > +		return 1;
> >   	}
> >   }
> >   
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 2e0c368e19f6..61f7121e1c19 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
> >   	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
> >   
> >   	/**
> > -         * @timedout_job: Called when a job has taken too long to execute,
> > -         * to trigger GPU recovery.
> > +	 * @timedout_job: Called when a job has taken too long to execute,
> > +	 * to trigger GPU recovery.
> > +	 *
> > +	 * Return 0, if the job has been aborted successfully and will
> > +	 * never be heard of from the device. Return non-zero if the
> > +	 * job wasn't able to be aborted, i.e. if more time should be
> > +	 * given to this job. The result is not "bool" as this
> > +	 * function is not a predicate, although its result may seem
> > +	 * as one.
> >   	 */
> > -	void (*timedout_job)(struct drm_sched_job *sched_job);
> > +	int (*timedout_job)(struct drm_sched_job *sched_job);
> >   
> >   	/**
> >            * @free_job: Called once the job's finished fence has been signaled
> > 

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  reply	other threads:[~2020-11-25 11:16 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-25 20:51 [PATCH v4] drm/scheduler: Avoid accessing freed bad job Andrey Grodzovsky
2019-11-25 20:51 ` Andrey Grodzovsky
2019-11-25 21:44 ` Deng, Emily
2019-11-25 21:44   ` Deng, Emily
2019-11-26  0:09   ` Grodzovsky, Andrey
2019-11-26  0:09     ` Grodzovsky, Andrey
     [not found]     ` <MWHPR12MB1453C6FC45A83482232CA3EDEA450-Gy0DoCVfaSWZBIDmKHdw+wdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-11-26 15:36       ` Deucher, Alexander
2019-11-26 15:36         ` Deucher, Alexander
2019-11-26 15:37 ` Andrey Grodzovsky
2019-11-26 15:37   ` Andrey Grodzovsky
     [not found]   ` <b8b716a7-e235-38b2-ea6d-0a21881fa64e-5C7GfCeVMHo@public.gmane.org>
2019-11-27  0:41     ` Deng, Emily
2019-11-27  0:41       ` Deng, Emily
2019-12-02 19:24       ` Deng, Emily
2019-12-03 19:10         ` Andrey Grodzovsky
2019-12-03 19:44           ` Deucher, Alexander
2019-12-03 19:57             ` Andrey Grodzovsky
2019-12-03 19:59               ` Deucher, Alexander
2019-12-03 20:32                 ` Andrey Grodzovsky
2019-12-03 20:58                   ` Deng, Emily
2019-12-03 19:53           ` Deng, Emily
2020-02-05 18:24 ` Lucas Stach
2020-02-06 11:10   ` Lucas Stach
2020-02-06 11:49     ` Christian König
2020-02-06 14:49       ` Alex Deucher
2020-02-06 14:51         ` Christian König
2020-02-06 15:49           ` Andrey Grodzovsky
2020-02-10 16:55             ` Andrey Grodzovsky
2020-02-10 21:50               ` Luben Tuikov
2020-02-11 15:55                 ` Andrey Grodzovsky
2020-02-11 21:27                   ` Andrey Grodzovsky
2020-02-12  0:53                     ` Luben Tuikov
2020-02-12 16:33                       ` Andrey Grodzovsky
2020-07-21 11:03                         ` Lucas Stach
2020-07-21 13:36                           ` Andrey Grodzovsky
2020-07-21 13:39                             ` Christian König
2020-07-21 13:42                               ` Andrey Grodzovsky
2020-07-21 18:29                                 ` Luben Tuikov
2020-11-25  3:17                                   ` [PATCH 0/6] Allow to extend the timeout without jobs disappearing Luben Tuikov
2020-11-25  3:17                                     ` [PATCH 1/6] drm/scheduler: "node" --> "list" Luben Tuikov
2020-11-25  9:44                                       ` Christian König
2020-11-25  3:17                                     ` [PATCH 2/6] gpu/drm: ring_mirror_list --> pending_list Luben Tuikov
2020-11-25  9:47                                       ` Christian König
2020-11-25 16:42                                         ` Luben Tuikov
2020-11-25  3:17                                     ` [PATCH 3/6] drm/scheduler: Job timeout handler returns status Luben Tuikov
2020-11-25  4:41                                       ` kernel test robot
2020-11-25  9:50                                       ` Christian König
2020-11-25 16:48                                         ` Luben Tuikov
2020-11-25 11:04                                       ` Steven Price
2020-11-25 11:15                                         ` Lucas Stach [this message]
2020-11-25 11:22                                           ` Steven Price
2020-11-25 11:47                                             ` Lucas Stach
2020-11-25 12:41                                         ` Christian König
2020-11-26 15:06                                       ` Andrey Grodzovsky
2020-11-25  3:17                                     ` [PATCH 4/6] drm/scheduler: Essentialize the job done callback Luben Tuikov
2020-11-25  9:51                                       ` Christian König
2020-11-25  3:17                                     ` [PATCH 5/6] drm/amdgpu: Don't hardcode thread name length Luben Tuikov
2020-11-25  9:55                                       ` Christian König
2020-11-25 17:01                                         ` Luben Tuikov
2020-11-26  8:11                                           ` Christian König
2020-11-25  3:17                                     ` [PATCH 6/6] drm/sched: Make use of a "done" thread Luben Tuikov
2020-11-25 10:10                                       ` Christian König
2020-11-26  0:24                                         ` Luben Tuikov
2020-11-25 11:09                                       ` Steven Price
2020-11-26  0:30                                         ` Luben Tuikov
2020-02-07 15:26           ` [PATCH v4] drm/scheduler: Avoid accessing freed bad job Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bec26ed329bf13da8362c9aba225f86089d805d9.camel@pengutronix.de \
    --to=l.stach@pengutronix.de \
    --cc=Alexander.Deucher@amd.com \
    --cc=Andrey.Grodzovsky@amd.com \
    --cc=Emily.Deng@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=luben.tuikov@amd.com \
    --cc=steven.price@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).