From: Daniel Vetter <daniel@ffwll.ch>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Monk Liu <Monk.Liu@amd.com>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	jingwen chen <jingwen.chen@amd.com>
Subject: Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler
Date: Tue, 31 Aug 2021 16:03:32 +0200
Message-ID: <YS42tI6qAUb3yqOk@phenom.ffwll.local>
In-Reply-To: <29be989b-c2a5-69b3-f0b8-2df52f50047f@amd.com>

On Tue, Aug 31, 2021 at 09:53:36AM -0400, Andrey Grodzovsky wrote:
> It says patch [2/2] but I can't find patch 1
> 
> On 2021-08-31 6:35 a.m., Monk Liu wrote:
> > tested-by: jingwen chen <jingwen.chen@amd.com>
> > Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> > Signed-off-by: jingwen chen <jingwen.chen@amd.com>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 24 ++++--------------------
> >   1 file changed, 4 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index ecf8140..894fdb24 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -319,19 +319,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
> >   	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> >   	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> > +	if (!__kthread_should_park(sched->thread))
> > +		kthread_park(sched->thread);
> > +
> 
> 
> As mentioned before, without serializing against other TDR handlers from
> other schedulers you just race here against them, e.g. you parked it now
> but another one in progress will unpark it as part of calling
> drm_sched_start for other rings [1].
> Unless I am missing something, since I haven't found patch [1/2].
> 
> [1] - https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L5041

You need to have your own wq and run all your TDR work on that same wq if
your reset has any cross-engine impact.

See

https://dri.freedesktop.org/docs/drm/gpu/drm-mm.html#c.drm_sched_backend_ops

for the ->timedout_job callback docs. I thought I brought this up already?
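
To make the cross-engine case concrete, here is a rough userspace model,
not kernel code. It assumes three rings in one reset domain and that every
ring's timeout work lands on one shared queue whose items run strictly one
at a time. All names in it (model_sched, ordered_wq, tdr_handler,
run_model) are made up for illustration and are not scheduler API.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 3   /* assumed number of rings sharing one reset domain */
#define QUEUE_CAP 8   /* capacity of the model queue */

/* Stand-in for a scheduler; "parked" models its thread being parked. */
struct model_sched {
	bool parked;
};

/* One queued timeout (TDR) work item, tagged with the ring that hung. */
struct tdr_work {
	int ring;
};

/* Stand-in for a shared queue whose items execute one at a time, in order. */
struct ordered_wq {
	struct tdr_work items[QUEUE_CAP];
	size_t head, tail;
};

static struct model_sched rings[NUM_RINGS];
int resets_done;      /* number of reset handlers that ran to completion */
int saw_parked_ring;  /* times a handler found a ring already parked */

/* Append a timeout work item; returns false if the queue is full. */
static bool wq_queue(struct ordered_wq *wq, int ring)
{
	if (wq->tail == QUEUE_CAP)
		return false;
	wq->items[wq->tail++].ring = ring;
	return true;
}

/* Cross-engine reset: park every ring, "reset", then restart every ring. */
static void tdr_handler(const struct tdr_work *work)
{
	(void)work; /* the hung ring does not matter for this model */

	for (int i = 0; i < NUM_RINGS; i++) {
		/* A ring that is already parked means another reset is in flight. */
		if (rings[i].parked)
			saw_parked_ring++;
		rings[i].parked = true;
	}

	/* The hardware reset would happen here. */

	for (int i = 0; i < NUM_RINGS; i++)
		rings[i].parked = false;
	resets_done++;
}

/* Run queued items in FIFO order; the next starts only after the previous returns. */
static void wq_drain(struct ordered_wq *wq)
{
	while (wq->head < wq->tail)
		tdr_handler(&wq->items[wq->head++]);
}

/*
 * Queue one timeout per ring on the shared queue and run them all.
 * Returns how often a handler started while rings were still parked,
 * or -1 if the queue overflowed.
 */
int run_model(void)
{
	struct ordered_wq wq = { .head = 0, .tail = 0 };

	resets_done = 0;
	saw_parked_ring = 0;
	for (int i = 0; i < NUM_RINGS; i++)
		rings[i].parked = false;

	for (int i = 0; i < NUM_RINGS; i++)
		if (!wq_queue(&wq, i))
			return -1;

	wq_drain(&wq);
	return saw_parked_ring;
}
```

With one shared queue run_model() returns 0: no handler ever starts while
another one still has rings parked, so nothing gets unparked behind a
handler's back. Once handlers for different rings can run concurrently that
guarantee is gone, which is the race Andrey describes above.
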
-Daniel

> 
> Andrey
> 
> 
> >   	spin_lock(&sched->job_list_lock);
> >   	job = list_first_entry_or_null(&sched->pending_list,
> >   				       struct drm_sched_job, list);
> >   	if (job) {
> > -		/*
> > -		 * Remove the bad job so it cannot be freed by concurrent
> > -		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
> > -		 * is parked at which point it's safe.
> > -		 */
> > -		list_del_init(&job->list);
> >   		spin_unlock(&sched->job_list_lock);
> > +		/* vendor's timeout_job should call drm_sched_start() */
> >   		status = job->sched->ops->timedout_job(job);
> >   		/*
> > @@ -393,20 +391,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> >   	kthread_park(sched->thread);
> >   	/*
> > -	 * Reinsert back the bad job here - now it's safe as
> > -	 * drm_sched_get_cleanup_job cannot race against us and release the
> > -	 * bad job at this point - we parked (waited for) any in progress
> > -	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> > -	 * now until the scheduler thread is unparked.
> > -	 */
> > -	if (bad && bad->sched == sched)
> > -		/*
> > -		 * Add at the head of the queue to reflect it was the earliest
> > -		 * job extracted.
> > -		 */
> > -		list_add(&bad->list, &sched->pending_list);
> > -
> > -	/*
> >   	 * Iterate the job list from later to  earlier one and either deactive
> >   	 * their HW callbacks or remove them from pending list if they already
> >   	 * signaled.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
