linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Koenig, Christian" <Christian.Koenig@amd.com>
To: Steven Price <steven.price@arm.com>,
	Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@linux.ie>
Cc: Sharat Masetty <smasetty@codeaurora.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Nayan Deshmukh <nayan26deshmukh@gmail.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: [PATCH v3] drm: Don't free jobs in wait_event_interruptible()
Date: Thu, 26 Sep 2019 13:50:29 +0000	[thread overview]
Message-ID: <d9b23ef8-f784-0bce-c34a-fa02002db1ea@amd.com> (raw)
In-Reply-To: <f045e1d6-5263-58c2-ce28-2cd215067bdb@arm.com>

Am 26.09.19 um 15:43 schrieb Steven Price:
> On 26/09/2019 14:37, Koenig, Christian wrote:
>> Am 26.09.19 um 14:31 schrieb Steven Price:
>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>> it is called as the condition of wait_event_interruptible() it must not
>>> sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep.
>>>
>>> Instead let's rename drm_sched_cleanup_jobs() to
>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>> there is one. The caller can then call the free_job() callback outside
>>> the wait_event_interruptible() where sleeping is possible before
>>> re-checking and returning to sleep if necessary.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>>
>>> Changes from v2:
>>>    * Actually move list_first_entry_or_null() within the lock
>>>
>>>    drivers/gpu/drm/scheduler/sched_main.c | 42 ++++++++++++++------------
>>>    1 file changed, 23 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..e4bd792f2b29 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>    }
>>>    
>>>    /**
>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>     *
>>>     * @sched: scheduler instance
>>>     *
>>> - * Remove all finished jobs from the mirror list and destroy them.
>>> + * Returns the next finished job from the mirror list (if there is one)
>>> + * ready for it to be destroyed.
>>>     */
>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>> +static struct drm_sched_job *
>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>    {
>>> +	struct drm_sched_job *job = NULL;
>>>    	unsigned long flags;
>>>    
>>>    	/* Don't destroy jobs while the timeout worker is running */
>>>    	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>    	    !cancel_delayed_work(&sched->work_tdr))
>>> -		return;
>>> -
>>> +		return NULL;
>>>    
>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>> -		struct drm_sched_job *job;
>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>    
>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>    				       struct drm_sched_job, node);
>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>> -			break;
>>>    
>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>    		/* remove job from ring_mirror_list */
>>>    		list_del_init(&job->node);
>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> -
>>> -		sched->ops->free_job(job);
>>> +	} else {
>>> +		job = NULL;
>>> +		/* queue timeout for next job */
>>> +		drm_sched_start_timeout(sched);
>> Sorry I've only seen this right now: This won't work.
>>
>> See you always need to start the timeout, or cancel_delayed_work() will
>> abort the whole thing after returning the first job.
> Good spot!
>
> My logic was that drm_sched_main() would call again and that would
> re-arm the timeout after the job has been completed.
>
> But looking at this more closely I see that isn't actually going to work
> reliably - if the work has been cancelled the first time then
> cancel_delayed_work() will return false causing an early out and the
> timeout isn't re-armed.
>
> Is it as simple as moving drm_sched_start_timeout() out of the else case
> and unconditionally re-arming the timeout? I'm worried about races with
> the job that is being returned, although it has already been removed
> from ring_mirror_list so perhaps it's safe to do?
>
> Alternatively the caller could manually re-arm the timeout after
> handling the job free.

I don't see anything that could go wrong immediately, but that is 
probably the cleaner approach.

So please do this and since we are now doing more than just removing the 
job, please keep that in a separate function.

Regards,
Christian.

>
> Steve
>
>> Christian.
>>
>>>    	}
>>>    
>>> -	/* queue timeout for next job */
>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>> -	drm_sched_start_timeout(sched);
>>>    	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>    
>>> +	return job;
>>>    }
>>>    
>>>    /**
>>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param)
>>>    		struct drm_sched_fence *s_fence;
>>>    		struct drm_sched_job *sched_job;
>>>    		struct dma_fence *fence;
>>> +		struct drm_sched_job *cleanup_job = NULL;
>>>    
>>>    		wait_event_interruptible(sched->wake_up_worker,
>>> -					 (drm_sched_cleanup_jobs(sched),
>>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>    					 (!drm_sched_blocked(sched) &&
>>>    					  (entity = drm_sched_select_entity(sched))) ||
>>> -					 kthread_should_stop()));
>>> +					 kthread_should_stop());
>>> +
>>> +		while (cleanup_job) {
>>> +			sched->ops->free_job(cleanup_job);
>>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>>> +		}
>>>    
>>>    		if (!entity)
>>>    			continue;
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>


      reply	other threads:[~2019-09-26 13:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-26 12:31 [PATCH v3] drm: Don't free jobs in wait_event_interruptible() Steven Price
2019-09-26 13:37 ` Koenig, Christian
2019-09-26 13:43   ` Steven Price
2019-09-26 13:50     ` Koenig, Christian [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9b23ef8-f784-0bce-c34a-fa02002db1ea@amd.com \
    --to=christian.koenig@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=airlied@linux.ie \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nayan26deshmukh@gmail.com \
    --cc=smasetty@codeaurora.org \
    --cc=steven.price@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).