dri-devel.lists.freedesktop.org archive mirror
* [PATCH] drm/scheduler: put killed job cleanup to worker
@ 2019-07-03 10:28 Lucas Stach
  2019-07-03 14:23 ` Grodzovsky, Andrey
  0 siblings, 1 reply; 6+ messages in thread
From: Lucas Stach @ 2019-07-03 10:28 UTC (permalink / raw)
  To: Christian König, Andrey Grodzovsky; +Cc: kernel, dri-devel, patchwork-lst

drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
job finished fence signaling. As this might happen from IRQ context we
now end up calling the GPU driver free_job callback in IRQ context, while
all other paths call it from normal process context.

Etnaviv in particular calls core kernel functions that are only valid to
be called from process context when freeing the job. Other drivers might
have similar issues, but I did not validate this. Fix this by punting
the cleanup work into a work item, so the driver expectations are met.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 28 ++++++++++++++----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 35ddbec1375a..ba4eb66784b9 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 }
 EXPORT_SYMBOL(drm_sched_entity_flush);
 
-/**
- * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs
- *
- * @f: signaled fence
- * @cb: our callback structure
- *
- * Signal the scheduler finished fence when the entity in question is killed.
- */
+static void drm_sched_entity_kill_work(struct work_struct *work)
+{
+	struct drm_sched_job *job = container_of(work, struct drm_sched_job,
+						 finish_work);
+
+	drm_sched_fence_finished(job->s_fence);
+	WARN_ON(job->s_fence->parent);
+	job->sched->ops->free_job(job);
+}
+
 static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 					  struct dma_fence_cb *cb)
 {
 	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
 						 finish_cb);
 
-	drm_sched_fence_finished(job->s_fence);
-	WARN_ON(job->s_fence->parent);
-	job->sched->ops->free_job(job);
+	schedule_work(&job->finish_work);
 }
 
 /**
@@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 		drm_sched_fence_scheduled(s_fence);
 		dma_fence_set_error(&s_fence->finished, -ESRCH);
 
+		/*
+		 * Replace regular finish work function with one that just
+		 * kills the job.
+		 */
+		job->finish_work.func = drm_sched_entity_kill_work;
+
 		/*
 		 * When pipe is hanged by older entity, new entity might
 		 * not even have chance to submit it's first job to HW
-- 
2.20.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH] drm/scheduler: put killed job cleanup to worker
  2019-07-03 10:28 [PATCH] drm/scheduler: put killed job cleanup to worker Lucas Stach
@ 2019-07-03 14:23 ` Grodzovsky, Andrey
  2019-07-03 14:32   ` Lucas Stach
  0 siblings, 1 reply; 6+ messages in thread
From: Grodzovsky, Andrey @ 2019-07-03 14:23 UTC (permalink / raw)
  To: Lucas Stach, Koenig, Christian; +Cc: kernel, dri-devel, patchwork-lst


On 7/3/19 6:28 AM, Lucas Stach wrote:
> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
> job finished fence signaling. As this might happen from IRQ context we
> now end up calling the GPU driver free_job callback in IRQ context, while
> all other paths call it from normal process context.
>
> Etnaviv in particular calls core kernel functions that are only valid to
> be called from process context when freeing the job. Other drivers might
> have similar issues, but I did not validate this. Fix this by punting
> the cleanup work into a work item, so the driver expectations are met.
>
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 28 ++++++++++++++----------
>   1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 35ddbec1375a..ba4eb66784b9 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>   }
>   EXPORT_SYMBOL(drm_sched_entity_flush);
>   
> -/**
> - * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs
> - *
> - * @f: signaled fence
> - * @cb: our callback structure
> - *
> - * Signal the scheduler finished fence when the entity in question is killed.
> - */
> +static void drm_sched_entity_kill_work(struct work_struct *work)
> +{
> +	struct drm_sched_job *job = container_of(work, struct drm_sched_job,
> +						 finish_work);
> +
> +	drm_sched_fence_finished(job->s_fence);
> +	WARN_ON(job->s_fence->parent);
> +	job->sched->ops->free_job(job);
> +}
> +
>   static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>   					  struct dma_fence_cb *cb)
>   {
>   	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
>   						 finish_cb);
>   
> -	drm_sched_fence_finished(job->s_fence);
> -	WARN_ON(job->s_fence->parent);
> -	job->sched->ops->free_job(job);
> +	schedule_work(&job->finish_work);
>   }
>   
>   /**
> @@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
>   		drm_sched_fence_scheduled(s_fence);
>   		dma_fence_set_error(&s_fence->finished, -ESRCH);
>   
> +		/*
> +		 * Replace regular finish work function with one that just
> +		 * kills the job.
> +		 */
> +		job->finish_work.func = drm_sched_entity_kill_work;


I rechecked the latest code and finish_work was removed in ffae3e5,
'drm/scheduler: rework job destruction'.

Andrey


> +
>   		/*
>   		 * When pipe is hanged by older entity, new entity might
>   		 * not even have chance to submit it's first job to HW

* Re: [PATCH] drm/scheduler: put killed job cleanup to worker
  2019-07-03 14:23 ` Grodzovsky, Andrey
@ 2019-07-03 14:32   ` Lucas Stach
  2019-07-03 14:41     ` Grodzovsky, Andrey
  0 siblings, 1 reply; 6+ messages in thread
From: Lucas Stach @ 2019-07-03 14:32 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Koenig, Christian; +Cc: kernel, dri-devel, patchwork-lst

Am Mittwoch, den 03.07.2019, 14:23 +0000 schrieb Grodzovsky, Andrey:
> On 7/3/19 6:28 AM, Lucas Stach wrote:
> > drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
> > job finished fence signaling. As this might happen from IRQ context we
> > now end up calling the GPU driver free_job callback in IRQ context, while
> > all other paths call it from normal process context.
> > 
> > Etnaviv in particular calls core kernel functions that are only valid to
> > be called from process context when freeing the job. Other drivers might
> > have similar issues, but I did not validate this. Fix this by punting
> > the cleanup work into a work item, so the driver expectations are met.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > ---
> >   drivers/gpu/drm/scheduler/sched_entity.c | 28 ++++++++++++++----------
> >   1 file changed, 17 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 35ddbec1375a..ba4eb66784b9 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
> >   }
> >   EXPORT_SYMBOL(drm_sched_entity_flush);
> >   
> > -/**
> > - * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs
> > - *
> > - * @f: signaled fence
> > - * @cb: our callback structure
> > - *
> > - * Signal the scheduler finished fence when the entity in question is killed.
> > - */
> > +static void drm_sched_entity_kill_work(struct work_struct *work)
> > +{
> > +	struct drm_sched_job *job = container_of(work, struct drm_sched_job,
> > +						 finish_work);
> > +
> > +	drm_sched_fence_finished(job->s_fence);
> > +	WARN_ON(job->s_fence->parent);
> > +	job->sched->ops->free_job(job);
> > +}
> > +
> >   static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
> >   					  struct dma_fence_cb *cb)
> >   {
> >   	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
> >   						 finish_cb);
> >   
> > -	drm_sched_fence_finished(job->s_fence);
> > -	WARN_ON(job->s_fence->parent);
> > -	job->sched->ops->free_job(job);
> > +	schedule_work(&job->finish_work);
> >   }
> >   
> >   /**
> > @@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
> >   		drm_sched_fence_scheduled(s_fence);
> >   		dma_fence_set_error(&s_fence->finished, -ESRCH);
> >   
> > +		/*
> > +		 * Replace regular finish work function with one that just
> > +		 * kills the job.
> > +		 */
> > +		job->finish_work.func = drm_sched_entity_kill_work;
> 
> 
> I rechecked the latest code and finish_work was removed in ffae3e5 
> 'drm/scheduler: rework job destruction'

Aw, thanks. Seems this patch was stuck for a bit too long in my
outgoing queue. I've just checked the commit you pointed out; it should
also fix the issue that this patch was trying to fix.

Regards,
Lucas

* Re: [PATCH] drm/scheduler: put killed job cleanup to worker
  2019-07-03 14:32   ` Lucas Stach
@ 2019-07-03 14:41     ` Grodzovsky, Andrey
  2019-07-03 14:53       ` Lucas Stach
  0 siblings, 1 reply; 6+ messages in thread
From: Grodzovsky, Andrey @ 2019-07-03 14:41 UTC (permalink / raw)
  To: Lucas Stach, Koenig, Christian; +Cc: kernel, dri-devel, patchwork-lst


On 7/3/19 10:32 AM, Lucas Stach wrote:
> Am Mittwoch, den 03.07.2019, 14:23 +0000 schrieb Grodzovsky, Andrey:
>> On 7/3/19 6:28 AM, Lucas Stach wrote:
>>> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
>>> job finished fence signaling. As this might happen from IRQ context we
>>> now end up calling the GPU driver free_job callback in IRQ context, while
>>> all other paths call it from normal process context.
>>>
>>> Etnaviv in particular calls core kernel functions that are only valid to
>>> be called from process context when freeing the job. Other drivers might
>>> have similar issues, but I did not validate this. Fix this by punting
>>> the cleanup work into a work item, so the driver expectations are met.
>>>
>>> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_entity.c | 28 ++++++++++++++----------
>>>    1 file changed, 17 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 35ddbec1375a..ba4eb66784b9 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_entity_flush);
>>>    
>>> -/**
>>> - * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs
>>> - *
>>> - * @f: signaled fence
>>> - * @cb: our callback structure
>>> - *
>>> - * Signal the scheduler finished fence when the entity in question is killed.
>>> - */
>>> +static void drm_sched_entity_kill_work(struct work_struct *work)
>>> +{
>>> +	struct drm_sched_job *job = container_of(work, struct drm_sched_job,
>>> +						 finish_work);
>>> +
>>> +	drm_sched_fence_finished(job->s_fence);
>>> +	WARN_ON(job->s_fence->parent);
>>> +	job->sched->ops->free_job(job);
>>> +}
>>> +
>>>    static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>>>    					  struct dma_fence_cb *cb)
>>>    {
>>>    	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
>>>    						 finish_cb);
>>>    
>>> -	drm_sched_fence_finished(job->s_fence);
>>> -	WARN_ON(job->s_fence->parent);
>>> -	job->sched->ops->free_job(job);
>>> +	schedule_work(&job->finish_work);
>>>    }
>>>    
>>>    /**
>>> @@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
>>>    		drm_sched_fence_scheduled(s_fence);
>>>    		dma_fence_set_error(&s_fence->finished, -ESRCH);
>>>    
>>> +		/*
>>> +		 * Replace regular finish work function with one that just
>>> +		 * kills the job.
>>> +		 */
>>> +		job->finish_work.func = drm_sched_entity_kill_work;
>>
>> I rechecked the latest code and finish_work was removed in ffae3e5
>> 'drm/scheduler: rework job destruction'
> Aw, thanks. Seems this patch was stuck for a bit too long in my
> outgoing queue. I've just checked the commit you pointed out, it should
> also fix the issue that this patch was trying to fix.


Not sure about this, as your patch only concerns the case of cleaning up
unfinished jobs for an entity being destroyed.

Andrey


>
> Regards,
> Lucas

* Re: [PATCH] drm/scheduler: put killed job cleanup to worker
  2019-07-03 14:41     ` Grodzovsky, Andrey
@ 2019-07-03 14:53       ` Lucas Stach
  2019-07-03 15:00         ` Grodzovsky, Andrey
  0 siblings, 1 reply; 6+ messages in thread
From: Lucas Stach @ 2019-07-03 14:53 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Koenig, Christian; +Cc: dri-devel, kernel, patchwork-lst

Am Mittwoch, den 03.07.2019, 14:41 +0000 schrieb Grodzovsky, Andrey:
> On 7/3/19 10:32 AM, Lucas Stach wrote:
> > Am Mittwoch, den 03.07.2019, 14:23 +0000 schrieb Grodzovsky, Andrey:
> > > On 7/3/19 6:28 AM, Lucas Stach wrote:
> > > > drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
> > > > job finished fence signaling. As this might happen from IRQ context we
> > > > now end up calling the GPU driver free_job callback in IRQ context, while
> > > > all other paths call it from normal process context.
> > > > 
> > > > Etnaviv in particular calls core kernel functions that are only valid to
> > > > be called from process context when freeing the job. Other drivers might
> > > > have similar issues, but I did not validate this. Fix this by punting
> > > > the cleanup work into a work item, so the driver expectations are met.
> > > > 
> > > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > > > 
> > > > ---

[...]

> > > I rechecked the latest code and finish_work was removed in ffae3e5
> > > 'drm/scheduler: rework job destruction'
> > 
> > Aw, thanks. Seems this patch was stuck for a bit too long in my
> > outgoing queue. I've just checked the commit you pointed out, it should
> > also fix the issue that this patch was trying to fix.
> 
> 
> Not sure about this, as your patch only concerns the case of cleaning up
> unfinished jobs for an entity being destroyed.

AFAICS after ffae3e5 all the free_job invocations are done from process
context, so things should work for etnaviv.

Regards,
Lucas

* Re: [PATCH] drm/scheduler: put killed job cleanup to worker
  2019-07-03 14:53       ` Lucas Stach
@ 2019-07-03 15:00         ` Grodzovsky, Andrey
  0 siblings, 0 replies; 6+ messages in thread
From: Grodzovsky, Andrey @ 2019-07-03 15:00 UTC (permalink / raw)
  To: Lucas Stach, Koenig, Christian; +Cc: dri-devel, kernel, patchwork-lst


On 7/3/19 10:53 AM, Lucas Stach wrote:
> Am Mittwoch, den 03.07.2019, 14:41 +0000 schrieb Grodzovsky, Andrey:
>> On 7/3/19 10:32 AM, Lucas Stach wrote:
>>> Am Mittwoch, den 03.07.2019, 14:23 +0000 schrieb Grodzovsky, Andrey:
>>>> On 7/3/19 6:28 AM, Lucas Stach wrote:
>>>>> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled
>>>>> job finished fence signaling. As this might happen from IRQ context we
>>>>> now end up calling the GPU driver free_job callback in IRQ context, while
>>>>> all other paths call it from normal process context.
>>>>>
>>>>> Etnaviv in particular calls core kernel functions that are only valid to
>>>>> be called from process context when freeing the job. Other drivers might
>>>>> have similar issues, but I did not validate this. Fix this by punting
>>>>> the cleanup work into a work item, so the driver expectations are met.
>>>>>
>>>>> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
>>>>> ---
> [...]
>
>>>> I rechecked the latest code and finish_work was removed in ffae3e5
>>>> 'drm/scheduler: rework job destruction'
>>> Aw, thanks. Seems this patch was stuck for a bit too long in my
>>> outgoing queue. I've just checked the commit you pointed out, it should
>>> also fix the issue that this patch was trying to fix.
>>
>> Not sure about this, as your patch only concerns the case of cleaning up
>> unfinished jobs for an entity being destroyed.
> AFAICS after ffae3e5 all the free_job invocations are done from process
> context, so things should work for etnaviv.
>
> Regards,
> Lucas


Actually, for jobs that were never submitted to the HW your change does 
make sense, as those will still get cleaned up from IRQ context when 
entity->last_scheduled signals.

Andrey

