* [RESEND PATCH v4] drm: Don't free jobs in wait_event_interruptible()
From: Steven Price @ 2019-10-24 16:24 UTC
To: Daniel Vetter, David Airlie
Cc: dri-devel, linux-kernel, Christian König, Alex Deucher,
Andrey Grodzovsky, Erico Nunes, Sam Ravnborg, Sharat Masetty,
Steven Price
drm_sched_cleanup_jobs() attempts to free finished jobs; however, because
it is called as the condition of wait_event_interruptible() it must not
sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
Instead let's rename drm_sched_cleanup_jobs() to
drm_sched_get_cleanup_job() and simply return a job for processing if
there is one. The caller can then call the free_job() callback outside
the wait_event_interruptible() where sleeping is possible before
re-checking and returning to sleep if necessary.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
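For reviewers, a rough sketch of why the condition must not sleep:
wait_event_interruptible(wq, condition) expands to approximately the
following (paraphrased from include/linux/wait.h, not the literal macro):

	long __ret = 0;

	might_sleep();
	if (!(condition)) {
		for (;;) {
			/* sets current->state to TASK_INTERRUPTIBLE */
			long __int = prepare_to_wait_event(&wq, &__wait,
							   TASK_INTERRUPTIBLE);

			if (condition)	/* evaluated while !TASK_RUNNING */
				break;
			if (__int) {	/* a signal is pending */
				__ret = __int;
				break;
			}
			schedule();
		}
		finish_wait(&wq, &__wait);
	}

Anything blocking (such as the mutex taken by Panfrost's or etnaviv's
free_job() callback) inside "condition" therefore runs with
state != TASK_RUNNING, which is exactly what __might_sleep() warns about.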
drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
1 file changed, 26 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..148468447ba9 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
}
/**
- * drm_sched_cleanup_jobs - destroy finished jobs
+ * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
*
* @sched: scheduler instance
*
- * Remove all finished jobs from the mirror list and destroy them.
+ * Returns the next finished job from the mirror list (if there is one)
+ * ready to be destroyed.
*/
-static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+static struct drm_sched_job *
+drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
+ struct drm_sched_job *job = NULL;
unsigned long flags;
/* Don't destroy jobs while the timeout worker is running */
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
!cancel_delayed_work(&sched->work_tdr))
- return;
-
+ return NULL;
- while (!list_empty(&sched->ring_mirror_list)) {
- struct drm_sched_job *job;
+ spin_lock_irqsave(&sched->job_list_lock, flags);
- job = list_first_entry(&sched->ring_mirror_list,
+ job = list_first_entry_or_null(&sched->ring_mirror_list,
struct drm_sched_job, node);
- if (!dma_fence_is_signaled(&job->s_fence->finished))
- break;
- spin_lock_irqsave(&sched->job_list_lock, flags);
+ if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from ring_mirror_list */
list_del_init(&job->node);
- spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
- sched->ops->free_job(job);
+ } else {
+ job = NULL;
+ /* queue timeout for next job */
+ drm_sched_start_timeout(sched);
}
- /* queue timeout for next job */
- spin_lock_irqsave(&sched->job_list_lock, flags);
- drm_sched_start_timeout(sched);
spin_unlock_irqrestore(&sched->job_list_lock, flags);
+ return job;
}
/**
@@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
struct drm_sched_fence *s_fence;
struct drm_sched_job *sched_job;
struct dma_fence *fence;
+ struct drm_sched_job *cleanup_job = NULL;
wait_event_interruptible(sched->wake_up_worker,
- (drm_sched_cleanup_jobs(sched),
+ (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
(!drm_sched_blocked(sched) &&
(entity = drm_sched_select_entity(sched))) ||
- kthread_should_stop()));
+ kthread_should_stop());
+
+ while (cleanup_job) {
+ sched->ops->free_job(cleanup_job);
+ /* queue timeout for next job */
+ drm_sched_start_timeout(sched);
+
+ cleanup_job = drm_sched_get_cleanup_job(sched);
+ }
if (!entity)
continue;
--
2.20.1
* Re: [RESEND PATCH v4] drm: Don't free jobs in wait_event_interruptible()
From: Christian Gmeiner @ 2019-10-25 9:43 UTC
To: Steven Price
Cc: Daniel Vetter, David Airlie, Sharat Masetty, LKML,
DRI mailing list, Alex Deucher, Sam Ravnborg,
Christian König, Erico Nunes
On Thu, 24 Oct 2019 at 18:25, Steven Price <steven.price@arm.com> wrote:
>
> drm_sched_cleanup_jobs() attempts to free finished jobs; however, because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Without this patch I get the following warning:
[ 242.935254] ------------[ cut here ]------------
[ 242.940044] WARNING: CPU: 2 PID: 109 at kernel/sched/core.c:6731 __might_sleep+0x94/0xa8
[ 242.948242] do not call blocking ops when !TASK_RUNNING; state=1 set at [<38751e36>] prepare_to_wait_event+0xa8/0x180
[ 242.958923] Modules linked in:
[ 242.962010] CPU: 2 PID: 109 Comm: 130000.gpu Not tainted 5.4.0-rc4 #10
[ 242.968551] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 242.975112] [<c0113160>] (unwind_backtrace) from [<c010cf34>] (show_stack+0x10/0x14)
[ 242.982879] [<c010cf34>] (show_stack) from [<c0c065ec>] (dump_stack+0xd8/0x110)
[ 242.990213] [<c0c065ec>] (dump_stack) from [<c0128adc>] (__warn+0xc0/0x10c)
[ 242.997194] [<c0128adc>] (__warn) from [<c0128f10>] (warn_slowpath_fmt+0x8c/0xb8)
[ 243.004697] [<c0128f10>] (warn_slowpath_fmt) from [<c01598bc>] (__might_sleep+0x94/0xa8)
[ 243.012810] [<c01598bc>] (__might_sleep) from [<c0c246e4>] (__mutex_lock+0x38/0xa1c)
[ 243.020571] [<c0c246e4>] (__mutex_lock) from [<c0c250e4>] (mutex_lock_nested+0x1c/0x24)
[ 243.028600] [<c0c250e4>] (mutex_lock_nested) from [<c064f020>] (etnaviv_cmdbuf_free+0x40/0x8c)
[ 243.037233] [<c064f020>] (etnaviv_cmdbuf_free) from [<c06503a0>] (etnaviv_submit_put+0x38/0x1c8)
[ 243.046042] [<c06503a0>] (etnaviv_submit_put) from [<c064177c>] (drm_sched_cleanup_jobs+0xc8/0xec)
[ 243.055021] [<c064177c>] (drm_sched_cleanup_jobs) from [<c06419b4>] (drm_sched_main+0x214/0x298)
[ 243.063826] [<c06419b4>] (drm_sched_main) from [<c0152890>] (kthread+0x140/0x158)
[ 243.071329] [<c0152890>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[ 243.078563] Exception stack(0xec691fb0 to 0xec691ff8)
[ 243.083630] 1fa0: 00000000 00000000 00000000 00000000
[ 243.091822] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 243.100013] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 243.106795] irq event stamp: 321
[ 243.110098] hardirqs last enabled at (339): [<c0193854>] console_unlock+0x430/0x620
[ 243.117864] hardirqs last disabled at (346): [<c01934cc>] console_unlock+0xa8/0x620
[ 243.125592] softirqs last enabled at (362): [<c01024e0>] __do_softirq+0x2c0/0x590
[ 243.133232] softirqs last disabled at (373): [<c0130ed0>] irq_exit+0x100/0x18c
[ 243.140517] ---[ end trace 8afcd79e9e2725b2 ]---
--
greets
--
Christian Gmeiner, MSc
https://christian-gmeiner.info/privacypolicy
* Re: [RESEND PATCH v4] drm: Don't free jobs in wait_event_interruptible()
From: Koenig, Christian @ 2019-10-25 9:49 UTC
To: Steven Price, Daniel Vetter, David Airlie
Cc: dri-devel, linux-kernel, Deucher, Alexander, Grodzovsky, Andrey,
Erico Nunes, Sam Ravnborg, Sharat Masetty
On 24.10.19 at 18:24, Steven Price wrote:
> drm_sched_cleanup_jobs() attempts to free finished jobs; however, because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>
> drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
> 1 file changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 9a0ee74d82dc..148468447ba9 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
> }
>
> /**
> - * drm_sched_cleanup_jobs - destroy finished jobs
> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
> *
> * @sched: scheduler instance
> *
> - * Remove all finished jobs from the mirror list and destroy them.
> + * Returns the next finished job from the mirror list (if there is one)
> + * ready to be destroyed.
> */
> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> +static struct drm_sched_job *
> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> {
> + struct drm_sched_job *job = NULL;
Please don't initialize job here.
> unsigned long flags;
>
> /* Don't destroy jobs while the timeout worker is running */
> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> !cancel_delayed_work(&sched->work_tdr))
> - return;
> -
> + return NULL;
>
> - while (!list_empty(&sched->ring_mirror_list)) {
> - struct drm_sched_job *job;
> + spin_lock_irqsave(&sched->job_list_lock, flags);
>
> - job = list_first_entry(&sched->ring_mirror_list,
> + job = list_first_entry_or_null(&sched->ring_mirror_list,
> struct drm_sched_job, node);
> - if (!dma_fence_is_signaled(&job->s_fence->finished))
> - break;
>
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> + if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> /* remove job from ring_mirror_list */
> list_del_init(&job->node);
> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
> -
> - sched->ops->free_job(job);
> + } else {
> + job = NULL;
> + /* queue timeout for next job */
> + drm_sched_start_timeout(sched);
> }
>
> - /* queue timeout for next job */
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> - drm_sched_start_timeout(sched);
> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>
> + return job;
> }
>
> /**
> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
> struct drm_sched_fence *s_fence;
> struct drm_sched_job *sched_job;
> struct dma_fence *fence;
> + struct drm_sched_job *cleanup_job = NULL;
>
> wait_event_interruptible(sched->wake_up_worker,
> - (drm_sched_cleanup_jobs(sched),
> + (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
> (!drm_sched_blocked(sched) &&
> (entity = drm_sched_select_entity(sched))) ||
> - kthread_should_stop()));
> + kthread_should_stop());
> +
> + while (cleanup_job) {
Better make this just "if (cleanup_job)"... to make sure that we stop
immediately when the thread should stop.
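I.e. something like this (untested, just sketching the shape I mean):

	if (cleanup_job) {
		sched->ops->free_job(cleanup_job);
		/* queue timeout for next job */
		drm_sched_start_timeout(sched);
	}

Any further finished jobs are then picked up on the next pass through
the main loop, after kthread_should_stop() has been checked again.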
Apart from that looks good to me now,
Christian.
> + sched->ops->free_job(cleanup_job);
> + /* queue timeout for next job */
> + drm_sched_start_timeout(sched);
> +
> + cleanup_job = drm_sched_get_cleanup_job(sched);
> + }
>
> if (!entity)
> continue;
* Re: [RESEND PATCH v4] drm: Don't free jobs in wait_event_interruptible()
From: Steven Price @ 2019-10-25 9:49 UTC
To: Christian Gmeiner
Cc: David Airlie, Sharat Masetty, LKML, DRI mailing list,
Alex Deucher, Sam Ravnborg, Christian König, Erico Nunes
On 25/10/2019 10:43, Christian Gmeiner wrote:
> On Thu, 24 Oct 2019 at 18:25, Steven Price <steven.price@arm.com> wrote:
>>
>> drm_sched_cleanup_jobs() attempts to free finished jobs; however, because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>
> Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
>
> Without this patch I get the following warning:
Thanks. If you've got an (easily) reproducible case, can you check which
commit this fixes? I *think*:
Fixes: 5918045c4ed4 ("drm/scheduler: rework job destruction")
But I haven't got a reliable way of reproducing this (with Panfrost).
Thanks,
Steve
>
> [ 242.935254] ------------[ cut here ]------------
> [ 242.940044] WARNING: CPU: 2 PID: 109 at kernel/sched/core.c:6731 __might_sleep+0x94/0xa8
> [ 242.948242] do not call blocking ops when !TASK_RUNNING; state=1 set at [<38751e36>] prepare_to_wait_event+0xa8/0x180
> [ 242.958923] Modules linked in:
> [ 242.962010] CPU: 2 PID: 109 Comm: 130000.gpu Not tainted 5.4.0-rc4 #10
> [ 242.968551] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 242.975112] [<c0113160>] (unwind_backtrace) from [<c010cf34>] (show_stack+0x10/0x14)
> [ 242.982879] [<c010cf34>] (show_stack) from [<c0c065ec>] (dump_stack+0xd8/0x110)
> [ 242.990213] [<c0c065ec>] (dump_stack) from [<c0128adc>] (__warn+0xc0/0x10c)
> [ 242.997194] [<c0128adc>] (__warn) from [<c0128f10>] (warn_slowpath_fmt+0x8c/0xb8)
> [ 243.004697] [<c0128f10>] (warn_slowpath_fmt) from [<c01598bc>] (__might_sleep+0x94/0xa8)
> [ 243.012810] [<c01598bc>] (__might_sleep) from [<c0c246e4>] (__mutex_lock+0x38/0xa1c)
> [ 243.020571] [<c0c246e4>] (__mutex_lock) from [<c0c250e4>] (mutex_lock_nested+0x1c/0x24)
> [ 243.028600] [<c0c250e4>] (mutex_lock_nested) from [<c064f020>] (etnaviv_cmdbuf_free+0x40/0x8c)
> [ 243.037233] [<c064f020>] (etnaviv_cmdbuf_free) from [<c06503a0>] (etnaviv_submit_put+0x38/0x1c8)
> [ 243.046042] [<c06503a0>] (etnaviv_submit_put) from [<c064177c>] (drm_sched_cleanup_jobs+0xc8/0xec)
> [ 243.055021] [<c064177c>] (drm_sched_cleanup_jobs) from [<c06419b4>] (drm_sched_main+0x214/0x298)
> [ 243.063826] [<c06419b4>] (drm_sched_main) from [<c0152890>] (kthread+0x140/0x158)
> [ 243.071329] [<c0152890>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> [ 243.078563] Exception stack(0xec691fb0 to 0xec691ff8)
> [ 243.083630] 1fa0: 00000000 00000000 00000000 00000000
> [ 243.091822] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 243.100013] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 243.106795] irq event stamp: 321
> [ 243.110098] hardirqs last enabled at (339): [<c0193854>] console_unlock+0x430/0x620
> [ 243.117864] hardirqs last disabled at (346): [<c01934cc>] console_unlock+0xa8/0x620
> [ 243.125592] softirqs last enabled at (362): [<c01024e0>] __do_softirq+0x2c0/0x590
> [ 243.133232] softirqs last disabled at (373): [<c0130ed0>] irq_exit+0x100/0x18c
> [ 243.140517] ---[ end trace 8afcd79e9e2725b2 ]---
>
* Re: [RESEND PATCH v4] drm: Don't free jobs in wait_event_interruptible()
From: Steven Price @ 2019-10-25 10:26 UTC
To: Koenig, Christian, Daniel Vetter, David Airlie
Cc: Sharat Masetty, linux-kernel, dri-devel, Deucher, Alexander,
Sam Ravnborg, Erico Nunes
On 25/10/2019 10:49, Koenig, Christian wrote:
> On 24.10.19 at 18:24, Steven Price wrote:
>> drm_sched_cleanup_jobs() attempts to free finished jobs; however, because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>>
>> drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>> 1 file changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 9a0ee74d82dc..148468447ba9 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>> }
>>
>> /**
>> - * drm_sched_cleanup_jobs - destroy finished jobs
>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>> *
>> * @sched: scheduler instance
>> *
>> - * Remove all finished jobs from the mirror list and destroy them.
>> + * Returns the next finished job from the mirror list (if there is one)
>> + * ready to be destroyed.
>> */
>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +static struct drm_sched_job *
>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>> {
>> + struct drm_sched_job *job = NULL;
>
> Please don't initialize job here.
Good spot, will fix.
>> unsigned long flags;
>>
>> /* Don't destroy jobs while the timeout worker is running */
>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> !cancel_delayed_work(&sched->work_tdr))
>> - return;
>> -
>> + return NULL;
>>
>> - while (!list_empty(&sched->ring_mirror_list)) {
>> - struct drm_sched_job *job;
>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>>
>> - job = list_first_entry(&sched->ring_mirror_list,
>> + job = list_first_entry_or_null(&sched->ring_mirror_list,
>> struct drm_sched_job, node);
>> - if (!dma_fence_is_signaled(&job->s_fence->finished))
>> - break;
>>
>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>> + if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>> /* remove job from ring_mirror_list */
>> list_del_init(&job->node);
>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> - sched->ops->free_job(job);
>> + } else {
>> + job = NULL;
>> + /* queue timeout for next job */
>> + drm_sched_start_timeout(sched);
>> }
>>
>> - /* queue timeout for next job */
>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>> - drm_sched_start_timeout(sched);
>> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>
>> + return job;
>> }
>>
>> /**
>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>> struct drm_sched_fence *s_fence;
>> struct drm_sched_job *sched_job;
>> struct dma_fence *fence;
>> + struct drm_sched_job *cleanup_job = NULL;
>>
>> wait_event_interruptible(sched->wake_up_worker,
>> - (drm_sched_cleanup_jobs(sched),
>> + (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>> (!drm_sched_blocked(sched) &&
>> (entity = drm_sched_select_entity(sched))) ||
>> - kthread_should_stop()));
>> + kthread_should_stop());
>> +
>> + while (cleanup_job) {
>
> Better make this just "if (cleanup_job)"... to make sure that we stop
> immediately when the thread should stop.
Ok, no problem. Note that this is a change in behaviour (previously
drm_sched_cleanup_jobs() was called *before* checking
kthread_should_stop()). But I can't see the harm.
Steve
> Apart from that looks good to me now,
> Christian.
>
>> + sched->ops->free_job(cleanup_job);
>> + /* queue timeout for next job */
>> + drm_sched_start_timeout(sched);
>> +
>> + cleanup_job = drm_sched_get_cleanup_job(sched);
>> + }
>>
>> if (!entity)
>> continue;
* Re: [RESEND PATCH v4] drm: Don't free jobs in wait_event_interruptible()
From: Koenig, Christian @ 2019-10-25 10:34 UTC
To: Steven Price, Daniel Vetter, David Airlie
Cc: Sharat Masetty, linux-kernel, dri-devel, Deucher, Alexander,
Sam Ravnborg, Erico Nunes
On 25.10.19 at 12:26, Steven Price wrote:
> On 25/10/2019 10:49, Koenig, Christian wrote:
>> On 24.10.19 at 18:24, Steven Price wrote:
>>> drm_sched_cleanup_jobs() attempts to free finished jobs; however, because
>>> it is called as the condition of wait_event_interruptible() it must not
>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>>
>>> Instead let's rename drm_sched_cleanup_jobs() to
>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>> there is one. The caller can then call the free_job() callback outside
>>> the wait_event_interruptible() where sleeping is possible before
>>> re-checking and returning to sleep if necessary.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>>>
>>> drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>>> 1 file changed, 26 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..148468447ba9 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>> }
>>>
>>> /**
>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>> *
>>> * @sched: scheduler instance
>>> *
>>> - * Remove all finished jobs from the mirror list and destroy them.
>>> + * Returns the next finished job from the mirror list (if there is one)
>>> + * ready to be destroyed.
>>> */
>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>> +static struct drm_sched_job *
>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>> {
>>> + struct drm_sched_job *job = NULL;
>> Please don't initialize job here.
> Good spot, will fix.
>
>>> unsigned long flags;
>>>
>>> /* Don't destroy jobs while the timeout worker is running */
>>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>> !cancel_delayed_work(&sched->work_tdr))
>>> - return;
>>> -
>>> + return NULL;
>>>
>>> - while (!list_empty(&sched->ring_mirror_list)) {
>>> - struct drm_sched_job *job;
>>> + spin_lock_irqsave(&sched->job_list_lock, flags);
>>>
>>> - job = list_first_entry(&sched->ring_mirror_list,
>>> + job = list_first_entry_or_null(&sched->ring_mirror_list,
>>> struct drm_sched_job, node);
>>> - if (!dma_fence_is_signaled(&job->s_fence->finished))
>>> - break;
>>>
>>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>>> + if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>> /* remove job from ring_mirror_list */
>>> list_del_init(&job->node);
>>> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> -
>>> - sched->ops->free_job(job);
>>> + } else {
>>> + job = NULL;
>>> + /* queue timeout for next job */
>>> + drm_sched_start_timeout(sched);
>>> }
>>>
>>> - /* queue timeout for next job */
>>> - spin_lock_irqsave(&sched->job_list_lock, flags);
>>> - drm_sched_start_timeout(sched);
>>> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>
>>> + return job;
>>> }
>>>
>>> /**
>>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>> struct drm_sched_fence *s_fence;
>>> struct drm_sched_job *sched_job;
>>> struct dma_fence *fence;
>>> + struct drm_sched_job *cleanup_job = NULL;
>>>
>>> wait_event_interruptible(sched->wake_up_worker,
>>> - (drm_sched_cleanup_jobs(sched),
>>> + (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>> (!drm_sched_blocked(sched) &&
>>> (entity = drm_sched_select_entity(sched))) ||
>>> - kthread_should_stop()));
>>> + kthread_should_stop());
>>> +
>>> + while (cleanup_job) {
>> Better make this just "if (cleanup_job)"... to make sure that we stop
>> immediately when the thread should stop.
> Ok, no problem. Note that this is a change in behaviour (previously
> drm_sched_cleanup_jobs() was called *before* checking
> kthread_should_stop()). But I can't see the harm.
Yeah, but this is actually a rather nice improvement.
When we say that the thread should stop, that should happen
immediately and not clean up the jobs first.
Christian.
>
> Steve
>
>> Apart from that looks good to me now,
>> Christian.
>>
>>> + sched->ops->free_job(cleanup_job);
>>> + /* queue timeout for next job */
>>> + drm_sched_start_timeout(sched);
>>> +
>>> + cleanup_job = drm_sched_get_cleanup_job(sched);
>>> + }
>>>
>>> if (!entity)
>>> continue;