* [PATCH] drm/scheduler: don't update last scheduled fence in TDR
@ 2018-04-25 8:39 Pixel Ding
From: Pixel Ding @ 2018-04-25 8:39 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Pixel Ding
The current sequence in the scheduler thread is:
1. update last sched fence
2. job begin (add to mirror list)
3. job finish (remove from mirror list)
4. back to 1
Since we update the last sched fence before the job joins the
mirror list, every job on the mirror list has already passed the
last sched fence. TDR only re-runs the jobs on the mirror list,
so it should not update the last sched fences.
Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
---
drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
fence = sched->ops->run_job(s_job);
atomic_inc(&sched->hw_rq_count);
- dma_fence_put(s_job->entity->last_scheduled);
- s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
if (fence) {
s_fence->parent = dma_fence_get(fence);
r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4
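The ordering argument in the commit message can be illustrated with a minimal user-space C model (hypothetical names and a toy refcounted fence standing in for struct dma_fence; the real code lives in the scheduler thread): last_scheduled is updated in step 1, before the job joins the mirror list in step 2, so every mirror-list job has already passed the last_scheduled fence.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted "fence" standing in for struct dma_fence. */
struct fence { int refcount; };

static struct fence *fence_get(struct fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

static void fence_put(struct fence *f)
{
	if (f)
		f->refcount--;
}

struct entity { struct fence *last_scheduled; };

/*
 * Step 1 of the scheduler loop: drop the reference on the previous
 * fence and take one on the new job's finished fence. In the real
 * driver this happens BEFORE the job is added to the mirror list,
 * which is why TDR must not repeat it when re-running mirror-list
 * jobs.
 */
static void update_last_scheduled(struct entity *e, struct fence *finished)
{
	fence_put(e->last_scheduled);
	e->last_scheduled = fence_get(finished);
}
```

Running update_last_scheduled() for jobA then jobB leaves last_scheduled on jobB's fence; repeating the update during recovery would rewind it to an older fence, which is exactly what the patch removes.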
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
From: Ding, Pixel @ 2018-04-26 3:47 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk
Hi Monk,
Please review it. Thanks.
—
Sincerely Yours,
Pixel
On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
[patch quoted in full; snipped]
* RE: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
From: Liu, Monk @ 2018-04-26 3:48 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Ding, Pixel
Reviewed-by: Monk Liu <monk.liu@amd.com>
-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
Sent: April 25, 2018 16:40
To: amd-gfx@lists.freedesktop.org
Cc: Ding, Pixel <Pixel.Ding@amd.com>
Subject: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
[patch quoted in full; snipped]
* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
From: zhoucm1 @ 2018-04-26 4:23 UTC (permalink / raw)
To: Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk
NAK on it.
First of all, without this patch, does it cause any actual issue?
Second, entity->last_scheduled represents the last submitted job.
This change breaks that meaning and doesn't work, e.g.:
1. The mirror list has jobA and jobB; assuming they belong to the same
entity, entity->last_scheduled is jobB->finished.
2. When you do recovery, jobA is re-submitted first. If last_scheduled
isn't updated, entity->last_scheduled is still jobB->finished.
3. If the process is then killed, drm_sched_entity_cleanup is called,
which registers drm_sched_entity_kill_jobs_cb on jobB->finished, but
jobB hasn't been submitted at all.
So the change isn't necessary at all.
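The jobA/jobB scenario can be traced with a small user-space model (invented names, not the kernel API); it shows what each behaviour does to last_scheduled when jobA is re-run first during recovery:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the recovery path; fences are plain structs here. */
struct fence { int dummy; };
struct job { struct fence finished; };
struct entity { struct fence *last_scheduled; };

/*
 * Re-run one mirror-list job. With update_last == 0 (the patched
 * behaviour) entity->last_scheduled is left untouched; with
 * update_last == 1 (the old behaviour) it is rewound to the
 * re-submitted job's finished fence.
 */
static void recover_job(struct entity *e, struct job *j, int update_last)
{
	/* ... hardware re-submission would happen here ... */
	if (update_last)
		e->last_scheduled = &j->finished;
}
```

With the update skipped, last_scheduled stays on jobB (the newest job); with the update, re-running jobA rewinds it to an older fence until jobB is re-submitted in turn.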
Regards,
David Zhou
On April 26, 2018, at 11:47, Ding, Pixel wrote:
> Hi Monk,
>
> Please review it. Thanks.
>
> —
> Sincerely Yours,
> Pixel
>
>
> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
>
> [patch quoted in full; snipped]
* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
From: Christian König @ 2018-05-03 12:09 UTC (permalink / raw)
To: zhoucm1, Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
Liu, Monk
That change is indeed fixing a problem. When drm_sched_job_recovery() is
called, s_job->entity should already be NULL.
And even when the process is killed, we should still either re-submit the
jobs or set the error.
BTW: There is another bug in that function: guilty_context needs to be
kept alive over the course of the loop.
Going to submit a patch for that.
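The lifetime problem mentioned for guilty_context is the usual pin-the-next-reference-before-dropping-the-old-one pattern; here is a generic user-space sketch with invented names (the actual fix is in the scheduler and is not shown here):

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted context object. */
struct ctx { int refcount; };

static struct ctx *ctx_get(struct ctx *c)
{
	if (c)
		c->refcount++;
	return c;
}

static void ctx_put(struct ctx *c)
{
	if (c)
		c->refcount--;
}

/*
 * Iterate over contexts while always holding a reference to the one
 * in use, so a concurrent release cannot free it mid-loop. The old
 * reference is dropped only after the next one is pinned.
 */
static void walk_contexts(struct ctx **list, int n)
{
	struct ctx *held = NULL;
	int i;

	for (i = 0; i < n; i++) {
		struct ctx *next = ctx_get(list[i]);

		ctx_put(held);
		held = next;
		/* ... safely use 'held' here ... */
	}
	ctx_put(held);
}
```

After the walk every context's refcount is back where it started, and at no point inside the loop is the context in use left unreferenced.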
Regards,
Christian.
On 26.04.2018 at 06:23, zhoucm1 wrote:
> NAK on it.
>
> First of all, without this patch, Does it cause any issue?
>
> second,
>
> entity->last_scheduled present the last submiting job.
>
> Your this change will break this meaning and don't work, e.g.
>
> 1. mirror list has jobA and jobB, assuming they are belonged to same
> entity, then the entity->last_scheduled is jobB->finished.
>
> 2. when you do recovery, re-submit jobA first, if don't update
> last_scheduled, then entity->last_scheduled still is jobB->finished.
>
> 3. killed this process, will call to drm_sched_entity_cleanup, will
> register drm_sched_entity_kill_jobs_cb to jobB->finished, but jobB
> isn't submitted at all.
>
>
> So the change isn't necessary at all.
>
>
> Regards,
>
> David Zhou
>
>
> On April 26, 2018, at 11:47, Ding, Pixel wrote:
>> Hi Monk,
>>
>> Please review it. Thanks.
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
>>
>> [patch quoted in full; snipped]
>
* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
From: Ding, Pixel @ 2018-05-07 1:00 UTC (permalink / raw)
To: Zhou, David(ChunMing),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk
> 3. If the process is then killed, drm_sched_entity_cleanup is called,
> which registers drm_sched_entity_kill_jobs_cb on jobB->finished, but
> jobB hasn't been submitted at all.
I think that point is your misunderstanding. Killing the process won't stop jobB from being submitted.
—
Sincerely Yours,
Pixel
On 2018/4/26, 12:23 PM, "amd-gfx on behalf of zhoucm1" <amd-gfx-bounces@lists.freedesktop.org on behalf of zhoucm1@amd.com> wrote:
NAK on it.
First of all, without this patch, Does it cause any issue?
second,
entity->last_scheduled present the last submiting job.
Your this change will break this meaning and don't work, e.g.
1. mirror list has jobA and jobB, assuming they are belonged to same
entity, then the entity->last_scheduled is jobB->finished.
2. when you do recovery, re-submit jobA first, if don't update
last_scheduled, then entity->last_scheduled still is jobB->finished.
3. killed this process, will call to drm_sched_entity_cleanup, will
register drm_sched_entity_kill_jobs_cb to jobB->finished, but jobB isn't
submitted at all.
So the change isn't necessary at all.
Regards,
David Zhou
On April 26, 2018, at 11:47, Ding, Pixel wrote:
> Hi Monk,
>
> Please review it. Thanks.
>
> —
> Sincerely Yours,
> Pixel
>
>
> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
>
> [patch quoted in full; snipped]