* [PATCH 1/2] drm/amdgpu: set finished fence error if job timedout
@ 2023-05-09 10:22 ZhenGuo Yin
2023-05-09 10:22 ` [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence ZhenGuo Yin
0 siblings, 1 reply; 5+ messages in thread
From: ZhenGuo Yin @ 2023-05-09 10:22 UTC
To: amd-gfx; +Cc: ZhenGuo Yin, jingwen.chen, monk.liu, christian.koenig
Set the finished fence's error to -ETIME if the job timed out.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 57f8f8b3cd8a..f2c02e4167fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -65,6 +65,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	DRM_ERROR("Process information: process %s pid %d thread %s pid %d\n",
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);

+	dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
+
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
 		struct amdgpu_reset_context reset_context;
 		memset(&reset_context, 0, sizeof(reset_context));
--
2.35.1
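
For readers following along without a kernel tree, the effect of the added line can be sketched in userspace. This is a minimal model, not kernel code: struct model_fence and every model_* function are hypothetical stand-ins. They assume the documented contract of dma_fence_set_error() (a negative errno, set before the fence signals) and of dma_fence_get_status() (0 while pending, 1 on clean completion, the negative error otherwise).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct dma_fence; not the kernel type. */
struct model_fence {
	bool signaled;
	int error;	/* 0, or a negative errno value */
};

/* Models dma_fence_set_error(): negative errno, only before signaling. */
static void model_fence_set_error(struct model_fence *f, int error)
{
	assert(!f->signaled);
	assert(error < 0);
	f->error = error;
}

/* Models the fence being signaled later, e.g. during reset cleanup. */
static void model_fence_signal(struct model_fence *f)
{
	f->signaled = true;
}

/* Models dma_fence_get_status(): 0 pending, 1 ok, negative errno on error. */
static int model_fence_get_status(const struct model_fence *f)
{
	if (!f->signaled)
		return 0;
	return f->error ? f->error : 1;
}

/* Hypothetical timeout path: record the error first, then signal. */
static int model_job_timedout(struct model_fence *finished)
{
	model_fence_set_error(finished, -ETIME);
	model_fence_signal(finished);
	return model_fence_get_status(finished);
}
```

In this model, a waiter that checks the status after the timeout sees -ETIME instead of a clean completion, which is the information the second patch relies on.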
* [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
2023-05-09 10:22 [PATCH 1/2] drm/amdgpu: set finished fence error if job timedout ZhenGuo Yin
@ 2023-05-09 10:22 ` ZhenGuo Yin
2023-05-17 15:02 ` Alex Deucher
0 siblings, 1 reply; 5+ messages in thread
From: ZhenGuo Yin @ 2023-05-09 10:22 UTC
To: amd-gfx; +Cc: ZhenGuo Yin, jingwen.chen, monk.liu, christian.koenig
[Why]
drm_sched_entity_add_dependency_cb ignores the scheduled fence and returns false.
If an entity's dependency is a scheduler error fence and drm_sched_stop is called
due to TDR, drm_sched_entity_pop_job will wait for the dependency forever.
[How]
Do not wait for or ignore the scheduled error fence; instead, add the
drm_sched_entity_wakeup callback for a dependency with a scheduled error fence.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index d3f4ada6a68e..96e173b0a6c6 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -384,7 +384,7 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
 	}

 	s_fence = to_drm_sched_fence(fence);
-	if (s_fence && s_fence->sched == sched &&
+	if (!fence->error && s_fence && s_fence->sched == sched &&
 	    !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags)) {

 		/*
--
2.35.1
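
The one-line change above can be read as a decision between two paths. The sketch below is a userspace model of that decision only, under stated assumptions: struct model_dep, enum model_path and model_pick_path() are hypothetical names, the scheduler is reduced to an integer id, and what the same-scheduler branch does internally is not modeled because the hunk does not show it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flattened view of the fields the patched condition reads. */
struct model_dep {
	int error;		/* stands in for fence->error */
	bool is_sched_fence;	/* to_drm_sched_fence(fence) != NULL */
	int sched_id;		/* stands in for s_fence->sched */
	bool dont_pipeline;	/* DRM_SCHED_FENCE_DONT_PIPELINE is set */
};

enum model_path {
	MODEL_SAME_SCHED_BRANCH,	/* the branch guarded by the patched if */
	MODEL_WAKEUP_CALLBACK,		/* fall through to the wakeup callback */
};

/* Evaluate the condition as it reads before (patched == false) or after. */
static enum model_path model_pick_path(const struct model_dep *dep,
				       int sched_id, bool patched)
{
	bool same_sched = dep->is_sched_fence && dep->sched_id == sched_id &&
			  !dep->dont_pipeline;

	assert(dep->error <= 0);

	/* The patch adds !fence->error in front of the existing checks. */
	if (patched && dep->error)
		same_sched = false;

	return same_sched ? MODEL_SAME_SCHED_BRANCH : MODEL_WAKEUP_CALLBACK;
}
```

With the patched condition, a dependency carrying -ETIME from the same scheduler no longer takes the same-scheduler branch, while error-free dependencies behave as before.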
* Re: [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
2023-05-09 10:22 ` [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence ZhenGuo Yin
@ 2023-05-17 15:02 ` Alex Deucher
2023-05-17 21:01 ` Alex Deucher
0 siblings, 1 reply; 5+ messages in thread
From: Alex Deucher @ 2023-05-17 15:02 UTC
To: ZhenGuo Yin, Mailing list - DRI developers
Cc: monk.liu, jingwen.chen, amd-gfx, christian.koenig
+ dri-devel for scheduler
On Tue, May 9, 2023 at 6:23 AM ZhenGuo Yin <zhenguo.yin@amd.com> wrote:
>
> [Why]
> drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
> If entity's dependency is a schedulerd error fence and drm_sched_stop is called
> due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.
>
> [How]
> Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
> callback for the dependency with scheduled error fence.
>
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index d3f4ada6a68e..96e173b0a6c6 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -384,7 +384,7 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> }
>
> s_fence = to_drm_sched_fence(fence);
> - if (s_fence && s_fence->sched == sched &&
> + if (!fence->error && s_fence && s_fence->sched == sched &&
> !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags)) {
>
> /*
> --
> 2.35.1
>
* Re: [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
2023-05-17 15:02 ` Alex Deucher
@ 2023-05-17 21:01 ` Alex Deucher
2023-05-30 17:58 ` Christian König
0 siblings, 1 reply; 5+ messages in thread
From: Alex Deucher @ 2023-05-17 21:01 UTC
To: ZhenGuo Yin, Mailing list - DRI developers
Cc: monk.liu, jingwen.chen, amd-gfx, christian.koenig
On Wed, May 17, 2023 at 11:02 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> + dri-devel for scheduler
>
> On Tue, May 9, 2023 at 6:23 AM ZhenGuo Yin <zhenguo.yin@amd.com> wrote:
> >
> > [Why]
> > drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
> > If entity's dependency is a schedulerd error fence and drm_sched_stop is called
typo: schedulerd -> scheduler
> > due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.
> >
> > [How]
> > Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
> > callback for the dependency with scheduled error fence.
> >
> > Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
The series looks good to me, but it would be good to have Christian
take a look as well. Series is:
Acked-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> > drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index d3f4ada6a68e..96e173b0a6c6 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -384,7 +384,7 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
> > }
> >
> > s_fence = to_drm_sched_fence(fence);
> > - if (s_fence && s_fence->sched == sched &&
> > + if (!fence->error && s_fence && s_fence->sched == sched &&
> > !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags)) {
> >
> > /*
> > --
> > 2.35.1
> >
* Re: [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
2023-05-17 21:01 ` Alex Deucher
@ 2023-05-30 17:58 ` Christian König
0 siblings, 0 replies; 5+ messages in thread
From: Christian König @ 2023-05-30 17:58 UTC
To: Alex Deucher, ZhenGuo Yin, Mailing list - DRI developers
Cc: monk.liu, jingwen.chen, amd-gfx
On 17.05.23 at 23:01, Alex Deucher wrote:
> On Wed, May 17, 2023 at 11:02 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>> + dri-devel for scheduler
>>
>> On Tue, May 9, 2023 at 6:23 AM ZhenGuo Yin <zhenguo.yin@amd.com> wrote:
>>> [Why]
>>> drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
>>> If entity's dependency is a schedulerd error fence and drm_sched_stop is called
> typo: schedulerd -> scheduler
>
>>> due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.
>>>
>>> [How]
>>> Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
>>> callback for the dependency with scheduled error fence.
>>>
>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
> The series looks good to me, but it would be good to have Christian
> take a look as well. Series is:
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
With Alex's comments fixed, Reviewed-by: Christian König
<christian.koenig@amd.com>.
But Luben should probably push the patches upstream through drm-misc-next.
Christian.
>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index d3f4ada6a68e..96e173b0a6c6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -384,7 +384,7 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>>> }
>>>
>>> s_fence = to_drm_sched_fence(fence);
>>> - if (s_fence && s_fence->sched == sched &&
>>> + if (!fence->error && s_fence && s_fence->sched == sched &&
>>> !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags)) {
>>>
>>> /*
>>> --
>>> 2.35.1
>>>