* [PATCH 1/2] drm/amdgpu: set finished fence error if job timedout
@ 2023-05-09 10:22 ZhenGuo Yin
From: ZhenGuo Yin @ 2023-05-09 10:22 UTC
  To: amd-gfx; +Cc: ZhenGuo Yin, jingwen.chen, monk.liu, christian.koenig

Set the error of the job's finished fence to -ETIME when the job times out.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 57f8f8b3cd8a..f2c02e4167fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -65,6 +65,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 	DRM_ERROR("Process information: process %s pid %d thread %s pid %d\n",
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
+	dma_fence_set_error(&s_job->s_fence->finished, -ETIME);
+
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
 		struct amdgpu_reset_context reset_context;
 		memset(&reset_context, 0, sizeof(reset_context));
-- 
2.35.1
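A minimal userspace sketch of the fence contract this patch relies on, assuming the documented behaviour of dma_fence_set_error() (the error must be a negative errno and may only be set while the fence is still unsignaled) and of dma_fence_get_status(). `struct mock_fence` and every `mock_*` function are hypothetical stand-ins, not kernel APIs; the sketch only shows why the error is recorded in the timeout handler, before the finished fence is eventually signaled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct dma_fence: only the two
 * pieces of state needed to show error propagation are modeled. */
struct mock_fence {
	bool signaled;
	int error; /* 0, or a negative errno as in dma_fence::error */
};

/* Models the contract of dma_fence_set_error(): the error must be a
 * negative errno (down to -MAX_ERRNO, i.e. -4095) and may only be set
 * before the fence signals. The kernel WARNs on a violation; this
 * sketch returns false instead. */
static bool mock_fence_set_error(struct mock_fence *f, int error)
{
	if (f->signaled || error >= 0 || error < -4095)
		return false;
	f->error = error;
	return true;
}

static void mock_fence_signal(struct mock_fence *f)
{
	f->signaled = true;
}

/* Models dma_fence_get_status(): 0 while pending, 1 when signaled
 * without error, the negative errno when signaled with an error. */
static int mock_fence_get_status(const struct mock_fence *f)
{
	if (!f->signaled)
		return 0;
	return f->error ? f->error : 1;
}

/* Hypothetical model of the timeout path after patch 1: the job has not
 * finished, so -ETIME can still be recorded on its finished fence. */
static void mock_job_timedout(struct mock_fence *finished)
{
	bool ok = mock_fence_set_error(finished, -ETIME);

	assert(ok); /* would fail if the fence had already signaled */
	(void)ok;   /* keep -Wunused quiet when NDEBUG disables assert() */
}
```

Once the finished fence signals, anyone inspecting it sees -ETIME instead of a plain success, which is what the check added in patch 2 keys on.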



* [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
@ 2023-05-09 10:22 ZhenGuo Yin
From: ZhenGuo Yin @ 2023-05-09 10:22 UTC
  To: amd-gfx; +Cc: ZhenGuo Yin, jingwen.chen, monk.liu, christian.koenig

[Why]
drm_sched_entity_add_dependency_cb ignores the scheduled fence of a
same-scheduler dependency once it has signaled, and returns false.
If the entity's dependency is a scheduled error fence and drm_sched_stop is
called due to TDR, drm_sched_entity_pop_job will wait for the dependency
indefinitely.

[How]
Do not wait on or ignore the scheduled fence of an errored dependency;
instead, add the drm_sched_entity_wakeup callback to the dependency fence
itself.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index d3f4ada6a68e..96e173b0a6c6 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -384,7 +384,7 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
 	}
 
 	s_fence = to_drm_sched_fence(fence);
-	if (s_fence && s_fence->sched == sched &&
+	if (!fence->error && s_fence && s_fence->sched == sched &&
 	    !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags)) {
 
 		/*
-- 
2.35.1
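The branch selection changed by this patch can be modelled outside the kernel. The sketch below is a simplified reading of drm_sched_entity_add_dependency_cb() around this release: it omits any checks that precede the patched block, all reference counting and the callbacks themselves, and `struct mock_dep`, `enum dep_action` and `model_add_dependency_cb()` are hypothetical names. It shows that with the `!fence->error` test, an errored same-scheduler dependency no longer takes the pipelining branch and instead falls through to the drm_sched_entity_wakeup path described in [How].

```c
#include <assert.h>
#include <errno.h>   /* ETIME, for callers modelling a timed-out dependency */
#include <stdbool.h>

/* Hypothetical summary of what the dependency check ends up doing. */
enum dep_action {
	DEP_SKIP,           /* nothing to wait for: the function returns false */
	DEP_WAIT_SCHEDULED, /* drm_sched_entity_clear_dep on s_fence->scheduled */
	DEP_WAIT_WAKEUP,    /* drm_sched_entity_wakeup on the dependency itself */
};

/* Hypothetical flattened view of the dependency fence. */
struct mock_dep {
	int error;               /* fence->error: 0 or a negative errno */
	bool is_sched_fence;     /* to_drm_sched_fence(fence) != NULL */
	bool same_sched;         /* s_fence->sched == sched */
	bool dont_pipeline;      /* DRM_SCHED_FENCE_DONT_PIPELINE is set */
	bool scheduled_signaled; /* s_fence->scheduled has already signaled */
	bool signaled;           /* the dependency fence itself has signaled */
};

/* Simplified model of the branch touched by patch 2; with_fix selects
 * the patched condition. */
static enum dep_action model_add_dependency_cb(const struct mock_dep *d,
					       bool with_fix)
{
	bool pipeline;

	assert(d->error <= 0); /* fence errors are 0 or a negative errno */

	pipeline = d->is_sched_fence && d->same_sched && !d->dont_pipeline;
	if (with_fix)
		pipeline = pipeline && d->error == 0; /* the new !fence->error */

	if (pipeline)
		/* Same scheduler: only wait until the job is scheduled, and
		 * ignore the dependency if that has already happened. */
		return d->scheduled_signaled ? DEP_SKIP : DEP_WAIT_SCHEDULED;

	/* Otherwise wait on the dependency itself, unless it has already
	 * signaled (dma_fence_add_callback() fails for signaled fences). */
	return d->signaled ? DEP_SKIP : DEP_WAIT_WAKEUP;
}
```

For a dependency whose job has started but then timed out (scheduled fence signaled, finished fence unsignaled with -ETIME), the unpatched condition yields DEP_SKIP, while the patched one yields DEP_WAIT_WAKEUP.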



* Re: [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
@ 2023-05-17 15:02 Alex Deucher
From: Alex Deucher @ 2023-05-17 15:02 UTC
  To: ZhenGuo Yin, Maling list - DRI developers
  Cc: monk.liu, jingwen.chen, amd-gfx, christian.koenig

+ dri-devel for scheduler

On Tue, May 9, 2023 at 6:23 AM ZhenGuo Yin <zhenguo.yin@amd.com> wrote:
>
> [Why]
> drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
> If entity's dependency is a schedulerd error fence and drm_sched_stop is called
> due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.
>
> [How]
> Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
> callback for the dependency with scheduled error fence.
>
> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>


* Re: [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
@ 2023-05-17 21:01 Alex Deucher
From: Alex Deucher @ 2023-05-17 21:01 UTC
  To: ZhenGuo Yin, Maling list - DRI developers
  Cc: monk.liu, jingwen.chen, amd-gfx, christian.koenig

On Wed, May 17, 2023 at 11:02 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> + dri-devel for scheduler
>
> On Tue, May 9, 2023 at 6:23 AM ZhenGuo Yin <zhenguo.yin@amd.com> wrote:
> >
> > [Why]
> > drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
> > If entity's dependency is a schedulerd error fence and drm_sched_stop is called

typo: schedulerd -> scheduler

> > due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.
> >
> > [How]
> > Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
> > callback for the dependency with scheduled error fence.
> >
> > Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>

The series looks good to me, but it would be good to have Christian
take a look as well.  Series is:
Acked-by: Alex Deucher <alexander.deucher@amd.com>



* Re: [PATCH 2/2] drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence
@ 2023-05-30 17:58 Christian König
From: Christian König @ 2023-05-30 17:58 UTC
  To: Alex Deucher, ZhenGuo Yin, Maling list - DRI developers
  Cc: monk.liu, jingwen.chen, amd-gfx

On 17.05.23 at 23:01, Alex Deucher wrote:
> On Wed, May 17, 2023 at 11:02 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>> + dri-devel for scheduler
>>
>> On Tue, May 9, 2023 at 6:23 AM ZhenGuo Yin <zhenguo.yin@amd.com> wrote:
>>> [Why]
>>> drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
>>> If entity's dependency is a schedulerd error fence and drm_sched_stop is called
> typo: schedulerd -> scheduler
>
>>> due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.
>>>
>>> [How]
>>> Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
>>> callback for the dependency with scheduled error fence.
>>>
>>> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
> The series looks good to me, but it would be good to have Christian
> take a look as well.  Series is:
> Acked-by: Alex Deucher <alexander.deucher@amd.com>

With Alex's comments fixed, Reviewed-by: Christian König <christian.koenig@amd.com>.

But Luben should probably push the patches upstream through drm-misc-next.

Christian.


