From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Cc: horace.chen@amd.com, christian.koenig@amd.com, Monk.Liu@amd.com
Subject: Re: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
Date: Wed, 22 Dec 2021 08:50:44 +0100
Message-ID: <9f32fed2-3a72-44d3-0eb9-474725fc86ab@gmail.com>
In-Reply-To: <dce7b2d7-ac9c-047c-365b-38added395b8@amd.com>

On 21.12.21 at 17:03, Andrey Grodzovsky wrote:
>
> On 2021-12-21 2:02 a.m., Christian König wrote:
>>
>>
>> On 20.12.21 at 20:22, Andrey Grodzovsky wrote:
>>>
>>> On 2021-12-20 2:17 a.m., Christian König wrote:
>>>> On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
>>>>> Restrict job resubmission to the suspend case
>>>>> only, since the schedulers are not yet initialised
>>>>> on probe.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> index 5527c68c51de..8ebd954e06c6 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
>>>>>           if (!ring || !ring->fence_drv.initialized)
>>>>>               continue;
>>>>> -         if (!ring->no_scheduler) {
>>>>> +         if (adev->in_suspend && !ring->no_scheduler) {
>>>>
>>>> Uff, why is that suddenly necessary? Because of the changed order?
>>>>
>>>> Christian.
>>>
>>>
>>> Yes.
>>
>> Mhm, that's quite bad design then.
>
>
> If you look at the original patch for this,
> https://www.spinics.net/lists/amd-gfx/msg67560.html, you will see that
> restarting the scheduler here is only relevant for the suspend/resume
> case, because there was a race to fix. There is no point in running
> this code on driver init because nothing has been submitted to the
> scheduler yet, so it seems OK to me to add a condition that this code
> runs only in the in_suspend case.

Yeah, but having extra logic like this means that we have some design 
issue in the IP block handling.

We need to clean that and some other odd approaches up at some point, 
but probably not now.

Christian.
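
For readers following along, here is a minimal sketch of the guard under discussion. It is a toy model, not the amdgpu code: the struct and function names (model_ring, model_device, model_fence_driver_hw_init) are hypothetical stand-ins, and the restart is only counted rather than performed. It shows just that the probe path skips the resubmit/restart step while the resume path takes it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields the patch reads; not the real amdgpu structs. */
struct model_ring {
    bool fence_drv_initialized;
    bool no_scheduler;
    int restarts;               /* how often the scheduler would have been restarted */
};

struct model_device {
    bool in_suspend;            /* true only while resuming from suspend */
    struct model_ring *rings;
    int num_rings;
};

/* Mirrors the control flow of the patched loop; the restart itself is only counted. */
static void model_fence_driver_hw_init(struct model_device *adev)
{
    for (int i = 0; i < adev->num_rings; i++) {
        struct model_ring *ring = &adev->rings[i];

        if (!ring->fence_drv_initialized)
            continue;

        /* On probe nothing has been submitted yet, so only resume restarts. */
        if (adev->in_suspend && !ring->no_scheduler)
            ring->restarts++;
    }
}
```

Calling the function with in_suspend cleared leaves every counter untouched; calling it with in_suspend set bumps the counter only for rings that have a scheduler.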

>
>
>>
>> How about we keep the order as is and allow specifying the reset work
>> queue with drm_sched_start()?
>
>
> As I mentioned above, the fact that we even have drm_sched_start there
> is just part of a solution to resolve a race during suspend/resume. It
> is not for device initialization, and indeed other client drivers of
> the GPU scheduler never call drm_sched_start on device init. We must
> guarantee that the reset work queue is already initialized before any
> job submission to the scheduler, and because of that IMHO the right
> place for this is drm_sched_init.
>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>>               drm_sched_resubmit_jobs(&ring->sched);
>>>>>               drm_sched_start(&ring->sched, true);
>>>>>           }
>>>>
>>
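
The ordering argument quoted above (the reset work queue must exist before any job reaches the scheduler, so it should be bound at init time) can be illustrated with a small model. This is only a sketch under stated assumptions: model_wq, model_sched, model_sched_init and model_sched_submit are hypothetical names and do not reflect the real drm_sched_init() signature or behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work queue handle; stands in for the reset work queue. */
struct model_wq {
    const char *name;
};

/* Hypothetical scheduler state; not the real struct drm_gpu_scheduler. */
struct model_sched {
    struct model_wq *reset_wq;  /* bound once at init, never swapped later */
    bool ready;
    int queued;
};

/* Returns 0 on success, -1 if no reset work queue was supplied. */
static int model_sched_init(struct model_sched *s, struct model_wq *reset_wq)
{
    if (reset_wq == NULL)
        return -1;
    s->reset_wq = reset_wq;
    s->queued = 0;
    s->ready = true;
    return 0;
}

/* Refuses jobs until init has bound a reset work queue. */
static int model_sched_submit(struct model_sched *s)
{
    if (!s->ready)
        return -1;
    s->queued++;
    return 0;
}
```

Because the queue is a required argument of the init call in this model, there is no window in which a job can be queued on a scheduler that has no reset work queue, which is the guarantee the quoted paragraph asks for.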

