Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Li, Dennis" <Dennis.Li@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Zhang, Hawking" <Hawking.Zhang@amd.com>
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
Date: Mon, 12 Apr 2021 14:18:56 -0400	[thread overview]
Message-ID: <ecf465a2-d4fc-1cbf-a9d5-39c3844f23bb@amd.com> (raw)
In-Reply-To: <a970101f-89f1-8bdf-51d9-4a4e5e0f9e9a@amd.com>

[-- Attachment #1.1: Type: text/plain, Size: 7668 bytes --]

On 2021-04-12 2:05 p.m., Christian König wrote:
> Am 12.04.21 um 20:01 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-12 1:44 p.m., Christian König wrote:
>>
>>>
>>> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>>> Hi Andrey,
>>>>>
>>>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> If we use a list and a flag called 'emit_allowed' under a lock 
>>>>>>>> such that in amdgpu_fence_emit we lock the list, check the flag 
>>>>>>>> and if true add the new HW fence to list and proceed to HW 
>>>>>>>> emition as normal, otherwise return with -ENODEV. In 
>>>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, and 
>>>>>>>> then iterate the list and force signal it. Will this not 
>>>>>>>> prevent any new HW fence creation from now on from any place 
>>>>>>>> trying to do so ?
>>>>>>>
>>>>>>> Way to much overhead. The fence processing is intentionally lock 
>>>>>>> free to avoid cache line bouncing because the IRQ can move from 
>>>>>>> CPU to CPU.
>>>>>>>
>>>>>>> We need something which at least the processing of fences in the 
>>>>>>> interrupt handler doesn't affect at all.
>>>>>>
>>>>>>
>>>>>> As far as I see in the code, amdgpu_fence_emit is only called 
>>>>>> from task context. Also, we can skip this list I proposed and 
>>>>>> just use amdgpu_fence_driver_force_completion for each ring to 
>>>>>> signal all created HW fences.
>>>>>
>>>>> Ah, wait a second this gave me another idea.
>>>>>
>>>>> See amdgpu_fence_driver_force_completion():
>>>>>
>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>>
>>>>> If we change that to something like:
>>>>>
>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>>
>>>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>>>> fences will be considered signaled.
>>>>>
>>>>> This basically solves out problem of making sure that new fences 
>>>>> are also signaled without any additional overhead whatsoever.
>>>>
>>>>
>>>> Problem with this is that the act of setting the sync_seq to some 
>>>> MAX value alone is not enough, you actually have to call 
>>>> amdgpu_fence_process to iterate and signal the fences currently 
>>>> stored in ring->fence_drv.fences array and to guarantee that once 
>>>> you done your signalling no more HW fences will be added to that 
>>>> array anymore. I was thinking to do something like bellow:
>>>>
>>>
>>> Well we could implement the is_signaled callback once more, but I'm 
>>> not sure if that is a good idea.
>>
>>
>> This indeed could save the explicit signaling I am doing bellow but I 
>> also set an error code there which might be helpful to propagate to users
>>
>>
>>>
>>>> amdgpu_fence_emit()
>>>>
>>>> {
>>>>
>>>>     dma_fence_init(fence);
>>>>
>>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>>
>>>>     if (!adev->unplug)) {
>>>>
>>>>         seq = ++ring->fence_drv.sync_seq;
>>>>         emit_fence(fence);
>>>>
>>>> */* We can't wait forever as the HW might be gone at any point*/**
>>>>        dma_fence_wait_timeout(old_fence, 5S);*
>>>>
>>>
>>> You can pretty much ignore this wait here. It is only as a last 
>>> resort so that we never overwrite the ring buffers.
>>
>>
>> If device is present how can I ignore this ?
>>

I think you missed my question here

>>
>>>
>>> But it should not have a timeout as far as I can see.
>>
>>
>> Without timeout wait the who approach falls apart as I can't call 
>> srcu_synchronize on this scope because once device is physically gone 
>> the wait here will be forever
>>
>
> Yeah, but this is intentional. The only alternative to avoid 
> corruption is to wait with a timeout and call BUG() if that triggers. 
> That isn't much better.
>
>>
>>>
>>>>         ring->fence_drv.fences[seq & 
>>>> ring->fence_drv.num_fences_mask] = fence;
>>>>
>>>>     } else {
>>>>
>>>>         dma_fence_set_error(fence, -ENODEV);
>>>>         DMA_fence_signal(fence)
>>>>
>>>>     }
>>>>
>>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>>     return fence;
>>>>
>>>> }
>>>>
>>>> amdgpu_pci_remove
>>>>
>>>> {
>>>>
>>>>     adev->unplug = true;
>>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>>
>>>
>>> Well that is just duplicating what drm_dev_unplug() should be doing 
>>> on a different level.
>>
>>
>> drm_dev_unplug is on a much wider scope, for everything in the device 
>> including 'flushing' in flight IOCTLs, this deals specifically with 
>> the issue of force signalling HW fences
>>
>
> Yeah, but it adds the same overhead as the device srcu.
>
> Christian.

So what's the right approach ? How we guarantee that when running 
amdgpu_fence_driver_force_completion we will signal all the HW fences 
and not racing against some more fences insertion into that array ?

Andrey

>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>     /* Past this point no more fence are submitted to HW ring and 
>>>> hence we can safely call force signal on all that are currently there.
>>>>      * Any subsequently created  HW fences will be returned 
>>>> signaled with an error code right away
>>>>      */
>>>>
>>>>     for_each_ring(adev)
>>>>         amdgpu_fence_process(ring)
>>>>
>>>>     drm_dev_unplug(dev);
>>>>     Stop schedulers
>>>>     cancel_sync(all timers and queued works);
>>>>     hw_fini
>>>>     unmap_mmio
>>>>
>>>> }
>>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>>>
>>>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>>>> then restarting the scheduler could work as well.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>> I didn't get the above and I don't see why I need to reuse the 
>>>>>>>> GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. 
>>>>>>>> Also, not clear to me why are we focusing on the scheduler 
>>>>>>>> threads, any code patch to generate HW fences should be 
>>>>>>>> covered, so any code leading to amdgpu_fence_emit needs to be 
>>>>>>>> taken into account such as, direct IB submissions, VM flushes 
>>>>>>>> e.t.c
>>>>>>>
>>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>>> hotplug could run at the same time as a reset.
>>>>>>
>>>>>>
>>>>>> For going my way indeed now I see now that I have to take reset 
>>>>>> write side lock during HW fences signalling in order to protect 
>>>>>> against scheduler/HW fences detachment and reattachment during 
>>>>>> schedulers stop/restart. But if we go with your approach  then 
>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>>>> drm_dev_enter/exit should be enough to prevent any concurrent GPU 
>>>>>> resets during unplug. In fact I already do it anyway - 
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&amp;reserved=0
>>>>>
>>>>> Yes, good point as well.
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

[-- Attachment #1.2: Type: text/html, Size: 14876 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx