On 2021-04-12 2:23 p.m., Christian König wrote:
> On 12.04.21 at 20:18, Andrey Grodzovsky wrote:
>>
>> On 2021-04-12 2:05 p.m., Christian König wrote:
>>
>>> On 12.04.21 at 20:01, Andrey Grodzovsky wrote:
>>>>
>>>> On 2021-04-12 1:44 p.m., Christian König wrote:
>>>>
>>>>>
>>>>> On 12.04.21 at 19:27, Andrey Grodzovsky wrote:
>>>>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> On 09.04.21 at 20:18, Andrey Grodzovsky wrote:
>>>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>> If we use a list and a flag called 'emit_allowed' under a lock, such that in amdgpu_fence_emit we lock the list, check the flag and, if true, add the new HW fence to the list and proceed to HW emission as normal, otherwise return with -ENODEV. In amdgpu_pci_remove we take the lock, set the flag to false, and then iterate the list and force signal it. Will this not prevent any new HW fence creation from now on, from any place trying to do so?
>>>>>>>>>
>>>>>>>>> Way too much overhead. The fence processing is intentionally lock free to avoid cache line bouncing because the IRQ can move from CPU to CPU.
>>>>>>>>>
>>>>>>>>> We need something which at the very least doesn't affect the processing of fences in the interrupt handler at all.
>>>>>>>>
>>>>>>>>
>>>>>>>> As far as I see in the code, amdgpu_fence_emit is only called from task context. Also, we can skip this list I proposed and just use amdgpu_fence_driver_force_completion for each ring to signal all created HW fences.
>>>>>>>
>>>>>>> Ah, wait a second, this gave me another idea.
>>>>>>>
>>>>>>> See amdgpu_fence_driver_force_completion():
>>>>>>>
>>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>>>>
>>>>>>> If we change that to something like:
>>>>>>>
>>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>>>>
>>>>>>> Not only the currently submitted, but also the next 0x3FFFFFFF fences will be considered signaled.
>>>>>>>
>>>>>>> This basically solves our problem of making sure that new fences are also signaled, without any additional overhead whatsoever.
>>>>>>
>>>>>>
>>>>>> The problem with this is that the act of setting the sync_seq to some MAX value alone is not enough; you actually have to call amdgpu_fence_process to iterate and signal the fences currently stored in the ring->fence_drv.fences array, and to guarantee that once you are done with your signalling no more HW fences will be added to that array. I was thinking to do something like below:
>>>>>>
>>>>>
>>>>> Well, we could implement the is_signaled callback once more, but I'm not sure if that is a good idea.
>>>>
>>>>
>>>> This indeed could save the explicit signaling I am doing below, but I also set an error code there which might be helpful to propagate to users.
>>>>
>>>>
>>>>>
>>>>>> amdgpu_fence_emit()
>>>>>> {
>>>>>>     dma_fence_init(fence);
>>>>>>
>>>>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>>>>
>>>>>>     if (!adev->unplug) {
>>>>>>         seq = ++ring->fence_drv.sync_seq;
>>>>>>         emit_fence(fence);
>>>>>>
>>>>>>         /* We can't wait forever as the HW might be gone at any point */
>>>>>>         dma_fence_wait_timeout(old_fence, 5S);
>>>>>
>>>>> You can pretty much ignore this wait here. It is only there as a last resort so that we never overwrite the ring buffers.
>>>>
>>>>
>>>> If the device is present, how can I ignore this?
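To make the force-completion idea quoted above concrete, here is a minimal sketch. It assumes the function sits in amdgpu_fence.c next to amdgpu_fence_write()/amdgpu_fence_process() and keeps their current shape; the helper name is made up, and the 0x3FFFFFFF offset is simply the value from the mail. It also folds in the point that the stored fences still have to be walked.

/* Sketch only: a force-completion variant for the unplug case. The helper
 * name is hypothetical and the function is assumed to live in amdgpu_fence.c
 * so it can reach the static amdgpu_fence_write().
 */
static void amdgpu_fence_driver_force_completion_unplug(struct amdgpu_ring *ring)
{
	/* Report the last signaled seqno far beyond sync_seq so that fences
	 * emitted after this point also read back as already signaled.
	 */
	amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);

	/* Bumping the seqno alone does not signal already-emitted fences:
	 * still walk ring->fence_drv.fences and signal what is stored there.
	 */
	amdgpu_fence_process(ring);
}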
>>>>
>>
>> I think you missed my question here.
>>
>
> Sorry, I thought I answered that below.
>
> See, this is just the last resort so that we don't need to worry about ring buffer overflows during testing.
>
> We should not get here in practice, and if we do get here, generating a deadlock might actually be the best handling.
>
> The alternative would be to call BUG().
>
>>>>
>>>>>
>>>>> But it should not have a timeout as far as I can see.
>>>>
>>>>
>>>> Without a timeout on the wait the whole approach falls apart, as I can't call synchronize_srcu on this scope, because once the device is physically gone the wait here will last forever.
>>>>
>>>
>>> Yeah, but this is intentional. The only alternative to avoid corruption is to wait with a timeout and call BUG() if that triggers. That isn't much better.
>>>
>>>>
>>>>>
>>>>>>         ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask] = fence;
>>>>>>     } else {
>>>>>>         dma_fence_set_error(fence, -ENODEV);
>>>>>>         dma_fence_signal(fence);
>>>>>>     }
>>>>>>
>>>>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>>>>     return fence;
>>>>>> }
>>>>>>
>>>>>> amdgpu_pci_remove
>>>>>> {
>>>>>>     adev->unplug = true;
>>>>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>>>
>>>>> Well, that is just duplicating what drm_dev_unplug() should be doing on a different level.
>>>>
>>>>
>>>> drm_dev_unplug is on a much wider scope, for everything in the device including 'flushing' in-flight IOCTLs; this deals specifically with the issue of force signalling HW fences.
>>>>
>>>
>>> Yeah, but it adds the same overhead as the device srcu.
>>>
>>> Christian.
>>
>>
>> So what's the right approach? How do we guarantee that when running amdgpu_fence_driver_force_completion we will signal all the HW fences and not race against more fences being inserted into that array?
>>
>
> Well, I would still say the best approach would be to insert this between the front end and the backend and not rely on signaling fences while holding the device srcu.

My question is, even now, when we run amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what prevents a race with another fence being emitted and inserted into the fence array at the same time? Looks like nothing.

>
> BTW: Could it be that the device SRCU protects more than one device and we deadlock because of this?

I haven't actually experienced any deadlock until now, but yes, drm_unplug_srcu is defined as static in drm_drv.c, and so in the presence of multiple devices from the same or different drivers we are in fact dependent on all their critical sections, I guess.

Andrey

>
> Christian.
>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
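Pulling the two branches of the emit pseudocode above together into one place, a rough sketch of the SRCU-gated path looks as follows. amdgpu_unplug_srcu, adev->unplug and amdgpu_ring_emit_hw_fence() are hypothetical names used only for illustration, not existing driver symbols; error handling and the ring-buffer overflow wait are omitted.

/* Sketch of the SRCU-gated emit path from the pseudocode above. */
DEFINE_SRCU(amdgpu_unplug_srcu);

static int amdgpu_fence_emit_unplug_aware(struct amdgpu_ring *ring,
					  struct dma_fence *fence)
{
	struct amdgpu_device *adev = ring->adev;
	u32 seq;
	int idx;

	idx = srcu_read_lock(&amdgpu_unplug_srcu);

	if (!adev->unplug) {
		/* Device still present: emit to the HW ring as usual. */
		seq = ++ring->fence_drv.sync_seq;
		amdgpu_ring_emit_hw_fence(ring, fence, seq);	/* hypothetical helper */
		ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask] = fence;
	} else {
		/* Past amdgpu_pci_remove(): never touch the HW again, hand
		 * back a fence that is already signaled with an error.
		 */
		dma_fence_set_error(fence, -ENODEV);
		dma_fence_signal(fence);
	}

	srcu_read_unlock(&amdgpu_unplug_srcu, idx);
	return 0;
}

The point of the synchronize_srcu(amdgpu_unplug_srcu) call in amdgpu_pci_remove is exactly the guarantee questioned above: once it returns, every emitter has either published its fence into fence_drv.fences (and will be caught by the forced completion) or has seen adev->unplug and bailed out with -ENODEV.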
>>>>>>     /* Past this point no more fences are submitted to the HW ring and hence we can safely call force signal on all that are currently there.
>>>>>>      * Any subsequently created HW fences will be returned signaled with an error code right away.
>>>>>>      */
>>>>>>
>>>>>>     for_each_ring(adev)
>>>>>>         amdgpu_fence_process(ring)
>>>>>>
>>>>>>     drm_dev_unplug(dev);
>>>>>>     Stop schedulers
>>>>>>     cancel_sync(all timers and queued works);
>>>>>>     hw_fini
>>>>>>     unmap_mmio
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, grabbing the reset write side and stopping and then restarting the scheduler could work as well.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I didn't get the above, and I don't see why I need to reuse the GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, it's not clear to me why we are focusing on the scheduler threads; any code path that generates HW fences should be covered, so any code leading to amdgpu_fence_emit needs to be taken into account, such as direct IB submissions, VM flushes, etc.
>>>>>>>>>
>>>>>>>>> You need to work together with the reset lock anyway, because a hotplug could run at the same time as a reset.
>>>>>>>>
>>>>>>>>
>>>>>>>> Going my way, I indeed now see that I have to take the reset write side lock during HW fence signalling in order to protect against scheduler/HW fence detachment and reattachment during scheduler stop/restart. But if we go with your approach, then calling drm_dev_unplug and scoping amdgpu_job_timeout with drm_dev_enter/exit should be enough to prevent any concurrent GPU resets during unplug. In fact I already do it anyway - https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>>>>>
>>>>>>> Yes, good point as well.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
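For reference, the drm_dev_enter()/drm_dev_exit() scoping of the job timeout handler mentioned above (what the linked commit does) looks roughly like this. It is only a sketch: the actual reset path is elided, and the surrounding details (ring lookup, return convention of the scheduler) are assumed from the driver code of that time.

/* Sketch: bail out of the job timeout handler when the device is already
 * unplugged, so no GPU reset can race with amdgpu_pci_remove(). The real
 * handler does considerably more; the recovery path is elided here.
 */
static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
{
	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
	struct amdgpu_device *adev = ring->adev;
	int idx;

	if (!drm_dev_enter(adev_to_drm(adev), &idx)) {
		/* Device is gone: skip recovery, the job is effectively aborted. */
		return DRM_GPU_SCHED_STAT_ENODEV;
	}

	/* ... normal timeout handling / amdgpu_device_gpu_recover() here ... */

	drm_dev_exit(idx);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}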