Hm, I think the proper fix is to stop looping on submission_in_progress in amdgpu_cs_flush while bo_fence_lock is held. The loop is only there because we need to prepare the request.dependencies list at that point. We can remove it rather trivially by passing a list of amdgpu_fence objects (some of which may still have submissions in progress) to the CS thread as dependencies, and letting the CS thread initialize request.dependencies from that list.
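
To make the idea concrete, here is a rough sketch of the split I have in mind. It is not the actual winsys code: every `sketch_*` name is made up for illustration, `request.dependencies` is modeled as a plain array of sequence numbers, and the wait is a simple yield loop. The only point is that the flush path just records fence pointers while the lock is held, and the waiting moves to the CS thread.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEPS 8

/* Hypothetical stand-in for amdgpu_fence. */
struct sketch_fence {
    atomic_bool submission_in_progress; /* true until the fence has been submitted */
    uint64_t seq_no;                    /* only meaningful once submission is done */
};

/* Hypothetical job handed from the flush path to the CS thread. */
struct sketch_cs_job {
    struct sketch_fence *fence_deps[SKETCH_MAX_DEPS]; /* filled by the flush path */
    size_t num_fence_deps;
    uint64_t request_dependencies[SKETCH_MAX_DEPS];   /* filled by the CS thread */
    size_t num_request_dependencies;
};

static void sketch_fence_init(struct sketch_fence *fence)
{
    atomic_init(&fence->submission_in_progress, true);
    fence->seq_no = 0;
}

/* Called when the submission that owns this fence has gone through. */
static void sketch_fence_submitted(struct sketch_fence *fence, uint64_t seq_no)
{
    fence->seq_no = seq_no;
    atomic_store(&fence->submission_in_progress, false);
}

/* Flush path: assumed to run with the fence lock held.
 * It only records the pointer and never waits. */
static void sketch_flush_add_dependency(struct sketch_cs_job *job,
                                        struct sketch_fence *fence)
{
    assert(job->num_fence_deps < SKETCH_MAX_DEPS);
    job->fence_deps[job->num_fence_deps++] = fence;
}

/* CS thread: runs without the fence lock, so waiting here cannot
 * stall other threads that need the lock. */
static void sketch_cs_thread_prepare(struct sketch_cs_job *job)
{
    job->num_request_dependencies = 0;
    for (size_t i = 0; i < job->num_fence_deps; i++) {
        struct sketch_fence *fence = job->fence_deps[i];

        /* Wait until the dependency has actually been submitted. */
        while (atomic_load(&fence->submission_in_progress))
            sched_yield();

        job->request_dependencies[job->num_request_dependencies++] =
            fence->seq_no;
    }
}
```

With this shape, amdgpu_cs_flush would only call something like `sketch_flush_add_dependency` per fence under the lock, and the equivalent of `sketch_cs_thread_prepare` would run at the start of the CS thread's submit function. Reference counting of the fences held in the list is left out here and would be needed in the real code.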