Hm, I think the proper fix is to stop looping on submission_in_progress in
amdgpu_cs_flush while bo_fence_lock is held. The loop is only there because we
need to prepare the request.dependencies list. We can remove it rather
trivially by passing a list of amdgpu_fences (whose submissions may still be in
progress) to the CS thread as dependencies, and letting the CS thread
initialize request.dependencies accordingly.
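
To make the idea concrete, here is a rough standalone sketch of what I have in
mind. It is not the actual winsys code: every sketch_* name, the fixed-size
arrays, and the seq_no field are made up for illustration, and the real
amdgpu_fence and request structures look different. The only point is that the
flush path records fence pointers without waiting, and the wait on
submission_in_progress moves to the CS thread, where bo_fence_lock is not held.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_DEPS 8

/* Hypothetical stand-in for amdgpu_fence (not the real Mesa struct). */
struct sketch_fence {
   atomic_bool submission_in_progress; /* true until the owning CS is submitted */
   uint64_t seq_no;                    /* only meaningful after submission */
};

/* Hypothetical per-submission job handed from the flush path to the CS thread. */
struct sketch_cs_job {
   struct sketch_fence *deps[SKETCH_MAX_DEPS];     /* filled in by flush */
   unsigned num_deps;
   uint64_t request_dependencies[SKETCH_MAX_DEPS]; /* filled in by the CS thread */
   unsigned num_request_dependencies;
};

/* Flush path, lock held: only record the fence pointer, never wait on it. */
static bool sketch_add_dependency(struct sketch_cs_job *job,
                                  struct sketch_fence *fence)
{
   if (job->num_deps == SKETCH_MAX_DEPS)
      return false;
   job->deps[job->num_deps++] = fence;
   return true;
}

/* CS thread, no lock held: wait for in-flight submissions, then build the
 * dependency list from the now-valid sequence numbers. */
static void sketch_resolve_dependencies(struct sketch_cs_job *job)
{
   assert(job->num_deps <= SKETCH_MAX_DEPS);
   job->num_request_dependencies = 0;

   for (unsigned i = 0; i < job->num_deps; i++) {
      struct sketch_fence *fence = job->deps[i];

      /* The wait that used to sit in the flush path now happens here. */
      while (atomic_load(&fence->submission_in_progress))
         sched_yield();

      job->request_dependencies[job->num_request_dependencies++] = fence->seq_no;
   }
}
```

The flush path would call sketch_add_dependency() for each fence it finds
while holding the lock, and the CS thread would call
sketch_resolve_dependencies() right before submitting. Fence refcounting is
left out of the sketch, but the job would need to hold a reference on each
fence until the CS thread is done with it.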