Adding the fence is done already, and I do wait for it before unmapping. But
then I noticed that when the buffer is shared between processes, the "perfect
wait" is to wait only for the fences from this process's tasks, so it's better
to distinguish fences as well. If so, I wonder why we don't just wait for the
tasks from this process in preclose, before unmapping/freeing the buffer in
drm_release()?
Well, it depends on your VM management. When userspace expects the VM space
the BO used to be reusable immediately, then the TTM callback won't work.
On the other hand, you can just grab the list of fences on a BO, filter out
the ones from your current process and wait for those. See
amdgpu_sync_resv() for an example of how to do that.
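To make the "filter by owner, then wait" idea concrete, here is a minimal userspace model. It is not kernel code and does not reproduce amdgpu_sync_resv(). The types fake_fence and fake_resv, the owner field and the helpers fake_fence_wait() and wait_own_fences() are all hypothetical stand-ins. In a real driver you would walk the fences in the BO's reservation object, decide ownership with the driver's own bookkeeping, and do the blocking wait with something like dma_fence_wait().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence. The owner field is an assumption of
 * this model: it identifies the process (or DRM file) that submitted the
 * work. Real drivers track ownership in their own structures. */
struct fake_fence {
    const void *owner;
    bool signaled;
};

/* Hypothetical stand-in for a BO's reservation object: its fence list. */
struct fake_resv {
    struct fake_fence *fences;
    size_t count;
};

/* Models a blocking wait. Here it simply marks the fence as signaled;
 * a kernel driver would block on the real fence instead. */
static void fake_fence_wait(struct fake_fence *f)
{
    f->signaled = true;
}

/* Waits only for unsignaled fences on the BO that belong to 'owner'.
 * Fences from other processes sharing the BO are skipped, so this process
 * does not stall on their work. Returns how many fences were waited on. */
static size_t wait_own_fences(struct fake_resv *resv, const void *owner)
{
    size_t waited = 0;

    assert(resv != NULL);
    for (size_t i = 0; i < resv->count; i++) {
        struct fake_fence *f = &resv->fences[i];

        if (f->owner != owner || f->signaled)
            continue; /* someone else's fence, or already done */
        fake_fence_wait(f);
        waited++;
    }
    return waited;
}
```

With a BO shared by two processes, calling wait_own_fences() with the current process as owner waits for that process's pending fence and leaves the other process's fence untouched. This is the filtering step the reply describes, as opposed to waiting on every fence attached to the BO.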