On Thu, May 28, 2020 at 10:40 AM Christian König wrote:
> On 28.05.20 at 12:06, Michel Dänzer wrote:
> > On 2020-05-28 11:11 a.m., Christian König wrote:
> >> Well we still need implicit sync [...]
> > Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
> > doesn't ensure later jobs fully see the effects of previous implicitly
> > synced jobs", requiring userspace to do pessimistic flushing.
>
> Yes, exactly that.
>
> For the background: We also do this flushing for explicit syncs. And
> when this was implemented 2-3 years ago we first did the flushing for
> implicit sync as well.
>
> That was immediately reverted and then implemented differently because
> it caused severe performance problems in some use cases.
>
> I'm not sure of the root cause of these performance problems. My
> assumption was always that we then insert too many pipeline syncs, but
> Marek doesn't seem to think it could be that.
>
> On the one hand I'm rather keen to remove the extra handling and just
> always use the explicit handling for everything because it simplifies
> the kernel code quite a bit. On the other hand I don't want to run into
> this performance problem again.
>
> In addition to that, what the kernel does is a "full" pipeline sync, i.e.
> we busy wait for the full hardware pipeline to drain. That might be
> overkill if you just want to do some flushing so that the next shader
> sees the stuff written, but I'm not an expert on that.
>

Do we busy-wait on the CPU or in WAIT_REG_MEM?

WAIT_REG_MEM is what UMDs do and should be faster.

Marek
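
For anyone following along, here is a rough toy model of the two options being asked about. It is only a sketch: every name in it (toy_ring, toy_emit_wait_mem, toy_cp_step, TOY_OP_WAIT_MEM) is hypothetical, and the opcode and packet layout are made up, so it does not reflect the real PM4 encoding or what amdgpu actually does. In option A the CPU spins on the fence value itself. In option B the CPU only queues a wait packet and returns, and the (simulated) command processor does the polling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: the opcode value and packet layout are made up for
 * illustration and do not match the real PM4 encoding. */
#define TOY_OP_WAIT_MEM 0x1u
#define TOY_RING_DWORDS 16u
#define TOY_WAIT_DWORDS 4u

struct toy_ring {
	uint32_t buf[TOY_RING_DWORDS];
	size_t wptr; /* next dword the CPU writes */
	size_t rptr; /* next dword the simulated command processor reads */
};

/* Option A: the CPU itself spins on the fence value.
 * Returns the number of polls used, or -1 if max_polls ran out. */
static int toy_cpu_busy_wait(const volatile uint32_t *fence, uint32_t seq,
			     int max_polls)
{
	for (int i = 1; i <= max_polls; i++) {
		if (*fence == seq)
			return i;
	}
	return -1;
}

/* Option B: the CPU only queues a wait packet and returns at once.
 * Returns 0 on success, -1 if the ring has no room left. */
static int toy_emit_wait_mem(struct toy_ring *ring, uint32_t fence_slot,
			     uint32_t seq, uint32_t mask)
{
	if (ring->wptr + TOY_WAIT_DWORDS > TOY_RING_DWORDS)
		return -1;
	ring->buf[ring->wptr++] = TOY_OP_WAIT_MEM;
	ring->buf[ring->wptr++] = fence_slot;
	ring->buf[ring->wptr++] = seq;
	ring->buf[ring->wptr++] = mask;
	return 0;
}

/* One poll by the simulated command processor.
 * Returns 1 if the wait packet was satisfied and consumed,
 * 0 if it is still stalled or there is nothing valid to process. */
static int toy_cp_step(struct toy_ring *ring, const uint32_t *mem,
		       size_t mem_dwords)
{
	const uint32_t *pkt;

	if (ring->rptr + TOY_WAIT_DWORDS > ring->wptr)
		return 0; /* nothing queued */
	pkt = &ring->buf[ring->rptr];
	if (pkt[0] != TOY_OP_WAIT_MEM || pkt[1] >= mem_dwords)
		return 0; /* not a packet this toy model understands */
	if ((mem[pkt[1]] & pkt[3]) != pkt[2])
		return 0; /* fence not signaled yet, stay on this packet */
	ring->rptr += TOY_WAIT_DWORDS;
	return 1;
}
```

In this model the cost of option B on the CPU side is just the four dword writes. The stall happens in the consumer of the ring, which is the property the question above is getting at.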