* amdgpu doesn't do implicit sync, requires drivers to do it in IBs
@ 2020-05-25 22:05 Marek Olšák
  2020-05-25 22:07 ` Marek Olšák
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Olšák @ 2020-05-25 22:05 UTC (permalink / raw)
  To: amd-gfx mailing list, Bas Nieuwenhuizen, Michel Dänzer,
	Christian König


Hi Christian,

Bas and Michel wanted to discuss this. The main disadvantage of no implicit
(pipeline) sync within the same queue is that we get lower performance and
lower GPU utilization in some cases.

We actually never really needed the kernel to have implicit sync, because
all user mode drivers contained hacks to work without it.

Marek

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-25 22:05 amdgpu doesn't do implicit sync, requires drivers to do it in IBs Marek Olšák
@ 2020-05-25 22:07 ` Marek Olšák
  2020-05-28  9:11   ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Olšák @ 2020-05-25 22:07 UTC (permalink / raw)
  To: amd-gfx mailing list, Bas Nieuwenhuizen, Michel Dänzer,
	Christian König


If a user mode driver is changed to rely on the existence of implicit sync,
it results in corruption and flickering as reported here:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/2950

Marek

On Mon, May 25, 2020 at 6:05 PM Marek Olšák <maraeo@gmail.com> wrote:

> Hi Christian,
>
> Bas and Michel wanted to discuss this. The main disadvantage of no
> implicit (pipeline) sync within the same queue is that we get lower
> performance and lower GPU utilization in some cases.
>
> We actually never really needed the kernel to have implicit sync, because
> all user mode drivers contained hacks to work without it.
>
> Marek
>


* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-25 22:07 ` Marek Olšák
@ 2020-05-28  9:11   ` Christian König
  2020-05-28 10:06     ` Michel Dänzer
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2020-05-28  9:11 UTC (permalink / raw)
  To: Marek Olšák, amd-gfx mailing list, Bas Nieuwenhuizen,
	Michel Dänzer


Well we still need implicit sync; otherwise the GPU scheduler could
pick up the jobs in the wrong order.

See, without this, when we have the following sequence of submissions:

Client IB1 using buffer A
Client IB2
X IB1 using buffer A

We could end up with the execution order

X IB1 using buffer A
Client IB1 using buffer A
Client IB2

And that is not correct. The scheduler is only allowed to do the following:

Client IB1 using buffer A
X IB1 using buffer A
Client IB2

And that's what implicit sync is taking care of.

Christian.

On 26.05.20 at 00:07, Marek Olšák wrote:
> If a user mode driver is changed to rely on the existence of implicit 
> sync, it results in corruption and flickering as reported here: 
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/2950
>
> Marek
>
> On Mon, May 25, 2020 at 6:05 PM Marek Olšák <maraeo@gmail.com 
> <mailto:maraeo@gmail.com>> wrote:
>
>     Hi Christian,
>
>     Bas and Michel wanted to discuss this. The main disadvantage of no
>     implicit (pipeline) sync within the same queue is that we get
>     lower performance and lower GPU utilization in some cases.
>
>     We actually never really needed the kernel to have implicit sync,
>     because all user mode drivers contained hacks to work without it.
>
>     Marek
>



* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-28  9:11   ` Christian König
@ 2020-05-28 10:06     ` Michel Dänzer
  2020-05-28 14:39       ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Michel Dänzer @ 2020-05-28 10:06 UTC (permalink / raw)
  To: christian.koenig, Marek Olšák, Bas Nieuwenhuizen
  Cc: amd-gfx mailing list

On 2020-05-28 11:11 a.m., Christian König wrote:
> Well we still need implicit sync [...]

Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
doesn't ensure later jobs fully see the effects of previous implicitly
synced jobs", requiring userspace to do pessimistic flushing.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-28 10:06     ` Michel Dänzer
@ 2020-05-28 14:39       ` Christian König
  2020-05-28 16:06         ` Marek Olšák
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2020-05-28 14:39 UTC (permalink / raw)
  To: Michel Dänzer, Marek Olšák, Bas Nieuwenhuizen
  Cc: amd-gfx mailing list

On 28.05.20 at 12:06, Michel Dänzer wrote:
> On 2020-05-28 11:11 a.m., Christian König wrote:
>> Well we still need implicit sync [...]
> Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
> doesn't ensure later jobs fully see the effects of previous implicitly
> synced jobs", requiring userspace to do pessimistic flushing.

Yes, exactly that.

For the background: We also do this flushing for explicit syncs. And 
when this was implemented 2-3 years ago we first did the flushing for 
implicit sync as well.

That was immediately reverted and then implemented differently because 
it caused severe performance problems in some use cases.

I'm not sure about the root cause of these performance problems. My 
assumption was always that we then insert too many pipeline syncs, but 
Marek doesn't seem to think that's the cause.

On the one hand I'm rather keen to remove the extra handling and just 
always use the explicit handling for everything because it simplifies 
the kernel code quite a bit. On the other hand I don't want to run into 
this performance problem again.

In addition to that, what the kernel does is a "full" pipeline sync, i.e. 
we busy-wait for the full hardware pipeline to drain. That might be 
overkill if you just want to do some flushing so that the next shader 
sees the data written, but I'm not an expert on that.

Regards,
Christian.

* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-28 14:39       ` Christian König
@ 2020-05-28 16:06         ` Marek Olšák
  2020-05-28 18:12           ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Olšák @ 2020-05-28 16:06 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, amd-gfx mailing list, Bas Nieuwenhuizen


On Thu, May 28, 2020 at 10:40 AM Christian König <christian.koenig@amd.com>
wrote:

> On 28.05.20 at 12:06, Michel Dänzer wrote:
> > On 2020-05-28 11:11 a.m., Christian König wrote:
> >> Well we still need implicit sync [...]
> > Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
> > doesn't ensure later jobs fully see the effects of previous implicitly
> > synced jobs", requiring userspace to do pessimistic flushing.
>
> Yes, exactly that.
>
> For the background: We also do this flushing for explicit syncs. And
> when this was implemented 2-3 years ago we first did the flushing for
> implicit sync as well.
>
> That was immediately reverted and then implemented differently because
> it caused severe performance problems in some use cases.
>
> I'm not sure about the root cause of these performance problems. My
> assumption was always that we then insert too many pipeline syncs, but
> Marek doesn't seem to think that's the cause.
>
> On the one hand I'm rather keen to remove the extra handling and just
> always use the explicit handling for everything because it simplifies
> the kernel code quite a bit. On the other hand I don't want to run into
> this performance problem again.
>
> In addition to that, what the kernel does is a "full" pipeline sync, i.e.
> we busy-wait for the full hardware pipeline to drain. That might be
> overkill if you just want to do some flushing so that the next shader
> sees the data written, but I'm not an expert on that.
>

Do we busy-wait on the CPU or in WAIT_REG_MEM?

WAIT_REG_MEM is what UMDs do and should be faster.

Marek


* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-28 16:06         ` Marek Olšák
@ 2020-05-28 18:12           ` Christian König
  2020-05-28 19:35             ` Marek Olšák
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2020-05-28 18:12 UTC (permalink / raw)
  To: Marek Olšák
  Cc: Michel Dänzer, amd-gfx mailing list, Bas Nieuwenhuizen


On 28.05.20 at 18:06, Marek Olšák wrote:
> On Thu, May 28, 2020 at 10:40 AM Christian König 
> <christian.koenig@amd.com <mailto:christian.koenig@amd.com>> wrote:
>
>     On 28.05.20 at 12:06, Michel Dänzer wrote:
>     > On 2020-05-28 11:11 a.m., Christian König wrote:
>     >> Well we still need implicit sync [...]
>     > Yeah, this isn't about "we don't want implicit sync", it's about
>     "amdgpu
>     > doesn't ensure later jobs fully see the effects of previous
>     implicitly
>     > synced jobs", requiring userspace to do pessimistic flushing.
>
>     Yes, exactly that.
>
>     For the background: We also do this flushing for explicit syncs. And
>     when this was implemented 2-3 years ago we first did the flushing for
>     implicit sync as well.
>
>     That was immediately reverted and then implemented differently
>     because
>     it caused severe performance problems in some use cases.
>
>     I'm not sure about the root cause of these performance problems.
>     My assumption was always that we then insert too many pipeline
>     syncs, but Marek doesn't seem to think that's the cause.
>
>     On the one hand I'm rather keen to remove the extra handling and just
>     always use the explicit handling for everything because it simplifies
>     the kernel code quite a bit. On the other hand I don't want to run
>     into
>     this performance problem again.
>
>     In addition to that, what the kernel does is a "full" pipeline
>     sync, i.e. we busy-wait for the full hardware pipeline to drain.
>     That might be overkill if you just want to do some flushing so
>     that the next shader sees the data written, but I'm not an
>     expert on that.
>
>
> Do we busy-wait on the CPU or in WAIT_REG_MEM?
>
> WAIT_REG_MEM is what UMDs do and should be faster.

We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.

We use this for a couple of things, especially to make sure that the 
hardware is idle before changing VMID to page table associations.

What about your idea of having an extra dw in the shared BOs indicating 
that they are flushed?

As far as I understand it an EOS or other event might be sufficient for 
the caches as well. And you could insert the WAIT_REG_MEM directly 
before the first draw using the texture and not before the whole IB.

Could be that we can optimize this even more than what we do in the kernel.

Christian.

>
> Marek



* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-28 18:12           ` Christian König
@ 2020-05-28 19:35             ` Marek Olšák
  2020-05-29  9:05               ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Olšák @ 2020-05-28 19:35 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, amd-gfx mailing list, Bas Nieuwenhuizen


On Thu, May 28, 2020 at 2:12 PM Christian König <christian.koenig@amd.com>
wrote:

> On 28.05.20 at 18:06, Marek Olšák wrote:
>
> On Thu, May 28, 2020 at 10:40 AM Christian König <christian.koenig@amd.com>
> wrote:
>
>> On 28.05.20 at 12:06, Michel Dänzer wrote:
>> > On 2020-05-28 11:11 a.m., Christian König wrote:
>> >> Well we still need implicit sync [...]
>> > Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu
>> > doesn't ensure later jobs fully see the effects of previous implicitly
>> > synced jobs", requiring userspace to do pessimistic flushing.
>>
>> Yes, exactly that.
>>
>> For the background: We also do this flushing for explicit syncs. And
>> when this was implemented 2-3 years ago we first did the flushing for
>> implicit sync as well.
>>
>> That was immediately reverted and then implemented differently because
>> it caused severe performance problems in some use cases.
>>
>> I'm not sure about the root cause of these performance problems. My
>> assumption was always that we then insert too many pipeline syncs, but
>> Marek doesn't seem to think that's the cause.
>>
>> On the one hand I'm rather keen to remove the extra handling and just
>> always use the explicit handling for everything because it simplifies
>> the kernel code quite a bit. On the other hand I don't want to run into
>> this performance problem again.
>>
>> In addition to that, what the kernel does is a "full" pipeline sync, i.e.
>> we busy-wait for the full hardware pipeline to drain. That might be
>> overkill if you just want to do some flushing so that the next shader
>> sees the data written, but I'm not an expert on that.
>>
>
> Do we busy-wait on the CPU or in WAIT_REG_MEM?
>
> WAIT_REG_MEM is what UMDs do and should be faster.
>
>
> We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.
>
> We use this for a couple of things, especially to make sure that the
> hardware is idle before changing VMID to page table associations.
>
> What about your idea of having an extra dw in the shared BOs indicating
> that they are flushed?
>
> As far as I understand it an EOS or other event might be sufficient for
> the caches as well. And you could insert the WAIT_REG_MEM directly before
> the first draw using the texture and not before the whole IB.
>
> Could be that we can optimize this even more than what we do in the kernel.
>
> Christian.
>

Adding fences into BOs would be bad, because all UMDs would have to handle
them.

Is it possible to do this in the ring buffer:
if (fence_signalled) {
   indirect_buffer(dependent_IB);
   indirect_buffer(other_IB);
} else {
   indirect_buffer(other_IB);
   wait_reg_mem(fence);
   indirect_buffer(dependent_IB);
}

Or we might have to wait for a hw scheduler.

Does the kernel sync when the driver fd is different, or when the context
is different?

Marek


* Re: amdgpu doesn't do implicit sync, requires drivers to do it in IBs
  2020-05-28 19:35             ` Marek Olšák
@ 2020-05-29  9:05               ` Christian König
  0 siblings, 0 replies; 9+ messages in thread
From: Christian König @ 2020-05-29  9:05 UTC (permalink / raw)
  To: Marek Olšák
  Cc: Michel Dänzer, amd-gfx mailing list, Bas Nieuwenhuizen


On 28.05.20 at 21:35, Marek Olšák wrote:
> On Thu, May 28, 2020 at 2:12 PM Christian König 
> <christian.koenig@amd.com <mailto:christian.koenig@amd.com>> wrote:
>
>     On 28.05.20 at 18:06, Marek Olšák wrote:
>>     On Thu, May 28, 2020 at 10:40 AM Christian König
>>     <christian.koenig@amd.com <mailto:christian.koenig@amd.com>> wrote:
>>
>>         On 28.05.20 at 12:06, Michel Dänzer wrote:
>>         > On 2020-05-28 11:11 a.m., Christian König wrote:
>>         >> Well we still need implicit sync [...]
>>         > Yeah, this isn't about "we don't want implicit sync", it's
>>         about "amdgpu
>>         > doesn't ensure later jobs fully see the effects of previous
>>         implicitly
>>         > synced jobs", requiring userspace to do pessimistic flushing.
>>
>>         Yes, exactly that.
>>
>>         For the background: We also do this flushing for explicit
>>         syncs. And
>>         when this was implemented 2-3 years ago we first did the
>>         flushing for
>>         implicit sync as well.
>>
>>         That was immediately reverted and then implemented
>>         differently because
>>         it caused severe performance problems in some use cases.
>>
>>         I'm not sure about the root cause of these performance
>>         problems. My assumption was always that we then insert too
>>         many pipeline syncs, but Marek doesn't seem to think that's
>>         the cause.
>>
>>         On the one hand I'm rather keen to remove the extra handling
>>         and just
>>         always use the explicit handling for everything because it
>>         simplifies
>>         the kernel code quite a bit. On the other hand I don't want
>>         to run into
>>         this performance problem again.
>>
>>         In addition to that, what the kernel does is a "full"
>>         pipeline sync, i.e. we busy-wait for the full hardware
>>         pipeline to drain. That might be overkill if you just want
>>         to do some flushing so that the next shader sees the data
>>         written, but I'm not an expert on that.
>>
>>
>>     Do we busy-wait on the CPU or in WAIT_REG_MEM?
>>
>>     WAIT_REG_MEM is what UMDs do and should be faster.
>
>     We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.
>
>     We use this for a couple of things, especially to make sure that
>     the hardware is idle before changing VMID to page table associations.
>
>     What about your idea of having an extra dw in the shared BOs
>     indicating that they are flushed?
>
>     As far as I understand it an EOS or other event might be
>     sufficient for the caches as well. And you could insert the
>     WAIT_REG_MEM directly before the first draw using the texture and
>     not before the whole IB.
>
>     Could be that we can optimize this even more than what we do in
>     the kernel.
>
>     Christian.
>
>
> Adding fences into BOs would be bad, because all UMDs would have to 
> handle them.

Yeah, already assumed that this is the biggest blocker.

> Is it possible to do this in the ring buffer:
> if (fence_signalled) {
>    indirect_buffer(dependent_IB);
>    indirect_buffer(other_IB);
> } else {
>    indirect_buffer(other_IB);
>    wait_reg_mem(fence);
>    indirect_buffer(dependent_IB);
> }

That may be possible, but it's not easily implementable.

> Or we might have to wait for a hw scheduler.

I'm still fine with doing the pipeline sync for implicit sync as well; I 
just need somebody to confirm to me that this doesn't backfire in some cases.

>
> Does the kernel sync when the driver fd is different, or when the 
> context is different?

Only when the driver fd is different.

Christian.

>
> Marek



end of thread, other threads:[~2020-05-29  9:05 UTC | newest]

Thread overview: 9+ messages
2020-05-25 22:05 amdgpu doesn't do implicit sync, requires drivers to do it in IBs Marek Olšák
2020-05-25 22:07 ` Marek Olšák
2020-05-28  9:11   ` Christian König
2020-05-28 10:06     ` Michel Dänzer
2020-05-28 14:39       ` Christian König
2020-05-28 16:06         ` Marek Olšák
2020-05-28 18:12           ` Christian König
2020-05-28 19:35             ` Marek Olšák
2020-05-29  9:05               ` Christian König
