All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] Data race when use PACKET3_DMA_DATA?
@ 2021-06-02 12:39 Chen Lei
  2021-06-02 13:37 ` Alex Deucher
  0 siblings, 1 reply; 7+ messages in thread
From: Chen Lei @ 2021-06-02 12:39 UTC (permalink / raw)
  To: amd-gfx



Hi, I noticed that there are two ways to do DMA for AMD GPUs: the SDMA copy packet and the PM4 DMA packet.

I have tested the PM4 DMA packet, PACKET3_DMA_DATA. Most of the time, it works.

But when I launch an OpenCL kernel followed by a host-to-GPU DMA packet, it seems that the OpenCL kernel reads the new value written by the following DMA packet.

Both the OpenCL kernel and the PM4 DMA packet are submitted using amdgpu_cs_ioctl, and they are submitted to the same ring.

I am not familiar with the hardware details. According to my understanding, because the ring is a FIFO, there is no need for any explicit synchronization between the OpenCL kernel launch packet and the DMA packet, so the result looked weird. And when I add synchronization (i.e. amdgpu_cs_wait_ioctl) before the DMA packet, everything is OK.

Was it a hardware bug or did I miss something?

 


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] Data race when use PACKET3_DMA_DATA?
  2021-06-02 12:39 [BUG] Data race when use PACKET3_DMA_DATA? Chen Lei
@ 2021-06-02 13:37 ` Alex Deucher
  2021-06-03  0:29   ` Chen Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Deucher @ 2021-06-02 13:37 UTC (permalink / raw)
  To: Chen Lei; +Cc: amd-gfx list

On Wed, Jun 2, 2021 at 8:44 AM Chen Lei <chenlei18s@ict.ac.cn> wrote:
>
> Hi, I noticed that there are two ways to do DMA for amd gpu: the SDMA copy packet and the PM4 dma packet.
>
> I had tested the PM4 dma packet:  PACKET3_DMA_DATA. In most of time, it works.
>
> But when I launch an OpenCL kernel followed by a host-to-gpu DMA packet, it seems that the OpenCL kernel read the new value written by the following DMA packet.
>
> Both the OpenCL kernel and the PM4 dma packet are submitted using the amdgpu_cs_ioctl, and they are submitted to the same ring.
>
> I was not family with the hardware details. According to my understanding, because the ring is FIFO, there is no need for any explicit synchronization between the OpenCL kernel launch packet and the dma packet. So the result looked weird. And when I add the synchronization(i.e. amdgpu_cs_wait_ioctl) before the dma packet, everything is OK.
>
> Was it a hardware bug or did I miss something?
>

The CP DMA engine is separate from the various CP micro engines.  When
there is a DMA DATA packet, the DMA operation is offloaded to the CP
DMA engine and the CP engine that processed the packet continues on to
the next packet.  You need to use the ENGINE_SEL and CP_SYNC bits in
the DMA DATA packet to specify the behavior you want.  The ENGINE_SEL
bit selects which CP engine processes the packet (PFP or ME) and the
CP_SYNC bit stops further packet processing on the selected engine
until the DMA is complete.
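
For readers following along, the packet Alex describes can be sketched in C. The opcode and the ENGINE_SEL/CP_SYNC bit positions below are taken from the public amdgpu kernel headers (e.g. vid.h); treat them as assumptions to check against the header for your ASIC, not a drop-in encoding:

```c
#include <stdint.h>

/* PM4 type-3 header and DMA_DATA fields, values as in the public amdgpu
 * headers (vid.h and friends) -- verify against the header for your ASIC. */
#define PACKET_TYPE3                3
#define PACKET3(op, n)              ((PACKET_TYPE3 << 30) | \
                                     (((op) & 0xFF) << 8) | \
                                     ((n) & 0x3FFF))
#define PACKET3_DMA_DATA            0x50
#define PACKET3_DMA_DATA_ENGINE(x)  ((x) << 0)  /* 0 = ME, 1 = PFP */
#define PACKET3_DMA_DATA_CP_SYNC    (1u << 31)  /* stall packet fetch */

/* Build the 7-dword DMA_DATA packet: header, CONTROL, SRC_ADDR_LO/HI,
 * DST_ADDR_LO/HI, COMMAND (byte count in bits [20:0]). */
void build_dma_data(uint32_t pkt[7], uint64_t src, uint64_t dst,
                    uint32_t bytes)
{
    pkt[0] = PACKET3(PACKET3_DMA_DATA, 5);
    pkt[1] = PACKET3_DMA_DATA_ENGINE(1) |  /* processed by the PFP */
             PACKET3_DMA_DATA_CP_SYNC;     /* stall it until the DMA is done */
    pkt[2] = (uint32_t)src;
    pkt[3] = (uint32_t)(src >> 32);
    pkt[4] = (uint32_t)dst;
    pkt[5] = (uint32_t)(dst >> 32);
    pkt[6] = bytes & 0x1FFFFF;             /* BYTE_COUNT [20:0] */
}
```

With CP_SYNC set in CONTROL, the selected engine stops fetching further packets until the copy completes, which is exactly the ordering the original report was missing.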

Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [BUG] Data race when use PACKET3_DMA_DATA?
  2021-06-02 13:37 ` Alex Deucher
@ 2021-06-03  0:29   ` Chen Lei
  2021-06-03  2:11     ` Alex Deucher
  0 siblings, 1 reply; 7+ messages in thread
From: Chen Lei @ 2021-06-03  0:29 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx list

Hi Alex, thanks for your quick reply.
I first submit the OpenCL kernel packet and then submit the DMA DATA packet, and the OpenCL kernel reads the value written by the DMA DATA packet.
If I understand you correctly, that is because the CP engine continues on to process the DMA DATA packet after launching the OpenCL kernel. If so, is there a way to make the CP engine wait until the OpenCL kernel is complete?


> -----Original Messages-----
> From: "Alex Deucher" <alexdeucher@gmail.com>
> Sent Time: 2021-06-02 21:37:51 (Wednesday)
> To: "Chen Lei" <chenlei18s@ict.ac.cn>
> Cc: "amd-gfx list" <amd-gfx@lists.freedesktop.org>
> Subject: Re: [BUG] Data race when use PACKET3_DMA_DATA?
> 
> On Wed, Jun 2, 2021 at 8:44 AM Chen Lei <chenlei18s@ict.ac.cn> wrote:
> >
> > [earlier quoted text snipped]
> 
> The CP DMA engine is separate from the various CP micro engines.  When
> there is a DMA DATA packet, the DMA operation is offloaded to the CP
> DMA engine and the CP engine that processed the packet continues on to
> the next packet.  You need to use the ENGINE_SEL and CP_SYNC bits in
> the DMA DATA packet to specify the behavior you want.  The ENGINE_SEL
> bit selects which CP engine processes the packet (PFP or ME) and the
> CP_SYNC bit stops further packet processing on the selected engine
> until the DMA is complete.
> 
> Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [BUG] Data race when use PACKET3_DMA_DATA?
  2021-06-03  0:29   ` Chen Lei
@ 2021-06-03  2:11     ` Alex Deucher
  2021-06-03  3:37       ` Chen Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Deucher @ 2021-06-03  2:11 UTC (permalink / raw)
  To: Chen Lei; +Cc: amd-gfx list

On Wed, Jun 2, 2021 at 8:29 PM Chen Lei <chenlei18s@ict.ac.cn> wrote:
>
> Hi Alex. Thanks for your quick reply.
> I first submit the OpenCL kernel packet and then submit the DMA DATA packet. And the OpenCL kernel reads the value written by the DMA DATA packet.
> If I understand you correctly, that is because the CP engine continues on to process the DMA DATA packet after launching the OpenCL kernel. If so, is there any way to sync the CP engine until the OpenCL kernel is complete?
>

Once the kernel has been dispatched to the shader cores, the CP will
continue to execute packets in the queue.  If you want it to wait for
the pipeline to drain you'll need to insert a fence packet (e.g.,
RELEASE_MEM).
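
As a sketch of what such a fence-then-wait sequence looks like in dwords (opcodes as in the public amdgpu headers; the field encodings inside the packet bodies are simplified assumptions, not the exact per-ASIC ones -- see the gfx_v*_0_ring_emit_fence() helpers in the kernel for the real encodings):

```c
#include <stddef.h>
#include <stdint.h>

/* PM4 opcodes as in the public amdgpu headers (vid.h/soc15d.h). */
#define PACKET_TYPE3            3
#define PACKET3(op, n)          ((PACKET_TYPE3 << 30) | \
                                 (((op) & 0xFF) << 8) | \
                                 ((n) & 0x3FFF))
#define PACKET3_WAIT_REG_MEM    0x3C
#define PACKET3_RELEASE_MEM     0x49

/* Emit (1) an end-of-pipe fence: the CP writes fence_val to fence_gpu_addr
 * once the preceding dispatch has drained, and (2) a poll: the CP stalls
 * until that value shows up in memory, so a following DMA_DATA packet
 * cannot overtake the kernel. Returns the number of dwords emitted. */
size_t emit_fence_and_wait(uint32_t *ib, uint64_t fence_gpu_addr,
                           uint32_t fence_val)
{
    size_t n = 0;

    ib[n++] = PACKET3(PACKET3_RELEASE_MEM, 6);
    ib[n++] = 0;                          /* EVENT_TYPE/INDEX: EOP (assumed) */
    ib[n++] = 0;                          /* DATA_SEL/INT_SEL (assumed)      */
    ib[n++] = (uint32_t)fence_gpu_addr;   /* fence address, low              */
    ib[n++] = (uint32_t)(fence_gpu_addr >> 32);
    ib[n++] = fence_val;                  /* value to write, low             */
    ib[n++] = 0;                          /* value, high                     */
    ib[n++] = 0;

    ib[n++] = PACKET3(PACKET3_WAIT_REG_MEM, 5);
    ib[n++] = 3;                          /* function "==", mem space (assumed) */
    ib[n++] = (uint32_t)fence_gpu_addr;   /* poll address, low               */
    ib[n++] = (uint32_t)(fence_gpu_addr >> 32);
    ib[n++] = fence_val;                  /* reference value                 */
    ib[n++] = 0xFFFFFFFFu;                /* compare mask                    */
    ib[n++] = 4;                          /* poll interval                   */

    return n;
}
```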

Alex

> [earlier quoted text snipped]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: Re: [BUG] Data race when use PACKET3_DMA_DATA?
  2021-06-03  2:11     ` Alex Deucher
@ 2021-06-03  3:37       ` Chen Lei
  2021-06-04  7:40         ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Chen Lei @ 2021-06-03  3:37 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx list

I have separated the dispatch packet and the DMA DATA packet into 2 IBs, and called amdgpu_cs_ioctl twice.
If I am not mistaken, `amdgpu_ib_schedule` emits fence packets for each amdgpu_cs_ioctl call.
Do I need to insert the fence packet explicitly after the dispatch packet?


> -----Original Messages-----
> From: "Alex Deucher" <alexdeucher@gmail.com>
> Sent Time: 2021-06-03 10:11:46 (Thursday)
> To: "Chen Lei" <chenlei18s@ict.ac.cn>
> Cc: "amd-gfx list" <amd-gfx@lists.freedesktop.org>
> Subject: Re: Re: [BUG] Data race when use PACKET3_DMA_DATA?
> 
> On Wed, Jun 2, 2021 at 8:29 PM Chen Lei <chenlei18s@ict.ac.cn> wrote:
> >
> > [earlier quoted text snipped]
> 
> Once the kernel has been dispatched to the shader cores, the CP will
> continue to execute packets in the queue.  If you want it to wait for
> the pipeline to drain you'll need to insert a fence packet (e.g.,
> RELEASE_MEM).
> 
> Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG] Data race when use PACKET3_DMA_DATA?
  2021-06-03  3:37       ` Chen Lei
@ 2021-06-04  7:40         ` Christian König
  2021-06-04 13:00           ` Chen Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-06-04  7:40 UTC (permalink / raw)
  To: Chen Lei, Alex Deucher; +Cc: amd-gfx list

Hi,

I think your problem comes from a missing piece of understanding: the 
hardware is heavily pipelined.

In other words, commands you send to the hardware just kick off 
asynchronous processing. E.g. a CP DMA command just kicks off a copy 
operation, but the CP then continues executing commands.

The same is true for a RELEASE_MEM packet: it just kicks off an operation 
to write a value to an address when all compute or 3D rendering is completed.

But if you want to synchronize execution of the CP commands, you still 
need to block until that value has been written, or otherwise the CP will 
just keep going with the next command.
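
A host-side analogy of that rule (purely illustrative, no amdgpu API involved): the worker thread below plays the shader pipeline, the fence word plays the memory that RELEASE_MEM writes, and the polling loop plays WAIT_REG_MEM. Skipping the loop and reading `result` immediately would be the same race as the original report.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_uint fence = 0;   /* the word the "fence packet" writes */
static int result;              /* the buffer the "kernel" produces   */

/* The "pipeline": writes its data, then signals completion by storing
 * the fence value with release ordering. */
static void *pipeline(void *arg)
{
    (void)arg;
    result = 42;
    atomic_store_explicit(&fence, 1, memory_order_release);
    return NULL;
}

int read_after_fence(void)
{
    pthread_t t;
    pthread_create(&t, NULL, pipeline, NULL);

    /* Block until the fence value has been written -- the WAIT_REG_MEM
     * analogue. Without this loop, reading `result` races the worker. */
    while (atomic_load_explicit(&fence, memory_order_acquire) == 0)
        ;

    pthread_join(t, NULL);
    return result;
}
```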

Regards,
Christian.

Am 03.06.21 um 05:37 schrieb Chen Lei:
> I had seperated the dispatch packet and DMA DATA packet into 2 IBs, and called the amdgpu_cs_ioctl twice.
> If I was not mistaken, the `amdgpu_ib_schedule` would emit fence packets for each amdgpu_cs_ioctl call.
> Did I need to insert the fence packet explicitly after the dispatch packet?
> [earlier quoted text snipped]


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [BUG] Data race when use PACKET3_DMA_DATA?
  2021-06-04  7:40         ` Christian König
@ 2021-06-04 13:00           ` Chen Lei
  0 siblings, 0 replies; 7+ messages in thread
From: Chen Lei @ 2021-06-04 13:00 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, amd-gfx list

Thanks very much. I get it now.


> -----Original Messages-----
> From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Sent Time: 2021-06-04 15:40:08 (Friday)
> To: "Chen Lei" <chenlei18s@ict.ac.cn>, "Alex Deucher" <alexdeucher@gmail.com>
> Cc: "amd-gfx list" <amd-gfx@lists.freedesktop.org>
> Subject: Re: [BUG] Data race when use PACKET3_DMA_DATA?
> 
> Hi,
> 
> I think your problem comes from a missing piece of understanding: the
> hardware is heavily pipelined.
> 
> In other words, commands you send to the hardware just kick off
> asynchronous processing. E.g. a CP DMA command just kicks off a copy
> operation, but the CP then continues executing commands.
> 
> The same is true for a RELEASE_MEM packet: it just kicks off an operation
> to write a value to an address when all compute or 3D rendering is completed.
> 
> But if you want to synchronize execution of the CP commands, you still
> need to block until that value has been written, or otherwise the CP will
> just keep going with the next command.
> 
> Regards,
> Christian.
> 
> [earlier quoted text snipped]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-04 13:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 12:39 [BUG] Data race when use PACKET3_DMA_DATA? Chen Lei
2021-06-02 13:37 ` Alex Deucher
2021-06-03  0:29   ` Chen Lei
2021-06-03  2:11     ` Alex Deucher
2021-06-03  3:37       ` Chen Lei
2021-06-04  7:40         ` Christian König
2021-06-04 13:00           ` Chen Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.