All of lore.kernel.org
 help / color / mirror / Atom feed
* drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
@ 2020-04-25  8:52 Marek Olšák
  2020-04-25 15:04 ` Christian König
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Marek Olšák @ 2020-04-25  8:52 UTC (permalink / raw)
  To: amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 44 bytes --]

This should fix SDMA hangs on gfx10.

Marek

[-- Attachment #1.2: Type: text/html, Size: 130 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-invalidate-L2-before-SDMA-IBs.patch --]
[-- Type: text/x-patch, Size: 3856 bytes --]

From 478671f577551f06b68e2564a20bdfb61806a802 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak@amd.com>
Date: Sat, 25 Apr 2020 04:23:01 -0400
Subject: [PATCH] drm/amdgpu: invalidate L2 before SDMA IBs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c          |  3 ++-
 .../gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h    | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c           | 14 +++++++++++++-
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d377c5d17bb8..0177892e609a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -86,9 +86,10 @@
  * - 3.35.0 - Add drm_amdgpu_info_device::tcc_disabled_mask
  * - 3.36.0 - Allow reading more status registers on si/cik
  * - 3.37.0 - Add AMDGPU_IB_FLAG_EMIT_MEM_SYNC
+ * - 3.38.0 - L2 is invalidated before SDMA IBs, needed for correctness
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	37
+#define KMS_DRIVER_MINOR	38
 #define KMS_DRIVER_PATCHLEVEL	0
 
 int amdgpu_vram_limit = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h b/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h
index 074a9a09c0a7..a5b60c9a2418 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h
@@ -73,6 +73,22 @@
 #define SDMA_OP_AQL_COPY  0
 #define SDMA_OP_AQL_BARRIER_OR  0
 
+#define SDMA_GCR_RANGE_IS_PA		(1 << 18)
+#define SDMA_GCR_SEQ(x)			(((x) & 0x3) << 16)
+#define SDMA_GCR_GL2_WB			(1 << 15)
+#define SDMA_GCR_GL2_INV		(1 << 14)
+#define SDMA_GCR_GL2_DISCARD		(1 << 13)
+#define SDMA_GCR_GL2_RANGE(x)		(((x) & 0x3) << 11)
+#define SDMA_GCR_GL2_US			(1 << 10)
+#define SDMA_GCR_GL1_INV		(1 << 9)
+#define SDMA_GCR_GLV_INV		(1 << 8)
+#define SDMA_GCR_GLK_INV		(1 << 7)
+#define SDMA_GCR_GLK_WB			(1 << 6)
+#define SDMA_GCR_GLM_INV		(1 << 5)
+#define SDMA_GCR_GLM_WB			(1 << 4)
+#define SDMA_GCR_GL1_RANGE(x)		(((x) & 0x3) << 2)
+#define SDMA_GCR_GLI_INV(x)		(((x) & 0x3) << 0)
+
 /*define for op field*/
 #define SDMA_PKT_HEADER_op_offset 0
 #define SDMA_PKT_HEADER_op_mask   0x000000FF
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 764f455ada75..b544baf306f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -410,6 +410,18 @@ static void sdma_v5_0_ring_emit_ib(struct amdgpu_ring *ring,
 	unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 	uint64_t csa_mc_addr = amdgpu_sdma_get_csa_mc_addr(ring, vmid);
 
+	/* Invalidate L2, because if we don't do it, we might get stale cache
+	 * lines from previous IBs.
+	 */
+	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_GCR_REQ));
+	amdgpu_ring_write(ring, 0);
+	amdgpu_ring_write(ring, (SDMA_GCR_GL2_INV |
+				 SDMA_GCR_GL2_WB |
+				 SDMA_GCR_GLM_INV |
+				 SDMA_GCR_GLM_WB) << 16);
+	amdgpu_ring_write(ring, 0xffffff80);
+	amdgpu_ring_write(ring, 0xffff);
+
 	/* An IB packet must end on a 8 DW boundary--the next dword
 	 * must be on a 8-dword boundary. Our IB packet below is 6
 	 * dwords long, thus add x number of NOPs, such that, in
@@ -1634,7 +1646,7 @@ static const struct amdgpu_ring_funcs sdma_v5_0_ring_funcs = {
 		SOC15_FLUSH_GPU_TLB_NUM_WREG * 3 +
 		SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 6 * 2 +
 		10 + 10 + 10, /* sdma_v5_0_ring_emit_fence x3 for user fence, vm fence */
-	.emit_ib_size = 7 + 6, /* sdma_v5_0_ring_emit_ib */
+	.emit_ib_size = 5 + 7 + 6, /* sdma_v5_0_ring_emit_ib */
 	.emit_ib = sdma_v5_0_ring_emit_ib,
 	.emit_fence = sdma_v5_0_ring_emit_fence,
 	.emit_pipeline_sync = sdma_v5_0_ring_emit_pipeline_sync,
-- 
2.17.1


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-25  8:52 drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10) Marek Olšák
@ 2020-04-25 15:04 ` Christian König
  2020-04-26  0:28   ` Marek Olšák
  2020-04-25 16:28 ` Pierre-Eric Pelloux-Prayer
  2020-04-29  9:06 ` Liu, Monk
  2 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2020-04-25 15:04 UTC (permalink / raw)
  To: Marek Olšák, amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 298 bytes --]

Could we do this in userspace as well?

Am 25.04.20 um 10:52 schrieb Marek Olšák:
> This should fix SDMA hangs on gfx10.
>
> Marek
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[-- Attachment #1.2: Type: text/html, Size: 1146 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-25  8:52 drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10) Marek Olšák
  2020-04-25 15:04 ` Christian König
@ 2020-04-25 16:28 ` Pierre-Eric Pelloux-Prayer
  2020-04-29  9:06 ` Liu, Monk
  2 siblings, 0 replies; 13+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2020-04-25 16:28 UTC (permalink / raw)
  To: Marek Olšák, amd-gfx mailing list

Hi Marek,

With this patch applied I could replay a trace (that usually caused a sdma
timeout on the first run) several times in a row without any issues.

So you can tag the commit as:
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

Thanks,
Pierre-Eric



On 25/04/2020 10:52, Marek Olšák wrote:
> This should fix SDMA hangs on gfx10.
> 
> Marek
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-25 15:04 ` Christian König
@ 2020-04-26  0:28   ` Marek Olšák
  2020-04-26  8:52     ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Marek Olšák @ 2020-04-26  0:28 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 550 bytes --]

Not without a mandatory firmware update. The gcr packet support for IBs was
added into the sdma firmware just two weeks ago.

Marek

On Sat., Apr. 25, 2020, 11:04 Christian König, <
ckoenig.leichtzumerken@gmail.com> wrote:

> Could we do this in userspace as well?
>
> Am 25.04.20 um 10:52 schrieb Marek Olšák:
>
> This should fix SDMA hangs on gfx10.
>
> Marek
>
> _______________________________________________
> amd-gfx mailing listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 1422 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-26  0:28   ` Marek Olšák
@ 2020-04-26  8:52     ` Christian König
  2020-04-27 13:01       ` Deucher, Alexander
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2020-04-26  8:52 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 1233 bytes --]

Thought so, we should try to get this get it committed ASAP. And maybe 
add a CC: stable tag as well.

Patch is Reviewed-by: Christian König <christian.koenig@amd.com>.

Thanks,
Christian.

Am 26.04.20 um 02:28 schrieb Marek Olšák:
> Not without a mandatory firmware update. The gcr packet support for 
> IBs was added into the sdma firmware just two weeks ago.
>
> Marek
>
> On Sat., Apr. 25, 2020, 11:04 Christian König, 
> <ckoenig.leichtzumerken@gmail.com 
> <mailto:ckoenig.leichtzumerken@gmail.com>> wrote:
>
>     Could we do this in userspace as well?
>
>     Am 25.04.20 um 10:52 schrieb Marek Olšák:
>>     This should fix SDMA hangs on gfx10.
>>
>>     Marek
>>
>>     _______________________________________________
>>     amd-gfx mailing list
>>     amd-gfx@lists.freedesktop.org  <mailto:amd-gfx@lists.freedesktop.org>
>>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx  <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cchristian.koenig%40amd.com%7C359b939e8c09417cc08008d7e978cf48%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637234577398011504&sdata=XOGnAIRryyooKGH%2FDH1ZIT7TsFWZ8Y5lSDtVOcE6RxQ%3D&reserved=0>
>


[-- Attachment #1.2: Type: text/html, Size: 3042 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-26  8:52     ` Christian König
@ 2020-04-27 13:01       ` Deucher, Alexander
  2020-04-27 13:53         ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Deucher, Alexander @ 2020-04-27 13:01 UTC (permalink / raw)
  To: Koenig, Christian, Marek Olšák; +Cc: amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 1873 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Please split the patch into two, one to update the emit IB sequence, and one to bump the minor version.  That way we can make sure older kernels get the new sequence.  Also do we need an sdma fw version check for the new packet in the emi IB function?

Alex
________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Christian König <christian.koenig@amd.com>
Sent: Sunday, April 26, 2020 4:52 AM
To: Marek Olšák <maraeo@gmail.com>
Cc: amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

Thought so, we should try to get this get it committed ASAP. And maybe add a CC: stable tag as well.

Patch is Reviewed-by: Christian König <christian.koenig@amd.com><mailto:christian.koenig@amd.com>.

Thanks,
Christian.

Am 26.04.20 um 02:28 schrieb Marek Olšák:
Not without a mandatory firmware update. The gcr packet support for IBs was added into the sdma firmware just two weeks ago.

Marek

On Sat., Apr. 25, 2020, 11:04 Christian König, <ckoenig.leichtzumerken@gmail.com<mailto:ckoenig.leichtzumerken@gmail.com>> wrote:
Could we do this in userspace as well?

Am 25.04.20 um 10:52 schrieb Marek Olšák:
This should fix SDMA hangs on gfx10.

Marek



_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Calexander.deucher%40amd.com%7Cc88f73e068bb483bb21f08d7e9bf2cd4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637234879980330283&sdata=LZ2GMq0a0n6cm74ulhIkMq%2Fyj6XlG1lj9AH0InU1%2BdQ%3D&reserved=0>




[-- Attachment #1.2: Type: text/html, Size: 4223 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-27 13:01       ` Deucher, Alexander
@ 2020-04-27 13:53         ` Marek Olšák
  0 siblings, 0 replies; 13+ messages in thread
From: Marek Olšák @ 2020-04-27 13:53 UTC (permalink / raw)
  To: Deucher, Alexander; +Cc: Koenig, Christian, amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 2211 bytes --]

I've already pushed the patch and marked it for stable.

I added a note into the commit message to discard the version bump for
stable.

The GCR packet is always supported in the ring regardless of firmware.

Marek

On Mon, Apr 27, 2020 at 9:01 AM Deucher, Alexander <
Alexander.Deucher@amd.com> wrote:

> [AMD Official Use Only - Internal Distribution Only]
>
> Please split the patch into two, one to update the emit IB sequence, and
> one to bump the minor version.  That way we can make sure older kernels get
> the new sequence.  Also do we need an sdma fw version check for the new
> packet in the emi IB function?
>
> Alex
> ------------------------------
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
> Christian König <christian.koenig@amd.com>
> *Sent:* Sunday, April 26, 2020 4:52 AM
> *To:* Marek Olšák <maraeo@gmail.com>
> *Cc:* amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
> *Subject:* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
> Thought so, we should try to get this get it committed ASAP. And maybe add
> a CC: stable tag as well.
>
> Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
> <christian.koenig@amd.com>.
>
> Thanks,
> Christian.
>
> Am 26.04.20 um 02:28 schrieb Marek Olšák:
>
> Not without a mandatory firmware update. The gcr packet support for IBs
> was added into the sdma firmware just two weeks ago.
>
> Marek
>
> On Sat., Apr. 25, 2020, 11:04 Christian König, <
> ckoenig.leichtzumerken@gmail.com> wrote:
>
> Could we do this in userspace as well?
>
> Am 25.04.20 um 10:52 schrieb Marek Olšák:
>
> This should fix SDMA hangs on gfx10.
>
> Marek
>
> _______________________________________________
> amd-gfx mailing listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Calexander.deucher%40amd.com%7Cc88f73e068bb483bb21f08d7e9bf2cd4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637234879980330283&sdata=LZ2GMq0a0n6cm74ulhIkMq%2Fyj6XlG1lj9AH0InU1%2BdQ%3D&reserved=0>
>
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 4402 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-25  8:52 drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10) Marek Olšák
  2020-04-25 15:04 ` Christian König
  2020-04-25 16:28 ` Pierre-Eric Pelloux-Prayer
@ 2020-04-29  9:06 ` Liu, Monk
  2020-04-29  9:35   ` Liu, Monk
  2 siblings, 1 reply; 13+ messages in thread
From: Liu, Monk @ 2020-04-29  9:06 UTC (permalink / raw)
  To: Marek Olšák, amd-gfx mailing list, Koenig, Christian


[-- Attachment #1.1.1: Type: text/plain, Size: 1237 bytes --]

Hi @Koenig, Christian<mailto:Christian.Koenig@amd.com> & Marek

I still have some concerns regarding Marek’s patch, correct me if I’m wrong

See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA won’t get stale cache data during the IB execution.

But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC cache right ?  what if the memory is changed by MM or CPU (out side of GFXHUB) ?

Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their operation result from their cache to memory ??

Besides, with my understanding the “EOP” of gfx ring is doing the thing of “invalidate/flush” L2 cache upon a fence signaled, so what we should do on SDMA5 is to insert this “SDMA_OP_GCR_REQ”
Right before thee “emit_fence” of SDMA  (this is what windows KMD do)

thanks
_____________________________________
Monk Liu|GPU Virtualization Team |AMD
[sig-cloud-gpu]

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Marek Ol?ák
Sent: Saturday, April 25, 2020 4:52 PM
To: amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

This should fix SDMA hangs on gfx10.

Marek

[-- Attachment #1.1.2: Type: text/html, Size: 4931 bytes --]

[-- Attachment #1.2: image001.png --]
[-- Type: image/png, Size: 12243 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-29  9:06 ` Liu, Monk
@ 2020-04-29  9:35   ` Liu, Monk
  2020-04-29  9:37     ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Liu, Monk @ 2020-04-29  9:35 UTC (permalink / raw)
  To: Marek Olšák, amd-gfx mailing list, Koenig, Christian


[-- Attachment #1.1.1: Type: text/plain, Size: 2338 bytes --]

Here is the reason we should always insert a “sync mem” packet at the FENCE place of SDMA, not before IB emit.

By always inserting “sync mem” in the FENCE place we can make sure:1

  1.  data is really flushed to system memory before CPU try to read it
  2.  all the G2LC is invalidated by “sync mem”, thus in the next round SDMA IB, it won’t get staled data from G2LC cache

by inserting “sync mem” in prior to IB could only achieve :  Avoid get staled data in g2lc during IB execution

for GFX/COMPUTE ring since they have release_mem packet so it is inherently doing the G2LC flush and invalidate upon a fence signaled

_____________________________________
Monk Liu|GPU Virtualization Team |AMD
[sig-cloud-gpu]

From: Liu, Monk
Sent: Wednesday, April 29, 2020 5:06 PM
To: 'Marek Olšák' <maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

Hi @Koenig, Christian<mailto:Christian.Koenig@amd.com> & Marek

I still have some concerns regarding Marek’s patch, correct me if I’m wrong

See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA won’t get stale cache data during the IB execution.

But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC cache right ?  what if the memory is changed by MM or CPU (out side of GFXHUB) ?

Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their operation result from their cache to memory ??

Besides, with my understanding the “EOP” of gfx ring is doing the thing of “invalidate/flush” L2 cache upon a fence signaled, so what we should do on SDMA5 is to insert this “SDMA_OP_GCR_REQ”
Right before thee “emit_fence” of SDMA  (this is what windows KMD do)

thanks
_____________________________________
Monk Liu|GPU Virtualization Team |AMD
[sig-cloud-gpu]

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Marek Ol?ák
Sent: Saturday, April 25, 2020 4:52 PM
To: amd-gfx mailing list <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Subject: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

This should fix SDMA hangs on gfx10.

Marek

[-- Attachment #1.1.2: Type: text/html, Size: 9095 bytes --]

[-- Attachment #1.2: image001.png --]
[-- Type: image/png, Size: 12243 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-29  9:35   ` Liu, Monk
@ 2020-04-29  9:37     ` Christian König
  2020-04-29 11:24       ` Liu, Monk
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2020-04-29  9:37 UTC (permalink / raw)
  To: Liu, Monk, Marek Olšák, amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 2732 bytes --]

Well from my understanding I think that a G2LC invalidation is still 
necessary before an IB executes.

The problem is that the memory of the IB could also be cached because of 
some activity of the GFX or Compute rings.

Regards,
Christian.

Am 29.04.20 um 11:35 schrieb Liu, Monk:
>
> Here is the reason we should always insert a “sync mem” packet at the 
> FENCE place of SDMA, not before IB emit.
>
> By always inserting “sync mem” in the FENCE place we can make sure:1
>
>  1. data is really flushed to system memory before CPU try to read it
>  2. all the G2LC is invalidated by “sync mem”, thus in the next round
>     SDMA IB, it won’t get staled data from G2LC cache
>
> by inserting “sync mem” in prior to IB could only achieve :  Avoid get 
> staled data in g2lc during IB execution
>
> for GFX/COMPUTE ring since they have release_mem packet so it is 
> inherently doing the G2LC flush and invalidate upon a fence signaled
>
> _____________________________________
>
> Monk Liu|GPU Virtualization Team |AMD
>
> sig-cloud-gpu
>
> *From:* Liu, Monk
> *Sent:* Wednesday, April 29, 2020 5:06 PM
> *To:* 'Marek Olšák' <maraeo@gmail.com>; amd-gfx mailing list 
> <amd-gfx@lists.freedesktop.org>; Koenig, Christian 
> <Christian.Koenig@amd.com>
> *Subject:* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
> Hi @Koenig, Christian <mailto:Christian.Koenig@amd.com> & Marek
>
> I still have some concerns regarding Marek’s patch, correct me if I’m 
> wrong
>
> See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure 
> SDMA won’t get stale cache data during the IB execution.
>
> But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC 
> cache right ?  what if the memory is changed by MM or CPU (out side of 
> GFXHUB) ?
>
> Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their 
> operation result from their cache to memory ??
>
> Besides, with my understanding the “EOP” of gfx ring is doing the 
> thing of “invalidate/flush” L2 cache upon a fence signaled, so what we 
> should do on SDMA5 is to insert this “SDMA_OP_GCR_REQ”
>
> Right before thee “emit_fence” of SDMA  (this is what windows KMD do)
>
> thanks
>
> _____________________________________
>
> Monk Liu|GPU Virtualization Team |AMD
>
> sig-cloud-gpu
>
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org 
> <mailto:amd-gfx-bounces@lists.freedesktop.org>> *On Behalf Of *Marek Ol?ák
> *Sent:* Saturday, April 25, 2020 4:52 PM
> *To:* amd-gfx mailing list <amd-gfx@lists.freedesktop.org 
> <mailto:amd-gfx@lists.freedesktop.org>>
> *Subject:* drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
> This should fix SDMA hangs on gfx10.
>
> Marek
>


[-- Attachment #1.2.1: Type: text/html, Size: 10490 bytes --]

[-- Attachment #1.2.2: image001.png --]
[-- Type: image/png, Size: 12243 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-29  9:37     ` Christian König
@ 2020-04-29 11:24       ` Liu, Monk
  2020-05-01 16:02         ` Marek Olšák
  0 siblings, 1 reply; 13+ messages in thread
From: Liu, Monk @ 2020-04-29 11:24 UTC (permalink / raw)
  To: Koenig, Christian, Marek Olšák, amd-gfx mailing list


[-- Attachment #1.1.1: Type: text/plain, Size: 3574 bytes --]

>> Well from my understanding I think that a G2LC invalidation is still necessary before an IB executes.
Agree, I think before an IB executes the only thing we need on GL2C is the invalidation, not the flush .

>> The problem is that the memory of the IB could also be cached because of some activity of the GFX or Compute rings.
If we always insert a GL2C invalidate at every EOP of every IB from every engine, why we need a GL2C invalidate before IB  execute ?
_____________________________________
Monk Liu|GPU Virtualization Team |AMD
[sig-cloud-gpu]

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Wednesday, April 29, 2020 5:38 PM
To: Liu, Monk <Monk.Liu@amd.com>; Marek Olšák <maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

Well from my understanding I think that a G2LC invalidation is still necessary before an IB executes.

The problem is that the memory of the IB could also be cached because of some activity of the GFX or Compute rings.

Regards,
Christian.

Am 29.04.20 um 11:35 schrieb Liu, Monk:
Here is the reason we should always insert a “sync mem” packet at the FENCE place of SDMA, not before IB emit.

By always inserting “sync mem” in the FENCE place we can make sure:1

  1.  data is really flushed to system memory before CPU try to read it
  2.  all the G2LC is invalidated by “sync mem”, thus in the next round SDMA IB, it won’t get staled data from G2LC cache

by inserting “sync mem” in prior to IB could only achieve :  Avoid get staled data in g2lc during IB execution

for GFX/COMPUTE ring since they have release_mem packet so it is inherently doing the G2LC flush and invalidate upon a fence signaled

_____________________________________
Monk Liu|GPU Virtualization Team |AMD
[sig-cloud-gpu]

From: Liu, Monk
Sent: Wednesday, April 29, 2020 5:06 PM
To: 'Marek Olšák' <maraeo@gmail.com><mailto:maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org><mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Subject: RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

Hi @Koenig, Christian<mailto:Christian.Koenig@amd.com> & Marek

I still have some concerns regarding Marek’s patch, correct me if I’m wrong

See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA won’t get stale cache data during the IB execution.

But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC cache right ?  what if the memory is changed by MM or CPU (out side of GFXHUB) ?

Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their operation result from their cache to memory ??

Besides, with my understanding the “EOP” of gfx ring is doing the thing of “invalidate/flush” L2 cache upon a fence signaled, so what we should do on SDMA5 is to insert this “SDMA_OP_GCR_REQ”
Right before thee “emit_fence” of SDMA  (this is what windows KMD do)

thanks
_____________________________________
Monk Liu|GPU Virtualization Team |AMD
[sig-cloud-gpu]

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Marek Ol?ák
Sent: Saturday, April 25, 2020 4:52 PM
To: amd-gfx mailing list <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Subject: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

This should fix SDMA hangs on gfx10.

Marek


[-- Attachment #1.1.2: Type: text/html, Size: 11923 bytes --]

[-- Attachment #1.2: image001.png --]
[-- Type: image/png, Size: 12243 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
  2020-04-29 11:24       ` Liu, Monk
@ 2020-05-01 16:02         ` Marek Olšák
  0 siblings, 0 replies; 13+ messages in thread
From: Marek Olšák @ 2020-05-01 16:02 UTC (permalink / raw)
  To: Liu, Monk; +Cc: Koenig, Christian, amd-gfx mailing list


[-- Attachment #1.1.1: Type: text/plain, Size: 4594 bytes --]

I'll answer two questions asked:

1) SDMA doesn't need GCR at the end of IBs. It's because SDMA writes bypass
GL2 and at the same time they invalidate all cache lines they touch.

2)
> If we always insert a GL2C invalidate at every EOP of every IB from every
engine, why we need a GL2C invalidate before IB  execute ?

I just sent you a counterexample on a private thread that proves that
invalidation in RELEASE_MEM doesn't accomplish anything. The invalidation
flag is there because:
1) it was inherited from gfx9, which was inherited from gfx8, which was
inherited from gfx7, which doesn't have the WB flag, so INV has to be used
instead.
2) to hide bugs

Marek

On Wed, Apr 29, 2020 at 7:24 AM Liu, Monk <Monk.Liu@amd.com> wrote:

> >> Well from my understanding I think that a G2LC invalidation is still
> necessary before an IB executes.
>
> Agree, I think before an IB executes the only thing we need on GL2C is the
> invalidation, not the flush .
>
>
> >> The problem is that the memory of the IB could also be cached because
> of some activity of the GFX or Compute rings.
>
> If we always insert a GL2C invalidate at every EOP of every IB from every
> engine, why we need a GL2C invalidate before IB  execute ?
>
> _____________________________________
>
> Monk Liu|GPU Virtualization Team |AMD
>
> [image: sig-cloud-gpu]
>
>
>
> *From:* Koenig, Christian <Christian.Koenig@amd.com>
> *Sent:* Wednesday, April 29, 2020 5:38 PM
> *To:* Liu, Monk <Monk.Liu@amd.com>; Marek Olšák <maraeo@gmail.com>;
> amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
> *Subject:* Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
>
>
> Well from my understanding I think that a G2LC invalidation is still
> necessary before an IB executes.
>
> The problem is that the memory of the IB could also be cached because of
> some activity of the GFX or Compute rings.
>
> Regards,
> Christian.
>
> Am 29.04.20 um 11:35 schrieb Liu, Monk:
>
> Here is the reason we should always insert a “sync mem” packet at the
> FENCE place of SDMA, not before IB emit.
>
>
>
> By always inserting “sync mem” in the FENCE place we can make sure:1
>
>    1. data is really flushed to system memory before CPU try to read it
>    2. all the G2LC is invalidated by “sync mem”, thus in the next round
>    SDMA IB, it won’t get staled data from G2LC cache
>
>
>
> by inserting “sync mem” in prior to IB could only achieve :  Avoid get
> staled data in g2lc during IB execution
>
>
>
> for GFX/COMPUTE ring since they have release_mem packet so it is
> inherently doing the G2LC flush and invalidate upon a fence signaled
>
>
>
> _____________________________________
>
> Monk Liu|GPU Virtualization Team |AMD
>
> [image: sig-cloud-gpu]
>
>
>
> *From:* Liu, Monk
> *Sent:* Wednesday, April 29, 2020 5:06 PM
> *To:* 'Marek Olšák' <maraeo@gmail.com> <maraeo@gmail.com>; amd-gfx
> mailing list <amd-gfx@lists.freedesktop.org>
> <amd-gfx@lists.freedesktop.org>; Koenig, Christian
> <Christian.Koenig@amd.com> <Christian.Koenig@amd.com>
> *Subject:* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
>
>
> Hi @Koenig, Christian <Christian.Koenig@amd.com> & Marek
>
>
>
> I still have some concerns regarding Marek’s patch, correct me if I’m wrong
>
>
>
> See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA
> won’t get stale cache data during the IB execution.
>
>
>
> But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC cache
> right ?  what if the memory is changed by MM or CPU (out side of GFXHUB) ?
>
>
>
> Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their
> operation result from their cache to memory ??
>
>
>
> Besides, with my understanding the “EOP” of gfx ring is doing the thing of
> “invalidate/flush” L2 cache upon a fence signaled, so what we should do on
> SDMA5 is to insert this “SDMA_OP_GCR_REQ”
>
> Right before thee “emit_fence” of SDMA  (this is what windows KMD do)
>
>
>
> thanks
>
> _____________________________________
>
> Monk Liu|GPU Virtualization Team |AMD
>
> [image: sig-cloud-gpu]
>
>
>
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of *Marek
> Ol?ák
> *Sent:* Saturday, April 25, 2020 4:52 PM
> *To:* amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
> *Subject:* drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
>
>
>
> This should fix SDMA hangs on gfx10.
>
>
>
> Marek
>
>
>

[-- Attachment #1.1.2: Type: text/html, Size: 10059 bytes --]

[-- Attachment #1.2: image001.png --]
[-- Type: image/png, Size: 12243 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
@ 2020-04-29 12:17 Koenig, Christian
  0 siblings, 0 replies; 13+ messages in thread
From: Koenig, Christian @ 2020-04-29 12:17 UTC (permalink / raw)
  To: Liu, Monk; +Cc: amd-gfx mailing list, Marek Olšák


[-- Attachment #1.1: Type: text/plain, Size: 4147 bytes --]

Hi Monk,

because some parallel execution could load the GL2C.

See you need to insert cache invalidations before you start reading something which another engine has written.

And you need cache flushes to make sure that something your engine has written has reached memory before you signal finished execution.

That's perfectly normal cache handling what Marek is doing here.

Regards,
Christian.

Am 29.04.2020 13:24 schrieb "Liu, Monk" <Monk.Liu@amd.com>:

>> Well from my understanding I think that a G2LC invalidation is still necessary before an IB executes.

Agree, I think before an IB executes the only thing we need on GL2C is the invalidation, not the flush .

>> The problem is that the memory of the IB could also be cached because of some activity of the GFX or Compute rings.

If we always insert a GL2C invalidate at every EOP of every IB from every engine, why we need a GL2C invalidate before IB  execute ?

_____________________________________

Monk Liu|GPU Virtualization Team |AMD

[sig-cloud-gpu]



From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Wednesday, April 29, 2020 5:38 PM
To: Liu, Monk <Monk.Liu@amd.com>; Marek Olšák <maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



Well from my understanding I think that a G2LC invalidation is still necessary before an IB executes.

The problem is that the memory of the IB could also be cached because of some activity of the GFX or Compute rings.

Regards,
Christian.

Am 29.04.20 um 11:35 schrieb Liu, Monk:

Here is the reason we should always insert a “sync mem” packet at the FENCE place of SDMA, not before IB emit.



By always inserting “sync mem” in the FENCE place we can make sure:1

  1.  data is really flushed to system memory before CPU try to read it
  2.  all the G2LC is invalidated by “sync mem”, thus in the next round SDMA IB, it won’t get staled data from G2LC cache



by inserting “sync mem” in prior to IB could only achieve :  Avoid get staled data in g2lc during IB execution



for GFX/COMPUTE ring since they have release_mem packet so it is inherently doing the G2LC flush and invalidate upon a fence signaled



_____________________________________

Monk Liu|GPU Virtualization Team |AMD

[sig-cloud-gpu]



From: Liu, Monk
Sent: Wednesday, April 29, 2020 5:06 PM
To: 'Marek Olšák' <maraeo@gmail.com><mailto:maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org><mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Subject: RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



Hi @Koenig, Christian<mailto:Christian.Koenig@amd.com> & Marek



I still have some concerns regarding Marek’s patch, correct me if I’m wrong



See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA won’t get stale cache data during the IB execution.



But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC cache right ?  what if the memory is changed by MM or CPU (out side of GFXHUB) ?



Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their operation result from their cache to memory ??



Besides, with my understanding the “EOP” of gfx ring is doing the thing of “invalidate/flush” L2 cache upon a fence signaled, so what we should do on SDMA5 is to insert this “SDMA_OP_GCR_REQ”

Right before thee “emit_fence” of SDMA  (this is what windows KMD do)



thanks

_____________________________________

Monk Liu|GPU Virtualization Team |AMD

[sig-cloud-gpu]



From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Marek Ol?ák
Sent: Saturday, April 25, 2020 4:52 PM
To: amd-gfx mailing list <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Subject: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



This should fix SDMA hangs on gfx10.



Marek




[-- Attachment #1.2: Type: text/html, Size: 6885 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-05-01 16:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-25  8:52 drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10) Marek Olšák
2020-04-25 15:04 ` Christian König
2020-04-26  0:28   ` Marek Olšák
2020-04-26  8:52     ` Christian König
2020-04-27 13:01       ` Deucher, Alexander
2020-04-27 13:53         ` Marek Olšák
2020-04-25 16:28 ` Pierre-Eric Pelloux-Prayer
2020-04-29  9:06 ` Liu, Monk
2020-04-29  9:35   ` Liu, Monk
2020-04-29  9:37     ` Christian König
2020-04-29 11:24       ` Liu, Monk
2020-05-01 16:02         ` Marek Olšák
2020-04-29 12:17 Koenig, Christian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.