All of lore.kernel.org
 help / color / mirror / Atom feed
* drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
@ 2020-04-25  8:52 Marek Olšák
  2020-04-25 15:04 ` Christian König
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Marek Olšák @ 2020-04-25  8:52 UTC (permalink / raw)
  To: amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 44 bytes --]

This should fix SDMA hangs on gfx10.

Marek

[-- Attachment #1.2: Type: text/html, Size: 130 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-invalidate-L2-before-SDMA-IBs.patch --]
[-- Type: text/x-patch, Size: 3856 bytes --]

From 478671f577551f06b68e2564a20bdfb61806a802 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak@amd.com>
Date: Sat, 25 Apr 2020 04:23:01 -0400
Subject: [PATCH] drm/amdgpu: invalidate L2 before SDMA IBs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c          |  3 ++-
 .../gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h    | 16 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c           | 14 +++++++++++++-
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d377c5d17bb8..0177892e609a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -86,9 +86,10 @@
  * - 3.35.0 - Add drm_amdgpu_info_device::tcc_disabled_mask
  * - 3.36.0 - Allow reading more status registers on si/cik
  * - 3.37.0 - Add AMDGPU_IB_FLAG_EMIT_MEM_SYNC
+ * - 3.38.0 - L2 is invalidated before SDMA IBs, needed for correctness
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	37
+#define KMS_DRIVER_MINOR	38
 #define KMS_DRIVER_PATCHLEVEL	0
 
 int amdgpu_vram_limit = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h b/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h
index 074a9a09c0a7..a5b60c9a2418 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h
@@ -73,6 +73,22 @@
 #define SDMA_OP_AQL_COPY  0
 #define SDMA_OP_AQL_BARRIER_OR  0
 
+#define SDMA_GCR_RANGE_IS_PA		(1 << 18)
+#define SDMA_GCR_SEQ(x)			(((x) & 0x3) << 16)
+#define SDMA_GCR_GL2_WB			(1 << 15)
+#define SDMA_GCR_GL2_INV		(1 << 14)
+#define SDMA_GCR_GL2_DISCARD		(1 << 13)
+#define SDMA_GCR_GL2_RANGE(x)		(((x) & 0x3) << 11)
+#define SDMA_GCR_GL2_US			(1 << 10)
+#define SDMA_GCR_GL1_INV		(1 << 9)
+#define SDMA_GCR_GLV_INV		(1 << 8)
+#define SDMA_GCR_GLK_INV		(1 << 7)
+#define SDMA_GCR_GLK_WB			(1 << 6)
+#define SDMA_GCR_GLM_INV		(1 << 5)
+#define SDMA_GCR_GLM_WB			(1 << 4)
+#define SDMA_GCR_GL1_RANGE(x)		(((x) & 0x3) << 2)
+#define SDMA_GCR_GLI_INV(x)		(((x) & 0x3) << 0)
+
 /*define for op field*/
 #define SDMA_PKT_HEADER_op_offset 0
 #define SDMA_PKT_HEADER_op_mask   0x000000FF
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 764f455ada75..b544baf306f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -410,6 +410,18 @@ static void sdma_v5_0_ring_emit_ib(struct amdgpu_ring *ring,
 	unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 	uint64_t csa_mc_addr = amdgpu_sdma_get_csa_mc_addr(ring, vmid);
 
+	/* Invalidate L2, because if we don't do it, we might get stale cache
+	 * lines from previous IBs.
+	 */
+	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_GCR_REQ));
+	amdgpu_ring_write(ring, 0);
+	amdgpu_ring_write(ring, (SDMA_GCR_GL2_INV |
+				 SDMA_GCR_GL2_WB |
+				 SDMA_GCR_GLM_INV |
+				 SDMA_GCR_GLM_WB) << 16);
+	amdgpu_ring_write(ring, 0xffffff80);
+	amdgpu_ring_write(ring, 0xffff);
+
 	/* An IB packet must end on a 8 DW boundary--the next dword
 	 * must be on a 8-dword boundary. Our IB packet below is 6
 	 * dwords long, thus add x number of NOPs, such that, in
@@ -1634,7 +1646,7 @@ static const struct amdgpu_ring_funcs sdma_v5_0_ring_funcs = {
 		SOC15_FLUSH_GPU_TLB_NUM_WREG * 3 +
 		SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 6 * 2 +
 		10 + 10 + 10, /* sdma_v5_0_ring_emit_fence x3 for user fence, vm fence */
-	.emit_ib_size = 7 + 6, /* sdma_v5_0_ring_emit_ib */
+	.emit_ib_size = 5 + 7 + 6, /* sdma_v5_0_ring_emit_ib */
 	.emit_ib = sdma_v5_0_ring_emit_ib,
 	.emit_fence = sdma_v5_0_ring_emit_fence,
 	.emit_pipeline_sync = sdma_v5_0_ring_emit_pipeline_sync,
-- 
2.17.1


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)
@ 2020-04-29 12:17 Koenig, Christian
  0 siblings, 0 replies; 13+ messages in thread
From: Koenig, Christian @ 2020-04-29 12:17 UTC (permalink / raw)
  To: Liu, Monk; +Cc: amd-gfx mailing list, Marek Olšák


[-- Attachment #1.1: Type: text/plain, Size: 4147 bytes --]

Hi Monk,

because some parallel execution could load the GL2C.

See you need to insert cache invalidations before you start reading something which another engine has written.

And you need cache flushes to make sure that something your engine has written has reached memory before you signal finished execution.

That's perfectly normal cache handling what Marek is doing here.

Regards,
Christian.

Am 29.04.2020 13:24 schrieb "Liu, Monk" <Monk.Liu@amd.com>:

>> Well from my understanding I think that a G2LC invalidation is still necessary before an IB executes.

Agree, I think before an IB executes the only thing we need on GL2C is the invalidation, not the flush .

>> The problem is that the memory of the IB could also be cached because of some activity of the GFX or Compute rings.

If we always insert a GL2C invalidate at every EOP of every IB from every engine, why we need a GL2C invalidate before IB  execute ?

_____________________________________

Monk Liu|GPU Virtualization Team |AMD

[sig-cloud-gpu]



From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Wednesday, April 29, 2020 5:38 PM
To: Liu, Monk <Monk.Liu@amd.com>; Marek Olšák <maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



Well from my understanding I think that a G2LC invalidation is still necessary before an IB executes.

The problem is that the memory of the IB could also be cached because of some activity of the GFX or Compute rings.

Regards,
Christian.

Am 29.04.20 um 11:35 schrieb Liu, Monk:

Here is the reason we should always insert a “sync mem” packet at the FENCE place of SDMA, not before IB emit.



By always inserting “sync mem” in the FENCE place we can make sure:1

  1.  data is really flushed to system memory before CPU try to read it
  2.  all the G2LC is invalidated by “sync mem”, thus in the next round SDMA IB, it won’t get staled data from G2LC cache



by inserting “sync mem” in prior to IB could only achieve :  Avoid get staled data in g2lc during IB execution



for GFX/COMPUTE ring since they have release_mem packet so it is inherently doing the G2LC flush and invalidate upon a fence signaled



_____________________________________

Monk Liu|GPU Virtualization Team |AMD

[sig-cloud-gpu]



From: Liu, Monk
Sent: Wednesday, April 29, 2020 5:06 PM
To: 'Marek Olšák' <maraeo@gmail.com><mailto:maraeo@gmail.com>; amd-gfx mailing list <amd-gfx@lists.freedesktop.org><mailto:amd-gfx@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Subject: RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



Hi @Koenig, Christian<mailto:Christian.Koenig@amd.com> & Marek



I still have some concerns regarding Marek’s patch, correct me if I’m wrong



See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA won’t get stale cache data during the IB execution.



But that “SDMA_OP_GCR_REQ” only invalidate/flush the GFXHUB’s G2LC cache right ?  what if the memory is changed by MM or CPU (out side of GFXHUB) ?



Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their operation result from their cache to memory ??



Besides, with my understanding the “EOP” of gfx ring is doing the thing of “invalidate/flush” L2 cache upon a fence signaled, so what we should do on SDMA5 is to insert this “SDMA_OP_GCR_REQ”

Right before thee “emit_fence” of SDMA  (this is what windows KMD do)



thanks

_____________________________________

Monk Liu|GPU Virtualization Team |AMD

[sig-cloud-gpu]



From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Marek Ol?ák
Sent: Saturday, April 25, 2020 4:52 PM
To: amd-gfx mailing list <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Subject: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



This should fix SDMA hangs on gfx10.



Marek




[-- Attachment #1.2: Type: text/html, Size: 6885 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-05-01 16:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-25  8:52 drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10) Marek Olšák
2020-04-25 15:04 ` Christian König
2020-04-26  0:28   ` Marek Olšák
2020-04-26  8:52     ` Christian König
2020-04-27 13:01       ` Deucher, Alexander
2020-04-27 13:53         ` Marek Olšák
2020-04-25 16:28 ` Pierre-Eric Pelloux-Prayer
2020-04-29  9:06 ` Liu, Monk
2020-04-29  9:35   ` Liu, Monk
2020-04-29  9:37     ` Christian König
2020-04-29 11:24       ` Liu, Monk
2020-05-01 16:02         ` Marek Olšák
2020-04-29 12:17 Koenig, Christian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.