amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] Enable FW assisted shadowing for GFX11
@ 2023-03-17 17:17 Alex Deucher
  2023-03-17 17:17 ` [PATCH 01/10] drm/amdgpu: add UAPI to query GFX shadow sizes Alex Deucher
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

This patch set allows for FW assisted shadowing on supported
platforms.  A new enough CP FW is required.  This feature is
required for mid command buffer preemption and proper SR-IOV
support.  This also simplifies the UMDs by allowing persistent
hardware state when the command submission executes.  UMDs
that use this will have their state retained across command
submissions.

The mesa MR to implement the user mode side of this is:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986

Alex Deucher (8):
  drm/amdgpu: add UAPI to query GFX shadow sizes
  drm/amdgpu/UAPI: add new CS chunk for GFX shadow buffers
  drm/amdgpu/gfx11: make job optional in emit_gfx_shadow
  drm/amdgpu: don't require a job for cond_exec and shadow
  drm/amdgpu: add gfx shadow callback
  drm/amdgpu: add get_gfx_shadow_info callback for gfx11
  drm/amdgpu: add support for new GFX shadow size query
  drm/amdgpu: bump driver version number for CP GFX shadow

Christian König (2):
  drm/amdgpu: add gfx shadow CS IOCTL support
  drm/amdgpu: add gfx11 emit shadow callback

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  | 14 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 26 ++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   | 85 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/nvd.h         |  5 +-
 include/uapi/drm/amdgpu_drm.h            | 17 +++++
 11 files changed, 186 insertions(+), 3 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/10] drm/amdgpu: add UAPI to query GFX shadow sizes
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-17 17:17 ` [PATCH 02/10] drm/amdgpu/UAPI: add new CS chunk for GFX shadow buffers Alex Deucher
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

Add UAPI to query the GFX shadow buffer requirements
for preemption on GFX11.  UMDs need to specify the shadow
areas for preemption.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index b6eb90df5d05..cb22bb78c373 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -876,6 +876,7 @@ struct drm_amdgpu_cs_chunk_data {
 	#define AMDGPU_INFO_VIDEO_CAPS_DECODE		0
 	/* Subquery id: Encode */
 	#define AMDGPU_INFO_VIDEO_CAPS_ENCODE		1
+#define AMDGPU_INFO_CP_GFX_SHADOW_SIZE		0x22
 
 #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT	0
 #define AMDGPU_INFO_MMR_SE_INDEX_MASK	0xff
@@ -1193,6 +1194,15 @@ struct drm_amdgpu_info_video_caps {
 	struct drm_amdgpu_info_video_codec_info codec_info[AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_COUNT];
 };
 
+struct drm_amdgpu_info_cp_gfx_shadow_size {
+	__u32 shadow_size;
+	__u32 shadow_alignment;
+	__u32 csa_size;
+	__u32 csa_alignment;
+	__u32 gds_size;
+	__u32 gds_alignment;
+};
+
 /*
  * Supported GPU families
  */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 02/10] drm/amdgpu/UAPI: add new CS chunk for GFX shadow buffers
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
  2023-03-17 17:17 ` [PATCH 01/10] drm/amdgpu: add UAPI to query GFX shadow sizes Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-17 17:17 ` [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support Alex Deucher
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

For GFX11, the UMD needs to allocate some shadow buffers
to be used for preemption.  The UMD allocates the buffers
and passes the GPU virtual address to the kernel since the
kernel will program the packet that specified these
addresses as part of its IB submission frame.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index cb22bb78c373..be43d5f96534 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -592,6 +592,7 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES	0x07
 #define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT    0x08
 #define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x09
+#define AMDGPU_CHUNK_ID_CP_GFX_SHADOW   0x0a
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
@@ -708,6 +709,12 @@ struct drm_amdgpu_cs_chunk_data {
 	};
 };
 
+struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
+       __u64 shadow_va;
+       __u64 csa_va;
+       __u64 gds_va;
+};
+
 /*
  *  Query h/w info: Flag that this is integrated (a.h.a. fusion) GPU
  *
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
  2023-03-17 17:17 ` [PATCH 01/10] drm/amdgpu: add UAPI to query GFX shadow sizes Alex Deucher
  2023-03-17 17:17 ` [PATCH 02/10] drm/amdgpu/UAPI: add new CS chunk for GFX shadow buffers Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-20 15:46   ` Christian König
  2023-03-17 17:17 ` [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback Alex Deucher
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian König

From: Christian König <christian.koenig@amd.com>

Add support for submitting the shadow update packet
when submitting an IB.  Needed for MCBP on GFX11.

v2: update API for CSA (Alex)
v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
    Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
    amdgpu_cs_pass1()
    Only initialize shadow on first use
    (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
 5 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index f6144b378617..9bdda246b09c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
 		case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
 		case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
 		case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
+		case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
 			break;
 
 		default:
@@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
 	return 0;
 }
 
+static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
+				struct amdgpu_cs_chunk *chunk)
+{
+	struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
+	bool shadow_initialized = false;
+	int i;
+
+	for (i = 0; i < p->gang_size; ++i) {
+		p->jobs[i]->shadow_va = shadow->shadow_va;
+		p->jobs[i]->csa_va = shadow->csa_va;
+		p->jobs[i]->gds_va = shadow->gds_va;
+		if (!p->ctx->shadow_initialized) {
+			p->jobs[i]->init_shadow = true;
+			shadow_initialized = true;
+		}
+	}
+	if (shadow_initialized)
+		p->ctx->shadow_initialized = true;
+}
+
 static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
 {
 	unsigned int ce_preempt = 0, de_preempt = 0;
@@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
 			if (r)
 				return r;
 			break;
+		case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
+			amdgpu_cs_p2_shadow(p, chunk);
+			break;
 		}
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index 0fa0e56daf67..909d188c41f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -57,6 +57,7 @@ struct amdgpu_ctx {
 	unsigned long			ras_counter_ce;
 	unsigned long			ras_counter_ue;
 	uint32_t			stable_pstate;
+	bool				shadow_initialized;
 };
 
 struct amdgpu_ctx_mgr {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index bcccc348dbe2..d88964b9407f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	}
 
 	amdgpu_ring_ib_begin(ring);
+
+	if (job && ring->funcs->emit_gfx_shadow)
+		amdgpu_ring_emit_gfx_shadow(ring, job);
+
 	if (job && ring->funcs->init_cond_exec)
 		patch_offset = amdgpu_ring_init_cond_exec(ring);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 9790def34815..b470808fa40e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -68,6 +68,12 @@ struct amdgpu_job {
 	uint64_t		uf_addr;
 	uint64_t		uf_sequence;
 
+	/* virtual addresses for shadow/GDS/CSA */
+	uint64_t		shadow_va;
+	uint64_t		csa_va;
+	uint64_t		gds_va;
+	bool			init_shadow;
+
 	/* job_run_counter >= 1 means a resubmit job */
 	uint32_t		job_run_counter;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 3989e755a5b4..8643d4a92c27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
 	void (*end_use)(struct amdgpu_ring *ring);
 	void (*emit_switch_buffer) (struct amdgpu_ring *ring);
 	void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
+	void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
 	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
 			  uint32_t reg_val_offs);
 	void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
@@ -307,6 +308,7 @@ struct amdgpu_ring {
 #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
 #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
 #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
+#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
 #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
 #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
 #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (2 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-20 15:49   ` Christian König
  2023-03-17 17:17 ` [PATCH 05/10] drm/amdgpu/gfx11: make job optional in emit_gfx_shadow Alex Deucher
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian König

From: Christian König <christian.koenig@amd.com>

Add ring callback for gfx to update the CP firmware
with the new shadow information before we process the
IB.

v2: add implementation for new packet (Alex)
v3: add current FW version checks (Alex)
v4: only initialize shadow on first use
    Only set IB_VMID when a valid shadow buffer is present
    (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 46 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/nvd.h        |  5 ++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index de9e7a00bb15..4ad9e225d6e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -364,6 +364,8 @@ struct amdgpu_gfx {
 
 	struct amdgpu_ring		sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];
 	struct amdgpu_ring_mux          muxer;
+
+	bool				cp_gfx_shadow; /* for gfx11 */
 };
 
 #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 3bf697a80cf2..166a3f640042 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -463,6 +463,27 @@ static int gfx_v11_0_init_toc_microcode(struct amdgpu_device *adev, const char *
 	return err;
 }
 
+static void gfx_v11_0_check_fw_cp_gfx_shadow(struct amdgpu_device *adev)
+{
+	switch (adev->ip_versions[GC_HWIP][0]) {
+	case IP_VERSION(11, 0, 0):
+	case IP_VERSION(11, 0, 2):
+	case IP_VERSION(11, 0, 3):
+		/* XXX fix me! */
+		if ((adev->gfx.me_fw_version >= 1498) &&
+		    (adev->gfx.me_feature_version >= 29) &&
+		    (adev->gfx.pfp_fw_version >= 1541) &&
+		    (adev->gfx.pfp_feature_version >= 29) &&
+		    (adev->gfx.mec_fw_version >= 507) &&
+		    (adev->gfx.mec_feature_version >= 29))
+			adev->gfx.cp_gfx_shadow = true;
+		break;
+	default:
+		adev->gfx.cp_gfx_shadow = false;
+		break;
+	}
+}
+
 static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
 {
 	char fw_name[40];
@@ -539,6 +560,7 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
 	/* only one MEC for gfx 11.0.0. */
 	adev->gfx.mec2_fw = NULL;
 
+	gfx_v11_0_check_fw_cp_gfx_shadow(adev);
 out:
 	if (err) {
 		amdgpu_ucode_release(&adev->gfx.pfp_fw);
@@ -5563,6 +5585,28 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct amdgpu_ring *ring,
 	amdgpu_ring_write(ring, 0);
 }
 
+static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring,
+					   struct amdgpu_job *job)
+{
+	unsigned vmid = AMDGPU_JOB_GET_VMID(job);
+	struct amdgpu_device *adev = ring->adev;
+
+	if (!adev->gfx.cp_gfx_shadow)
+		return;
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
+	amdgpu_ring_write(ring, lower_32_bits(job->shadow_va));
+	amdgpu_ring_write(ring, upper_32_bits(job->shadow_va));
+	amdgpu_ring_write(ring, lower_32_bits(job->gds_va));
+	amdgpu_ring_write(ring, upper_32_bits(job->gds_va));
+	amdgpu_ring_write(ring, lower_32_bits(job->csa_va));
+	amdgpu_ring_write(ring, upper_32_bits(job->csa_va));
+	amdgpu_ring_write(ring, job->shadow_va ?
+			  PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(vmid) : 0);
+	amdgpu_ring_write(ring, job->init_shadow ?
+			  PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM : 0);
+}
+
 static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring)
 {
 	unsigned ret;
@@ -6183,6 +6227,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = {
 	.set_wptr = gfx_v11_0_ring_set_wptr_gfx,
 	.emit_frame_size = /* totally 242 maximum if 16 IBs */
 		5 + /* COND_EXEC */
+		9 + /* SET_Q_PREEMPTION_MODE */
 		7 + /* PIPELINE_SYNC */
 		SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
 		SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
@@ -6209,6 +6254,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = {
 	.insert_nop = amdgpu_ring_insert_nop,
 	.pad_ib = amdgpu_ring_generic_pad_ib,
 	.emit_cntxcntl = gfx_v11_0_ring_emit_cntxcntl,
+	.emit_gfx_shadow = gfx_v11_0_ring_emit_gfx_shadow,
 	.init_cond_exec = gfx_v11_0_ring_emit_init_cond_exec,
 	.patch_cond_exec = gfx_v11_0_ring_emit_patch_cond_exec,
 	.preempt_ib = gfx_v11_0_ring_preempt_ib,
diff --git a/drivers/gpu/drm/amd/amdgpu/nvd.h b/drivers/gpu/drm/amd/amdgpu/nvd.h
index fd6b58243b03..631dafb92299 100644
--- a/drivers/gpu/drm/amd/amdgpu/nvd.h
+++ b/drivers/gpu/drm/amd/amdgpu/nvd.h
@@ -462,6 +462,9 @@
 #              define PACKET3_QUERY_STATUS_ENG_SEL(x)          ((x) << 25)
 #define	PACKET3_RUN_LIST				0xA5
 #define	PACKET3_MAP_PROCESS_VM				0xA6
-
+/* GFX11 */
+#define	PACKET3_SET_Q_PREEMPTION_MODE			0xF0
+#              define PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(x)  ((x) << 0)
+#              define PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM    (1 << 0)
 
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/10] drm/amdgpu/gfx11: make job optional in emit_gfx_shadow
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (3 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-17 17:17 ` [PATCH 06/10] drm/amdgpu: don't require a job for cond_exec and shadow Alex Deucher
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

We need to emit this packet any time we emit an IB, not
just when we have a job.  When no job is present just
send all 0's to reset to the legacy state.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 34 +++++++++++++++++---------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 166a3f640042..0a507b7939f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5594,17 +5594,29 @@ static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring,
 	if (!adev->gfx.cp_gfx_shadow)
 		return;
 
-	amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
-	amdgpu_ring_write(ring, lower_32_bits(job->shadow_va));
-	amdgpu_ring_write(ring, upper_32_bits(job->shadow_va));
-	amdgpu_ring_write(ring, lower_32_bits(job->gds_va));
-	amdgpu_ring_write(ring, upper_32_bits(job->gds_va));
-	amdgpu_ring_write(ring, lower_32_bits(job->csa_va));
-	amdgpu_ring_write(ring, upper_32_bits(job->csa_va));
-	amdgpu_ring_write(ring, job->shadow_va ?
-			  PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(vmid) : 0);
-	amdgpu_ring_write(ring, job->init_shadow ?
-			  PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM : 0);
+	if (job) {
+		amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
+		amdgpu_ring_write(ring, lower_32_bits(job->shadow_va));
+		amdgpu_ring_write(ring, upper_32_bits(job->shadow_va));
+		amdgpu_ring_write(ring, lower_32_bits(job->gds_va));
+		amdgpu_ring_write(ring, upper_32_bits(job->gds_va));
+		amdgpu_ring_write(ring, lower_32_bits(job->csa_va));
+		amdgpu_ring_write(ring, upper_32_bits(job->csa_va));
+		amdgpu_ring_write(ring, job->shadow_va ?
+				  PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(vmid) : 0);
+		amdgpu_ring_write(ring, job->init_shadow ?
+				  PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM : 0);
+	} else {
+		amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 0);
+	}
 }
 
 static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/10] drm/amdgpu: don't require a job for cond_exec and shadow
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (4 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 05/10] drm/amdgpu/gfx11: make job optional in emit_gfx_shadow Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-20 15:50   ` Christian König
  2023-03-17 17:17 ` [PATCH 07/10] drm/amdgpu: add gfx shadow callback Alex Deucher
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

We need to reset the shadow state every time we submit an
IB and there needs to be a COND_EXEC packet after the
SET_Q_PREEMPTION_MODE packet for it to work properly, so
we should emit both of these packets regardless of whether
there is a job present or not.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index d88964b9407f..cd5b0df44ffb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -213,10 +213,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 
 	amdgpu_ring_ib_begin(ring);
 
-	if (job && ring->funcs->emit_gfx_shadow)
+	if (ring->funcs->emit_gfx_shadow)
 		amdgpu_ring_emit_gfx_shadow(ring, job);
 
-	if (job && ring->funcs->init_cond_exec)
+	if (ring->funcs->init_cond_exec)
 		patch_offset = amdgpu_ring_init_cond_exec(ring);
 
 	amdgpu_device_flush_hdp(adev, ring);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 07/10] drm/amdgpu: add gfx shadow callback
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (5 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 06/10] drm/amdgpu: don't require a job for cond_exec and shadow Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-20 15:57   ` Christian König
  2023-03-17 17:17 ` [PATCH 08/10] drm/amdgpu: add get_gfx_shadow_info callback for gfx11 Alex Deucher
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

To provide IP specific shadow sizes.  UMDs will use
this to query the kernel driver for the size of the
shadow buffers.

v2: make callback return an int (Alex)

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 4ad9e225d6e6..8826f1efc75f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -219,6 +219,15 @@ struct amdgpu_gfx_ras {
 						struct amdgpu_iv_entry *entry);
 };
 
+struct amdgpu_gfx_shadow_info {
+	u32 shadow_size;
+	u32 shadow_alignment;
+	u32 csa_size;
+	u32 csa_alignment;
+	u32 gds_size;
+	u32 gds_alignment;
+};
+
 struct amdgpu_gfx_funcs {
 	/* get the gpu clock counter */
 	uint64_t (*get_gpu_clock_counter)(struct amdgpu_device *adev);
@@ -236,6 +245,8 @@ struct amdgpu_gfx_funcs {
 				 u32 queue, u32 vmid);
 	void (*init_spm_golden)(struct amdgpu_device *adev);
 	void (*update_perfmon_mgcg)(struct amdgpu_device *adev, bool enable);
+	int (*get_gfx_shadow_info)(struct amdgpu_device *adev,
+				   struct amdgpu_gfx_shadow_info *shadow_info);
 };
 
 struct sq_work {
@@ -372,6 +383,7 @@ struct amdgpu_gfx {
 #define amdgpu_gfx_select_se_sh(adev, se, sh, instance) (adev)->gfx.funcs->select_se_sh((adev), (se), (sh), (instance))
 #define amdgpu_gfx_select_me_pipe_q(adev, me, pipe, q, vmid) (adev)->gfx.funcs->select_me_pipe_q((adev), (me), (pipe), (q), (vmid))
 #define amdgpu_gfx_init_spm_golden(adev) (adev)->gfx.funcs->init_spm_golden((adev))
+#define amdgpu_gfx_get_gfx_shadow_info(adev, si) (adev)->gfx.funcs->get_gfx_shadow_info((adev), (si))
 
 /**
  * amdgpu_gfx_create_bitmask - create a bitmask
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/10] drm/amdgpu: add get_gfx_shadow_info callback for gfx11
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (6 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 07/10] drm/amdgpu: add gfx shadow callback Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-17 17:17 ` [PATCH 09/10] drm/amdgpu: add support for new GFX shadow size query Alex Deucher
  2023-03-17 17:17 ` [PATCH 10/10] drm/amdgpu: bump driver version number for CP GFX shadow Alex Deucher
  9 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

XXX: don't apply this until we get the final FW versions

Used to get the size and alignment requirements for
the gfx shadow buffer for preemption.

v2: use FW version check to determine whether to
    return a valid size here
    return an error if not supported (Alex)

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 0a507b7939f4..87b3d777e9ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -822,6 +822,32 @@ static void gfx_v11_0_select_me_pipe_q(struct amdgpu_device *adev,
 	soc21_grbm_select(adev, me, pipe, q, vm);
 }
 
+/* all sizes are in bytes */
+#define MQD_SHADOW_BASE_SIZE      73728
+#define MQD_SHADOW_BASE_ALIGNMENT 256
+#define MQD_FWWORKAREA_SIZE       484
+#define MQD_FWWORKAREA_ALIGNMENT  256
+
+static int gfx_v11_0_get_gfx_shadow_info(struct amdgpu_device *adev,
+					 struct amdgpu_gfx_shadow_info *shadow_info)
+{
+	if (shadow_info) {
+		if (adev->gfx.cp_gfx_shadow) {
+			shadow_info->shadow_size = MQD_SHADOW_BASE_SIZE;
+			shadow_info->shadow_alignment = MQD_SHADOW_BASE_ALIGNMENT;
+			shadow_info->csa_size = MQD_FWWORKAREA_SIZE;
+			shadow_info->csa_alignment = MQD_FWWORKAREA_ALIGNMENT;
+			shadow_info->gds_size = 0x1000;
+			shadow_info->gds_alignment = 256;
+			return 0;
+		} else {
+			memset(shadow_info, 0, sizeof(struct amdgpu_gfx_shadow_info));
+			return -ENOTSUPP;
+		}
+	}
+	return -EINVAL;
+}
+
 static const struct amdgpu_gfx_funcs gfx_v11_0_gfx_funcs = {
 	.get_gpu_clock_counter = &gfx_v11_0_get_gpu_clock_counter,
 	.select_se_sh = &gfx_v11_0_select_se_sh,
@@ -830,6 +856,7 @@ static const struct amdgpu_gfx_funcs gfx_v11_0_gfx_funcs = {
 	.read_wave_vgprs = &gfx_v11_0_read_wave_vgprs,
 	.select_me_pipe_q = &gfx_v11_0_select_me_pipe_q,
 	.update_perfmon_mgcg = &gfx_v11_0_update_perf_clk,
+	.get_gfx_shadow_info = &gfx_v11_0_get_gfx_shadow_info,
 };
 
 static int gfx_v11_0_gpu_early_init(struct amdgpu_device *adev)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/10] drm/amdgpu: add support for new GFX shadow size query
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (7 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 08/10] drm/amdgpu: add get_gfx_shadow_info callback for gfx11 Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  2023-03-17 17:17 ` [PATCH 10/10] drm/amdgpu: bump driver version number for CP GFX shadow Alex Deucher
  9 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

Use the new callback to fetch the data.  Return an error if
not supported.  UMDs should use this query to check whether
shadow buffers are supported and if so what size they
should be.

v2: return an error rather than a zerod structure.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 26 +++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 9e85eedb57d8..8a6764756dcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1139,6 +1139,32 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		kfree(caps);
 		return r;
 	}
+	case AMDGPU_INFO_CP_GFX_SHADOW_SIZE: {
+		struct amdgpu_gfx_shadow_info shadow_info;
+		struct drm_amdgpu_info_cp_gfx_shadow_size drm_shadow_size;
+		int r;
+
+		memset(&shadow_info, 0, sizeof(struct amdgpu_gfx_shadow_info));
+		if (adev->gfx.funcs->get_gfx_shadow_info) {
+			r = amdgpu_gfx_get_gfx_shadow_info(adev, &shadow_info);
+			if (r)
+				return r;
+			drm_shadow_size.shadow_size = shadow_info.shadow_size;
+			drm_shadow_size.shadow_alignment = shadow_info.shadow_alignment;
+			drm_shadow_size.csa_size = shadow_info.csa_size;
+			drm_shadow_size.csa_alignment = shadow_info.csa_alignment;
+			drm_shadow_size.gds_size = shadow_info.gds_size;
+			drm_shadow_size.gds_alignment = shadow_info.gds_alignment;
+		} else {
+			return -ENOTSUPP;
+		}
+		r = copy_to_user(out, &drm_shadow_size,
+				 min((size_t)size,
+				     sizeof(struct drm_amdgpu_info_cp_gfx_shadow_size))) ?
+			-EFAULT : 0;
+		return r;
+
+	}
 	default:
 		DRM_DEBUG_KMS("Invalid request %d\n", info->query);
 		return -EINVAL;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/10] drm/amdgpu: bump driver version number for CP GFX shadow
  2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
                   ` (8 preceding siblings ...)
  2023-03-17 17:17 ` [PATCH 09/10] drm/amdgpu: add support for new GFX shadow size query Alex Deucher
@ 2023-03-17 17:17 ` Alex Deucher
  9 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-17 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher

So UMDs can determine whether the kernel supports this.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 5f02c530e2cc..9caa7b7f52dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -110,9 +110,10 @@
  *   3.52.0 - Add AMDGPU_IDS_FLAGS_CONFORMANT_TRUNC_COORD, add device_info fields:
  *            tcp_cache_size, num_sqc_per_wgp, sqc_data_cache_size, sqc_inst_cache_size,
  *            gl1c_cache_size, gl2c_cache_size, mall_size, enabled_rb_pipes_mask_hi
+ *   3.53.0 - Support for GFX11 CP GFX shadowing
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	52
+#define KMS_DRIVER_MINOR	53
 #define KMS_DRIVER_PATCHLEVEL	0
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-17 17:17 ` [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support Alex Deucher
@ 2023-03-20 15:46   ` Christian König
  2023-03-20 15:49     ` Alex Deucher
  2023-03-20 15:52     ` Alex Deucher
  0 siblings, 2 replies; 22+ messages in thread
From: Christian König @ 2023-03-20 15:46 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx

Am 17.03.23 um 18:17 schrieb Alex Deucher:
> From: Christian König <christian.koenig@amd.com>
>
> Add support for submitting the shadow update packet
> when submitting an IB.  Needed for MCBP on GFX11.
>
> v2: update API for CSA (Alex)
> v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
>      Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
>      amdgpu_cs_pass1()
>      Only initialize shadow on first use
>      (Alex)
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
>   5 files changed, 37 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index f6144b378617..9bdda246b09c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
>   		case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
>   		case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
>   		case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
> +		case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
>   			break;
>   
>   		default:
> @@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
>   	return 0;
>   }
>   
> +static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
> +				struct amdgpu_cs_chunk *chunk)
> +{
> +	struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
> +	bool shadow_initialized = false;
> +	int i;
> +
> +	for (i = 0; i < p->gang_size; ++i) {
> +		p->jobs[i]->shadow_va = shadow->shadow_va;
> +		p->jobs[i]->csa_va = shadow->csa_va;
> +		p->jobs[i]->gds_va = shadow->gds_va;

Do we really need all three VAs separately?

> +		if (!p->ctx->shadow_initialized) {
> +			p->jobs[i]->init_shadow = true;
> +			shadow_initialized = true;

> +		}
> +	}
> +	if (shadow_initialized)
> +		p->ctx->shadow_initialized = true;

This is a really bad idea since the IOCTL can be interrupted later on.

Why do we need that?

Regards,
Christian.


> +}
> +
>   static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>   {
>   	unsigned int ce_preempt = 0, de_preempt = 0;
> @@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>   			if (r)
>   				return r;
>   			break;
> +		case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> +			amdgpu_cs_p2_shadow(p, chunk);
> +			break;
>   		}
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> index 0fa0e56daf67..909d188c41f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> @@ -57,6 +57,7 @@ struct amdgpu_ctx {
>   	unsigned long			ras_counter_ce;
>   	unsigned long			ras_counter_ue;
>   	uint32_t			stable_pstate;
> +	bool				shadow_initialized;
>   };
>   
>   struct amdgpu_ctx_mgr {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index bcccc348dbe2..d88964b9407f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>   	}
>   
>   	amdgpu_ring_ib_begin(ring);
> +
> +	if (job && ring->funcs->emit_gfx_shadow)
> +		amdgpu_ring_emit_gfx_shadow(ring, job);
> +
>   	if (job && ring->funcs->init_cond_exec)
>   		patch_offset = amdgpu_ring_init_cond_exec(ring);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index 9790def34815..b470808fa40e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -68,6 +68,12 @@ struct amdgpu_job {
>   	uint64_t		uf_addr;
>   	uint64_t		uf_sequence;
>   
> +	/* virtual addresses for shadow/GDS/CSA */
> +	uint64_t		shadow_va;
> +	uint64_t		csa_va;
> +	uint64_t		gds_va;
> +	bool			init_shadow;
> +
>   	/* job_run_counter >= 1 means a resubmit job */
>   	uint32_t		job_run_counter;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 3989e755a5b4..8643d4a92c27 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
>   	void (*end_use)(struct amdgpu_ring *ring);
>   	void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>   	void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> +	void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
>   	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
>   			  uint32_t reg_val_offs);
>   	void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
> @@ -307,6 +308,7 @@ struct amdgpu_ring {
>   #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
>   #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
>   #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
> +#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
>   #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
>   #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>   #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback
  2023-03-17 17:17 ` [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback Alex Deucher
@ 2023-03-20 15:49   ` Christian König
  2023-03-20 16:08     ` Alex Deucher
  0 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2023-03-20 15:49 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx; +Cc: Christian König

Am 17.03.23 um 18:17 schrieb Alex Deucher:
> From: Christian König <christian.koenig@amd.com>
>
> Add ring callback for gfx to update the CP firmware
> with the new shadow information before we process the
> IB.
>
> v2: add implementation for new packet (Alex)
> v3: add current FW version checks (Alex)
> v4: only initialize shadow on first use
>      Only set IB_VMID when a valid shadow buffer is present
>      (Alex)
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 46 +++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/nvd.h        |  5 ++-
>   3 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index de9e7a00bb15..4ad9e225d6e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -364,6 +364,8 @@ struct amdgpu_gfx {
>   
>   	struct amdgpu_ring		sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];
>   	struct amdgpu_ring_mux          muxer;
> +
> +	bool				cp_gfx_shadow; /* for gfx11 */
>   };
>   
>   #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev))
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 3bf697a80cf2..166a3f640042 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -463,6 +463,27 @@ static int gfx_v11_0_init_toc_microcode(struct amdgpu_device *adev, const char *
>   	return err;
>   }
>   
> +static void gfx_v11_0_check_fw_cp_gfx_shadow(struct amdgpu_device *adev)
> +{
> +	switch (adev->ip_versions[GC_HWIP][0]) {
> +	case IP_VERSION(11, 0, 0):
> +	case IP_VERSION(11, 0, 2):
> +	case IP_VERSION(11, 0, 3):
> +		/* XXX fix me! */
> +		if ((adev->gfx.me_fw_version >= 1498) &&
> +		    (adev->gfx.me_feature_version >= 29) &&
> +		    (adev->gfx.pfp_fw_version >= 1541) &&
> +		    (adev->gfx.pfp_feature_version >= 29) &&
> +		    (adev->gfx.mec_fw_version >= 507) &&
> +		    (adev->gfx.mec_feature_version >= 29))
> +			adev->gfx.cp_gfx_shadow = true;
> +		break;
> +	default:
> +		adev->gfx.cp_gfx_shadow = false;
> +		break;
> +	}
> +}
> +
>   static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
>   {
>   	char fw_name[40];
> @@ -539,6 +560,7 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
>   	/* only one MEC for gfx 11.0.0. */
>   	adev->gfx.mec2_fw = NULL;
>   
> +	gfx_v11_0_check_fw_cp_gfx_shadow(adev);
>   out:
>   	if (err) {
>   		amdgpu_ucode_release(&adev->gfx.pfp_fw);
> @@ -5563,6 +5585,28 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct amdgpu_ring *ring,
>   	amdgpu_ring_write(ring, 0);
>   }
>   
> +static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring,
> +					   struct amdgpu_job *job)

Better give the values to use here instead of the job structure.

Regards,
Christian.

> +{
> +	unsigned vmid = AMDGPU_JOB_GET_VMID(job);
> +	struct amdgpu_device *adev = ring->adev;
> +
> +	if (!adev->gfx.cp_gfx_shadow)
> +		return;
> +
> +	amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
> +	amdgpu_ring_write(ring, lower_32_bits(job->shadow_va));
> +	amdgpu_ring_write(ring, upper_32_bits(job->shadow_va));
> +	amdgpu_ring_write(ring, lower_32_bits(job->gds_va));
> +	amdgpu_ring_write(ring, upper_32_bits(job->gds_va));
> +	amdgpu_ring_write(ring, lower_32_bits(job->csa_va));
> +	amdgpu_ring_write(ring, upper_32_bits(job->csa_va));
> +	amdgpu_ring_write(ring, job->shadow_va ?
> +			  PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(vmid) : 0);
> +	amdgpu_ring_write(ring, job->init_shadow ?
> +			  PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM : 0);
> +}
> +
>   static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring)
>   {
>   	unsigned ret;
> @@ -6183,6 +6227,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = {
>   	.set_wptr = gfx_v11_0_ring_set_wptr_gfx,
>   	.emit_frame_size = /* totally 242 maximum if 16 IBs */
>   		5 + /* COND_EXEC */
> +		9 + /* SET_Q_PREEMPTION_MODE */
>   		7 + /* PIPELINE_SYNC */
>   		SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
>   		SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> @@ -6209,6 +6254,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = {
>   	.insert_nop = amdgpu_ring_insert_nop,
>   	.pad_ib = amdgpu_ring_generic_pad_ib,
>   	.emit_cntxcntl = gfx_v11_0_ring_emit_cntxcntl,
> +	.emit_gfx_shadow = gfx_v11_0_ring_emit_gfx_shadow,
>   	.init_cond_exec = gfx_v11_0_ring_emit_init_cond_exec,
>   	.patch_cond_exec = gfx_v11_0_ring_emit_patch_cond_exec,
>   	.preempt_ib = gfx_v11_0_ring_preempt_ib,
> diff --git a/drivers/gpu/drm/amd/amdgpu/nvd.h b/drivers/gpu/drm/amd/amdgpu/nvd.h
> index fd6b58243b03..631dafb92299 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nvd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/nvd.h
> @@ -462,6 +462,9 @@
>   #              define PACKET3_QUERY_STATUS_ENG_SEL(x)          ((x) << 25)
>   #define	PACKET3_RUN_LIST				0xA5
>   #define	PACKET3_MAP_PROCESS_VM				0xA6
> -
> +/* GFX11 */
> +#define	PACKET3_SET_Q_PREEMPTION_MODE			0xF0
> +#              define PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(x)  ((x) << 0)
> +#              define PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM    (1 << 0)
>   
>   #endif


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-20 15:46   ` Christian König
@ 2023-03-20 15:49     ` Alex Deucher
  2023-03-20 15:55       ` Christian König
  2023-03-20 15:52     ` Alex Deucher
  1 sibling, 1 reply; 22+ messages in thread
From: Alex Deucher @ 2023-03-20 15:49 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, amd-gfx

On Mon, Mar 20, 2023 at 11:46 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 17.03.23 um 18:17 schrieb Alex Deucher:
> > From: Christian König <christian.koenig@amd.com>
> >
> > Add support for submitting the shadow update packet
> > when submitting an IB.  Needed for MCBP on GFX11.
> >
> > v2: update API for CSA (Alex)
> > v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
> >      Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
> >      amdgpu_cs_pass1()
> >      Only initialize shadow on first use
> >      (Alex)
> >
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
> >   5 files changed, 37 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index f6144b378617..9bdda246b09c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
> >               case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
> >               case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
> >               case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
> > +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> >                       break;
> >
> >               default:
> > @@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
> >       return 0;
> >   }
> >
> > +static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
> > +                             struct amdgpu_cs_chunk *chunk)
> > +{
> > +     struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
> > +     bool shadow_initialized = false;
> > +     int i;
> > +
> > +     for (i = 0; i < p->gang_size; ++i) {
> > +             p->jobs[i]->shadow_va = shadow->shadow_va;
> > +             p->jobs[i]->csa_va = shadow->csa_va;
> > +             p->jobs[i]->gds_va = shadow->gds_va;
>
> Do we really need all three VAs separately?
>
> > +             if (!p->ctx->shadow_initialized) {
> > +                     p->jobs[i]->init_shadow = true;
> > +                     shadow_initialized = true;
>
> > +             }
> > +     }
> > +     if (shadow_initialized)
> > +             p->ctx->shadow_initialized = true;
>
> This is a really bad idea since the IOCTL can be interrupted later on.
>
> Why do we need that?

Yes.  We have to only initial the buffer the first time it's used.
Doing it again will overwrite whatever userspace has done with it
since then.

Alex

>
> Regards,
> Christian.
>
>
> > +}
> > +
> >   static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
> >   {
> >       unsigned int ce_preempt = 0, de_preempt = 0;
> > @@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
> >                       if (r)
> >                               return r;
> >                       break;
> > +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> > +                     amdgpu_cs_p2_shadow(p, chunk);
> > +                     break;
> >               }
> >       }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > index 0fa0e56daf67..909d188c41f2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > @@ -57,6 +57,7 @@ struct amdgpu_ctx {
> >       unsigned long                   ras_counter_ce;
> >       unsigned long                   ras_counter_ue;
> >       uint32_t                        stable_pstate;
> > +     bool                            shadow_initialized;
> >   };
> >
> >   struct amdgpu_ctx_mgr {
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > index bcccc348dbe2..d88964b9407f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > @@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
> >       }
> >
> >       amdgpu_ring_ib_begin(ring);
> > +
> > +     if (job && ring->funcs->emit_gfx_shadow)
> > +             amdgpu_ring_emit_gfx_shadow(ring, job);
> > +
> >       if (job && ring->funcs->init_cond_exec)
> >               patch_offset = amdgpu_ring_init_cond_exec(ring);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > index 9790def34815..b470808fa40e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > @@ -68,6 +68,12 @@ struct amdgpu_job {
> >       uint64_t                uf_addr;
> >       uint64_t                uf_sequence;
> >
> > +     /* virtual addresses for shadow/GDS/CSA */
> > +     uint64_t                shadow_va;
> > +     uint64_t                csa_va;
> > +     uint64_t                gds_va;
> > +     bool                    init_shadow;
> > +
> >       /* job_run_counter >= 1 means a resubmit job */
> >       uint32_t                job_run_counter;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index 3989e755a5b4..8643d4a92c27 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > @@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
> >       void (*end_use)(struct amdgpu_ring *ring);
> >       void (*emit_switch_buffer) (struct amdgpu_ring *ring);
> >       void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> > +     void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
> >       void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
> >                         uint32_t reg_val_offs);
> >       void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
> > @@ -307,6 +308,7 @@ struct amdgpu_ring {
> >   #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
> >   #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
> >   #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
> > +#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
> >   #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
> >   #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
> >   #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 06/10] drm/amdgpu: don't require a job for cond_exec and shadow
  2023-03-17 17:17 ` [PATCH 06/10] drm/amdgpu: don't require a job for cond_exec and shadow Alex Deucher
@ 2023-03-20 15:50   ` Christian König
  0 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2023-03-20 15:50 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx

Am 17.03.23 um 18:17 schrieb Alex Deucher:
> We need to reset the shadow state every time we submit an
> IB and there needs to be a COND_EXEC packet after the
> SET_Q_PREEMPTION_MODE packet for it to work properly, so
> we should emit both of these packets regardless of whether
> there is a job present or not.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index d88964b9407f..cd5b0df44ffb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -213,10 +213,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>   
>   	amdgpu_ring_ib_begin(ring);
>   
> -	if (job && ring->funcs->emit_gfx_shadow)
> +	if (ring->funcs->emit_gfx_shadow)
>   		amdgpu_ring_emit_gfx_shadow(ring, job);
>   
> -	if (job && ring->funcs->init_cond_exec)
> +	if (ring->funcs->init_cond_exec)
>   		patch_offset = amdgpu_ring_init_cond_exec(ring);
>   
>   	amdgpu_device_flush_hdp(adev, ring);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-20 15:46   ` Christian König
  2023-03-20 15:49     ` Alex Deucher
@ 2023-03-20 15:52     ` Alex Deucher
  1 sibling, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-20 15:52 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, amd-gfx

On Mon, Mar 20, 2023 at 11:46 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 17.03.23 um 18:17 schrieb Alex Deucher:
> > From: Christian König <christian.koenig@amd.com>
> >
> > Add support for submitting the shadow update packet
> > when submitting an IB.  Needed for MCBP on GFX11.
> >
> > v2: update API for CSA (Alex)
> > v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
> >      Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
> >      amdgpu_cs_pass1()
> >      Only initialize shadow on first use
> >      (Alex)
> >
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
> >   5 files changed, 37 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index f6144b378617..9bdda246b09c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
> >               case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
> >               case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
> >               case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
> > +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> >                       break;
> >
> >               default:
> > @@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
> >       return 0;
> >   }
> >
> > +static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
> > +                             struct amdgpu_cs_chunk *chunk)
> > +{
> > +     struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
> > +     bool shadow_initialized = false;
> > +     int i;
> > +
> > +     for (i = 0; i < p->gang_size; ++i) {
> > +             p->jobs[i]->shadow_va = shadow->shadow_va;
> > +             p->jobs[i]->csa_va = shadow->csa_va;
> > +             p->jobs[i]->gds_va = shadow->gds_va;
>
> Do we really need all three VAs separately?

That is what the firmware uses.  Normally they are stored individually
as part of the MQD so this just matches that only on the fly for IBs.

Alex

>
> > +             if (!p->ctx->shadow_initialized) {
> > +                     p->jobs[i]->init_shadow = true;
> > +                     shadow_initialized = true;
>
> > +             }
> > +     }
> > +     if (shadow_initialized)
> > +             p->ctx->shadow_initialized = true;
>
> This is a really bad idea since the IOCTL can be interrupted later on.
>
> Why do we need that?
>
> Regards,
> Christian.
>
>
> > +}
> > +
> >   static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
> >   {
> >       unsigned int ce_preempt = 0, de_preempt = 0;
> > @@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
> >                       if (r)
> >                               return r;
> >                       break;
> > +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> > +                     amdgpu_cs_p2_shadow(p, chunk);
> > +                     break;
> >               }
> >       }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > index 0fa0e56daf67..909d188c41f2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > @@ -57,6 +57,7 @@ struct amdgpu_ctx {
> >       unsigned long                   ras_counter_ce;
> >       unsigned long                   ras_counter_ue;
> >       uint32_t                        stable_pstate;
> > +     bool                            shadow_initialized;
> >   };
> >
> >   struct amdgpu_ctx_mgr {
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > index bcccc348dbe2..d88964b9407f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > @@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
> >       }
> >
> >       amdgpu_ring_ib_begin(ring);
> > +
> > +     if (job && ring->funcs->emit_gfx_shadow)
> > +             amdgpu_ring_emit_gfx_shadow(ring, job);
> > +
> >       if (job && ring->funcs->init_cond_exec)
> >               patch_offset = amdgpu_ring_init_cond_exec(ring);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > index 9790def34815..b470808fa40e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > @@ -68,6 +68,12 @@ struct amdgpu_job {
> >       uint64_t                uf_addr;
> >       uint64_t                uf_sequence;
> >
> > +     /* virtual addresses for shadow/GDS/CSA */
> > +     uint64_t                shadow_va;
> > +     uint64_t                csa_va;
> > +     uint64_t                gds_va;
> > +     bool                    init_shadow;
> > +
> >       /* job_run_counter >= 1 means a resubmit job */
> >       uint32_t                job_run_counter;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index 3989e755a5b4..8643d4a92c27 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > @@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
> >       void (*end_use)(struct amdgpu_ring *ring);
> >       void (*emit_switch_buffer) (struct amdgpu_ring *ring);
> >       void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> > +     void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
> >       void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
> >                         uint32_t reg_val_offs);
> >       void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
> > @@ -307,6 +308,7 @@ struct amdgpu_ring {
> >   #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
> >   #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
> >   #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
> > +#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
> >   #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
> >   #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
> >   #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-20 15:49     ` Alex Deucher
@ 2023-03-20 15:55       ` Christian König
  2023-03-20 16:01         ` Alex Deucher
  0 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2023-03-20 15:55 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, amd-gfx

Am 20.03.23 um 16:49 schrieb Alex Deucher:
> On Mon, Mar 20, 2023 at 11:46 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 17.03.23 um 18:17 schrieb Alex Deucher:
>>> From: Christian König <christian.koenig@amd.com>
>>>
>>> Add support for submitting the shadow update packet
>>> when submitting an IB.  Needed for MCBP on GFX11.
>>>
>>> v2: update API for CSA (Alex)
>>> v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
>>>       Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
>>>       amdgpu_cs_pass1()
>>>       Only initialize shadow on first use
>>>       (Alex)
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
>>>    5 files changed, 37 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index f6144b378617..9bdda246b09c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
>>>                case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
>>>                case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
>>>                case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
>>> +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
>>>                        break;
>>>
>>>                default:
>>> @@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
>>>        return 0;
>>>    }
>>>
>>> +static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
>>> +                             struct amdgpu_cs_chunk *chunk)
>>> +{
>>> +     struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
>>> +     bool shadow_initialized = false;
>>> +     int i;
>>> +
>>> +     for (i = 0; i < p->gang_size; ++i) {
>>> +             p->jobs[i]->shadow_va = shadow->shadow_va;
>>> +             p->jobs[i]->csa_va = shadow->csa_va;
>>> +             p->jobs[i]->gds_va = shadow->gds_va;
>> Do we really need all three VAs separately?
>>
>>> +             if (!p->ctx->shadow_initialized) {
>>> +                     p->jobs[i]->init_shadow = true;
>>> +                     shadow_initialized = true;
>>> +             }
>>> +     }
>>> +     if (shadow_initialized)
>>> +             p->ctx->shadow_initialized = true;
>> This is a really bad idea since the IOCTL can be interrupted later on.
>>
>> Why do we need that?
> Yes.  We have to only initial the buffer the first time it's used.
> Doing it again will overwrite whatever userspace has done with it
> since then.

I don't get what you mean with that? This here doesn't make any sense at 
all.

The shadow buffer addresses must be given with every CS and updated over 
and over again. Otherwise this solution won't work correctly.

Christian.

>
> Alex
>
>> Regards,
>> Christian.
>>
>>
>>> +}
>>> +
>>>    static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>>>    {
>>>        unsigned int ce_preempt = 0, de_preempt = 0;
>>> @@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>>>                        if (r)
>>>                                return r;
>>>                        break;
>>> +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
>>> +                     amdgpu_cs_p2_shadow(p, chunk);
>>> +                     break;
>>>                }
>>>        }
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> index 0fa0e56daf67..909d188c41f2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>> @@ -57,6 +57,7 @@ struct amdgpu_ctx {
>>>        unsigned long                   ras_counter_ce;
>>>        unsigned long                   ras_counter_ue;
>>>        uint32_t                        stable_pstate;
>>> +     bool                            shadow_initialized;
>>>    };
>>>
>>>    struct amdgpu_ctx_mgr {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> index bcccc348dbe2..d88964b9407f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> @@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>        }
>>>
>>>        amdgpu_ring_ib_begin(ring);
>>> +
>>> +     if (job && ring->funcs->emit_gfx_shadow)
>>> +             amdgpu_ring_emit_gfx_shadow(ring, job);
>>> +
>>>        if (job && ring->funcs->init_cond_exec)
>>>                patch_offset = amdgpu_ring_init_cond_exec(ring);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> index 9790def34815..b470808fa40e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> @@ -68,6 +68,12 @@ struct amdgpu_job {
>>>        uint64_t                uf_addr;
>>>        uint64_t                uf_sequence;
>>>
>>> +     /* virtual addresses for shadow/GDS/CSA */
>>> +     uint64_t                shadow_va;
>>> +     uint64_t                csa_va;
>>> +     uint64_t                gds_va;
>>> +     bool                    init_shadow;
>>> +
>>>        /* job_run_counter >= 1 means a resubmit job */
>>>        uint32_t                job_run_counter;
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index 3989e755a5b4..8643d4a92c27 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
>>>        void (*end_use)(struct amdgpu_ring *ring);
>>>        void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>>>        void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>>> +     void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
>>>        void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
>>>                          uint32_t reg_val_offs);
>>>        void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>>> @@ -307,6 +308,7 @@ struct amdgpu_ring {
>>>    #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
>>>    #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
>>>    #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
>>> +#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
>>>    #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
>>>    #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>>>    #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 07/10] drm/amdgpu: add gfx shadow callback
  2023-03-17 17:17 ` [PATCH 07/10] drm/amdgpu: add gfx shadow callback Alex Deucher
@ 2023-03-20 15:57   ` Christian König
  2023-03-20 16:03     ` Alex Deucher
  0 siblings, 1 reply; 22+ messages in thread
From: Christian König @ 2023-03-20 15:57 UTC (permalink / raw)
  To: Alex Deucher, amd-gfx

Am 17.03.23 um 18:17 schrieb Alex Deucher:
> To provide IP specific shadow sizes.  UMDs will use
> this to query the kernel driver for the size of the
> shadow buffers.
>
> v2: make callback return an int (Alex)
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 4ad9e225d6e6..8826f1efc75f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -219,6 +219,15 @@ struct amdgpu_gfx_ras {
>   						struct amdgpu_iv_entry *entry);
>   };
>   
> +struct amdgpu_gfx_shadow_info {
> +	u32 shadow_size;
> +	u32 shadow_alignment;
> +	u32 csa_size;
> +	u32 csa_alignment;
> +	u32 gds_size;
> +	u32 gds_alignment;

I don't think we need an alignment for those. Otherwise we would run 
into problems with the VA mappings as well.

Regards,
Christian.

> +};
> +
>   struct amdgpu_gfx_funcs {
>   	/* get the gpu clock counter */
>   	uint64_t (*get_gpu_clock_counter)(struct amdgpu_device *adev);
> @@ -236,6 +245,8 @@ struct amdgpu_gfx_funcs {
>   				 u32 queue, u32 vmid);
>   	void (*init_spm_golden)(struct amdgpu_device *adev);
>   	void (*update_perfmon_mgcg)(struct amdgpu_device *adev, bool enable);
> +	int (*get_gfx_shadow_info)(struct amdgpu_device *adev,
> +				   struct amdgpu_gfx_shadow_info *shadow_info);
>   };
>   
>   struct sq_work {
> @@ -372,6 +383,7 @@ struct amdgpu_gfx {
>   #define amdgpu_gfx_select_se_sh(adev, se, sh, instance) (adev)->gfx.funcs->select_se_sh((adev), (se), (sh), (instance))
>   #define amdgpu_gfx_select_me_pipe_q(adev, me, pipe, q, vmid) (adev)->gfx.funcs->select_me_pipe_q((adev), (me), (pipe), (q), (vmid))
>   #define amdgpu_gfx_init_spm_golden(adev) (adev)->gfx.funcs->init_spm_golden((adev))
> +#define amdgpu_gfx_get_gfx_shadow_info(adev, si) (adev)->gfx.funcs->get_gfx_shadow_info((adev), (si))
>   
>   /**
>    * amdgpu_gfx_create_bitmask - create a bitmask


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-20 15:55       ` Christian König
@ 2023-03-20 16:01         ` Alex Deucher
  2023-03-20 16:04           ` Christian König
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Deucher @ 2023-03-20 16:01 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, amd-gfx

On Mon, Mar 20, 2023 at 11:55 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 20.03.23 um 16:49 schrieb Alex Deucher:
> > On Mon, Mar 20, 2023 at 11:46 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 17.03.23 um 18:17 schrieb Alex Deucher:
> >>> From: Christian König <christian.koenig@amd.com>
> >>>
> >>> Add support for submitting the shadow update packet
> >>> when submitting an IB.  Needed for MCBP on GFX11.
> >>>
> >>> v2: update API for CSA (Alex)
> >>> v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
> >>>       Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
> >>>       amdgpu_cs_pass1()
> >>>       Only initialize shadow on first use
> >>>       (Alex)
> >>>
> >>> Signed-off-by: Christian König <christian.koenig@amd.com>
> >>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
> >>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
> >>>    5 files changed, 37 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> index f6144b378617..9bdda246b09c 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >>> @@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
> >>>                case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
> >>>                case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
> >>>                case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
> >>> +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> >>>                        break;
> >>>
> >>>                default:
> >>> @@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
> >>>        return 0;
> >>>    }
> >>>
> >>> +static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
> >>> +                             struct amdgpu_cs_chunk *chunk)
> >>> +{
> >>> +     struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
> >>> +     bool shadow_initialized = false;
> >>> +     int i;
> >>> +
> >>> +     for (i = 0; i < p->gang_size; ++i) {
> >>> +             p->jobs[i]->shadow_va = shadow->shadow_va;
> >>> +             p->jobs[i]->csa_va = shadow->csa_va;
> >>> +             p->jobs[i]->gds_va = shadow->gds_va;
> >> Do we really need all three VAs separately?
> >>
> >>> +             if (!p->ctx->shadow_initialized) {
> >>> +                     p->jobs[i]->init_shadow = true;
> >>> +                     shadow_initialized = true;
> >>> +             }
> >>> +     }
> >>> +     if (shadow_initialized)
> >>> +             p->ctx->shadow_initialized = true;
> >> This is a really bad idea since the IOCTL can be interrupted later on.
> >>
> >> Why do we need that?
> > Yes.  We have to only initial the buffer the first time it's used.
> > Doing it again will overwrite whatever userspace has done with it
> > since then.
>
> I don't get what you mean with that? This here doesn't make any sense at
> all.
>
> The shadow buffer addresses must be given with every CS and updated over
> and over again. Otherwise this solution won't work correctly.

The shadow replaces the old SET/LOAD model.  When the UMD uses the
shadow buffer, they no longer have to send all of their state anymore
in the IB.  The CP FW will automatically load whatever was saved when
it processes this packet.  However, the shadow needs to be initialized
by the CP FW the first time it is used.  All subsequent times, it will
just be a save/restore by the FW.  I guess the alternative would be
for the UMD to specify if it wants initialization or not, but tracking
it in the kernel aligns better with the user mode queue model where
this is handled by the MQD and is initialized the first time the queue
is loaded.

Alex

>
> Christian.
>
> >
> > Alex
> >
> >> Regards,
> >> Christian.
> >>
> >>
> >>> +}
> >>> +
> >>>    static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
> >>>    {
> >>>        unsigned int ce_preempt = 0, de_preempt = 0;
> >>> @@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
> >>>                        if (r)
> >>>                                return r;
> >>>                        break;
> >>> +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
> >>> +                     amdgpu_cs_p2_shadow(p, chunk);
> >>> +                     break;
> >>>                }
> >>>        }
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> >>> index 0fa0e56daf67..909d188c41f2 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> >>> @@ -57,6 +57,7 @@ struct amdgpu_ctx {
> >>>        unsigned long                   ras_counter_ce;
> >>>        unsigned long                   ras_counter_ue;
> >>>        uint32_t                        stable_pstate;
> >>> +     bool                            shadow_initialized;
> >>>    };
> >>>
> >>>    struct amdgpu_ctx_mgr {
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >>> index bcccc348dbe2..d88964b9407f 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >>> @@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
> >>>        }
> >>>
> >>>        amdgpu_ring_ib_begin(ring);
> >>> +
> >>> +     if (job && ring->funcs->emit_gfx_shadow)
> >>> +             amdgpu_ring_emit_gfx_shadow(ring, job);
> >>> +
> >>>        if (job && ring->funcs->init_cond_exec)
> >>>                patch_offset = amdgpu_ring_init_cond_exec(ring);
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> index 9790def34815..b470808fa40e 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> >>> @@ -68,6 +68,12 @@ struct amdgpu_job {
> >>>        uint64_t                uf_addr;
> >>>        uint64_t                uf_sequence;
> >>>
> >>> +     /* virtual addresses for shadow/GDS/CSA */
> >>> +     uint64_t                shadow_va;
> >>> +     uint64_t                csa_va;
> >>> +     uint64_t                gds_va;
> >>> +     bool                    init_shadow;
> >>> +
> >>>        /* job_run_counter >= 1 means a resubmit job */
> >>>        uint32_t                job_run_counter;
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> >>> index 3989e755a5b4..8643d4a92c27 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> >>> @@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
> >>>        void (*end_use)(struct amdgpu_ring *ring);
> >>>        void (*emit_switch_buffer) (struct amdgpu_ring *ring);
> >>>        void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
> >>> +     void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
> >>>        void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
> >>>                          uint32_t reg_val_offs);
> >>>        void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
> >>> @@ -307,6 +308,7 @@ struct amdgpu_ring {
> >>>    #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
> >>>    #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
> >>>    #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
> >>> +#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
> >>>    #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
> >>>    #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
> >>>    #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 07/10] drm/amdgpu: add gfx shadow callback
  2023-03-20 15:57   ` Christian König
@ 2023-03-20 16:03     ` Alex Deucher
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-20 16:03 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, amd-gfx

On Mon, Mar 20, 2023 at 11:58 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 17.03.23 um 18:17 schrieb Alex Deucher:
> > To provide IP specific shadow sizes.  UMDs will use
> > this to query the kernel driver for the size of the
> > shadow buffers.
> >
> > v2: make callback return an int (Alex)
> >
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index 4ad9e225d6e6..8826f1efc75f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -219,6 +219,15 @@ struct amdgpu_gfx_ras {
> >                                               struct amdgpu_iv_entry *entry);
> >   };
> >
> > +struct amdgpu_gfx_shadow_info {
> > +     u32 shadow_size;
> > +     u32 shadow_alignment;
> > +     u32 csa_size;
> > +     u32 csa_alignment;
> > +     u32 gds_size;
> > +     u32 gds_alignment;
>
> I don't think we need an alignment for those. Otherwise we would run
> into problems with the VA mappings as well.

This is for the GDS save area so it's just memory.  The size is just
the amount of GDS on the ASIC and the alignment is standard 256B like
most other CP things.

Alex

>
> Regards,
> Christian.
>
> > +};
> > +
> >   struct amdgpu_gfx_funcs {
> >       /* get the gpu clock counter */
> >       uint64_t (*get_gpu_clock_counter)(struct amdgpu_device *adev);
> > @@ -236,6 +245,8 @@ struct amdgpu_gfx_funcs {
> >                                u32 queue, u32 vmid);
> >       void (*init_spm_golden)(struct amdgpu_device *adev);
> >       void (*update_perfmon_mgcg)(struct amdgpu_device *adev, bool enable);
> > +     int (*get_gfx_shadow_info)(struct amdgpu_device *adev,
> > +                                struct amdgpu_gfx_shadow_info *shadow_info);
> >   };
> >
> >   struct sq_work {
> > @@ -372,6 +383,7 @@ struct amdgpu_gfx {
> >   #define amdgpu_gfx_select_se_sh(adev, se, sh, instance) (adev)->gfx.funcs->select_se_sh((adev), (se), (sh), (instance))
> >   #define amdgpu_gfx_select_me_pipe_q(adev, me, pipe, q, vmid) (adev)->gfx.funcs->select_me_pipe_q((adev), (me), (pipe), (q), (vmid))
> >   #define amdgpu_gfx_init_spm_golden(adev) (adev)->gfx.funcs->init_spm_golden((adev))
> > +#define amdgpu_gfx_get_gfx_shadow_info(adev, si) (adev)->gfx.funcs->get_gfx_shadow_info((adev), (si))
> >
> >   /**
> >    * amdgpu_gfx_create_bitmask - create a bitmask
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support
  2023-03-20 16:01         ` Alex Deucher
@ 2023-03-20 16:04           ` Christian König
  0 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2023-03-20 16:04 UTC (permalink / raw)
  To: Alex Deucher, Christian König; +Cc: Alex Deucher, amd-gfx

Am 20.03.23 um 17:01 schrieb Alex Deucher:
> On Mon, Mar 20, 2023 at 11:55 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 20.03.23 um 16:49 schrieb Alex Deucher:
>>> On Mon, Mar 20, 2023 at 11:46 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> Am 17.03.23 um 18:17 schrieb Alex Deucher:
>>>>> From: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> Add support for submitting the shadow update packet
>>>>> when submitting an IB.  Needed for MCBP on GFX11.
>>>>>
>>>>> v2: update API for CSA (Alex)
>>>>> v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC
>>>>>        Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in
>>>>>        amdgpu_cs_pass1()
>>>>>        Only initialize shadow on first use
>>>>>        (Alex)
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   | 24 ++++++++++++++++++++++++
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h  |  1 +
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  4 ++++
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  6 ++++++
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
>>>>>     5 files changed, 37 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index f6144b378617..9bdda246b09c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -280,6 +280,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
>>>>>                 case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
>>>>>                 case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
>>>>>                 case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
>>>>> +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
>>>>>                         break;
>>>>>
>>>>>                 default:
>>>>> @@ -587,6 +588,26 @@ static int amdgpu_cs_p2_syncobj_timeline_signal(struct amdgpu_cs_parser *p,
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>> +static void amdgpu_cs_p2_shadow(struct amdgpu_cs_parser *p,
>>>>> +                             struct amdgpu_cs_chunk *chunk)
>>>>> +{
>>>>> +     struct drm_amdgpu_cs_chunk_cp_gfx_shadow *shadow = chunk->kdata;
>>>>> +     bool shadow_initialized = false;
>>>>> +     int i;
>>>>> +
>>>>> +     for (i = 0; i < p->gang_size; ++i) {
>>>>> +             p->jobs[i]->shadow_va = shadow->shadow_va;
>>>>> +             p->jobs[i]->csa_va = shadow->csa_va;
>>>>> +             p->jobs[i]->gds_va = shadow->gds_va;
>>>> Do we really need all three VAs separately?
>>>>
>>>>> +             if (!p->ctx->shadow_initialized) {
>>>>> +                     p->jobs[i]->init_shadow = true;
>>>>> +                     shadow_initialized = true;
>>>>> +             }
>>>>> +     }
>>>>> +     if (shadow_initialized)
>>>>> +             p->ctx->shadow_initialized = true;
>>>> This is a really bad idea since the IOCTL can be interrupted later on.
>>>>
>>>> Why do we need that?
>>> Yes.  We have to only initial the buffer the first time it's used.
>>> Doing it again will overwrite whatever userspace has done with it
>>> since then.
>> I don't get what you mean with that? This here doesn't make any sense at
>> all.
>>
>> The shadow buffer addresses must be given with every CS and updated over
>> and over again. Otherwise this solution won't work correctly.
> The shadow replaces the old SET/LOAD model.  When the UMD uses the
> shadow buffer, they no longer have to send all of their state anymore
> in the IB.  The CP FW will automatically load whatever was saved when
> it processes this packet.  However, the shadow needs to be initialized
> by the CP FW the first time it is used.  All subsequent times, it will
> just be a save/restore by the FW.  I guess the alternative would be
> for the UMD to specify if it wants initialization or not, but tracking
> it in the kernel aligns better with the user mode queue model where
> this is handled by the MQD and is initialized the first time the queue
> is loaded.

This approach is just utterly nonsense.

The problem is that neither the kernel nor the fw can know if that's the 
first submission. Only the UMD can know that.

That's the same issue we previously had with PAL and the old model which 
didn't worked at all.

Christian.

>
> Alex
>
>> Christian.
>>
>>> Alex
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>> +}
>>>>> +
>>>>>     static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>>>>>     {
>>>>>         unsigned int ce_preempt = 0, de_preempt = 0;
>>>>> @@ -629,6 +650,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>>>>>                         if (r)
>>>>>                                 return r;
>>>>>                         break;
>>>>> +             case AMDGPU_CHUNK_ID_CP_GFX_SHADOW:
>>>>> +                     amdgpu_cs_p2_shadow(p, chunk);
>>>>> +                     break;
>>>>>                 }
>>>>>         }
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>>>> index 0fa0e56daf67..909d188c41f2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>>>>> @@ -57,6 +57,7 @@ struct amdgpu_ctx {
>>>>>         unsigned long                   ras_counter_ce;
>>>>>         unsigned long                   ras_counter_ue;
>>>>>         uint32_t                        stable_pstate;
>>>>> +     bool                            shadow_initialized;
>>>>>     };
>>>>>
>>>>>     struct amdgpu_ctx_mgr {
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> index bcccc348dbe2..d88964b9407f 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> @@ -212,6 +212,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>>         }
>>>>>
>>>>>         amdgpu_ring_ib_begin(ring);
>>>>> +
>>>>> +     if (job && ring->funcs->emit_gfx_shadow)
>>>>> +             amdgpu_ring_emit_gfx_shadow(ring, job);
>>>>> +
>>>>>         if (job && ring->funcs->init_cond_exec)
>>>>>                 patch_offset = amdgpu_ring_init_cond_exec(ring);
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>>>> index 9790def34815..b470808fa40e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>>>> @@ -68,6 +68,12 @@ struct amdgpu_job {
>>>>>         uint64_t                uf_addr;
>>>>>         uint64_t                uf_sequence;
>>>>>
>>>>> +     /* virtual addresses for shadow/GDS/CSA */
>>>>> +     uint64_t                shadow_va;
>>>>> +     uint64_t                csa_va;
>>>>> +     uint64_t                gds_va;
>>>>> +     bool                    init_shadow;
>>>>> +
>>>>>         /* job_run_counter >= 1 means a resubmit job */
>>>>>         uint32_t                job_run_counter;
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> index 3989e755a5b4..8643d4a92c27 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> @@ -212,6 +212,7 @@ struct amdgpu_ring_funcs {
>>>>>         void (*end_use)(struct amdgpu_ring *ring);
>>>>>         void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>>>>>         void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>>>>> +     void (*emit_gfx_shadow)(struct amdgpu_ring *ring, struct amdgpu_job *job);
>>>>>         void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
>>>>>                           uint32_t reg_val_offs);
>>>>>         void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
>>>>> @@ -307,6 +308,7 @@ struct amdgpu_ring {
>>>>>     #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r))
>>>>>     #define amdgpu_ring_emit_switch_buffer(r) (r)->funcs->emit_switch_buffer((r))
>>>>>     #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
>>>>> +#define amdgpu_ring_emit_gfx_shadow(r, j) (r)->funcs->emit_gfx_shadow((r), (j))
>>>>>     #define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), (d), (o))
>>>>>     #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
>>>>>     #define amdgpu_ring_emit_reg_wait(r, d, v, m) (r)->funcs->emit_reg_wait((r), (d), (v), (m))


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback
  2023-03-20 15:49   ` Christian König
@ 2023-03-20 16:08     ` Alex Deucher
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Deucher @ 2023-03-20 16:08 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, Christian König, amd-gfx

On Mon, Mar 20, 2023 at 11:49 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 17.03.23 um 18:17 schrieb Alex Deucher:
> > From: Christian König <christian.koenig@amd.com>
> >
> > Add ring callback for gfx to update the CP firmware
> > with the new shadow information before we process the
> > IB.
> >
> > v2: add implementation for new packet (Alex)
> > v3: add current FW version checks (Alex)
> > v4: only initialize shadow on first use
> >      Only set IB_VMID when a valid shadow buffer is present
> >      (Alex)
> >
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
> >   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 46 +++++++++++++++++++++++++
> >   drivers/gpu/drm/amd/amdgpu/nvd.h        |  5 ++-
> >   3 files changed, 52 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index de9e7a00bb15..4ad9e225d6e6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -364,6 +364,8 @@ struct amdgpu_gfx {
> >
> >       struct amdgpu_ring              sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];
> >       struct amdgpu_ring_mux          muxer;
> > +
> > +     bool                            cp_gfx_shadow; /* for gfx11 */
> >   };
> >
> >   #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev))
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > index 3bf697a80cf2..166a3f640042 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > @@ -463,6 +463,27 @@ static int gfx_v11_0_init_toc_microcode(struct amdgpu_device *adev, const char *
> >       return err;
> >   }
> >
> > +static void gfx_v11_0_check_fw_cp_gfx_shadow(struct amdgpu_device *adev)
> > +{
> > +     switch (adev->ip_versions[GC_HWIP][0]) {
> > +     case IP_VERSION(11, 0, 0):
> > +     case IP_VERSION(11, 0, 2):
> > +     case IP_VERSION(11, 0, 3):
> > +             /* XXX fix me! */
> > +             if ((adev->gfx.me_fw_version >= 1498) &&
> > +                 (adev->gfx.me_feature_version >= 29) &&
> > +                 (adev->gfx.pfp_fw_version >= 1541) &&
> > +                 (adev->gfx.pfp_feature_version >= 29) &&
> > +                 (adev->gfx.mec_fw_version >= 507) &&
> > +                 (adev->gfx.mec_feature_version >= 29))
> > +                     adev->gfx.cp_gfx_shadow = true;
> > +             break;
> > +     default:
> > +             adev->gfx.cp_gfx_shadow = false;
> > +             break;
> > +     }
> > +}
> > +
> >   static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
> >   {
> >       char fw_name[40];
> > @@ -539,6 +560,7 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
> >       /* only one MEC for gfx 11.0.0. */
> >       adev->gfx.mec2_fw = NULL;
> >
> > +     gfx_v11_0_check_fw_cp_gfx_shadow(adev);
> >   out:
> >       if (err) {
> >               amdgpu_ucode_release(&adev->gfx.pfp_fw);
> > @@ -5563,6 +5585,28 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct amdgpu_ring *ring,
> >       amdgpu_ring_write(ring, 0);
> >   }
> >
> > +static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring,
> > +                                        struct amdgpu_job *job)
>
> Better give the values to use here instead of the job structure.

Will fix it up.  Thanks!

Alex

>
> Regards,
> Christian.
>
> > +{
> > +     unsigned vmid = AMDGPU_JOB_GET_VMID(job);
> > +     struct amdgpu_device *adev = ring->adev;
> > +
> > +     if (!adev->gfx.cp_gfx_shadow)
> > +             return;
> > +
> > +     amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
> > +     amdgpu_ring_write(ring, lower_32_bits(job->shadow_va));
> > +     amdgpu_ring_write(ring, upper_32_bits(job->shadow_va));
> > +     amdgpu_ring_write(ring, lower_32_bits(job->gds_va));
> > +     amdgpu_ring_write(ring, upper_32_bits(job->gds_va));
> > +     amdgpu_ring_write(ring, lower_32_bits(job->csa_va));
> > +     amdgpu_ring_write(ring, upper_32_bits(job->csa_va));
> > +     amdgpu_ring_write(ring, job->shadow_va ?
> > +                       PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(vmid) : 0);
> > +     amdgpu_ring_write(ring, job->init_shadow ?
> > +                       PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM : 0);
> > +}
> > +
> >   static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring)
> >   {
> >       unsigned ret;
> > @@ -6183,6 +6227,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = {
> >       .set_wptr = gfx_v11_0_ring_set_wptr_gfx,
> >       .emit_frame_size = /* totally 242 maximum if 16 IBs */
> >               5 + /* COND_EXEC */
> > +             9 + /* SET_Q_PREEMPTION_MODE */
> >               7 + /* PIPELINE_SYNC */
> >               SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> >               SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> > @@ -6209,6 +6254,7 @@ static const struct amdgpu_ring_funcs gfx_v11_0_ring_funcs_gfx = {
> >       .insert_nop = amdgpu_ring_insert_nop,
> >       .pad_ib = amdgpu_ring_generic_pad_ib,
> >       .emit_cntxcntl = gfx_v11_0_ring_emit_cntxcntl,
> > +     .emit_gfx_shadow = gfx_v11_0_ring_emit_gfx_shadow,
> >       .init_cond_exec = gfx_v11_0_ring_emit_init_cond_exec,
> >       .patch_cond_exec = gfx_v11_0_ring_emit_patch_cond_exec,
> >       .preempt_ib = gfx_v11_0_ring_preempt_ib,
> > diff --git a/drivers/gpu/drm/amd/amdgpu/nvd.h b/drivers/gpu/drm/amd/amdgpu/nvd.h
> > index fd6b58243b03..631dafb92299 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/nvd.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/nvd.h
> > @@ -462,6 +462,9 @@
> >   #              define PACKET3_QUERY_STATUS_ENG_SEL(x)          ((x) << 25)
> >   #define     PACKET3_RUN_LIST                                0xA5
> >   #define     PACKET3_MAP_PROCESS_VM                          0xA6
> > -
> > +/* GFX11 */
> > +#define      PACKET3_SET_Q_PREEMPTION_MODE                   0xF0
> > +#              define PACKET3_SET_Q_PREEMPTION_MODE_IB_VMID(x)  ((x) << 0)
> > +#              define PACKET3_SET_Q_PREEMPTION_MODE_INIT_SHADOW_MEM    (1 << 0)
> >
> >   #endif
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-03-20 16:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17 17:17 [PATCH 00/10] Enable FW assisted shadowing for GFX11 Alex Deucher
2023-03-17 17:17 ` [PATCH 01/10] drm/amdgpu: add UAPI to query GFX shadow sizes Alex Deucher
2023-03-17 17:17 ` [PATCH 02/10] drm/amdgpu/UAPI: add new CS chunk for GFX shadow buffers Alex Deucher
2023-03-17 17:17 ` [PATCH 03/10] drm/amdgpu: add gfx shadow CS IOCTL support Alex Deucher
2023-03-20 15:46   ` Christian König
2023-03-20 15:49     ` Alex Deucher
2023-03-20 15:55       ` Christian König
2023-03-20 16:01         ` Alex Deucher
2023-03-20 16:04           ` Christian König
2023-03-20 15:52     ` Alex Deucher
2023-03-17 17:17 ` [PATCH 04/10] drm/amdgpu: add gfx11 emit shadow callback Alex Deucher
2023-03-20 15:49   ` Christian König
2023-03-20 16:08     ` Alex Deucher
2023-03-17 17:17 ` [PATCH 05/10] drm/amdgpu/gfx11: make job optional in emit_gfx_shadow Alex Deucher
2023-03-17 17:17 ` [PATCH 06/10] drm/amdgpu: don't require a job for cond_exec and shadow Alex Deucher
2023-03-20 15:50   ` Christian König
2023-03-17 17:17 ` [PATCH 07/10] drm/amdgpu: add gfx shadow callback Alex Deucher
2023-03-20 15:57   ` Christian König
2023-03-20 16:03     ` Alex Deucher
2023-03-17 17:17 ` [PATCH 08/10] drm/amdgpu: add get_gfx_shadow_info callback for gfx11 Alex Deucher
2023-03-17 17:17 ` [PATCH 09/10] drm/amdgpu: add support for new GFX shadow size query Alex Deucher
2023-03-17 17:17 ` [PATCH 10/10] drm/amdgpu: bump driver version number for CP GFX shadow Alex Deucher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).