amd-gfx.lists.freedesktop.org archive mirror
* [PATCH v9 00/14] AMDGPU usermode queues
@ 2024-04-26 13:47 Shashank Sharma
  2024-04-26 13:47 ` [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management Shashank Sharma
                   ` (14 more replies)
  0 siblings, 15 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:47 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma

This patch series introduces AMDGPU usermode queues for gfx workloads.
Usermode queues are a method of submitting GPU workloads to the graphics
hardware without any interaction with the kernel/DRM schedulers. With this
method, a userspace graphics application can create its own work queue and
submit it directly to the GPU HW.

The general idea of how this is supposed to work (a rough userspace-side
sketch follows this list):
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Shadow buffer pages.
  - GDS buffer pages (as required).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the userqueue create IOCTL introduced in this
  series, passing the GPU addresses of these objects (read ptr, write ptr,
  queue base address, shadow, gds) along with the doorbell object and the
  32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application maps the GPU buffers in process address space.
- The application can start submitting data to the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data into the queue, the app must write the
  number of dwords added to the queue into the doorbell offset and the
  WPTR buffer, and the GPU will start fetching the data.
- This series adds usermode queue support for all three MES based IPs
  (GFX, SDMA and Compute).
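
For illustration, here is a rough userspace-side sketch of the flow above.
This is not the libdrm test from the MESA merge request linked right below;
buffer allocation, VA mapping and the CPU mappings of the WPTR buffer and of
the chosen doorbell slot are assumed to have been set up beforehand with the
usual amdgpu GEM/VA ioctls and mmap:

#include <stdint.h>
#include <errno.h>
#include <xf86drm.h>
#include "amdgpu_drm.h"	/* UAPI additions from this series */

/* Returns the new queue id on success, -errno on failure. */
static int userq_create_and_kick(int fd, struct drm_amdgpu_userq_mqd *mqd,
				 uint32_t doorbell_handle,
				 uint32_t doorbell_offset,
				 volatile uint64_t *wptr_cpu,
				 volatile uint64_t *doorbell_cpu,
				 uint64_t num_dwords)
{
	union drm_amdgpu_userq userq = {
		.in = {
			.op              = AMDGPU_USERQ_OP_CREATE,
			.ip_type         = AMDGPU_HW_IP_GFX,
			.doorbell_handle = doorbell_handle, /* GEM handle of the doorbell page */
			.doorbell_offset = doorbell_offset, /* 32-bit offset picked above */
			.mqd             = (uint64_t)(uintptr_t)mqd,
			.mqd_size        = sizeof(*mqd),
		},
	};

	if (drmCommandWriteRead(fd, DRM_AMDGPU_USERQ, &userq, sizeof(userq)))
		return -errno;

	/* ... fill num_dwords worth of packets into the ring buffer
	 * (the BO mapped at mqd->queue_va on the GPU) ... */

	/*
	 * Kick the HW: write the number of dwords added into the WPTR buffer
	 * and the doorbell (the exact doorbell write convention is up to the
	 * UMD, see the MESA MR below).
	 */
	*wptr_cpu = num_dwords;
	*doorbell_cpu = num_dwords;

	return userq.out.queue_id;
}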

libDRM changes for this series and a sample DRM test program can be found
in the MESA merge request here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Arvind Yadav (1):
  drm/amdgpu: enable compute/gfx usermode queue

Shashank Sharma (12):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: add helpers to create userqueue object
  drm/amdgpu: create MES-V11 usermode queue for GFX
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues
  drm/amdgpu: fix MES GFX mask
  drm/amdgpu: enable SDMA usermode queues
  drm/amdgpu: add kernel config for gfx-userqueue

 drivers/gpu/drm/amd/amdgpu/Kconfig            |   8 +
 drivers/gpu/drm/amd/amdgpu/Makefile           |   7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |   3 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 296 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  10 +
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        |   9 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |   9 +-
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 317 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |   6 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  79 +++++
 include/uapi/drm/amdgpu_drm.h                 | 111 ++++++
 15 files changed, 859 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.43.2



* [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
@ 2024-04-26 13:47 ` Shashank Sharma
  2024-05-01 20:39   ` Alex Deucher
  2024-04-26 13:47 ` [PATCH v9 02/14] drm/amdgpu: add usermode queue base code Shashank Sharma
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:47 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Alex Deucher, Christian Koenig, Shashank Sharma

From: Alex Deucher <alexander.deucher@amd.com>

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app fills this structure and requests the
graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into the GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.
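
For completeness, a hedged sketch of the other op this IOCTL provides
(AMDGPU_USERQ_OP_FREE). The helper name below is made up for illustration
and this is not the libdrm test from the MESA MR:

#include <string.h>
#include <stdint.h>
#include <xf86drm.h>
#include "amdgpu_drm.h"

/* Free a previously created user queue by the queue id returned at create time. */
static int userq_free(int fd, uint32_t queue_id)
{
	union drm_amdgpu_userq req;

	memset(&req, 0, sizeof(req));
	req.in.op       = AMDGPU_USERQ_OP_FREE;
	req.in.queue_id = queue_id;

	return drmCommandWriteRead(fd, DRM_AMDGPU_USERQ, &req, sizeof(req));
}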

V2: Addressed review comments from Alex and Christian
    - Make the doorbell offset's comment clearer
    - Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
    - Updated the UAPI doc (Pierre-Eric)
    - Created a Union for engine specific MQDs (Alex)
    - Added Christian's R-B
V5:
    - Add variables for GDS and CSA in MQD structure (Alex)
    - Make MQD data a ptr-size pair instead of union (Alex)

V9:
   - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
     drm_amdgpu_userq_mqd as it is being used for SDMA and
     compute queues as well

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 110 ++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 96e32dafd4f0..22f56a30f7cb 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_USERQ		0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE	1
+#define AMDGPU_USERQ_OP_FREE	2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_mqd {
+	/**
+	 * @queue_va: Virtual address of the GPU memory which holds the queue
+	 * object. The queue holds the workload packets.
+	 */
+	__u64   queue_va;
+	/**
+	 * @queue_size: Size of the queue in bytes; this needs to be 256-byte
+	 * aligned.
+	 */
+	__u64   queue_size;
+	/**
+	 * @rptr_va: Virtual address of the GPU memory which holds the ring RPTR.
+	 * This object must be at least 8 bytes in size and 8-byte aligned.
+	 */
+	__u64   rptr_va;
+	/**
+	 * @wptr_va: Virtual address of the GPU memory which holds the ring WPTR.
+	 * This object must be at least 8 bytes in size and 8-byte aligned.
+	 *
+	 * Queue, RPTR and WPTR can come from the same object, as long as the size
+	 * and alignment related requirements are met.
+	 */
+	__u64   wptr_va;
+	/**
+	 * @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
+	 * This must come from a separate GPU object, and must be at least 4 pages
+	 * in size.
+	 */
+	__u64   shadow_va;
+	/**
+	 * @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
+	 * This must come from a separate GPU object, and must be at least 1 page
+	 * in size.
+	 */
+	__u64   gds_va;
+	/**
+	 * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+	 * This must come from a separate GPU object, and must be at least 1 page
+	 * in size.
+	 */
+	__u64   csa_va;
+};
+
+struct drm_amdgpu_userq_in {
+	/** AMDGPU_USERQ_OP_* */
+	__u32	op;
+	/** Queue handle for USERQ_OP_FREE */
+	__u32	queue_id;
+	/** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+	__u32   ip_type;
+	/**
+	 * @flags: flags to indicate special function for queue like secure
+	 * buffer (TMZ). Unused for now.
+	 */
+	__u32   flags;
+	/**
+	 * @doorbell_handle: the handle of doorbell GEM object
+	 * associated to this client.
+	 */
+	__u32   doorbell_handle;
+	/**
+	 * @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
+	 * Kernel will generate absolute doorbell offset using doorbell_handle
+	 * and doorbell_offset in the doorbell bo.
+	 */
+	__u32   doorbell_offset;
+	/**
+	 * @mqd: Queue descriptor for USERQ_OP_CREATE
+	 * MQD data can be of different size for different GPU IP/engine and
+	 * their respective versions/revisions, so this points to a __u64 *
+	 * which holds MQD of this usermode queue.
+	 */
+	__u64 mqd;
+	/**
+	 * @mqd_size: Size of MQD data in bytes; it must match the MQD structure
+	 * size of the respective engine/revision defined in the UAPI, e.g. for
+	 * GFX V11 workloads, size = sizeof(struct drm_amdgpu_userq_mqd).
+	 */
+	__u64 mqd_size;
+};
+
+struct drm_amdgpu_userq_out {
+	/** Queue handle */
+	__u32	queue_id;
+	/** Flags */
+	__u32	flags;
+};
+
+union drm_amdgpu_userq {
+	struct drm_amdgpu_userq_in in;
+	struct drm_amdgpu_userq_out out;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.43.2



* [PATCH v9 02/14] drm/amdgpu: add usermode queue base code
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
  2024-04-26 13:47 ` [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2024-04-26 13:47 ` Shashank Sharma
  2024-05-01 21:29   ` Alex Deucher
  2024-04-26 13:47 ` [PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:47 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

This patch adds skeleton code for the amdgpu usermode queue.
It contains:
- A new file with the init functions for usermode queues.
- A queue context manager in the driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

V6: Rebase
V9: Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 ++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 61 +++++++++++++++++++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4536c8ad0e11..05a2d1714070 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -260,6 +260,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b3b84647207e..4ca14b02668b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -112,6 +112,7 @@
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
+#include "amdgpu_userqueue.h"
 
 #define MAX_GPU_INSTANCE		64
 
@@ -477,6 +478,7 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	struct amdgpu_userq_mgr	userq_mgr;
 	/** GPU partition selection */
 	uint32_t		xcp_id;
 };
@@ -1039,6 +1041,7 @@ struct amdgpu_device {
 	bool                            enable_mes_kiq;
 	struct amdgpu_mes               mes;
 	struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
+	const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 
 	/* df */
 	struct amdgpu_df                df;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e4277298cf1a..374970984a61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a2df3025a754..d78b06af834e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -44,6 +44,7 @@
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1388,6 +1389,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+	if (r)
+		DRM_WARN("Can't setup usermode queues, use legacy workload submission only\n");
+
 	file_priv->driver_priv = fpriv;
 	goto out_suspend;
 
@@ -1457,6 +1462,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
+	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
 	if (pasid)
 		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index 000000000000..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
+{
+	mutex_init(&userq_mgr->userq_mutex);
+	idr_init_base(&userq_mgr->userq_idr, 1);
+	userq_mgr->adev = adev;
+
+	return 0;
+}
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
+{
+	idr_destroy(&userq_mgr->userq_idr);
+	mutex_destroy(&userq_mgr->userq_mutex);
+}
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
new file mode 100644
index 000000000000..93ebe4b61682
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDGPU_USERQUEUE_H_
+#define AMDGPU_USERQUEUE_H_
+
+#define AMDGPU_MAX_USERQ_COUNT 512
+
+struct amdgpu_mqd_prop;
+
+struct amdgpu_usermode_queue {
+	int			queue_type;
+	uint64_t		doorbell_handle;
+	uint64_t		doorbell_index;
+	uint64_t		flags;
+	struct amdgpu_mqd_prop	*userq_prop;
+	struct amdgpu_userq_mgr *userq_mgr;
+	struct amdgpu_vm	*vm;
+};
+
+struct amdgpu_userq_funcs {
+	int (*mqd_create)(struct amdgpu_userq_mgr *uq_mgr,
+			  struct drm_amdgpu_userq_in *args,
+			  struct amdgpu_usermode_queue *queue);
+	void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
+			    struct amdgpu_usermode_queue *uq);
+};
+
+/* Usermode queues for gfx */
+struct amdgpu_userq_mgr {
+	struct idr			userq_idr;
+	struct mutex			userq_mutex;
+	struct amdgpu_device		*adev;
+};
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
+
+#endif
-- 
2.43.2



* [PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
  2024-04-26 13:47 ` [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management Shashank Sharma
  2024-04-26 13:47 ` [PATCH v9 02/14] drm/amdgpu: add usermode queue base code Shashank Sharma
@ 2024-04-26 13:47 ` Shashank Sharma
  2024-04-30 10:55   ` Christian König
  2024-04-26 13:48 ` [PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:47 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
 - Remove wrapper functions for queue indexing (Christian)
 - Do not save the queue id/idr in queue itself (Christian)
 - Move the idr allocation in the IP independent generic space
  (Christian)

V6:
 - Check the validity of input IP type (Christian)

V7:
 - Move uq_func from uq_mgr to adev (Alex)
 - Add missing free(queue) for error cases (Yifan)

V9:
 - Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 3 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 374970984a61..acee1c279abb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2916,6 +2916,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..df97b856f891 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,127 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+	return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *uq_funcs;
+	struct amdgpu_usermode_queue *queue;
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+	if (!queue) {
+		DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+		mutex_unlock(&uq_mgr->userq_mutex);
+		return -EINVAL;
+	}
+
+	uq_funcs = adev->userq_funcs[queue->queue_type];
+	uq_funcs->mqd_destroy(uq_mgr, queue);
+	idr_remove(&uq_mgr->userq_idr, queue_id);
+	kfree(queue);
+
+	mutex_unlock(&uq_mgr->userq_mutex);
+	return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *uq_funcs;
+	struct amdgpu_usermode_queue *queue;
+	int qid, r = 0;
+
+	/* Usermode queues are only supported for the GFX engine as of now */
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+		return -EINVAL;
+	}
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	uq_funcs = adev->userq_funcs[args->in.ip_type];
+	if (!uq_funcs) {
+		DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
+		r = -EINVAL;
+		goto unlock;
+	}
+
+	queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+	if (!queue) {
+		DRM_ERROR("Failed to allocate memory for queue\n");
+		r = -ENOMEM;
+		goto unlock;
+	}
+	queue->doorbell_handle = args->in.doorbell_handle;
+	queue->doorbell_index = args->in.doorbell_offset;
+	queue->queue_type = args->in.ip_type;
+	queue->flags = args->in.flags;
+	queue->vm = &fpriv->vm;
+
+	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
+	if (r) {
+		DRM_ERROR("Failed to create Queue\n");
+		kfree(queue);
+		goto unlock;
+	}
+
+	qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
+	if (qid < 0) {
+		DRM_ERROR("Failed to allocate a queue id\n");
+		uq_funcs->mqd_destroy(uq_mgr, queue);
+		kfree(queue);
+		r = -ENOMEM;
+		goto unlock;
+	}
+	args->out.queue_id = qid;
+
+unlock:
+	mutex_unlock(&uq_mgr->userq_mutex);
+	return r;
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *filp)
+{
+	union drm_amdgpu_userq *args = data;
+	int r = 0;
+
+	switch (args->in.op) {
+	case AMDGPU_USERQ_OP_CREATE:
+		r = amdgpu_userqueue_create(filp, args);
+		if (r)
+			DRM_ERROR("Failed to create usermode queue\n");
+		break;
+
+	case AMDGPU_USERQ_OP_FREE:
+		r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
+		if (r)
+			DRM_ERROR("Failed to destroy usermode queue\n");
+		break;
+
+	default:
+		DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
+		return -EINVAL;
+	}
+
+	return r;
+}
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 93ebe4b61682..b739274c72e1 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -54,6 +54,8 @@ struct amdgpu_userq_mgr {
 	struct amdgpu_device		*adev;
 };
 
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
-- 
2.43.2



* [PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (2 preceding siblings ...)
  2024-04-26 13:47 ` [PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 21:30   ` Alex Deucher
  2024-04-26 13:48 ` [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

This patch introduces amdgpu_userqueue_object and its helper
functions to create and destroy this object. The helper
functions create/destroy a base amdgpu_bo, kmap/unmap it and
save the respective GPU and CPU addresses in the encapsulating
userqueue object.

These helpers will be used to create/destroy userqueue MQD, WPTR
and FW areas.

V7:
- Forked out this new patch from V11-gfx-userqueue patch to prevent
  that patch from growing very big.
- Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
  for eviction fences (Christian)

V9:
 - Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 13 ++++
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index df97b856f891..65cab0ad97a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 	return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_userq_obj *userq_obj,
+				   int size)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_bo_param bp;
+	int r;
+
+	memset(&bp, 0, sizeof(bp));
+	bp.byte_align = PAGE_SIZE;
+	bp.domain = AMDGPU_GEM_DOMAIN_GTT;
+	bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+		   AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+	bp.type = ttm_bo_type_kernel;
+	bp.size = size;
+	bp.resv = NULL;
+	bp.bo_ptr_size = sizeof(struct amdgpu_bo);
+
+	r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
+	if (r) {
+		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+		return r;
+	}
+
+	r = amdgpu_bo_reserve(userq_obj->obj, true);
+	if (r) {
+		DRM_ERROR("Failed to reserve BO to map (%d)", r);
+		goto free_obj;
+	}
+
+	r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
+	if (r) {
+		DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
+		goto unresv;
+	}
+
+	r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
+	if (r) {
+		DRM_ERROR("Failed to map BO for userqueue (%d)", r);
+		goto unresv;
+	}
+
+	userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
+	amdgpu_bo_unreserve(userq_obj->obj);
+	memset(userq_obj->cpu_ptr, 0, size);
+	return 0;
+
+unresv:
+	amdgpu_bo_unreserve(userq_obj->obj);
+
+free_obj:
+	amdgpu_bo_unref(&userq_obj->obj);
+	return r;
+}
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_userq_obj *userq_obj)
+{
+	amdgpu_bo_kunmap(userq_obj->obj);
+	amdgpu_bo_unref(&userq_obj->obj);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index b739274c72e1..bbd29f68b8d4 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+	void		 *cpu_ptr;
+	uint64_t	 gpu_addr;
+	struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
 	int			queue_type;
 	uint64_t		doorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_mqd_prop	*userq_prop;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
+	struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
@@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
 
+int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
+				   struct amdgpu_userq_obj *userq_obj,
+				   int size);
+
+void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
+				     struct amdgpu_userq_obj *userq_obj);
 #endif
-- 
2.43.2



* [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (3 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 20:50   ` Alex Deucher
  2024-05-02 15:14   ` Christian König
  2024-04-26 13:48 ` [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue Shashank Sharma
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

A memory queue descriptor (MQD) of a userqueue defines it in
the HW's context. As the MQD format can vary between different
graphics IPs, we need GFX-generation-specific handlers to create MQDs.

This patch:
- Adds a new file which will be used for MES based userqueue
  functions targeting GFX and SDMA IP.
- Introduces MQD handler functions for the usermode queues.
- Adds new functions to create and destroy userqueue MQD for
  MES-V11 for GFX IP.

V1: Worked on review comments from Alex:
    - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
    - Reuse the existing adev->mqd[ip] for MQD creation
    - Formatting and arrangement of code

V3:
    - Integration with doorbell manager

V4: Review comments addressed:
    - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
    - Align name of structure members (Luben)
    - Don't break up the Cc tag list and the Sob tag list in commit
      message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
     calls while creating MQD object to amdgpu_bo_create() once eviction
     fences are ready (Christian).

V7:
   - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
   - Use memdup_user instead of copy_from_user (Christian)

V9:
   - Moved userqueue code from gfx_v11_0.c to the new file
     mes_v11_0_userqueue.c so that it can be reused for SDMA
     userqueues as well (Shashank, Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   4 +
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++++++++++++++++++
 3 files changed, 116 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 05a2d1714070..a640bfa468ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,7 +184,8 @@ amdgpu-y += \
 amdgpu-y += \
 	amdgpu_mes.o \
 	mes_v10_1.o \
-	mes_v11_0.o
+	mes_v11_0.o \
+	mes_v11_0_userqueue.o
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index f7325b02a191..525bd0f4d3f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct amdgpu_device *adev)
 	return 0;
 }
 
+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
+
 static int gfx_v11_0_sw_init(void *handle)
 {
 	int i, j, k, r, ring_id = 0;
@@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 2;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		break;
 	case IP_VERSION(11, 0, 1):
 	case IP_VERSION(11, 0, 4):
@@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 1;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		break;
 	default:
 		adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
new file mode 100644
index 000000000000..9e7dee77d344
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2024 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_gfx.h"
+#include "v11_structs.h"
+#include "mes_v11_0.h"
+#include "amdgpu_userqueue.h"
+
+static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
+				      struct drm_amdgpu_userq_in *args_in,
+				      struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
+	struct drm_amdgpu_userq_mqd *mqd_user;
+	struct amdgpu_mqd_prop *userq_props;
+	int r;
+
+	/* Incoming MQD parameters from userspace to be saved here */
+	memset(&mqd_user, 0, sizeof(mqd_user));
+
+	/* Structure to initialize MQD for userqueue using generic MQD init function */
+	userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
+	if (!userq_props) {
+		DRM_ERROR("Failed to allocate memory for userq_props\n");
+		return -ENOMEM;
+	}
+
+	if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd)) {
+		DRM_ERROR("MQD size mismatch\n");
+		r = -EINVAL;
+		goto free_props;
+	}
+
+	mqd_user = memdup_user(u64_to_user_ptr(args_in->mqd), args_in->mqd_size);
+	if (IS_ERR(mqd_user)) {
+		DRM_ERROR("Failed to read user MQD\n");
+		r = -EFAULT;
+		goto free_props;
+	}
+
+	r = amdgpu_userqueue_create_object(uq_mgr, &queue->mqd, mqd_hw_default->mqd_size);
+	if (r) {
+		DRM_ERROR("Failed to create MQD object for userqueue\n");
+		goto free_mqd_user;
+	}
+
+	/* Initialize the MQD BO with user given values */
+	userq_props->wptr_gpu_addr = mqd_user->wptr_va;
+	userq_props->rptr_gpu_addr = mqd_user->rptr_va;
+	userq_props->queue_size = mqd_user->queue_size;
+	userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
+	userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
+	userq_props->use_doorbell = true;
+
+	queue->userq_prop = userq_props;
+
+	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
+	if (r) {
+		DRM_ERROR("Failed to initialize MQD for userqueue\n");
+		goto free_mqd;
+	}
+
+	return 0;
+
+free_mqd:
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
+
+free_mqd_user:
+	kfree(mqd_user);
+
+free_props:
+	kfree(userq_props);
+
+	return r;
+}
+
+static void
+mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
+			    struct amdgpu_usermode_queue *queue)
+{
+	kfree(queue->userq_prop);
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
+}
+
+const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
+	.mqd_create = mes_v11_0_userq_mqd_create,
+	.mqd_destroy = mes_v11_0_userq_mqd_destroy,
+};
-- 
2.43.2



* [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (4 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 21:11   ` Alex Deucher
  2024-05-02 15:15   ` Christian König
  2024-04-26 13:48 ` [PATCH v9 07/14] drm/amdgpu: map usermode queue into MES Shashank Sharma
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

The FW expects us to allocate at least one page of context
space to process gang, process, GDS and FW related work.
This patch creates a joint object for all of them, and calculates
the GPU offsets of these spaces.
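
For reference, the resulting layout of this single allocation looks as
follows (one page per context, so the offsets below assume 4K pages; the
gang context offset is the one handed to MES when the queue is mapped
later in this series):

    queue->fw_obj.gpu_addr + 0x0000 -> process context (AMDGPU_USERQ_PROC_CTX_SZ)
    queue->fw_obj.gpu_addr + 0x1000 -> gang context    (AMDGPU_USERQ_GANG_CTX_SZ)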

V1: Addressed review comments on RFC patch:
    Alex: Make this function IP specific

V2: Addressed review comments from Christian
    - Allocate only one object for total FW space, and calculate
      offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
    - Remove shadow from FW space list from cover letter (Alex)
    - Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
    Addressed review comments:
    - Use lower_32_bits instead of mask (Christian)
    - gfx_v11_0 instead of gfx_v11 in function names (Alex)
    - Shadow and GDS objects are now coming from userspace (Christian,
      Alex)

V6:
    - Add a comment to replace amdgpu_bo_create_kernel() with
      amdgpu_bo_create() during fw_ctx object creation (Christian).
    - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
      of generic queue structure and make it gen11 specific (Alex).

V7:
   - Using helper function to create/destroy userqueue objects.
   - Removed FW object space allocation.

V8:
   - Updating FW object address from user values.

V9:
   - updated function names from gfx_v11_* to mes_v11_*

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 43 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 9e7dee77d344..9f9fdcb9c294 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -27,6 +27,41 @@
 #include "mes_v11_0.h"
 #include "amdgpu_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+
+static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+					    struct amdgpu_usermode_queue *queue,
+					    struct drm_amdgpu_userq_mqd *mqd_user)
+{
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+	int r, size;
+
+	/*
+	 * The FW expects at least one page space allocated for
+	 * process ctx and gang ctx each. Create an object
+	 * for the same.
+	 */
+	size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
+	r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
+	if (r) {
+		DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
+		return r;
+	}
+
+	/* Shadow and GDS objects come directly from userspace */
+	mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
+	mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
+
+	mqd->gds_bkup_base_lo = mqd_user->gds_va & 0xFFFFFFFC;
+	mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
+
+	mqd->fw_work_area_base_lo = mqd_user->csa_va & 0xFFFFFFFC;
+	mqd->fw_work_area_base_hi = upper_32_bits(mqd_user->csa_va);
+	return 0;
+}
+
 static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 				      struct drm_amdgpu_userq_in *args_in,
 				      struct amdgpu_usermode_queue *queue)
@@ -82,6 +117,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* Create BO for FW operations */
+	r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
+	if (r) {
+		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+		goto free_mqd;
+	}
+
 	return 0;
 
 free_mqd:
@@ -100,6 +142,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *queue)
 {
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bbd29f68b8d4..643f31474bd8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
+	struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.43.2



* [PATCH v9 07/14] drm/amdgpu: map usermode queue into MES
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (5 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-04-26 13:48 ` [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART Shashank Sharma
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
    - Map/Unmap should be IP specific.
V2:
    Addressed review comments from Christian:
    - Fix the wptr_mc_addr calculation (moved into another patch)
    Addressed review comments from Alex:
    - Do not add fptrs for map/unmap

V3: Integration with doorbell manager
V4: Rebase
V5: Use gfx_v11_0 for function names (Alex)
V6: Removed queue->proc/gang/fw_ctx_address variables and doing the
    address calculations locally to keep the queue structure GEN
    independent (Alex)
V7: Added R-B from Alex
V8: Rebase
V9: Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 9f9fdcb9c294..8d2cd61af26b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,69 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+			       struct amdgpu_usermode_queue *queue,
+			       struct amdgpu_mqd_prop *userq_props)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	struct mes_add_queue_input queue_input;
+	int r;
+
+	memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+	queue_input.process_va_start = 0;
+	queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+
+	/* set process quantum to 10 ms and gang quantum to 1 ms as default */
+	queue_input.process_quantum = 100000;
+	queue_input.gang_quantum = 10000;
+	queue_input.paging = false;
+
+	queue_input.process_context_addr = ctx->gpu_addr;
+	queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+	queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+	queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+	queue_input.process_id = queue->vm->pasid;
+	queue_input.queue_type = queue->queue_type;
+	queue_input.mqd_addr = queue->mqd.gpu_addr;
+	queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+	queue_input.queue_size = userq_props->queue_size >> 2;
+	queue_input.doorbell_offset = userq_props->doorbell_index;
+	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+	amdgpu_mes_lock(&adev->mes);
+	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+	amdgpu_mes_unlock(&adev->mes);
+	if (r) {
+		DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+		return r;
+	}
+
+	DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
+	return 0;
+}
+
+static void mes_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+				  struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct mes_remove_queue_input queue_input;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	int r;
+
+	memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+	queue_input.doorbell_offset = queue->doorbell_index;
+	queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+
+	amdgpu_mes_lock(&adev->mes);
+	r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+	amdgpu_mes_unlock(&adev->mes);
+	if (r)
+		DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 					    struct amdgpu_usermode_queue *queue,
 					    struct drm_amdgpu_userq_mqd *mqd_user)
@@ -124,8 +187,18 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* Map userqueue into FW using MES */
+	r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
+	if (r) {
+		DRM_ERROR("Failed to map userqueue into FW\n");
+		goto free_ctx;
+	}
+
 	return 0;
 
+free_ctx:
+	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
+
 free_mqd:
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 
@@ -142,6 +215,7 @@ static void
 mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *queue)
 {
+	mes_v11_0_userq_unmap(uq_mgr, queue);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
-- 
2.43.2



* [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (6 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 07/14] drm/amdgpu: map usermode queue into MES Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 21:36   ` Alex Deucher
  2024-05-02 15:18   ` Christian König
  2024-04-26 13:48 ` [PATCH v9 09/14] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

To support oversubscription, the MES FW expects WPTR BOs to
be mapped into the GART before they are submitted to usermode
queues. This patch adds a function for the same.

V4: fix the wptr value before mapping lookup (Bas, Christian).

V5: Addressed review comments from Christian:
    - Either pin object or allocate from GART, but not both.
    - All the handling must be done with the VM locks held.

V7: Addressed review comments from Christian:
    - Do not take vm->eviction_lock
    - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset

V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 2 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 8d2cd61af26b..37b80626e792 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -30,6 +30,74 @@
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 
+static int
+mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+{
+	int ret;
+
+	ret = amdgpu_bo_reserve(bo, true);
+	if (ret) {
+		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+		goto err_reserve_bo_failed;
+	}
+
+	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+	if (ret) {
+		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+		goto err_map_bo_gart_failed;
+	}
+
+	amdgpu_bo_unreserve(bo);
+	bo = amdgpu_bo_ref(bo);
+
+	return 0;
+
+err_map_bo_gart_failed:
+	amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+	return ret;
+}
+
+static int
+mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
+			      struct amdgpu_usermode_queue *queue,
+			      uint64_t wptr)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_bo_va_mapping *wptr_mapping;
+	struct amdgpu_vm *wptr_vm;
+	struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
+	int ret;
+
+	wptr_vm = queue->vm;
+	ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+	if (ret)
+		return ret;
+
+	wptr &= AMDGPU_GMC_HOLE_MASK;
+	wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+	amdgpu_bo_unreserve(wptr_vm->root.bo);
+	if (!wptr_mapping) {
+		DRM_ERROR("Failed to lookup wptr bo\n");
+		return -EINVAL;
+	}
+
+	wptr_obj->obj = wptr_mapping->bo_va->base.bo;
+	if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
+		DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
+		return -EINVAL;
+	}
+
+	ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
+	if (ret) {
+		DRM_ERROR("Failed to map wptr bo to GART\n");
+		return ret;
+	}
+
+	queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
+	return 0;
+}
+
 static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 			       struct amdgpu_usermode_queue *queue,
 			       struct amdgpu_mqd_prop *userq_props)
@@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 	queue_input.queue_size = userq_props->queue_size >> 2;
 	queue_input.doorbell_offset = userq_props->doorbell_index;
 	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+	queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
 
 	amdgpu_mes_lock(&adev->mes);
 	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* FW expects WPTR BOs to be mapped into GART */
+	r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
+	if (r) {
+		DRM_ERROR("Failed to create WPTR mapping\n");
+		goto free_ctx;
+	}
+
 	/* Map userqueue into FW using MES */
 	r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
 	if (r) {
@@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
 			    struct amdgpu_usermode_queue *queue)
 {
 	mes_v11_0_userq_unmap(uq_mgr, queue);
+	amdgpu_bo_unref(&queue->wptr_obj.obj);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
 	kfree(queue->userq_prop);
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 643f31474bd8..ffe8a3d73756 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
 	struct amdgpu_userq_obj fw_obj;
+	struct amdgpu_userq_obj wptr_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.43.2



* [PATCH v9 09/14] drm/amdgpu: generate doorbell index for userqueue
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (7 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-04-26 13:48 ` [PATCH v9 10/14] drm/amdgpu: cleanup leftover queues Shashank Sharma
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

Userspace sends us the doorbell object and the relative doorbell
index within that object to be used for the usermode queue, but the FW
expects the absolute doorbell index on the PCI BAR in the MQD. This
patch adds a function to convert the relative doorbell index into the
absolute doorbell index.
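
For reference, the userspace side of this only has to supply the doorbell
BO handle and the relative offset; a minimal sketch of the create request
using the UAPI from patch 01 (variable names here are placeholders):

	struct drm_amdgpu_userq_in in = {
		.op              = AMDGPU_USERQ_OP_CREATE,
		.ip_type         = AMDGPU_HW_IP_GFX,
		.doorbell_handle = db_bo_handle, /* GEM handle of the doorbell BO */
		.doorbell_offset = db_offset,    /* relative offset picked by the app */
		/* queue/rptr/wptr VAs go in via the mqd pointer, see patch 01 */
	};

The function added below then resolves this pair into the absolute index on
the doorbell BAR and stores it in queue->doorbell_index for the MQD.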

V5: Fix the db object reference leak (Christian)
V6: Pin the doorbell bo in the userqueue_create() function, and unpin it
    in userqueue destroy (Christian)
V7: Added missing kfree for queue in error cases
    Added Alex's R-B
V8: Rebase
V9: Changed the function names from gfx_v11* to mes_v11*

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 59 +++++++++++++++++++
 .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 3 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 65cab0ad97a1..fbc7313710f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -94,6 +94,53 @@ void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
 	amdgpu_bo_unref(&userq_obj->obj);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+				     struct amdgpu_usermode_queue *queue,
+				     struct drm_file *filp,
+				     uint32_t doorbell_offset)
+{
+	uint64_t index;
+	struct drm_gem_object *gobj;
+	struct amdgpu_userq_obj *db_obj = &queue->db_obj;
+	int r;
+
+	gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+	if (gobj == NULL) {
+		DRM_ERROR("Can't find GEM object for doorbell\n");
+		return -EINVAL;
+	}
+
+	db_obj->obj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+	drm_gem_object_put(gobj);
+
+	/* Pin the BO before generating the index, unpin in queue destroy */
+	r = amdgpu_bo_pin(db_obj->obj, AMDGPU_GEM_DOMAIN_DOORBELL);
+	if (r) {
+		DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+		goto unref_bo;
+	}
+
+	r = amdgpu_bo_reserve(db_obj->obj, true);
+	if (r) {
+		DRM_ERROR("[Usermode queues] Failed to reserve doorbell object\n");
+		goto unpin_bo;
+	}
+
+	index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_obj->obj,
+					     doorbell_offset, sizeof(u64));
+	DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+	amdgpu_bo_unreserve(db_obj->obj);
+	return index;
+
+unpin_bo:
+	amdgpu_bo_unpin(db_obj->obj);
+
+unref_bo:
+	amdgpu_bo_unref(&db_obj->obj);
+	return r;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -114,6 +161,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 
 	uq_funcs = adev->userq_funcs[queue->queue_type];
 	uq_funcs->mqd_destroy(uq_mgr, queue);
+	amdgpu_bo_unpin(queue->db_obj.obj);
+	amdgpu_bo_unref(&queue->db_obj.obj);
 	idr_remove(&uq_mgr->userq_idr, queue_id);
 	kfree(queue);
 
@@ -129,6 +178,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	struct amdgpu_device *adev = uq_mgr->adev;
 	const struct amdgpu_userq_funcs *uq_funcs;
 	struct amdgpu_usermode_queue *queue;
+	uint64_t index;
 	int qid, r = 0;
 
 	/* Usermode queues are only supported for GFX/SDMA engines as of now */
@@ -158,6 +208,15 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	queue->flags = args->in.flags;
 	queue->vm = &fpriv->vm;
 
+	/* Convert relative doorbell offset into absolute doorbell index */
+	index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
+	if (index == (uint64_t)-EINVAL) {
+		DRM_ERROR("Failed to get doorbell for queue\n");
+		kfree(queue);
+		goto unlock;
+	}
+	queue->doorbell_index = index;
+
 	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
 	if (r) {
 		DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index 37b80626e792..a6c3037d2d1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -240,6 +240,7 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 	userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
 	userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
 	userq_props->use_doorbell = true;
+	userq_props->doorbell_index = queue->doorbell_index;
 
 	queue->userq_prop = userq_props;
 
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index ffe8a3d73756..a653e31350c5 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
+	struct amdgpu_userq_obj	db_obj;
 	struct amdgpu_userq_obj fw_obj;
 	struct amdgpu_userq_obj wptr_obj;
 };
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 10/14] drm/amdgpu: cleanup leftover queues
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (8 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 09/14] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-04-26 13:48 ` [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask Shashank Sharma
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx
  Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig,
	Bas Nieuwenhuizen

This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.

V7: Added Alex's R-B
V8: Rebase
V9: Rebase

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 27 ++++++++++++++-----
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index fbc7313710f6..781283753804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -26,6 +26,19 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_userqueue.h"
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+			 struct amdgpu_usermode_queue *queue,
+			 int queue_id)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[queue->queue_type];
+
+	uq_funcs->mqd_destroy(uq_mgr, queue);
+	idr_remove(&uq_mgr->userq_idr, queue_id);
+	kfree(queue);
+}
+
 static struct amdgpu_usermode_queue *
 amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 {
@@ -146,8 +159,6 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-	struct amdgpu_device *adev = uq_mgr->adev;
-	const struct amdgpu_userq_funcs *uq_funcs;
 	struct amdgpu_usermode_queue *queue;
 
 	mutex_lock(&uq_mgr->userq_mutex);
@@ -159,13 +170,9 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 		return -EINVAL;
 	}
 
-	uq_funcs = adev->userq_funcs[queue->queue_type];
-	uq_funcs->mqd_destroy(uq_mgr, queue);
 	amdgpu_bo_unpin(queue->db_obj.obj);
 	amdgpu_bo_unref(&queue->db_obj.obj);
-	idr_remove(&uq_mgr->userq_idr, queue_id);
-	kfree(queue);
-
+	amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
 	mutex_unlock(&uq_mgr->userq_mutex);
 	return 0;
 }
@@ -277,6 +284,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+	uint32_t queue_id;
+	struct amdgpu_usermode_queue *queue;
+
+	idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+		amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
 	idr_destroy(&userq_mgr->userq_idr);
 	mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (9 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 10/14] drm/amdgpu: cleanup leftover queues Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 21:27   ` Alex Deucher
  2024-05-02 15:19   ` Christian König
  2024-04-26 13:48 ` [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Christian König, Alex Deucher

The current MES GFX mask prevents the FW from enabling oversubscription.
This patch does the following:
- Fixes the mask values and adds a description for them (see the
  illustration below).
- Removes the central mask setup and makes it IP specific, as the mask
  differs when the number of pipes and queues differ.
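
As an illustration of the new per-IP setup (assuming, per the comment added
in the patch, that bit N of gfx_hqd_mask[pipe] exposes GFX queue N of that
pipe to the MES scheduler):

	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2; /* pipe 0: queue 1 only, queue 0 stays with the kernel */
	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;   /* pipe 1: not usable by MES (HW limitation) */

The old common setup advertised 0xfffffffe on pipe 0 (queues 1-31), which is
the mask that kept the FW from enabling oversubscription.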

V9: introduce this patch in the series

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++++++--
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++++++--
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index a00cf4756ad0..b405fafc0b71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
 		adev->mes.compute_hqd_mask[i] = 0xc;
 	}
 
-	for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
-		adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
-
 	for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
 		if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
 		    IP_VERSION(6, 0, 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 4c8fc3117ef8..598556619337 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -110,7 +110,6 @@ struct amdgpu_mes {
 	uint32_t                        vmid_mask_gfxhub;
 	uint32_t                        vmid_mask_mmhub;
 	uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
-	uint32_t                        gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
 	uint32_t                        sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
 	uint32_t                        aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
 	uint32_t                        sch_ctx_offs;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index 1e5ad1e08d2a..4d1121d1a1e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes *mes)
 		mes_set_hw_res_pkt.compute_hqd_mask[i] =
 			mes->compute_hqd_mask[i];
 
-	for (i = 0; i < MAX_GFX_PIPES; i++)
-		mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+	/*
+	 * GFX pipe 0 queue 0 is being used by kernel
+	 * Set GFX pipe 0 queue 1 for MES scheduling
+	 * GFX pipe 1 can't be used for MES due to HW limitation.
+	 */
+	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
 	for (i = 0; i < MAX_SDMA_PIPES; i++)
 		mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 63f281a9984d..feb7fa2c304c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
 		mes_set_hw_res_pkt.compute_hqd_mask[i] =
 			mes->compute_hqd_mask[i];
 
-	for (i = 0; i < MAX_GFX_PIPES; i++)
-		mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
+	/*
+	 * GFX pipe 0 queue 0 is being used by kernel
+	 * Set GFX pipe 0 queue 1 for MES scheduling
+	 * GFX pipe 1 can't be used for MES due to HW limitation.
+	 */
+	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
+	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
 
 	for (i = 0; i < MAX_SDMA_PIPES; i++)
 		mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (10 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 20:41   ` Alex Deucher
  2024-04-26 13:48 ` [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx
  Cc: Arvind Yadav, Shashank Sharma, Christian König,
	Alex Deucher, Srinivasan Shanmugam

This patch makes the modifications necessary to enable SDMA
usermode queues using the existing userqueue infrastructure.
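
With this in place, an SDMA queue is created through the same AMDGPU_USERQ
path as a GFX queue, just with the DMA IP type (sketch, placeholder
variables):

	struct drm_amdgpu_userq_in in = {
		.op      = AMDGPU_USERQ_OP_CREATE,
		.ip_type = AMDGPU_HW_IP_DMA, /* instead of AMDGPU_HW_IP_GFX */
		/* doorbell and MQD fields filled as for a GFX queue */
	};

Shadow and GDS objects are simply skipped for SDMA, as handled below.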

V9: introduced this patch in the series

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c           | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 781283753804..e516487e8db9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	int qid, r = 0;
 
 	/* Usermode queues are only supported for GFX/SDMA engines as of now */
-	if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
 		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a6c3037d2d1f..a5e270eda37b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 		return r;
 	}
 
+	/* We don't need to set other FW objects for SDMA queues */
+	if (queue->queue_type == AMDGPU_HW_IP_DMA)
+		return 0;
+
 	/* Shadow and GDS objects come directly from userspace */
 	mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
 	mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 361835a61f2e..90354a70c807 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
 	return 0;
 }
 
+extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
+
 static int sdma_v6_0_sw_init(void *handle)
 {
 	struct amdgpu_ring *ring;
@@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
 		return -EINVAL;
 	}
 
+	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
 	return r;
 }
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (11 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-01 20:44   ` Alex Deucher
  2024-04-26 13:48 ` [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
  2024-05-02 15:51 ` [PATCH v9 00/14] AMDGPU usermode queues Alex Deucher
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Alex Deucher, Christian Koenig, Shashank Sharma

From: Arvind Yadav <arvind.yadav@amd.com>

This patch makes the changes required to enable compute
workload support using the existing usermode queue
infrastructure.
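
Compute queues reuse the same MES/MQD path as GFX; the only extra piece of
user metadata is the EOP buffer VA added to the UAPI below, e.g. (sketch,
placeholder variable):

	struct drm_amdgpu_userq_mqd mqd = {
		/* queue_va/queue_size/rptr_va/wptr_va as for GFX */
		.eop_va = eop_buf_va, /* compute only: VA of the EOP buffer */
	};

together with ip_type = AMDGPU_HW_IP_COMPUTE in struct drm_amdgpu_userq_in.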

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    |  3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c           |  2 ++
 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +++++++++-
 include/uapi/drm/amdgpu_drm.h                    |  1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index e516487e8db9..78d34fa7a0b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	int qid, r = 0;
 
 	/* Usermode queues are only supported for GFX/SDMA engines as of now */
-	if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
+			&& args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
 		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 525bd0f4d3f7..27b86f7fe949 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
 		break;
 	case IP_VERSION(11, 0, 1):
 	case IP_VERSION(11, 0, 4):
@@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
+		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
 		break;
 	default:
 		adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
index a5e270eda37b..d61d80f86003 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
@@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 	}
 
 	/* We don't need to set other FW objects for SDMA queues */
-	if (queue->queue_type == AMDGPU_HW_IP_DMA)
+	if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
+	    (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
 		return 0;
 
 	/* Shadow and GDS objects come directly from userspace */
@@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 	userq_props->use_doorbell = true;
 	userq_props->doorbell_index = queue->doorbell_index;
 
+	if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
+		userq_props->eop_gpu_addr = mqd_user->eop_va;
+		userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+		userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
+		userq_props->hqd_active = false;
+	}
+
 	queue->userq_prop = userq_props;
 
 	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 22f56a30f7cb..676792ad3618 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
 	 * sized.
 	 */
 	__u64   csa_va;
+	__u64   eop_va;
 };
 
 struct drm_amdgpu_userq_in {
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (12 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
@ 2024-04-26 13:48 ` Shashank Sharma
  2024-05-02 15:22   ` Christian König
  2024-05-02 15:51 ` [PATCH v9 00/14] AMDGPU usermode queues Alex Deucher
  14 siblings, 1 reply; 51+ messages in thread
From: Shashank Sharma @ 2024-04-26 13:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Arvind Yadav, Shashank Sharma, Alex Deucher, Christian Koenig

This patch:
- adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
- moves the userqueue initialization code for all IPs under
  this flag

so that the userqueue code is active only when the config option is enabled.
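
With the option left disabled, the adev->userq_funcs[] entries stay NULL and
the AMDGPU_USERQ IOCTL rejects all requests with -EINVAL. Enabling it is a
plain config fragment on top of an amdgpu build, e.g.:

	CONFIG_DRM_AMDGPU=m
	CONFIG_DRM_AMD_USERQ_GFX=y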

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Kconfig     | 8 ++++++++
 drivers/gpu/drm/amd/amdgpu/Makefile    | 8 ++++++--
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 22d88f8ef527..bba963527d22 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
 	  Add -Werror to the build flags for amdgpu.ko.
 	  Only enable this if you are warning code for amdgpu.ko.
 
+config DRM_AMD_USERQ_GFX
+	bool "Enable Navi 3x gfx usermode queues"
+	depends on DRM_AMDGPU
+	default n
+	help
+	  Choose this option to enable usermode queue support for GFX
+	  workload submission. This feature is supported on Navi 3X only.
+
 source "drivers/gpu/drm/amd/acp/Kconfig"
 source "drivers/gpu/drm/amd/display/Kconfig"
 source "drivers/gpu/drm/amd/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index a640bfa468ad..0b17fc1740a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -184,8 +184,12 @@ amdgpu-y += \
 amdgpu-y += \
 	amdgpu_mes.o \
 	mes_v10_1.o \
-	mes_v11_0.o \
-	mes_v11_0_userqueue.o
+	mes_v11_0.o
+
+# add GFX userqueue support
+ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
+amdgpu-y += mes_v11_0_userqueue.o
+endif
 
 # add UVD block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 27b86f7fe949..8591aed9f9ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 2;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
 		break;
 	case IP_VERSION(11, 0, 1):
 	case IP_VERSION(11, 0, 4):
@@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
 		adev->gfx.mec.num_mec = 1;
 		adev->gfx.mec.num_pipe_per_mec = 4;
 		adev->gfx.mec.num_queue_per_pipe = 4;
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
 		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
 		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
+#endif
 		break;
 	default:
 		adev->gfx.me.num_me = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 90354a70c807..084059c95db6 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_DRM_AMD_USERQ_GFX
 	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
+#endif
+
 	return r;
 }
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue
  2024-04-26 13:47 ` [PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2024-04-30 10:55   ` Christian König
  0 siblings, 0 replies; 51+ messages in thread
From: Christian König @ 2024-04-30 10:55 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Alex Deucher

On 26.04.24 at 15:47, Shashank Sharma wrote:
> This patch adds:
> - A new IOCTL function to create and destroy
> - A new structure to keep all the user queue data in one place.
> - A function to generate unique index for the queue.
>
> V1: Worked on review comments from RFC patch series:
>    - Alex: Keep a list of queues, instead of single queue per process.
>    - Christian: Use the queue manager instead of global ptrs,
>             Don't keep the queue structure in amdgpu_ctx
>
> V2: Worked on review comments:
>   - Christian:
>     - Formatting of text
>     - There is no need for queuing of userqueues, with idr in place
>   - Alex:
>     - Remove use_doorbell, its unnecessary
>     - Reuse amdgpu_mqd_props for saving mqd fields
>
>   - Code formatting and re-arrangement
>
> V3:
>   - Integration with doorbell manager
>
> V4:
>   - Accommodate MQD union related changes in UAPI (Alex)
>   - Do not set the queue size twice (Bas)
>
> V5:
>   - Remove wrapper functions for queue indexing (Christian)
>   - Do not save the queue id/idr in queue itself (Christian)
>   - Move the idr allocation in the IP independent generic space
>    (Christian)
>
> V6:
>   - Check the validity of input IP type (Christian)
>
> V7:
>   - Move uq_func from uq_mgr to adev (Alex)
>   - Add missing free(queue) for error cases (Yifan)
>
> V9:
>   - Rebase
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 121 ++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
>   3 files changed, 124 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 374970984a61..acee1c279abb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2916,6 +2916,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   };
>   
>   static const struct drm_driver amdgpu_kms_driver = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index effc0c7c02cf..df97b856f891 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -23,6 +23,127 @@
>    */
>   
>   #include "amdgpu.h"
> +#include "amdgpu_vm.h"
> +#include "amdgpu_userqueue.h"
> +
> +static struct amdgpu_usermode_queue *
> +amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
> +{
> +	return idr_find(&uq_mgr->userq_idr, qid);
> +}
> +
> +static int
> +amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
> +{
> +	struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	const struct amdgpu_userq_funcs *uq_funcs;
> +	struct amdgpu_usermode_queue *queue;
> +
> +	mutex_lock(&uq_mgr->userq_mutex);
> +
> +	queue = amdgpu_userqueue_find(uq_mgr, queue_id);
> +	if (!queue) {
> +		DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
> +		mutex_unlock(&uq_mgr->userq_mutex);
> +		return -EINVAL;
> +	}
> +
> +	uq_funcs = adev->userq_funcs[queue->queue_type];
> +	uq_funcs->mqd_destroy(uq_mgr, queue);
> +	idr_remove(&uq_mgr->userq_idr, queue_id);
> +	kfree(queue);
> +
> +	mutex_unlock(&uq_mgr->userq_mutex);
> +	return 0;
> +}
> +
> +static int
> +amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> +{
> +	struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	const struct amdgpu_userq_funcs *uq_funcs;
> +	struct amdgpu_usermode_queue *queue;
> +	int qid, r = 0;
> +
> +	/* Usermode queues are only supported for GFX/SDMA engines as of now */
> +	if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
> +		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&uq_mgr->userq_mutex);
> +
> +	uq_funcs = adev->userq_funcs[args->in.ip_type];
> +	if (!uq_funcs) {
> +		DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
> +		r = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
> +	if (!queue) {
> +		DRM_ERROR("Failed to allocate memory for queue\n");
> +		r = -ENOMEM;
> +		goto unlock;
> +	}
> +	queue->doorbell_handle = args->in.doorbell_handle;
> +	queue->doorbell_index = args->in.doorbell_offset;
> +	queue->queue_type = args->in.ip_type;
> +	queue->flags = args->in.flags;
> +	queue->vm = &fpriv->vm;
> +
> +	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
> +	if (r) {
> +		DRM_ERROR("Failed to create Queue\n");
> +		kfree(queue);
> +		goto unlock;
> +	}
> +
> +	qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
> +	if (qid < 0) {
> +		DRM_ERROR("Failed to allocate a queue id\n");
> +		uq_funcs->mqd_destroy(uq_mgr, queue);
> +		kfree(queue);
> +		r = -ENOMEM;
> +		goto unlock;
> +	}
> +	args->out.queue_id = qid;
> +
> +unlock:
> +	mutex_unlock(&uq_mgr->userq_mutex);
> +	return r;
> +}
> +
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> +		       struct drm_file *filp)
> +{
> +	union drm_amdgpu_userq *args = data;
> +	int r = 0;

Don't initialize local variables if it isn't necessary.

We now have automated checkers complaining about that. If some older 
compiler complains then rather change the code below to have a goto 
error or something like that.

> +
> +	switch (args->in.op) {
> +	case AMDGPU_USERQ_OP_CREATE:
> +		r = amdgpu_userqueue_create(filp, args);
> +		if (r)
> +			DRM_ERROR("Failed to create usermode queue\n");
> +		break;
> +
> +	case AMDGPU_USERQ_OP_FREE:
> +		r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
> +		if (r)
> +			DRM_ERROR("Failed to destroy usermode queue\n");
> +		break;
> +
> +	default:
> +		DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);

That could spam the logs if we ever decide to extend the IOCTL; rather,
make the message debug severity.
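
For example, one way to address both points (a sketch, not a literal
request):

	int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
			       struct drm_file *filp)
	{
		union drm_amdgpu_userq *args = data;
		int r;

		switch (args->in.op) {
		case AMDGPU_USERQ_OP_CREATE:
			r = amdgpu_userqueue_create(filp, args);
			if (r)
				DRM_ERROR("Failed to create usermode queue\n");
			break;
		case AMDGPU_USERQ_OP_FREE:
			r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
			if (r)
				DRM_ERROR("Failed to destroy usermode queue\n");
			break;
		default:
			DRM_DEBUG_DRIVER("Invalid user queue op specified: %d\n",
					 args->in.op);
			return -EINVAL;
		}

		return r;
	}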

With those two handled the patch is Reviewed-by: Christian König 
<christian.koenig@amd.com>.

Regards,
Christian.

> +		return -EINVAL;
> +	}
> +
> +	return r;
> +}
>   
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>   {
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 93ebe4b61682..b739274c72e1 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -54,6 +54,8 @@ struct amdgpu_userq_mgr {
>   	struct amdgpu_device		*adev;
>   };
>   
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
> +
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>   
>   void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management
  2024-04-26 13:47 ` [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2024-05-01 20:39   ` Alex Deucher
  2024-05-02  5:23     ` Sharma, Shashank
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 20:39 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Alex Deucher <alexander.deucher@amd.com>
>
> This patch introduces a new UAPI/IOCTL for a usermode graphics
> queue. The userspace app will fill this structure and request
> the graphics driver to add a graphics work queue for it. The
> output of this UAPI is a queue id.
>
> This UAPI maps the queue into GPU, so the graphics app can start
> submitting work to the queue as soon as the call returns.
>
> V2: Addressed review comments from Alex and Christian
>     - Make the doorbell offset's comment clearer
>     - Change the output parameter name to queue_id
>
> V3: Integration with doorbell manager
>
> V4:
>     - Updated the UAPI doc (Pierre-Eric)
>     - Created a Union for engine specific MQDs (Alex)
>     - Added Christian's R-B
> V5:
>     - Add variables for GDS and CSA in MQD structure (Alex)
>     - Make MQD data a ptr-size pair instead of union (Alex)
>
> V9:
>    - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
>      drm_amdgpu_userq_mqd as its being used for SDMA and
>      compute queues as well
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  include/uapi/drm/amdgpu_drm.h | 110 ++++++++++++++++++++++++++++++++++
>  1 file changed, 110 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 96e32dafd4f0..22f56a30f7cb 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -54,6 +54,7 @@ extern "C" {
>  #define DRM_AMDGPU_VM                  0x13
>  #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>  #define DRM_AMDGPU_SCHED               0x15
> +#define DRM_AMDGPU_USERQ               0x16
>
>  #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>  #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -71,6 +72,7 @@ extern "C" {
>  #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>  #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>  #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>
>  /**
>   * DOC: memory domains
> @@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
>         union drm_amdgpu_ctx_out out;
>  };
>
> +/* user queue IOCTL */
> +#define AMDGPU_USERQ_OP_CREATE 1
> +#define AMDGPU_USERQ_OP_FREE   2
> +
> +/* Flag to indicate secure buffer related workload, unused for now */
> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> +/* Flag to indicate AQL workload, unused for now */
> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> +
> +/*
> + * MQD (memory queue descriptor) is a set of parameters which allow
> + * the GPU to uniquely define and identify a usermode queue. This
> + * structure defines the MQD for GFX-V11 IP ver 0.
> + */
> +struct drm_amdgpu_userq_mqd {

Maybe rename this to drm_amdgpu_gfx_userq_mqd since it's gfx specific.
Then we can add different MQDs for SDMA, compute, etc. as they have
different metadata.  E.g., the shadow and CSA are gfx only.
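
Purely as an illustration of that split (names and exact field placement
are hypothetical, not part of this series):

	/* gfx-only metadata */
	struct drm_amdgpu_userq_mqd_gfx11 {
		__u64 shadow_va;
		__u64 gds_va;
		__u64 csa_va;
	};

	/* compute would at least carry the EOP buffer (see patch 13), with
	 * the common queue/rptr/wptr VAs kept in the shared part of the
	 * IOCTL.
	 */
	struct drm_amdgpu_userq_mqd_compute_gfx11 {
		__u64 eop_va;
	};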

Alex


> +       /**
> +        * @queue_va: Virtual address of the GPU memory which holds the queue
> +        * object. The queue holds the workload packets.
> +        */
> +       __u64   queue_va;
> +       /**
> +        * @queue_size: Size of the queue in bytes, this needs to be 256-byte
> +        * aligned.
> +        */
> +       __u64   queue_size;
> +       /**
> +        * @rptr_va : Virtual address of the GPU memory which holds the ring RPTR.
> +        * This object must be at least 8 byte in size and aligned to 8-byte offset.
> +        */
> +       __u64   rptr_va;
> +       /**
> +        * @wptr_va : Virtual address of the GPU memory which holds the ring WPTR.
> +        * This object must be at least 8 byte in size and aligned to 8-byte offset.
> +        *
> +        * Queue, RPTR and WPTR can come from the same object, as long as the size
> +        * and alignment related requirements are met.
> +        */
> +       __u64   wptr_va;
> +       /**
> +        * @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
> +        * This must be a from a separate GPU object, and must be at least 4-page
> +        * sized.
> +        */
> +       __u64   shadow_va;
> +       /**
> +        * @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
> +        * This must be a from a separate GPU object, and must be at least 1-page
> +        * sized.
> +        */
> +       __u64   gds_va;
> +       /**
> +        * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
> +        * This must be a from a separate GPU object, and must be at least 1-page
> +        * sized.
> +        */
> +       __u64   csa_va;
> +};
> +
> +struct drm_amdgpu_userq_in {
> +       /** AMDGPU_USERQ_OP_* */
> +       __u32   op;
> +       /** Queue handle for USERQ_OP_FREE */
> +       __u32   queue_id;
> +       /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
> +       __u32   ip_type;
> +       /**
> +        * @flags: flags to indicate special function for queue like secure
> +        * buffer (TMZ). Unused for now.
> +        */
> +       __u32   flags;
> +       /**
> +        * @doorbell_handle: the handle of doorbell GEM object
> +        * associated to this client.
> +        */
> +       __u32   doorbell_handle;
> +       /**
> +        * @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
> +        * Kernel will generate absolute doorbell offset using doorbell_handle
> +        * and doorbell_offset in the doorbell bo.
> +        */
> +       __u32   doorbell_offset;
> +       /**
> +        * @mqd: Queue descriptor for USERQ_OP_CREATE
> +        * MQD data can be of different size for different GPU IP/engine and
> +        * their respective versions/revisions, so this points to a __u64 *
> +        * which holds MQD of this usermode queue.
> +        */
> +       __u64 mqd;
> +       /**
> +        * @size: size of MQD data in bytes, it must match the MQD structure
> +        * size of the respective engine/revision defined in UAPI for ex, for
> +        * gfx_v11 workloads, size = sizeof(drm_amdgpu_userq_mqd_gfx_v11).
> +        */
> +       __u64 mqd_size;
> +};
> +
> +struct drm_amdgpu_userq_out {
> +       /** Queue handle */
> +       __u32   queue_id;
> +       /** Flags */
> +       __u32   flags;
> +};
> +
> +union drm_amdgpu_userq {
> +       struct drm_amdgpu_userq_in in;
> +       struct drm_amdgpu_userq_out out;
> +};
> +
>  /* vm ioctl */
>  #define AMDGPU_VM_OP_RESERVE_VMID      1
>  #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> --
> 2.43.2
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues
  2024-04-26 13:48 ` [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
@ 2024-05-01 20:41   ` Alex Deucher
  2024-05-02  5:47     ` Sharma, Shashank
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 20:41 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: amd-gfx, Arvind Yadav, Christian König, Alex Deucher,
	Srinivasan Shanmugam

On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> This patch does necessary modifications to enable the SDMA
> usermode queues using the existing userqueue infrastructure.
>
> V9: introduced this patch in the series
>
> Cc: Christian König <Christian.Koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    | 2 +-
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 ++++
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c           | 3 +++
>  3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 781283753804..e516487e8db9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>         int qid, r = 0;
>
>         /* Usermode queues are only supported for GFX/SDMA engines as of now */
> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
>                 DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
>                 return -EINVAL;
>         }
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index a6c3037d2d1f..a5e270eda37b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>                 return r;
>         }
>
> +       /* We don't need to set other FW objects for SDMA queues */
> +       if (queue->queue_type == AMDGPU_HW_IP_DMA)
> +               return 0;
> +
>         /* Shadow and GDS objects come directly from userspace */
>         mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
>         mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> index 361835a61f2e..90354a70c807 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> @@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
>         return 0;
>  }
>
> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;

Can you include the header rather than adding an extern?
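
Roughly, with a small header whose name here is only illustrative:

	/* mes_v11_0_userqueue.h */
	extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;

	/* sdma_v6_0.c */
	#include "mes_v11_0_userqueue.h"

instead of the open-coded extern above.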

> +
>  static int sdma_v6_0_sw_init(void *handle)
>  {
>         struct amdgpu_ring *ring;
> @@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
>                 return -EINVAL;
>         }
>
> +       adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
>         return r;
>  }

I think we need a new mqd descriptor in amdgpu_drm.h as well since the
sdma metadata is different from gfx and compute.

Alex

>
> --
> 2.43.2
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue
  2024-04-26 13:48 ` [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
@ 2024-05-01 20:44   ` Alex Deucher
  2024-05-02  5:50     ` Sharma, Shashank
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 20:44 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Arvind Yadav <arvind.yadav@amd.com>
>
> This patch does the necessary changes required to
> enable compute workload support using the existing
> usermode queues infrastructure.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    |  3 ++-
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c           |  2 ++
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +++++++++-
>  include/uapi/drm/amdgpu_drm.h                    |  1 +
>  4 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index e516487e8db9..78d34fa7a0b9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>         int qid, r = 0;
>
>         /* Usermode queues are only supported for GFX/SDMA engines as of now */
> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
> +                       && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
>                 DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
>                 return -EINVAL;
>         }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 525bd0f4d3f7..27b86f7fe949 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
>                 adev->gfx.mec.num_pipe_per_mec = 4;
>                 adev->gfx.mec.num_queue_per_pipe = 4;
>                 adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
>                 break;
>         case IP_VERSION(11, 0, 1):
>         case IP_VERSION(11, 0, 4):
> @@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
>                 adev->gfx.mec.num_pipe_per_mec = 4;
>                 adev->gfx.mec.num_queue_per_pipe = 4;
>                 adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
>                 break;
>         default:
>                 adev->gfx.me.num_me = 1;
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index a5e270eda37b..d61d80f86003 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>         }
>
>         /* We don't need to set other FW objects for SDMA queues */
> -       if (queue->queue_type == AMDGPU_HW_IP_DMA)
> +       if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
> +           (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
>                 return 0;
>
>         /* Shadow and GDS objects come directly from userspace */
> @@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>         userq_props->use_doorbell = true;
>         userq_props->doorbell_index = queue->doorbell_index;
>
> +       if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
> +               userq_props->eop_gpu_addr = mqd_user->eop_va;
> +               userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
> +               userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
> +               userq_props->hqd_active = false;
> +       }
> +
>         queue->userq_prop = userq_props;
>
>         r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 22f56a30f7cb..676792ad3618 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
>          * sized.
>          */
>         __u64   csa_va;
> +       __u64   eop_va;
>  };

Let's add a new mqd descriptor for compute since it's different from
gfx and sdma.  Also, can we handle the eop buffer as part of the
kernel metadata for compute user queues rather than having the user
specify it?
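
The second point could look roughly like the following in the MQD create
path (sketch only: queue->eop_obj is a hypothetical new field, PAGE_SIZE is
just a placeholder size, and the existing userqueue object helper is reused):

	if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
		/* kernel-allocated EOP buffer instead of a user supplied eop_va */
		r = amdgpu_userqueue_create_object(uq_mgr, &queue->eop_obj, PAGE_SIZE);
		if (r)
			goto free_mqd;
		userq_props->eop_gpu_addr = queue->eop_obj.gpu_addr;
	}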

Alex

>
>  struct drm_amdgpu_userq_in {
> --
> 2.43.2
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX
  2024-04-26 13:48 ` [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
@ 2024-05-01 20:50   ` Alex Deucher
  2024-05-02  5:26     ` Sharma, Shashank
  2024-05-02 15:14   ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 20:50 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 9:48 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> A Memory queue descriptor (MQD) of a userqueue defines it in
> the hw's context. As MQD format can vary between different
> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>
> This patch:
> - Adds a new file which will be used for MES based userqueue
>   functions targeting GFX and SDMA IP.
> - Introduces MQD handler functions for the usermode queues.
> - Adds new functions to create and destroy userqueue MQD for
>   MES-V11 for GFX IP.
>
> V1: Worked on review comments from Alex:
>     - Make MQD functions GEN and IP specific
>
> V2: Worked on review comments from Alex:
>     - Reuse the existing adev->mqd[ip] for MQD creation
>     - Formatting and arrangement of code
>
> V3:
>     - Integration with doorbell manager
>
> V4: Review comments addressed:
>     - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>     - Align name of structure members (Luben)
>     - Don't break up the Cc tag list and the Sob tag list in commit
>       message (Luben)
> V5:
>    - No need to reserve the bo for MQD (Christian).
>    - Some more changes to support IP specific MQD creation.
>
> V6:
>    - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
>      calls while creating MQD object to amdgpu_bo_create() once eviction
>      fences are ready (Christian).
>
> V7:
>    - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
>    - Use memdup_user instead of copy_from_user (Christian)
>
> V9:
>    - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
>      that it can be reused for SDMA userqueues as well (Shashank, Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   4 +
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++++++++++++++++++
>  3 files changed, 116 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 05a2d1714070..a640bfa468ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -184,7 +184,8 @@ amdgpu-y += \
>  amdgpu-y += \
>         amdgpu_mes.o \
>         mes_v10_1.o \
> -       mes_v11_0.o
> +       mes_v11_0.o \
> +       mes_v11_0_userqueue.o
>
>  # add UVD block
>  amdgpu-y += \
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index f7325b02a191..525bd0f4d3f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct amdgpu_device *adev)
>         return 0;
>  }
>
> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
> +
>  static int gfx_v11_0_sw_init(void *handle)
>  {
>         int i, j, k, r, ring_id = 0;
> @@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
>                 adev->gfx.mec.num_mec = 2;
>                 adev->gfx.mec.num_pipe_per_mec = 4;
>                 adev->gfx.mec.num_queue_per_pipe = 4;
> +               adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>                 break;
>         case IP_VERSION(11, 0, 1):
>         case IP_VERSION(11, 0, 4):
> @@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
>                 adev->gfx.mec.num_mec = 1;
>                 adev->gfx.mec.num_pipe_per_mec = 4;
>                 adev->gfx.mec.num_queue_per_pipe = 4;
> +               adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;

Does this work on APUs yet?  If not, we should limit it to just dGPUs
for now.  Also, we should add minimum firmware version checks for user
queue support.
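
E.g. something along these lines (sketch only; the MES version threshold
below is a placeholder, not a real requirement):

	if (!(adev->flags & AMD_IS_APU) &&
	    (adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 0x50)
		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;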

>                 break;
>         default:
>                 adev->gfx.me.num_me = 1;
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> new file mode 100644
> index 000000000000..9e7dee77d344
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2024 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include "amdgpu.h"
> +#include "amdgpu_gfx.h"
> +#include "v11_structs.h"
> +#include "mes_v11_0.h"
> +#include "amdgpu_userqueue.h"
> +
> +static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> +                                     struct drm_amdgpu_userq_in *args_in,
> +                                     struct amdgpu_usermode_queue *queue)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
> +       struct drm_amdgpu_userq_mqd *mqd_user;
> +       struct amdgpu_mqd_prop *userq_props;
> +       int r;
> +
> +       /* Incoming MQD parameters from userspace to be saved here */
> +       memset(&mqd_user, 0, sizeof(mqd_user));
> +
> +       /* Structure to initialize MQD for userqueue using generic MQD init function */
> +       userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
> +       if (!userq_props) {
> +               DRM_ERROR("Failed to allocate memory for userq_props\n");
> +               return -ENOMEM;
> +       }
> +
> +       if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd)) {
> +               DRM_ERROR("MQD size mismatch\n");
> +               r = -EINVAL;
> +               goto free_props;
> +       }
> +
> +       mqd_user = memdup_user(u64_to_user_ptr(args_in->mqd), args_in->mqd_size);
> +       if (IS_ERR(mqd_user)) {
> +               DRM_ERROR("Failed to read user MQD\n");
> +               r = -EFAULT;
> +               goto free_props;
> +       }
> +
> +       r = amdgpu_userqueue_create_object(uq_mgr, &queue->mqd, mqd_hw_default->mqd_size);
> +       if (r) {
> +               DRM_ERROR("Failed to create MQD object for userqueue\n");
> +               goto free_mqd_user;
> +       }
> +
> +       /* Initialize the MQD BO with user given values */
> +       userq_props->wptr_gpu_addr = mqd_user->wptr_va;
> +       userq_props->rptr_gpu_addr = mqd_user->rptr_va;
> +       userq_props->queue_size = mqd_user->queue_size;
> +       userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
> +       userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;

We should validate the user virtual addresses and make sure they are
non-0 and not part of the reserved areas of the address space.
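
A sketch of the kind of check that could go right after the memdup_user()
call above (untested; max_pfn is used as a rough upper bound here, the exact
definition of the reserved ranges may differ):

        u64 max_va = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;

        /* Reject null, misaligned or out-of-range user virtual addresses */
        if (!mqd_user->queue_va || !mqd_user->rptr_va || !mqd_user->wptr_va ||
            !IS_ALIGNED(mqd_user->rptr_va, 8) || !IS_ALIGNED(mqd_user->wptr_va, 8) ||
            (mqd_user->queue_va & AMDGPU_GMC_HOLE_MASK) >= max_va) {
                DRM_ERROR("Invalid queue/rptr/wptr virtual address\n");
                r = -EINVAL;
                goto free_mqd_user;
        }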

Alex

> +       userq_props->use_doorbell = true;
> +
> +       queue->userq_prop = userq_props;
> +
> +       r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
> +       if (r) {
> +               DRM_ERROR("Failed to initialize MQD for userqueue\n");
> +               goto free_mqd;
> +       }
> +
> +       return 0;
> +
> +free_mqd:
> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> +
> +free_mqd_user:
> +       kfree(mqd_user);
> +
> +free_props:
> +       kfree(userq_props);
> +
> +       return r;
> +}
> +
> +static void
> +mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
> +                           struct amdgpu_usermode_queue *queue)
> +{
> +       kfree(queue->userq_prop);
> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> +}
> +
> +const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
> +       .mqd_create = mes_v11_0_userq_mqd_create,
> +       .mqd_destroy = mes_v11_0_userq_mqd_destroy,
> +};
> --
> 2.43.2
>


* Re: [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue
  2024-04-26 13:48 ` [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue Shashank Sharma
@ 2024-05-01 21:11   ` Alex Deucher
  2024-05-02  5:27     ` Sharma, Shashank
  2024-05-02 15:15   ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 21:11 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> The FW expects us to allocate at least one page as context
> space to process gang, process, GDS and FW related work.
> This patch creates a single joint object for these areas and
> calculates the GPU address offsets of each of them.
>
> V1: Addressed review comments on RFC patch:
>     Alex: Make this function IP specific
>
> V2: Addressed review comments from Christian
>     - Allocate only one object for total FW space, and calculate
>       offsets for each of these objects.
>
> V3: Integration with doorbell manager
>
> V4: Review comments:
>     - Remove shadow from FW space list from cover letter (Alex)
>     - Alignment of macro (Luben)
>
> V5: Merged patches 5 and 6 into this single patch
>     Addressed review comments:
>     - Use lower_32_bits instead of mask (Christian)
>     - gfx_v11_0 instead of gfx_v11 in function names (Alex)
>     - Shadow and GDS objects are now coming from userspace (Christian,
>       Alex)
>
> V6:
>     - Add a comment to replace amdgpu_bo_create_kernel() with
>       amdgpu_bo_create() during fw_ctx object creation (Christian).
>     - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
>       of generic queue structure and make it gen11 specific (Alex).
>
> V7:
>    - Using helper function to create/destroy userqueue objects.
>    - Removed FW object space allocation.
>
> V8:
>    - Updating FW object address from user values.
>
> V9:
>    - updated function names from gfx_v11_* to mes_v11_*
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 43 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>  2 files changed, 44 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index 9e7dee77d344..9f9fdcb9c294 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -27,6 +27,41 @@
>  #include "mes_v11_0.h"
>  #include "amdgpu_userqueue.h"
>
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> +
> +static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                           struct amdgpu_usermode_queue *queue,
> +                                           struct drm_amdgpu_userq_mqd *mqd_user)
> +{
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
> +       int r, size;
> +
> +       /*
> +        * The FW expects at least one page space allocated for
> +        * process ctx and gang ctx each. Create an object
> +        * for the same.
> +        */
> +       size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
> +       r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);

Is this per queue or per context?  I.e., is this shared with all
queues associated with an fd?

Alex

> +       if (r) {
> +               DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> +               return r;
> +       }
> +
> +       /* Shadow and GDS objects come directly from userspace */
> +       mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
> +       mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> +
> +       mqd->gds_bkup_base_lo = mqd_user->gds_va & 0xFFFFFFFC;
> +       mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
> +
> +       mqd->fw_work_area_base_lo = mqd_user->csa_va & 0xFFFFFFFC;
> +       mqd->fw_work_area_base_hi = upper_32_bits(mqd_user->csa_va);
> +       return 0;
> +}
> +
>  static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                                       struct drm_amdgpu_userq_in *args_in,
>                                       struct amdgpu_usermode_queue *queue)
> @@ -82,6 +117,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                 goto free_mqd;
>         }
>
> +       /* Create BO for FW operations */
> +       r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
> +       if (r) {
> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> +               goto free_mqd;
> +       }
> +
>         return 0;
>
>  free_mqd:
> @@ -100,6 +142,7 @@ static void
>  mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>                             struct amdgpu_usermode_queue *queue)
>  {
> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>         kfree(queue->userq_prop);
>         amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>  }
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index bbd29f68b8d4..643f31474bd8 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_vm        *vm;
>         struct amdgpu_userq_obj mqd;
> +       struct amdgpu_userq_obj fw_obj;
>  };
>
>  struct amdgpu_userq_funcs {
> --
> 2.43.2
>


* Re: [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask
  2024-04-26 13:48 ` [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask Shashank Sharma
@ 2024-05-01 21:27   ` Alex Deucher
  2024-05-02 15:19   ` Christian König
  1 sibling, 0 replies; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 21:27 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Christian König, Alex Deucher

[-- Attachment #1: Type: text/plain, Size: 4728 bytes --]

On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> The current MES GFX mask prevents the FW from enabling oversubscription.
> This patch does the following:
> - Fixes the mask values and adds a description for them.
> - Removes the central mask setup and makes it IP specific, as the mask
>   differs when the numbers of pipes and queues differ.
>
> V9: introduce this patch in the series
>
> Cc: Christian König <Christian.Koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
>  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++++++--
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++++++--
>  4 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> index a00cf4756ad0..b405fafc0b71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> @@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
>                 adev->mes.compute_hqd_mask[i] = 0xc;
>         }
>
> -       for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
> -               adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
> -
>         for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
>                 if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
>                     IP_VERSION(6, 0, 0))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index 4c8fc3117ef8..598556619337 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -110,7 +110,6 @@ struct amdgpu_mes {
>         uint32_t                        vmid_mask_gfxhub;
>         uint32_t                        vmid_mask_mmhub;
>         uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
> -       uint32_t                        gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
>         uint32_t                        sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
>         uint32_t                        aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
>         uint32_t                        sch_ctx_offs;
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> index 1e5ad1e08d2a..4d1121d1a1e7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> @@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes *mes)
>                 mes_set_hw_res_pkt.compute_hqd_mask[i] =
>                         mes->compute_hqd_mask[i];
>
> -       for (i = 0; i < MAX_GFX_PIPES; i++)
> -               mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
> +       /*
> +        * GFX pipe 0 queue 0 is being used by kernel
> +        * Set GFX pipe 0 queue 1 for MES scheduling
> +        * GFX pipe 1 can't be used for MES due to HW limitation.
> +        */
> +       mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
> +       mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
>
>         for (i = 0; i < MAX_SDMA_PIPES; i++)
>                 mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index 63f281a9984d..feb7fa2c304c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
>                 mes_set_hw_res_pkt.compute_hqd_mask[i] =
>                         mes->compute_hqd_mask[i];
>
> -       for (i = 0; i < MAX_GFX_PIPES; i++)
> -               mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
> +       /*
> +        * GFX pipe 0 queue 0 is being used by kernel
> +        * Set GFX pipe 0 queue 1 for MES scheduling
> +        * GFX pipe 1 can't be used for MES due to HW limitation.
> +        */
> +       mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
> +       mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;

FWIW, I think this should work on pipe1.  Might be worth playing with.
The attached patches should enable pipe1 for kernel queues similar to
gfx10.  Anyway, something for the future.
Patch is:
Acked-by: Alex Deucher <alexander.deucher@amd.com>

Alex

>
>         for (i = 0; i < MAX_SDMA_PIPES; i++)
>                 mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
> --
> 2.43.2
>

[-- Attachment #2: 0002-drm-amdgpu-gfx11-add-pipe1-hardware-support.patch --]
[-- Type: text/x-patch, Size: 1740 bytes --]

From 793806a205b48d1328635051eac49f9357885bce Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Thu, 11 Apr 2024 17:16:09 -0400
Subject: [PATCH 2/2] drm/amdgpu/gfx11: add pipe1 hardware support

Enable gfx pipe1 hardware support.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 81a35d0f0a58e..357829036662e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -51,6 +51,7 @@
 #include "mes_v11_0.h"
 
 #define GFX11_NUM_GFX_RINGS		1
+#define GFX11_NUM_GFX_RINGS_2		2
 #define GFX11_MEC_HPD_SIZE	2048
 
 #define RLCG_UCODE_LOADING_START_ADDRESS	0x00002000L
@@ -1342,7 +1343,7 @@ static int gfx_v11_0_sw_init(void *handle)
 	case IP_VERSION(11, 0, 2):
 	case IP_VERSION(11, 0, 3):
 		adev->gfx.me.num_me = 1;
-		adev->gfx.me.num_pipe_per_me = 1;
+		adev->gfx.me.num_pipe_per_me = 2;
 		adev->gfx.me.num_queue_per_pipe = 1;
 		adev->gfx.mec.num_mec = 2;
 		adev->gfx.mec.num_pipe_per_mec = 4;
@@ -4714,7 +4715,16 @@ static int gfx_v11_0_early_init(void *handle)
 
 	adev->gfx.funcs = &gfx_v11_0_gfx_funcs;
 
-	adev->gfx.num_gfx_rings = GFX11_NUM_GFX_RINGS;
+	switch (amdgpu_ip_version(adev, GC_HWIP, 0)) {
+	case IP_VERSION(11, 0, 0):
+	case IP_VERSION(11, 0, 2):
+	case IP_VERSION(11, 0, 3):
+		adev->gfx.num_gfx_rings = GFX11_NUM_GFX_RINGS_2;
+		break;
+	default:
+		adev->gfx.num_gfx_rings = GFX11_NUM_GFX_RINGS;
+		break;
+	}
 	adev->gfx.num_compute_rings = min(amdgpu_gfx_get_num_kcq(adev),
 					  AMDGPU_MAX_COMPUTE_RINGS);
 
-- 
2.44.0


[-- Attachment #3: 0001-drm-amdgpu-gfx11-select-HDP-ref-mask-according-to-gf.patch --]
[-- Type: text/x-patch, Size: 1007 bytes --]

From 19f0408347bf1b7b7d15280e1bf7230054fc61d5 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Thu, 11 Apr 2024 17:13:13 -0400
Subject: [PATCH 1/2] drm/amdgpu/gfx11: select HDP ref/mask according to gfx
 ring pipe

Use the correct ref/mask for different gfx ring pipes. Ported from
ZhenGuo's patch for gfx10.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index ad6431013c738..81a35d0f0a58e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5293,7 +5293,7 @@ static void gfx_v11_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 		}
 		reg_mem_engine = 0;
 	} else {
-		ref_and_mask = nbio_hf_reg->ref_and_mask_cp0;
+		ref_and_mask = nbio_hf_reg->ref_and_mask_cp0 << ring->pipe;
 		reg_mem_engine = 1; /* pfp */
 	}
 
-- 
2.44.0



* Re: [PATCH v9 02/14] drm/amdgpu: add usermode queue base code
  2024-04-26 13:47 ` [PATCH v9 02/14] drm/amdgpu: add usermode queue base code Shashank Sharma
@ 2024-05-01 21:29   ` Alex Deucher
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 21:29 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch adds skeleton code for amdgpu usermode queues.
> It contains:
> - A new file with the init functions for usermode queues.
> - A queue context manager in the driver private data.
>
> V1: Worked on design review comments from RFC patch series:
> (https://patchwork.freedesktop.org/series/112214/)
> - Alex: Keep a list of queues, instead of single queue per process.
> - Christian: Use the queue manager instead of global ptrs,
>            Don't keep the queue structure in amdgpu_ctx
>
> V2:
>  - Reformatted code, split the big patch into two
>
> V3:
> - Integration with doorbell manager
>
> V4:
> - Align the structure member names to the largest member's column
>   (Luben)
> - Added SPDX license (Luben)
>
> V5:
> - Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
> - Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).
>
> V6: Rebase
> V9: Rebase
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 ++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    | 61 +++++++++++++++++++
>  6 files changed, 113 insertions(+)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 4536c8ad0e11..05a2d1714070 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -260,6 +260,8 @@ amdgpu-y += \
>  # add amdkfd interfaces
>  amdgpu-y += amdgpu_amdkfd.o
>
> +# add usermode queue
> +amdgpu-y += amdgpu_userqueue.o
>
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index b3b84647207e..4ca14b02668b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -112,6 +112,7 @@
>  #include "amdgpu_xcp.h"
>  #include "amdgpu_seq64.h"
>  #include "amdgpu_reg_state.h"
> +#include "amdgpu_userqueue.h"
>
>  #define MAX_GPU_INSTANCE               64
>
> @@ -477,6 +478,7 @@ struct amdgpu_fpriv {
>         struct mutex            bo_list_lock;
>         struct idr              bo_list_handles;
>         struct amdgpu_ctx_mgr   ctx_mgr;
> +       struct amdgpu_userq_mgr userq_mgr;
>         /** GPU partition selection */
>         uint32_t                xcp_id;
>  };
> @@ -1039,6 +1041,7 @@ struct amdgpu_device {
>         bool                            enable_mes_kiq;
>         struct amdgpu_mes               mes;
>         struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
> +       const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
>
>         /* df */
>         struct amdgpu_df                df;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index e4277298cf1a..374970984a61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -50,6 +50,7 @@
>  #include "amdgpu_reset.h"
>  #include "amdgpu_sched.h"
>  #include "amdgpu_xgmi.h"
> +#include "amdgpu_userqueue.h"
>  #include "../amdxcp/amdgpu_xcp_drv.h"
>
>  /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index a2df3025a754..d78b06af834e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -44,6 +44,7 @@
>  #include "amdgpu_display.h"
>  #include "amdgpu_ras.h"
>  #include "amd_pcie.h"
> +#include "amdgpu_userqueue.h"
>
>  void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
>  {
> @@ -1388,6 +1389,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>
>         amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>
> +       r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
> +       if (r)
> +               DRM_WARN("Can't setup usermode queues, use legacy workload submission only\n");
> +
>         file_priv->driver_priv = fpriv;
>         goto out_suspend;
>
> @@ -1457,6 +1462,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>
>         amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>         amdgpu_vm_fini(adev, &fpriv->vm);
> +       amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>
>         if (pasid)
>                 amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> new file mode 100644
> index 000000000000..effc0c7c02cf
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2023 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu.h"
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> +{
> +       mutex_init(&userq_mgr->userq_mutex);
> +       idr_init_base(&userq_mgr->userq_idr, 1);
> +       userq_mgr->adev = adev;
> +
> +       return 0;
> +}
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
> +{
> +       idr_destroy(&userq_mgr->userq_idr);
> +       mutex_destroy(&userq_mgr->userq_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> new file mode 100644
> index 000000000000..93ebe4b61682
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright 2023 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef AMDGPU_USERQUEUE_H_
> +#define AMDGPU_USERQUEUE_H_
> +
> +#define AMDGPU_MAX_USERQ_COUNT 512
> +
> +struct amdgpu_mqd_prop;
> +
> +struct amdgpu_usermode_queue {
> +       int                     queue_type;
> +       uint64_t                doorbell_handle;
> +       uint64_t                doorbell_index;
> +       uint64_t                flags;
> +       struct amdgpu_mqd_prop  *userq_prop;
> +       struct amdgpu_userq_mgr *userq_mgr;
> +       struct amdgpu_vm        *vm;
> +};
> +
> +struct amdgpu_userq_funcs {
> +       int (*mqd_create)(struct amdgpu_userq_mgr *uq_mgr,
> +                         struct drm_amdgpu_userq_in *args,
> +                         struct amdgpu_usermode_queue *queue);
> +       void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
> +                           struct amdgpu_usermode_queue *uq);
> +};
> +
> +/* Usermode queues for gfx */
> +struct amdgpu_userq_mgr {
> +       struct idr                      userq_idr;
> +       struct mutex                    userq_mutex;
> +       struct amdgpu_device            *adev;
> +};
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> +
> +#endif
> --
> 2.43.2
>


* Re: [PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object
  2024-04-26 13:48 ` [PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
@ 2024-05-01 21:30   ` Alex Deucher
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 21:30 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> This patch introduces amdgpu_userqueue_object and its helper
> functions to create and destroy this object. The helper
> functions create/destroy a base amdgpu_bo, kmap/unmap it and
> save the respective GPU and CPU addresses in the encapsulating
> userqueue object.
>
> These helpers will be used to create/destroy userqueue MQD, WPTR
> and FW areas.
>
> V7:
> - Forked out this new patch from V11-gfx-userqueue patch to prevent
>   that patch from growing very big.
> - Using amdgpu_bo_create instead of amdgpu_bo_create_kernel in prep
>   for eviction fences (Christian)
>
> V9:
>  - Rebase
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 62 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    | 13 ++++
>  2 files changed, 75 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index df97b856f891..65cab0ad97a1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -32,6 +32,68 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>         return idr_find(&uq_mgr->userq_idr, qid);
>  }
>
> +int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
> +                                  struct amdgpu_userq_obj *userq_obj,
> +                                  int size)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct amdgpu_bo_param bp;
> +       int r;
> +
> +       memset(&bp, 0, sizeof(bp));
> +       bp.byte_align = PAGE_SIZE;
> +       bp.domain = AMDGPU_GEM_DOMAIN_GTT;
> +       bp.flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
> +                  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> +       bp.type = ttm_bo_type_kernel;
> +       bp.size = size;
> +       bp.resv = NULL;
> +       bp.bo_ptr_size = sizeof(struct amdgpu_bo);
> +
> +       r = amdgpu_bo_create(adev, &bp, &userq_obj->obj);
> +       if (r) {
> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> +               return r;
> +       }
> +
> +       r = amdgpu_bo_reserve(userq_obj->obj, true);
> +       if (r) {
> +               DRM_ERROR("Failed to reserve BO to map (%d)", r);
> +               goto free_obj;
> +       }
> +
> +       r = amdgpu_ttm_alloc_gart(&(userq_obj->obj)->tbo);
> +       if (r) {
> +               DRM_ERROR("Failed to alloc GART for userqueue object (%d)", r);
> +               goto unresv;
> +       }
> +
> +       r = amdgpu_bo_kmap(userq_obj->obj, &userq_obj->cpu_ptr);
> +       if (r) {
> +               DRM_ERROR("Failed to map BO for userqueue (%d)", r);
> +               goto unresv;
> +       }
> +
> +       userq_obj->gpu_addr = amdgpu_bo_gpu_offset(userq_obj->obj);
> +       amdgpu_bo_unreserve(userq_obj->obj);
> +       memset(userq_obj->cpu_ptr, 0, size);
> +       return 0;
> +
> +unresv:
> +       amdgpu_bo_unreserve(userq_obj->obj);
> +
> +free_obj:
> +       amdgpu_bo_unref(&userq_obj->obj);
> +       return r;
> +}
> +
> +void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
> +                                  struct amdgpu_userq_obj *userq_obj)
> +{
> +       amdgpu_bo_kunmap(userq_obj->obj);
> +       amdgpu_bo_unref(&userq_obj->obj);
> +}
> +
>  static int
>  amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>  {
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index b739274c72e1..bbd29f68b8d4 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -29,6 +29,12 @@
>
>  struct amdgpu_mqd_prop;
>
> +struct amdgpu_userq_obj {
> +       void             *cpu_ptr;
> +       uint64_t         gpu_addr;
> +       struct amdgpu_bo *obj;
> +};
> +
>  struct amdgpu_usermode_queue {
>         int                     queue_type;
>         uint64_t                doorbell_handle;
> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_mqd_prop  *userq_prop;
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_vm        *vm;
> +       struct amdgpu_userq_obj mqd;
>  };
>
>  struct amdgpu_userq_funcs {
> @@ -60,4 +67,10 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>
>  void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
>
> +int amdgpu_userqueue_create_object(struct amdgpu_userq_mgr *uq_mgr,
> +                                  struct amdgpu_userq_obj *userq_obj,
> +                                  int size);
> +
> +void amdgpu_userqueue_destroy_object(struct amdgpu_userq_mgr *uq_mgr,
> +                                    struct amdgpu_userq_obj *userq_obj);
>  #endif
> --
> 2.43.2
>


* Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-04-26 13:48 ` [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2024-05-01 21:36   ` Alex Deucher
  2024-05-02  5:31     ` Sharma, Shashank
  2024-05-02 15:18   ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-01 21:36 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> To support oversubscription, the MES FW expects WPTR BOs to
> be mapped into GART before they are submitted to usermode
> queues. This patch adds a function for that.
>
> V4: fix the wptr value before mapping lookup (Bas, Christian).
>
> V5: Addressed review comments from Christian:
>     - Either pin object or allocate from GART, but not both.
>     - All the handling must be done with the VM locks held.
>
> V7: Addressed review comments from Christian:
>     - Do not take vm->eviction_lock
>     - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>
> V8: Rebase
> V9: Changed the function names from gfx_v11* to mes_v11*
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>  2 files changed, 78 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index 8d2cd61af26b..37b80626e792 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -30,6 +30,74 @@
>  #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>  #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>
> +static int
> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
> +{
> +       int ret;
> +
> +       ret = amdgpu_bo_reserve(bo, true);
> +       if (ret) {
> +               DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> +               goto err_reserve_bo_failed;
> +       }
> +
> +       ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> +       if (ret) {
> +               DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> +               goto err_map_bo_gart_failed;
> +       }
> +
> +       amdgpu_bo_unreserve(bo);
> +       bo = amdgpu_bo_ref(bo);
> +
> +       return 0;
> +
> +err_map_bo_gart_failed:
> +       amdgpu_bo_unreserve(bo);
> +err_reserve_bo_failed:
> +       return ret;
> +}
> +
> +static int
> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
> +                             struct amdgpu_usermode_queue *queue,
> +                             uint64_t wptr)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct amdgpu_bo_va_mapping *wptr_mapping;
> +       struct amdgpu_vm *wptr_vm;
> +       struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
> +       int ret;
> +
> +       wptr_vm = queue->vm;
> +       ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> +       if (ret)
> +               return ret;
> +
> +       wptr &= AMDGPU_GMC_HOLE_MASK;
> +       wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
> +       amdgpu_bo_unreserve(wptr_vm->root.bo);
> +       if (!wptr_mapping) {
> +               DRM_ERROR("Failed to lookup wptr bo\n");
> +               return -EINVAL;
> +       }
> +
> +       wptr_obj->obj = wptr_mapping->bo_va->base.bo;
> +       if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
> +               DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
> +               return -EINVAL;
> +       }
> +
> +       ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
> +       if (ret) {
> +               DRM_ERROR("Failed to map wptr bo to GART\n");
> +               return ret;
> +       }
> +
> +       queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);

The wptr virtual address from the user may not be at offset 0 from the
start of the object.  We should add the offset to the base vmid0 GPU
address.
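
Something along these lines, perhaps (untested sketch; it assumes
amdgpu_bo_va_mapping's start is in pages and offset is the byte offset into
the BO, so the sum below yields the wptr's GART address rather than the BO
base):

        /* Account for the wptr's offset within its BO */
        queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj) +
                                   wptr_mapping->offset +
                                   (wptr - (wptr_mapping->start << PAGE_SHIFT));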

Alex

> +       return 0;
> +}
> +
>  static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>                                struct amdgpu_usermode_queue *queue,
>                                struct amdgpu_mqd_prop *userq_props)
> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>         queue_input.queue_size = userq_props->queue_size >> 2;
>         queue_input.doorbell_offset = userq_props->doorbell_index;
>         queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +       queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>
>         amdgpu_mes_lock(&adev->mes);
>         r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                 goto free_mqd;
>         }
>
> +       /* FW expects WPTR BOs to be mapped into GART */
> +       r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
> +       if (r) {
> +               DRM_ERROR("Failed to create WPTR mapping\n");
> +               goto free_ctx;
> +       }
> +
>         /* Map userqueue into FW using MES */
>         r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>         if (r) {
> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>                             struct amdgpu_usermode_queue *queue)
>  {
>         mes_v11_0_userq_unmap(uq_mgr, queue);
> +       amdgpu_bo_unref(&queue->wptr_obj.obj);
>         amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>         kfree(queue->userq_prop);
>         amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 643f31474bd8..ffe8a3d73756 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_vm        *vm;
>         struct amdgpu_userq_obj mqd;
>         struct amdgpu_userq_obj fw_obj;
> +       struct amdgpu_userq_obj wptr_obj;
>  };
>
>  struct amdgpu_userq_funcs {
> --
> 2.43.2
>


* Re: [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management
  2024-05-01 20:39   ` Alex Deucher
@ 2024-05-02  5:23     ` Sharma, Shashank
  2024-05-02 12:53       ` Sharma, Shashank
  0 siblings, 1 reply; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02  5:23 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

Hey Alex,

On 01/05/2024 22:39, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> This patch introduces a new UAPI/IOCTL for usermode graphics
>> queues. The userspace app will fill this structure and request
>> the graphics driver to add a graphics work queue for it. The
>> output of this UAPI is a queue id.
>>
>> This UAPI maps the queue into GPU, so the graphics app can start
>> submitting work to the queue as soon as the call returns.
>>
>> V2: Addressed review comments from Alex and Christian
>>      - Make the doorbell offset's comment clearer
>>      - Change the output parameter name to queue_id
>>
>> V3: Integration with doorbell manager
>>
>> V4:
>>      - Updated the UAPI doc (Pierre-Eric)
>>      - Created a Union for engine specific MQDs (Alex)
>>      - Added Christian's R-B
>> V5:
>>      - Add variables for GDS and CSA in MQD structure (Alex)
>>      - Make MQD data a ptr-size pair instead of union (Alex)
>>
>> V9:
>>     - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
>>       drm_amdgpu_userq_mqd as its being used for SDMA and
>>       compute queues as well
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   include/uapi/drm/amdgpu_drm.h | 110 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 110 insertions(+)
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index 96e32dafd4f0..22f56a30f7cb 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -54,6 +54,7 @@ extern "C" {
>>   #define DRM_AMDGPU_VM                  0x13
>>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>   #define DRM_AMDGPU_SCHED               0x15
>> +#define DRM_AMDGPU_USERQ               0x16
>>
>>   #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>> @@ -71,6 +72,7 @@ extern "C" {
>>   #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>   #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>
>>   /**
>>    * DOC: memory domains
>> @@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
>>          union drm_amdgpu_ctx_out out;
>>   };
>>
>> +/* user queue IOCTL */
>> +#define AMDGPU_USERQ_OP_CREATE 1
>> +#define AMDGPU_USERQ_OP_FREE   2
>> +
>> +/* Flag to indicate secure buffer related workload, unused for now */
>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>> +/* Flag to indicate AQL workload, unused for now */
>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>> +
>> +/*
>> + * MQD (memory queue descriptor) is a set of parameters which allow
>> + * the GPU to uniquely define and identify a usermode queue. This
>> + * structure defines the MQD for GFX-V11 IP ver 0.
>> + */
>> +struct drm_amdgpu_userq_mqd {
> Maybe rename this to drm_amdgpu_gfx_userq_mqd since it's gfx specific.
> Then we can add different MQDs for SDMA, compute, etc. as they have
> different metadata.  E.g., the shadow and CSA are gfx only.


Actually this was named drm_amdgpu_userq_mqd_gfx_v11_0 until the last
patchset, but then I realized that apart from the objects (gds/shadow
va) nothing is gfx specific; it's actually required for every MES-based
userqueue IP. So I thought it would be overkill to create multiple
structures for almost the same data. If you feel strongly about this,
I can change it again.
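
For reference, a hypothetical split along the lines suggested above could
keep the common fields in one structure and move the gfx-only buffers out
(names are illustrative only, not part of this series):

/* Illustrative layout only */
struct drm_amdgpu_userq_mqd {            /* common to all MES-based IPs */
        __u64 queue_va;
        __u64 queue_size;
        __u64 rptr_va;
        __u64 wptr_va;
};

struct drm_amdgpu_userq_mqd_gfx {        /* gfx-specific metadata */
        struct drm_amdgpu_userq_mqd base;
        __u64 shadow_va;                 /* shadow buffer, gfx only */
        __u64 gds_va;                    /* GDS backup, gfx only */
        __u64 csa_va;                    /* FW work area (CSA), gfx only */
};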

- Shashank

> Alex
>
>
>> +       /**
>> +        * @queue_va: Virtual address of the GPU memory which holds the queue
>> +        * object. The queue holds the workload packets.
>> +        */
>> +       __u64   queue_va;
>> +       /**
>> +        * @queue_size: Size of the queue in bytes, this needs to be 256-byte
>> +        * aligned.
>> +        */
>> +       __u64   queue_size;
>> +       /**
>> +        * @rptr_va : Virtual address of the GPU memory which holds the ring RPTR.
>> +        * This object must be at least 8 byte in size and aligned to 8-byte offset.
>> +        */
>> +       __u64   rptr_va;
>> +       /**
>> +        * @wptr_va : Virtual address of the GPU memory which holds the ring WPTR.
>> +        * This object must be at least 8 byte in size and aligned to 8-byte offset.
>> +        *
>> +        * Queue, RPTR and WPTR can come from the same object, as long as the size
>> +        * and alignment related requirements are met.
>> +        */
>> +       __u64   wptr_va;
>> +       /**
>> +        * @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
>> +        * This must be a from a separate GPU object, and must be at least 4-page
>> +        * sized.
>> +        */
>> +       __u64   shadow_va;
>> +       /**
>> +        * @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
>> +        * This must be a from a separate GPU object, and must be at least 1-page
>> +        * sized.
>> +        */
>> +       __u64   gds_va;
>> +       /**
>> +        * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
>> +        * This must be a from a separate GPU object, and must be at least 1-page
>> +        * sized.
>> +        */
>> +       __u64   csa_va;
>> +};
>> +
>> +struct drm_amdgpu_userq_in {
>> +       /** AMDGPU_USERQ_OP_* */
>> +       __u32   op;
>> +       /** Queue handle for USERQ_OP_FREE */
>> +       __u32   queue_id;
>> +       /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
>> +       __u32   ip_type;
>> +       /**
>> +        * @flags: flags to indicate special function for queue like secure
>> +        * buffer (TMZ). Unused for now.
>> +        */
>> +       __u32   flags;
>> +       /**
>> +        * @doorbell_handle: the handle of doorbell GEM object
>> +        * associated to this client.
>> +        */
>> +       __u32   doorbell_handle;
>> +       /**
>> +        * @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
>> +        * Kernel will generate absolute doorbell offset using doorbell_handle
>> +        * and doorbell_offset in the doorbell bo.
>> +        */
>> +       __u32   doorbell_offset;
>> +       /**
>> +        * @mqd: Queue descriptor for USERQ_OP_CREATE
>> +        * MQD data can be of different size for different GPU IP/engine and
>> +        * their respective versions/revisions, so this points to a __u64 *
>> +        * which holds MQD of this usermode queue.
>> +        */
>> +       __u64 mqd;
>> +       /**
>> +        * @size: size of MQD data in bytes, it must match the MQD structure
>> +        * size of the respective engine/revision defined in UAPI for ex, for
>> +        * gfx_v11 workloads, size = sizeof(drm_amdgpu_userq_mqd_gfx_v11).
>> +        */
>> +       __u64 mqd_size;
>> +};
>> +
>> +struct drm_amdgpu_userq_out {
>> +       /** Queue handle */
>> +       __u32   queue_id;
>> +       /** Flags */
>> +       __u32   flags;
>> +};
>> +
>> +union drm_amdgpu_userq {
>> +       struct drm_amdgpu_userq_in in;
>> +       struct drm_amdgpu_userq_out out;
>> +};
>> +
>>   /* vm ioctl */
>>   #define AMDGPU_VM_OP_RESERVE_VMID      1
>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>> --
>> 2.43.2
>>


* Re: [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX
  2024-05-01 20:50   ` Alex Deucher
@ 2024-05-02  5:26     ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02  5:26 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig


On 01/05/2024 22:50, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 9:48 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> A memory queue descriptor (MQD) of a userqueue defines it in
>> the hw's context. As the MQD format can vary between different
>> graphics IPs, we need GFX generation specific handlers to create MQDs.
>>
>> This patch:
>> - Adds a new file which will be used for MES based userqueue
>>    functions targeting GFX and SDMA IP.
>> - Introduces MQD handler functions for the usermode queues.
>> - Adds new functions to create and destroy userqueue MQD for
>>    MES-V11 for GFX IP.
>>
>> V1: Worked on review comments from Alex:
>>      - Make MQD functions GEN and IP specific
>>
>> V2: Worked on review comments from Alex:
>>      - Reuse the existing adev->mqd[ip] for MQD creation
>>      - Formatting and arrangement of code
>>
>> V3:
>>      - Integration with doorbell manager
>>
>> V4: Review comments addressed:
>>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>>      - Align name of structure members (Luben)
>>      - Don't break up the Cc tag list and the Sob tag list in commit
>>        message (Luben)
>> V5:
>>     - No need to reserve the bo for MQD (Christian).
>>     - Some more changes to support IP specific MQD creation.
>>
>> V6:
>>     - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
>>       calls while creating MQD object to amdgpu_bo_create() once eviction
>>       fences are ready (Christian).
>>
>> V7:
>>     - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
>>     - Use memdup_user instead of copy_from_user (Christian)
>>
>> V9:
>>     - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
>>       that it can be reused for SDMA userqueues as well (Shashank, Alex)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +-
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   4 +
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++++++++++++++++++
>>   3 files changed, 116 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 05a2d1714070..a640bfa468ad 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -184,7 +184,8 @@ amdgpu-y += \
>>   amdgpu-y += \
>>          amdgpu_mes.o \
>>          mes_v10_1.o \
>> -       mes_v11_0.o
>> +       mes_v11_0.o \
>> +       mes_v11_0_userqueue.o
>>
>>   # add UVD block
>>   amdgpu-y += \
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index f7325b02a191..525bd0f4d3f7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct amdgpu_device *adev)
>>          return 0;
>>   }
>>
>> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
>> +
>>   static int gfx_v11_0_sw_init(void *handle)
>>   {
>>          int i, j, k, r, ring_id = 0;
>> @@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>                  adev->gfx.mec.num_mec = 2;
>>                  adev->gfx.mec.num_pipe_per_mec = 4;
>>                  adev->gfx.mec.num_queue_per_pipe = 4;
>> +               adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>                  break;
>>          case IP_VERSION(11, 0, 1):
>>          case IP_VERSION(11, 0, 4):
>> @@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>                  adev->gfx.mec.num_mec = 1;
>>                  adev->gfx.mec.num_pipe_per_mec = 4;
>>                  adev->gfx.mec.num_queue_per_pipe = 4;
>> +               adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
> Does this work on APUs yet?  If not, we should limit it to just dGPUs
> for now.

I think if we get an APU which is GFX v11 based, it should work on it as well.

>   Also, we should add minimum firmware version checks for user
> queue support.

Noted.

- Shashank

>
>>                  break;
>>          default:
>>                  adev->gfx.me.num_me = 1;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> new file mode 100644
>> index 000000000000..9e7dee77d344
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -0,0 +1,110 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright 2024 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +#include "amdgpu.h"
>> +#include "amdgpu_gfx.h"
>> +#include "v11_structs.h"
>> +#include "mes_v11_0.h"
>> +#include "amdgpu_userqueue.h"
>> +
>> +static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>> +                                     struct drm_amdgpu_userq_in *args_in,
>> +                                     struct amdgpu_usermode_queue *queue)
>> +{
>> +       struct amdgpu_device *adev = uq_mgr->adev;
>> +       struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
>> +       struct drm_amdgpu_userq_mqd *mqd_user;
>> +       struct amdgpu_mqd_prop *userq_props;
>> +       int r;
>> +
>> +       /* Incoming MQD parameters from userspace to be saved here */
>> +       memset(&mqd_user, 0, sizeof(mqd_user));
>> +
>> +       /* Structure to initialize MQD for userqueue using generic MQD init function */
>> +       userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
>> +       if (!userq_props) {
>> +               DRM_ERROR("Failed to allocate memory for userq_props\n");
>> +               return -ENOMEM;
>> +       }
>> +
>> +       if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd)) {
>> +               DRM_ERROR("MQD size mismatch\n");
>> +               r = -EINVAL;
>> +               goto free_props;
>> +       }
>> +
>> +       mqd_user = memdup_user(u64_to_user_ptr(args_in->mqd), args_in->mqd_size);
>> +       if (IS_ERR(mqd_user)) {
>> +               DRM_ERROR("Failed to read user MQD\n");
>> +               r = -EFAULT;
>> +               goto free_props;
>> +       }
>> +
>> +       r = amdgpu_userqueue_create_object(uq_mgr, &queue->mqd, mqd_hw_default->mqd_size);
>> +       if (r) {
>> +               DRM_ERROR("Failed to create MQD object for userqueue\n");
>> +               goto free_mqd_user;
>> +       }
>> +
>> +       /* Initialize the MQD BO with user given values */
>> +       userq_props->wptr_gpu_addr = mqd_user->wptr_va;
>> +       userq_props->rptr_gpu_addr = mqd_user->rptr_va;
>> +       userq_props->queue_size = mqd_user->queue_size;
>> +       userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
>> +       userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
> We should validate the user virtual addresses and make sure they are
> non-0 and not part of the reserved areas of the address space.
>
> Alex
>
>> +       userq_props->use_doorbell = true;
>> +
>> +       queue->userq_prop = userq_props;
>> +
>> +       r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
>> +       if (r) {
>> +               DRM_ERROR("Failed to initialize MQD for userqueue\n");
>> +               goto free_mqd;
>> +       }
>> +
>> +       return 0;
>> +
>> +free_mqd:
>> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> +
>> +free_mqd_user:
>> +       kfree(mqd_user);
>> +
>> +free_props:
>> +       kfree(userq_props);
>> +
>> +       return r;
>> +}
>> +
>> +static void
>> +mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>> +                           struct amdgpu_usermode_queue *queue)
>> +{
>> +       kfree(queue->userq_prop);
>> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> +}
>> +
>> +const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
>> +       .mqd_create = mes_v11_0_userq_mqd_create,
>> +       .mqd_destroy = mes_v11_0_userq_mqd_destroy,
>> +};
>> --
>> 2.43.2
>>
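
A minimal sketch of the validation suggested above, kept deliberately
simple: mes_v11_0_userq_va_valid() is a hypothetical helper, and the
reserved-range check is only indicated in a comment, since the real
bounds depend on the ASIC's VM address-space layout.

/* Hypothetical helper, not part of this series. */
static bool mes_v11_0_userq_va_valid(u64 va)
{
	/* a NULL VA is never a valid queue/rptr/wptr address */
	if (!va)
		return false;

	/* the rptr/wptr objects are documented as 8-byte aligned */
	if (va & 0x7)
		return false;

	/*
	 * A full implementation would also compare the VA against the
	 * VM address-space limits and the kernel-reserved ranges.
	 */
	return true;
}

	/* e.g. before filling userq_props from mqd_user: */
	if (!mes_v11_0_userq_va_valid(mqd_user->queue_va) ||
	    !mes_v11_0_userq_va_valid(mqd_user->rptr_va) ||
	    !mes_v11_0_userq_va_valid(mqd_user->wptr_va)) {
		r = -EINVAL;
		goto free_mqd;
	}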

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue
  2024-05-01 21:11   ` Alex Deucher
@ 2024-05-02  5:27     ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02  5:27 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig


On 01/05/2024 23:11, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> The FW expects us to allocate at least one page of context
>> space for gang, process, GDS and FW related work.
>> This patch creates a joint object for the same, and calculates
>> GPU space offsets of these spaces.
>>
>> V1: Addressed review comments on RFC patch:
>>      Alex: Make this function IP specific
>>
>> V2: Addressed review comments from Christian
>>      - Allocate only one object for total FW space, and calculate
>>        offsets for each of these objects.
>>
>> V3: Integration with doorbell manager
>>
>> V4: Review comments:
>>      - Remove shadow from FW space list from cover letter (Alex)
>>      - Alignment of macro (Luben)
>>
>> V5: Merged patches 5 and 6 into this single patch
>>      Addressed review comments:
>>      - Use lower_32_bits instead of mask (Christian)
>>      - gfx_v11_0 instead of gfx_v11 in function names (Alex)
>>      - Shadow and GDS objects are now coming from userspace (Christian,
>>        Alex)
>>
>> V6:
>>      - Add a comment to replace amdgpu_bo_create_kernel() with
>>        amdgpu_bo_create() during fw_ctx object creation (Christian).
>>      - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
>>        of generic queue structure and make it gen11 specific (Alex).
>>
>> V7:
>>     - Using helper function to create/destroy userqueue objects.
>>     - Removed FW object space allocation.
>>
>> V8:
>>     - Updating FW object address from user values.
>>
>> V9:
>>     - Updated function names from gfx_v11_* to mes_v11_*
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 43 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   2 files changed, 44 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index 9e7dee77d344..9f9fdcb9c294 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -27,6 +27,41 @@
>>   #include "mes_v11_0.h"
>>   #include "amdgpu_userqueue.h"
>>
>> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>> +
>> +static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                           struct amdgpu_usermode_queue *queue,
>> +                                           struct drm_amdgpu_userq_mqd *mqd_user)
>> +{
>> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
>> +       struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
>> +       int r, size;
>> +
>> +       /*
>> +        * The FW expects at least one page space allocated for
>> +        * process ctx and gang ctx each. Create an object
>> +        * for the same.
>> +        */
>> +       size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
>> +       r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
> Is this per queue or per context?  I.e., is this shared with all
> queues associated with an fd?

This is a per-queue object, required for MES mapping of a queue.

- Shashank

> Alex
>
>> +       if (r) {
>> +               DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
>> +               return r;
>> +       }
>> +
>> +       /* Shadow and GDS objects come directly from userspace */
>> +       mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
>> +       mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
>> +
>> +       mqd->gds_bkup_base_lo = mqd_user->gds_va & 0xFFFFFFFC;
>> +       mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
>> +
>> +       mqd->fw_work_area_base_lo = mqd_user->csa_va & 0xFFFFFFFC;
>> +       mqd->fw_work_area_base_hi = upper_32_bits(mqd_user->csa_va);
>> +       return 0;
>> +}
>> +
>>   static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>                                        struct drm_amdgpu_userq_in *args_in,
>>                                        struct amdgpu_usermode_queue *queue)
>> @@ -82,6 +117,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>                  goto free_mqd;
>>          }
>>
>> +       /* Create BO for FW operations */
>> +       r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
>> +       if (r) {
>> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>> +               goto free_mqd;
>> +       }
>> +
>>          return 0;
>>
>>   free_mqd:
>> @@ -100,6 +142,7 @@ static void
>>   mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>>                              struct amdgpu_usermode_queue *queue)
>>   {
>> +       amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>>          kfree(queue->userq_prop);
>>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>>   }
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index bbd29f68b8d4..643f31474bd8 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>>          struct amdgpu_userq_mgr *userq_mgr;
>>          struct amdgpu_vm        *vm;
>>          struct amdgpu_userq_obj mqd;
>> +       struct amdgpu_userq_obj fw_obj;
>>   };
>>
>>   struct amdgpu_userq_funcs {
>> --
>> 2.43.2
>>
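
For reference, a rough sketch of how the process/gang context addresses
could be derived from the single joint object created above; the output
names follow the V6 changelog and are assumptions here, not code from
the series.

/* The joint BO is AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ bytes. */
static void example_userq_ctx_addrs(struct amdgpu_userq_obj *ctx,
				    u64 *proc_ctx_gpu_addr,
				    u64 *gang_ctx_gpu_addr)
{
	/* the process context lives at the start of the joint object ... */
	*proc_ctx_gpu_addr = ctx->gpu_addr;
	/* ... and the gang context in the page that follows it */
	*gang_ctx_gpu_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
}

These addresses would then be handed to MES when the queue is added to
the HW.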

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-05-01 21:36   ` Alex Deucher
@ 2024-05-02  5:31     ` Sharma, Shashank
  2024-05-02 13:06       ` Kasiviswanathan, Harish
  2024-05-02 13:46       ` Alex Deucher
  0 siblings, 2 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02  5:31 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig


On 01/05/2024 23:36, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> To support oversubscription, MES FW expects WPTR BOs to
>> be mapped into GART, before they are submitted to usermode
>> queues. This patch adds a function for the same.
>>
>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>>
>> V5: Addressed review comments from Christian:
>>      - Either pin object or allocate from GART, but not both.
>>      - All the handling must be done with the VM locks held.
>>
>> V7: Addressed review comments from Christian:
>>      - Do not take vm->eviction_lock
>>      - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>>
>> V8: Rebase
>> V9: Changed the function names from gfx_v11* to mes_v11*
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   2 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index 8d2cd61af26b..37b80626e792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -30,6 +30,74 @@
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>
>> +static int
>> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
>> +{
>> +       int ret;
>> +
>> +       ret = amdgpu_bo_reserve(bo, true);
>> +       if (ret) {
>> +               DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>> +               goto err_reserve_bo_failed;
>> +       }
>> +
>> +       ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>> +       if (ret) {
>> +               DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>> +               goto err_map_bo_gart_failed;
>> +       }
>> +
>> +       amdgpu_bo_unreserve(bo);
>> +       bo = amdgpu_bo_ref(bo);
>> +
>> +       return 0;
>> +
>> +err_map_bo_gart_failed:
>> +       amdgpu_bo_unreserve(bo);
>> +err_reserve_bo_failed:
>> +       return ret;
>> +}
>> +
>> +static int
>> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
>> +                             struct amdgpu_usermode_queue *queue,
>> +                             uint64_t wptr)
>> +{
>> +       struct amdgpu_device *adev = uq_mgr->adev;
>> +       struct amdgpu_bo_va_mapping *wptr_mapping;
>> +       struct amdgpu_vm *wptr_vm;
>> +       struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
>> +       int ret;
>> +
>> +       wptr_vm = queue->vm;
>> +       ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>> +       if (ret)
>> +               return ret;
>> +
>> +       wptr &= AMDGPU_GMC_HOLE_MASK;
>> +       wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
>> +       amdgpu_bo_unreserve(wptr_vm->root.bo);
>> +       if (!wptr_mapping) {
>> +               DRM_ERROR("Failed to lookup wptr bo\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       wptr_obj->obj = wptr_mapping->bo_va->base.bo;
>> +       if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
>> +               DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
>> +       if (ret) {
>> +               DRM_ERROR("Failed to map wptr bo to GART\n");
>> +               return ret;
>> +       }
>> +
>> +       queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> The wptr virtual address from the user may not be at offset 0 from the
> start of the object.  We should add the offset to the base vmid0 GPU
> address.

Can you please elaborate a bit here? wptr_obj->obj is already mapped into
GART, do we still need this?

- Shashank

>
> Alex
>
>> +       return 0;
>> +}
>> +
>>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>                                 struct amdgpu_usermode_queue *queue,
>>                                 struct amdgpu_mqd_prop *userq_props)
>> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>          queue_input.queue_size = userq_props->queue_size >> 2;
>>          queue_input.doorbell_offset = userq_props->doorbell_index;
>>          queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +       queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>>
>>          amdgpu_mes_lock(&adev->mes);
>>          r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>                  goto free_mqd;
>>          }
>>
>> +       /* FW expects WPTR BOs to be mapped into GART */
>> +       r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
>> +       if (r) {
>> +               DRM_ERROR("Failed to create WPTR mapping\n");
>> +               goto free_ctx;
>> +       }
>> +
>>          /* Map userqueue into FW using MES */
>>          r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>>          if (r) {
>> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>>                              struct amdgpu_usermode_queue *queue)
>>   {
>>          mes_v11_0_userq_unmap(uq_mgr, queue);
>> +       amdgpu_bo_unref(&queue->wptr_obj.obj);
>>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>>          kfree(queue->userq_prop);
>>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 643f31474bd8..ffe8a3d73756 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>>          struct amdgpu_vm        *vm;
>>          struct amdgpu_userq_obj mqd;
>>          struct amdgpu_userq_obj fw_obj;
>> +       struct amdgpu_userq_obj wptr_obj;
>>   };
>>
>>   struct amdgpu_userq_funcs {
>> --
>> 2.43.2
>>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues
  2024-05-01 20:41   ` Alex Deucher
@ 2024-05-02  5:47     ` Sharma, Shashank
  2024-05-02 13:55       ` Alex Deucher
  0 siblings, 1 reply; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02  5:47 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Arvind Yadav, Christian König, Alex Deucher,
	Srinivasan Shanmugam


On 01/05/2024 22:41, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> This patch makes the necessary modifications to enable SDMA
>> usermode queues using the existing userqueue infrastructure.
>>
>> V9: introduced this patch in the series
>>
>> Cc: Christian König <Christian.Koenig@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    | 2 +-
>>   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 ++++
>>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c           | 3 +++
>>   3 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 781283753804..e516487e8db9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>>          int qid, r = 0;
>>
>>          /* Usermode queues are only supported for GFX/SDMA engines as of now */
>> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
>> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
>>                  DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
>>                  return -EINVAL;
>>          }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index a6c3037d2d1f..a5e270eda37b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>                  return r;
>>          }
>>
>> +       /* We don't need to set other FW objects for SDMA queues */
>> +       if (queue->queue_type == AMDGPU_HW_IP_DMA)
>> +               return 0;
>> +
>>          /* Shadow and GDS objects come directly from userspace */
>>          mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
>>          mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> index 361835a61f2e..90354a70c807 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> @@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
>>          return 0;
>>   }
>>
>> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
> Can you include the header rather than adding an extern?
Noted,
>
>> +
>>   static int sdma_v6_0_sw_init(void *handle)
>>   {
>>          struct amdgpu_ring *ring;
>> @@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
>>                  return -EINVAL;
>>          }
>>
>> +       adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
>>          return r;
>>   }
> I think we need a new mqd descriptor in amdgpu_drm.h as well since the
> sdma metadata is different from gfx and compute.

Can you please elaborate on this? AFAIK SDMA queues don't need any
specific metadata objects (unlike GFX).

- Shashank

> Alex
>
>> --
>> 2.43.2
>>
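
A small sketch of the header-based alternative asked for above; placing
the declaration in mes_v11_0.h is an assumption, any shared header the
SDMA code already pulls in would do.

/* drivers/gpu/drm/amd/amdgpu/mes_v11_0.h (sketch) */
extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;

/* drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c (sketch) */
#include "mes_v11_0.h"	/* replaces the file-local extern above */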

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue
  2024-05-01 20:44   ` Alex Deucher
@ 2024-05-02  5:50     ` Sharma, Shashank
  2024-05-02 14:10       ` Alex Deucher
  0 siblings, 1 reply; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02  5:50 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig


On 01/05/2024 22:44, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Arvind Yadav <arvind.yadav@amd.com>
>>
>> This patch makes the changes required to
>> enable compute workload support using the existing
>> usermode queue infrastructure.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    |  3 ++-
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c           |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +++++++++-
>>   include/uapi/drm/amdgpu_drm.h                    |  1 +
>>   4 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index e516487e8db9..78d34fa7a0b9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>>          int qid, r = 0;
>>
>>          /* Usermode queues are only supported for GFX/SDMA engines as of now */
>> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
>> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
>> +                       && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
>>                  DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
>>                  return -EINVAL;
>>          }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index 525bd0f4d3f7..27b86f7fe949 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>                  adev->gfx.mec.num_pipe_per_mec = 4;
>>                  adev->gfx.mec.num_queue_per_pipe = 4;
>>                  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
>>                  break;
>>          case IP_VERSION(11, 0, 1):
>>          case IP_VERSION(11, 0, 4):
>> @@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>                  adev->gfx.mec.num_pipe_per_mec = 4;
>>                  adev->gfx.mec.num_queue_per_pipe = 4;
>>                  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
>>                  break;
>>          default:
>>                  adev->gfx.me.num_me = 1;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index a5e270eda37b..d61d80f86003 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>          }
>>
>>          /* We don't need to set other FW objects for SDMA queues */
>> -       if (queue->queue_type == AMDGPU_HW_IP_DMA)
>> +       if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
>> +           (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
>>                  return 0;
>>
>>          /* Shadow and GDS objects come directly from userspace */
>> @@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>          userq_props->use_doorbell = true;
>>          userq_props->doorbell_index = queue->doorbell_index;
>>
>> +       if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
>> +               userq_props->eop_gpu_addr = mqd_user->eop_va;
>> +               userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
>> +               userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
>> +               userq_props->hqd_active = false;
>> +       }
>> +
>>          queue->userq_prop = userq_props;
>>
>>          r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index 22f56a30f7cb..676792ad3618 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
>>           * sized.
>>           */
>>          __u64   csa_va;
>> +       __u64   eop_va;
>>   };
> Let's add a new mqd descriptor for compute since it's different from
> gfx and sdma.
The only difference is this object (vs the CSA and GDS objects); apart
from that, the MQD is the same, as they are all MES based. Am I missing
something here?
>
> Also, can we handle the eop buffer as part of the 
> kernel metadata for compute user queues rather than having the user
> specify it?

Sure, we can do it.

- Shashank

>
> Alex
>
>>   struct drm_amdgpu_userq_in {
>> --
>> 2.43.2
>>
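
A hedged sketch of the direction suggested above: a separate compute MQD
descriptor in the UAPI plus a kernel-allocated EOP buffer. The struct
name, the eop_obj member and the one-page size are assumptions for
illustration only, not the final interface.

/* hypothetical per-IP descriptor; empty once the EOP buffer moves in-kernel */
struct drm_amdgpu_userq_mqd_compute_11 {
	__u64	reserved;	/* placeholder for future compute metadata */
};

	/* kernel side, in the compute branch of mqd_create (sketch);
	 * queue->eop_obj would be a new amdgpu_userq_obj member
	 */
	r = amdgpu_userqueue_create_object(uq_mgr, &queue->eop_obj, PAGE_SIZE);
	if (r)
		goto free_mqd;
	userq_props->eop_gpu_addr = queue->eop_obj.gpu_addr;
	userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
	userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
	userq_props->hqd_active = false;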

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management
  2024-05-02  5:23     ` Sharma, Shashank
@ 2024-05-02 12:53       ` Sharma, Shashank
  2024-05-02 13:52         ` Alex Deucher
  0 siblings, 1 reply; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 12:53 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig


On 02/05/2024 07:23, Sharma, Shashank wrote:
> Hey Alex,
>
> On 01/05/2024 22:39, Alex Deucher wrote:
>> On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
>> <shashank.sharma@amd.com> wrote:
>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>
>>> This patch introduces a new UAPI/IOCTL for usermode graphics
>>> queues. The userspace app will fill this structure and request
>>> the graphics driver to add a graphics work queue for it. The
>>> output of this UAPI is a queue id.
>>>
>>> This UAPI maps the queue into GPU, so the graphics app can start
>>> submitting work to the queue as soon as the call returns.
>>>
>>> V2: Addressed review comments from Alex and Christian
>>>      - Make the doorbell offset's comment clearer
>>>      - Change the output parameter name to queue_id
>>>
>>> V3: Integration with doorbell manager
>>>
>>> V4:
>>>      - Updated the UAPI doc (Pierre-Eric)
>>>      - Created a Union for engine specific MQDs (Alex)
>>>      - Added Christian's R-B
>>> V5:
>>>      - Add variables for GDS and CSA in MQD structure (Alex)
>>>      - Make MQD data a ptr-size pair instead of union (Alex)
>>>
>>> V9:
>>>     - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
>>>       drm_amdgpu_userq_mqd as its being used for SDMA and
>>>       compute queues as well
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> ---
>>>   include/uapi/drm/amdgpu_drm.h | 110 
>>> ++++++++++++++++++++++++++++++++++
>>>   1 file changed, 110 insertions(+)
>>>
>>> diff --git a/include/uapi/drm/amdgpu_drm.h 
>>> b/include/uapi/drm/amdgpu_drm.h
>>> index 96e32dafd4f0..22f56a30f7cb 100644
>>> --- a/include/uapi/drm/amdgpu_drm.h
>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>> @@ -54,6 +54,7 @@ extern "C" {
>>>   #define DRM_AMDGPU_VM                  0x13
>>>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>   #define DRM_AMDGPU_SCHED               0x15
>>> +#define DRM_AMDGPU_USERQ               0x16
>>>
>>>   #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>> @@ -71,6 +72,7 @@ extern "C" {
>>>   #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE 
>>> + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>>   #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE + 
>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>
>>>   /**
>>>    * DOC: memory domains
>>> @@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
>>>          union drm_amdgpu_ctx_out out;
>>>   };
>>>
>>> +/* user queue IOCTL */
>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>> +#define AMDGPU_USERQ_OP_FREE   2
>>> +
>>> +/* Flag to indicate secure buffer related workload, unused for now */
>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>> +/* Flag to indicate AQL workload, unused for now */
>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>> +
>>> +/*
>>> + * MQD (memory queue descriptor) is a set of parameters which allow
>>> + * the GPU to uniquely define and identify a usermode queue. This
>>> + * structure defines the MQD for GFX-V11 IP ver 0.
>>> + */
>>> +struct drm_amdgpu_userq_mqd {
>> Maybe rename this to drm_amdgpu_gfx_userq_mqd since it's gfx specific.
>> Then we can add different MQDs for SDMA, compute, etc. as they have
>> different metadata.  E.g., the shadow and CSA are gfx only.
>
>
> Actually this was named drm_amdgpu_userq_mqd_gfx_v11_0 until the last 
> patchset, but then I realized that apart from the objects (gds/shadow 
> va) nothing is gfx specific, its actually required for every userqueue 
> IP which is MES based, so I thought it would be an overkill to create 
> multiple structures for almost the same data. If you feel strong about 
> this, I can change it again.
>
> - Shashank


Please ignore my last comment; I understand what you are mentioning, and
I have reformatted the patches accordingly. Now I am keeping everything
required for MES in one basic structure (drm_amdgpu_userq_in) and creating
drm_amdgpu_userq_mqd_gfx_v11 for GFX-specific things (like the CSA, shadow
and GDS areas). There will now be a separate patch which enables the GFX
IP on the MES code, just like we have separate patches for the SDMA and
Compute IPs in this series. I will send the V10 patches with this
reformatting in some time.

- Shashank

>
>> Alex
>>
>>
>>> +       /**
>>> +        * @queue_va: Virtual address of the GPU memory which holds 
>>> the queue
>>> +        * object. The queue holds the workload packets.
>>> +        */
>>> +       __u64   queue_va;
>>> +       /**
>>> +        * @queue_size: Size of the queue in bytes, this needs to be 
>>> 256-byte
>>> +        * aligned.
>>> +        */
>>> +       __u64   queue_size;
>>> +       /**
>>> +        * @rptr_va : Virtual address of the GPU memory which holds 
>>> the ring RPTR.
>>> +        * This object must be at least 8 byte in size and aligned 
>>> to 8-byte offset.
>>> +        */
>>> +       __u64   rptr_va;
>>> +       /**
>>> +        * @wptr_va : Virtual address of the GPU memory which holds 
>>> the ring WPTR.
>>> +        * This object must be at least 8 byte in size and aligned 
>>> to 8-byte offset.
>>> +        *
>>> +        * Queue, RPTR and WPTR can come from the same object, as 
>>> long as the size
>>> +        * and alignment related requirements are met.
>>> +        */
>>> +       __u64   wptr_va;
>>> +       /**
>>> +        * @shadow_va: Virtual address of the GPU memory to hold the 
>>> shadow buffer.
>>> +        * This must be a from a separate GPU object, and must be at 
>>> least 4-page
>>> +        * sized.
>>> +        */
>>> +       __u64   shadow_va;
>>> +       /**
>>> +        * @gds_va: Virtual address of the GPU memory to hold the 
>>> GDS buffer.
>>> +        * This must be a from a separate GPU object, and must be at 
>>> least 1-page
>>> +        * sized.
>>> +        */
>>> +       __u64   gds_va;
>>> +       /**
>>> +        * @csa_va: Virtual address of the GPU memory to hold the 
>>> CSA buffer.
>>> +        * This must be a from a separate GPU object, and must be at 
>>> least 1-page
>>> +        * sized.
>>> +        */
>>> +       __u64   csa_va;
>>> +};
>>> +
>>> +struct drm_amdgpu_userq_in {
>>> +       /** AMDGPU_USERQ_OP_* */
>>> +       __u32   op;
>>> +       /** Queue handle for USERQ_OP_FREE */
>>> +       __u32   queue_id;
>>> +       /** the target GPU engine to execute workload 
>>> (AMDGPU_HW_IP_*) */
>>> +       __u32   ip_type;
>>> +       /**
>>> +        * @flags: flags to indicate special function for queue like 
>>> secure
>>> +        * buffer (TMZ). Unused for now.
>>> +        */
>>> +       __u32   flags;
>>> +       /**
>>> +        * @doorbell_handle: the handle of doorbell GEM object
>>> +        * associated to this client.
>>> +        */
>>> +       __u32   doorbell_handle;
>>> +       /**
>>> +        * @doorbell_offset: 32-bit offset of the doorbell in the 
>>> doorbell bo.
>>> +        * Kernel will generate absolute doorbell offset using 
>>> doorbell_handle
>>> +        * and doorbell_offset in the doorbell bo.
>>> +        */
>>> +       __u32   doorbell_offset;
>>> +       /**
>>> +        * @mqd: Queue descriptor for USERQ_OP_CREATE
>>> +        * MQD data can be of different size for different GPU 
>>> IP/engine and
>>> +        * their respective versions/revisions, so this points to a 
>>> __u64 *
>>> +        * which holds MQD of this usermode queue.
>>> +        */
>>> +       __u64 mqd;
>>> +       /**
>>> +        * @size: size of MQD data in bytes, it must match the MQD 
>>> structure
>>> +        * size of the respective engine/revision defined in UAPI 
>>> for ex, for
>>> +        * gfx_v11 workloads, size = 
>>> sizeof(drm_amdgpu_userq_mqd_gfx_v11).
>>> +        */
>>> +       __u64 mqd_size;
>>> +};
>>> +
>>> +struct drm_amdgpu_userq_out {
>>> +       /** Queue handle */
>>> +       __u32   queue_id;
>>> +       /** Flags */
>>> +       __u32   flags;
>>> +};
>>> +
>>> +union drm_amdgpu_userq {
>>> +       struct drm_amdgpu_userq_in in;
>>> +       struct drm_amdgpu_userq_out out;
>>> +};
>>> +
>>>   /* vm ioctl */
>>>   #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>> -- 
>>> 2.43.2
>>>
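
A sketch of the split described above, reusing the field set already
present in the v9 UAPI; the exact v10 struct name and layout may differ.

/* GFX-specific metadata, split out of the common queue description */
struct drm_amdgpu_userq_mqd_gfx_v11 {
	__u64	shadow_va;	/* shadow buffer, at least 4 pages */
	__u64	gds_va;		/* GDS backup buffer, at least 1 page */
	__u64	csa_va;		/* FW work area (CSA), at least 1 page */
};

/*
 * drm_amdgpu_userq_in keeps the MES-generic pieces (queue_va/queue_size,
 * rptr_va, wptr_va, doorbell handle/offset) and points at the IP-specific
 * block through the existing mqd/mqd_size pair.
 */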

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-05-02  5:31     ` Sharma, Shashank
@ 2024-05-02 13:06       ` Kasiviswanathan, Harish
  2024-05-02 13:23         ` Sharma, Shashank
  2024-05-02 13:46       ` Alex Deucher
  1 sibling, 1 reply; 51+ messages in thread
From: Kasiviswanathan, Harish @ 2024-05-02 13:06 UTC (permalink / raw)
  To: Sharma, Shashank, Alex Deucher
  Cc: amd-gfx, Yadav, Arvind, Deucher, Alexander, Koenig, Christian

[AMD Official Use Only - General]

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Sharma, Shashank
Sent: Thursday, May 2, 2024 1:32 AM
To: Alex Deucher <alexdeucher@gmail.com>
Cc: amd-gfx@lists.freedesktop.org; Yadav, Arvind <Arvind.Yadav@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART


On 01/05/2024 23:36, Alex Deucher wrote:
> On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> To support oversubscription, MES FW expects WPTR BOs to
>> be mapped into GART, before they are submitted to usermode
>> queues. This patch adds a function for the same.
>>
>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>>
>> V5: Addressed review comments from Christian:
>>      - Either pin object or allocate from GART, but not both.
>>      - All the handling must be done with the VM locks held.
>>
>> V7: Addressed review comments from Christian:
>>      - Do not take vm->eviction_lock
>>      - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>>
>> V8: Rebase
>> V9: Changed the function names from gfx_v11* to mes_v11*
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   2 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index 8d2cd61af26b..37b80626e792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -30,6 +30,74 @@
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>
>> +static int
>> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
>> +{
>> +       int ret;
>> +
>> +       ret = amdgpu_bo_reserve(bo, true);
>> +       if (ret) {
>> +               DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>> +               goto err_reserve_bo_failed;
>> +       }
>> +
>> +       ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>> +       if (ret) {
>> +               DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>> +               goto err_map_bo_gart_failed;
>> +       }
>> +
>> +       amdgpu_bo_unreserve(bo);
>> +       bo = amdgpu_bo_ref(bo);
>> +
>> +       return 0;
>> +
>> +err_map_bo_gart_failed:
>> +       amdgpu_bo_unreserve(bo);
>> +err_reserve_bo_failed:
>> +       return ret;
>> +}
>> +

There is a very similar function, amdgpu_amdkfd_map_gtt_bo_to_gart(). Is it possible to unify them? Also, the adev parameter in the above function is confusing; it was also removed from amdgpu_amdkfd_map_gtt_bo_to_gart(). It looks like the bo is mapped into the GART of adev, however it doesn't have to be: it is mapped into the GART of the device the bo is associated with.

>> +static int
>> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
>> +                             struct amdgpu_usermode_queue *queue,
>> +                             uint64_t wptr)
>> +{
>> +       struct amdgpu_device *adev = uq_mgr->adev;
>> +       struct amdgpu_bo_va_mapping *wptr_mapping;
>> +       struct amdgpu_vm *wptr_vm;
>> +       struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
>> +       int ret;
>> +
>> +       wptr_vm = queue->vm;
>> +       ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>> +       if (ret)
>> +               return ret;
>> +
>> +       wptr &= AMDGPU_GMC_HOLE_MASK;
>> +       wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
>> +       amdgpu_bo_unreserve(wptr_vm->root.bo);
>> +       if (!wptr_mapping) {
>> +               DRM_ERROR("Failed to lookup wptr bo\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       wptr_obj->obj = wptr_mapping->bo_va->base.bo;
>> +       if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
>> +               DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
>> +       if (ret) {
>> +               DRM_ERROR("Failed to map wptr bo to GART\n");
>> +               return ret;
>> +       }
>> +
>> +       queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> The wptr virtual address from the user may not be at offset 0 from the
> start of the object.  We should add the offset to the base vmid0 GPU
> address.

can you please elaborate a bit here ? wptr_obj->obj is already mapped to
gart, do we still need this ?

- Shashank

>
> Alex
>
>> +       return 0;
>> +}
>> +
>>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>                                 struct amdgpu_usermode_queue *queue,
>>                                 struct amdgpu_mqd_prop *userq_props)
>> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>          queue_input.queue_size = userq_props->queue_size >> 2;
>>          queue_input.doorbell_offset = userq_props->doorbell_index;
>>          queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +       queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>>
>>          amdgpu_mes_lock(&adev->mes);
>>          r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>                  goto free_mqd;
>>          }
>>
>> +       /* FW expects WPTR BOs to be mapped into GART */
>> +       r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
>> +       if (r) {
>> +               DRM_ERROR("Failed to create WPTR mapping\n");
>> +               goto free_ctx;
>> +       }
>> +
>>          /* Map userqueue into FW using MES */
>>          r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>>          if (r) {
>> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>>                              struct amdgpu_usermode_queue *queue)
>>   {
>>          mes_v11_0_userq_unmap(uq_mgr, queue);
>> +       amdgpu_bo_unref(&queue->wptr_obj.obj);
>>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>>          kfree(queue->userq_prop);
>>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 643f31474bd8..ffe8a3d73756 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>>          struct amdgpu_vm        *vm;
>>          struct amdgpu_userq_obj mqd;
>>          struct amdgpu_userq_obj fw_obj;
>> +       struct amdgpu_userq_obj wptr_obj;
>>   };
>>
>>   struct amdgpu_userq_funcs {
>> --
>> 2.43.2
>>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-05-02 13:06       ` Kasiviswanathan, Harish
@ 2024-05-02 13:23         ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 13:23 UTC (permalink / raw)
  To: Kasiviswanathan, Harish, Alex Deucher
  Cc: amd-gfx, Yadav, Arvind, Deucher, Alexander, Koenig, Christian


On 02/05/2024 15:06, Kasiviswanathan, Harish wrote:
> [AMD Official Use Only - General]
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Sharma, Shashank
> Sent: Thursday, May 2, 2024 1:32 AM
> To: Alex Deucher <alexdeucher@gmail.com>
> Cc: amd-gfx@lists.freedesktop.org; Yadav, Arvind <Arvind.Yadav@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
> Subject: Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
>
>
> On 01/05/2024 23:36, Alex Deucher wrote:
>> On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>> To support oversubscription, MES FW expects WPTR BOs to
>>> be mapped into GART, before they are submitted to usermode
>>> queues. This patch adds a function for the same.
>>>
>>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>>>
>>> V5: Addressed review comments from Christian:
>>>       - Either pin object or allocate from GART, but not both.
>>>       - All the handling must be done with the VM locks held.
>>>
>>> V7: Addressed review comments from Christian:
>>>       - Do not take vm->eviction_lock
>>>       - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>>>
>>> V8: Rebase
>>> V9: Changed the function names from gfx_v11* to mes_v11*
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>> ---
>>>    .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>>    2 files changed, 78 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>> index 8d2cd61af26b..37b80626e792 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>> @@ -30,6 +30,74 @@
>>>    #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>    #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>
>>> +static int
>>> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
>>> +{
>>> +       int ret;
>>> +
>>> +       ret = amdgpu_bo_reserve(bo, true);
>>> +       if (ret) {
>>> +               DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>>> +               goto err_reserve_bo_failed;
>>> +       }
>>> +
>>> +       ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>>> +       if (ret) {
>>> +               DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>>> +               goto err_map_bo_gart_failed;
>>> +       }
>>> +
>>> +       amdgpu_bo_unreserve(bo);
>>> +       bo = amdgpu_bo_ref(bo);
>>> +
>>> +       return 0;
>>> +
>>> +err_map_bo_gart_failed:
>>> +       amdgpu_bo_unreserve(bo);
>>> +err_reserve_bo_failed:
>>> +       return ret;
>>> +}
>>> +
> There is a very similar function amdgpu_amdkfd_map_gtt_bo_to_gart(). Is it possible to unify. Also, adev parameter in the above function is confusing. This was also removed from amdgpu_amdkfd_map_gtt_bo_to_gart(). It looks like bo is mapped to gart of adev, however it doesn't have to be. It is mapped to the gart to which bo is associated.

I don't think unification makes much sense here, but I agree that adev 
can be removed from the input args. I will update this.

- Shashank

>>> +static int
>>> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
>>> +                             struct amdgpu_usermode_queue *queue,
>>> +                             uint64_t wptr)
>>> +{
>>> +       struct amdgpu_device *adev = uq_mgr->adev;
>>> +       struct amdgpu_bo_va_mapping *wptr_mapping;
>>> +       struct amdgpu_vm *wptr_vm;
>>> +       struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
>>> +       int ret;
>>> +
>>> +       wptr_vm = queue->vm;
>>> +       ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>>> +       if (ret)
>>> +               return ret;
>>> +
>>> +       wptr &= AMDGPU_GMC_HOLE_MASK;
>>> +       wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
>>> +       amdgpu_bo_unreserve(wptr_vm->root.bo);
>>> +       if (!wptr_mapping) {
>>> +               DRM_ERROR("Failed to lookup wptr bo\n");
>>> +               return -EINVAL;
>>> +       }
>>> +
>>> +       wptr_obj->obj = wptr_mapping->bo_va->base.bo;
>>> +       if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
>>> +               DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
>>> +               return -EINVAL;
>>> +       }
>>> +
>>> +       ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
>>> +       if (ret) {
>>> +               DRM_ERROR("Failed to map wptr bo to GART\n");
>>> +               return ret;
>>> +       }
>>> +
>>> +       queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
>> The wptr virtual address from the user may not be at offset 0 from the
>> start of the object.  We should add the offset to the base vmid0 GPU
>> address.
> can you please elaborate a bit here ? wptr_obj->obj is already mapped to
> gart, do we still need this ?
>
> - Shashank
>
>> Alex
>>
>>> +       return 0;
>>> +}
>>> +
>>>    static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>>                                  struct amdgpu_usermode_queue *queue,
>>>                                  struct amdgpu_mqd_prop *userq_props)
>>> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>>           queue_input.queue_size = userq_props->queue_size >> 2;
>>>           queue_input.doorbell_offset = userq_props->doorbell_index;
>>>           queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>> +       queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>>>
>>>           amdgpu_mes_lock(&adev->mes);
>>>           r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>>                   goto free_mqd;
>>>           }
>>>
>>> +       /* FW expects WPTR BOs to be mapped into GART */
>>> +       r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
>>> +       if (r) {
>>> +               DRM_ERROR("Failed to create WPTR mapping\n");
>>> +               goto free_ctx;
>>> +       }
>>> +
>>>           /* Map userqueue into FW using MES */
>>>           r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>>>           if (r) {
>>> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>>>                               struct amdgpu_usermode_queue *queue)
>>>    {
>>>           mes_v11_0_userq_unmap(uq_mgr, queue);
>>> +       amdgpu_bo_unref(&queue->wptr_obj.obj);
>>>           amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>>>           kfree(queue->userq_prop);
>>>           amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> index 643f31474bd8..ffe8a3d73756 100644
>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>>>           struct amdgpu_vm        *vm;
>>>           struct amdgpu_userq_obj mqd;
>>>           struct amdgpu_userq_obj fw_obj;
>>> +       struct amdgpu_userq_obj wptr_obj;
>>>    };
>>>
>>>    struct amdgpu_userq_funcs {
>>> --
>>> 2.43.2
>>>
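
For reference, a minimal sketch of the agreed cleanup with the unused
adev argument dropped; otherwise the flow is unchanged from the patch
quoted above.

static int
mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_bo *bo)
{
	int ret;

	ret = amdgpu_bo_reserve(bo, true);
	if (ret) {
		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
		return ret;
	}

	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
	if (ret) {
		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
		amdgpu_bo_unreserve(bo);
		return ret;
	}

	amdgpu_bo_unreserve(bo);
	amdgpu_bo_ref(bo);

	return 0;
}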

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-05-02  5:31     ` Sharma, Shashank
  2024-05-02 13:06       ` Kasiviswanathan, Harish
@ 2024-05-02 13:46       ` Alex Deucher
  1 sibling, 0 replies; 51+ messages in thread
From: Alex Deucher @ 2024-05-02 13:46 UTC (permalink / raw)
  To: Sharma, Shashank; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Thu, May 2, 2024 at 1:31 AM Sharma, Shashank <shashank.sharma@amd.com> wrote:
>
>
> On 01/05/2024 23:36, Alex Deucher wrote:
> > On Fri, Apr 26, 2024 at 9:57 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >> To support oversubscription, MES FW expects WPTR BOs to
> >> be mapped into GART, before they are submitted to usermode
> >> queues. This patch adds a function for the same.
> >>
> >> V4: fix the wptr value before mapping lookup (Bas, Christian).
> >>
> >> V5: Addressed review comments from Christian:
> >>      - Either pin object or allocate from GART, but not both.
> >>      - All the handling must be done with the VM locks held.
> >>
> >> V7: Addressed review comments from Christian:
> >>      - Do not take vm->eviction_lock
> >>      - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
> >>
> >> V8: Rebase
> >> V9: Changed the function names from gfx_v11* to mes_v11*
> >>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: Christian Koenig <christian.koenig@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >> ---
> >>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
> >>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
> >>   2 files changed, 78 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> index 8d2cd61af26b..37b80626e792 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> @@ -30,6 +30,74 @@
> >>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> >>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> >>
> >> +static int
> >> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
> >> +{
> >> +       int ret;
> >> +
> >> +       ret = amdgpu_bo_reserve(bo, true);
> >> +       if (ret) {
> >> +               DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> >> +               goto err_reserve_bo_failed;
> >> +       }
> >> +
> >> +       ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> >> +       if (ret) {
> >> +               DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> >> +               goto err_map_bo_gart_failed;
> >> +       }
> >> +
> >> +       amdgpu_bo_unreserve(bo);
> >> +       bo = amdgpu_bo_ref(bo);
> >> +
> >> +       return 0;
> >> +
> >> +err_map_bo_gart_failed:
> >> +       amdgpu_bo_unreserve(bo);
> >> +err_reserve_bo_failed:
> >> +       return ret;
> >> +}
> >> +
> >> +static int
> >> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
> >> +                             struct amdgpu_usermode_queue *queue,
> >> +                             uint64_t wptr)
> >> +{
> >> +       struct amdgpu_device *adev = uq_mgr->adev;
> >> +       struct amdgpu_bo_va_mapping *wptr_mapping;
> >> +       struct amdgpu_vm *wptr_vm;
> >> +       struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
> >> +       int ret;
> >> +
> >> +       wptr_vm = queue->vm;
> >> +       ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> >> +       if (ret)
> >> +               return ret;
> >> +
> >> +       wptr &= AMDGPU_GMC_HOLE_MASK;
> >> +       wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
> >> +       amdgpu_bo_unreserve(wptr_vm->root.bo);
> >> +       if (!wptr_mapping) {
> >> +               DRM_ERROR("Failed to lookup wptr bo\n");
> >> +               return -EINVAL;
> >> +       }
> >> +
> >> +       wptr_obj->obj = wptr_mapping->bo_va->base.bo;
> >> +       if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
> >> +               DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
> >> +               return -EINVAL;
> >> +       }
> >> +
> >> +       ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
> >> +       if (ret) {
> >> +               DRM_ERROR("Failed to map wptr bo to GART\n");
> >> +               return ret;
> >> +       }
> >> +
> >> +       queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> > The wptr virtual address from the user may not be at offset 0 from the
> > start of the object.  We should add the offset to the base vmid0 GPU
> > address.
>
> can you please elaborate a bit here ? wptr_obj->obj is already mapped to
> gart, do we still need this ?

The location that the MES will poll needs to be the same as the
location that the UMD will be writing to.  E.g., if you allocate the
BO and then map it into user space at location 0x5000 in the user's
GPU virtual address space and then the user uses 0x5008 as the wptr
address, we need to make sure that we are polling in MES at vmid0
virtual address + 0x8.  If you map the BO at 0x2000 in the vmid0
address space, you need to make sure to point the firmware to 0x2008.

Alex

>
> - Shashank
>
> >
> > Alex
> >
> >> +       return 0;
> >> +}
> >> +
> >>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
> >>                                 struct amdgpu_usermode_queue *queue,
> >>                                 struct amdgpu_mqd_prop *userq_props)
> >> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
> >>          queue_input.queue_size = userq_props->queue_size >> 2;
> >>          queue_input.doorbell_offset = userq_props->doorbell_index;
> >>          queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> >> +       queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
> >>
> >>          amdgpu_mes_lock(&adev->mes);
> >>          r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> >> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> >>                  goto free_mqd;
> >>          }
> >>
> >> +       /* FW expects WPTR BOs to be mapped into GART */
> >> +       r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
> >> +       if (r) {
> >> +               DRM_ERROR("Failed to create WPTR mapping\n");
> >> +               goto free_ctx;
> >> +       }
> >> +
> >>          /* Map userqueue into FW using MES */
> >>          r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
> >>          if (r) {
> >> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
> >>                              struct amdgpu_usermode_queue *queue)
> >>   {
> >>          mes_v11_0_userq_unmap(uq_mgr, queue);
> >> +       amdgpu_bo_unref(&queue->wptr_obj.obj);
> >>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
> >>          kfree(queue->userq_prop);
> >>          amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> >> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >> index 643f31474bd8..ffe8a3d73756 100644
> >> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
> >>          struct amdgpu_vm        *vm;
> >>          struct amdgpu_userq_obj mqd;
> >>          struct amdgpu_userq_obj fw_obj;
> >> +       struct amdgpu_userq_obj wptr_obj;
> >>   };
> >>
> >>   struct amdgpu_userq_funcs {
> >> --
> >> 2.43.2
> >>
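
A minimal sketch of the offset handling described above, assuming the
existing single-page restriction on the wptr BO stays in place, so the
page offset of the user's wptr VA equals the offset into the BO.

	/* in mes_v11_0_create_wptr_mapping(), after the GART mapping: */
	queue->wptr_obj.gpu_addr =
		amdgpu_bo_gpu_offset_no_check(wptr_obj->obj) +
		(wptr & (PAGE_SIZE - 1));

With this, MES polls the vmid0 address of the BO plus the same page
offset the UMD writes to (the 0x8 in the example above).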

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management
  2024-05-02 12:53       ` Sharma, Shashank
@ 2024-05-02 13:52         ` Alex Deucher
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Deucher @ 2024-05-02 13:52 UTC (permalink / raw)
  To: Sharma, Shashank; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Thu, May 2, 2024 at 8:53 AM Sharma, Shashank <shashank.sharma@amd.com> wrote:
>
>
> On 02/05/2024 07:23, Sharma, Shashank wrote:
> > Hey Alex,
> >
> > On 01/05/2024 22:39, Alex Deucher wrote:
> >> On Fri, Apr 26, 2024 at 10:07 AM Shashank Sharma
> >> <shashank.sharma@amd.com> wrote:
> >>> From: Alex Deucher <alexander.deucher@amd.com>
> >>>
> >>> This patch introduces a new UAPI/IOCTL for usermode graphics
> >>> queues. The userspace app will fill this structure and request
> >>> the graphics driver to add a graphics work queue for it. The
> >>> output of this UAPI is a queue id.
> >>>
> >>> This UAPI maps the queue into GPU, so the graphics app can start
> >>> submitting work to the queue as soon as the call returns.
> >>>
> >>> V2: Addressed review comments from Alex and Christian
> >>>      - Make the doorbell offset's comment clearer
> >>>      - Change the output parameter name to queue_id
> >>>
> >>> V3: Integration with doorbell manager
> >>>
> >>> V4:
> >>>      - Updated the UAPI doc (Pierre-Eric)
> >>>      - Created a Union for engine specific MQDs (Alex)
> >>>      - Added Christian's R-B
> >>> V5:
> >>>      - Add variables for GDS and CSA in MQD structure (Alex)
> >>>      - Make MQD data a ptr-size pair instead of union (Alex)
> >>>
> >>> V9:
> >>>     - renamed struct drm_amdgpu_userq_mqd_gfx_v11 to struct
> >>>       drm_amdgpu_userq_mqd as its being used for SDMA and
> >>>       compute queues as well
> >>>
> >>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>> ---
> >>>   include/uapi/drm/amdgpu_drm.h | 110
> >>> ++++++++++++++++++++++++++++++++++
> >>>   1 file changed, 110 insertions(+)
> >>>
> >>> diff --git a/include/uapi/drm/amdgpu_drm.h
> >>> b/include/uapi/drm/amdgpu_drm.h
> >>> index 96e32dafd4f0..22f56a30f7cb 100644
> >>> --- a/include/uapi/drm/amdgpu_drm.h
> >>> +++ b/include/uapi/drm/amdgpu_drm.h
> >>> @@ -54,6 +54,7 @@ extern "C" {
> >>>   #define DRM_AMDGPU_VM                  0x13
> >>>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>>   #define DRM_AMDGPU_SCHED               0x15
> >>> +#define DRM_AMDGPU_USERQ               0x16
> >>>
> >>>   #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
> >>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >>> @@ -71,6 +72,7 @@ extern "C" {
> >>>   #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE
> >>> + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
> >>>   #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
> >>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>>
> >>>   /**
> >>>    * DOC: memory domains
> >>> @@ -317,6 +319,114 @@ union drm_amdgpu_ctx {
> >>>          union drm_amdgpu_ctx_out out;
> >>>   };
> >>>
> >>> +/* user queue IOCTL */
> >>> +#define AMDGPU_USERQ_OP_CREATE 1
> >>> +#define AMDGPU_USERQ_OP_FREE   2
> >>> +
> >>> +/* Flag to indicate secure buffer related workload, unused for now */
> >>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >>> +/* Flag to indicate AQL workload, unused for now */
> >>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >>> +
> >>> +/*
> >>> + * MQD (memory queue descriptor) is a set of parameters which allow
> >>> + * the GPU to uniquely define and identify a usermode queue. This
> >>> + * structure defines the MQD for GFX-V11 IP ver 0.
> >>> + */
> >>> +struct drm_amdgpu_userq_mqd {
> >> Maybe rename this to drm_amdgpu_gfx_userq_mqd since it's gfx specific.
> >> Then we can add different MQDs for SDMA, compute, etc. as they have
> >> different metadata.  E.g., the shadow and CSA are gfx only.
> >
> >
> > Actually this was named drm_amdgpu_userq_mqd_gfx_v11_0 until the last
> > patchset, but then I realized that apart from the objects (gds/shadow
> > va) nothing is gfx specific; it's actually required for every MES-based
> > userqueue IP, so I thought it would be overkill to create multiple
> > structures for almost the same data. If you feel strongly about this,
> > I can change it again.
> >
> > - Shashank
>
>
> Please ignore my last comment; I understand what you are suggesting, and
> I have reformatted the patches accordingly. Now I am keeping everything
> required for MES in one basic structure (drm_amdgpu_userq_in) and creating
> drm_amdgpu_userq_mqd_gfx_v11 for GFX-specific things (like the CSA, shadow
> and GDS areas). There will also be one separate patch that enables the GFX
> IP on the MES code, just like the separate patches for the SDMA and Compute
> IPs in this series. I will send the V10 patches with this reformatting in
> some time.

Yeah, we just need to make it clear to userspace what buffers are
necessary for which ring type.

Alex

>
> - Shashank
>
> >
> >> Alex
> >>
> >>
> >>> +       /**
> >>> +        * @queue_va: Virtual address of the GPU memory which holds
> >>> the queue
> >>> +        * object. The queue holds the workload packets.
> >>> +        */
> >>> +       __u64   queue_va;
> >>> +       /**
> >>> +        * @queue_size: Size of the queue in bytes, this needs to be
> >>> 256-byte
> >>> +        * aligned.
> >>> +        */
> >>> +       __u64   queue_size;
> >>> +       /**
> >>> +        * @rptr_va : Virtual address of the GPU memory which holds
> >>> the ring RPTR.
> >>> +        * This object must be at least 8 byte in size and aligned
> >>> to 8-byte offset.
> >>> +        */
> >>> +       __u64   rptr_va;
> >>> +       /**
> >>> +        * @wptr_va : Virtual address of the GPU memory which holds
> >>> the ring WPTR.
> >>> +        * This object must be at least 8 byte in size and aligned
> >>> to 8-byte offset.
> >>> +        *
> >>> +        * Queue, RPTR and WPTR can come from the same object, as
> >>> long as the size
> >>> +        * and alignment related requirements are met.
> >>> +        */
> >>> +       __u64   wptr_va;
> >>> +       /**
> >>> +        * @shadow_va: Virtual address of the GPU memory to hold the
> >>> shadow buffer.
> >>> +        * This must be from a separate GPU object, and must be at
> >>> least 4-page
> >>> +        * sized.
> >>> +        */
> >>> +       __u64   shadow_va;
> >>> +       /**
> >>> +        * @gds_va: Virtual address of the GPU memory to hold the
> >>> GDS buffer.
> >>> +        * This must be from a separate GPU object, and must be at
> >>> least 1-page
> >>> +        * sized.
> >>> +        */
> >>> +       __u64   gds_va;
> >>> +       /**
> >>> +        * @csa_va: Virtual address of the GPU memory to hold the
> >>> CSA buffer.
> >>> +        * This must be from a separate GPU object, and must be at
> >>> least 1-page
> >>> +        * sized.
> >>> +        */
> >>> +       __u64   csa_va;
> >>> +};
> >>> +
> >>> +struct drm_amdgpu_userq_in {
> >>> +       /** AMDGPU_USERQ_OP_* */
> >>> +       __u32   op;
> >>> +       /** Queue handle for USERQ_OP_FREE */
> >>> +       __u32   queue_id;
> >>> +       /** the target GPU engine to execute workload
> >>> (AMDGPU_HW_IP_*) */
> >>> +       __u32   ip_type;
> >>> +       /**
> >>> +        * @flags: flags to indicate special function for queue like
> >>> secure
> >>> +        * buffer (TMZ). Unused for now.
> >>> +        */
> >>> +       __u32   flags;
> >>> +       /**
> >>> +        * @doorbell_handle: the handle of doorbell GEM object
> >>> +        * associated to this client.
> >>> +        */
> >>> +       __u32   doorbell_handle;
> >>> +       /**
> >>> +        * @doorbell_offset: 32-bit offset of the doorbell in the
> >>> doorbell bo.
> >>> +        * Kernel will generate absolute doorbell offset using
> >>> doorbell_handle
> >>> +        * and doorbell_offset in the doorbell bo.
> >>> +        */
> >>> +       __u32   doorbell_offset;
> >>> +       /**
> >>> +        * @mqd: Queue descriptor for USERQ_OP_CREATE
> >>> +        * MQD data can be of different size for different GPU
> >>> IP/engine and
> >>> +        * their respective versions/revisions, so this points to a
> >>> __u64 *
> >>> +        * which holds MQD of this usermode queue.
> >>> +        */
> >>> +       __u64 mqd;
> >>> +       /**
> >>> +        * @size: size of MQD data in bytes, it must match the MQD
> >>> structure
> >>> +        * size of the respective engine/revision defined in UAPI
> >>> for ex, for
> >>> +        * gfx_v11 workloads, size =
> >>> sizeof(drm_amdgpu_userq_mqd_gfx_v11).
> >>> +        */
> >>> +       __u64 mqd_size;
> >>> +};
> >>> +
> >>> +struct drm_amdgpu_userq_out {
> >>> +       /** Queue handle */
> >>> +       __u32   queue_id;
> >>> +       /** Flags */
> >>> +       __u32   flags;
> >>> +};
> >>> +
> >>> +union drm_amdgpu_userq {
> >>> +       struct drm_amdgpu_userq_in in;
> >>> +       struct drm_amdgpu_userq_out out;
> >>> +};
> >>> +
> >>>   /* vm ioctl */
> >>>   #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >>> --
> >>> 2.43.2
> >>>

^ permalink raw reply	[flat|nested] 51+ messages in thread
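
For orientation, here is a minimal userspace sketch of how the IOCTL quoted
above could be driven, assuming the application has already allocated the
queue/RPTR/WPTR/shadow/GDS/CSA buffers and a doorbell BO through its own setup
code. The helper name, drm_fd and the db_handle/db_offset parameters are
placeholders, and the structures follow the v9 patch as posted (the thread
notes the MQD layout will be reshuffled in v10):

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/amdgpu_drm.h>

/* Sketch only: every GPU virtual address in *mqd and the doorbell
 * handle/offset are assumed to have been prepared by the application. */
static int create_gfx_userq(int drm_fd, struct drm_amdgpu_userq_mqd *mqd,
			    uint32_t db_handle, uint32_t db_offset,
			    uint32_t *queue_id)
{
	union drm_amdgpu_userq args;
	int r;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_USERQ_OP_CREATE;
	args.in.ip_type = AMDGPU_HW_IP_GFX;
	args.in.doorbell_handle = db_handle;
	args.in.doorbell_offset = db_offset;	/* 32-bit offset inside the doorbell BO */
	args.in.mqd = (uint64_t)(uintptr_t)mqd;
	args.in.mqd_size = sizeof(*mqd);

	r = drmIoctl(drm_fd, DRM_IOCTL_AMDGPU_USERQ, &args);
	if (r)
		return r;

	*queue_id = args.out.queue_id;
	return 0;
}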

* Re: [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues
  2024-05-02  5:47     ` Sharma, Shashank
@ 2024-05-02 13:55       ` Alex Deucher
  2024-05-02 14:01         ` Sharma, Shashank
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-02 13:55 UTC (permalink / raw)
  To: Sharma, Shashank
  Cc: amd-gfx, Arvind Yadav, Christian König, Alex Deucher,
	Srinivasan Shanmugam

On Thu, May 2, 2024 at 1:47 AM Sharma, Shashank <shashank.sharma@amd.com> wrote:
>
>
> On 01/05/2024 22:41, Alex Deucher wrote:
> > On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
> > <shashank.sharma@amd.com> wrote:
> >> This patch does necessary modifications to enable the SDMA
> >> usermode queues using the existing userqueue infrastructure.
> >>
> >> V9: introduced this patch in the series
> >>
> >> Cc: Christian König <Christian.Koenig@amd.com>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    | 2 +-
> >>   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 ++++
> >>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c           | 3 +++
> >>   3 files changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> index 781283753804..e516487e8db9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> @@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> >>          int qid, r = 0;
> >>
> >>          /* Usermode queues are only supported for GFX/SDMA engines as of now */
> >> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
> >> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
> >>                  DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
> >>                  return -EINVAL;
> >>          }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> index a6c3037d2d1f..a5e270eda37b 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> @@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> >>                  return r;
> >>          }
> >>
> >> +       /* We don't need to set other FW objects for SDMA queues */
> >> +       if (queue->queue_type == AMDGPU_HW_IP_DMA)
> >> +               return 0;
> >> +
> >>          /* Shadow and GDS objects come directly from userspace */
> >>          mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
> >>          mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> >> index 361835a61f2e..90354a70c807 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> >> @@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
> >>          return 0;
> >>   }
> >>
> >> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
> > Can you include the header rather than adding an extern?
> Noted,
> >
> >> +
> >>   static int sdma_v6_0_sw_init(void *handle)
> >>   {
> >>          struct amdgpu_ring *ring;
> >> @@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
> >>                  return -EINVAL;
> >>          }
> >>
> >> +       adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
> >>          return r;
> >>   }
> > I think we need a new mqd descriptor in amdgpu_drm.h as well since the
> > sdma metadata is different from gfx and compute.
>
> Can you please elaborate on this ? AFAIK SDMA queue doesn't need any
> specific metadata objects (like GFX).

Right.  I want to make it clear in the IOCTL interface what buffers
are required for which ring types.  E.g., UMD might allocate a shadow
buffer for SDMA, but they don't need it so there is no need to
allocate it.  If we have separate mqd structures for every ring type,
it makes it clear which additional buffers are needed for which ring
types.

Alex

>
> - Shashank
>
> > Alex
> >
> >> --
> >> 2.43.2
> >>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues
  2024-05-02 13:55       ` Alex Deucher
@ 2024-05-02 14:01         ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 14:01 UTC (permalink / raw)
  To: Alex Deucher
  Cc: amd-gfx, Arvind Yadav, Christian König, Alex Deucher,
	Srinivasan Shanmugam


On 02/05/2024 15:55, Alex Deucher wrote:
> On Thu, May 2, 2024 at 1:47 AM Sharma, Shashank <shashank.sharma@amd.com> wrote:
>>
>> On 01/05/2024 22:41, Alex Deucher wrote:
>>> On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
>>> <shashank.sharma@amd.com> wrote:
>>>> This patch does necessary modifications to enable the SDMA
>>>> usermode queues using the existing userqueue infrastructure.
>>>>
>>>> V9: introduced this patch in the series
>>>>
>>>> Cc: Christian König <Christian.Koenig@amd.com>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    | 2 +-
>>>>    drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 4 ++++
>>>>    drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c           | 3 +++
>>>>    3 files changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> index 781283753804..e516487e8db9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> @@ -189,7 +189,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>>>>           int qid, r = 0;
>>>>
>>>>           /* Usermode queues are only supported for GFX/SDMA engines as of now */
>>>> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX) {
>>>> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
>>>>                   DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
>>>>                   return -EINVAL;
>>>>           }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>>> index a6c3037d2d1f..a5e270eda37b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>>> @@ -182,6 +182,10 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>>                   return r;
>>>>           }
>>>>
>>>> +       /* We don't need to set other FW objects for SDMA queues */
>>>> +       if (queue->queue_type == AMDGPU_HW_IP_DMA)
>>>> +               return 0;
>>>> +
>>>>           /* Shadow and GDS objects come directly from userspace */
>>>>           mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
>>>>           mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>>>> index 361835a61f2e..90354a70c807 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>>>> @@ -1225,6 +1225,8 @@ static int sdma_v6_0_early_init(void *handle)
>>>>           return 0;
>>>>    }
>>>>
>>>> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
>>> Can you include the header rather than adding an extern?
>> Noted,
>>>> +
>>>>    static int sdma_v6_0_sw_init(void *handle)
>>>>    {
>>>>           struct amdgpu_ring *ring;
>>>> @@ -1265,6 +1267,7 @@ static int sdma_v6_0_sw_init(void *handle)
>>>>                   return -EINVAL;
>>>>           }
>>>>
>>>> +       adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
>>>>           return r;
>>>>    }
>>> I think we need a new mqd descriptor in amdgpu_drm.h as well since the
>>> sdma metadata is different from gfx and compute.
>> Can you please elaborate on this ? AFAIK SDMA queue doesn't need any
>> specific metadata objects (like GFX).
> Right.  I want to make it clear in the IOCTL interface what buffers
> are required for which ring types.  E.g., UMD might allocate a shadow
> buffer for SDMA, but they don't need it so there is no need to
> allocate it.  If we have separate mqd structures for every ring type,
> it makes it clear which additional buffers are needed for which ring
> types.

Agree, it makes sense.

- Shashank

> Alex
>
>> - Shashank
>>
>>> Alex
>>>
>>>> --
>>>> 2.43.2
>>>>

^ permalink raw reply	[flat|nested] 51+ messages in thread
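
To picture the split agreed on above, one illustrative (not the actual v10)
shape of the UAPI would keep the common queue/rptr/wptr data in
drm_amdgpu_userq_in and give each ring type its own small descriptor listing
only the extra buffers it needs; the struct names below are assumptions:

#include <linux/types.h>

/* Illustrative sketch only: GFX additionally needs the shadow, GDS and
 * CSA buffers. */
struct drm_amdgpu_userq_mqd_gfx11 {
	__u64 shadow_va;
	__u64 gds_va;
	__u64 csa_va;
};

/*
 * SDMA needs no metadata beyond the common data in drm_amdgpu_userq_in, so
 * it would simply pass no IP-specific descriptor at all, which makes the
 * requirement obvious to the UMD.
 */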

* Re: [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue
  2024-05-02  5:50     ` Sharma, Shashank
@ 2024-05-02 14:10       ` Alex Deucher
  2024-05-02 14:17         ` Sharma, Shashank
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Deucher @ 2024-05-02 14:10 UTC (permalink / raw)
  To: Sharma, Shashank; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig

On Thu, May 2, 2024 at 1:51 AM Sharma, Shashank <shashank.sharma@amd.com> wrote:
>
>
> On 01/05/2024 22:44, Alex Deucher wrote:
> > On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
> > <shashank.sharma@amd.com> wrote:
> >> From: Arvind Yadav <arvind.yadav@amd.com>
> >>
> >> This patch does the necessary changes required to
> >> enable compute workload support using the existing
> >> usermode queues infrastructure.
> >>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: Christian Koenig <christian.koenig@amd.com>
> >> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    |  3 ++-
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c           |  2 ++
> >>   drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +++++++++-
> >>   include/uapi/drm/amdgpu_drm.h                    |  1 +
> >>   4 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> index e516487e8db9..78d34fa7a0b9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> @@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> >>          int qid, r = 0;
> >>
> >>          /* Usermode queues are only supported for GFX/SDMA engines as of now */
> >> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
> >> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
> >> +                       && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
> >>                  DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
> >>                  return -EINVAL;
> >>          }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> index 525bd0f4d3f7..27b86f7fe949 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> @@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
> >>                  adev->gfx.mec.num_pipe_per_mec = 4;
> >>                  adev->gfx.mec.num_queue_per_pipe = 4;
> >>                  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
> >> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
> >>                  break;
> >>          case IP_VERSION(11, 0, 1):
> >>          case IP_VERSION(11, 0, 4):
> >> @@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
> >>                  adev->gfx.mec.num_pipe_per_mec = 4;
> >>                  adev->gfx.mec.num_queue_per_pipe = 4;
> >>                  adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
> >> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
> >>                  break;
> >>          default:
> >>                  adev->gfx.me.num_me = 1;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> index a5e270eda37b..d61d80f86003 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> >> @@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> >>          }
> >>
> >>          /* We don't need to set other FW objects for SDMA queues */
> >> -       if (queue->queue_type == AMDGPU_HW_IP_DMA)
> >> +       if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
> >> +           (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
> >>                  return 0;
> >>
> >>          /* Shadow and GDS objects come directly from userspace */
> >> @@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> >>          userq_props->use_doorbell = true;
> >>          userq_props->doorbell_index = queue->doorbell_index;
> >>
> >> +       if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
> >> +               userq_props->eop_gpu_addr = mqd_user->eop_va;
> >> +               userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
> >> +               userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
> >> +               userq_props->hqd_active = false;
> >> +       }
> >> +
> >>          queue->userq_prop = userq_props;
> >>
> >>          r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
> >> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> >> index 22f56a30f7cb..676792ad3618 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
> >>           * sized.
> >>           */
> >>          __u64   csa_va;
> >> +       __u64   eop_va;
> >>   };
> > Let's add a new mqd descriptor for compute since it's different from
> > gfx and sdma.
> the only difference is this object (vs. the CSA and GDS objects); apart
> from that, the MQD is the same since they are all MES based. Am I missing
> something here?

The scheduling entity is irrelevant.  The mqd is defined by the engine
itself.  E.g., v11_structs.h.  Gfx has one set of requirements,
compute has different ones, and SDMA has different ones.  VPE and VCN
also have mqds.  When we add support for them in the future, they may
have additional requirements.  I want to make it clear in the
interface what additional data are required for each ring type.

> >
> > Also, can we handle the eop buffer as part of the
> > kernel metadata for compute user queues rather than having the user
> > specify it?
>
> Sure, we can do it.

Thinking about it more, I think the EOP buffer has to be in the user's GPU
virtual address space, so it probably makes more sense for the user to
allocate it.  Ideally we'd take an extra refcount on it while the queue is
active to keep the user from freeing it out from under us, but that can
probably be a future improvement.

Alex

>
> - Shashank
>
> >
> > Alex
> >
> >>   struct drm_amdgpu_userq_in {
> >> --
> >> 2.43.2
> >>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue
  2024-05-02 14:10       ` Alex Deucher
@ 2024-05-02 14:17         ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 14:17 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx, Arvind Yadav, Alex Deucher, Christian Koenig


On 02/05/2024 16:10, Alex Deucher wrote:
> On Thu, May 2, 2024 at 1:51 AM Sharma, Shashank <shashank.sharma@amd.com> wrote:
>>
>> On 01/05/2024 22:44, Alex Deucher wrote:
>>> On Fri, Apr 26, 2024 at 10:27 AM Shashank Sharma
>>> <shashank.sharma@amd.com> wrote:
>>>> From: Arvind Yadav <arvind.yadav@amd.com>
>>>>
>>>> This patch does the necessary changes required to
>>>> enable compute workload support using the existing
>>>> usermode queues infrastructure.
>>>>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c    |  3 ++-
>>>>    drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c           |  2 ++
>>>>    drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 10 +++++++++-
>>>>    include/uapi/drm/amdgpu_drm.h                    |  1 +
>>>>    4 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> index e516487e8db9..78d34fa7a0b9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> @@ -189,7 +189,8 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>>>>           int qid, r = 0;
>>>>
>>>>           /* Usermode queues are only supported for GFX/SDMA engines as of now */
>>>> -       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
>>>> +       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA
>>>> +                       && args->in.ip_type != AMDGPU_HW_IP_COMPUTE) {
>>>>                   DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
>>>>                   return -EINVAL;
>>>>           }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>> index 525bd0f4d3f7..27b86f7fe949 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>> @@ -1350,6 +1350,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>>>                   adev->gfx.mec.num_pipe_per_mec = 4;
>>>>                   adev->gfx.mec.num_queue_per_pipe = 4;
>>>>                   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>>> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
>>>>                   break;
>>>>           case IP_VERSION(11, 0, 1):
>>>>           case IP_VERSION(11, 0, 4):
>>>> @@ -1362,6 +1363,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>>>                   adev->gfx.mec.num_pipe_per_mec = 4;
>>>>                   adev->gfx.mec.num_queue_per_pipe = 4;
>>>>                   adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>>> +               adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
>>>>                   break;
>>>>           default:
>>>>                   adev->gfx.me.num_me = 1;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>>> index a5e270eda37b..d61d80f86003 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>>> @@ -183,7 +183,8 @@ static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>>           }
>>>>
>>>>           /* We don't need to set other FW objects for SDMA queues */
>>>> -       if (queue->queue_type == AMDGPU_HW_IP_DMA)
>>>> +       if ((queue->queue_type == AMDGPU_HW_IP_DMA) ||
>>>> +           (queue->queue_type == AMDGPU_HW_IP_COMPUTE))
>>>>                   return 0;
>>>>
>>>>           /* Shadow and GDS objects come directly from userspace */
>>>> @@ -246,6 +247,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>>>           userq_props->use_doorbell = true;
>>>>           userq_props->doorbell_index = queue->doorbell_index;
>>>>
>>>> +       if (queue->queue_type == AMDGPU_HW_IP_COMPUTE) {
>>>> +               userq_props->eop_gpu_addr = mqd_user->eop_va;
>>>> +               userq_props->hqd_pipe_priority = AMDGPU_GFX_PIPE_PRIO_NORMAL;
>>>> +               userq_props->hqd_queue_priority = AMDGPU_GFX_QUEUE_PRIORITY_MINIMUM;
>>>> +               userq_props->hqd_active = false;
>>>> +       }
>>>> +
>>>>           queue->userq_prop = userq_props;
>>>>
>>>>           r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>> index 22f56a30f7cb..676792ad3618 100644
>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>> @@ -375,6 +375,7 @@ struct drm_amdgpu_userq_mqd {
>>>>            * sized.
>>>>            */
>>>>           __u64   csa_va;
>>>> +       __u64   eop_va;
>>>>    };
>>> Let's add a new mqd descriptor for compute since it's different from
>>> gfx and sdma.
>> the only difference is this object (vs. the CSA and GDS objects); apart
>> from that, the MQD is the same since they are all MES based. Am I missing
>> something here?
> The scheduling entity is irrelevant.  The mqd is defined by the engine
> itself.  E.g., v11_structs.h.  Gfx has one set of requirements,
> compute has different ones, and SDMA has different ones.  VPE and VCN
> also have mqds.  When we add support for them in the future, they may
> have additional requirements.  I want to make it clear in the
> interface what additional data are required for each ring type.

Yes, this comment was also based on my earlier understanding, so please
ignore it.

We are aligned on the IP-specific MQD structures now.

>
>>> Also, can we handle the eop buffer as part of the
>>> kernel metadata for compute user queues rather than having the user
>>> specify it?
>> Sure, we can do it.
> Thinking about it more, I think the EOP buffer has to be in the user's GPU
> virtual address space, so it probably makes more sense for the user to
> allocate it.  Ideally we'd take an extra refcount on it while the queue is
> active to keep the user from freeing it out from under us, but that can
> probably be a future improvement.

I was also thinking that since the BO is expected to be created from a user
VM (VMID != 0), keeping it in userspace keeps it aligned with the other
IP-specific MQD objects.

Let's keep it in userspace, but I will create a separate compute MQD object.

- Shashank

> Alex
>
>> - Shashank
>>
>>> Alex
>>>
>>>>    struct drm_amdgpu_userq_in {
>>>> --
>>>> 2.43.2
>>>>

^ permalink raw reply	[flat|nested] 51+ messages in thread
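
Following the same pattern, the separate compute descriptor agreed on above
would carry only the EOP buffer address, allocated by userspace in its own
GPU VA space as discussed; the struct name below is again only an
illustration:

#include <linux/types.h>

/* Illustrative sketch only: compute queues additionally need an EOP
 * buffer; the kernel could take an extra reference on it while the queue
 * is active, as suggested above. */
struct drm_amdgpu_userq_mqd_compute_gfx11 {
	__u64 eop_va;
};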

* Re: [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX
  2024-04-26 13:48 ` [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
  2024-05-01 20:50   ` Alex Deucher
@ 2024-05-02 15:14   ` Christian König
  2024-05-02 15:35     ` Sharma, Shashank
  1 sibling, 1 reply; 51+ messages in thread
From: Christian König @ 2024-05-02 15:14 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Alex Deucher, Christian Koenig



Am 26.04.24 um 15:48 schrieb Shashank Sharma:
> A Memory queue descriptor (MQD) of a userqueue defines it in
> the hw's context. As MQD format can vary between different
> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>
> This patch:
> - Adds a new file which will be used for MES based userqueue
>    functions targeting GFX and SDMA IP.
> - Introduces MQD handler functions for the usermode queues.
> - Adds new functions to create and destroy userqueue MQD for
>    MES-V11 for GFX IP.
>
> V1: Worked on review comments from Alex:
>      - Make MQD functions GEN and IP specific
>
> V2: Worked on review comments from Alex:
>      - Reuse the existing adev->mqd[ip] for MQD creation
>      - Formatting and arrangement of code
>
> V3:
>      - Integration with doorbell manager
>
> V4: Review comments addressed:
>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>      - Align name of structure members (Luben)
>      - Don't break up the Cc tag list and the Sob tag list in commit
>        message (Luben)
> V5:
>     - No need to reserve the bo for MQD (Christian).
>     - Some more changes to support IP specific MQD creation.
>
> V6:
>     - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
>       calls while creating MQD object to amdgpu_bo_create() once eviction
>       fences are ready (Christian).
>
> V7:
>     - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
>     - Use memdup_user instead of copy_from_user (Christian)
>
> V9:
>     - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
>       that it can be reused for SDMA userqueues as well (Shashank, Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   4 +
>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++++++++++++++++++
>   3 files changed, 116 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 05a2d1714070..a640bfa468ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -184,7 +184,8 @@ amdgpu-y += \
>   amdgpu-y += \
>   	amdgpu_mes.o \
>   	mes_v10_1.o \
> -	mes_v11_0.o
> +	mes_v11_0.o \
> +	mes_v11_0_userqueue.o

Do we really need a new C file for this or could we put the two 
functions into mes_v11_0.c as well?

Apart from that it looks correct to me, but I'm really not that deep 
inside the code at the moment.

Regards,
Christian.

>   
>   # add UVD block
>   amdgpu-y += \
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index f7325b02a191..525bd0f4d3f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -1331,6 +1331,8 @@ static int gfx_v11_0_rlc_backdoor_autoload_enable(struct amdgpu_device *adev)
>   	return 0;
>   }
>   
> +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
> +
>   static int gfx_v11_0_sw_init(void *handle)
>   {
>   	int i, j, k, r, ring_id = 0;
> @@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
>   		adev->gfx.mec.num_mec = 2;
>   		adev->gfx.mec.num_pipe_per_mec = 4;
>   		adev->gfx.mec.num_queue_per_pipe = 4;
> +		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>   		break;
>   	case IP_VERSION(11, 0, 1):
>   	case IP_VERSION(11, 0, 4):
> @@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
>   		adev->gfx.mec.num_mec = 1;
>   		adev->gfx.mec.num_pipe_per_mec = 4;
>   		adev->gfx.mec.num_queue_per_pipe = 4;
> +		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>   		break;
>   	default:
>   		adev->gfx.me.num_me = 1;
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> new file mode 100644
> index 000000000000..9e7dee77d344
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -0,0 +1,110 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2024 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include "amdgpu.h"
> +#include "amdgpu_gfx.h"
> +#include "v11_structs.h"
> +#include "mes_v11_0.h"
> +#include "amdgpu_userqueue.h"
> +
> +static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> +				      struct drm_amdgpu_userq_in *args_in,
> +				      struct amdgpu_usermode_queue *queue)
> +{
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
> +	struct drm_amdgpu_userq_mqd *mqd_user;
> +	struct amdgpu_mqd_prop *userq_props;
> +	int r;
> +
> +	/* Incoming MQD parameters from userspace to be saved here */
> +	memset(&mqd_user, 0, sizeof(mqd_user));
> +
> +	/* Structure to initialize MQD for userqueue using generic MQD init function */
> +	userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
> +	if (!userq_props) {
> +		DRM_ERROR("Failed to allocate memory for userq_props\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd)) {
> +		DRM_ERROR("MQD size mismatch\n");
> +		r = -EINVAL;
> +		goto free_props;
> +	}
> +
> +	mqd_user = memdup_user(u64_to_user_ptr(args_in->mqd), args_in->mqd_size);
> +	if (IS_ERR(mqd_user)) {
> +		DRM_ERROR("Failed to read user MQD\n");
> +		r = -EFAULT;
> +		goto free_props;
> +	}
> +
> +	r = amdgpu_userqueue_create_object(uq_mgr, &queue->mqd, mqd_hw_default->mqd_size);
> +	if (r) {
> +		DRM_ERROR("Failed to create MQD object for userqueue\n");
> +		goto free_mqd_user;
> +	}
> +
> +	/* Initialize the MQD BO with user given values */
> +	userq_props->wptr_gpu_addr = mqd_user->wptr_va;
> +	userq_props->rptr_gpu_addr = mqd_user->rptr_va;
> +	userq_props->queue_size = mqd_user->queue_size;
> +	userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
> +	userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
> +	userq_props->use_doorbell = true;
> +
> +	queue->userq_prop = userq_props;
> +
> +	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
> +	if (r) {
> +		DRM_ERROR("Failed to initialize MQD for userqueue\n");
> +		goto free_mqd;
> +	}
> +
> +	return 0;
> +
> +free_mqd:
> +	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> +
> +free_mqd_user:
> +	kfree(mqd_user);
> +
> +free_props:
> +	kfree(userq_props);
> +
> +	return r;
> +}
> +
> +static void
> +mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
> +			    struct amdgpu_usermode_queue *queue)
> +{
> +	kfree(queue->userq_prop);
> +	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> +}
> +
> +const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
> +	.mqd_create = mes_v11_0_userq_mqd_create,
> +	.mqd_destroy = mes_v11_0_userq_mqd_destroy,
> +};


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue
  2024-04-26 13:48 ` [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue Shashank Sharma
  2024-05-01 21:11   ` Alex Deucher
@ 2024-05-02 15:15   ` Christian König
  1 sibling, 0 replies; 51+ messages in thread
From: Christian König @ 2024-05-02 15:15 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Alex Deucher, Christian Koenig

Am 26.04.24 um 15:48 schrieb Shashank Sharma:
> The FW expects us to allocate at least one page of context
> space to handle gang, process, GDS and FW related work.
> This patch creates a joint object for the same, and calculates
> GPU space offsets of these spaces.
>
> V1: Addressed review comments on RFC patch:
>      Alex: Make this function IP specific
>
> V2: Addressed review comments from Christian
>      - Allocate only one object for total FW space, and calculate
>        offsets for each of these objects.
>
> V3: Integration with doorbell manager
>
> V4: Review comments:
>      - Remove shadow from FW space list from cover letter (Alex)
>      - Alignment of macro (Luben)
>
> V5: Merged patches 5 and 6 into this single patch
>      Addressed review comments:
>      - Use lower_32_bits instead of mask (Christian)
>      - gfx_v11_0 instead of gfx_v11 in function names (Alex)
>      - Shadow and GDS objects are now coming from userspace (Christian,
>        Alex)
>
> V6:
>      - Add a comment to replace amdgpu_bo_create_kernel() with
>        amdgpu_bo_create() during fw_ctx object creation (Christian).
>      - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
>        of generic queue structure and make it gen11 specific (Alex).
>
> V7:
>     - Using helper function to create/destroy userqueue objects.
>     - Removed FW object space allocation.
>
> V8:
>     - Updating FW object address from user values.
>
> V9:
>     - uppdated function name from gfx_v11_* to mes_v11_*
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 43 +++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>   2 files changed, 44 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index 9e7dee77d344..9f9fdcb9c294 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -27,6 +27,41 @@
>   #include "mes_v11_0.h"
>   #include "amdgpu_userqueue.h"
>   
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> +
> +static int mes_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +					    struct amdgpu_usermode_queue *queue,
> +					    struct drm_amdgpu_userq_mqd *mqd_user)
> +{
> +	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +	struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
> +	int r, size;
> +
> +	/*
> +	 * The FW expects at least one page space allocated for
> +	 * process ctx and gang ctx each. Create an object
> +	 * for the same.
> +	 */
> +	size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ;
> +	r = amdgpu_userqueue_create_object(uq_mgr, ctx, size);
> +	if (r) {
> +		DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> +		return r;
> +	}
> +
> +	/* Shadow and GDS objects come directly from userspace */
> +	mqd->shadow_base_lo = mqd_user->shadow_va & 0xFFFFFFFC;
> +	mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> +
> +	mqd->gds_bkup_base_lo = mqd_user->gds_va & 0xFFFFFFFC;
> +	mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
> +
> +	mqd->fw_work_area_base_lo = mqd_user->csa_va & 0xFFFFFFFC;
> +	mqd->fw_work_area_base_hi = upper_32_bits(mqd_user->csa_va);
> +	return 0;
> +}
> +
>   static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>   				      struct drm_amdgpu_userq_in *args_in,
>   				      struct amdgpu_usermode_queue *queue)
> @@ -82,6 +117,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>   		goto free_mqd;
>   	}
>   
> +	/* Create BO for FW operations */
> +	r = mes_v11_0_userq_create_ctx_space(uq_mgr, queue, mqd_user);
> +	if (r) {
> +		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> +		goto free_mqd;
> +	}
> +
>   	return 0;
>   
>   free_mqd:
> @@ -100,6 +142,7 @@ static void
>   mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>   			    struct amdgpu_usermode_queue *queue)
>   {
> +	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>   	kfree(queue->userq_prop);
>   	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>   }
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index bbd29f68b8d4..643f31474bd8 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>   	struct amdgpu_userq_mgr *userq_mgr;
>   	struct amdgpu_vm	*vm;
>   	struct amdgpu_userq_obj mqd;
> +	struct amdgpu_userq_obj fw_obj;
>   };
>   
>   struct amdgpu_userq_funcs {


^ permalink raw reply	[flat|nested] 51+ messages in thread
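
As a reading aid for the size calculation above: the single fw_obj is large
enough for both context regions, and the offsets (not visible in the quoted
hunk) could be derived as in the sketch below, assuming the process ctx page
comes first and the gang ctx page follows it; the helper name is made up:

/* Sketch only: split the joint FW context object created above into its
 * process ctx and gang ctx GPU addresses. */
static void mes_v11_0_userq_ctx_addrs(struct amdgpu_userq_obj *ctx,
				      u64 *proc_ctx_gpu_addr,
				      u64 *gang_ctx_gpu_addr)
{
	*proc_ctx_gpu_addr = ctx->gpu_addr;
	*gang_ctx_gpu_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
}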

* Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-04-26 13:48 ` [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART Shashank Sharma
  2024-05-01 21:36   ` Alex Deucher
@ 2024-05-02 15:18   ` Christian König
  2024-05-02 15:36     ` Sharma, Shashank
  1 sibling, 1 reply; 51+ messages in thread
From: Christian König @ 2024-05-02 15:18 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Alex Deucher, Christian Koenig

Am 26.04.24 um 15:48 schrieb Shashank Sharma:
> To support oversubscription, MES FW expects WPTR BOs to
> be mapped into GART, before they are submitted to usermode
> queues. This patch adds a function for the same.
>
> V4: fix the wptr value before mapping lookup (Bas, Christian).
>
> V5: Addressed review comments from Christian:
>      - Either pin object or allocate from GART, but not both.
>      - All the handling must be done with the VM locks held.
>
> V7: Addressed review comments from Christian:
>      - Do not take vm->eviction_lock
>      - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>
> V8: Rebase
> V9: Changed the function names from gfx_v11* to mes_v11*
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>

The patch itself looks good, but this really needs the eviction fence to
work properly.

Otherwise the BO mapped into the GART could be evicted at some point.

Christian.

> ---
>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>   2 files changed, 78 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> index 8d2cd61af26b..37b80626e792 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
> @@ -30,6 +30,74 @@
>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>   
> +static int
> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
> +{
> +	int ret;
> +
> +	ret = amdgpu_bo_reserve(bo, true);
> +	if (ret) {
> +		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> +		goto err_reserve_bo_failed;
> +	}
> +
> +	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> +	if (ret) {
> +		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> +		goto err_map_bo_gart_failed;
> +	}
> +
> +	amdgpu_bo_unreserve(bo);
> +	bo = amdgpu_bo_ref(bo);
> +
> +	return 0;
> +
> +err_map_bo_gart_failed:
> +	amdgpu_bo_unreserve(bo);
> +err_reserve_bo_failed:
> +	return ret;
> +}
> +
> +static int
> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
> +			      struct amdgpu_usermode_queue *queue,
> +			      uint64_t wptr)
> +{
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	struct amdgpu_bo_va_mapping *wptr_mapping;
> +	struct amdgpu_vm *wptr_vm;
> +	struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
> +	int ret;
> +
> +	wptr_vm = queue->vm;
> +	ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> +	if (ret)
> +		return ret;
> +
> +	wptr &= AMDGPU_GMC_HOLE_MASK;
> +	wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
> +	amdgpu_bo_unreserve(wptr_vm->root.bo);
> +	if (!wptr_mapping) {
> +		DRM_ERROR("Failed to lookup wptr bo\n");
> +		return -EINVAL;
> +	}
> +
> +	wptr_obj->obj = wptr_mapping->bo_va->base.bo;
> +	if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
> +		DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
> +	if (ret) {
> +		DRM_ERROR("Failed to map wptr bo to GART\n");
> +		return ret;
> +	}
> +
> +	queue->wptr_obj.gpu_addr = amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
> +	return 0;
> +}
> +
>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>   			       struct amdgpu_usermode_queue *queue,
>   			       struct amdgpu_mqd_prop *userq_props)
> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>   	queue_input.queue_size = userq_props->queue_size >> 2;
>   	queue_input.doorbell_offset = userq_props->doorbell_index;
>   	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +	queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>   
>   	amdgpu_mes_lock(&adev->mes);
>   	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>   		goto free_mqd;
>   	}
>   
> +	/* FW expects WPTR BOs to be mapped into GART */
> +	r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, userq_props->wptr_gpu_addr);
> +	if (r) {
> +		DRM_ERROR("Failed to create WPTR mapping\n");
> +		goto free_ctx;
> +	}
> +
>   	/* Map userqueue into FW using MES */
>   	r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>   	if (r) {
> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>   			    struct amdgpu_usermode_queue *queue)
>   {
>   	mes_v11_0_userq_unmap(uq_mgr, queue);
> +	amdgpu_bo_unref(&queue->wptr_obj.obj);
>   	amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>   	kfree(queue->userq_prop);
>   	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 643f31474bd8..ffe8a3d73756 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>   	struct amdgpu_vm	*vm;
>   	struct amdgpu_userq_obj mqd;
>   	struct amdgpu_userq_obj fw_obj;
> +	struct amdgpu_userq_obj wptr_obj;
>   };
>   
>   struct amdgpu_userq_funcs {


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask
  2024-04-26 13:48 ` [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask Shashank Sharma
  2024-05-01 21:27   ` Alex Deucher
@ 2024-05-02 15:19   ` Christian König
  2024-05-02 15:42     ` Sharma, Shashank
  1 sibling, 1 reply; 51+ messages in thread
From: Christian König @ 2024-05-02 15:19 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Christian König, Alex Deucher

Am 26.04.24 um 15:48 schrieb Shashank Sharma:
> The current MES GFX mask prevents the FW from enabling oversubscription.
> This patch does the following:
> - Fixes the mask values and adds a description for them.
> - Removes the central mask setup and makes it IP specific, as the mask
>    differs when the number of pipes and queues differ.
>
> V9: introduce this patch in the series

As far as I can see this is a bug fix for existing code and should be
pushed to amd-staging-drm-next completely independently of the other work.

Regards,
Christian.

>
> Cc: Christian König <Christian.Koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
>   drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++++++--
>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++++++--
>   4 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> index a00cf4756ad0..b405fafc0b71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> @@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
>   		adev->mes.compute_hqd_mask[i] = 0xc;
>   	}
>   
> -	for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
> -		adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
> -
>   	for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
>   		if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
>   		    IP_VERSION(6, 0, 0))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index 4c8fc3117ef8..598556619337 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -110,7 +110,6 @@ struct amdgpu_mes {
>   	uint32_t                        vmid_mask_gfxhub;
>   	uint32_t                        vmid_mask_mmhub;
>   	uint32_t                        compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
> -	uint32_t                        gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
>   	uint32_t                        sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
>   	uint32_t                        aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
>   	uint32_t                        sch_ctx_offs;
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> index 1e5ad1e08d2a..4d1121d1a1e7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
> @@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct amdgpu_mes *mes)
>   		mes_set_hw_res_pkt.compute_hqd_mask[i] =
>   			mes->compute_hqd_mask[i];
>   
> -	for (i = 0; i < MAX_GFX_PIPES; i++)
> -		mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
> +	/*
> +	 * GFX pipe 0 queue 0 is being used by kernel
> +	 * Set GFX pipe 0 queue 1 for MES scheduling
> +	 * GFX pipe 1 can't be used for MES due to HW limitation.
> +	 */
> +	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
> +	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
>   
>   	for (i = 0; i < MAX_SDMA_PIPES; i++)
>   		mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index 63f281a9984d..feb7fa2c304c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
>   		mes_set_hw_res_pkt.compute_hqd_mask[i] =
>   			mes->compute_hqd_mask[i];
>   
> -	for (i = 0; i < MAX_GFX_PIPES; i++)
> -		mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
> +	/*
> +	 * GFX pipe 0 queue 0 is being used by kernel
> +	 * Set GFX pipe 0 queue 1 for MES scheduling
> +	 * GFX pipe 1 can't be used for MES due to HW limitation.
> +	 */
> +	mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
> +	mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
>   
>   	for (i = 0; i < MAX_SDMA_PIPES; i++)
>   		mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];


^ permalink raw reply	[flat|nested] 51+ messages in thread
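
For reference, gfx_hqd_mask[] has one entry per GFX pipe and one bit per
queue on that pipe, so the hardcoded values in the hunks above decode as
follows (reading aid only, not part of the patch):

/*
 *   gfx_hqd_mask[0] = 0x2 = BIT(1) -> pipe 0: only queue 1 is handed to MES;
 *                                     queue 0 stays with the kernel GFX ring.
 *   gfx_hqd_mask[1] = 0x0          -> pipe 1: no queues for MES, per the HW
 *                                     limitation noted in the comment.
 */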

* Re: [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue
  2024-04-26 13:48 ` [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
@ 2024-05-02 15:22   ` Christian König
  2024-05-02 16:19     ` Sharma, Shashank
  2024-05-02 18:18     ` Sharma, Shashank
  0 siblings, 2 replies; 51+ messages in thread
From: Christian König @ 2024-05-02 15:22 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Arvind Yadav, Alex Deucher, Christian Koenig



Am 26.04.24 um 15:48 schrieb Shashank Sharma:
> This patch:
> - adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
> - moves the usequeue initialization code for all IPs under
>    this flag
>
> so that the userqueue works only when the config is enabled.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/Kconfig     | 8 ++++++++
>   drivers/gpu/drm/amd/amdgpu/Makefile    | 8 ++++++--
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++++
>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
>   4 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 22d88f8ef527..bba963527d22 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
>   	  Add -Werror to the build flags for amdgpu.ko.
>   	  Only enable this if you are warning code for amdgpu.ko.
>   
> +config DRM_AMDGPU_USERQ_GFX
> +	bool "Enable Navi 3x gfx usermode queues"
> +	depends on DRM_AMDGPU
> +	default n
> +	help
> +	  Choose this option to enable usermode queue support for GFX
> +          workload submission. This feature is supported on Navi 3X only.

When this is for Navi 3x only I would name that DRM_AMDGPU_NAVI3X_USERQ 
instead.

And since we enable/disable GFX, Compute and SDMA I would drop "gfx" 
from the comment and description.

Apart from that the approach looks good to me.

Christian.

> +
>   source "drivers/gpu/drm/amd/acp/Kconfig"
>   source "drivers/gpu/drm/amd/display/Kconfig"
>   source "drivers/gpu/drm/amd/amdkfd/Kconfig"
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index a640bfa468ad..0b17fc1740a0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -184,8 +184,12 @@ amdgpu-y += \
>   amdgpu-y += \
>   	amdgpu_mes.o \
>   	mes_v10_1.o \
> -	mes_v11_0.o \
> -	mes_v11_0_userqueue.o
> +	mes_v11_0.o
> +
> +# add GFX userqueue support
> +ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
> +amdgpu-y += mes_v11_0_userqueue.o
> +endif
>   
>   # add UVD block
>   amdgpu-y += \
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 27b86f7fe949..8591aed9f9ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
>   		adev->gfx.mec.num_mec = 2;
>   		adev->gfx.mec.num_pipe_per_mec = 4;
>   		adev->gfx.mec.num_queue_per_pipe = 4;
> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>   		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>   		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
> +#endif
>   		break;
>   	case IP_VERSION(11, 0, 1):
>   	case IP_VERSION(11, 0, 4):
> @@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
>   		adev->gfx.mec.num_mec = 1;
>   		adev->gfx.mec.num_pipe_per_mec = 4;
>   		adev->gfx.mec.num_queue_per_pipe = 4;
> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>   		adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>   		adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = &userq_mes_v11_0_funcs;
> +#endif
>   		break;
>   	default:
>   		adev->gfx.me.num_me = 1;
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> index 90354a70c807..084059c95db6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
> @@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
>   		return -EINVAL;
>   	}
>   
> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>   	adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
> +#endif
> +
>   	return r;
>   }
>   


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX
  2024-05-02 15:14   ` Christian König
@ 2024-05-02 15:35     ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 15:35 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Arvind Yadav, Alex Deucher, Christian Koenig


On 02/05/2024 17:14, Christian König wrote:
>
>
> On 26.04.24 at 15:48, Shashank Sharma wrote:
>> A Memory queue descriptor (MQD) of a userqueue defines it in
>> the hw's context. As MQD format can vary between different
>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>>
>> This patch:
>> - Adds a new file which will be used for MES based userqueue
>>    functions targeting GFX and SDMA IP.
>> - Introduces MQD handler functions for the usermode queues.
>> - Adds new functions to create and destroy userqueue MQD for
>>    MES-V11 for GFX IP.
>>
>> V1: Worked on review comments from Alex:
>>      - Make MQD functions GEN and IP specific
>>
>> V2: Worked on review comments from Alex:
>>      - Reuse the existing adev->mqd[ip] for MQD creation
>>      - Formatting and arrangement of code
>>
>> V3:
>>      - Integration with doorbell manager
>>
>> V4: Review comments addressed:
>>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>>      - Align name of structure members (Luben)
>>      - Don't break up the Cc tag list and the Sob tag list in commit
>>        message (Luben)
>> V5:
>>     - No need to reserve the bo for MQD (Christian).
>>     - Some more changes to support IP specific MQD creation.
>>
>> V6:
>>     - Add a comment reminding us to replace the 
>> amdgpu_bo_create_kernel()
>>       calls while creating MQD object to amdgpu_bo_create() once 
>> eviction
>>       fences are ready (Christian).
>>
>> V7:
>>     - Re-arrange userqueue functions in adev instead of uq_mgr (Alex)
>>     - Use memdup_user instead of copy_from_user (Christian)
>>
>> V9:
>>     - Moved userqueue code from gfx_v11_0.c to new file mes_v11_0.c so
>>       that it can be reused for SDMA userqueues as well (Shashank, Alex)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +-
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |   4 +
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 110 ++++++++++++++++++
>>   3 files changed, 116 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 05a2d1714070..a640bfa468ad 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -184,7 +184,8 @@ amdgpu-y += \
>>   amdgpu-y += \
>>       amdgpu_mes.o \
>>       mes_v10_1.o \
>> -    mes_v11_0.o
>> +    mes_v11_0.o \
>> +    mes_v11_0_userqueue.o
>
> Do we really need a new C file for this or could we put the two 
> functions into mes_v11_0.c as well?
>
> Apart from that it looks correct to me, but I'm really not that deep 
> inside the code at the moment.
>
Actually, this patch adds these two functions, and then the upcoming
patches add several more functions on top of them to create/destroy FW
objects, map/unmap the queue, handle the doorbell and map the wptr BO. So
when we look at it in the end, it's probably fine :).

- Shashank

> Regards,
> Christian.
>
>>     # add UVD block
>>   amdgpu-y += \
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index f7325b02a191..525bd0f4d3f7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -1331,6 +1331,8 @@ static int 
>> gfx_v11_0_rlc_backdoor_autoload_enable(struct amdgpu_device *adev)
>>       return 0;
>>   }
>>   +extern const struct amdgpu_userq_funcs userq_mes_v11_0_funcs;
>> +
>>   static int gfx_v11_0_sw_init(void *handle)
>>   {
>>       int i, j, k, r, ring_id = 0;
>> @@ -1347,6 +1349,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>           adev->gfx.mec.num_mec = 2;
>>           adev->gfx.mec.num_pipe_per_mec = 4;
>>           adev->gfx.mec.num_queue_per_pipe = 4;
>> +        adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>           break;
>>       case IP_VERSION(11, 0, 1):
>>       case IP_VERSION(11, 0, 4):
>> @@ -1358,6 +1361,7 @@ static int gfx_v11_0_sw_init(void *handle)
>>           adev->gfx.mec.num_mec = 1;
>>           adev->gfx.mec.num_pipe_per_mec = 4;
>>           adev->gfx.mec.num_queue_per_pipe = 4;
>> +        adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>           break;
>>       default:
>>           adev->gfx.me.num_me = 1;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> new file mode 100644
>> index 000000000000..9e7dee77d344
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -0,0 +1,110 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright 2024 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +#include "amdgpu.h"
>> +#include "amdgpu_gfx.h"
>> +#include "v11_structs.h"
>> +#include "mes_v11_0.h"
>> +#include "amdgpu_userqueue.h"
>> +
>> +static int mes_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>> +                      struct drm_amdgpu_userq_in *args_in,
>> +                      struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_mqd *mqd_hw_default = &adev->mqds[queue->queue_type];
>> +    struct drm_amdgpu_userq_mqd *mqd_user;
>> +    struct amdgpu_mqd_prop *userq_props;
>> +    int r;
>> +
>> +    /* Incoming MQD parameters from userspace to be saved here */
>> +    memset(&mqd_user, 0, sizeof(mqd_user));
>> +
>> +    /* Structure to initialize MQD for userqueue using generic MQD 
>> init function */
>> +    userq_props = kzalloc(sizeof(struct amdgpu_mqd_prop), GFP_KERNEL);
>> +    if (!userq_props) {
>> +        DRM_ERROR("Failed to allocate memory for userq_props\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd)) {
>> +        DRM_ERROR("MQD size mismatch\n");
>> +        r = -EINVAL;
>> +        goto free_props;
>> +    }
>> +
>> +    mqd_user = memdup_user(u64_to_user_ptr(args_in->mqd), 
>> args_in->mqd_size);
>> +    if (IS_ERR(mqd_user)) {
>> +        DRM_ERROR("Failed to read user MQD\n");
>> +        r = -EFAULT;
>> +        goto free_props;
>> +    }
>> +
>> +    r = amdgpu_userqueue_create_object(uq_mgr, &queue->mqd, 
>> mqd_hw_default->mqd_size);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create MQD object for userqueue\n");
>> +        goto free_mqd_user;
>> +    }
>> +
>> +    /* Initialize the MQD BO with user given values */
>> +    userq_props->wptr_gpu_addr = mqd_user->wptr_va;
>> +    userq_props->rptr_gpu_addr = mqd_user->rptr_va;
>> +    userq_props->queue_size = mqd_user->queue_size;
>> +    userq_props->hqd_base_gpu_addr = mqd_user->queue_va;
>> +    userq_props->mqd_gpu_addr = queue->mqd.gpu_addr;
>> +    userq_props->use_doorbell = true;
>> +
>> +    queue->userq_prop = userq_props;
>> +
>> +    r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, 
>> userq_props);
>> +    if (r) {
>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
>> +        goto free_mqd;
>> +    }
>> +
>> +    return 0;
>> +
>> +free_mqd:
>> +    amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> +
>> +free_mqd_user:
>> +    kfree(mqd_user);
>> +
>> +free_props:
>> +    kfree(userq_props);
>> +
>> +    return r;
>> +}
>> +
>> +static void
>> +mes_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr,
>> +                struct amdgpu_usermode_queue *queue)
>> +{
>> +    kfree(queue->userq_prop);
>> +    amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> +}
>> +
>> +const struct amdgpu_userq_funcs userq_mes_v11_0_funcs = {
>> +    .mqd_create = mes_v11_0_userq_mqd_create,
>> +    .mqd_destroy = mes_v11_0_userq_mqd_destroy,
>> +};
>
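
For reference, the fields that mes_v11_0_userq_mqd_create() copies out of
the user MQD above map to a userspace structure filled roughly like the
sketch below. This is illustrative only: the struct and its field names
come from the UAPI patch of this series, while the helper name, the
parameter types and the actual ioctl wrapper (which lives in the libdrm
merge request referenced in the cover letter) are assumptions.

#include <stdint.h>
#include <drm/amdgpu_drm.h>	/* struct drm_amdgpu_userq_mqd, added by the UAPI patch */

/* Sketch only: fill the MQD that the kernel copies in via memdup_user().
 * Function name and parameters are invented for this example. */
static void fill_userq_mqd(struct drm_amdgpu_userq_mqd *mqd,
			   uint64_t queue_gpu_va, uint64_t ring_size_bytes,
			   uint64_t rptr_gpu_va, uint64_t wptr_gpu_va)
{
	mqd->queue_va   = queue_gpu_va;    /* GPU VA of the ring buffer */
	mqd->queue_size = ring_size_bytes; /* in bytes; the MES mapping shifts it to dwords */
	mqd->rptr_va    = rptr_gpu_va;     /* GPU VA of the read pointer BO */
	mqd->wptr_va    = wptr_gpu_va;     /* GPU VA of the write pointer BO */
}

The ioctl payload then carries a pointer to this struct (read by the kernel
through args_in->mqd) together with mqd_size = sizeof(*mqd), which has to
match the size check above.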

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART
  2024-05-02 15:18   ` Christian König
@ 2024-05-02 15:36     ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 15:36 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Arvind Yadav, Alex Deucher, Christian Koenig


On 02/05/2024 17:18, Christian König wrote:
> On 26.04.24 at 15:48, Shashank Sharma wrote:
>> To support oversubscription, MES FW expects WPTR BOs to
>> be mapped into GART, before they are submitted to usermode
>> queues. This patch adds a function for the same.
>>
>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>>
>> V5: Addressed review comments from Christian:
>>      - Either pin object or allocate from GART, but not both.
>>      - All the handling must be done with the VM locks held.
>>
>> V7: Addressed review comments from Christian:
>>      - Do not take vm->eviction_lock
>>      - Use amdgpu_bo_gpu_offset to get the wptr_bo GPU offset
>>
>> V8: Rebase
>> V9: Changed the function names from gfx_v11* to mes_v11*
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>
> The patch itself looks good, but this really needs the eviction fence 
> to work properly.
>
> Otherwise it can be that the BO mapped into the GART is evicted at 
> some point.


Noted, the eviction fence patches will follow soon.

- Shashank

>
> Christian.
>
>> ---
>>   .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 77 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   2 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> index 8d2cd61af26b..37b80626e792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> @@ -30,6 +30,74 @@
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>   +static int
>> +mes_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct 
>> amdgpu_bo *bo)
>> +{
>> +    int ret;
>> +
>> +    ret = amdgpu_bo_reserve(bo, true);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>> +        goto err_reserve_bo_failed;
>> +    }
>> +
>> +    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>> +        goto err_map_bo_gart_failed;
>> +    }
>> +
>> +    amdgpu_bo_unreserve(bo);
>> +    bo = amdgpu_bo_ref(bo);
>> +
>> +    return 0;
>> +
>> +err_map_bo_gart_failed:
>> +    amdgpu_bo_unreserve(bo);
>> +err_reserve_bo_failed:
>> +    return ret;
>> +}
>> +
>> +static int
>> +mes_v11_0_create_wptr_mapping(struct amdgpu_userq_mgr *uq_mgr,
>> +                  struct amdgpu_usermode_queue *queue,
>> +                  uint64_t wptr)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_bo_va_mapping *wptr_mapping;
>> +    struct amdgpu_vm *wptr_vm;
>> +    struct amdgpu_userq_obj *wptr_obj = &queue->wptr_obj;
>> +    int ret;
>> +
>> +    wptr_vm = queue->vm;
>> +    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>> +    if (ret)
>> +        return ret;
>> +
>> +    wptr &= AMDGPU_GMC_HOLE_MASK;
>> +    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> 
>> PAGE_SHIFT);
>> +    amdgpu_bo_unreserve(wptr_vm->root.bo);
>> +    if (!wptr_mapping) {
>> +        DRM_ERROR("Failed to lookup wptr bo\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    wptr_obj->obj = wptr_mapping->bo_va->base.bo;
>> +    if (wptr_obj->obj->tbo.base.size > PAGE_SIZE) {
>> +        DRM_ERROR("Requested GART mapping for wptr bo larger than 
>> one page\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    ret = mes_v11_0_map_gtt_bo_to_gart(adev, wptr_obj->obj);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to map wptr bo to GART\n");
>> +        return ret;
>> +    }
>> +
>> +    queue->wptr_obj.gpu_addr = 
>> amdgpu_bo_gpu_offset_no_check(wptr_obj->obj);
>> +    return 0;
>> +}
>> +
>>   static int mes_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>>                      struct amdgpu_usermode_queue *queue,
>>                      struct amdgpu_mqd_prop *userq_props)
>> @@ -61,6 +129,7 @@ static int mes_v11_0_userq_map(struct 
>> amdgpu_userq_mgr *uq_mgr,
>>       queue_input.queue_size = userq_props->queue_size >> 2;
>>       queue_input.doorbell_offset = userq_props->doorbell_index;
>>       queue_input.page_table_base_addr = 
>> amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +    queue_input.wptr_mc_addr = queue->wptr_obj.gpu_addr;
>>         amdgpu_mes_lock(&adev->mes);
>>       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>> @@ -187,6 +256,13 @@ static int mes_v11_0_userq_mqd_create(struct 
>> amdgpu_userq_mgr *uq_mgr,
>>           goto free_mqd;
>>       }
>>   +    /* FW expects WPTR BOs to be mapped into GART */
>> +    r = mes_v11_0_create_wptr_mapping(uq_mgr, queue, 
>> userq_props->wptr_gpu_addr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create WPTR mapping\n");
>> +        goto free_ctx;
>> +    }
>> +
>>       /* Map userqueue into FW using MES */
>>       r = mes_v11_0_userq_map(uq_mgr, queue, userq_props);
>>       if (r) {
>> @@ -216,6 +292,7 @@ mes_v11_0_userq_mqd_destroy(struct 
>> amdgpu_userq_mgr *uq_mgr,
>>                   struct amdgpu_usermode_queue *queue)
>>   {
>>       mes_v11_0_userq_unmap(uq_mgr, queue);
>> +    amdgpu_bo_unref(&queue->wptr_obj.obj);
>>       amdgpu_userqueue_destroy_object(uq_mgr, &queue->fw_obj);
>>       kfree(queue->userq_prop);
>>       amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 643f31474bd8..ffe8a3d73756 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -45,6 +45,7 @@ struct amdgpu_usermode_queue {
>>       struct amdgpu_vm    *vm;
>>       struct amdgpu_userq_obj mqd;
>>       struct amdgpu_userq_obj fw_obj;
>> +    struct amdgpu_userq_obj wptr_obj;
>>   };
>>     struct amdgpu_userq_funcs {
>
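
A practical consequence of the one-page check above is that userspace
should keep the write pointer in its own small GTT BO instead of sharing a
larger buffer. A libdrm-flavoured sketch of such an allocation
(illustrative only; error handling and the subsequent amdgpu_bo_va_op()
VA mapping are omitted, and a 4 KiB page size is assumed):

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch: give the write pointer a dedicated page-sized GTT BO so the
 * "larger than one page" check in mes_v11_0_create_wptr_mapping() passes. */
static int alloc_wptr_bo(amdgpu_device_handle dev, amdgpu_bo_handle *wptr_bo)
{
	struct amdgpu_bo_alloc_request req = {
		.alloc_size     = 4096,				/* one page */
		.phys_alignment = 4096,
		.preferred_heap = AMDGPU_GEM_DOMAIN_GTT,	/* GPU-accessible system memory */
	};

	return amdgpu_bo_alloc(dev, &req, wptr_bo);
}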

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask
  2024-05-02 15:19   ` Christian König
@ 2024-05-02 15:42     ` Sharma, Shashank
  0 siblings, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 15:42 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Arvind Yadav, Christian König, Alex Deucher


On 02/05/2024 17:19, Christian König wrote:
> On 26.04.24 at 15:48, Shashank Sharma wrote:
>> Current MES GFX mask prevents FW from enabling oversubscription. This patch
>> does the following:
>> - Fixes the mask values and adds a description for the same.
>> - Removes the central mask setup and makes it IP specific, as it would
>>    be different when the number of pipes and queues are different.
>>
>> V9: introduce this patch in the series
>
> As far as I can see this is a bug fix for existing code and should be 
> pushed completely independently of the other work to amd-staging-drm-next.
>
Agreed, I added it here for completeness of the series. I had also pushed
this as a single patch last week; I will push it accordingly.

- Shashank

> Regards,
> Christian.
>
>>
>> Cc: Christian König <Christian.Koenig@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 -
>>   drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 9 +++++++--
>>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 9 +++++++--
>>   4 files changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> index a00cf4756ad0..b405fafc0b71 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
>> @@ -151,9 +151,6 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
>>           adev->mes.compute_hqd_mask[i] = 0xc;
>>       }
>>   -    for (i = 0; i < AMDGPU_MES_MAX_GFX_PIPES; i++)
>> -        adev->mes.gfx_hqd_mask[i] = i ? 0 : 0xfffffffe;
>> -
>>       for (i = 0; i < AMDGPU_MES_MAX_SDMA_PIPES; i++) {
>>           if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) <
>>               IP_VERSION(6, 0, 0))
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>> index 4c8fc3117ef8..598556619337 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
>> @@ -110,7 +110,6 @@ struct amdgpu_mes {
>>       uint32_t                        vmid_mask_gfxhub;
>>       uint32_t                        vmid_mask_mmhub;
>>       uint32_t compute_hqd_mask[AMDGPU_MES_MAX_COMPUTE_PIPES];
>> -    uint32_t gfx_hqd_mask[AMDGPU_MES_MAX_GFX_PIPES];
>>       uint32_t sdma_hqd_mask[AMDGPU_MES_MAX_SDMA_PIPES];
>>       uint32_t aggregated_doorbells[AMDGPU_MES_PRIORITY_NUM_LEVELS];
>>       uint32_t                        sch_ctx_offs;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c 
>> b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
>> index 1e5ad1e08d2a..4d1121d1a1e7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
>> @@ -290,8 +290,13 @@ static int mes_v10_1_set_hw_resources(struct 
>> amdgpu_mes *mes)
>>           mes_set_hw_res_pkt.compute_hqd_mask[i] =
>>               mes->compute_hqd_mask[i];
>>   -    for (i = 0; i < MAX_GFX_PIPES; i++)
>> -        mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
>> +    /*
>> +     * GFX pipe 0 queue 0 is being used by kernel
>> +     * Set GFX pipe 0 queue 1 for MES scheduling
>> +     * GFX pipe 1 can't be used for MES due to HW limitation.
>> +     */
>> +    mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
>> +    mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
>>         for (i = 0; i < MAX_SDMA_PIPES; i++)
>>           mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>> index 63f281a9984d..feb7fa2c304c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
>> @@ -387,8 +387,13 @@ static int mes_v11_0_set_hw_resources(struct 
>> amdgpu_mes *mes)
>>           mes_set_hw_res_pkt.compute_hqd_mask[i] =
>>               mes->compute_hqd_mask[i];
>>   -    for (i = 0; i < MAX_GFX_PIPES; i++)
>> -        mes_set_hw_res_pkt.gfx_hqd_mask[i] = mes->gfx_hqd_mask[i];
>> +    /*
>> +     * GFX pipe 0 queue 0 is being used by kernel
>> +     * Set GFX pipe 0 queue 1 for MES scheduling
>> +     * GFX pipe 1 can't be used for MES due to HW limitation.
>> +     */
>> +    mes_set_hw_res_pkt.gfx_hqd_mask[0] = 0x2;
>> +    mes_set_hw_res_pkt.gfx_hqd_mask[1] = 0;
>>         for (i = 0; i < MAX_SDMA_PIPES; i++)
>>           mes_set_hw_res_pkt.sdma_hqd_mask[i] = mes->sdma_hqd_mask[i];
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 00/14] AMDGPU usermode queues
  2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
                   ` (13 preceding siblings ...)
  2024-04-26 13:48 ` [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
@ 2024-05-02 15:51 ` Alex Deucher
  14 siblings, 0 replies; 51+ messages in thread
From: Alex Deucher @ 2024-05-02 15:51 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: amd-gfx, Arvind Yadav

On Fri, Apr 26, 2024 at 10:17 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> This patch series introduces AMDGPU usermode queues for gfx workloads.
> Usermode queues is a method of GPU workload submission into the graphics
> hardware without any interaction with kernel/DRM schedulers. In this
> method, a userspace graphics application can create its own workqueue and
> submit it directly in the GPU HW.
>
> The general idea of how this is supposed to work:
> - The application creates the following GPU objetcs:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
>   - Shadow bufffer pages.
>   - GDS buffer pages (as required).
> - The application picks a 32-bit offset in the doorbell page for this
>   queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the GPU addresses of these objects (read ptr,
>   write ptr, queue base address, shadow, gds) with doorbell object and
>   32-bit doorbell offset in the doorbell page.
> - The kernel creates the queue and maps it in the HW.
> - The application maps the GPU buffers in process address space.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - After filling the workload data in the queue, the app must write the
>   number of dwords added in the queue into the doorbell offset and the
>   WPTR buffer, and the GPU will start fetching the data.
> - This series adds usermode queue support for all three MES based IPs
>   (GFX, SDMA and Compute).

I think we also need a new INFO IOCTL query to get the doorbell
offsets for each engine type within each doorbell page.

Alex
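
To make the suggestion concrete: the query would essentially have to
report, per HW IP, which dword offsets inside a process' doorbell page are
usable for user queues. A purely hypothetical sketch of the kind of data
involved (no such UAPI exists at this point; every name below is invented
for illustration):

#include <stdint.h>

/* Hypothetical sketch only -- not a real UAPI. It just illustrates what a
 * doorbell-layout INFO query could hand back for each engine type. */
struct example_userq_doorbell_layout {
	uint32_t gfx_doorbell_offset;		/* dword offset within the doorbell page */
	uint32_t compute_doorbell_offset;
	uint32_t sdma_doorbell_offset;
	uint32_t doorbell_page_size;		/* in bytes */
};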

>
> libDRM changes for this series and a sample DRM test program can be found
> in the MESA merge request here:
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>
> Alex Deucher (1):
>   drm/amdgpu: UAPI for user queue management
>
> Arvind Yadav (1):
>   drm/amdgpu: enable compute/gfx usermode queue
>
> Shashank Sharma (12):
>   drm/amdgpu: add usermode queue base code
>   drm/amdgpu: add new IOCTL for usermode queue
>   drm/amdgpu: add helpers to create userqueue object
>   drm/amdgpu: create MES-V11 usermode queue for GFX
>   drm/amdgpu: create context space for usermode queue
>   drm/amdgpu: map usermode queue into MES
>   drm/amdgpu: map wptr BO into GART
>   drm/amdgpu: generate doorbell index for userqueue
>   drm/amdgpu: cleanup leftover queues
>   drm/amdgpu: fix MES GFX mask
>   drm/amdgpu: enable SDMA usermode queues
>   drm/amdgpu: add kernel config for gfx-userqueue
>
>  drivers/gpu/drm/amd/amdgpu/Kconfig            |   8 +
>  drivers/gpu/drm/amd/amdgpu/Makefile           |   7 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |   3 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 296 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  10 +
>  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        |   9 +-
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |   9 +-
>  .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c  | 317 ++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |   6 +
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  79 +++++
>  include/uapi/drm/amdgpu_drm.h                 | 111 ++++++
>  15 files changed, 859 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> --
> 2.43.2
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue
  2024-05-02 15:22   ` Christian König
@ 2024-05-02 16:19     ` Sharma, Shashank
  2024-05-02 18:18     ` Sharma, Shashank
  1 sibling, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 16:19 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Arvind Yadav, Alex Deucher, Christian Koenig


On 02/05/2024 17:22, Christian König wrote:
>
>
> On 26.04.24 at 15:48, Shashank Sharma wrote:
>> This patch:
>> - adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
>> - moves the userqueue initialization code for all IPs under
>>    this flag
>>
>> so that the userqueue works only when the config is enabled.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Kconfig     | 8 ++++++++
>>   drivers/gpu/drm/amd/amdgpu/Makefile    | 8 ++++++--
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++++
>>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
>>   4 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
>> b/drivers/gpu/drm/amd/amdgpu/Kconfig
>> index 22d88f8ef527..bba963527d22 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
>> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
>> @@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
>>         Add -Werror to the build flags for amdgpu.ko.
>>         Only enable this if you are warning code for amdgpu.ko.
>>   +config DRM_AMDGPU_USERQ_GFX
>> +    bool "Enable Navi 3x gfx usermode queues"
>> +    depends on DRM_AMDGPU
>> +    default n
>> +    help
>> +      Choose this option to enable usermode queue support for GFX
>> +          workload submission. This feature is supported on Navi 3X 
>> only.
>
> When this is for Navi 3x only I would name that 
> DRM_AMDGPU_NAVI3X_USERQ instead.
>
Noted,
> And since we enable/disable GFX, Compute and SDMA I would drop "gfx" 
> from the comment and description.
>
Noted. I just did not want users to confuse these with the KFD queues,
hence I added "GFX".

I will update the patch with both changes.

- Shashank

> Apart from that the approach looks good to me.
>
> Christian.
>
>> +
>>   source "drivers/gpu/drm/amd/acp/Kconfig"
>>   source "drivers/gpu/drm/amd/display/Kconfig"
>>   source "drivers/gpu/drm/amd/amdkfd/Kconfig"
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index a640bfa468ad..0b17fc1740a0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -184,8 +184,12 @@ amdgpu-y += \
>>   amdgpu-y += \
>>       amdgpu_mes.o \
>>       mes_v10_1.o \
>> -    mes_v11_0.o \
>> -    mes_v11_0_userqueue.o
>> +    mes_v11_0.o
>> +
>> +# add GFX userqueue support
>> +ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
>> +amdgpu-y += mes_v11_0_userqueue.o
>> +endif
>>     # add UVD block
>>   amdgpu-y += \
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index 27b86f7fe949..8591aed9f9ab 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
>>           adev->gfx.mec.num_mec = 2;
>>           adev->gfx.mec.num_pipe_per_mec = 4;
>>           adev->gfx.mec.num_queue_per_pipe = 4;
>> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>>           adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>           adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
>> &userq_mes_v11_0_funcs;
>> +#endif
>>           break;
>>       case IP_VERSION(11, 0, 1):
>>       case IP_VERSION(11, 0, 4):
>> @@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
>>           adev->gfx.mec.num_mec = 1;
>>           adev->gfx.mec.num_pipe_per_mec = 4;
>>           adev->gfx.mec.num_queue_per_pipe = 4;
>> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>>           adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>           adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
>> &userq_mes_v11_0_funcs;
>> +#endif
>>           break;
>>       default:
>>           adev->gfx.me.num_me = 1;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> index 90354a70c807..084059c95db6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> @@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
>>           return -EINVAL;
>>       }
>>   +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>>       adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
>> +#endif
>> +
>>       return r;
>>   }
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue
  2024-05-02 15:22   ` Christian König
  2024-05-02 16:19     ` Sharma, Shashank
@ 2024-05-02 18:18     ` Sharma, Shashank
  1 sibling, 0 replies; 51+ messages in thread
From: Sharma, Shashank @ 2024-05-02 18:18 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Arvind Yadav, Alex Deucher, Christian Koenig


On 02/05/2024 17:22, Christian König wrote:
>
>
> On 26.04.24 at 15:48, Shashank Sharma wrote:
>> This patch:
>> - adds a kernel config option "CONFIG_DRM_AMD_USERQ_GFX"
>> - moves the userqueue initialization code for all IPs under
>>    this flag
>>
>> so that the userqueue works only when the config is enabled.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Kconfig     | 8 ++++++++
>>   drivers/gpu/drm/amd/amdgpu/Makefile    | 8 ++++++--
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++++
>>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 3 +++
>>   4 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
>> b/drivers/gpu/drm/amd/amdgpu/Kconfig
>> index 22d88f8ef527..bba963527d22 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
>> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
>> @@ -80,6 +80,14 @@ config DRM_AMDGPU_WERROR
>>         Add -Werror to the build flags for amdgpu.ko.
>>         Only enable this if you are warning code for amdgpu.ko.
>>   +config DRM_AMDGPU_USERQ_GFX
>> +    bool "Enable Navi 3x gfx usermode queues"
>> +    depends on DRM_AMDGPU
>> +    default n
>> +    help
>> +      Choose this option to enable usermode queue support for GFX
>> +          workload submission. This feature is supported on Navi 3X 
>> only.
>
> When this is for Navi 3x only I would name that 
> DRM_AMDGPU_NAVI3X_USERQ instead.
>
> And since we enable/disable GFX, Compute and SDMA I would drop "gfx" 
> from the comment and description.
>
> Apart from that the approach looks good to me.
>
Agreed, both review comments have been addressed in V10.

- Shashank

> Christian.
>
>> +
>>   source "drivers/gpu/drm/amd/acp/Kconfig"
>>   source "drivers/gpu/drm/amd/display/Kconfig"
>>   source "drivers/gpu/drm/amd/amdkfd/Kconfig"
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index a640bfa468ad..0b17fc1740a0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -184,8 +184,12 @@ amdgpu-y += \
>>   amdgpu-y += \
>>       amdgpu_mes.o \
>>       mes_v10_1.o \
>> -    mes_v11_0.o \
>> -    mes_v11_0_userqueue.o
>> +    mes_v11_0.o
>> +
>> +# add GFX userqueue support
>> +ifneq ($(CONFIG_DRM_AMD_USERQ_GFX),)
>> +amdgpu-y += mes_v11_0_userqueue.o
>> +endif
>>     # add UVD block
>>   amdgpu-y += \
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index 27b86f7fe949..8591aed9f9ab 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -1349,8 +1349,10 @@ static int gfx_v11_0_sw_init(void *handle)
>>           adev->gfx.mec.num_mec = 2;
>>           adev->gfx.mec.num_pipe_per_mec = 4;
>>           adev->gfx.mec.num_queue_per_pipe = 4;
>> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>>           adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>           adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
>> &userq_mes_v11_0_funcs;
>> +#endif
>>           break;
>>       case IP_VERSION(11, 0, 1):
>>       case IP_VERSION(11, 0, 4):
>> @@ -1362,8 +1364,10 @@ static int gfx_v11_0_sw_init(void *handle)
>>           adev->gfx.mec.num_mec = 1;
>>           adev->gfx.mec.num_pipe_per_mec = 4;
>>           adev->gfx.mec.num_queue_per_pipe = 4;
>> +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>>           adev->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_mes_v11_0_funcs;
>>           adev->userq_funcs[AMDGPU_HW_IP_COMPUTE] = 
>> &userq_mes_v11_0_funcs;
>> +#endif
>>           break;
>>       default:
>>           adev->gfx.me.num_me = 1;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> index 90354a70c807..084059c95db6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
>> @@ -1267,7 +1267,10 @@ static int sdma_v6_0_sw_init(void *handle)
>>           return -EINVAL;
>>       }
>>   +#ifdef CONFIG_DRM_AMD_USERQ_GFX
>>       adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_v11_0_funcs;
>> +#endif
>> +
>>       return r;
>>   }
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2024-05-02 18:18 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-26 13:47 [PATCH v9 00/14] AMDGPU usermode queues Shashank Sharma
2024-04-26 13:47 ` [PATCH v9 01/14] drm/amdgpu: UAPI for user queue management Shashank Sharma
2024-05-01 20:39   ` Alex Deucher
2024-05-02  5:23     ` Sharma, Shashank
2024-05-02 12:53       ` Sharma, Shashank
2024-05-02 13:52         ` Alex Deucher
2024-04-26 13:47 ` [PATCH v9 02/14] drm/amdgpu: add usermode queue base code Shashank Sharma
2024-05-01 21:29   ` Alex Deucher
2024-04-26 13:47 ` [PATCH v9 03/14] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
2024-04-30 10:55   ` Christian König
2024-04-26 13:48 ` [PATCH v9 04/14] drm/amdgpu: add helpers to create userqueue object Shashank Sharma
2024-05-01 21:30   ` Alex Deucher
2024-04-26 13:48 ` [PATCH v9 05/14] drm/amdgpu: create MES-V11 usermode queue for GFX Shashank Sharma
2024-05-01 20:50   ` Alex Deucher
2024-05-02  5:26     ` Sharma, Shashank
2024-05-02 15:14   ` Christian König
2024-05-02 15:35     ` Sharma, Shashank
2024-04-26 13:48 ` [PATCH v9 06/14] drm/amdgpu: create context space for usermode queue Shashank Sharma
2024-05-01 21:11   ` Alex Deucher
2024-05-02  5:27     ` Sharma, Shashank
2024-05-02 15:15   ` Christian König
2024-04-26 13:48 ` [PATCH v9 07/14] drm/amdgpu: map usermode queue into MES Shashank Sharma
2024-04-26 13:48 ` [PATCH v9 08/14] drm/amdgpu: map wptr BO into GART Shashank Sharma
2024-05-01 21:36   ` Alex Deucher
2024-05-02  5:31     ` Sharma, Shashank
2024-05-02 13:06       ` Kasiviswanathan, Harish
2024-05-02 13:23         ` Sharma, Shashank
2024-05-02 13:46       ` Alex Deucher
2024-05-02 15:18   ` Christian König
2024-05-02 15:36     ` Sharma, Shashank
2024-04-26 13:48 ` [PATCH v9 09/14] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
2024-04-26 13:48 ` [PATCH v9 10/14] drm/amdgpu: cleanup leftover queues Shashank Sharma
2024-04-26 13:48 ` [PATCH v9 11/14] drm/amdgpu: fix MES GFX mask Shashank Sharma
2024-05-01 21:27   ` Alex Deucher
2024-05-02 15:19   ` Christian König
2024-05-02 15:42     ` Sharma, Shashank
2024-04-26 13:48 ` [PATCH v9 12/14] drm/amdgpu: enable SDMA usermode queues Shashank Sharma
2024-05-01 20:41   ` Alex Deucher
2024-05-02  5:47     ` Sharma, Shashank
2024-05-02 13:55       ` Alex Deucher
2024-05-02 14:01         ` Sharma, Shashank
2024-04-26 13:48 ` [PATCH v9 13/14] drm/amdgpu: enable compute/gfx usermode queue Shashank Sharma
2024-05-01 20:44   ` Alex Deucher
2024-05-02  5:50     ` Sharma, Shashank
2024-05-02 14:10       ` Alex Deucher
2024-05-02 14:17         ` Sharma, Shashank
2024-04-26 13:48 ` [PATCH v9 14/14] drm/amdgpu: add kernel config for gfx-userqueue Shashank Sharma
2024-05-02 15:22   ` Christian König
2024-05-02 16:19     ` Sharma, Shashank
2024-05-02 18:18     ` Sharma, Shashank
2024-05-02 15:51 ` [PATCH v9 00/14] AMDGPU usermode queues Alex Deucher
