* [PATCH v6 0/9] AMDGPU usermode queues
@ 2023-09-08 16:04 Shashank Sharma
  2023-09-08 16:04 ` [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: arvind.yadav, Shashank Sharma

This patch series introduces AMDGPU usermode queues for gfx workloads.
Usermode queues are a method of submitting GPU workloads to the graphics
hardware without any interaction with the kernel/DRM schedulers. With this
method, a userspace graphics application can create its own workqueue
and submit work directly to the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
  - Shadow buffer pages.
  - GDS buffer pages (if required).
- The application picks a 32-bit offset in the doorbell page for this
  queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, passing the GPU addresses of these objects (read ptr,
  write ptr, queue base address, shadow, GDS) along with the doorbell
  object and the 32-bit doorbell offset in the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data into the queue, the app must write the
  number of dwords added to the queue into the doorbell offset, and the
  GPU will start fetching the data (see the sketch below).
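
For illustration, here is a rough userspace sketch of this flow. It assumes
the UAPI added in patch 1 of this series; "drm_fd", the *_va parameters and
the doorbell handle/offset are placeholders for objects the application has
already created and mapped (real applications would use the libDRM wrappers
from the MR below):

  #include <errno.h>
  #include <stdint.h>
  #include <xf86drm.h>
  #include "amdgpu_drm.h"   /* UAPI header from this series / the libDRM MR below */

  static int example_create_gfx_userq(int drm_fd, uint32_t doorbell_handle,
                                      uint32_t doorbell_offset,
                                      uint64_t queue_va, uint64_t queue_size,
                                      uint64_t rptr_va, uint64_t wptr_va,
                                      uint64_t shadow_va, uint64_t gds_va,
                                      uint64_t csa_va, uint32_t *queue_id)
  {
          struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd = {
                  .queue_va   = queue_va,        /* 256-byte aligned */
                  .queue_size = queue_size,
                  .rptr_va    = rptr_va,
                  .wptr_va    = wptr_va,
                  .shadow_va  = shadow_va,
                  .gds_va     = gds_va,
                  .csa_va     = csa_va,
          };
          union drm_amdgpu_userq args = {
                  .in = {
                          .op              = AMDGPU_USERQ_OP_CREATE,
                          .ip_type         = AMDGPU_HW_IP_GFX,
                          .doorbell_handle = doorbell_handle,
                          .doorbell_offset = doorbell_offset,
                          .mqd             = (uintptr_t)&mqd,
                          .mqd_size        = sizeof(mqd),
                  },
          };

          if (drmIoctl(drm_fd, DRM_IOCTL_AMDGPU_USERQ, &args))
                  return -errno;

          *queue_id = args.out.queue_id;
          /*
           * The queue is usable right away: fill packets into the queue BO,
           * then write the number of dwords added to the queue into the
           * chosen doorbell offset so the GPU starts fetching.
           */
          return 0;
  }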

libDRM changes for this series and a sample DRM test program can be found
in the MESA merge request here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

This patch series depends on the doorbell-manager changes, which were published
here (targeted to be merged in v6.6):
https://patchwork.freedesktop.org/series/115802

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Shashank Sharma (8):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: create GFX-gen11 usermode queue
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: map wptr BO into GART
  drm/amdgpu: generate doorbell index for userqueue
  drm/amdgpu: cleanup leftover queues

 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 227 ++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 292 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  74 +++++
 include/uapi/drm/amdgpu_drm.h                 | 110 +++++++
 8 files changed, 715 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.42.0



* [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-10-04 21:23   ` Felix Kuehling
  2023-09-08 16:04 ` [PATCH v6 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

From: Alex Deucher <alexander.deucher@amd.com>

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI maps the queue into the GPU, so the graphics app can start
submitting work to the queue as soon as the call returns.
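
As a rough usage sketch (assuming this UAPI; "drm_fd" and "queue_id" below
are placeholders), the same ioctl is also used to tear a queue down again:

  #include <errno.h>
  #include <stdint.h>
  #include <xf86drm.h>
  #include "amdgpu_drm.h"   /* UAPI header introduced below */

  static int example_free_userq(int drm_fd, uint32_t queue_id)
  {
          union drm_amdgpu_userq args = {
                  .in = {
                          .op       = AMDGPU_USERQ_OP_FREE,
                          .queue_id = queue_id,
                  },
          };

          return drmIoctl(drm_fd, DRM_IOCTL_AMDGPU_USERQ, &args) ? -errno : 0;
  }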

V2: Addressed review comments from Alex and Christian
    - Make the doorbell offset's comment clearer
    - Change the output parameter name to queue_id

V3: Integration with doorbell manager

V4:
    - Updated the UAPI doc (Pierre-Eric)
    - Created a Union for engine specific MQDs (Alex)
    - Added Christian's R-B
V5:
    - Add variables for GDS and CSA in MQD structure (Alex)
    - Make MQD data a ptr-size pair instead of union (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 110 ++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 79b14828d542..627b4a38c855 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_USERQ		0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -304,6 +306,114 @@ union drm_amdgpu_ctx {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE	1
+#define AMDGPU_USERQ_OP_FREE	2
+
+/* Flag to indicate secure buffer related workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
+/* Flag to indicate AQL workload, unused for now */
+#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
+
+/*
+ * MQD (memory queue descriptor) is a set of parameters which allow
+ * the GPU to uniquely define and identify a usermode queue. This
+ * structure defines the MQD for GFX-V11 IP ver 0.
+ */
+struct drm_amdgpu_userq_mqd_gfx_v11_0 {
+	/**
+	 * @queue_va: Virtual address of the GPU memory which holds the queue
+	 * object. The queue holds the workload packets.
+	 */
+	__u64   queue_va;
+	/**
+	 * @queue_size: Size of the queue in bytes; this needs to be 256-byte
+	 * aligned.
+	 */
+	__u64   queue_size;
+	/**
+	 * @rptr_va: Virtual address of the GPU memory which holds the ring RPTR.
+	 * This object must be at least 8 bytes in size and aligned to an 8-byte offset.
+	 */
+	__u64   rptr_va;
+	/**
+	 * @wptr_va: Virtual address of the GPU memory which holds the ring WPTR.
+	 * This object must be at least 8 bytes in size and aligned to an 8-byte offset.
+	 *
+	 * Queue, RPTR and WPTR can come from the same object, as long as the size
+	 * and alignment related requirements are met.
+	 */
+	__u64   wptr_va;
+	/**
+	 * @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
+	 * This must come from a separate GPU object, and must be at least 4 pages
+	 * in size.
+	 */
+	__u64   shadow_va;
+	/**
+	 * @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
+	 * This must come from a separate GPU object, and must be at least 1 page
+	 * in size.
+	 */
+	__u64   gds_va;
+	/**
+	 * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
+	 * This must come from a separate GPU object, and must be at least 1 page
+	 * in size.
+	 */
+	__u64   csa_va;
+};
+
+struct drm_amdgpu_userq_in {
+	/** AMDGPU_USERQ_OP_* */
+	__u32	op;
+	/** Queue handle for USERQ_OP_FREE */
+	__u32	queue_id;
+	/** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
+	__u32   ip_type;
+	/**
+	 * @flags: flags to indicate special function for queue like secure
+	 * buffer (TMZ). Unused for now.
+	 */
+	__u32   flags;
+	/**
+	 * @doorbell_handle: the handle of doorbell GEM object
+	 * associated to this client.
+	 */
+	__u32   doorbell_handle;
+	/**
+	 * @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
+	 * The kernel will generate the absolute doorbell offset using
+	 * doorbell_handle and doorbell_offset in the doorbell bo.
+	 */
+	__u32   doorbell_offset;
+	/**
+	 * @mqd: Queue descriptor for USERQ_OP_CREATE.
+	 * MQD data can be of different sizes for different GPU IPs/engines and
+	 * their respective versions/revisions, so this is a pointer (passed as
+	 * a __u64) to the MQD of this usermode queue.
+	 */
+	__u64 mqd;
+	/**
+	 * @mqd_size: Size of the MQD data in bytes; it must match the MQD
+	 * structure size of the respective engine/revision defined in the UAPI,
+	 * e.g. for gfx_v11 workloads, mqd_size = sizeof(drm_amdgpu_userq_mqd_gfx_v11_0).
+	 */
+	__u64 mqd_size;
+};
+
+struct drm_amdgpu_userq_out {
+	/** Queue handle */
+	__u32	queue_id;
+	/** Flags */
+	__u32	flags;
+};
+
+union drm_amdgpu_userq {
+	struct drm_amdgpu_userq_in in;
+	struct drm_amdgpu_userq_out out;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.42.0



* [PATCH v6 2/9] drm/amdgpu: add usermode queue base code
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
  2023-09-08 16:04 ` [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-08 16:04 ` [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

This patch adds skeleton code for the amdgpu usermode queue.
It contains:
- A new file with the init functions for usermode queues.
- A queue context manager in the driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

V4:
- Align the structure member names to the largest member's column
  (Luben)
- Added SPDX license (Luben)

V5:
- Do not add amdgpu.h in amdgpu_userqueue.h (Christian).
- Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 ++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 62 +++++++++++++++++++
 6 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 8d16f280b695..e79fa8e84b99 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -239,6 +239,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6dc950c1b689..385dc1e01121 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -108,6 +108,7 @@
 #include "amdgpu_mca.h"
 #include "amdgpu_ras.h"
 #include "amdgpu_xcp.h"
+#include "amdgpu_userqueue.h"
 
 #define MAX_GPU_INSTANCE		64
 
@@ -471,6 +472,7 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	struct amdgpu_userq_mgr	userq_mgr;
 	/** GPU partition selection */
 	uint32_t		xcp_id;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0593ef8fe0a6..2a89e303c3db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -50,6 +50,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_userqueue.h"
 #include "../amdxcp/amdgpu_xcp_drv.h"
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 12414a713256..12ee6129d27a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -44,6 +44,7 @@
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
 #include "amd_pcie.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1261,6 +1262,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+	if (r)
+		DRM_WARN("Can't setup usermode queues, use legacy workload submission only\n");
+
 	file_priv->driver_priv = fpriv;
 	goto out_suspend;
 
@@ -1328,6 +1333,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
+	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
 	if (pasid)
 		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index 000000000000..effc0c7c02cf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
+{
+	mutex_init(&userq_mgr->userq_mutex);
+	idr_init_base(&userq_mgr->userq_idr, 1);
+	userq_mgr->adev = adev;
+
+	return 0;
+}
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
+{
+	idr_destroy(&userq_mgr->userq_idr);
+	mutex_destroy(&userq_mgr->userq_mutex);
+}
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
new file mode 100644
index 000000000000..79ffa131a514
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDGPU_USERQUEUE_H_
+#define AMDGPU_USERQUEUE_H_
+
+#define AMDGPU_MAX_USERQ_COUNT 512
+
+struct amdgpu_mqd_prop;
+
+struct amdgpu_usermode_queue {
+	int			queue_type;
+	uint64_t		doorbell_handle;
+	uint64_t		doorbell_index;
+	uint64_t		flags;
+	struct amdgpu_mqd_prop	*userq_prop;
+	struct amdgpu_userq_mgr *userq_mgr;
+	struct amdgpu_vm	*vm;
+};
+
+struct amdgpu_userq_funcs {
+	int (*mqd_create)(struct amdgpu_userq_mgr *uq_mgr,
+			  struct drm_amdgpu_userq_in *args,
+			  struct amdgpu_usermode_queue *queue);
+	void (*mqd_destroy)(struct amdgpu_userq_mgr *uq_mgr,
+			    struct amdgpu_usermode_queue *uq);
+};
+
+/* Usermode queues for gfx */
+struct amdgpu_userq_mgr {
+	struct idr			userq_idr;
+	struct mutex			userq_mutex;
+	struct amdgpu_device		*adev;
+	const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
+};
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
+
+#endif
-- 
2.42.0



* [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
  2023-09-08 16:04 ` [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
  2023-09-08 16:04 ` [PATCH v6 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-20 14:47   ` Zhang, Yifan
  2023-09-08 16:04 ` [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 " Shashank Sharma
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
- Remove wrapper functions for queue indexing (Christian)
- Do not save the queue id/idr in queue itself (Christian)
- Move the idr allocation in the IP independent generic space
  (Christian)

V6:
- Check the validity of input IP type (Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 116 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 3 files changed, 119 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2a89e303c3db..214a66b33dfa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2830,6 +2830,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..44769423ba30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,122 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+	return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+	const struct amdgpu_userq_funcs *uq_funcs;
+	struct amdgpu_usermode_queue *queue;
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+	if (!queue) {
+		DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+		mutex_unlock(&uq_mgr->userq_mutex);
+		return -EINVAL;
+	}
+	uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
+	uq_funcs->mqd_destroy(uq_mgr, queue);
+	idr_remove(&uq_mgr->userq_idr, queue_id);
+	kfree(queue);
+
+	mutex_unlock(&uq_mgr->userq_mutex);
+	return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+	const struct amdgpu_userq_funcs *uq_funcs;
+	struct amdgpu_usermode_queue *queue;
+	int qid, r = 0;
+
+	/* Usermode queues are only supported for GFX/SDMA engines as of now */
+	if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+		DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+		return -EINVAL;
+	}
+
+	mutex_lock(&uq_mgr->userq_mutex);
+
+	uq_funcs = uq_mgr->userq_funcs[args->in.ip_type];
+	if (!uq_funcs) {
+		DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
+		r = -EINVAL;
+		goto unlock;
+	}
+
+	queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+	if (!queue) {
+		DRM_ERROR("Failed to allocate memory for queue\n");
+		r = -ENOMEM;
+		goto unlock;
+	}
+	queue->doorbell_handle = args->in.doorbell_handle;
+	queue->doorbell_index = args->in.doorbell_offset;
+	queue->queue_type = args->in.ip_type;
+	queue->flags = args->in.flags;
+	queue->vm = &fpriv->vm;
+
+	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
+	if (r) {
+		DRM_ERROR("Failed to create Queue\n");
+		goto unlock;
+	}
+
+	qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
+	if (qid < 0) {
+		DRM_ERROR("Failed to allocate a queue id\n");
+		uq_funcs->mqd_destroy(uq_mgr, queue);
+		r = -ENOMEM;
+		goto unlock;
+	}
+	args->out.queue_id = qid;
+
+unlock:
+	mutex_unlock(&uq_mgr->userq_mutex);
+	return r;
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *filp)
+{
+	union drm_amdgpu_userq *args = data;
+	int r = 0;
+
+	switch (args->in.op) {
+	case AMDGPU_USERQ_OP_CREATE:
+		r = amdgpu_userqueue_create(filp, args);
+		if (r)
+			DRM_ERROR("Failed to create usermode queue\n");
+		break;
+
+	case AMDGPU_USERQ_OP_FREE:
+		r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
+		if (r)
+			DRM_ERROR("Failed to destroy usermode queue\n");
+		break;
+
+	default:
+		DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
+		return -EINVAL;
+	}
+
+	return r;
+}
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 79ffa131a514..55ed6512a565 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -55,6 +55,8 @@ struct amdgpu_userq_mgr {
 	const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
 };
 
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
-- 
2.42.0



* [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
                   ` (2 preceding siblings ...)
  2023-09-08 16:04 ` [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-14  7:45   ` Christian König
  2023-09-08 16:04 ` [PATCH v6 5/9] drm/amdgpu: create context space for " Shashank Sharma
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

A memory queue descriptor (MQD) of a userqueue defines the queue in
the HW's context. As the MQD format can vary between different
graphics IPs, we need GFX GEN-specific handlers to create MQDs.

This patch:
- Introduces MQD handler functions for the usermode queues.
- Adds new functions to create and destroy userqueue MQD for
  GFX-GEN-11 IP

V1: Worked on review comments from Alex:
    - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
    - Reuse the existing adev->mqd[ip] for MQD creation
    - Formatting and arrangement of code

V3:
    - Integration with doorbell manager

V4: Review comments addressed:
    - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
    - Align name of structure members (Luben)
    - Don't break up the Cc tag list and the Sob tag list in commit
      message (Luben)
V5:
   - No need to reserve the bo for MQD (Christian).
   - Some more changes to support IP specific MQD creation.

V6:
   - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
     calls while creating MQD object to amdgpu_bo_create() once eviction
     fences are ready (Christian).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
 3 files changed, 100 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 44769423ba30..03fc8e89eafb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
 	return r;
 }
 
+extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
+
+static void
+amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
+{
+	int maj;
+	struct amdgpu_device *adev = uq_mgr->adev;
+	uint32_t version = adev->ip_versions[GC_HWIP][0];
+
+	/* We support usermode queue only for GFX V11 as of now */
+	maj = IP_VERSION_MAJ(version);
+	if (maj == 11)
+		uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
+}
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
 	mutex_init(&userq_mgr->userq_mutex);
 	idr_init_base(&userq_mgr->userq_idr, 1);
 	userq_mgr->adev = adev;
 
+	amdgpu_userqueue_setup_gfx(userq_mgr);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 0451533ddde4..6760abda08df 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -30,6 +30,7 @@
 #include "amdgpu_psp.h"
 #include "amdgpu_smu.h"
 #include "amdgpu_atomfirmware.h"
+#include "amdgpu_userqueue.h"
 #include "imu_v11_0.h"
 #include "soc21.h"
 #include "nvd.h"
@@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
 	.rev = 0,
 	.funcs = &gfx_v11_0_ip_funcs,
 };
+
+static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
+				      struct drm_amdgpu_userq_in *args_in,
+				      struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_mqd *mqd_gfx_generic = &adev->mqds[AMDGPU_HW_IP_GFX];
+	struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
+	struct amdgpu_mqd_prop userq_props;
+	int r;
+
+	/* Incoming MQD parameters from userspace to be saved here */
+	memset(&mqd_user, 0, sizeof(mqd_user));
+
+	/* Structure to initialize MQD for userqueue using generic MQD init function */
+	memset(&userq_props, 0, sizeof(userq_props));
+
+	if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd_gfx_v11_0)) {
+		DRM_ERROR("MQD size mismatch\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd), args_in->mqd_size)) {
+		DRM_ERROR("Failed to get user MQD\n");
+		return -EFAULT;
+	}
+
+	/*
+	 * Create BO for actual Userqueue MQD now
+	 * Todo: replace the calls to bo_create_kernel() with bo_create() and use
+	 * implicit pinning for the MQD buffers.
+	 */
+	r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size, PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_GTT,
+				    &queue->mqd.obj,
+				    &queue->mqd.gpu_addr,
+				    &queue->mqd.cpu_ptr);
+	if (r) {
+		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+		return -ENOMEM;
+	}
+	memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
+
+	/* Initialize the MQD BO with user given values */
+	userq_props.wptr_gpu_addr = mqd_user.wptr_va;
+	userq_props.rptr_gpu_addr = mqd_user.rptr_va;
+	userq_props.queue_size = mqd_user.queue_size;
+	userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
+	userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
+	userq_props.use_doorbell = true;
+
+	r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr, &userq_props);
+	if (r) {
+		DRM_ERROR("Failed to initialize MQD for userqueue\n");
+		goto free_mqd;
+	}
+
+	return 0;
+
+free_mqd:
+	amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, &queue->mqd.cpu_ptr);
+	return r;
+}
+
+static void
+gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_userq_obj *mqd = &queue->mqd;
+
+	amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
+}
+
+const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
+	.mqd_create = gfx_v11_0_userq_mqd_create,
+	.mqd_destroy = gfx_v11_0_userq_mqd_destroy,
+};
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 55ed6512a565..240f92796f00 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -29,6 +29,12 @@
 
 struct amdgpu_mqd_prop;
 
+struct amdgpu_userq_obj {
+	void		 *cpu_ptr;
+	uint64_t	 gpu_addr;
+	struct amdgpu_bo *obj;
+};
+
 struct amdgpu_usermode_queue {
 	int			queue_type;
 	uint64_t		doorbell_handle;
@@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_mqd_prop	*userq_prop;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
+	struct amdgpu_userq_obj mqd;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.42.0



* [PATCH v6 5/9] drm/amdgpu: create context space for usermode queue
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
                   ` (3 preceding siblings ...)
  2023-09-08 16:04 ` [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 " Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-20 15:21   ` Alex Deucher
  2023-09-08 16:04 ` [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

The FW expects us to allocate at least one page as context space to
handle gang-, process-, GDS- and FW-related work. This patch creates
a joint object for this and calculates the GPU space offsets of these
areas.
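
For reference, a sketch of the resulting layout of the joint context BO
(one page per area, matching the AMDGPU_USERQ_*_CTX_SZ defines added below;
purely illustrative):

  ctx->gpu_addr + 0                          -> process ctx (AMDGPU_USERQ_PROC_CTX_SZ)
  ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ   -> gang ctx    (AMDGPU_USERQ_GANG_CTX_SZ)
  ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ
                + AMDGPU_USERQ_GANG_CTX_SZ   -> FW ctx      (AMDGPU_USERQ_FW_CTX_SZ)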

V1: Addressed review comments on RFC patch:
    Alex: Make this function IP specific

V2: Addressed review comments from Christian
    - Allocate only one object for total FW space, and calculate
      offsets for each of these objects.

V3: Integration with doorbell manager

V4: Review comments:
    - Remove shadow from FW space list from cover letter (Alex)
    - Alignment of macro (Luben)

V5: Merged patches 5 and 6 into this single patch
    Addressed review comments:
    - Use lower_32_bits instead of mask (Christian)
    - gfx_v11_0 instead of gfx_v11 in function names (Alex)
    - Shadow and GDS objects are now coming from userspace (Christian,
      Alex)

V6:
    - Add a comment to replace amdgpu_bo_create_kernel() with
      amdgpu_bo_create() during fw_ctx object creation (Christian).
    - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
      of generic queue structure and make it gen11 specific (Alex).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 61 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 6760abda08df..8ffb5dee72a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -61,6 +61,9 @@
 #define regCGTT_WD_CLK_CTRL_BASE_IDX	1
 #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1	0x4e7e
 #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1_BASE_IDX	1
+#define AMDGPU_USERQ_PROC_CTX_SZ   PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ   PAGE_SIZE
+#define AMDGPU_USERQ_FW_CTX_SZ     PAGE_SIZE
 
 MODULE_FIRMWARE("amdgpu/gc_11_0_0_pfp.bin");
 MODULE_FIRMWARE("amdgpu/gc_11_0_0_me.bin");
@@ -6424,6 +6427,56 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
 	.funcs = &gfx_v11_0_ip_funcs,
 };
 
+static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+					      struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+
+	amdgpu_bo_free_kernel(&ctx->obj, &ctx->gpu_addr, &ctx->cpu_ptr);
+}
+
+static int gfx_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+					    struct amdgpu_usermode_queue *queue,
+					    struct drm_amdgpu_userq_mqd_gfx_v11_0 *mqd_user)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+	uint64_t fw_ctx_gpu_addr;
+	int r, size;
+
+	/*
+	 * The FW expects at least one page space allocated for
+	 * process ctx, gang ctx and fw ctx each. Create an object
+	 * for the same.
+	 */
+	size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_FW_CTX_SZ +
+	       AMDGPU_USERQ_GANG_CTX_SZ;
+	r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_GTT,
+				    &ctx->obj,
+				    &ctx->gpu_addr,
+				    &ctx->cpu_ptr);
+	if (r) {
+		DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
+		return r;
+	}
+
+	fw_ctx_gpu_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ +
+			  AMDGPU_USERQ_GANG_CTX_SZ;
+	mqd->fw_work_area_base_lo = lower_32_bits(fw_ctx_gpu_addr);
+	mqd->fw_work_area_base_hi = upper_32_bits(fw_ctx_gpu_addr);
+
+	/* Shadow and GDS objects come directly from userspace */
+	mqd->shadow_base_lo = lower_32_bits(mqd_user->shadow_va);
+	mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
+
+	mqd->gds_bkup_base_lo = lower_32_bits(mqd_user->gds_va);
+	mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
+
+	return 0;
+}
+
 static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 				      struct drm_amdgpu_userq_in *args_in,
 				      struct amdgpu_usermode_queue *queue)
@@ -6480,6 +6533,13 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* Create BO for FW operations */
+	r = gfx_v11_0_userq_create_ctx_space(uq_mgr, queue, &mqd_user);
+	if (r) {
+		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
+		goto free_mqd;
+	}
+
 	return 0;
 
 free_mqd:
@@ -6492,6 +6552,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
 {
 	struct amdgpu_userq_obj *mqd = &queue->mqd;
 
+	gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
 	amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
 }
 
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 240f92796f00..34e20daa06c8 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
 	struct amdgpu_userq_obj mqd;
+	struct amdgpu_userq_obj fw_obj;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.42.0



* [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
                   ` (4 preceding siblings ...)
  2023-09-08 16:04 ` [PATCH v6 5/9] drm/amdgpu: create context space for " Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-20 15:23   ` Alex Deucher
  2023-09-20 15:28   ` Alex Deucher
  2023-09-08 16:04 ` [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
    - Map/Unmap should be IP specific.
V2:
    Addressed review comments from Christian:
    - Fix the wptr_mc_addr calculation (moved into another patch)
    Addressed review comments from Alex:
    - Do not add fptrs for map/unmap

V3: Integration with doorbell manager
V4: Rebase
V5: Use gfx_v11_0 for function names (Alex)
V6: Removed queue->proc/gang/fw_ctx_address variables and doing the
    address calculations locally to keep the queue structure GEN
    independent (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 72 ++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 8ffb5dee72a9..e266674e0d44 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -6427,6 +6427,67 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
 	.funcs = &gfx_v11_0_ip_funcs,
 };
 
+static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
+				  struct amdgpu_usermode_queue *queue)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct mes_remove_queue_input queue_input;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	int r;
+
+	memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+	queue_input.doorbell_offset = queue->doorbell_index;
+	queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+
+	amdgpu_mes_lock(&adev->mes);
+	r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+	amdgpu_mes_unlock(&adev->mes);
+	if (r)
+		DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
+static int gfx_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
+			       struct amdgpu_usermode_queue *queue,
+			       struct amdgpu_mqd_prop *userq_props)
+{
+	struct amdgpu_device *adev = uq_mgr->adev;
+	struct amdgpu_userq_obj *ctx = &queue->fw_obj;
+	struct mes_add_queue_input queue_input;
+	int r;
+
+	memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+	queue_input.process_va_start = 0;
+	queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+	queue_input.process_quantum = 100000; /* 10ms */
+	queue_input.gang_quantum = 10000; /* 1ms */
+	queue_input.paging = false;
+
+	queue_input.process_context_addr = ctx->gpu_addr;
+	queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+	queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+	queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+	queue_input.process_id = queue->vm->pasid;
+	queue_input.queue_type = queue->queue_type;
+	queue_input.mqd_addr = queue->mqd.gpu_addr;
+	queue_input.wptr_addr = userq_props->wptr_gpu_addr;
+	queue_input.queue_size = userq_props->queue_size >> 2;
+	queue_input.doorbell_offset = userq_props->doorbell_index;
+	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+	amdgpu_mes_lock(&adev->mes);
+	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+	amdgpu_mes_unlock(&adev->mes);
+	if (r) {
+		DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+		return r;
+	}
+
+	DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
+	return 0;
+}
+
 static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
 					      struct amdgpu_usermode_queue *queue)
 {
@@ -6540,8 +6601,18 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* Map userqueue into FW using MES */
+	r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
+	if (r) {
+		DRM_ERROR("Failed to map userqueue into FW\n");
+		goto free_ctx;
+	}
+
 	return 0;
 
+free_ctx:
+	gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
+
 free_mqd:
 	amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, &queue->mqd.cpu_ptr);
 	return r;
@@ -6552,6 +6623,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
 {
 	struct amdgpu_userq_obj *mqd = &queue->mqd;
 
+	gfx_v11_0_userq_unmap(uq_mgr, queue);
 	gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
 	amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
 }
-- 
2.42.0



* [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
                   ` (5 preceding siblings ...)
  2023-09-08 16:04 ` [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-18 10:32   ` Christian König
  2023-09-08 16:04 ` [PATCH v6 8/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
  2023-09-08 16:04 ` [PATCH v6 9/9] drm/amdgpu: cleanup leftover queues Shashank Sharma
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

To support oversubscription, the MES FW expects WPTR BOs to be
mapped into the GART before they are submitted to usermode
queues. This patch adds a function for that.

V4: fix the wptr value before mapping lookup (Bas, Christian).
V5: Addressed review comments from Christian:
    - Either pin object or allocate from GART, but not both.
    - All the handling must be done with the VM locks held.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 81 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 2 files changed, 82 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index e266674e0d44..c0eb622dfc37 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -6427,6 +6427,79 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
 	.funcs = &gfx_v11_0_ip_funcs,
 };
 
+static int
+gfx_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+{
+	int ret;
+
+	ret = amdgpu_bo_reserve(bo, true);
+	if (ret) {
+		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+		goto err_reserve_bo_failed;
+	}
+
+	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+	if (ret) {
+		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+		goto err_map_bo_gart_failed;
+	}
+
+	amdgpu_bo_unreserve(bo);
+	bo = amdgpu_bo_ref(bo);
+
+	return 0;
+
+err_map_bo_gart_failed:
+	amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+	return ret;
+}
+
+static int
+gfx_v11_0_create_wptr_mapping(struct amdgpu_device *adev,
+			      struct amdgpu_usermode_queue *queue,
+			      uint64_t wptr)
+{
+	struct amdgpu_bo_va_mapping *wptr_mapping;
+	struct amdgpu_vm *wptr_vm;
+	struct amdgpu_bo *wptr_bo = NULL;
+	int ret;
+
+	mutex_lock(&queue->vm->eviction_lock);
+	wptr_vm = queue->vm;
+	ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+	if (ret)
+		goto unlock;
+
+	wptr &= AMDGPU_GMC_HOLE_MASK;
+	wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+	amdgpu_bo_unreserve(wptr_vm->root.bo);
+	if (!wptr_mapping) {
+		DRM_ERROR("Failed to lookup wptr bo\n");
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	wptr_bo = wptr_mapping->bo_va->base.bo;
+	if (wptr_bo->tbo.base.size > PAGE_SIZE) {
+		DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = gfx_v11_0_map_gtt_bo_to_gart(adev, wptr_bo);
+	if (ret) {
+		DRM_ERROR("Failed to map wptr bo to GART\n");
+		goto unlock;
+	}
+
+	queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
+
+unlock:
+	mutex_unlock(&queue->vm->eviction_lock);
+	return ret;
+}
+
 static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
 				  struct amdgpu_usermode_queue *queue)
 {
@@ -6475,6 +6548,7 @@ static int gfx_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
 	queue_input.queue_size = userq_props->queue_size >> 2;
 	queue_input.doorbell_offset = userq_props->doorbell_index;
 	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+	queue_input.wptr_mc_addr = queue->wptr_mc_addr;
 
 	amdgpu_mes_lock(&adev->mes);
 	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
@@ -6601,6 +6675,13 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 		goto free_mqd;
 	}
 
+	/* FW expects WPTR BOs to be mapped into GART */
+	r = gfx_v11_0_create_wptr_mapping(adev, queue, userq_props.wptr_gpu_addr);
+	if (r) {
+		DRM_ERROR("Failed to create WPTR mapping\n");
+		goto free_ctx;
+	}
+
 	/* Map userqueue into FW using MES */
 	r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 34e20daa06c8..ae155de62560 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -39,6 +39,7 @@ struct amdgpu_usermode_queue {
 	int			queue_type;
 	uint64_t		doorbell_handle;
 	uint64_t		doorbell_index;
+	uint64_t		wptr_mc_addr;
 	uint64_t		flags;
 	struct amdgpu_mqd_prop	*userq_prop;
 	struct amdgpu_userq_mgr *userq_mgr;
-- 
2.42.0



* [PATCH v6 8/9] drm/amdgpu: generate doorbell index for userqueue
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
                   ` (6 preceding siblings ...)
  2023-09-08 16:04 ` [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-20 15:25   ` Alex Deucher
  2023-09-08 16:04 ` [PATCH v6 9/9] drm/amdgpu: cleanup leftover queues Shashank Sharma
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav, Shashank Sharma

The userspace sends us the doorbell object and the relative doorbell
index within that object to be used for the usermode queue, but the FW
expects the absolute doorbell index on the PCI BAR in the MQD. This
patch adds a function to convert this relative doorbell index to the
absolute doorbell index.

This patch is dependent on the doorbell manager series which is
expected to be merged soon:
Link: https://patchwork.freedesktop.org/series/115802/

V5: Fix the db object reference leak (Christian)
V6: Pin the doorbell bo in userqueue_create() function, and unpin it
    in userqueue destroy (Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 +++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 03fc8e89eafb..a311d4949bb8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -32,6 +32,35 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
 	return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+				     struct amdgpu_usermode_queue *queue,
+				     struct drm_file *filp,
+				     uint32_t doorbell_offset)
+{
+	uint64_t index;
+	struct drm_gem_object *gobj;
+
+	gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+	if (gobj == NULL) {
+		DRM_ERROR("Can't find GEM object for doorbell\n");
+		return -EINVAL;
+	}
+
+	queue->db_bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+	drm_gem_object_put(gobj);
+
+	/* Pin the BO before generating the index, unpin in queue destroy */
+	if (amdgpu_bo_pin(queue->db_bo, AMDGPU_GEM_DOMAIN_DOORBELL)) {
+		DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
+		return -EINVAL;
+	}
+
+	index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, queue->db_bo, doorbell_offset);
+	DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+	return index;
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
@@ -50,6 +79,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 	}
 	uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
 	uq_funcs->mqd_destroy(uq_mgr, queue);
+	amdgpu_bo_unpin(queue->db_bo);
+	amdgpu_bo_unref(&queue->db_bo);
 	idr_remove(&uq_mgr->userq_idr, queue_id);
 	kfree(queue);
 
@@ -64,6 +95,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
 	const struct amdgpu_userq_funcs *uq_funcs;
 	struct amdgpu_usermode_queue *queue;
+	uint64_t index;
 	int qid, r = 0;
 
 	/* Usermode queues are only supported for GFX/SDMA engines as of now */
@@ -93,6 +125,14 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 	queue->flags = args->in.flags;
 	queue->vm = &fpriv->vm;
 
+	/* Convert relative doorbell offset into absolute doorbell index */
+	index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
+	if (index == (uint64_t)-EINVAL) {
+		DRM_ERROR("Failed to get doorbell for queue\n");
+		goto unlock;
+	}
+	queue->doorbell_index = index;
+
 	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
 	if (r) {
 		DRM_ERROR("Failed to create Queue\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index c0eb622dfc37..d855df9b7b37 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -6660,6 +6660,7 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 	userq_props.queue_size = mqd_user.queue_size;
 	userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
 	userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
+	userq_props.doorbell_index = queue->doorbell_index;
 	userq_props.use_doorbell = true;
 
 	r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr, &userq_props);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index ae155de62560..b5600cff086e 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_mqd_prop	*userq_prop;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_vm	*vm;
+	struct amdgpu_bo	*db_bo;
 	struct amdgpu_userq_obj mqd;
 	struct amdgpu_userq_obj fw_obj;
 };
-- 
2.42.0



* [PATCH v6 9/9] drm/amdgpu: cleanup leftover queues
  2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
                   ` (7 preceding siblings ...)
  2023-09-08 16:04 ` [PATCH v6 8/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
@ 2023-09-08 16:04 ` Shashank Sharma
  2023-09-20 15:26   ` Alex Deucher
  8 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-08 16:04 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Bas Nieuwenhuizen, Christian Koenig, arvind.yadav,
	Shashank Sharma

This patch adds code to clean up any leftover userqueues which
a user might have failed to destroy due to a crash or any other
programming error.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 25 +++++++++++++++----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index a311d4949bb8..0c78579b0791 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -61,12 +61,23 @@ amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
 	return index;
 }
 
+static void
+amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
+			 struct amdgpu_usermode_queue *queue,
+			 int queue_id)
+{
+	const struct amdgpu_userq_funcs *uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
+
+	uq_funcs->mqd_destroy(uq_mgr, queue);
+	idr_remove(&uq_mgr->userq_idr, queue_id);
+	kfree(queue);
+}
+
 static int
 amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 {
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
-	const struct amdgpu_userq_funcs *uq_funcs;
 	struct amdgpu_usermode_queue *queue;
 
 	mutex_lock(&uq_mgr->userq_mutex);
@@ -77,12 +88,10 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
 		mutex_unlock(&uq_mgr->userq_mutex);
 		return -EINVAL;
 	}
-	uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
-	uq_funcs->mqd_destroy(uq_mgr, queue);
+
 	amdgpu_bo_unpin(queue->db_bo);
 	amdgpu_bo_unref(&queue->db_bo);
-	idr_remove(&uq_mgr->userq_idr, queue_id);
-	kfree(queue);
+	amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
 
 	mutex_unlock(&uq_mgr->userq_mutex);
 	return 0;
@@ -207,6 +216,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
 {
+	uint32_t queue_id;
+	struct amdgpu_usermode_queue *queue;
+
+	idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
+		amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
+
 	idr_destroy(&userq_mgr->userq_idr);
 	mutex_destroy(&userq_mgr->userq_mutex);
 }
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-08 16:04 ` [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 " Shashank Sharma
@ 2023-09-14  7:45   ` Christian König
  2023-09-14  8:24     ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-09-14  7:45 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, arvind.yadav

Am 08.09.23 um 18:04 schrieb Shashank Sharma:
> A Memory queue descriptor (MQD) of a userqueue defines it in
> the hw's context. As MQD format can vary between different
> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>
> This patch:
> - Introduces MQD handler functions for the usermode queues.
> - Adds new functions to create and destroy userqueue MQD for
>    GFX-GEN-11 IP
>
> V1: Worked on review comments from Alex:
>      - Make MQD functions GEN and IP specific
>
> V2: Worked on review comments from Alex:
>      - Reuse the existing adev->mqd[ip] for MQD creation
>      - Formatting and arrangement of code
>
> V3:
>      - Integration with doorbell manager
>
> V4: Review comments addressed:
>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>      - Align name of structure members (Luben)
>      - Don't break up the Cc tag list and the Sob tag list in commit
>        message (Luben)
> V5:
>     - No need to reserve the bo for MQD (Christian).
>     - Some more changes to support IP specific MQD creation.
>
> V6:
>     - Add a comment reminding us to replace the amdgpu_bo_create_kernel()
>       calls while creating MQD object to amdgpu_bo_create() once eviction
>       fences are ready (Christian).
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77 +++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>   3 files changed, 100 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 44769423ba30..03fc8e89eafb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>   	return r;
>   }
>   
> +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
> +
> +static void
> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +	int maj;
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	uint32_t version = adev->ip_versions[GC_HWIP][0];
> +
> +	/* We support usermode queue only for GFX V11 as of now */
> +	maj = IP_VERSION_MAJ(version);
> +	if (maj == 11)
> +		uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
> +}

That belongs into gfx_v11.c and not here.
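
A minimal sketch of that move (the function name and call site below are
illustrative assumptions, not part of this series):

	/* gfx_v11_0.c owns the GC version check and the registration */
	void gfx_v11_0_userq_setup(struct amdgpu_userq_mgr *uq_mgr)
	{
		struct amdgpu_device *adev = uq_mgr->adev;

		if (IP_VERSION_MAJ(adev->ip_versions[GC_HWIP][0]) == 11)
			uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
	}

amdgpu_userq_mgr_init() would then only call this per-IP hook instead of
checking GC versions itself.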

> +
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>   {
>   	mutex_init(&userq_mgr->userq_mutex);
>   	idr_init_base(&userq_mgr->userq_idr, 1);
>   	userq_mgr->adev = adev;
>   
> +	amdgpu_userqueue_setup_gfx(userq_mgr);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 0451533ddde4..6760abda08df 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -30,6 +30,7 @@
>   #include "amdgpu_psp.h"
>   #include "amdgpu_smu.h"
>   #include "amdgpu_atomfirmware.h"
> +#include "amdgpu_userqueue.h"
>   #include "imu_v11_0.h"
>   #include "soc21.h"
>   #include "nvd.h"
> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
>   	.rev = 0,
>   	.funcs = &gfx_v11_0_ip_funcs,
>   };
> +
> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> +				      struct drm_amdgpu_userq_in *args_in,
> +				      struct amdgpu_usermode_queue *queue)
> +{
> +	struct amdgpu_device *adev = uq_mgr->adev;
> +	struct amdgpu_mqd *mqd_gfx_generic = &adev->mqds[AMDGPU_HW_IP_GFX];
> +	struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
> +	struct amdgpu_mqd_prop userq_props;
> +	int r;
> +
> +	/* Incoming MQD parameters from userspace to be saved here */
> +	memset(&mqd_user, 0, sizeof(mqd_user));
> +
> +	/* Structure to initialize MQD for userqueue using generic MQD init function */
> +	memset(&userq_props, 0, sizeof(userq_props));
> +
> +	if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd_gfx_v11_0)) {
> +		DRM_ERROR("MQD size mismatch\n");
> +		return -EINVAL;
> +	}
> +
> +	if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd), args_in->mqd_size)) {
> +		DRM_ERROR("Failed to get user MQD\n");
> +		return -EFAULT;
> +	}
> +
> +	/*
> +	 * Create BO for actual Userqueue MQD now
> +	 * Todo: replace the calls to bo_create_kernel() with bo_create() and use
> +	 * implicit pinning for the MQD buffers.

Well not implicit pinning, but rather fencing of the BO.

Regards,
Christian.

> +	 */
> +	r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size, PAGE_SIZE,
> +				    AMDGPU_GEM_DOMAIN_GTT,
> +				    &queue->mqd.obj,
> +				    &queue->mqd.gpu_addr,
> +				    &queue->mqd.cpu_ptr);
> +	if (r) {
> +		DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> +		return -ENOMEM;
> +	}
> +	memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
> +
> +	/* Initialize the MQD BO with user given values */
> +	userq_props.wptr_gpu_addr = mqd_user.wptr_va;
> +	userq_props.rptr_gpu_addr = mqd_user.rptr_va;
> +	userq_props.queue_size = mqd_user.queue_size;
> +	userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
> +	userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
> +	userq_props.use_doorbell = true;
> +
> +	r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr, &userq_props);
> +	if (r) {
> +		DRM_ERROR("Failed to initialize MQD for userqueue\n");
> +		goto free_mqd;
> +	}
> +
> +	return 0;
> +
> +free_mqd:
> +	amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, &queue->mqd.cpu_ptr);
> +	return r;
> +}
> +
> +static void
> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +	struct amdgpu_userq_obj *mqd = &queue->mqd;
> +
> +	amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
> +}
> +
> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
> +	.mqd_create = gfx_v11_0_userq_mqd_create,
> +	.mqd_destroy = gfx_v11_0_userq_mqd_destroy,
> +};
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 55ed6512a565..240f92796f00 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -29,6 +29,12 @@
>   
>   struct amdgpu_mqd_prop;
>   
> +struct amdgpu_userq_obj {
> +	void		 *cpu_ptr;
> +	uint64_t	 gpu_addr;
> +	struct amdgpu_bo *obj;
> +};
> +
>   struct amdgpu_usermode_queue {
>   	int			queue_type;
>   	uint64_t		doorbell_handle;
> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>   	struct amdgpu_mqd_prop	*userq_prop;
>   	struct amdgpu_userq_mgr *userq_mgr;
>   	struct amdgpu_vm	*vm;
> +	struct amdgpu_userq_obj mqd;
>   };
>   
>   struct amdgpu_userq_funcs {


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-14  7:45   ` Christian König
@ 2023-09-14  8:24     ` Shashank Sharma
  2023-09-28 13:11       ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-14  8:24 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: Alex Deucher, arvind.yadav


On 14/09/2023 09:45, Christian König wrote:
> Am 08.09.23 um 18:04 schrieb Shashank Sharma:
>> A Memory queue descriptor (MQD) of a userqueue defines it in
>> the hw's context. As MQD format can vary between different
>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>>
>> This patch:
>> - Introduces MQD handler functions for the usermode queues.
>> - Adds new functions to create and destroy userqueue MQD for
>>    GFX-GEN-11 IP
>>
>> V1: Worked on review comments from Alex:
>>      - Make MQD functions GEN and IP specific
>>
>> V2: Worked on review comments from Alex:
>>      - Reuse the existing adev->mqd[ip] for MQD creation
>>      - Formatting and arrangement of code
>>
>> V3:
>>      - Integration with doorbell manager
>>
>> V4: Review comments addressed:
>>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>>      - Align name of structure members (Luben)
>>      - Don't break up the Cc tag list and the Sob tag list in commit
>>        message (Luben)
>> V5:
>>     - No need to reserve the bo for MQD (Christian).
>>     - Some more changes to support IP specific MQD creation.
>>
>> V6:
>>     - Add a comment reminding us to replace the 
>> amdgpu_bo_create_kernel()
>>       calls while creating MQD object to amdgpu_bo_create() once 
>> eviction
>>       fences are ready (Christian).
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>>   3 files changed, 100 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 44769423ba30..03fc8e89eafb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev, 
>> void *data,
>>       return r;
>>   }
>>   +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
>> +
>> +static void
>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    int maj;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>> +
>> +    /* We support usermode queue only for GFX V11 as of now */
>> +    maj = IP_VERSION_MAJ(version);
>> +    if (maj == 11)
>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
>> +}
>
> That belongs into gfx_v11.c and not here.


Agree,

>
>> +
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>> struct amdgpu_device *adev)
>>   {
>>       mutex_init(&userq_mgr->userq_mutex);
>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>       userq_mgr->adev = adev;
>>   +    amdgpu_userqueue_setup_gfx(userq_mgr);
>>       return 0;
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index 0451533ddde4..6760abda08df 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -30,6 +30,7 @@
>>   #include "amdgpu_psp.h"
>>   #include "amdgpu_smu.h"
>>   #include "amdgpu_atomfirmware.h"
>> +#include "amdgpu_userqueue.h"
>>   #include "imu_v11_0.h"
>>   #include "soc21.h"
>>   #include "nvd.h"
>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version 
>> gfx_v11_0_ip_block =
>>       .rev = 0,
>>       .funcs = &gfx_v11_0_ip_funcs,
>>   };
>> +
>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>> +                      struct drm_amdgpu_userq_in *args_in,
>> +                      struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_mqd *mqd_gfx_generic = &adev->mqds[AMDGPU_HW_IP_GFX];
>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
>> +    struct amdgpu_mqd_prop userq_props;
>> +    int r;
>> +
>> +    /* Incoming MQD parameters from userspace to be saved here */
>> +    memset(&mqd_user, 0, sizeof(mqd_user));
>> +
>> +    /* Structure to initialize MQD for userqueue using generic MQD 
>> init function */
>> +    memset(&userq_props, 0, sizeof(userq_props));
>> +
>> +    if (args_in->mqd_size != sizeof(struct 
>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
>> +        DRM_ERROR("MQD size mismatch\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd), 
>> args_in->mqd_size)) {
>> +        DRM_ERROR("Failed to get user MQD\n");
>> +        return -EFAULT;
>> +    }
>> +
>> +    /*
>> +     * Create BO for actual Userqueue MQD now
>> +     * Todo: replace the calls to bo_create_kernel() with 
>> bo_create() and use
>> +     * implicit pinning for the MQD buffers.
>
> Well not implicit pinning, but rather fencing of the BO.
>
Noted.

- Shashank


> Regards,
> Christian.
>
>> +     */
>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size, 
>> PAGE_SIZE,
>> +                    AMDGPU_GEM_DOMAIN_GTT,
>> +                    &queue->mqd.obj,
>> +                    &queue->mqd.gpu_addr,
>> +                    &queue->mqd.cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>> +        return -ENOMEM;
>> +    }
>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
>> +
>> +    /* Initialize the MQD BO with user given values */
>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
>> +    userq_props.queue_size = mqd_user.queue_size;
>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
>> +    userq_props.use_doorbell = true;
>> +
>> +    r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr, 
>> &userq_props);
>> +    if (r) {
>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
>> +        goto free_mqd;
>> +    }
>> +
>> +    return 0;
>> +
>> +free_mqd:
>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, 
>> &queue->mqd.cpu_ptr);
>> +    return r;
>> +}
>> +
>> +static void
>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct 
>> amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
>> +
>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>> +}
>> +
>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
>> +};
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 55ed6512a565..240f92796f00 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -29,6 +29,12 @@
>>     struct amdgpu_mqd_prop;
>>   +struct amdgpu_userq_obj {
>> +    void         *cpu_ptr;
>> +    uint64_t     gpu_addr;
>> +    struct amdgpu_bo *obj;
>> +};
>> +
>>   struct amdgpu_usermode_queue {
>>       int            queue_type;
>>       uint64_t        doorbell_handle;
>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>>       struct amdgpu_mqd_prop    *userq_prop;
>>       struct amdgpu_userq_mgr *userq_mgr;
>>       struct amdgpu_vm    *vm;
>> +    struct amdgpu_userq_obj mqd;
>>   };
>>     struct amdgpu_userq_funcs {
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART
  2023-09-08 16:04 ` [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2023-09-18 10:32   ` Christian König
  2023-10-04 21:34     ` Felix Kuehling
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-09-18 10:32 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, arvind.yadav

Am 08.09.23 um 18:04 schrieb Shashank Sharma:
> To support oversubscription, MES FW expects WPTR BOs to
> be mapped into GART, before they are submitted to usermode
> queues. This patch adds a function for the same.
>
> V4: fix the wptr value before mapping lookup (Bas, Christian).
> V5: Addressed review comments from Christian:
>      - Either pin object or allocate from GART, but not both.
>      - All the handling must be done with the VM locks held.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 81 +++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>   2 files changed, 82 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index e266674e0d44..c0eb622dfc37 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -6427,6 +6427,79 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
>   	.funcs = &gfx_v11_0_ip_funcs,
>   };
>   
> +static int
> +gfx_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
> +{
> +	int ret;
> +
> +	ret = amdgpu_bo_reserve(bo, true);
> +	if (ret) {
> +		DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> +		goto err_reserve_bo_failed;
> +	}
> +
> +	ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> +	if (ret) {
> +		DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> +		goto err_map_bo_gart_failed;
> +	}
> +
> +	amdgpu_bo_unreserve(bo);

The GART mapping can become invalid as soon as you unlock the BOs.

You need to attach an eviction fence for this to work correctly.
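
A rough sketch of the fencing idea only (the eviction fence object is
hypothetical here; this series does not add one):

	/* While the BO is still reserved, publish an eviction fence so the
	 * BO cannot be moved (and the GART binding invalidated) behind our
	 * back. "ev_fence" is a hypothetical per-userqueue eviction fence.
	 */
	ret = dma_resv_reserve_fences(bo->tbo.base.resv, 1);
	if (!ret)
		dma_resv_add_fence(bo->tbo.base.resv, ev_fence,
				   DMA_RESV_USAGE_BOOKKEEP);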

> +	bo = amdgpu_bo_ref(bo);
> +
> +	return 0;
> +
> +err_map_bo_gart_failed:
> +	amdgpu_bo_unreserve(bo);
> +err_reserve_bo_failed:
> +	return ret;
> +}
> +
> +static int
> +gfx_v11_0_create_wptr_mapping(struct amdgpu_device *adev,
> +			      struct amdgpu_usermode_queue *queue,
> +			      uint64_t wptr)
> +{
> +	struct amdgpu_bo_va_mapping *wptr_mapping;
> +	struct amdgpu_vm *wptr_vm;
> +	struct amdgpu_bo *wptr_bo = NULL;
> +	int ret;
> +
> +	mutex_lock(&queue->vm->eviction_lock);

Never ever touch the eviction lock outside of the VM code! That lock is 
completely unrelated to what you do here.

> +	wptr_vm = queue->vm;
> +	ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> +	if (ret)
> +		goto unlock;
> +
> +	wptr &= AMDGPU_GMC_HOLE_MASK;
> +	wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
> +	amdgpu_bo_unreserve(wptr_vm->root.bo);
> +	if (!wptr_mapping) {
> +		DRM_ERROR("Failed to lookup wptr bo\n");
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	wptr_bo = wptr_mapping->bo_va->base.bo;
> +	if (wptr_bo->tbo.base.size > PAGE_SIZE) {
> +		DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
> +		ret = -EINVAL;
> +		goto unlock;
> +	}

We probably also want to enforce that this BO is a per VM BO.
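
For reference, a minimal sketch of such a check (illustrative, not part of
this patch); a per-VM BO shares the reservation object of the VM root BO:

	if (wptr_bo->tbo.base.resv != wptr_vm->root.bo->tbo.base.resv) {
		DRM_ERROR("Requested wptr BO is not a per-VM BO\n");
		ret = -EINVAL;
		goto unlock;
	}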

> +
> +	ret = gfx_v11_0_map_gtt_bo_to_gart(adev, wptr_bo);
> +	if (ret) {
> +		DRM_ERROR("Failed to map wptr bo to GART\n");
> +		goto unlock;
> +	}
> +
> +	queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;

This needs to be amdgpu_bo_gpu_offset() instead.

Regards,
Christian.
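
A minimal sketch of that suggestion, assuming the offset is taken while the
wptr BO is still reserved (or pinned):

	/* amdgpu_bo_gpu_offset() returns the BO's current GPU address,
	 * i.e. the GART address once amdgpu_ttm_alloc_gart() succeeded.
	 */
	queue->wptr_mc_addr = amdgpu_bo_gpu_offset(wptr_bo);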

> +
> +unlock:
> +	mutex_unlock(&queue->vm->eviction_lock);
> +	return ret;
> +}
> +
>   static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
>   				  struct amdgpu_usermode_queue *queue)
>   {
> @@ -6475,6 +6548,7 @@ static int gfx_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
>   	queue_input.queue_size = userq_props->queue_size >> 2;
>   	queue_input.doorbell_offset = userq_props->doorbell_index;
>   	queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +	queue_input.wptr_mc_addr = queue->wptr_mc_addr;
>   
>   	amdgpu_mes_lock(&adev->mes);
>   	r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> @@ -6601,6 +6675,13 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>   		goto free_mqd;
>   	}
>   
> +	/* FW expects WPTR BOs to be mapped into GART */
> +	r = gfx_v11_0_create_wptr_mapping(adev, queue, userq_props.wptr_gpu_addr);
> +	if (r) {
> +		DRM_ERROR("Failed to create WPTR mapping\n");
> +		goto free_ctx;
> +	}
> +
>   	/* Map userqueue into FW using MES */
>   	r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
>   	if (r) {
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 34e20daa06c8..ae155de62560 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -39,6 +39,7 @@ struct amdgpu_usermode_queue {
>   	int			queue_type;
>   	uint64_t		doorbell_handle;
>   	uint64_t		doorbell_index;
> +	uint64_t		wptr_mc_addr;
>   	uint64_t		flags;
>   	struct amdgpu_mqd_prop	*userq_prop;
>   	struct amdgpu_userq_mgr *userq_mgr;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue
  2023-09-08 16:04 ` [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2023-09-20 14:47   ` Zhang, Yifan
  2023-09-20 14:53     ` Sharma, Shashank
  0 siblings, 1 reply; 32+ messages in thread
From: Zhang, Yifan @ 2023-09-20 14:47 UTC (permalink / raw)
  To: Sharma, Shashank, amd-gfx
  Cc: Deucher, Alexander, Koenig, Christian, Yadav, Arvind, Sharma, Shashank

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Shashank Sharma
Sent: Saturday, September 9, 2023 12:05 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue

This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
- Remove wrapper functions for queue indexing (Christian)
- Do not save the queue id/idr in queue itself (Christian)
- Move the idr allocation in the IP independent generic space
  (Christian)

V6:
- Check the validity of input IP type (Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 116 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 3 files changed, 119 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2a89e303c3db..214a66b33dfa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2830,6 +2830,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
        DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
        DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
        DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl,
+DRM_AUTH|DRM_RENDER_ALLOW),
 };

 static const struct drm_driver amdgpu_kms_driver = {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..44769423ba30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,122 @@
  */

 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid) {
+       return idr_find(&uq_mgr->userq_idr, qid); }
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id) {
+       struct amdgpu_fpriv *fpriv = filp->driver_priv;
+       struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+       const struct amdgpu_userq_funcs *uq_funcs;
+       struct amdgpu_usermode_queue *queue;
+
+       mutex_lock(&uq_mgr->userq_mutex);
+
+       queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+       if (!queue) {
+               DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+               mutex_unlock(&uq_mgr->userq_mutex);
+               return -EINVAL;
+       }
+       uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
+       uq_funcs->mqd_destroy(uq_mgr, queue);
+       idr_remove(&uq_mgr->userq_idr, queue_id);
+       kfree(queue);
+
+       mutex_unlock(&uq_mgr->userq_mutex);
+       return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
+*args) {
+       struct amdgpu_fpriv *fpriv = filp->driver_priv;
+       struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+       const struct amdgpu_userq_funcs *uq_funcs;
+       struct amdgpu_usermode_queue *queue;
+       int qid, r = 0;
+
+       /* Usermode queues are only supported for GFX/SDMA engines as of now */
+       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+               DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+               return -EINVAL;
+       }
+
+       mutex_lock(&uq_mgr->userq_mutex);
+
+       uq_funcs = uq_mgr->userq_funcs[args->in.ip_type];
+       if (!uq_funcs) {
+               DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
+               r = -EINVAL;
+               goto unlock;
+       }
+
+       queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+       if (!queue) {
+               DRM_ERROR("Failed to allocate memory for queue\n");
+               r = -ENOMEM;
+               goto unlock;
+       }
+       queue->doorbell_handle = args->in.doorbell_handle;
+       queue->doorbell_index = args->in.doorbell_offset;
+       queue->queue_type = args->in.ip_type;
+       queue->flags = args->in.flags;
+       queue->vm = &fpriv->vm;
+
+       r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
+       if (r) {
+               DRM_ERROR("Failed to create Queue\n");
+               goto unlock;

Should kfree(queue) in this path to avoid memory leak

+       }
+
+       qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
+       if (qid < 0) {
+               DRM_ERROR("Failed to allocate a queue id\n");
+               uq_funcs->mqd_destroy(uq_mgr, queue);
+               r = -ENOMEM;
+               goto unlock;

Should kfree(queue)  in this path to avoid memory leak

+       }
+       args->out.queue_id = qid;
+

Should add
free_queue:
        kfree(queue) ;
to avoid memory leak
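
A minimal sketch of the corrected error handling (the free_queue label is the
suggestion above, not yet in the patch):

	r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
	if (r) {
		DRM_ERROR("Failed to create Queue\n");
		goto free_queue;
	}

	qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
	if (qid < 0) {
		DRM_ERROR("Failed to allocate a queue id\n");
		uq_funcs->mqd_destroy(uq_mgr, queue);
		r = -ENOMEM;
		goto free_queue;
	}
	args->out.queue_id = qid;
	goto unlock;		/* success: keep the allocation */

free_queue:
	kfree(queue);		/* reached on failure paths only */
unlock:
	mutex_unlock(&uq_mgr->userq_mutex);
	return r;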

+unlock:
+       mutex_unlock(&uq_mgr->userq_mutex);
+       return r;
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+                      struct drm_file *filp)
+{
+       union drm_amdgpu_userq *args = data;
+       int r = 0;
+
+       switch (args->in.op) {
+       case AMDGPU_USERQ_OP_CREATE:
+               r = amdgpu_userqueue_create(filp, args);
+               if (r)
+                       DRM_ERROR("Failed to create usermode queue\n");
+               break;
+
+       case AMDGPU_USERQ_OP_FREE:
+               r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
+               if (r)
+                       DRM_ERROR("Failed to destroy usermode queue\n");
+               break;
+
+       default:
+               DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
+               return -EINVAL;
+       }
+
+       return r;
+}

 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {

diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 79ffa131a514..55ed6512a565 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -55,6 +55,8 @@ struct amdgpu_userq_mgr {
        const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];  };

+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct
+drm_file *filp);
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);

 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
--
2.42.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue
  2023-09-20 14:47   ` Zhang, Yifan
@ 2023-09-20 14:53     ` Sharma, Shashank
  0 siblings, 0 replies; 32+ messages in thread
From: Sharma, Shashank @ 2023-09-20 14:53 UTC (permalink / raw)
  To: Zhang, Yifan, amd-gfx
  Cc: Deucher, Alexander, Koenig, Christian, Yadav, Arvind

-----Original Message-----
From: Zhang, Yifan <Yifan1.Zhang@amd.com>
Sent: Wednesday, September 20, 2023 4:48 PM
To: Sharma, Shashank <Shashank.Sharma@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: RE: [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Shashank Sharma
Sent: Saturday, September 9, 2023 12:05 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue

This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

V4:
 - Accommodate MQD union related changes in UAPI (Alex)
 - Do not set the queue size twice (Bas)

V5:
- Remove wrapper functions for queue indexing (Christian)
- Do not save the queue id/idr in queue itself (Christian)
- Move the idr allocation in the IP independent generic space
  (Christian)

V6:
- Check the validity of input IP type (Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 116 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 3 files changed, 119 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2a89e303c3db..214a66b33dfa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2830,6 +2830,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
        DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
        DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
        DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl,
+DRM_AUTH|DRM_RENDER_ALLOW),
 };

 static const struct drm_driver amdgpu_kms_driver = {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index effc0c7c02cf..44769423ba30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,122 @@
  */

 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid) {
+       return idr_find(&uq_mgr->userq_idr, qid); }
+
+static int
+amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id) {
+       struct amdgpu_fpriv *fpriv = filp->driver_priv;
+       struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+       const struct amdgpu_userq_funcs *uq_funcs;
+       struct amdgpu_usermode_queue *queue;
+
+       mutex_lock(&uq_mgr->userq_mutex);
+
+       queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+       if (!queue) {
+               DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+               mutex_unlock(&uq_mgr->userq_mutex);
+               return -EINVAL;
+       }
+       uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
+       uq_funcs->mqd_destroy(uq_mgr, queue);
+       idr_remove(&uq_mgr->userq_idr, queue_id);
+       kfree(queue);
+
+       mutex_unlock(&uq_mgr->userq_mutex);
+       return 0;
+}
+
+static int
+amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
+*args) {
+       struct amdgpu_fpriv *fpriv = filp->driver_priv;
+       struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+       const struct amdgpu_userq_funcs *uq_funcs;
+       struct amdgpu_usermode_queue *queue;
+       int qid, r = 0;
+
+       /* Usermode queues are only supported for GFX/SDMA engines as of now */
+       if (args->in.ip_type != AMDGPU_HW_IP_GFX && args->in.ip_type != AMDGPU_HW_IP_DMA) {
+               DRM_ERROR("Usermode queue doesn't support IP type %u\n", args->in.ip_type);
+               return -EINVAL;
+       }
+
+       mutex_lock(&uq_mgr->userq_mutex);
+
+       uq_funcs = uq_mgr->userq_funcs[args->in.ip_type];
+       if (!uq_funcs) {
+               DRM_ERROR("Usermode queue is not supported for this IP (%u)\n", args->in.ip_type);
+               r = -EINVAL;
+               goto unlock;
+       }
+
+       queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+       if (!queue) {
+               DRM_ERROR("Failed to allocate memory for queue\n");
+               r = -ENOMEM;
+               goto unlock;
+       }
+       queue->doorbell_handle = args->in.doorbell_handle;
+       queue->doorbell_index = args->in.doorbell_offset;
+       queue->queue_type = args->in.ip_type;
+       queue->flags = args->in.flags;
+       queue->vm = &fpriv->vm;
+
+       r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
+       if (r) {
+               DRM_ERROR("Failed to create Queue\n");
+               goto unlock;

Should kfree(queue) in this path to avoid memory leak

+       }
+
+       qid = idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ_COUNT, GFP_KERNEL);
+       if (qid < 0) {
+               DRM_ERROR("Failed to allocate a queue id\n");
+               uq_funcs->mqd_destroy(uq_mgr, queue);
+               r = -ENOMEM;
+               goto unlock;

Should kfree(queue)  in this path to avoid memory leak

+       }
+       args->out.queue_id = qid;
+

Should add
free_queue:
        kfree(queue) ;
to avoid memory leak



That's correct, it looks like it got dropped when I shifted the code around to accommodate the cleanup patch.

Thank you for pointing that.

- Shashank




+unlock:
+       mutex_unlock(&uq_mgr->userq_mutex);
+       return r;
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+                      struct drm_file *filp) {
+       union drm_amdgpu_userq *args = data;
+       int r = 0;
+
+       switch (args->in.op) {
+       case AMDGPU_USERQ_OP_CREATE:
+               r = amdgpu_userqueue_create(filp, args);
+               if (r)
+                       DRM_ERROR("Failed to create usermode queue\n");
+               break;
+
+       case AMDGPU_USERQ_OP_FREE:
+               r = amdgpu_userqueue_destroy(filp, args->in.queue_id);
+               if (r)
+                       DRM_ERROR("Failed to destroy usermode queue\n");
+               break;
+
+       default:
+               DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
+               return -EINVAL;
+       }
+
+       return r;
+}

 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {

diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 79ffa131a514..55ed6512a565 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -55,6 +55,8 @@ struct amdgpu_userq_mgr {
        const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];  };

+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct
+drm_file *filp);
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);

 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
--
2.42.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 5/9] drm/amdgpu: create context space for usermode queue
  2023-09-08 16:04 ` [PATCH v6 5/9] drm/amdgpu: create context space for " Shashank Sharma
@ 2023-09-20 15:21   ` Alex Deucher
  2023-09-29 17:50     ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Deucher @ 2023-09-20 15:21 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx

On Fri, Sep 8, 2023 at 12:45 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> The FW expects us to allocate at least one page of context
> space to handle gang, process, GDS and FW related work.
> This patch creates a joint object for all of them, and calculates
> the GPU offsets of these spaces.
>
> V1: Addressed review comments on RFC patch:
>     Alex: Make this function IP specific
>
> V2: Addressed review comments from Christian
>     - Allocate only one object for total FW space, and calculate
>       offsets for each of these objects.
>
> V3: Integration with doorbell manager
>
> V4: Review comments:
>     - Remove shadow from FW space list from cover letter (Alex)
>     - Alignment of macro (Luben)
>
> V5: Merged patches 5 and 6 into this single patch
>     Addressed review comments:
>     - Use lower_32_bits instead of mask (Christian)
>     - gfx_v11_0 instead of gfx_v11 in function names (Alex)
>     - Shadow and GDS objects are now coming from userspace (Christian,
>       Alex)
>
> V6:
>     - Add a comment to replace amdgpu_bo_create_kernel() with
>       amdgpu_bo_create() during fw_ctx object creation (Christian).
>     - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
>       of generic queue structure and make it gen11 specific (Alex).
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 61 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>  2 files changed, 62 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 6760abda08df..8ffb5dee72a9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -61,6 +61,9 @@
>  #define regCGTT_WD_CLK_CTRL_BASE_IDX   1
>  #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1  0x4e7e
>  #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1_BASE_IDX 1
> +#define AMDGPU_USERQ_PROC_CTX_SZ   PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ   PAGE_SIZE
> +#define AMDGPU_USERQ_FW_CTX_SZ     PAGE_SIZE
>
>  MODULE_FIRMWARE("amdgpu/gc_11_0_0_pfp.bin");
>  MODULE_FIRMWARE("amdgpu/gc_11_0_0_me.bin");
> @@ -6424,6 +6427,56 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
>         .funcs = &gfx_v11_0_ip_funcs,
>  };
>
> +static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                             struct amdgpu_usermode_queue *queue)
> +{
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +
> +       amdgpu_bo_free_kernel(&ctx->obj, &ctx->gpu_addr, &ctx->cpu_ptr);
> +}
> +
> +static int gfx_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                           struct amdgpu_usermode_queue *queue,
> +                                           struct drm_amdgpu_userq_mqd_gfx_v11_0 *mqd_user)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
> +       uint64_t fw_ctx_gpu_addr;
> +       int r, size;
> +
> +       /*
> +        * The FW expects at least one page space allocated for
> +        * process ctx, gang ctx and fw ctx each. Create an object
> +        * for the same.
> +        */
> +       size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_FW_CTX_SZ +
> +              AMDGPU_USERQ_GANG_CTX_SZ;
> +       r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                   AMDGPU_GEM_DOMAIN_GTT,
> +                                   &ctx->obj,
> +                                   &ctx->gpu_addr,
> +                                   &ctx->cpu_ptr);
> +       if (r) {
> +               DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> +               return r;
> +       }
> +
> +       fw_ctx_gpu_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ +
> +                         AMDGPU_USERQ_GANG_CTX_SZ;
> +       mqd->fw_work_area_base_lo = lower_32_bits(fw_ctx_gpu_addr);
> +       mqd->fw_work_area_base_lo = upper_32_bits(fw_ctx_gpu_addr);
> +
> +       /* Shadow and GDS objects come directly from userspace */
> +       mqd->shadow_base_lo = lower_32_bits(mqd_user->shadow_va);
> +       mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> +
> +       mqd->gds_bkup_base_lo = lower_32_bits(mqd_user->gds_va);
> +       mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
> +
> +       return 0;
> +}
> +
>  static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                                       struct drm_amdgpu_userq_in *args_in,
>                                       struct amdgpu_usermode_queue *queue)
> @@ -6480,6 +6533,13 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                 goto free_mqd;
>         }
>
> +       /* Create BO for FW operations */
> +       r = gfx_v11_0_userq_create_ctx_space(uq_mgr, queue, &mqd_user);
> +       if (r) {
> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> +               goto free_mqd;
> +       }
> +
>         return 0;
>
>  free_mqd:
> @@ -6492,6 +6552,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
>  {
>         struct amdgpu_userq_obj *mqd = &queue->mqd;
>
> +       gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
>         amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>  }
>
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 240f92796f00..34e20daa06c8 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_vm        *vm;
>         struct amdgpu_userq_obj mqd;
> +       struct amdgpu_userq_obj fw_obj;

Since this is gfx 11 specific, I feel like this would be better stored
in some gfx 11 structure rather than the generic user queue structure.
Maybe a driver private pointer here would make more sense, then each
IP can hang whatever structure they want here for IP specific
metadata.

Alex
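
One possible shape of that suggestion (purely illustrative; the names below
are not from this series):

	/* generic queue keeps only an opaque pointer for the IP backend */
	struct amdgpu_usermode_queue {
		...
		void	*userq_priv;	/* owned by the IP specific code */
	};

	/* gfx_v11_0.c defines whatever it needs behind that pointer */
	struct gfx_v11_userq_priv {
		struct amdgpu_userq_obj fw_obj;	/* proc/gang/fw ctx space */
	};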


>  };
>
>  struct amdgpu_userq_funcs {
> --
> 2.42.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES
  2023-09-08 16:04 ` [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
@ 2023-09-20 15:23   ` Alex Deucher
  2023-09-20 15:28   ` Alex Deucher
  1 sibling, 0 replies; 32+ messages in thread
From: Alex Deucher @ 2023-09-20 15:23 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx

On Fri, Sep 8, 2023 at 12:20 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch adds new functions to map/unmap a usermode queue into
> the FW, using the MES ring. As soon as this mapping is done, the
> queue is considered ready to accept workloads.
>
> V1: Addressed review comments from Alex on the RFC patch series
>     - Map/Unmap should be IP specific.
> V2:
>     Addressed review comments from Christian:
>     - Fix the wptr_mc_addr calculation (moved into another patch)
>     Addressed review comments from Alex:
>     - Do not add fptrs for map/unmap
>
> V3: Integration with doorbell manager
> V4: Rebase
> V5: Use gfx_v11_0 for function names (Alex)
> V6: Removed queue->proc/gang/fw_ctx_address variables and doing the
>     address calculations locally to keep the queue structure GEN
>     independent (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 72 ++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 8ffb5dee72a9..e266674e0d44 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -6427,6 +6427,67 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
>         .funcs = &gfx_v11_0_ip_funcs,
>  };
>
> +static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
> +                                 struct amdgpu_usermode_queue *queue)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct mes_remove_queue_input queue_input;
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       int r;
> +
> +       memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
> +       queue_input.doorbell_offset = queue->doorbell_index;
> +       queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
> +
> +       amdgpu_mes_lock(&adev->mes);
> +       r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
> +       amdgpu_mes_unlock(&adev->mes);
> +       if (r)
> +               DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
> +}
> +
> +static int gfx_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
> +                              struct amdgpu_usermode_queue *queue,
> +                              struct amdgpu_mqd_prop *userq_props)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       struct mes_add_queue_input queue_input;
> +       int r;
> +
> +       memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
> +
> +       queue_input.process_va_start = 0;
> +       queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
> +       queue_input.process_quantum = 100000; /* 10ms */
> +       queue_input.gang_quantum = 10000; /* 1ms */
> +       queue_input.paging = false;
> +
> +       queue_input.process_context_addr = ctx->gpu_addr;
> +       queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
> +       queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +       queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +
> +       queue_input.process_id = queue->vm->pasid;
> +       queue_input.queue_type = queue->queue_type;
> +       queue_input.mqd_addr = queue->mqd.gpu_addr;
> +       queue_input.wptr_addr = userq_props->wptr_gpu_addr;
> +       queue_input.queue_size = userq_props->queue_size >> 2;
> +       queue_input.doorbell_offset = userq_props->doorbell_index;
> +       queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +
> +       amdgpu_mes_lock(&adev->mes);
> +       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> +       amdgpu_mes_unlock(&adev->mes);
> +       if (r) {
> +               DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
> +               return r;
> +       }
> +
> +       DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
> +       return 0;
> +}
> +
>  static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>                                               struct amdgpu_usermode_queue *queue)
>  {
> @@ -6540,8 +6601,18 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                 goto free_mqd;
>         }
>
> +       /* Map userqueue into FW using MES */
> +       r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
> +       if (r) {
> +               DRM_ERROR("Failed to init MQD\n");
> +               goto free_ctx;
> +       }
> +
>         return 0;
>
> +free_ctx:
> +       gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
> +
>  free_mqd:
>         amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, &queue->mqd.cpu_ptr);
>         return r;
> @@ -6552,6 +6623,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
>  {
>         struct amdgpu_userq_obj *mqd = &queue->mqd;
>
> +       gfx_v11_0_userq_unmap(uq_mgr, queue);
>         gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
>         amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>  }
> --
> 2.42.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 8/9] drm/amdgpu: generate doorbell index for userqueue
  2023-09-08 16:04 ` [PATCH v6 8/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
@ 2023-09-20 15:25   ` Alex Deucher
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Deucher @ 2023-09-20 15:25 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx

On Fri, Sep 8, 2023 at 11:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> Userspace sends us the doorbell object and the relative doorbell
> index in the object to be used for the usermode queue, but the FW
> expects the absolute doorbell index on the PCI BAR in the MQD. This
> patch adds a function to convert the relative doorbell index into
> the absolute doorbell index.
>
> This patch is dependent on the doorbell manager series which is
> expected to be merged soon:
> Link: https://patchwork.freedesktop.org/series/115802/
>
> V5: Fix the db object reference leak (Christian)
> V6: Pin the doorbell bo in the userqueue_create() function, and unpin it
>     in userqueue destroy (Christian)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 40 +++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  1 +
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>  3 files changed, 42 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 03fc8e89eafb..a311d4949bb8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -32,6 +32,35 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>         return idr_find(&uq_mgr->userq_idr, qid);
>  }
>
> +static uint64_t
> +amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
> +                                    struct amdgpu_usermode_queue *queue,
> +                                    struct drm_file *filp,
> +                                    uint32_t doorbell_offset)
> +{
> +       uint64_t index;
> +       struct drm_gem_object *gobj;
> +
> +       gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
> +       if (gobj == NULL) {
> +               DRM_ERROR("Can't find GEM object for doorbell\n");
> +               return -EINVAL;
> +       }
> +
> +       queue->db_bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
> +       drm_gem_object_put(gobj);
> +
> +       /* Pin the BO before generating the index, unpin in queue destroy */
> +       if (amdgpu_bo_pin(queue->db_bo, AMDGPU_GEM_DOMAIN_DOORBELL)) {
> +               DRM_ERROR("[Usermode queues] Failed to pin doorbell object\n");
> +               return -EINVAL;
> +       }
> +
> +       index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, queue->db_bo, doorbell_offset);
> +       DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
> +       return index;
> +}
> +
>  static int
>  amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>  {
> @@ -50,6 +79,8 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>         }
>         uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
>         uq_funcs->mqd_destroy(uq_mgr, queue);
> +       amdgpu_bo_unpin(queue->db_bo);
> +       amdgpu_bo_unref(&queue->db_bo);
>         idr_remove(&uq_mgr->userq_idr, queue_id);
>         kfree(queue);
>
> @@ -64,6 +95,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>         struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>         const struct amdgpu_userq_funcs *uq_funcs;
>         struct amdgpu_usermode_queue *queue;
> +       uint64_t index;
>         int qid, r = 0;
>
>         /* Usermode queues are only supported for GFX/SDMA engines as of now */
> @@ -93,6 +125,14 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>         queue->flags = args->in.flags;
>         queue->vm = &fpriv->vm;
>
> +       /* Convert relative doorbell offset into absolute doorbell index */
> +       index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, args->in.doorbell_offset);
> +       if (index == (uint64_t)-EINVAL) {
> +               DRM_ERROR("Failed to get doorbell for queue\n");
> +               goto unlock;
> +       }
> +       queue->doorbell_index = index;
> +
>         r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);
>         if (r) {
>                 DRM_ERROR("Failed to create Queue\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index c0eb622dfc37..d855df9b7b37 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -6660,6 +6660,7 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>         userq_props.queue_size = mqd_user.queue_size;
>         userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
>         userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
> +       userq_props.doorbell_index = queue->doorbell_index;
>         userq_props.use_doorbell = true;
>
>         r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr, &userq_props);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index ae155de62560..b5600cff086e 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_mqd_prop  *userq_prop;
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_vm        *vm;
> +       struct amdgpu_bo        *db_bo;
>         struct amdgpu_userq_obj mqd;
>         struct amdgpu_userq_obj fw_obj;
>  };
> --
> 2.42.0
>
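
A side note on amdgpu_userqueue_get_doorbell_index() above: the helper
returns a uint64_t index but signals failure by returning
(uint64_t)-EINVAL, which the caller then has to compare against. Purely
as an illustration (a sketch, not code from the posted series), the same
logic with an int return code and an output parameter would look roughly
like this, reusing only calls that already appear in the patch:

static int
amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
                                    struct amdgpu_usermode_queue *queue,
                                    struct drm_file *filp,
                                    uint32_t doorbell_offset,
                                    uint64_t *index)
{
        struct drm_gem_object *gobj;
        int r;

        /* Take a reference on the doorbell BO backing the user's handle */
        gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
        if (!gobj)
                return -EINVAL;

        queue->db_bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
        drm_gem_object_put(gobj);

        /* Pin the BO before generating the index, unpin in queue destroy */
        r = amdgpu_bo_pin(queue->db_bo, AMDGPU_GEM_DOMAIN_DOORBELL);
        if (r)
                return r;

        *index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, queue->db_bo,
                                              doorbell_offset);
        return 0;
}

The caller in amdgpu_userqueue_create() could then propagate the error
code directly instead of comparing the returned index against
(uint64_t)-EINVAL.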

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 9/9] drm/amdgpu: cleanup leftover queues
  2023-09-08 16:04 ` [PATCH v6 9/9] drm/amdgpu: cleanup leftover queues Shashank Sharma
@ 2023-09-20 15:26   ` Alex Deucher
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Deucher @ 2023-09-20 15:26 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx, Bas Nieuwenhuizen

On Fri, Sep 8, 2023 at 12:25 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch adds code to cleanup any leftover userqueues which
> a user might have missed to destroy due to a crash or any other
> programming error.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 25 +++++++++++++++----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index a311d4949bb8..0c78579b0791 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -61,12 +61,23 @@ amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
>         return index;
>  }
>
> +static void
> +amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
> +                        struct amdgpu_usermode_queue *queue,
> +                        int queue_id)
> +{
> +       const struct amdgpu_userq_funcs *uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
> +
> +       uq_funcs->mqd_destroy(uq_mgr, queue);
> +       idr_remove(&uq_mgr->userq_idr, queue_id);
> +       kfree(queue);
> +}
> +
>  static int
>  amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>  {
>         struct amdgpu_fpriv *fpriv = filp->driver_priv;
>         struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> -       const struct amdgpu_userq_funcs *uq_funcs;
>         struct amdgpu_usermode_queue *queue;
>
>         mutex_lock(&uq_mgr->userq_mutex);
> @@ -77,12 +88,10 @@ amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>                 mutex_unlock(&uq_mgr->userq_mutex);
>                 return -EINVAL;
>         }
> -       uq_funcs = uq_mgr->userq_funcs[queue->queue_type];
> -       uq_funcs->mqd_destroy(uq_mgr, queue);
> +
>         amdgpu_bo_unpin(queue->db_bo);
>         amdgpu_bo_unref(&queue->db_bo);
> -       idr_remove(&uq_mgr->userq_idr, queue_id);
> -       kfree(queue);
> +       amdgpu_userqueue_cleanup(uq_mgr, queue, queue_id);
>
>         mutex_unlock(&uq_mgr->userq_mutex);
>         return 0;
> @@ -207,6 +216,12 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>
>  void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>  {
> +       uint32_t queue_id;
> +       struct amdgpu_usermode_queue *queue;
> +
> +       idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id)
> +               amdgpu_userqueue_cleanup(userq_mgr, queue, queue_id);
> +
>         idr_destroy(&userq_mgr->userq_idr);
>         mutex_destroy(&userq_mgr->userq_mutex);
>  }
> --
> 2.42.0
>
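
Purely as an illustration of a possible follow-up (an assumption, not
something posted in this series): if leftover queues freed from
amdgpu_userq_mgr_fini() should also drop their doorbell BO pin and
reference, that release could move into the shared helper, so both the
IOCTL path and the fini path go through the same code:

static void
amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr *uq_mgr,
                         struct amdgpu_usermode_queue *queue,
                         int queue_id)
{
        const struct amdgpu_userq_funcs *uq_funcs = uq_mgr->userq_funcs[queue->queue_type];

        uq_funcs->mqd_destroy(uq_mgr, queue);
        /* Release the doorbell BO for both the destroy IOCTL and mgr_fini */
        amdgpu_bo_unpin(queue->db_bo);
        amdgpu_bo_unref(&queue->db_bo);
        idr_remove(&uq_mgr->userq_idr, queue_id);
        kfree(queue);
}

amdgpu_userqueue_destroy() would then no longer need its own
amdgpu_bo_unpin()/amdgpu_bo_unref() calls before invoking the helper.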

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES
  2023-09-08 16:04 ` [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
  2023-09-20 15:23   ` Alex Deucher
@ 2023-09-20 15:28   ` Alex Deucher
  1 sibling, 0 replies; 32+ messages in thread
From: Alex Deucher @ 2023-09-20 15:28 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx

On Fri, Sep 8, 2023 at 12:20 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch adds new functions to map/unmap a usermode queue into
> the FW, using the MES ring. As soon as this mapping is done, the
> queue would  be considered ready to accept the workload.
>
> V1: Addressed review comments from Alex on the RFC patch series
>     - Map/Unmap should be IP specific.
> V2:
>     Addressed review comments from Christian:
>     - Fix the wptr_mc_addr calculation (moved into another patch)
>     Addressed review comments from Alex:
>     - Do not add fptrs for map/unmap
>
> V3: Integration with doorbell manager
> V4: Rebase
> V5: Use gfx_v11_0 for function names (Alex)
> V6: Removed queue->proc/gang/fw_ctx_address variables and doing the
>     address calculations locally to keep the queue structure GEN
>     independent (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 72 ++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 8ffb5dee72a9..e266674e0d44 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -6427,6 +6427,67 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
>         .funcs = &gfx_v11_0_ip_funcs,
>  };
>
> +static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
> +                                 struct amdgpu_usermode_queue *queue)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct mes_remove_queue_input queue_input;
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       int r;
> +
> +       memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
> +       queue_input.doorbell_offset = queue->doorbell_index;
> +       queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
> +
> +       amdgpu_mes_lock(&adev->mes);
> +       r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
> +       amdgpu_mes_unlock(&adev->mes);
> +       if (r)
> +               DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
> +}
> +
> +static int gfx_v11_0_userq_map(struct amdgpu_userq_mgr *uq_mgr,
> +                              struct amdgpu_usermode_queue *queue,
> +                              struct amdgpu_mqd_prop *userq_props)
> +{
> +       struct amdgpu_device *adev = uq_mgr->adev;
> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> +       struct mes_add_queue_input queue_input;
> +       int r;
> +
> +       memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
> +
> +       queue_input.process_va_start = 0;
> +       queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
> +       queue_input.process_quantum = 100000; /* 10ms */
> +       queue_input.gang_quantum = 10000; /* 1ms */
> +       queue_input.paging = false;
> +
> +       queue_input.process_context_addr = ctx->gpu_addr;
> +       queue_input.gang_context_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
> +       queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +       queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;

I can't remember: did we have a plan for priority handling?
Compositors would want high-priority queues, for example.

Alex

> +
> +       queue_input.process_id = queue->vm->pasid;
> +       queue_input.queue_type = queue->queue_type;
> +       queue_input.mqd_addr = queue->mqd.gpu_addr;
> +       queue_input.wptr_addr = userq_props->wptr_gpu_addr;
> +       queue_input.queue_size = userq_props->queue_size >> 2;
> +       queue_input.doorbell_offset = userq_props->doorbell_index;
> +       queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +
> +       amdgpu_mes_lock(&adev->mes);
> +       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> +       amdgpu_mes_unlock(&adev->mes);
> +       if (r) {
> +               DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
> +               return r;
> +       }
> +
> +       DRM_DEBUG_DRIVER("Queue (doorbell:%d) mapped successfully\n", userq_props->doorbell_index);
> +       return 0;
> +}
> +
>  static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>                                               struct amdgpu_usermode_queue *queue)
>  {
> @@ -6540,8 +6601,18 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>                 goto free_mqd;
>         }
>
> +       /* Map userqueue into FW using MES */
> +       r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
> +       if (r) {
> +               DRM_ERROR("Failed to init MQD\n");
> +               goto free_ctx;
> +       }
> +
>         return 0;
>
> +free_ctx:
> +       gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
> +
>  free_mqd:
>         amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, &queue->mqd.cpu_ptr);
>         return r;
> @@ -6552,6 +6623,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
>  {
>         struct amdgpu_userq_obj *mqd = &queue->mqd;
>
> +       gfx_v11_0_userq_unmap(uq_mgr, queue);
>         gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
>         amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>  }
> --
> 2.42.0
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-14  8:24     ` Shashank Sharma
@ 2023-09-28 13:11       ` Shashank Sharma
  2023-09-28 13:27         ` Alex Deucher
  0 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-28 13:11 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: Alex Deucher, arvind.yadav


On 14/09/2023 10:24, Shashank Sharma wrote:
>
> On 14/09/2023 09:45, Christian König wrote:
>> Am 08.09.23 um 18:04 schrieb Shashank Sharma:
>>> A Memory queue descriptor (MQD) of a userqueue defines it in
>>> the hw's context. As MQD format can vary between different
>>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>>>
>>> This patch:
>>> - Introduces MQD handler functions for the usermode queues.
>>> - Adds new functions to create and destroy userqueue MQD for
>>>    GFX-GEN-11 IP
>>>
>>> V1: Worked on review comments from Alex:
>>>      - Make MQD functions GEN and IP specific
>>>
>>> V2: Worked on review comments from Alex:
>>>      - Reuse the existing adev->mqd[ip] for MQD creation
>>>      - Formatting and arrangement of code
>>>
>>> V3:
>>>      - Integration with doorbell manager
>>>
>>> V4: Review comments addressed:
>>>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>>>      - Align name of structure members (Luben)
>>>      - Don't break up the Cc tag list and the Sob tag list in commit
>>>        message (Luben)
>>> V5:
>>>     - No need to reserve the bo for MQD (Christian).
>>>     - Some more changes to support IP specific MQD creation.
>>>
>>> V6:
>>>     - Add a comment reminding us to replace the 
>>> amdgpu_bo_create_kernel()
>>>       calls while creating MQD object to amdgpu_bo_create() once 
>>> eviction
>>>       fences are ready (Christian).
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77 
>>> +++++++++++++++++++
>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>>>   3 files changed, 100 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> index 44769423ba30..03fc8e89eafb 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev, 
>>> void *data,
>>>       return r;
>>>   }
>>>   +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
>>> +
>>> +static void
>>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
>>> +{
>>> +    int maj;
>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>> +
>>> +    /* We support usermode queue only for GFX V11 as of now */
>>> +    maj = IP_VERSION_MAJ(version);
>>> +    if (maj == 11)
>>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
>>> +}
>>
>> That belongs into gfx_v11.c and not here.
>
>
> Agree,

On second thought, we can't move it to gfx_v11.c: this is where we set
up the gfx_userqueue functions in fpriv->uq_mgr for the first time, so
we have no option but to check the IP and set up the functions here.
The only other way to do this would be to move uq_mgr->userq_funcs to
adev->gfx.userq_funcs and set them up during IP init (as Alex once
suggested). Please let me know your thoughts on this.

- Shashank

>
>>
>>> +
>>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>>> struct amdgpu_device *adev)
>>>   {
>>>       mutex_init(&userq_mgr->userq_mutex);
>>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>>       userq_mgr->adev = adev;
>>>   +    amdgpu_userqueue_setup_gfx(userq_mgr);
>>>       return 0;
>>>   }
>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>> index 0451533ddde4..6760abda08df 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>> @@ -30,6 +30,7 @@
>>>   #include "amdgpu_psp.h"
>>>   #include "amdgpu_smu.h"
>>>   #include "amdgpu_atomfirmware.h"
>>> +#include "amdgpu_userqueue.h"
>>>   #include "imu_v11_0.h"
>>>   #include "soc21.h"
>>>   #include "nvd.h"
>>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version 
>>> gfx_v11_0_ip_block =
>>>       .rev = 0,
>>>       .funcs = &gfx_v11_0_ip_funcs,
>>>   };
>>> +
>>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>> +                      struct drm_amdgpu_userq_in *args_in,
>>> +                      struct amdgpu_usermode_queue *queue)
>>> +{
>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>> +    struct amdgpu_mqd *mqd_gfx_generic = 
>>> &adev->mqds[AMDGPU_HW_IP_GFX];
>>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
>>> +    struct amdgpu_mqd_prop userq_props;
>>> +    int r;
>>> +
>>> +    /* Incoming MQD parameters from userspace to be saved here */
>>> +    memset(&mqd_user, 0, sizeof(mqd_user));
>>> +
>>> +    /* Structure to initialize MQD for userqueue using generic MQD 
>>> init function */
>>> +    memset(&userq_props, 0, sizeof(userq_props));
>>> +
>>> +    if (args_in->mqd_size != sizeof(struct 
>>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
>>> +        DRM_ERROR("MQD size mismatch\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd), 
>>> args_in->mqd_size)) {
>>> +        DRM_ERROR("Failed to get user MQD\n");
>>> +        return -EFAULT;
>>> +    }
>>> +
>>> +    /*
>>> +     * Create BO for actual Userqueue MQD now
>>> +     * Todo: replace the calls to bo_create_kernel() with 
>>> bo_create() and use
>>> +     * implicit pinning for the MQD buffers.
>>
>> Well not implicit pinning, but rather fencing of the BO.
>>
> Noted.
>
> - Shashank
>
>
>> Regards,
>> Christian.
>>
>>> +     */
>>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size, 
>>> PAGE_SIZE,
>>> +                    AMDGPU_GEM_DOMAIN_GTT,
>>> +                    &queue->mqd.obj,
>>> +                    &queue->mqd.gpu_addr,
>>> +                    &queue->mqd.cpu_ptr);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>>> +        return -ENOMEM;
>>> +    }
>>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
>>> +
>>> +    /* Initialize the MQD BO with user given values */
>>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
>>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
>>> +    userq_props.queue_size = mqd_user.queue_size;
>>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
>>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
>>> +    userq_props.use_doorbell = true;
>>> +
>>> +    r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr, 
>>> &userq_props);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
>>> +        goto free_mqd;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +free_mqd:
>>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr, 
>>> &queue->mqd.cpu_ptr);
>>> +    return r;
>>> +}
>>> +
>>> +static void
>>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct 
>>> amdgpu_usermode_queue *queue)
>>> +{
>>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
>>> +
>>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>>> +}
>>> +
>>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
>>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
>>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
>>> +};
>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> index 55ed6512a565..240f92796f00 100644
>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> @@ -29,6 +29,12 @@
>>>     struct amdgpu_mqd_prop;
>>>   +struct amdgpu_userq_obj {
>>> +    void         *cpu_ptr;
>>> +    uint64_t     gpu_addr;
>>> +    struct amdgpu_bo *obj;
>>> +};
>>> +
>>>   struct amdgpu_usermode_queue {
>>>       int            queue_type;
>>>       uint64_t        doorbell_handle;
>>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>>>       struct amdgpu_mqd_prop    *userq_prop;
>>>       struct amdgpu_userq_mgr *userq_mgr;
>>>       struct amdgpu_vm    *vm;
>>> +    struct amdgpu_userq_obj mqd;
>>>   };
>>>     struct amdgpu_userq_funcs {
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-28 13:11       ` Shashank Sharma
@ 2023-09-28 13:27         ` Alex Deucher
  2023-09-28 13:40           ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Deucher @ 2023-09-28 13:27 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian König, amd-gfx

On Thu, Sep 28, 2023 at 9:22 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 14/09/2023 10:24, Shashank Sharma wrote:
> >
> > On 14/09/2023 09:45, Christian König wrote:
> >> Am 08.09.23 um 18:04 schrieb Shashank Sharma:
> >>> A Memory queue descriptor (MQD) of a userqueue defines it in
> >>> the hw's context. As MQD format can vary between different
> >>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
> >>>
> >>> This patch:
> >>> - Introduces MQD handler functions for the usermode queues.
> >>> - Adds new functions to create and destroy userqueue MQD for
> >>>    GFX-GEN-11 IP
> >>>
> >>> V1: Worked on review comments from Alex:
> >>>      - Make MQD functions GEN and IP specific
> >>>
> >>> V2: Worked on review comments from Alex:
> >>>      - Reuse the existing adev->mqd[ip] for MQD creation
> >>>      - Formatting and arrangement of code
> >>>
> >>> V3:
> >>>      - Integration with doorbell manager
> >>>
> >>> V4: Review comments addressed:
> >>>      - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
> >>>      - Align name of structure members (Luben)
> >>>      - Don't break up the Cc tag list and the Sob tag list in commit
> >>>        message (Luben)
> >>> V5:
> >>>     - No need to reserve the bo for MQD (Christian).
> >>>     - Some more changes to support IP specific MQD creation.
> >>>
> >>> V6:
> >>>     - Add a comment reminding us to replace the
> >>> amdgpu_bo_create_kernel()
> >>>       calls while creating MQD object to amdgpu_bo_create() once
> >>> eviction
> >>>       fences are ready (Christian).
> >>>
> >>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
> >>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77
> >>> +++++++++++++++++++
> >>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
> >>>   3 files changed, 100 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>> index 44769423ba30..03fc8e89eafb 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev,
> >>> void *data,
> >>>       return r;
> >>>   }
> >>>   +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
> >>> +
> >>> +static void
> >>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
> >>> +{
> >>> +    int maj;
> >>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> >>> +
> >>> +    /* We support usermode queue only for GFX V11 as of now */
> >>> +    maj = IP_VERSION_MAJ(version);
> >>> +    if (maj == 11)
> >>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
> >>> +}
> >>
> >> That belongs into gfx_v11.c and not here.
> >
> >
> > Agree,
>
> On second thought, we can't move it to gfx_v11.c: this is where we set
> up the gfx_userqueue functions in fpriv->uq_mgr for the first time, so
> we have no option but to check the IP and set up the functions here.
> The only other way to do this would be to move uq_mgr->userq_funcs to
> adev->gfx.userq_funcs and set them up during IP init (as Alex once
> suggested). Please let me know your thoughts on this.

That seems cleaner to me.  They should be global anyway and could be
set as part of the individual IP init sequences.  Then the presence of
a pointer could be used to determine whether or not a particular IP
type supports user queues.

Alex


>
> - Shashank
>
> >
> >>
> >>> +
> >>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr,
> >>> struct amdgpu_device *adev)
> >>>   {
> >>>       mutex_init(&userq_mgr->userq_mutex);
> >>>       idr_init_base(&userq_mgr->userq_idr, 1);
> >>>       userq_mgr->adev = adev;
> >>>   +    amdgpu_userqueue_setup_gfx(userq_mgr);
> >>>       return 0;
> >>>   }
> >>>   diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>> index 0451533ddde4..6760abda08df 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>> @@ -30,6 +30,7 @@
> >>>   #include "amdgpu_psp.h"
> >>>   #include "amdgpu_smu.h"
> >>>   #include "amdgpu_atomfirmware.h"
> >>> +#include "amdgpu_userqueue.h"
> >>>   #include "imu_v11_0.h"
> >>>   #include "soc21.h"
> >>>   #include "nvd.h"
> >>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version
> >>> gfx_v11_0_ip_block =
> >>>       .rev = 0,
> >>>       .funcs = &gfx_v11_0_ip_funcs,
> >>>   };
> >>> +
> >>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> >>> +                      struct drm_amdgpu_userq_in *args_in,
> >>> +                      struct amdgpu_usermode_queue *queue)
> >>> +{
> >>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>> +    struct amdgpu_mqd *mqd_gfx_generic =
> >>> &adev->mqds[AMDGPU_HW_IP_GFX];
> >>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
> >>> +    struct amdgpu_mqd_prop userq_props;
> >>> +    int r;
> >>> +
> >>> +    /* Incoming MQD parameters from userspace to be saved here */
> >>> +    memset(&mqd_user, 0, sizeof(mqd_user));
> >>> +
> >>> +    /* Structure to initialize MQD for userqueue using generic MQD
> >>> init function */
> >>> +    memset(&userq_props, 0, sizeof(userq_props));
> >>> +
> >>> +    if (args_in->mqd_size != sizeof(struct
> >>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
> >>> +        DRM_ERROR("MQD size mismatch\n");
> >>> +        return -EINVAL;
> >>> +    }
> >>> +
> >>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd),
> >>> args_in->mqd_size)) {
> >>> +        DRM_ERROR("Failed to get user MQD\n");
> >>> +        return -EFAULT;
> >>> +    }
> >>> +
> >>> +    /*
> >>> +     * Create BO for actual Userqueue MQD now
> >>> +     * Todo: replace the calls to bo_create_kernel() with
> >>> bo_create() and use
> >>> +     * implicit pinning for the MQD buffers.
> >>
> >> Well not implicit pinning, but rather fencing of the BO.
> >>
> > Noted.
> >
> > - Shashank
> >
> >
> >> Regards,
> >> Christian.
> >>
> >>> +     */
> >>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size,
> >>> PAGE_SIZE,
> >>> +                    AMDGPU_GEM_DOMAIN_GTT,
> >>> +                    &queue->mqd.obj,
> >>> +                    &queue->mqd.gpu_addr,
> >>> +                    &queue->mqd.cpu_ptr);
> >>> +    if (r) {
> >>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> >>> +        return -ENOMEM;
> >>> +    }
> >>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
> >>> +
> >>> +    /* Initialize the MQD BO with user given values */
> >>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
> >>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
> >>> +    userq_props.queue_size = mqd_user.queue_size;
> >>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
> >>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
> >>> +    userq_props.use_doorbell = true;
> >>> +
> >>> +    r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr,
> >>> &userq_props);
> >>> +    if (r) {
> >>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
> >>> +        goto free_mqd;
> >>> +    }
> >>> +
> >>> +    return 0;
> >>> +
> >>> +free_mqd:
> >>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr,
> >>> &queue->mqd.cpu_ptr);
> >>> +    return r;
> >>> +}
> >>> +
> >>> +static void
> >>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct
> >>> amdgpu_usermode_queue *queue)
> >>> +{
> >>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
> >>> +
> >>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
> >>> +}
> >>> +
> >>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
> >>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
> >>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
> >>> +};
> >>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>> index 55ed6512a565..240f92796f00 100644
> >>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>> @@ -29,6 +29,12 @@
> >>>     struct amdgpu_mqd_prop;
> >>>   +struct amdgpu_userq_obj {
> >>> +    void         *cpu_ptr;
> >>> +    uint64_t     gpu_addr;
> >>> +    struct amdgpu_bo *obj;
> >>> +};
> >>> +
> >>>   struct amdgpu_usermode_queue {
> >>>       int            queue_type;
> >>>       uint64_t        doorbell_handle;
> >>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
> >>>       struct amdgpu_mqd_prop    *userq_prop;
> >>>       struct amdgpu_userq_mgr *userq_mgr;
> >>>       struct amdgpu_vm    *vm;
> >>> +    struct amdgpu_userq_obj mqd;
> >>>   };
> >>>     struct amdgpu_userq_funcs {
> >>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-28 13:27         ` Alex Deucher
@ 2023-09-28 13:40           ` Shashank Sharma
  2023-09-28 13:52             ` Alex Deucher
  0 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-28 13:40 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, arvind.yadav, Christian König, amd-gfx


On 28/09/2023 15:27, Alex Deucher wrote:
> On Thu, Sep 28, 2023 at 9:22 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 14/09/2023 10:24, Shashank Sharma wrote:
>>> On 14/09/2023 09:45, Christian König wrote:
>>>> Am 08.09.23 um 18:04 schrieb Shashank Sharma:
>>>>> A Memory queue descriptor (MQD) of a userqueue defines it in
>>>>> the hw's context. As MQD format can vary between different
>>>>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>>>>>
>>>>> This patch:
>>>>> - Introduces MQD handler functions for the usermode queues.
>>>>> - Adds new functions to create and destroy userqueue MQD for
>>>>>     GFX-GEN-11 IP
>>>>>
>>>>> V1: Worked on review comments from Alex:
>>>>>       - Make MQD functions GEN and IP specific
>>>>>
>>>>> V2: Worked on review comments from Alex:
>>>>>       - Reuse the existing adev->mqd[ip] for MQD creation
>>>>>       - Formatting and arrangement of code
>>>>>
>>>>> V3:
>>>>>       - Integration with doorbell manager
>>>>>
>>>>> V4: Review comments addressed:
>>>>>       - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>>>>>       - Align name of structure members (Luben)
>>>>>       - Don't break up the Cc tag list and the Sob tag list in commit
>>>>>         message (Luben)
>>>>> V5:
>>>>>      - No need to reserve the bo for MQD (Christian).
>>>>>      - Some more changes to support IP specific MQD creation.
>>>>>
>>>>> V6:
>>>>>      - Add a comment reminding us to replace the
>>>>> amdgpu_bo_create_kernel()
>>>>>        calls while creating MQD object to amdgpu_bo_create() once
>>>>> eviction
>>>>>        fences are ready (Christian).
>>>>>
>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
>>>>>    drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77
>>>>> +++++++++++++++++++
>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>>>>>    3 files changed, 100 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>> index 44769423ba30..03fc8e89eafb 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev,
>>>>> void *data,
>>>>>        return r;
>>>>>    }
>>>>>    +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
>>>>> +
>>>>> +static void
>>>>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
>>>>> +{
>>>>> +    int maj;
>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>>>> +
>>>>> +    /* We support usermode queue only for GFX V11 as of now */
>>>>> +    maj = IP_VERSION_MAJ(version);
>>>>> +    if (maj == 11)
>>>>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
>>>>> +}
>>>> That belongs into gfx_v11.c and not here.
>>>
>>> Agree,
>> On second thought, we can't move it to gfx_v11.c: this is where we set
>> up the gfx_userqueue functions in fpriv->uq_mgr for the first time, so
>> we have no option but to check the IP and set up the functions here.
>> The only other way to do this would be to move uq_mgr->userq_funcs to
>> adev->gfx.userq_funcs and set them up during IP init (as Alex once
>> suggested). Please let me know your thoughts on this.
> That seems cleaner to me.  They should be global anyway and could be
> set as part of the individual IP init sequences.  Then the presence of
> a pointer could be used to determine whether or not a particular IP
> type supports user queues.
>
> Alex
>
So if I understand this correctly, this is how we are looking to arrange 
the userqueue IP functions:

- Presence of adev->gfx.funcs.userqueue_funcs() will decide if this IP 
supports userqueue or not.

- sw_init function of the IP will setup these fptrs like:

   in gfx_v11_0_sw_init :

     if (GFX_MAJ == 11)

         adev->gfx.funcs.userqueue_funcs = gfx_v11_0_userqueue_funcs


In amdgpu_userqueue_ioctl:

     |

CASE: create:

     amdgpu_userqueue_create()

     if (adev->gfx.funcs.userqueue_funcs) {

         adev->gfx.funcs.userqueue_funcs.create_mqd();

     }


CASE: destroy:

     amdgpu_userqueue_destroy()

     if (adev->gfx.funcs.userqueue_funcs) {

         adev->gfx.funcs.userqueue_funcs.destroy_mqd();

     }

and so on ...

Am I getting this right?

- Shashank

>> - Shashank
>>
>>>>> +
>>>>>    int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr,
>>>>> struct amdgpu_device *adev)
>>>>>    {
>>>>>        mutex_init(&userq_mgr->userq_mutex);
>>>>>        idr_init_base(&userq_mgr->userq_idr, 1);
>>>>>        userq_mgr->adev = adev;
>>>>>    +    amdgpu_userqueue_setup_gfx(userq_mgr);
>>>>>        return 0;
>>>>>    }
>>>>>    diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>> index 0451533ddde4..6760abda08df 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>> @@ -30,6 +30,7 @@
>>>>>    #include "amdgpu_psp.h"
>>>>>    #include "amdgpu_smu.h"
>>>>>    #include "amdgpu_atomfirmware.h"
>>>>> +#include "amdgpu_userqueue.h"
>>>>>    #include "imu_v11_0.h"
>>>>>    #include "soc21.h"
>>>>>    #include "nvd.h"
>>>>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version
>>>>> gfx_v11_0_ip_block =
>>>>>        .rev = 0,
>>>>>        .funcs = &gfx_v11_0_ip_funcs,
>>>>>    };
>>>>> +
>>>>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>>>> +                      struct drm_amdgpu_userq_in *args_in,
>>>>> +                      struct amdgpu_usermode_queue *queue)
>>>>> +{
>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>> +    struct amdgpu_mqd *mqd_gfx_generic =
>>>>> &adev->mqds[AMDGPU_HW_IP_GFX];
>>>>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
>>>>> +    struct amdgpu_mqd_prop userq_props;
>>>>> +    int r;
>>>>> +
>>>>> +    /* Incoming MQD parameters from userspace to be saved here */
>>>>> +    memset(&mqd_user, 0, sizeof(mqd_user));
>>>>> +
>>>>> +    /* Structure to initialize MQD for userqueue using generic MQD
>>>>> init function */
>>>>> +    memset(&userq_props, 0, sizeof(userq_props));
>>>>> +
>>>>> +    if (args_in->mqd_size != sizeof(struct
>>>>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
>>>>> +        DRM_ERROR("MQD size mismatch\n");
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd),
>>>>> args_in->mqd_size)) {
>>>>> +        DRM_ERROR("Failed to get user MQD\n");
>>>>> +        return -EFAULT;
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * Create BO for actual Userqueue MQD now
>>>>> +     * Todo: replace the calls to bo_create_kernel() with
>>>>> bo_create() and use
>>>>> +     * implicit pinning for the MQD buffers.
>>>> Well not implicit pinning, but rather fencing of the BO.
>>>>
>>> Noted.
>>>
>>> - Shashank
>>>
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +     */
>>>>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size,
>>>>> PAGE_SIZE,
>>>>> +                    AMDGPU_GEM_DOMAIN_GTT,
>>>>> +                    &queue->mqd.obj,
>>>>> +                    &queue->mqd.gpu_addr,
>>>>> +                    &queue->mqd.cpu_ptr);
>>>>> +    if (r) {
>>>>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>>>>> +        return -ENOMEM;
>>>>> +    }
>>>>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
>>>>> +
>>>>> +    /* Initialize the MQD BO with user given values */
>>>>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
>>>>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
>>>>> +    userq_props.queue_size = mqd_user.queue_size;
>>>>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
>>>>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
>>>>> +    userq_props.use_doorbell = true;
>>>>> +
>>>>> +    r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr,
>>>>> &userq_props);
>>>>> +    if (r) {
>>>>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
>>>>> +        goto free_mqd;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +
>>>>> +free_mqd:
>>>>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr,
>>>>> &queue->mqd.cpu_ptr);
>>>>> +    return r;
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct
>>>>> amdgpu_usermode_queue *queue)
>>>>> +{
>>>>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
>>>>> +
>>>>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>>>>> +}
>>>>> +
>>>>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
>>>>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
>>>>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
>>>>> +};
>>>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>> index 55ed6512a565..240f92796f00 100644
>>>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>> @@ -29,6 +29,12 @@
>>>>>      struct amdgpu_mqd_prop;
>>>>>    +struct amdgpu_userq_obj {
>>>>> +    void         *cpu_ptr;
>>>>> +    uint64_t     gpu_addr;
>>>>> +    struct amdgpu_bo *obj;
>>>>> +};
>>>>> +
>>>>>    struct amdgpu_usermode_queue {
>>>>>        int            queue_type;
>>>>>        uint64_t        doorbell_handle;
>>>>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>>>>>        struct amdgpu_mqd_prop    *userq_prop;
>>>>>        struct amdgpu_userq_mgr *userq_mgr;
>>>>>        struct amdgpu_vm    *vm;
>>>>> +    struct amdgpu_userq_obj mqd;
>>>>>    };
>>>>>      struct amdgpu_userq_funcs {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-28 13:40           ` Shashank Sharma
@ 2023-09-28 13:52             ` Alex Deucher
  2023-09-28 13:54               ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Deucher @ 2023-09-28 13:52 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian König, amd-gfx

On Thu, Sep 28, 2023 at 9:40 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 28/09/2023 15:27, Alex Deucher wrote:
> > On Thu, Sep 28, 2023 at 9:22 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>
> >> On 14/09/2023 10:24, Shashank Sharma wrote:
> >>> On 14/09/2023 09:45, Christian König wrote:
> >>>> Am 08.09.23 um 18:04 schrieb Shashank Sharma:
> >>>>> A Memory queue descriptor (MQD) of a userqueue defines it in
> >>>>> the hw's context. As MQD format can vary between different
> >>>>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
> >>>>>
> >>>>> This patch:
> >>>>> - Introduces MQD handler functions for the usermode queues.
> >>>>> - Adds new functions to create and destroy userqueue MQD for
> >>>>>     GFX-GEN-11 IP
> >>>>>
> >>>>> V1: Worked on review comments from Alex:
> >>>>>       - Make MQD functions GEN and IP specific
> >>>>>
> >>>>> V2: Worked on review comments from Alex:
> >>>>>       - Reuse the existing adev->mqd[ip] for MQD creation
> >>>>>       - Formatting and arrangement of code
> >>>>>
> >>>>> V3:
> >>>>>       - Integration with doorbell manager
> >>>>>
> >>>>> V4: Review comments addressed:
> >>>>>       - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
> >>>>>       - Align name of structure members (Luben)
> >>>>>       - Don't break up the Cc tag list and the Sob tag list in commit
> >>>>>         message (Luben)
> >>>>> V5:
> >>>>>      - No need to reserve the bo for MQD (Christian).
> >>>>>      - Some more changes to support IP specific MQD creation.
> >>>>>
> >>>>> V6:
> >>>>>      - Add a comment reminding us to replace the
> >>>>> amdgpu_bo_create_kernel()
> >>>>>        calls while creating MQD object to amdgpu_bo_create() once
> >>>>> eviction
> >>>>>        fences are ready (Christian).
> >>>>>
> >>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >>>>> ---
> >>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
> >>>>>    drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77
> >>>>> +++++++++++++++++++
> >>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
> >>>>>    3 files changed, 100 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>> index 44769423ba30..03fc8e89eafb 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev,
> >>>>> void *data,
> >>>>>        return r;
> >>>>>    }
> >>>>>    +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
> >>>>> +
> >>>>> +static void
> >>>>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
> >>>>> +{
> >>>>> +    int maj;
> >>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> >>>>> +
> >>>>> +    /* We support usermode queue only for GFX V11 as of now */
> >>>>> +    maj = IP_VERSION_MAJ(version);
> >>>>> +    if (maj == 11)
> >>>>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
> >>>>> +}
> >>>> That belongs into gfx_v11.c and not here.
> >>>
> >>> Agree,
> >> On second thought, we can't move it to gfx_v11.c: this is where we set
> >> up the gfx_userqueue functions in fpriv->uq_mgr for the first time, so
> >> we have no option but to check the IP and set up the functions here.
> >> The only other way to do this would be to move uq_mgr->userq_funcs to
> >> adev->gfx.userq_funcs and set them up during IP init (as Alex once
> >> suggested). Please let me know your thoughts on this.
> > That seems cleaner to me.  They should be global anyway and could be
> > set as part of the individual IP init sequences.  Then the presence of
> > a pointer could be used to determine whether or not a particular IP
> > type supports user queues.
> >
> > Alex
> >
> So if I understand this correctly, this is how we are looking to arrange
> the userqueue IP functions:
>
> - Presence of adev->gfx.funcs.userqueue_funcs() will decide if this IP
> supports userqueue or not.
>
> - sw_init function of the IP will setup these fptrs like:
>
>    in gfx_v11_0_sw_init :
>
>      if (GFX_MAJ == 11)
>
>          adev->gfx.funcs.userqueue_funcs = gfx_v11_0_userqueue_funcs

I was thinking something more like:

adev->userq_funcs[AMD_IP_BLOCK_TYPE_GFX] = gfx_v11_0_userqueue_funcs;

That way there would be one place for all of the fptrs and
you could use the IP type to query the right one.

And then in the IOCTLs, you could just check if the pointer is valid.  E.g.,

if (!adev->userq_funcs[ip_block_type])
   return -EINVAL;

etc.

You could store any metadata relevant to each userq in the per fd user
queue manager and then just pass the state to the global userq
functions for each IP.

Alex
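
For illustration only, a rough sketch of the arrangement described
above; the ip_block_type variable and the placement of the array are
assumptions here, not code from this series:

/* gfx_v11_0.c, during GFX IP init: register the per-IP functions in a
 * global table owned by adev rather than in the per-fd manager. */
adev->userq_funcs[AMD_IP_BLOCK_TYPE_GFX] = &userq_gfx_v11_funcs;

/* amdgpu_userqueue.c, in the IOCTL paths: the presence of a pointer is
 * what tells us whether this IP type supports user queues at all. */
const struct amdgpu_userq_funcs *uq_funcs = adev->userq_funcs[ip_block_type];

if (!uq_funcs)
        return -EINVAL;

r = uq_funcs->mqd_create(uq_mgr, &args->in, queue);

The per-fd amdgpu_userq_mgr would then only hold the queue IDR and any
per-process state, while the function tables stay global in adev.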

>
>
> In amdgpu_userqueue_ioctl:
>
>      |
>
> CASE: create:
>
>      amdgpu_userqueue_create()
>
>      if (adev->gfx.funcs.userqueue_funcs) {
>
>          adev->gfx.funcs.userqueue_funcs.create_mqd();
>
>      }
>
>
> CASE: destroy:
>
>      amdgpu_userqueue_destroy()
>
>      if (adev->gfx.funcs.userqueue_funcs) {
>
>          adev->gfx.funcs.userqueue_funcs.destroy_mqd();
>
>      }
>
> and so on ...
>
> Am I getting this right?
>
> - Shashank
>
> >> - Shashank
> >>
> >>>>> +
> >>>>>    int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr,
> >>>>> struct amdgpu_device *adev)
> >>>>>    {
> >>>>>        mutex_init(&userq_mgr->userq_mutex);
> >>>>>        idr_init_base(&userq_mgr->userq_idr, 1);
> >>>>>        userq_mgr->adev = adev;
> >>>>>    +    amdgpu_userqueue_setup_gfx(userq_mgr);
> >>>>>        return 0;
> >>>>>    }
> >>>>>    diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>>>> index 0451533ddde4..6760abda08df 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >>>>> @@ -30,6 +30,7 @@
> >>>>>    #include "amdgpu_psp.h"
> >>>>>    #include "amdgpu_smu.h"
> >>>>>    #include "amdgpu_atomfirmware.h"
> >>>>> +#include "amdgpu_userqueue.h"
> >>>>>    #include "imu_v11_0.h"
> >>>>>    #include "soc21.h"
> >>>>>    #include "nvd.h"
> >>>>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version
> >>>>> gfx_v11_0_ip_block =
> >>>>>        .rev = 0,
> >>>>>        .funcs = &gfx_v11_0_ip_funcs,
> >>>>>    };
> >>>>> +
> >>>>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> >>>>> +                      struct drm_amdgpu_userq_in *args_in,
> >>>>> +                      struct amdgpu_usermode_queue *queue)
> >>>>> +{
> >>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>>> +    struct amdgpu_mqd *mqd_gfx_generic =
> >>>>> &adev->mqds[AMDGPU_HW_IP_GFX];
> >>>>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
> >>>>> +    struct amdgpu_mqd_prop userq_props;
> >>>>> +    int r;
> >>>>> +
> >>>>> +    /* Incoming MQD parameters from userspace to be saved here */
> >>>>> +    memset(&mqd_user, 0, sizeof(mqd_user));
> >>>>> +
> >>>>> +    /* Structure to initialize MQD for userqueue using generic MQD
> >>>>> init function */
> >>>>> +    memset(&userq_props, 0, sizeof(userq_props));
> >>>>> +
> >>>>> +    if (args_in->mqd_size != sizeof(struct
> >>>>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
> >>>>> +        DRM_ERROR("MQD size mismatch\n");
> >>>>> +        return -EINVAL;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd),
> >>>>> args_in->mqd_size)) {
> >>>>> +        DRM_ERROR("Failed to get user MQD\n");
> >>>>> +        return -EFAULT;
> >>>>> +    }
> >>>>> +
> >>>>> +    /*
> >>>>> +     * Create BO for actual Userqueue MQD now
> >>>>> +     * Todo: replace the calls to bo_create_kernel() with
> >>>>> bo_create() and use
> >>>>> +     * implicit pinning for the MQD buffers.
> >>>> Well not implicit pinning, but rather fencing of the BO.
> >>>>
> >>> Noted.
> >>>
> >>> - Shashank
> >>>
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> +     */
> >>>>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size,
> >>>>> PAGE_SIZE,
> >>>>> +                    AMDGPU_GEM_DOMAIN_GTT,
> >>>>> +                    &queue->mqd.obj,
> >>>>> +                    &queue->mqd.gpu_addr,
> >>>>> +                    &queue->mqd.cpu_ptr);
> >>>>> +    if (r) {
> >>>>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> >>>>> +        return -ENOMEM;
> >>>>> +    }
> >>>>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
> >>>>> +
> >>>>> +    /* Initialize the MQD BO with user given values */
> >>>>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
> >>>>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
> >>>>> +    userq_props.queue_size = mqd_user.queue_size;
> >>>>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
> >>>>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
> >>>>> +    userq_props.use_doorbell = true;
> >>>>> +
> >>>>> +    r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr,
> >>>>> &userq_props);
> >>>>> +    if (r) {
> >>>>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
> >>>>> +        goto free_mqd;
> >>>>> +    }
> >>>>> +
> >>>>> +    return 0;
> >>>>> +
> >>>>> +free_mqd:
> >>>>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr,
> >>>>> &queue->mqd.cpu_ptr);
> >>>>> +    return r;
> >>>>> +}
> >>>>> +
> >>>>> +static void
> >>>>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct
> >>>>> amdgpu_usermode_queue *queue)
> >>>>> +{
> >>>>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
> >>>>> +
> >>>>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
> >>>>> +}
> >>>>> +
> >>>>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
> >>>>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
> >>>>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
> >>>>> +};
> >>>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>> index 55ed6512a565..240f92796f00 100644
> >>>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>> @@ -29,6 +29,12 @@
> >>>>>      struct amdgpu_mqd_prop;
> >>>>>    +struct amdgpu_userq_obj {
> >>>>> +    void         *cpu_ptr;
> >>>>> +    uint64_t     gpu_addr;
> >>>>> +    struct amdgpu_bo *obj;
> >>>>> +};
> >>>>> +
> >>>>>    struct amdgpu_usermode_queue {
> >>>>>        int            queue_type;
> >>>>>        uint64_t        doorbell_handle;
> >>>>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
> >>>>>        struct amdgpu_mqd_prop    *userq_prop;
> >>>>>        struct amdgpu_userq_mgr *userq_mgr;
> >>>>>        struct amdgpu_vm    *vm;
> >>>>> +    struct amdgpu_userq_obj mqd;
> >>>>>    };
> >>>>>      struct amdgpu_userq_funcs {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-28 13:52             ` Alex Deucher
@ 2023-09-28 13:54               ` Shashank Sharma
  2023-10-02  9:42                 ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-28 13:54 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, arvind.yadav, Christian König, amd-gfx


On 28/09/2023 15:52, Alex Deucher wrote:
> On Thu, Sep 28, 2023 at 9:40 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 28/09/2023 15:27, Alex Deucher wrote:
>>> On Thu, Sep 28, 2023 at 9:22 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> On 14/09/2023 10:24, Shashank Sharma wrote:
>>>>> On 14/09/2023 09:45, Christian König wrote:
>>>>>> Am 08.09.23 um 18:04 schrieb Shashank Sharma:
>>>>>>> A Memory queue descriptor (MQD) of a userqueue defines it in
>>>>>>> the hw's context. As MQD format can vary between different
>>>>>>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>>>>>>>
>>>>>>> This patch:
>>>>>>> - Introduces MQD handler functions for the usermode queues.
>>>>>>> - Adds new functions to create and destroy userqueue MQD for
>>>>>>>      GFX-GEN-11 IP
>>>>>>>
>>>>>>> V1: Worked on review comments from Alex:
>>>>>>>        - Make MQD functions GEN and IP specific
>>>>>>>
>>>>>>> V2: Worked on review comments from Alex:
>>>>>>>        - Reuse the existing adev->mqd[ip] for MQD creation
>>>>>>>        - Formatting and arrangement of code
>>>>>>>
>>>>>>> V3:
>>>>>>>        - Integration with doorbell manager
>>>>>>>
>>>>>>> V4: Review comments addressed:
>>>>>>>        - Do not create a new file for userq, reuse gfx_v11_0.c (Alex)
>>>>>>>        - Align name of structure members (Luben)
>>>>>>>        - Don't break up the Cc tag list and the Sob tag list in commit
>>>>>>>          message (Luben)
>>>>>>> V5:
>>>>>>>       - No need to reserve the bo for MQD (Christian).
>>>>>>>       - Some more changes to support IP specific MQD creation.
>>>>>>>
>>>>>>> V6:
>>>>>>>       - Add a comment reminding us to replace the
>>>>>>> amdgpu_bo_create_kernel()
>>>>>>>         calls while creating MQD object to amdgpu_bo_create() once
>>>>>>> eviction
>>>>>>>         fences are ready (Christian).
>>>>>>>
>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
>>>>>>>     drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77
>>>>>>> +++++++++++++++++++
>>>>>>>     .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>>>>>>>     3 files changed, 100 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>> index 44769423ba30..03fc8e89eafb 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev,
>>>>>>> void *data,
>>>>>>>         return r;
>>>>>>>     }
>>>>>>>     +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
>>>>>>> +
>>>>>>> +static void
>>>>>>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
>>>>>>> +{
>>>>>>> +    int maj;
>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>>>>>> +
>>>>>>> +    /* We support usermode queue only for GFX V11 as of now */
>>>>>>> +    maj = IP_VERSION_MAJ(version);
>>>>>>> +    if (maj == 11)
>>>>>>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
>>>>>>> +}
>>>>>> That belongs into gfx_v11.c and not here.
>>>>> Agree,
>>>> On second thought, we can't move it to gfx_v11.c: this is where we set
>>>> up the gfx_userqueue functions in fpriv->uq_mgr for the first time, so
>>>> we have no option but to check the IP and set up the functions here.
>>>> The only other way to do this would be to move uq_mgr->userq_funcs to
>>>> adev->gfx.userq_funcs and set them up during IP init (as Alex once
>>>> suggested). Please let me know your thoughts on this.
>>> That seems cleaner to me.  They should be global anyway and could be
>>> set as part of the individual IP init sequences.  Then the presence of
>>> a pointer could be used to determine whether or not a particular IP
>>> type supports user queues.
>>>
>>> Alex
>>>
>> So if I understand this correctly, this is how we are looking to arrange
>> the userqueue IP functions:
>>
>> - Presence of adev->gfx.funcs.userqueue_funcs() will decide if this IP
>> supports userqueue or not.
>>
>> - sw_init function of the IP will setup these fptrs like:
>>
>>     in gfx_v11_0_sw_init :
>>
>>       if (GFX_MAJ == 11)
>>
>>           adev->gfx.funcs.userqueue_funcs = gfx_v11_0_userqueue_funcs
> I was thinking something more like:
>
> adev->userq_funcs[AMD_IP_BLOCK_TYPE_GFX] = gfx_v11_0_userqueue_funcs;
>
> That way there would be one place for all of the fptrs and
> you could use the IP type to query the right one.
>
> And then in the IOCTLs, you could just check if the pointer is valid.  E.g.,
>
> if (!adev->userq_funcs[ip_block_type])
>     return -EINVAL;
>
> etc.
>
> You could store any metadata relevant to each userq in the per fd user
> queue manager and then just pass the state to the global userq
> functions for each IP.

Makes sense, we can do that. I hope this works for Christian as well.
@Christian?

- Shashank

>
> Alex
>
>>
>> In amdgpu_userqueue_ioctl:
>>
>>       |
>>
>> CASE: create:
>>
>>       amdgpu_userqueue_create()
>>
>>       if (adev->gfx.funcs.userqueue_funcs) {
>>
>>           adev->gfx.funcs.userqueue_funcs.create_mqd();
>>
>>       }
>>
>>
>> CASE: destroy:
>>
>>       amdgpu_userqueue_destroy()
>>
>>       if (adev->gfx.funcs.userqueue_funcs) {
>>
>>           adev->gfx.funcs.userqueue_funcs.destroy_mqd();
>>
>>       }
>>
>> and so on ...
>>
>> Am I getting this right?
>>
>> - Shashank
>>
>>>> - Shashank
>>>>
>>>>>>> +
>>>>>>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr,
>>>>>>> struct amdgpu_device *adev)
>>>>>>>     {
>>>>>>>         mutex_init(&userq_mgr->userq_mutex);
>>>>>>>         idr_init_base(&userq_mgr->userq_idr, 1);
>>>>>>>         userq_mgr->adev = adev;
>>>>>>>     +    amdgpu_userqueue_setup_gfx(userq_mgr);
>>>>>>>         return 0;
>>>>>>>     }
>>>>>>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>> index 0451533ddde4..6760abda08df 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>> @@ -30,6 +30,7 @@
>>>>>>>     #include "amdgpu_psp.h"
>>>>>>>     #include "amdgpu_smu.h"
>>>>>>>     #include "amdgpu_atomfirmware.h"
>>>>>>> +#include "amdgpu_userqueue.h"
>>>>>>>     #include "imu_v11_0.h"
>>>>>>>     #include "soc21.h"
>>>>>>>     #include "nvd.h"
>>>>>>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version
>>>>>>> gfx_v11_0_ip_block =
>>>>>>>         .rev = 0,
>>>>>>>         .funcs = &gfx_v11_0_ip_funcs,
>>>>>>>     };
>>>>>>> +
>>>>>>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>>>>>> +                      struct drm_amdgpu_userq_in *args_in,
>>>>>>> +                      struct amdgpu_usermode_queue *queue)
>>>>>>> +{
>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>> +    struct amdgpu_mqd *mqd_gfx_generic =
>>>>>>> &adev->mqds[AMDGPU_HW_IP_GFX];
>>>>>>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
>>>>>>> +    struct amdgpu_mqd_prop userq_props;
>>>>>>> +    int r;
>>>>>>> +
>>>>>>> +    /* Incoming MQD parameters from userspace to be saved here */
>>>>>>> +    memset(&mqd_user, 0, sizeof(mqd_user));
>>>>>>> +
>>>>>>> +    /* Structure to initialize MQD for userqueue using generic MQD
>>>>>>> init function */
>>>>>>> +    memset(&userq_props, 0, sizeof(userq_props));
>>>>>>> +
>>>>>>> +    if (args_in->mqd_size != sizeof(struct
>>>>>>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
>>>>>>> +        DRM_ERROR("MQD size mismatch\n");
>>>>>>> +        return -EINVAL;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd),
>>>>>>> args_in->mqd_size)) {
>>>>>>> +        DRM_ERROR("Failed to get user MQD\n");
>>>>>>> +        return -EFAULT;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Create BO for actual Userqueue MQD now
>>>>>>> +     * Todo: replace the calls to bo_create_kernel() with
>>>>>>> bo_create() and use
>>>>>>> +     * implicit pinning for the MQD buffers.
>>>>>> Well not implicit pinning, but rather fencing of the BO.
>>>>>>
>>>>> Noted.
>>>>>
>>>>> - Shashank
>>>>>
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> +     */
>>>>>>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size,
>>>>>>> PAGE_SIZE,
>>>>>>> +                    AMDGPU_GEM_DOMAIN_GTT,
>>>>>>> +                    &queue->mqd.obj,
>>>>>>> +                    &queue->mqd.gpu_addr,
>>>>>>> +                    &queue->mqd.cpu_ptr);
>>>>>>> +    if (r) {
>>>>>>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>>>>>>> +        return -ENOMEM;
>>>>>>> +    }
>>>>>>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
>>>>>>> +
>>>>>>> +    /* Initialize the MQD BO with user given values */
>>>>>>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
>>>>>>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
>>>>>>> +    userq_props.queue_size = mqd_user.queue_size;
>>>>>>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
>>>>>>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
>>>>>>> +    userq_props.use_doorbell = true;
>>>>>>> +
>>>>>>> +    r = mqd_gfx_generic->init_mqd(adev, (void *)queue->mqd.cpu_ptr,
>>>>>>> &userq_props);
>>>>>>> +    if (r) {
>>>>>>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
>>>>>>> +        goto free_mqd;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +
>>>>>>> +free_mqd:
>>>>>>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr,
>>>>>>> &queue->mqd.cpu_ptr);
>>>>>>> +    return r;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct
>>>>>>> amdgpu_usermode_queue *queue)
>>>>>>> +{
>>>>>>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
>>>>>>> +
>>>>>>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>>>>>>> +}
>>>>>>> +
>>>>>>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
>>>>>>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
>>>>>>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
>>>>>>> +};
>>>>>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>> index 55ed6512a565..240f92796f00 100644
>>>>>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>> @@ -29,6 +29,12 @@
>>>>>>>       struct amdgpu_mqd_prop;
>>>>>>>     +struct amdgpu_userq_obj {
>>>>>>> +    void         *cpu_ptr;
>>>>>>> +    uint64_t     gpu_addr;
>>>>>>> +    struct amdgpu_bo *obj;
>>>>>>> +};
>>>>>>> +
>>>>>>>     struct amdgpu_usermode_queue {
>>>>>>>         int            queue_type;
>>>>>>>         uint64_t        doorbell_handle;
>>>>>>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>>>>>>>         struct amdgpu_mqd_prop    *userq_prop;
>>>>>>>         struct amdgpu_userq_mgr *userq_mgr;
>>>>>>>         struct amdgpu_vm    *vm;
>>>>>>> +    struct amdgpu_userq_obj mqd;
>>>>>>>     };
>>>>>>>       struct amdgpu_userq_funcs {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 5/9] drm/amdgpu: create context space for usermode queue
  2023-09-20 15:21   ` Alex Deucher
@ 2023-09-29 17:50     ` Shashank Sharma
  2023-10-04 13:13       ` Alex Deucher
  0 siblings, 1 reply; 32+ messages in thread
From: Shashank Sharma @ 2023-09-29 17:50 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx


On 20/09/2023 17:21, Alex Deucher wrote:
> On Fri, Sep 8, 2023 at 12:45 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> The FW expects us to allocate at least one page as context
>> space to process gang, process, GDS and FW  related work.
>> This patch creates a joint object for the same, and calculates
>> GPU space offsets of these spaces.
>>
>> V1: Addressed review comments on RFC patch:
>>      Alex: Make this function IP specific
>>
>> V2: Addressed review comments from Christian
>>      - Allocate only one object for total FW space, and calculate
>>        offsets for each of these objects.
>>
>> V3: Integration with doorbell manager
>>
>> V4: Review comments:
>>      - Remove shadow from FW space list from cover letter (Alex)
>>      - Alignment of macro (Luben)
>>
>> V5: Merged patches 5 and 6 into this single patch
>>      Addressed review comments:
>>      - Use lower_32_bits instead of mask (Christian)
>>      - gfx_v11_0 instead of gfx_v11 in function names (Alex)
>>      - Shadow and GDS objects are now coming from userspace (Christian,
>>        Alex)
>>
>> V6:
>>      - Add a comment to replace amdgpu_bo_create_kernel() with
>>        amdgpu_bo_create() during fw_ctx object creation (Christian).
>>      - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
>>        of generic queue structure and make it gen11 specific (Alex).
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 61 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   2 files changed, 62 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index 6760abda08df..8ffb5dee72a9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -61,6 +61,9 @@
>>   #define regCGTT_WD_CLK_CTRL_BASE_IDX   1
>>   #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1  0x4e7e
>>   #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1_BASE_IDX 1
>> +#define AMDGPU_USERQ_PROC_CTX_SZ   PAGE_SIZE
>> +#define AMDGPU_USERQ_GANG_CTX_SZ   PAGE_SIZE
>> +#define AMDGPU_USERQ_FW_CTX_SZ     PAGE_SIZE
>>
>>   MODULE_FIRMWARE("amdgpu/gc_11_0_0_pfp.bin");
>>   MODULE_FIRMWARE("amdgpu/gc_11_0_0_me.bin");
>> @@ -6424,6 +6427,56 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
>>          .funcs = &gfx_v11_0_ip_funcs,
>>   };
>>
>> +static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                             struct amdgpu_usermode_queue *queue)
>> +{
>> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
>> +
>> +       amdgpu_bo_free_kernel(&ctx->obj, &ctx->gpu_addr, &ctx->cpu_ptr);
>> +}
>> +
>> +static int gfx_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                           struct amdgpu_usermode_queue *queue,
>> +                                           struct drm_amdgpu_userq_mqd_gfx_v11_0 *mqd_user)
>> +{
>> +       struct amdgpu_device *adev = uq_mgr->adev;
>> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
>> +       struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
>> +       uint64_t fw_ctx_gpu_addr;
>> +       int r, size;
>> +
>> +       /*
>> +        * The FW expects at least one page space allocated for
>> +        * process ctx, gang ctx and fw ctx each. Create an object
>> +        * for the same.
>> +        */
>> +       size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_FW_CTX_SZ +
>> +              AMDGPU_USERQ_GANG_CTX_SZ;
>> +       r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>> +                                   AMDGPU_GEM_DOMAIN_GTT,
>> +                                   &ctx->obj,
>> +                                   &ctx->gpu_addr,
>> +                                   &ctx->cpu_ptr);
>> +       if (r) {
>> +               DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
>> +               return r;
>> +       }
>> +
>> +       fw_ctx_gpu_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ +
>> +                         AMDGPU_USERQ_GANG_CTX_SZ;
>> +       mqd->fw_work_area_base_lo = lower_32_bits(fw_ctx_gpu_addr);
>> +       mqd->fw_work_area_base_hi = upper_32_bits(fw_ctx_gpu_addr);
>> +
>> +       /* Shadow and GDS objects come directly from userspace */
>> +       mqd->shadow_base_lo = lower_32_bits(mqd_user->shadow_va);
>> +       mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
>> +
>> +       mqd->gds_bkup_base_lo = lower_32_bits(mqd_user->gds_va);
>> +       mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
>> +
>> +       return 0;
>> +}
>> +
>>   static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>                                        struct drm_amdgpu_userq_in *args_in,
>>                                        struct amdgpu_usermode_queue *queue)
>> @@ -6480,6 +6533,13 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>>                  goto free_mqd;
>>          }
>>
>> +       /* Create BO for FW operations */
>> +       r = gfx_v11_0_userq_create_ctx_space(uq_mgr, queue, &mqd_user);
>> +       if (r) {
>> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>> +               goto free_mqd;
>> +       }
>> +
>>          return 0;
>>
>>   free_mqd:
>> @@ -6492,6 +6552,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
>>   {
>>          struct amdgpu_userq_obj *mqd = &queue->mqd;
>>
>> +       gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
>>          amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
>>   }
>>
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 240f92796f00..34e20daa06c8 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
>>          struct amdgpu_userq_mgr *userq_mgr;
>>          struct amdgpu_vm        *vm;
>>          struct amdgpu_userq_obj mqd;
>> +       struct amdgpu_userq_obj fw_obj;
> Since this is gfx 11 specific, I feel like this would be better stored
> in some gfx 11 structure rather than the generic user queue structure.
> Maybe a driver private pointer here would make more sense, then each
> IP can hang whatever structure they want here for IP specific
> metadata.


I was thinking more about this, and it seems to me that it's the size of
this FW space which is going to be specific to an IP, but some object
space will probably always be required, as MES will always need some
space to save its process and gang ctx. So if this is not a big concern
for you, I would like to keep it here and see how this space requirement
evolves over time.

- Shashank
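
For reference, the alternative Alex describes above (an opaque, IP-owned
pointer in the generic queue struct) would look roughly like this; the
member and struct names are made up for illustration and are not part of
the series:

/* the generic struct would keep only an opaque pointer: */
struct amdgpu_usermode_queue {
	/* ... existing generic members ... */
	void	*hw_priv;	/* IP specific state, owned by the IP code */
};

/* gfx_v11_0.c would then keep the FW ctx object to itself: */
struct gfx_v11_0_userq_priv {
	struct amdgpu_userq_obj	fw_obj;	/* proc/gang/fw ctx space */
};

gfx_v11_0_userq_mqd_create() would allocate a gfx_v11_0_userq_priv and
store it in queue->hw_priv, and gfx_v11_0_userq_mqd_destroy() would free
it again.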

>
> Alex
>
>
>>   };
>>
>>   struct amdgpu_userq_funcs {
>> --
>> 2.42.0
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 usermode queue
  2023-09-28 13:54               ` Shashank Sharma
@ 2023-10-02  9:42                 ` Christian König
  0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2023-10-02  9:42 UTC (permalink / raw)
  To: Shashank Sharma, Alex Deucher; +Cc: Alex Deucher, arvind.yadav, amd-gfx

On 28.09.23 15:54, Shashank Sharma wrote:
>
> On 28/09/2023 15:52, Alex Deucher wrote:
>> On Thu, Sep 28, 2023 at 9:40 AM Shashank Sharma 
>> <shashank.sharma@amd.com> wrote:
>>>
>>> On 28/09/2023 15:27, Alex Deucher wrote:
>>>> On Thu, Sep 28, 2023 at 9:22 AM Shashank Sharma 
>>>> <shashank.sharma@amd.com> wrote:
>>>>> On 14/09/2023 10:24, Shashank Sharma wrote:
>>>>>> On 14/09/2023 09:45, Christian König wrote:
>>>>>>> On 08.09.23 18:04, Shashank Sharma wrote:
>>>>>>>> A Memory queue descriptor (MQD) of a userqueue defines it in
>>>>>>>> the hw's context. As MQD format can vary between different
>>>>>>>> graphics IPs, we need gfx GEN specific handlers to create MQDs.
>>>>>>>>
>>>>>>>> This patch:
>>>>>>>> - Introduces MQD handler functions for the usermode queues.
>>>>>>>> - Adds new functions to create and destroy userqueue MQD for
>>>>>>>>      GFX-GEN-11 IP
>>>>>>>>
>>>>>>>> V1: Worked on review comments from Alex:
>>>>>>>>        - Make MQD functions GEN and IP specific
>>>>>>>>
>>>>>>>> V2: Worked on review comments from Alex:
>>>>>>>>        - Reuse the existing adev->mqd[ip] for MQD creation
>>>>>>>>        - Formatting and arrangement of code
>>>>>>>>
>>>>>>>> V3:
>>>>>>>>        - Integration with doorbell manager
>>>>>>>>
>>>>>>>> V4: Review comments addressed:
>>>>>>>>        - Do not create a new file for userq, reuse gfx_v11_0.c 
>>>>>>>> (Alex)
>>>>>>>>        - Align name of structure members (Luben)
>>>>>>>>        - Don't break up the Cc tag list and the Sob tag list in 
>>>>>>>> commit
>>>>>>>>          message (Luben)
>>>>>>>> V5:
>>>>>>>>       - No need to reserve the bo for MQD (Christian).
>>>>>>>>       - Some more changes to support IP specific MQD creation.
>>>>>>>>
>>>>>>>> V6:
>>>>>>>>       - Add a comment reminding us to replace the
>>>>>>>> amdgpu_bo_create_kernel()
>>>>>>>>         calls while creating MQD object to amdgpu_bo_create() once
>>>>>>>> eviction
>>>>>>>>         fences are ready (Christian).
>>>>>>>>
>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 ++++
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 77
>>>>>>>> +++++++++++++++++++
>>>>>>>>     .../gpu/drm/amd/include/amdgpu_userqueue.h    | 7 ++
>>>>>>>>     3 files changed, 100 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> index 44769423ba30..03fc8e89eafb 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> @@ -140,12 +140,28 @@ int amdgpu_userq_ioctl(struct drm_device 
>>>>>>>> *dev,
>>>>>>>> void *data,
>>>>>>>>         return r;
>>>>>>>>     }
>>>>>>>>     +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr)
>>>>>>>> +{
>>>>>>>> +    int maj;
>>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>>>>>>> +
>>>>>>>> +    /* We support usermode queue only for GFX V11 as of now */
>>>>>>>> +    maj = IP_VERSION_MAJ(version);
>>>>>>>> +    if (maj == 11)
>>>>>>>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = 
>>>>>>>> &userq_gfx_v11_funcs;
>>>>>>>> +}
>>>>>>> That belongs into gfx_v11.c and not here.
>>>>>> Agree,
>>>>> On a second thought, we can't move it to gfx_v11.c, as this is the 
>>>>> place
>>>>> where we are setting up the gfx_userqueue functions in 
>>>>> fpriv->uq_mgr()
>>>>> for the first time, and we do not have another option but to check 
>>>>> the
>>>>> IP and setup the functions here. The only other option to do this 
>>>>> will
>>>>> be to move  uq_mgr->userq_funcs to adev->gfx.userq_funcs and setup 
>>>>> them
>>>>> with the IP init (as Alex once suggested). Please let me know your
>>>>> thoughts on this.
>>>> That seems cleaner to me.  They should be global anyway and could be
>>>> set as part of the individual IP init sequences.  Then the presence of
>>>> a pointer could be used to determine whether or not a particular IP
>>>> type supports user queues.
>>>>
>>>> Alex
>>>>
>>> So if I understand this correctly, this is how we are looking to 
>>> arrange
>>> the userqueue IP functions:
>>>
>>> - Presence of adev->gfx.funcs.userqueue_funcs() will decide if this IP
>>> supports userqueue or not.
>>>
>>> - sw_init function of the IP will setup these fptrs like:
>>>
>>>     in gfx_v11_0_sw_init :
>>>
>>>       if (GFX_MAJ == 11)
>>>
>>>           adev->gfx.funcs.userqueue_funcs = gfx_v11_0_userqueue_funcs
>> I was thinking something more like:
>>
>> adev->userq_funcs[AMD_IP_BLOCK_TYPE_GFX] = gfx_v11_0_userqueue_funcs;
>>
>> That way there would be one place for all of the all of the fptrs and
>> you could use the IP type to query the right one.
>>
>> And then in the IOCTLs, you could just check if the pointer is 
>> valid.  E.g.,
>>
>> if (!adev->userq_funcs[ip_block_type])
>>     return -EINVAL;
>>
>> etc.
>>
>> You could store any metadata relevant to each userq in the per fd user
>> queue manager and then just pass the state to the global userq
>> functions for each IP.
>
> Makes sense, we can do that. Hope this works for Christian as well,
> @Christian?

Perfectly fine with me.

Regards,
Christian.

>
> - Shashank
>
>>
>> Alex
>>
>>>
>>> In amdgpu_userqueue_ioctl:
>>>
>>>       |
>>>
>>> CASE: create:
>>>
>>>       amdgpu_userqueue_create()
>>>
>>>       if (adev->gfx.funcs.userqueue_funcs) {
>>>
>>>           adev->gfx.funcs.userqueue_funcs.create_mqd();
>>>
>>>       }
>>>
>>>
>>> CASE: destroy:
>>>
>>>       amdgpu_userqueue_destroy()
>>>
>>>       if (adev->gfx.funcs.userqueue_funcs) {
>>>
>>>           adev->gfx.funcs.userqueue_funcs.destroy_mqd();
>>>
>>>       }
>>>
>>> and so on ...
>>>
>>> Am I getting this right ?
>>>
>>> - Shashank
>>>
>>>>> - Shashank
>>>>>
>>>>>>>> +
>>>>>>>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr,
>>>>>>>> struct amdgpu_device *adev)
>>>>>>>>     {
>>>>>>>>         mutex_init(&userq_mgr->userq_mutex);
>>>>>>>>         idr_init_base(&userq_mgr->userq_idr, 1);
>>>>>>>>         userq_mgr->adev = adev;
>>>>>>>>     +    amdgpu_userqueue_setup_gfx(userq_mgr);
>>>>>>>>         return 0;
>>>>>>>>     }
>>>>>>>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>>> index 0451533ddde4..6760abda08df 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>>>>>>> @@ -30,6 +30,7 @@
>>>>>>>>     #include "amdgpu_psp.h"
>>>>>>>>     #include "amdgpu_smu.h"
>>>>>>>>     #include "amdgpu_atomfirmware.h"
>>>>>>>> +#include "amdgpu_userqueue.h"
>>>>>>>>     #include "imu_v11_0.h"
>>>>>>>>     #include "soc21.h"
>>>>>>>>     #include "nvd.h"
>>>>>>>> @@ -6422,3 +6423,79 @@ const struct amdgpu_ip_block_version
>>>>>>>> gfx_v11_0_ip_block =
>>>>>>>>         .rev = 0,
>>>>>>>>         .funcs = &gfx_v11_0_ip_funcs,
>>>>>>>>     };
>>>>>>>> +
>>>>>>>> +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr 
>>>>>>>> *uq_mgr,
>>>>>>>> +                      struct drm_amdgpu_userq_in *args_in,
>>>>>>>> +                      struct amdgpu_usermode_queue *queue)
>>>>>>>> +{
>>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>>> +    struct amdgpu_mqd *mqd_gfx_generic =
>>>>>>>> &adev->mqds[AMDGPU_HW_IP_GFX];
>>>>>>>> +    struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user;
>>>>>>>> +    struct amdgpu_mqd_prop userq_props;
>>>>>>>> +    int r;
>>>>>>>> +
>>>>>>>> +    /* Incoming MQD parameters from userspace to be saved here */
>>>>>>>> +    memset(&mqd_user, 0, sizeof(mqd_user));
>>>>>>>> +
>>>>>>>> +    /* Structure to initialize MQD for userqueue using generic 
>>>>>>>> MQD
>>>>>>>> init function */
>>>>>>>> +    memset(&userq_props, 0, sizeof(userq_props));
>>>>>>>> +
>>>>>>>> +    if (args_in->mqd_size != sizeof(struct
>>>>>>>> drm_amdgpu_userq_mqd_gfx_v11_0)) {
>>>>>>>> +        DRM_ERROR("MQD size mismatch\n");
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd),
>>>>>>>> args_in->mqd_size)) {
>>>>>>>> +        DRM_ERROR("Failed to get user MQD\n");
>>>>>>>> +        return -EFAULT;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * Create BO for actual Userqueue MQD now
>>>>>>>> +     * Todo: replace the calls to bo_create_kernel() with
>>>>>>>> bo_create() and use
>>>>>>>> +     * implicit pinning for the MQD buffers.
>>>>>>> Well not implicit pinning, but rather fencing of the BO.
>>>>>>>
>>>>>> Noted.
>>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> +     */
>>>>>>>> +    r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size,
>>>>>>>> PAGE_SIZE,
>>>>>>>> +                    AMDGPU_GEM_DOMAIN_GTT,
>>>>>>>> +                    &queue->mqd.obj,
>>>>>>>> +                    &queue->mqd.gpu_addr,
>>>>>>>> +                    &queue->mqd.cpu_ptr);
>>>>>>>> +    if (r) {
>>>>>>>> +        DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
>>>>>>>> +        return -ENOMEM;
>>>>>>>> +    }
>>>>>>>> +    memset(queue->mqd.cpu_ptr, 0, mqd_gfx_generic->mqd_size);
>>>>>>>> +
>>>>>>>> +    /* Initialize the MQD BO with user given values */
>>>>>>>> +    userq_props.wptr_gpu_addr = mqd_user.wptr_va;
>>>>>>>> +    userq_props.rptr_gpu_addr = mqd_user.rptr_va;
>>>>>>>> +    userq_props.queue_size = mqd_user.queue_size;
>>>>>>>> +    userq_props.hqd_base_gpu_addr = mqd_user.queue_va;
>>>>>>>> +    userq_props.mqd_gpu_addr = queue->mqd.gpu_addr;
>>>>>>>> +    userq_props.use_doorbell = true;
>>>>>>>> +
>>>>>>>> +    r = mqd_gfx_generic->init_mqd(adev, (void 
>>>>>>>> *)queue->mqd.cpu_ptr,
>>>>>>>> &userq_props);
>>>>>>>> +    if (r) {
>>>>>>>> +        DRM_ERROR("Failed to initialize MQD for userqueue\n");
>>>>>>>> +        goto free_mqd;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +
>>>>>>>> +free_mqd:
>>>>>>>> +    amdgpu_bo_free_kernel(&queue->mqd.obj, &queue->mqd.gpu_addr,
>>>>>>>> &queue->mqd.cpu_ptr);
>>>>>>>> +    return r;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, 
>>>>>>>> struct
>>>>>>>> amdgpu_usermode_queue *queue)
>>>>>>>> +{
>>>>>>>> +    struct amdgpu_userq_obj *mqd = &queue->mqd;
>>>>>>>> +
>>>>>>>> +    amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, 
>>>>>>>> &mqd->cpu_ptr);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
>>>>>>>> +    .mqd_create = gfx_v11_0_userq_mqd_create,
>>>>>>>> +    .mqd_destroy = gfx_v11_0_userq_mqd_destroy,
>>>>>>>> +};
>>>>>>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>> index 55ed6512a565..240f92796f00 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>> @@ -29,6 +29,12 @@
>>>>>>>>       struct amdgpu_mqd_prop;
>>>>>>>>     +struct amdgpu_userq_obj {
>>>>>>>> +    void         *cpu_ptr;
>>>>>>>> +    uint64_t     gpu_addr;
>>>>>>>> +    struct amdgpu_bo *obj;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>>     struct amdgpu_usermode_queue {
>>>>>>>>         int            queue_type;
>>>>>>>>         uint64_t        doorbell_handle;
>>>>>>>> @@ -37,6 +43,7 @@ struct amdgpu_usermode_queue {
>>>>>>>>         struct amdgpu_mqd_prop    *userq_prop;
>>>>>>>>         struct amdgpu_userq_mgr *userq_mgr;
>>>>>>>>         struct amdgpu_vm    *vm;
>>>>>>>> +    struct amdgpu_userq_obj mqd;
>>>>>>>>     };
>>>>>>>>       struct amdgpu_userq_funcs {


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 5/9] drm/amdgpu: create context space for usermode queue
  2023-09-29 17:50     ` Shashank Sharma
@ 2023-10-04 13:13       ` Alex Deucher
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Deucher @ 2023-10-04 13:13 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, arvind.yadav, Christian Koenig, amd-gfx

On Fri, Sep 29, 2023 at 1:50 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 20/09/2023 17:21, Alex Deucher wrote:
> > On Fri, Sep 8, 2023 at 12:45 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >> The FW expects us to allocate at least one page as context
> >> space to process gang, process, GDS and FW  related work.
> >> This patch creates a joint object for the same, and calculates
> >> GPU space offsets of these spaces.
> >>
> >> V1: Addressed review comments on RFC patch:
> >>      Alex: Make this function IP specific
> >>
> >> V2: Addressed review comments from Christian
> >>      - Allocate only one object for total FW space, and calculate
> >>        offsets for each of these objects.
> >>
> >> V3: Integration with doorbell manager
> >>
> >> V4: Review comments:
> >>      - Remove shadow from FW space list from cover letter (Alex)
> >>      - Alignment of macro (Luben)
> >>
> >> V5: Merged patches 5 and 6 into this single patch
> >>      Addressed review comments:
> >>      - Use lower_32_bits instead of mask (Christian)
> >>      - gfx_v11_0 instead of gfx_v11 in function names (Alex)
> >>      - Shadow and GDS objects are now coming from userspace (Christian,
> >>        Alex)
> >>
> >> V6:
> >>      - Add a comment to replace amdgpu_bo_create_kernel() with
> >>        amdgpu_bo_create() during fw_ctx object creation (Christian).
> >>      - Move proc_ctx_gpu_addr, gang_ctx_gpu_addr and fw_ctx_gpu_addr out
> >>        of generic queue structure and make it gen11 specific (Alex).
> >>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: Christian Koenig <christian.koenig@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 61 +++++++++++++++++++
> >>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
> >>   2 files changed, 62 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> index 6760abda08df..8ffb5dee72a9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> >> @@ -61,6 +61,9 @@
> >>   #define regCGTT_WD_CLK_CTRL_BASE_IDX   1
> >>   #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1  0x4e7e
> >>   #define regRLC_RLCS_BOOTLOAD_STATUS_gc_11_0_1_BASE_IDX 1
> >> +#define AMDGPU_USERQ_PROC_CTX_SZ   PAGE_SIZE
> >> +#define AMDGPU_USERQ_GANG_CTX_SZ   PAGE_SIZE
> >> +#define AMDGPU_USERQ_FW_CTX_SZ     PAGE_SIZE
> >>
> >>   MODULE_FIRMWARE("amdgpu/gc_11_0_0_pfp.bin");
> >>   MODULE_FIRMWARE("amdgpu/gc_11_0_0_me.bin");
> >> @@ -6424,6 +6427,56 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block =
> >>          .funcs = &gfx_v11_0_ip_funcs,
> >>   };
> >>
> >> +static void gfx_v11_0_userq_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> >> +                                             struct amdgpu_usermode_queue *queue)
> >> +{
> >> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> >> +
> >> +       amdgpu_bo_free_kernel(&ctx->obj, &ctx->gpu_addr, &ctx->cpu_ptr);
> >> +}
> >> +
> >> +static int gfx_v11_0_userq_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> >> +                                           struct amdgpu_usermode_queue *queue,
> >> +                                           struct drm_amdgpu_userq_mqd_gfx_v11_0 *mqd_user)
> >> +{
> >> +       struct amdgpu_device *adev = uq_mgr->adev;
> >> +       struct amdgpu_userq_obj *ctx = &queue->fw_obj;
> >> +       struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
> >> +       uint64_t fw_ctx_gpu_addr;
> >> +       int r, size;
> >> +
> >> +       /*
> >> +        * The FW expects at least one page space allocated for
> >> +        * process ctx, gang ctx and fw ctx each. Create an object
> >> +        * for the same.
> >> +        */
> >> +       size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_FW_CTX_SZ +
> >> +              AMDGPU_USERQ_GANG_CTX_SZ;
> >> +       r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> >> +                                   AMDGPU_GEM_DOMAIN_GTT,
> >> +                                   &ctx->obj,
> >> +                                   &ctx->gpu_addr,
> >> +                                   &ctx->cpu_ptr);
> >> +       if (r) {
> >> +               DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> >> +               return r;
> >> +       }
> >> +
> >> +       fw_ctx_gpu_addr = ctx->gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ +
> >> +                         AMDGPU_USERQ_GANG_CTX_SZ;
> >> +       mqd->fw_work_area_base_lo = lower_32_bits(fw_ctx_gpu_addr);
> >> +       mqd->fw_work_area_base_hi = upper_32_bits(fw_ctx_gpu_addr);
> >> +
> >> +       /* Shadow and GDS objects come directly from userspace */
> >> +       mqd->shadow_base_lo = lower_32_bits(mqd_user->shadow_va);
> >> +       mqd->shadow_base_hi = upper_32_bits(mqd_user->shadow_va);
> >> +
> >> +       mqd->gds_bkup_base_lo = lower_32_bits(mqd_user->gds_va);
> >> +       mqd->gds_bkup_base_hi = upper_32_bits(mqd_user->gds_va);
> >> +
> >> +       return 0;
> >> +}
> >> +
> >>   static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> >>                                        struct drm_amdgpu_userq_in *args_in,
> >>                                        struct amdgpu_usermode_queue *queue)
> >> @@ -6480,6 +6533,13 @@ static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
> >>                  goto free_mqd;
> >>          }
> >>
> >> +       /* Create BO for FW operations */
> >> +       r = gfx_v11_0_userq_create_ctx_space(uq_mgr, queue, &mqd_user);
> >> +       if (r) {
> >> +               DRM_ERROR("Failed to allocate BO for userqueue (%d)", r);
> >> +               goto free_mqd;
> >> +       }
> >> +
> >>          return 0;
> >>
> >>   free_mqd:
> >> @@ -6492,6 +6552,7 @@ gfx_v11_0_userq_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_userm
> >>   {
> >>          struct amdgpu_userq_obj *mqd = &queue->mqd;
> >>
> >> +       gfx_v11_0_userq_destroy_ctx_space(uq_mgr, queue);
> >>          amdgpu_bo_free_kernel(&mqd->obj, &mqd->gpu_addr, &mqd->cpu_ptr);
> >>   }
> >>
> >> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >> index 240f92796f00..34e20daa06c8 100644
> >> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >> @@ -44,6 +44,7 @@ struct amdgpu_usermode_queue {
> >>          struct amdgpu_userq_mgr *userq_mgr;
> >>          struct amdgpu_vm        *vm;
> >>          struct amdgpu_userq_obj mqd;
> >> +       struct amdgpu_userq_obj fw_obj;
> > Since this is gfx 11 specific, I feel like this would be better stored
> > in some gfx 11 structure rather than the generic user queue structure.
> > Maybe a driver private pointer here would make more sense, then each
> > IP can hang whatever structure they want here for IP specific
> > metadata.
>
>
> I was thinking more about this, and it seems to me that it's the size of
> this FW space which is going to be specific to an IP, but some object
> space will probably always be required, as MES will always need some
> space to save its process and gang ctx. So if this is not a big concern
> for you, I would like to keep it here and see how this space requirement
> evolves over time.

Sure.  We can revisit this later.

Alex

>
> - Shashank
>
> >
> > Alex
> >
> >
> >>   };
> >>
> >>   struct amdgpu_userq_funcs {
> >> --
> >> 2.42.0
> >>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management
  2023-09-08 16:04 ` [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2023-10-04 21:23   ` Felix Kuehling
  2023-10-05 10:59     ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2023-10-04 21:23 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav


On 2023-09-08 12:04, Shashank Sharma wrote:
> From: Alex Deucher <alexander.deucher@amd.com>
>
> This patch introduces new UAPI/IOCTL for usermode graphics
> queue. The userspace app will fill this structure and request
> the graphics driver to add a graphics work queue for it. The
> output of this UAPI is a queue id.
>
> This UAPI maps the queue into GPU, so the graphics app can start
> submitting work to the queue as soon as the call returns.
>
> V2: Addressed review comments from Alex and Christian
>      - Make the doorbell offset's comment clearer
>      - Change the output parameter name to queue_id
>
> V3: Integration with doorbell manager
>
> V4:
>      - Updated the UAPI doc (Pierre-Eric)
>      - Created a Union for engine specific MQDs (Alex)
>      - Added Christian's R-B
> V5:
>      - Add variables for GDS and CSA in MQD structure (Alex)
>      - Make MQD data a ptr-size pair instead of union (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   include/uapi/drm/amdgpu_drm.h | 110 ++++++++++++++++++++++++++++++++++
>   1 file changed, 110 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 79b14828d542..627b4a38c855 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -54,6 +54,7 @@ extern "C" {
>   #define DRM_AMDGPU_VM			0x13
>   #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
>   #define DRM_AMDGPU_SCHED		0x15
> +#define DRM_AMDGPU_USERQ		0x16
>   
>   #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>   #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -71,6 +72,7 @@ extern "C" {
>   #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>   #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> +#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>   
>   /**
>    * DOC: memory domains
> @@ -304,6 +306,114 @@ union drm_amdgpu_ctx {
>   	union drm_amdgpu_ctx_out out;
>   };
>   
> +/* user queue IOCTL */
> +#define AMDGPU_USERQ_OP_CREATE	1
> +#define AMDGPU_USERQ_OP_FREE	2
> +
> +/* Flag to indicate secure buffer related workload, unused for now */
> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
> +/* Flag to indicate AQL workload, unused for now */
> +#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
> +
> +/*
> + * MQD (memory queue descriptor) is a set of parameters which allow

I find the term MQD misleading. For the firmware, the MQD is a very
different data structure from what you are defining here: a persistent
data structure in kernel address space (VMID0) that is shared between
the driver and the firmware and gets loaded or updated when queues are
mapped or unmapped. I'd want to avoid confusing the firmware MQD with
this structure.

Regards,
   Felix


> + * the GPU to uniquely define and identify a usermode queue. This
> + * structure defines the MQD for GFX-V11 IP ver 0.
> + */
> +struct drm_amdgpu_userq_mqd_gfx_v11_0 {
> +	/**
> +	 * @queue_va: Virtual address of the GPU memory which holds the queue
> +	 * object. The queue holds the workload packets.
> +	 */
> +	__u64   queue_va;
> +	/**
> +	 * @queue_size: Size of the queue in bytes, this needs to be 256-byte
> +	 * aligned.
> +	 */
> +	__u64   queue_size;
> +	/**
> +	 * @rptr_va : Virtual address of the GPU memory which holds the ring RPTR.
> +	 * This object must be at least 8 byte in size and aligned to 8-byte offset.
> +	 */
> +	__u64   rptr_va;
> +	/**
> +	 * @wptr_va : Virtual address of the GPU memory which holds the ring WPTR.
> +	 * This object must be at least 8 byte in size and aligned to 8-byte offset.
> +	 *
> +	 * Queue, RPTR and WPTR can come from the same object, as long as the size
> +	 * and alignment related requirements are met.
> +	 */
> +	__u64   wptr_va;
> +	/**
> +	 * @shadow_va: Virtual address of the GPU memory to hold the shadow buffer.
> +	 * This must be from a separate GPU object, and must be at least 4-page
> +	 * sized.
> +	 */
> +	__u64   shadow_va;
> +	/**
> +	 * @gds_va: Virtual address of the GPU memory to hold the GDS buffer.
> +	 * This must be from a separate GPU object, and must be at least 1-page
> +	 * sized.
> +	 */
> +	__u64   gds_va;
> +	/**
> +	 * @csa_va: Virtual address of the GPU memory to hold the CSA buffer.
> +	 * This must be from a separate GPU object, and must be at least 1-page
> +	 * sized.
> +	 */
> +	__u64   csa_va;
> +};
> +
> +struct drm_amdgpu_userq_in {
> +	/** AMDGPU_USERQ_OP_* */
> +	__u32	op;
> +	/** Queue handle for USERQ_OP_FREE */
> +	__u32	queue_id;
> +	/** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
> +	__u32   ip_type;
> +	/**
> +	 * @flags: flags to indicate special function for queue like secure
> +	 * buffer (TMZ). Unused for now.
> +	 */
> +	__u32   flags;
> +	/**
> +	 * @doorbell_handle: the handle of doorbell GEM object
> +	 * associated to this client.
> +	 */
> +	__u32   doorbell_handle;
> +	/**
> +	 * @doorbell_offset: 32-bit offset of the doorbell in the doorbell bo.
> +	 * Kernel will generate absolute doorbell offset using doorbell_handle
> +	 * and doorbell_offset in the doorbell bo.
> +	 */
> +	__u32   doorbell_offset;
> +	/**
> +	 * @mqd: Queue descriptor for USERQ_OP_CREATE
> +	 * MQD data can be of different size for different GPU IP/engine and
> +	 * their respective versions/revisions, so this points to a __u64 *
> +	 * which holds MQD of this usermode queue.
> +	 */
> +	__u64 mqd;
> +	/**
> +	 * @size: size of MQD data in bytes, it must match the MQD structure
> +	 * size of the respective engine/revision defined in UAPI for ex, for
> +	 * gfx_v11 workloads, size = sizeof(drm_amdgpu_userq_mqd_gfx_v11).
> +	 */
> +	__u64 mqd_size;
> +};
> +
> +struct drm_amdgpu_userq_out {
> +	/** Queue handle */
> +	__u32	queue_id;
> +	/** Flags */
> +	__u32	flags;
> +};
> +
> +union drm_amdgpu_userq {
> +	struct drm_amdgpu_userq_in in;
> +	struct drm_amdgpu_userq_out out;
> +};
> +
>   /* vm ioctl */
>   #define AMDGPU_VM_OP_RESERVE_VMID	1
>   #define AMDGPU_VM_OP_UNRESERVE_VMID	2
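
For context, a userspace client would drive this IOCTL roughly as in the
sketch below, built against the structures quoted above in this mail.
The libdrm wrapper, include paths and error handling are assumptions for
illustration; this is not the actual libdrm/Mesa test code:

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "amdgpu_drm.h"

/* drm_fd, the GPU virtual addresses inside *mqd and the doorbell BO
 * handle are assumed to have been set up already (GEM allocation and
 * VA mapping are omitted here).
 */
static int create_gfx_userq(int drm_fd,
			    const struct drm_amdgpu_userq_mqd_gfx_v11_0 *mqd,
			    uint32_t doorbell_handle, uint32_t doorbell_offset,
			    uint32_t *queue_id)
{
	union drm_amdgpu_userq args;
	int r;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_USERQ_OP_CREATE;
	args.in.ip_type = AMDGPU_HW_IP_GFX;
	args.in.doorbell_handle = doorbell_handle;
	args.in.doorbell_offset = doorbell_offset;
	args.in.mqd = (uintptr_t)mqd;
	args.in.mqd_size = sizeof(*mqd);

	r = drmCommandWriteRead(drm_fd, DRM_AMDGPU_USERQ, &args, sizeof(args));
	if (r)
		return r;

	*queue_id = args.out.queue_id;
	return 0;
}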

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART
  2023-09-18 10:32   ` Christian König
@ 2023-10-04 21:34     ` Felix Kuehling
  2023-10-05 10:57       ` Shashank Sharma
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2023-10-04 21:34 UTC (permalink / raw)
  To: Christian König, Shashank Sharma, amd-gfx; +Cc: Alex Deucher, arvind.yadav


On 2023-09-18 06:32, Christian König wrote:
> On 08.09.23 18:04, Shashank Sharma wrote:
>> To support oversubscription, MES FW expects WPTR BOs to
>> be mapped into GART, before they are submitted to usermode
>> queues. This patch adds a function for the same.
>>
>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>> V5: Addressed review comments from Christian:
>>      - Either pin object or allocate from GART, but not both.
>>      - All the handling must be done with the VM locks held.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 81 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   2 files changed, 82 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> index e266674e0d44..c0eb622dfc37 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>> @@ -6427,6 +6427,79 @@ const struct amdgpu_ip_block_version 
>> gfx_v11_0_ip_block =
>>       .funcs = &gfx_v11_0_ip_funcs,
>>   };
>>   +static int
>> +gfx_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct 
>> amdgpu_bo *bo)
>> +{
>> +    int ret;
>> +
>> +    ret = amdgpu_bo_reserve(bo, true);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>> +        goto err_reserve_bo_failed;
>> +    }
>> +
>> +    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>> +        goto err_map_bo_gart_failed;
>> +    }
>> +
>> +    amdgpu_bo_unreserve(bo);
>
> The GART mapping can become invalid as soon as you unlock the BOs.
>
> You need to attach an eviction fence for this to work correctly.

Don't you need an eviction fence on the WPTR BO regardless of the GTT 
mapping?

Regards,
   Felix


>
>> +    bo = amdgpu_bo_ref(bo);
>> +
>> +    return 0;
>> +
>> +err_map_bo_gart_failed:
>> +    amdgpu_bo_unreserve(bo);
>> +err_reserve_bo_failed:
>> +    return ret;
>> +}
>> +
>> +static int
>> +gfx_v11_0_create_wptr_mapping(struct amdgpu_device *adev,
>> +                  struct amdgpu_usermode_queue *queue,
>> +                  uint64_t wptr)
>> +{
>> +    struct amdgpu_bo_va_mapping *wptr_mapping;
>> +    struct amdgpu_vm *wptr_vm;
>> +    struct amdgpu_bo *wptr_bo = NULL;
>> +    int ret;
>> +
>> +    mutex_lock(&queue->vm->eviction_lock);
>
> Never ever touch the eviction lock outside of the VM code! That lock 
> is completely unrelated to what you do here.
>
>> +    wptr_vm = queue->vm;
>> +    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>> +    if (ret)
>> +        goto unlock;
>> +
>> +    wptr &= AMDGPU_GMC_HOLE_MASK;
>> +    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> 
>> PAGE_SHIFT);
>> +    amdgpu_bo_unreserve(wptr_vm->root.bo);
>> +    if (!wptr_mapping) {
>> +        DRM_ERROR("Failed to lookup wptr bo\n");
>> +        ret = -EINVAL;
>> +        goto unlock;
>> +    }
>> +
>> +    wptr_bo = wptr_mapping->bo_va->base.bo;
>> +    if (wptr_bo->tbo.base.size > PAGE_SIZE) {
>> +        DRM_ERROR("Requested GART mapping for wptr bo larger than 
>> one page\n");
>> +        ret = -EINVAL;
>> +        goto unlock;
>> +    }
>
> We probably also want to enforce that this BO is a per VM BO.
>
>> +
>> +    ret = gfx_v11_0_map_gtt_bo_to_gart(adev, wptr_bo);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to map wptr bo to GART\n");
>> +        goto unlock;
>> +    }
>> +
>> +    queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
>
> This needs to be amdgpu_bo_gpu_offset() instead.
>
> Regards,
> Christian.
>
>> +
>> +unlock:
>> +    mutex_unlock(&queue->vm->eviction_lock);
>> +    return ret;
>> +}
>> +
>>   static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>                     struct amdgpu_usermode_queue *queue)
>>   {
>> @@ -6475,6 +6548,7 @@ static int gfx_v11_0_userq_map(struct 
>> amdgpu_userq_mgr *uq_mgr,
>>       queue_input.queue_size = userq_props->queue_size >> 2;
>>       queue_input.doorbell_offset = userq_props->doorbell_index;
>>       queue_input.page_table_base_addr = 
>> amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +    queue_input.wptr_mc_addr = queue->wptr_mc_addr;
>>         amdgpu_mes_lock(&adev->mes);
>>       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>> @@ -6601,6 +6675,13 @@ static int gfx_v11_0_userq_mqd_create(struct 
>> amdgpu_userq_mgr *uq_mgr,
>>           goto free_mqd;
>>       }
>>   +    /* FW expects WPTR BOs to be mapped into GART */
>> +    r = gfx_v11_0_create_wptr_mapping(adev, queue, 
>> userq_props.wptr_gpu_addr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create WPTR mapping\n");
>> +        goto free_ctx;
>> +    }
>> +
>>       /* Map userqueue into FW using MES */
>>       r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
>>       if (r) {
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 34e20daa06c8..ae155de62560 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -39,6 +39,7 @@ struct amdgpu_usermode_queue {
>>       int            queue_type;
>>       uint64_t        doorbell_handle;
>>       uint64_t        doorbell_index;
>> +    uint64_t        wptr_mc_addr;
>>       uint64_t        flags;
>>       struct amdgpu_mqd_prop    *userq_prop;
>>       struct amdgpu_userq_mgr *userq_mgr;
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART
  2023-10-04 21:34     ` Felix Kuehling
@ 2023-10-05 10:57       ` Shashank Sharma
  0 siblings, 0 replies; 32+ messages in thread
From: Shashank Sharma @ 2023-10-05 10:57 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, amd-gfx; +Cc: Alex Deucher, arvind.yadav

Hey Felix,

On 04/10/2023 23:34, Felix Kuehling wrote:
>
> On 2023-09-18 06:32, Christian König wrote:
>> On 08.09.23 18:04, Shashank Sharma wrote:
>>> To support oversubscription, MES FW expects WPTR BOs to
>>> be mapped into GART, before they are submitted to usermode
>>> queues. This patch adds a function for the same.
>>>
>>> V4: fix the wptr value before mapping lookup (Bas, Christian).
>>> V5: Addressed review comments from Christian:
>>>      - Either pin object or allocate from GART, but not both.
>>>      - All the handling must be done with the VM locks held.
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 81 
>>> +++++++++++++++++++
>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>>   2 files changed, 82 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>> index e266674e0d44..c0eb622dfc37 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
>>> @@ -6427,6 +6427,79 @@ const struct amdgpu_ip_block_version 
>>> gfx_v11_0_ip_block =
>>>       .funcs = &gfx_v11_0_ip_funcs,
>>>   };
>>>   +static int
>>> +gfx_v11_0_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct 
>>> amdgpu_bo *bo)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = amdgpu_bo_reserve(bo, true);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>>> +        goto err_reserve_bo_failed;
>>> +    }
>>> +
>>> +    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>>> +        goto err_map_bo_gart_failed;
>>> +    }
>>> +
>>> +    amdgpu_bo_unreserve(bo);
>>
>> The GART mapping can become invalid as soon as you unlock the BOs.
>>
>> You need to attach an eviction fence for this to work correctly.
>
> Don't you need an eviction fence on the WPTR BO regardless of the GTT 
> mapping?

Yes, Christian also mentioned this in this iteration. I have implemented
the basic eviction fence for [V7] and will publish it soon.

- Shashank
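
For illustration, the two concrete points from Christian's quoted review
(don't touch the VM eviction_lock, and take the MC address from
amdgpu_bo_gpu_offset() while the BO is still reserved) would look roughly
like the sketch below. This is only a sketch and deliberately leaves out
the eviction fence handling planned for V7:

static int gfx_v11_0_userq_wptr_mc_addr(struct amdgpu_bo *wptr_bo,
					uint64_t *mc_addr)
{
	int r;

	r = amdgpu_bo_reserve(wptr_bo, true);
	if (r)
		return r;

	r = amdgpu_ttm_alloc_gart(&wptr_bo->tbo);
	if (!r)
		*mc_addr = amdgpu_bo_gpu_offset(wptr_bo);

	amdgpu_bo_unreserve(wptr_bo);

	/* Note: without an eviction fence the GART mapping can still be
	 * invalidated once the BO is unreserved, as pointed out above.
	 */
	return r;
}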

>
> Regards,
>   Felix
>
>
>>
>>> +    bo = amdgpu_bo_ref(bo);
>>> +
>>> +    return 0;
>>> +
>>> +err_map_bo_gart_failed:
>>> +    amdgpu_bo_unreserve(bo);
>>> +err_reserve_bo_failed:
>>> +    return ret;
>>> +}
>>> +
>>> +static int
>>> +gfx_v11_0_create_wptr_mapping(struct amdgpu_device *adev,
>>> +                  struct amdgpu_usermode_queue *queue,
>>> +                  uint64_t wptr)
>>> +{
>>> +    struct amdgpu_bo_va_mapping *wptr_mapping;
>>> +    struct amdgpu_vm *wptr_vm;
>>> +    struct amdgpu_bo *wptr_bo = NULL;
>>> +    int ret;
>>> +
>>> +    mutex_lock(&queue->vm->eviction_lock);
>>
>> Never ever touch the eviction lock outside of the VM code! That lock 
>> is completely unrelated to what you do here.
>>
>>> +    wptr_vm = queue->vm;
>>> +    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>>> +    if (ret)
>>> +        goto unlock;
>>> +
>>> +    wptr &= AMDGPU_GMC_HOLE_MASK;
>>> +    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> 
>>> PAGE_SHIFT);
>>> +    amdgpu_bo_unreserve(wptr_vm->root.bo);
>>> +    if (!wptr_mapping) {
>>> +        DRM_ERROR("Failed to lookup wptr bo\n");
>>> +        ret = -EINVAL;
>>> +        goto unlock;
>>> +    }
>>> +
>>> +    wptr_bo = wptr_mapping->bo_va->base.bo;
>>> +    if (wptr_bo->tbo.base.size > PAGE_SIZE) {
>>> +        DRM_ERROR("Requested GART mapping for wptr bo larger than 
>>> one page\n");
>>> +        ret = -EINVAL;
>>> +        goto unlock;
>>> +    }
>>
>> We probably also want to enforce that this BO is a per VM BO.
>>
>>> +
>>> +    ret = gfx_v11_0_map_gtt_bo_to_gart(adev, wptr_bo);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to map wptr bo to GART\n");
>>> +        goto unlock;
>>> +    }
>>> +
>>> +    queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
>>
>> This needs to be amdgpu_bo_gpu_offset() instead.
>>
>> Regards,
>> Christian.
>>
>>> +
>>> +unlock:
>>> +    mutex_unlock(&queue->vm->eviction_lock);
>>> +    return ret;
>>> +}
>>> +
>>>   static void gfx_v11_0_userq_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>>                     struct amdgpu_usermode_queue *queue)
>>>   {
>>> @@ -6475,6 +6548,7 @@ static int gfx_v11_0_userq_map(struct 
>>> amdgpu_userq_mgr *uq_mgr,
>>>       queue_input.queue_size = userq_props->queue_size >> 2;
>>>       queue_input.doorbell_offset = userq_props->doorbell_index;
>>>       queue_input.page_table_base_addr = 
>>> amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>> +    queue_input.wptr_mc_addr = queue->wptr_mc_addr;
>>>         amdgpu_mes_lock(&adev->mes);
>>>       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>> @@ -6601,6 +6675,13 @@ static int gfx_v11_0_userq_mqd_create(struct 
>>> amdgpu_userq_mgr *uq_mgr,
>>>           goto free_mqd;
>>>       }
>>>   +    /* FW expects WPTR BOs to be mapped into GART */
>>> +    r = gfx_v11_0_create_wptr_mapping(adev, queue, 
>>> userq_props.wptr_gpu_addr);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to create WPTR mapping\n");
>>> +        goto free_ctx;
>>> +    }
>>> +
>>>       /* Map userqueue into FW using MES */
>>>       r = gfx_v11_0_userq_map(uq_mgr, queue, &userq_props);
>>>       if (r) {
>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> index 34e20daa06c8..ae155de62560 100644
>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> @@ -39,6 +39,7 @@ struct amdgpu_usermode_queue {
>>>       int            queue_type;
>>>       uint64_t        doorbell_handle;
>>>       uint64_t        doorbell_index;
>>> +    uint64_t        wptr_mc_addr;
>>>       uint64_t        flags;
>>>       struct amdgpu_mqd_prop    *userq_prop;
>>>       struct amdgpu_userq_mgr *userq_mgr;
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management
  2023-10-04 21:23   ` Felix Kuehling
@ 2023-10-05 10:59     ` Shashank Sharma
  0 siblings, 0 replies; 32+ messages in thread
From: Shashank Sharma @ 2023-10-05 10:59 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx; +Cc: Alex Deucher, Christian Koenig, arvind.yadav


On 04/10/2023 23:23, Felix Kuehling wrote:
>
> On 2023-09-08 12:04, Shashank Sharma wrote:
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> This patch introduces new UAPI/IOCTL for usermode graphics
>> queue. The userspace app will fill this structure and request
>> the graphics driver to add a graphics work queue for it. The
>> output of this UAPI is a queue id.
>>
>> This UAPI maps the queue into GPU, so the graphics app can start
>> submitting work to the queue as soon as the call returns.
>>
>> V2: Addressed review comments from Alex and Christian
>>      - Make the doorbell offset's comment clearer
>>      - Change the output parameter name to queue_id
>>
>> V3: Integration with doorbell manager
>>
>> V4:
>>      - Updated the UAPI doc (Pierre-Eric)
>>      - Created a Union for engine specific MQDs (Alex)
>>      - Added Christian's R-B
>> V5:
>>      - Add variables for GDS and CSA in MQD structure (Alex)
>>      - Make MQD data a ptr-size pair instead of union (Alex)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   include/uapi/drm/amdgpu_drm.h | 110 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 110 insertions(+)
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h 
>> b/include/uapi/drm/amdgpu_drm.h
>> index 79b14828d542..627b4a38c855 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -54,6 +54,7 @@ extern "C" {
>>   #define DRM_AMDGPU_VM            0x13
>>   #define DRM_AMDGPU_FENCE_TO_HANDLE    0x14
>>   #define DRM_AMDGPU_SCHED        0x15
>> +#define DRM_AMDGPU_USERQ        0x16
>>     #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP    DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>> @@ -71,6 +72,7 @@ extern "C" {
>>   #define DRM_IOCTL_AMDGPU_VM        DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE 
>> + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>   #define DRM_IOCTL_AMDGPU_SCHED        DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>> +#define DRM_IOCTL_AMDGPU_USERQ        DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>     /**
>>    * DOC: memory domains
>> @@ -304,6 +306,114 @@ union drm_amdgpu_ctx {
>>       union drm_amdgpu_ctx_out out;
>>   };
>>   +/* user queue IOCTL */
>> +#define AMDGPU_USERQ_OP_CREATE    1
>> +#define AMDGPU_USERQ_OP_FREE    2
>> +
>> +/* Flag to indicate secure buffer related workload, unused for now */
>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE    (1 << 0)
>> +/* Flag to indicate AQL workload, unused for now */
>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL    (1 << 1)
>> +
>> +/*
>> + * MQD (memory queue descriptor) is a set of parameters which allow
>
> I find the term MQD misleading. For the firmware the MQD is a very 
> different data structure from what you are defining here. It's a 
> persistent data structure in kernel address space (VMID0) that is 
> shared between the driver and the firmware that gets loaded or updated 
> when queues are mapped or unmapped. I'd want to avoid confusing the 
> firmware MQD with this structure.
>
I agree. I can change the name to something else like
userq_properties_gfx_v11 or something similar.

- Shashank

> Regards,
>   Felix
>
>
>> + * the GPU to uniquely define and identify a usermode queue. This
>> + * structure defines the MQD for GFX-V11 IP ver 0.
>> + */
>> +struct drm_amdgpu_userq_mqd_gfx_v11_0 {
>> +    /**
>> +     * @queue_va: Virtual address of the GPU memory which holds the 
>> queue
>> +     * object. The queue holds the workload packets.
>> +     */
>> +    __u64   queue_va;
>> +    /**
>> +     * @queue_size: Size of the queue in bytes; this needs to be
>> +     * 256-byte aligned.
>> +     */
>> +    __u64   queue_size;
>> +    /**
>> +     * @rptr_va: Virtual address of the GPU memory which holds the
>> +     * ring RPTR. This object must be at least 8 bytes in size and
>> +     * aligned to an 8-byte offset.
>> +     */
>> +    __u64   rptr_va;
>> +    /**
>> +     * @wptr_va: Virtual address of the GPU memory which holds the
>> +     * ring WPTR. This object must be at least 8 bytes in size and
>> +     * aligned to an 8-byte offset.
>> +     *
>> +     * Queue, RPTR and WPTR can come from the same object, as long 
>> as the size
>> +     * and alignment related requirements are met.
>> +     */
>> +    __u64   wptr_va;
>> +    /**
>> +     * @shadow_va: Virtual address of the GPU memory to hold the 
>> shadow buffer.
>> +     * This must be from a separate GPU object, and must be at least
>> +     * 4 pages in size.
>> +     */
>> +    __u64   shadow_va;
>> +    /**
>> +     * @gds_va: Virtual address of the GPU memory to hold the GDS 
>> buffer.
>> +     * This must be from a separate GPU object, and must be at least
>> +     * 1 page in size.
>> +     */
>> +    __u64   gds_va;
>> +    /**
>> +     * @csa_va: Virtual address of the GPU memory to hold the CSA 
>> buffer.
>> +     * This must be from a separate GPU object, and must be at least
>> +     * 1 page in size.
>> +     */
>> +    __u64   csa_va;
>> +};
>> +
>> +struct drm_amdgpu_userq_in {
>> +    /** AMDGPU_USERQ_OP_* */
>> +    __u32    op;
>> +    /** Queue handle for USERQ_OP_FREE */
>> +    __u32    queue_id;
>> +    /** the target GPU engine to execute workload (AMDGPU_HW_IP_*) */
>> +    __u32   ip_type;
>> +    /**
>> +     * @flags: flags to indicate a special function for the queue, like
>> +     * a secure buffer (TMZ). Unused for now.
>> +     */
>> +    __u32   flags;
>> +    /**
>> +     * @doorbell_handle: the handle of the doorbell GEM object
>> +     * associated with this client.
>> +     */
>> +    __u32   doorbell_handle;
>> +    /**
>> +     * @doorbell_offset: 32-bit offset of the doorbell in the doorbell BO.
>> +     * The kernel will generate the absolute doorbell offset using
>> +     * doorbell_handle and doorbell_offset.
>> +     */
>> +    __u32   doorbell_offset;
>> +    /**
>> +     * @mqd: Queue descriptor for USERQ_OP_CREATE.
>> +     * MQD data can be of different sizes for different GPU IP/engine
>> +     * types and their respective versions/revisions, so this is a
>> +     * user pointer (passed as a __u64) to the MQD of this usermode queue.
>> +     */
>> +    __u64 mqd;
>> +    /**
>> +     * @mqd_size: Size of the MQD data in bytes. It must match the MQD
>> +     * structure size of the respective engine/revision defined in the
>> +     * UAPI, e.g. for gfx_v11 workloads,
>> +     * mqd_size = sizeof(struct drm_amdgpu_userq_mqd_gfx_v11_0).
>> +     */
>> +    __u64 mqd_size;
>> +};
>> +
>> +struct drm_amdgpu_userq_out {
>> +    /** Queue handle */
>> +    __u32    queue_id;
>> +    /** Flags */
>> +    __u32    flags;
>> +};
>> +
>> +union drm_amdgpu_userq {
>> +    struct drm_amdgpu_userq_in in;
>> +    struct drm_amdgpu_userq_out out;
>> +};
>> +
>>   /* vm ioctl */
>>   #define AMDGPU_VM_OP_RESERVE_VMID    1
>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
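
For illustration, here is a minimal userspace sketch of the create path,
assuming only the structures and defines introduced in this patch. The GPU
virtual addresses and the doorbell GEM handle are hypothetical placeholders
(a real client would first allocate these BOs and map them into its GPU VM
through the existing amdgpu GEM/VA interfaces), and libdrm's
drmCommandWriteRead() is used simply as a convenient way to issue the ioctl:

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

static int create_gfx11_userq(int drm_fd, uint32_t doorbell_handle,
                              uint32_t doorbell_offset, uint32_t *queue_id)
{
        /* All VAs below are placeholders for BOs the app has already
         * allocated and mapped into its GPU VM. */
        struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd = {
                .queue_va   = 0x400000000ull, /* ring buffer             */
                .queue_size = 256 * 1024,     /* 256-byte aligned        */
                .rptr_va    = 0x400100000ull, /* >= 8 bytes, 8B aligned  */
                .wptr_va    = 0x400100100ull, /* >= 8 bytes, 8B aligned  */
                .shadow_va  = 0x400200000ull, /* own BO, >= 4 pages      */
                .gds_va     = 0x400300000ull, /* own BO, >= 1 page       */
                .csa_va     = 0x400400000ull, /* own BO, >= 1 page       */
        };
        union drm_amdgpu_userq args;
        int r;

        memset(&args, 0, sizeof(args));
        args.in.op              = AMDGPU_USERQ_OP_CREATE;
        args.in.ip_type         = AMDGPU_HW_IP_GFX;
        args.in.doorbell_handle = doorbell_handle;
        args.in.doorbell_offset = doorbell_offset; /* offset inside the doorbell BO */
        args.in.mqd             = (uintptr_t)&mqd;
        args.in.mqd_size        = sizeof(mqd);

        r = drmCommandWriteRead(drm_fd, DRM_AMDGPU_USERQ, &args, sizeof(args));
        if (r)
                return r;

        *queue_id = args.out.queue_id;
        return 0;
}

Destroying the queue would be a second call through the same union, with
in.op = AMDGPU_USERQ_OP_FREE and in.queue_id set to the id returned above.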

Thread overview: 32+ messages
2023-09-08 16:04 [PATCH v6 0/9] AMDGPU usermode queues Shashank Sharma
2023-09-08 16:04 ` [PATCH v6 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
2023-10-04 21:23   ` Felix Kuehling
2023-10-05 10:59     ` Shashank Sharma
2023-09-08 16:04 ` [PATCH v6 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
2023-09-08 16:04 ` [PATCH v6 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
2023-09-20 14:47   ` Zhang, Yifan
2023-09-20 14:53     ` Sharma, Shashank
2023-09-08 16:04 ` [PATCH v6 4/9] drm/amdgpu: create GFX-gen11 " Shashank Sharma
2023-09-14  7:45   ` Christian König
2023-09-14  8:24     ` Shashank Sharma
2023-09-28 13:11       ` Shashank Sharma
2023-09-28 13:27         ` Alex Deucher
2023-09-28 13:40           ` Shashank Sharma
2023-09-28 13:52             ` Alex Deucher
2023-09-28 13:54               ` Shashank Sharma
2023-10-02  9:42                 ` Christian König
2023-09-08 16:04 ` [PATCH v6 5/9] drm/amdgpu: create context space for " Shashank Sharma
2023-09-20 15:21   ` Alex Deucher
2023-09-29 17:50     ` Shashank Sharma
2023-10-04 13:13       ` Alex Deucher
2023-09-08 16:04 ` [PATCH v6 6/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
2023-09-20 15:23   ` Alex Deucher
2023-09-20 15:28   ` Alex Deucher
2023-09-08 16:04 ` [PATCH v6 7/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
2023-09-18 10:32   ` Christian König
2023-10-04 21:34     ` Felix Kuehling
2023-10-05 10:57       ` Shashank Sharma
2023-09-08 16:04 ` [PATCH v6 8/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
2023-09-20 15:25   ` Alex Deucher
2023-09-08 16:04 ` [PATCH v6 9/9] drm/amdgpu: cleanup leftover queues Shashank Sharma
2023-09-20 15:26   ` Alex Deucher
