* [PATCH 0/8] AMDGPU usermode queues
@ 2023-02-03 21:54 Shashank Sharma
  2023-02-03 21:54 ` [PATCH 1/8] drm/amdgpu: UAPI for user queue management Shashank Sharma
                   ` (9 more replies)
  0 siblings, 10 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, shashank.sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

This patch series introduces AMDGPU usermode graphics queues.
User queues are a method of GPU workload submission into the graphics
hardware without any interaction with the kernel/DRM schedulers. With
this method, a userspace graphics application can create its own
workqueue and submit it directly to the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
- The kernel picks a 32-bit offset in the doorbell page for this queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, passing the GPU addresses of these objects (read
  ptr, write ptr, queue base address and doorbell address).
- The kernel creates the queue and maps it in the HW.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- Once the data is filled in the queue, the app must write the number of
  dwords at the doorbell offset, and the GPU will start fetching the data.
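The flow above can be sketched in plain C with a mock ring buffer and doorbell slot. Everything here (names, sizes, the packet header value) is hypothetical and for illustration only; the real submission path goes through GPU-visible memory mapped by the kernel:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mock of the user queue objects described above:
 * a queue buffer, a read pointer, a write pointer and a doorbell slot. */
#define QUEUE_DWORDS 256

struct mock_user_queue {
    uint32_t ring[QUEUE_DWORDS]; /* queue object holding workload packets */
    uint64_t rptr;               /* read pointer object ("GPU" side) */
    uint64_t wptr;               /* write pointer object (app side) */
    uint32_t doorbell;           /* 32-bit doorbell slot picked by the kernel */
};

/* App side: copy packets into the queue, advance the write pointer, then
 * ring the doorbell with the dword count, as the cover letter describes. */
static void submit(struct mock_user_queue *q, const uint32_t *pkt, uint32_t ndw)
{
    for (uint32_t i = 0; i < ndw; i++)
        q->ring[(q->wptr + i) % QUEUE_DWORDS] = pkt[i];
    q->wptr += ndw;
    q->doorbell = (uint32_t)q->wptr; /* GPU starts fetching up to here */
}

/* "GPU" side: consume everything between rptr and the doorbell value,
 * returning how many dwords were fetched. */
static uint32_t fetch(struct mock_user_queue *q)
{
    uint32_t fetched = 0;
    while (q->rptr < q->doorbell) {
        (void)q->ring[q->rptr % QUEUE_DWORDS];
        q->rptr++;
        fetched++;
    }
    return fetched;
}
```

The key property the sketch shows is that no kernel call sits between filling the queue and the GPU fetching it: the doorbell write alone kicks off consumption.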

libDRM changes for this series and a sample DRM test program can be found
in the MESA merge request here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

The RFC patch series and previous discussion can be seen here:
https://patchwork.freedesktop.org/series/112214/

This patch series needs the doorbell re-design changes, which are being
reviewed here:
https://patchwork.freedesktop.org/series/113669/

In the absence of the doorbell patches, this patch series uses a hack
patch to test the functionality. That hack patch is also published here
at the end of the series, just for reference.

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Arvind Yadav (1):
  drm/amdgpu: DO-NOT-MERGE add busy-waiting delay

Shashank Sharma (6):
  drm/amdgpu: add usermode queues
  drm/amdgpu: introduce userqueue MQD handlers
  drm/amdgpu: Add V11 graphics MQD functions
  drm/amdgpu: Create context for usermode queue
  drm/amdgpu: Map userqueue into HW
  drm/amdgpu: DO-NOT-MERGE doorbell hack

 drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 365 ++++++++++++++++++
 .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 300 ++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  93 +++++
 drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
 include/uapi/drm/amdgpu_drm.h                 |  59 +++
 9 files changed, 837 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-03 22:07   ` Alex Deucher
  2023-02-03 21:54 ` [PATCH 2/8] drm/amdgpu: add usermode queues Shashank Sharma
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, shashank.sharma

From: Alex Deucher <alexander.deucher@amd.com>

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI also maps the queue into the GPU, so the graphics app
can start submitting work to the queue as soon as the call returns.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 4038abe8505a..6c5235d107b3 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_USERQ		0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE	1
+#define AMDGPU_USERQ_OP_FREE	2
+
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
+#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
+
+struct drm_amdgpu_userq_mqd {
+	/** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
+	__u32	flags;
+	/** IP type: AMDGPU_HW_IP_* */
+	__u32	ip_type;
+	/** GEM object handle */
+	__u32   doorbell_handle;
+	/** Doorbell offset in dwords */
+	__u32   doorbell_offset;
+	/** GPU virtual address of the queue */
+	__u64   queue_va;
+	/** Size of the queue in bytes */
+	__u64   queue_size;
+	/** GPU virtual address of the rptr */
+	__u64   rptr_va;
+	/** GPU virtual address of the wptr */
+	__u64   wptr_va;
+};
+
+struct drm_amdgpu_userq_in {
+	/** AMDGPU_USERQ_OP_* */
+	__u32	op;
+	/** Flags */
+	__u32	flags;
+	/** Queue handle to associate the queue free call with,
+	 * unused for queue create calls */
+	__u32	queue_id;
+	__u32	pad;
+	/** Queue descriptor */
+	struct drm_amdgpu_userq_mqd mqd;
+};
+
+struct drm_amdgpu_userq_out {
+	/** Queue handle */
+	__u32	q_id;
+	/** Flags */
+	__u32	flags;
+};
+
+union drm_amdgpu_userq {
+	struct drm_amdgpu_userq_in in;
+	struct drm_amdgpu_userq_out out;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.34.1
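The structures in this patch are laid out so that `struct drm_amdgpu_userq_in` has no implicit padding: the explicit `pad` field keeps the embedded `mqd` (whose first `__u64` forces 8-byte alignment) at a fixed offset, which matters for a UAPI shared between 32- and 64-bit userspace. A standalone mirror of the declarations, copied from the diff above rather than taken from the real header, can check the layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel UAPI integer types. */
typedef uint32_t __u32;
typedef uint64_t __u64;

/* Mirrors of the structures added in this patch (layout checks only). */
struct drm_amdgpu_userq_mqd {
    __u32 flags;
    __u32 ip_type;
    __u32 doorbell_handle;
    __u32 doorbell_offset;
    __u64 queue_va;
    __u64 queue_size;
    __u64 rptr_va;
    __u64 wptr_va;
};

struct drm_amdgpu_userq_in {
    __u32 op;
    __u32 flags;
    __u32 queue_id;
    __u32 pad;      /* keeps mqd 8-byte aligned with no hidden padding */
    struct drm_amdgpu_userq_mqd mqd;
};

struct drm_amdgpu_userq_out {
    __u32 q_id;
    __u32 flags;
};

union drm_amdgpu_userq {
    struct drm_amdgpu_userq_in in;
    struct drm_amdgpu_userq_out out;
};

_Static_assert(sizeof(struct drm_amdgpu_userq_mqd) == 48, "mqd layout");
_Static_assert(offsetof(struct drm_amdgpu_userq_in, mqd) == 16, "pad works");
_Static_assert(sizeof(union drm_amdgpu_userq) == 64, "union sized by .in");
```

On any ABI where `__u64` is 8-byte aligned, the static asserts hold, so the ioctl payload is 64 bytes with identical layout everywhere.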



* [PATCH 2/8] drm/amdgpu: add usermode queues
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
  2023-02-03 21:54 ` [PATCH 1/8] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-07  7:08   ` Christian König
  2023-02-07 14:54   ` Alex Deucher
  2023-02-03 21:54 ` [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers Shashank Sharma
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, shashank.sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

This patch adds skeleton code for usermode queue creation. It
contains:
- A new structure to keep all the user queue data in one place.
- An IOCTL function to create/free a usermode queue.
- A function to generate a unique index for the queue.
- A queue context manager in driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)

- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 155 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  64 ++++++++
 6 files changed, 230 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 798d0e9a60b7..764801cc8203 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -210,6 +210,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6b74df446694..0625d6bdadf4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -109,6 +109,7 @@
 #include "amdgpu_fdinfo.h"
 #include "amdgpu_mca.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_userqueue.h"
 
 #define MAX_GPU_INSTANCE		16
 
@@ -482,6 +483,7 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	struct amdgpu_userq_mgr	userq_mgr;
 };
 
 int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b4f2d61ea0d5..229976a2d0e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -52,6 +52,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_userqueue.h"
 
 /*
  * KMS wrapper.
@@ -2748,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 7aa7e52ca784..52e61e339a88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1187,6 +1187,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+	if (r)
+		DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
+
 	file_priv->driver_priv = fpriv;
 	goto out_suspend;
 
@@ -1254,6 +1258,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
+	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
 	if (pasid)
 		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index 000000000000..d5bc7fe81750
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,155 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+#include "amdgpu_vm.h"
+
+static inline int
+amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
+}
+
+static inline void
+amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
+{
+    idr_remove(&uq_mgr->userq_idr, queue_id);
+}
+
+static struct amdgpu_usermode_queue
+*amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+    return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+    int r, pasid;
+    struct amdgpu_usermode_queue *queue;
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_vm *vm = &fpriv->vm;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
+
+    pasid = vm->pasid;
+    if (vm->pasid < 0) {
+        DRM_WARN("No PASID info found\n");
+        pasid = 0;
+    }
+
+    mutex_lock(&uq_mgr->userq_mutex);
+
+    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+    if (!queue) {
+        DRM_ERROR("Failed to allocate memory for queue\n");
+        mutex_unlock(&uq_mgr->userq_mutex);
+        return -ENOMEM;
+    }
+
+    queue->vm = vm;
+    queue->pasid = pasid;
+    queue->wptr_gpu_addr = mqd_in->wptr_va;
+    queue->rptr_gpu_addr = mqd_in->rptr_va;
+    queue->queue_size = mqd_in->queue_size;
+    queue->queue_type = mqd_in->ip_type;
+    queue->queue_gpu_addr = mqd_in->queue_va;
+    queue->flags = mqd_in->flags;
+    queue->use_doorbell = true;
+    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
+    if (queue->queue_id < 0) {
+        DRM_ERROR("Failed to allocate a queue id\n");
+        r = queue->queue_id;
+        goto free_queue;
+    }
+
+    list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
+    args->out.q_id = queue->queue_id;
+    args->out.flags = 0;
+    mutex_unlock(&uq_mgr->userq_mutex);
+    return 0;
+
+free_queue:
+    mutex_unlock(&uq_mgr->userq_mutex);
+    kfree(queue);
+    return r;
+}
+
+static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct amdgpu_usermode_queue *queue;
+
+    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+    if (!queue) {
+        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+        return;
+    }
+
+    mutex_lock(&uq_mgr->userq_mutex);
+    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
+    list_del(&queue->userq_node);
+    mutex_unlock(&uq_mgr->userq_mutex);
+    kfree(queue);
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *filp)
+{
+    union drm_amdgpu_userq *args = data;
+    int r = 0;
+
+    switch (args->in.op) {
+    case AMDGPU_USERQ_OP_CREATE:
+        r = amdgpu_userqueue_create(filp, args);
+        if (r)
+            DRM_ERROR("Failed to create usermode queue\n");
+        break;
+
+    case AMDGPU_USERQ_OP_FREE:
+        amdgpu_userqueue_destroy(filp, args->in.queue_id);
+        break;
+
+    default:
+        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
+        return -EINVAL;
+    }
+
+    return r;
+}
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
+{
+    mutex_init(&userq_mgr->userq_mutex);
+    idr_init_base(&userq_mgr->userq_idr, 1);
+    INIT_LIST_HEAD(&userq_mgr->userq_list);
+    userq_mgr->adev = adev;
+
+    return 0;
+}
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
+{
+    idr_destroy(&userq_mgr->userq_idr);
+    mutex_destroy(&userq_mgr->userq_mutex);
+}
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
new file mode 100644
index 000000000000..9557588fe34f
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDGPU_USERQUEUE_H_
+#define AMDGPU_USERQUEUE_H_
+
+#define AMDGPU_MAX_USERQ 512
+
+struct amdgpu_userq_mgr {
+	struct idr userq_idr;
+	struct mutex userq_mutex;
+	struct list_head userq_list;
+	struct amdgpu_device *adev;
+};
+
+struct amdgpu_usermode_queue {
+	int		queue_id;
+	int		queue_type;
+	int		queue_size;
+	int		pasid;
+	int		doorbell_index;
+	int 		use_doorbell;
+
+	uint64_t	wptr_gpu_addr;
+	uint64_t	rptr_gpu_addr;
+	uint64_t	queue_gpu_addr;
+	uint64_t	flags;
+
+	uint64_t	mqd_gpu_addr;
+	void 		*mqd_cpu_ptr;
+
+	struct amdgpu_bo	*mqd_obj;
+	struct amdgpu_vm    	*vm;
+	struct amdgpu_userq_mgr *userq_mgr;
+	struct list_head 	userq_node;
+};
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
+
+#endif
-- 
2.34.1
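Queue id allocation in the patch above leans on the kernel's IDR: `idr_alloc(&idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL)` hands out the lowest free id in [1, 512), and `idr_find`/`idr_remove` look an entry up and release it. A rough userspace analog of that contract, using a fixed-size table instead of the real radix-tree-backed IDR, just to show the id semantics:

```c
#include <stddef.h>

#define AMDGPU_MAX_USERQ 512

/* Minimal stand-in for the IDR usage in amdgpu_userqueue.c:
 * the lowest free id in [1, AMDGPU_MAX_USERQ) is always returned. */
struct mock_idr {
    void *slot[AMDGPU_MAX_USERQ];
};

static int mock_idr_alloc(struct mock_idr *idr, void *ptr)
{
    for (int id = 1; id < AMDGPU_MAX_USERQ; id++) {
        if (!idr->slot[id]) {
            idr->slot[id] = ptr;
            return id;
        }
    }
    return -1; /* the real idr_alloc reports -ENOSPC here */
}

static void *mock_idr_find(struct mock_idr *idr, int id)
{
    if (id < 1 || id >= AMDGPU_MAX_USERQ)
        return NULL;
    return idr->slot[id];
}

static void mock_idr_remove(struct mock_idr *idr, int id)
{
    if (id >= 1 && id < AMDGPU_MAX_USERQ)
        idr->slot[id] = NULL;
}
```

Starting the range at 1 (and `idr_init_base(&idr, 1)` in the patch) keeps id 0 unused, so a zeroed `queue_id` from userspace can never accidentally name a live queue.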



* [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
  2023-02-03 21:54 ` [PATCH 1/8] drm/amdgpu: UAPI for user queue management Shashank Sharma
  2023-02-03 21:54 ` [PATCH 2/8] drm/amdgpu: add usermode queues Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-07  7:11   ` Christian König
  2023-02-07 14:59   ` Alex Deucher
  2023-02-03 21:54 ` [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions Shashank Sharma
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, shashank.sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

A Memory Queue Descriptor (MQD) of a userqueue defines it in the hardware's
context. As the method of creating an MQD, and its format, can vary between
different graphics IPs, we need gfx generation specific handlers to create MQDs.

This patch:
- Introduces MQD handler functions for the usermode queues.
- Adds a general function to create and destroy an MQD for a userqueue.

V1: Worked on review comments from Alex on RFC patches:
    MQD creation should be gen and IP specific.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 64 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  9 +++
 2 files changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index d5bc7fe81750..625c2fe1e84a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -42,6 +42,60 @@ static struct amdgpu_usermode_queue
     return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+static int
+amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    int r;
+    int size;
+    struct amdgpu_device *adev = uq_mgr->adev;
+
+    if (!uq_mgr->userq_mqd_funcs) {
+        DRM_ERROR("Userqueue not initialized\n");
+        return -EINVAL;
+    }
+
+    size = uq_mgr->userq_mqd_funcs->mqd_size(uq_mgr);
+    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_VRAM,
+                                &queue->mqd_obj,
+                                &queue->mqd_gpu_addr,
+                                &queue->mqd_cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
+        return r;
+    }
+
+    memset(queue->mqd_cpu_ptr, 0, size);
+    r = amdgpu_bo_reserve(queue->mqd_obj, false);
+    if (unlikely(r != 0)) {
+        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
+        goto free_mqd;
+    }
+
+    r = uq_mgr->userq_mqd_funcs->mqd_create(uq_mgr, queue);
+    amdgpu_bo_unreserve(queue->mqd_obj);
+    if (r) {
+        DRM_ERROR("Failed to create MQD for queue\n");
+        goto free_mqd;
+    }
+    return 0;
+
+free_mqd:
+    amdgpu_bo_free_kernel(&queue->mqd_obj,
+			   &queue->mqd_gpu_addr,
+			   &queue->mqd_cpu_ptr);
+   return r;
+}
+
+static void
+amdgpu_userqueue_destroy_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    uq_mgr->userq_mqd_funcs->mqd_destroy(uq_mgr, queue);
+    amdgpu_bo_free_kernel(&queue->mqd_obj,
+			   &queue->mqd_gpu_addr,
+			   &queue->mqd_cpu_ptr);
+}
+
 static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 {
     int r, pasid;
@@ -82,12 +136,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
         goto free_queue;
     }
 
+    r = amdgpu_userqueue_create_mqd(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to create MQD\n");
+        goto free_qid;
+    }
+
     list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
     args->out.q_id = queue->queue_id;
     args->out.flags = 0;
     mutex_unlock(&uq_mgr->userq_mutex);
     return 0;
 
+free_qid:
+    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
+
 free_queue:
     mutex_unlock(&uq_mgr->userq_mutex);
     kfree(queue);
@@ -107,6 +170,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
     }
 
     mutex_lock(&uq_mgr->userq_mutex);
+    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
     list_del(&queue->userq_node);
     mutex_unlock(&uq_mgr->userq_mutex);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 9557588fe34f..a6abdfd5cb74 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -26,10 +26,13 @@
 
 #define AMDGPU_MAX_USERQ 512
 
+struct amdgpu_userq_mqd_funcs;
+
 struct amdgpu_userq_mgr {
 	struct idr userq_idr;
 	struct mutex userq_mutex;
 	struct list_head userq_list;
+	const struct amdgpu_userq_mqd_funcs *userq_mqd_funcs;
 	struct amdgpu_device *adev;
 };
 
@@ -57,6 +60,12 @@ struct amdgpu_usermode_queue {
 
 int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
 
+struct amdgpu_userq_mqd_funcs {
+	int (*mqd_size)(struct amdgpu_userq_mgr *);
+	int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+};
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
-- 
2.34.1
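The `amdgpu_userq_mqd_funcs` table introduced here is the usual kernel ops-vtable pattern: the generic userqueue code only ever calls through the function pointers, and an IP-specific backend supplies them (patch 4 wires in the v11 implementation by GC major version). A compact sketch of that pattern, with simplified names and a stub backend rather than the real v11 code:

```c
#include <stddef.h>

struct userq_mgr;

/* Mirrors the shape of the ops table added in this patch. */
struct userq_mqd_funcs {
    int (*mqd_size)(struct userq_mgr *);
    int (*mqd_create)(struct userq_mgr *);
    void (*mqd_destroy)(struct userq_mgr *);
};

struct userq_mgr {
    const struct userq_mqd_funcs *funcs;
};

/* Stub v11 backend: the real one fills a v11_gfx_mqd (see patch 4);
 * the 4096-byte size is a placeholder, not the actual MQD size. */
static int v11_mqd_size(struct userq_mgr *m) { (void)m; return 4096; }
static int v11_mqd_create(struct userq_mgr *m) { (void)m; return 0; }
static void v11_mqd_destroy(struct userq_mgr *m) { (void)m; }

static const struct userq_mqd_funcs v11_funcs = {
    .mqd_size = v11_mqd_size,
    .mqd_create = v11_mqd_create,
    .mqd_destroy = v11_mqd_destroy,
};

/* Analogous to amdgpu_userqueue_setup_mqd_funcs() in patch 4:
 * pick a backend by GC major version, else refuse. */
static int setup_mqd_funcs(struct userq_mgr *mgr, int gc_major)
{
    if (gc_major == 11) {
        mgr->funcs = &v11_funcs;
        return 0;
    }
    return -1; /* the real code returns -EINVAL */
}
```

Keeping the table `const` and per-manager means a new IP generation only has to provide its own three functions; the create/destroy paths in amdgpu_userqueue.c stay untouched.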



* [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (2 preceding siblings ...)
  2023-02-03 21:54 ` [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-07 15:17   ` Alex Deucher
  2023-02-03 21:54 ` [PATCH 5/8] drm/amdgpu: Create context for usermode queue Shashank Sharma
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig,
	Arvind Yadav, shashank.sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

MQD describes the properties of a user queue to the HW, and allows it to
accurately configure the queue while mapping it in GPU HW. This patch
adds:
- A new header file which contains the userqueue MQD definition for the
  V11 graphics engine.
- A new function which fills it with userqueue data and prepares the MQD.
- A function which sets up the MQD function ptrs in the generic userqueue
  creation code.

V1: Addressed review comments from RFC patch series
    - Reuse the existing MQD structure instead of creating a new one
    - MQD format and creation can be IP specific, keep it like that

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
 .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
 drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
 4 files changed, 169 insertions(+), 8 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 764801cc8203..6ae9d5792791 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
 
 # add usermode queue
 amdgpu-y += amdgpu_userqueue.o
+amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 625c2fe1e84a..9f3490a91776 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
     return r;
 }
 
+extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
+
+static int
+amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
+{
+    int maj;
+    struct amdgpu_device *adev = uq_mgr->adev;
+    uint32_t version = adev->ip_versions[GC_HWIP][0];
+
+    maj = IP_VERSION_MAJ(version);
+    if (maj == 11) {
+        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
+    } else {
+        DRM_WARN("This IP doesn't support usermode queues\n");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
+    int r;
+
     mutex_init(&userq_mgr->userq_mutex);
     idr_init_base(&userq_mgr->userq_idr, 1);
     INIT_LIST_HEAD(&userq_mgr->userq_list);
     userq_mgr->adev = adev;
 
+    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
+    if (r) {
+        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
+        return r;
+    }
+
     return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
new file mode 100644
index 000000000000..57889729d635
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_userqueue.h"
+#include "v11_structs.h"
+#include "amdgpu_mes.h"
+#include "gc/gc_11_0_0_offset.h"
+#include "gc/gc_11_0_0_sh_mask.h"
+
+static int
+amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    uint32_t tmp, rb_bufsz;
+    uint64_t hqd_gpu_addr, wb_gpu_addr;
+    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
+    struct amdgpu_device *adev = uq_mgr->adev;
+
+    /* set up gfx hqd wptr */
+    mqd->cp_gfx_hqd_wptr = 0;
+    mqd->cp_gfx_hqd_wptr_hi = 0;
+
+    /* set the pointer to the MQD */
+    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
+    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
+
+    /* set up mqd control */
+    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
+    mqd->cp_gfx_mqd_control = tmp;
+
+    /* set up gfx_hqd_vmid with 0x0 to indicate the ring buffer's vmid */
+    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
+    mqd->cp_gfx_hqd_vmid = 0;
+
+    /* set up default queue priority level
+    * 0x0 = low priority, 0x1 = high priority */
+    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
+    mqd->cp_gfx_hqd_queue_priority = tmp;
+
+    /* set up time quantum */
+    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
+    mqd->cp_gfx_hqd_quantum = tmp;
+
+    /* set up gfx hqd base. this is similar as CP_RB_BASE */
+    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
+    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
+    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
+
+    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
+    wb_gpu_addr = queue->rptr_gpu_addr;
+    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
+    mqd->cp_gfx_hqd_rptr_addr_hi =
+        upper_32_bits(wb_gpu_addr) & 0xffff;
+
+    /* set up rb_wptr_poll addr */
+    wb_gpu_addr = queue->wptr_gpu_addr;
+    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
+    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
+
+    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
+    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
+    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
+    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
+#ifdef __BIG_ENDIAN
+    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
+#endif
+    mqd->cp_gfx_hqd_cntl = tmp;
+
+    /* set up cp_doorbell_control */
+    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
+    if (queue->use_doorbell) {
+        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
+                    DOORBELL_OFFSET, queue->doorbell_index);
+        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
+                    DOORBELL_EN, 1);
+    } else {
+        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
+                    DOORBELL_EN, 0);
+    }
+    mqd->cp_rb_doorbell_control = tmp;
+
+    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
+    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
+
+    /* activate the queue */
+    mqd->cp_gfx_hqd_active = 1;
+
+    return 0;
+}
+
+static void
+amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+
+}
+
+static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
+{
+    return sizeof(struct v11_gfx_mqd);
+}
+
+const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
+    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
+    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
+    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
+};
diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
index b8ff7456ae0b..f8008270f813 100644
--- a/drivers/gpu/drm/amd/include/v11_structs.h
+++ b/drivers/gpu/drm/amd/include/v11_structs.h
@@ -25,14 +25,14 @@
 #define V11_STRUCTS_H_
 
 struct v11_gfx_mqd {
-	uint32_t reserved_0; // offset: 0  (0x0)
-	uint32_t reserved_1; // offset: 1  (0x1)
-	uint32_t reserved_2; // offset: 2  (0x2)
-	uint32_t reserved_3; // offset: 3  (0x3)
-	uint32_t reserved_4; // offset: 4  (0x4)
-	uint32_t reserved_5; // offset: 5  (0x5)
-	uint32_t reserved_6; // offset: 6  (0x6)
-	uint32_t reserved_7; // offset: 7  (0x7)
+	uint32_t shadow_base_lo; // offset: 0  (0x0)
+	uint32_t shadow_base_hi; // offset: 1  (0x1)
+	uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
+	uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
+	uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
+	uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
+	uint32_t shadow_initialized; // offset: 6  (0x6)
+	uint32_t ib_vmid; // offset: 7  (0x7)
 	uint32_t reserved_8; // offset: 8  (0x8)
 	uint32_t reserved_9; // offset: 9  (0x9)
 	uint32_t reserved_10; // offset: 10  (0xA)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (3 preceding siblings ...)
  2023-02-03 21:54 ` [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-07  7:14   ` Christian König
  2023-02-07 16:51   ` Alex Deucher
  2023-02-03 21:54 ` [PATCH 6/8] drm/amdgpu: Map userqueue into HW Shashank Sharma
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, shashank.sharma

The FW expects us to allocate at least one page of context space to
process gang, process, shadow, GDS and FW_space related work. This
patch creates the required objects and adds IP specific functions
to do this.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
 .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
 3 files changed, 171 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 9f3490a91776..18281b3a51f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
     return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+static void
+amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+                                   struct amdgpu_usermode_queue *queue)
+{
+    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
+}
+
+static int
+amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+                                  struct amdgpu_usermode_queue *queue)
+{
+    int r;
+
+    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to create context space for queue\n");
+        return r;
+    }
+
+    return 0;
+}
+
 static int
 amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
 {
@@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
         goto free_qid;
     }
 
+    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to create context space\n");
+        goto free_mqd;
+    }
+
     list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
     args->out.q_id = queue->queue_id;
     args->out.flags = 0;
     mutex_unlock(&uq_mgr->userq_mutex);
     return 0;
 
+free_mqd:
+    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
+
 free_qid:
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
 
@@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
     }
 
     mutex_lock(&uq_mgr->userq_mutex);
+    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
     amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
     list_del(&queue->userq_node);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
index 57889729d635..687f90a587e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
@@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
 
 }
 
+static int amdgpu_userq_gfx_v11_ctx_create(struct amdgpu_userq_mgr *uq_mgr,
+                                           struct amdgpu_usermode_queue *queue)
+{
+    int r;
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
+    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
+    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
+    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
+    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
+
+    /*
+     * The FW expects at least one page of space allocated for
+     * process context related work, and one for gang context.
+     */
+    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_VRAM,
+                                &pctx->obj,
+                                &pctx->gpu_addr,
+                                &pctx->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", r);
+        return r;
+    }
+
+    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_VRAM,
+                                &gctx->obj,
+                                &gctx->gpu_addr,
+                                &gctx->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", r);
+        goto err_gangctx;
+    }
+
+    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_VRAM,
+                                &gdsctx->obj,
+                                &gdsctx->gpu_addr,
+                                &gdsctx->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
+        goto err_gdsctx;
+    }
+
+    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_VRAM,
+                                &fwctx->obj,
+                                &fwctx->gpu_addr,
+                                &fwctx->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
+        goto err_fwctx;
+    }
+
+    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_VRAM,
+                                &sctx->obj,
+                                &sctx->gpu_addr,
+                                &sctx->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate shadow bo for userqueue (%d)", r);
+        goto err_sctx;
+    }
+
+    return 0;
+
+err_sctx:
+    amdgpu_bo_free_kernel(&fwctx->obj,
+                          &fwctx->gpu_addr,
+                          &fwctx->cpu_ptr);
+
+err_fwctx:
+    amdgpu_bo_free_kernel(&gdsctx->obj,
+                          &gdsctx->gpu_addr,
+                          &gdsctx->cpu_ptr);
+
+err_gdsctx:
+    amdgpu_bo_free_kernel(&gctx->obj,
+                          &gctx->gpu_addr,
+                          &gctx->cpu_ptr);
+
+err_gangctx:
+    amdgpu_bo_free_kernel(&pctx->obj,
+                          &pctx->gpu_addr,
+                          &pctx->cpu_ptr);
+    return r;
+}
+
+static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
+                                            struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
+    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
+    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
+    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
+    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
+
+    amdgpu_bo_free_kernel(&sctx->obj,
+                          &sctx->gpu_addr,
+                          &sctx->cpu_ptr);
+
+    amdgpu_bo_free_kernel(&fwctx->obj,
+                          &fwctx->gpu_addr,
+                          &fwctx->cpu_ptr);
+
+    amdgpu_bo_free_kernel(&gdsctx->obj,
+                          &gdsctx->gpu_addr,
+                          &gdsctx->cpu_ptr);
+
+    amdgpu_bo_free_kernel(&gctx->obj,
+                          &gctx->gpu_addr,
+                          &gctx->cpu_ptr);
+
+    amdgpu_bo_free_kernel(&pctx->obj,
+                          &pctx->gpu_addr,
+                          &pctx->cpu_ptr);
+}
+
 static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
 {
     return sizeof(struct v11_gfx_mqd);
@@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
     .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
     .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
     .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
+    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
+    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
 };
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index a6abdfd5cb74..3adcd31618f7 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -25,9 +25,19 @@
 #define AMDGPU_USERQUEUE_H_
 
 #define AMDGPU_MAX_USERQ 512
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
 
 struct amdgpu_userq_mqd_funcs;
 
+struct amdgpu_userq_ctx {
+	struct amdgpu_bo *obj;
+	uint64_t gpu_addr;
+	void	*cpu_ptr;
+};
+
 struct amdgpu_userq_mgr {
 	struct idr userq_idr;
 	struct mutex userq_mutex;
@@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
 	uint64_t	mqd_gpu_addr;
 	void 		*mqd_cpu_ptr;
 
+	struct amdgpu_userq_ctx	proc_ctx;
+	struct amdgpu_userq_ctx	gang_ctx;
+	struct amdgpu_userq_ctx	gds_ctx;
+	struct amdgpu_userq_ctx	fw_ctx;
+	struct amdgpu_userq_ctx	shadow_ctx;
+
 	struct amdgpu_bo	*mqd_obj;
 	struct amdgpu_vm    	*vm;
 	struct amdgpu_userq_mgr *userq_mgr;
@@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
 	int (*mqd_size)(struct amdgpu_userq_mgr *);
 	int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
 	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+	int (*ctx_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+	void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
 };
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/8] drm/amdgpu: Map userqueue into HW
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (4 preceding siblings ...)
  2023-02-03 21:54 ` [PATCH 5/8] drm/amdgpu: Create context for usermode queue Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-07  7:20   ` Christian König
  2023-02-03 21:54 ` [PATCH 7/8] drm/amdgpu: DO-NOT-MERGE add busy-waiting delay Shashank Sharma
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, shashank.sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

This patch adds new fptrs to prepare the usermode queue to be
mapped into or unmapped from the HW. Once mapped, the queue is
ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
    - Map/Unmap should be IP specific.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 57 +++++++++++++++++++
 .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 47 +++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  2 +
 3 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 18281b3a51f1..cbfe2608c040 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -42,6 +42,53 @@ static struct amdgpu_usermode_queue
     return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+static void
+amdgpu_userqueue_unmap(struct amdgpu_userq_mgr *uq_mgr,
+                     struct amdgpu_usermode_queue *queue)
+{
+    int r;
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct mes_remove_queue_input remove_request;
+
+    uq_mgr->userq_mqd_funcs->prepare_unmap(uq_mgr, queue, (void *)&remove_request);
+
+    amdgpu_mes_lock(&adev->mes);
+    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &remove_request);
+    amdgpu_mes_unlock(&adev->mes);
+    if (r) {
+        DRM_ERROR("Failed to unmap usermode queue %d\n", queue->queue_id);
+        return;
+    }
+
+    DRM_DEBUG_DRIVER("Usermode queue %d unmapped\n", queue->queue_id);
+}
+
+static int
+amdgpu_userqueue_map(struct amdgpu_userq_mgr *uq_mgr,
+                     struct amdgpu_usermode_queue *queue)
+{
+    int r;
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct mes_add_queue_input add_request;
+
+    r = uq_mgr->userq_mqd_funcs->prepare_map(uq_mgr, queue, (void *)&add_request);
+    if (r) {
+        DRM_ERROR("Failed to map userqueue\n");
+        return r;
+    }
+
+    amdgpu_mes_lock(&adev->mes);
+    r = adev->mes.funcs->add_hw_queue(&adev->mes, &add_request);
+    amdgpu_mes_unlock(&adev->mes);
+    if (r) {
+        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+        return r;
+    }
+
+    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
+    return 0;
+}
+
 static void
 amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
                                    struct amdgpu_usermode_queue *queue)
@@ -170,12 +217,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
         goto free_mqd;
     }
 
+    r = amdgpu_userqueue_map(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to map userqueue\n");
+        goto free_ctx;
+    }
+
     list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
     args->out.q_id = queue->queue_id;
     args->out.flags = 0;
     mutex_unlock(&uq_mgr->userq_mutex);
     return 0;
 
+free_ctx:
+    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
+
 free_mqd:
     amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
 
@@ -201,6 +257,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
     }
 
     mutex_lock(&uq_mgr->userq_mutex);
+    amdgpu_userqueue_unmap(uq_mgr, queue);
     amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
     amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
index 687f90a587e3..d317bb600fd9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
@@ -24,6 +24,7 @@
 #include "amdgpu_userqueue.h"
 #include "v11_structs.h"
 #include "amdgpu_mes.h"
+#include "mes_api_def.h"
 #include "gc/gc_11_0_0_offset.h"
 #include "gc/gc_11_0_0_sh_mask.h"
 
@@ -239,6 +240,50 @@ static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
                           &pctx->cpu_ptr);
 }
 
+static int
+amdgpu_userq_gfx_v11_prepare_map(struct amdgpu_userq_mgr *uq_mgr,
+                                 struct amdgpu_usermode_queue *queue,
+                                 void *q_input)
+{
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct mes_add_queue_input *queue_input = q_input;
+
+    memset(queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+    queue_input->process_va_start = 0;
+    queue_input->process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+    queue_input->process_quantum = 100000; /* 10ms */
+    queue_input->gang_quantum = 10000; /* 1ms */
+    queue_input->paging = false;
+
+    queue_input->gang_context_addr = queue->gang_ctx.gpu_addr;
+    queue_input->process_context_addr = queue->proc_ctx.gpu_addr;
+    queue_input->inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+    queue_input->gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+    queue_input->process_id = queue->pasid;
+    queue_input->queue_type = queue->queue_type;
+    queue_input->mqd_addr = queue->mqd_gpu_addr;
+    queue_input->wptr_addr = queue->wptr_gpu_addr;
+    queue_input->wptr_mc_addr = queue->wptr_gpu_addr << AMDGPU_GPU_PAGE_SHIFT;
+    queue_input->queue_size = queue->queue_size >> 2;
+    queue_input->doorbell_offset = queue->doorbell_index;
+    queue_input->page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+    return 0;
+}
+
+static void
+amdgpu_userq_gfx_v11_prepare_unmap(struct amdgpu_userq_mgr *uq_mgr,
+                                   struct amdgpu_usermode_queue *queue,
+                                   void *q_input)
+{
+    struct mes_remove_queue_input *queue_input = q_input;
+
+    memset(queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+    queue_input->doorbell_offset = queue->doorbell_index;
+    queue_input->gang_context_addr = queue->gang_ctx.gpu_addr;
+}
+
 static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
 {
     return sizeof(struct v11_gfx_mqd);
@@ -250,4 +295,6 @@ const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
     .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
     .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
     .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
+    .prepare_map = amdgpu_userq_gfx_v11_prepare_map,
+    .prepare_unmap = amdgpu_userq_gfx_v11_prepare_unmap,
 };
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 3adcd31618f7..202fac237501 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -82,6 +82,8 @@ struct amdgpu_userq_mqd_funcs {
 	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
 	int (*ctx_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
 	void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+	int (*prepare_map)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *, void *);
+	void (*prepare_unmap)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *, void *);
 };
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 7/8] drm/amdgpu: DO-NOT-MERGE add busy-waiting delay
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (5 preceding siblings ...)
  2023-02-03 21:54 ` [PATCH 6/8] drm/amdgpu: Map userqueue into HW Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-03 21:54 ` [PATCH 8/8] drm/amdgpu: DO-NOT-MERGE doorbell hack Shashank Sharma
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, christian.koenig, Arvind Yadav, shashank.sharma

From: Arvind Yadav <arvind.yadav@amd.com>

This patch adds a 20 ms busy-waiting delay after mapping the
usermode queue in MES HW. It was observed during testing that
this delay is required to get the expected results.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>

Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index cbfe2608c040..a28ed8e98f7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -24,6 +24,15 @@
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
 
+static inline void userqueue_busy_wait(unsigned long ms)
+{
+   unsigned long timeout = jiffies + msecs_to_jiffies(ms);
+
+   while (time_before(jiffies, timeout)) {
+       cpu_relax();
+   }
+}
+
 static inline int
 amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
 {
@@ -85,6 +94,7 @@ amdgpu_userqueue_map(struct amdgpu_userq_mgr *uq_mgr,
         return r;
     }
 
+    userqueue_busy_wait(20);
     DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
     return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 8/8] drm/amdgpu: DO-NOT-MERGE doorbell hack
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (6 preceding siblings ...)
  2023-02-03 21:54 ` [PATCH 7/8] drm/amdgpu: DO-NOT-MERGE add busy-waiting delay Shashank Sharma
@ 2023-02-03 21:54 ` Shashank Sharma
  2023-02-06  0:52 ` [PATCH 0/8] AMDGPU usermode queues Dave Airlie
  2023-02-06 15:39 ` Michel Dänzer
  9 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 21:54 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, shashank.sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

The doorbell patches, being reviewed here, are required for usermode
queues:
https://patchwork.freedesktop.org/series/113669/

This hack patch adds a doorbell IOCTL just to test the usermode
queue functionality; it must not be merged.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>

Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 19 +++++++++++++++++++
 include/uapi/drm/amdgpu_drm.h                 |  6 ++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index a28ed8e98f7b..b8715dfe27bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -33,6 +33,8 @@ static inline void userqueue_busy_wait(unsigned long ms)
    }
 }
 
+#define AMDGPU_USERQ_DOORBELL_INDEX (AMDGPU_NAVI10_DOORBELL_GFX_USERQUEUE_START + 4)
+
 static inline int
 amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
 {
@@ -208,6 +210,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
     queue->queue_gpu_addr = mqd_in->queue_va;
     queue->flags = mqd_in->flags;
     queue->use_doorbell = true;
+    queue->doorbell_index = AMDGPU_USERQ_DOORBELL_INDEX;
     queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
     if (queue->queue_id < 0) {
         DRM_ERROR("Failed to allocate a queue id\n");
@@ -276,6 +279,22 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
     kfree(queue);
 }
 
+int amdgpu_userq_doorbell_ring_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *filp)
+{
+    struct drm_amdgpu_db_ring *in = data;
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct amdgpu_device *adev = uq_mgr->adev;
+
+    mutex_lock(&uq_mgr->userq_mutex);
+    WDOORBELL32(AMDGPU_USERQ_DOORBELL_INDEX, in->val);
+    mutex_unlock(&uq_mgr->userq_mutex);
+
+    DRM_DEBUG_DRIVER("Doorbell rung\n");
+    return 0;
+}
+
 int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
 		       struct drm_file *filp)
 {
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 6c5235d107b3..2d94cca566e0 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -55,6 +55,7 @@ extern "C" {
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
 #define DRM_AMDGPU_USERQ		0x16
+#define DRM_AMDGPU_USERQ_DOORBELL_RING		0x17
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -73,6 +74,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
 #define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
+#define DRM_IOCTL_AMDGPU_USERQ_DOORBELL_RING		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_DOORBELL_RING, struct drm_amdgpu_db_ring)
 
 /**
  * DOC: memory domains
@@ -350,6 +352,10 @@ struct drm_amdgpu_userq_out {
 	__u32	flags;
 };
 
+struct drm_amdgpu_db_ring {
+	__u64 val;
+};
+
 union drm_amdgpu_userq {
 	struct drm_amdgpu_userq_in in;
 	struct drm_amdgpu_userq_out out;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-03 21:54 ` [PATCH 1/8] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2023-02-03 22:07   ` Alex Deucher
  2023-02-03 22:26     ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-03 22:07 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: alexander.deucher, christian.koenig, amd-gfx

On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Alex Deucher <alexander.deucher@amd.com>
>
> This patch introduces new UAPI/IOCTL for usermode graphics
> queue. The userspace app will fill this structure and request
> the graphics driver to add a graphics work queue for it. The
> output of this UAPI is a queue id.
>
> This UAPI maps the queue into GPU, so the graphics app can start
> submitting work to the queue as soon as the call returns.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 4038abe8505a..6c5235d107b3 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -54,6 +54,7 @@ extern "C" {
>  #define DRM_AMDGPU_VM                  0x13
>  #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>  #define DRM_AMDGPU_SCHED               0x15
> +#define DRM_AMDGPU_USERQ               0x16
>
>  #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>  #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -71,6 +72,7 @@ extern "C" {
>  #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>  #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>  #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>
>  /**
>   * DOC: memory domains
> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>         union drm_amdgpu_ctx_out out;
>  };
>
> +/* user queue IOCTL */
> +#define AMDGPU_USERQ_OP_CREATE 1
> +#define AMDGPU_USERQ_OP_FREE   2
> +
> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> +
> +struct drm_amdgpu_userq_mqd {
> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> +       __u32   flags;
> +       /** IP type: AMDGPU_HW_IP_* */
> +       __u32   ip_type;
> +       /** GEM object handle */
> +       __u32   doorbell_handle;
> +       /** Doorbell offset in dwords */
> +       __u32   doorbell_offset;

Since doorbells are 64 bit, maybe this offset should be in qwords.


> +       /** GPU virtual address of the queue */
> +       __u64   queue_va;
> +       /** Size of the queue in bytes */
> +       __u64   queue_size;
> +       /** GPU virtual address of the rptr */
> +       __u64   rptr_va;
> +       /** GPU virtual address of the wptr */
> +       __u64   wptr_va;
> +};
> +
> +struct drm_amdgpu_userq_in {
> +       /** AMDGPU_USERQ_OP_* */
> +       __u32   op;
> +       /** Flags */
> +       __u32   flags;
> +       /** Queue handle to associate the queue free call with,
> +        * unused for queue create calls */
> +       __u32   queue_id;
> +       __u32   pad;
> +       /** Queue descriptor */
> +       struct drm_amdgpu_userq_mqd mqd;
> +};
> +
> +struct drm_amdgpu_userq_out {
> +       /** Queue handle */
> +       __u32   q_id;

Maybe this should be queue_id to match the input.

Alex

> +       /** Flags */
> +       __u32   flags;
> +};
> +
> +union drm_amdgpu_userq {
> +       struct drm_amdgpu_userq_in in;
> +       struct drm_amdgpu_userq_out out;
> +};
> +
>  /* vm ioctl */
>  #define AMDGPU_VM_OP_RESERVE_VMID      1
>  #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-03 22:07   ` Alex Deucher
@ 2023-02-03 22:26     ` Shashank Sharma
  2023-02-06 16:56       ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-03 22:26 UTC (permalink / raw)
  To: Alex Deucher; +Cc: alexander.deucher, christian.koenig, amd-gfx

Hey Alex,

On 03/02/2023 23:07, Alex Deucher wrote:
> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> This patch introduces new UAPI/IOCTL for usermode graphics
>> queue. The userspace app will fill this structure and request
>> the graphics driver to add a graphics work queue for it. The
>> output of this UAPI is a queue id.
>>
>> This UAPI maps the queue into GPU, so the graphics app can start
>> submitting work to the queue as soon as the call returns.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 53 insertions(+)
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index 4038abe8505a..6c5235d107b3 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -54,6 +54,7 @@ extern "C" {
>>   #define DRM_AMDGPU_VM                  0x13
>>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>   #define DRM_AMDGPU_SCHED               0x15
>> +#define DRM_AMDGPU_USERQ               0x16
>>
>>   #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>> @@ -71,6 +72,7 @@ extern "C" {
>>   #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>   #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>
>>   /**
>>    * DOC: memory domains
>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>          union drm_amdgpu_ctx_out out;
>>   };
>>
>> +/* user queue IOCTL */
>> +#define AMDGPU_USERQ_OP_CREATE 1
>> +#define AMDGPU_USERQ_OP_FREE   2
>> +
>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>> +
>> +struct drm_amdgpu_userq_mqd {
>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>> +       __u32   flags;
>> +       /** IP type: AMDGPU_HW_IP_* */
>> +       __u32   ip_type;
>> +       /** GEM object handle */
>> +       __u32   doorbell_handle;
>> +       /** Doorbell offset in dwords */
>> +       __u32   doorbell_offset;
> Since doorbells are 64 bit, maybe this offset should be in qwords.

Can you please help to cross-check this information? All the existing 
kernel doorbell calculations keep the doorbell size as sizeof(u32)

>
>> +       /** GPU virtual address of the queue */
>> +       __u64   queue_va;
>> +       /** Size of the queue in bytes */
>> +       __u64   queue_size;
>> +       /** GPU virtual address of the rptr */
>> +       __u64   rptr_va;
>> +       /** GPU virtual address of the wptr */
>> +       __u64   wptr_va;
>> +};
>> +
>> +struct drm_amdgpu_userq_in {
>> +       /** AMDGPU_USERQ_OP_* */
>> +       __u32   op;
>> +       /** Flags */
>> +       __u32   flags;
>> +       /** Queue handle to associate the queue free call with,
>> +        * unused for queue create calls */
>> +       __u32   queue_id;
>> +       __u32   pad;
>> +       /** Queue descriptor */
>> +       struct drm_amdgpu_userq_mqd mqd;
>> +};
>> +
>> +struct drm_amdgpu_userq_out {
>> +       /** Queue handle */
>> +       __u32   q_id;
> Maybe this should be queue_id to match the input.

Agree.

- Shashank

> Alex
>
>> +       /** Flags */
>> +       __u32   flags;
>> +};
>> +
>> +union drm_amdgpu_userq {
>> +       struct drm_amdgpu_userq_in in;
>> +       struct drm_amdgpu_userq_out out;
>> +};
>> +
>>   /* vm ioctl */
>>   #define AMDGPU_VM_OP_RESERVE_VMID      1
>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>> --
>> 2.34.1
>>


* Re: [PATCH 0/8] AMDGPU usermode queues
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (7 preceding siblings ...)
  2023-02-03 21:54 ` [PATCH 8/8] drm/amdgpu: DO-NOT-MERGE doorbell hack Shashank Sharma
@ 2023-02-06  0:52 ` Dave Airlie
  2023-02-06  8:57   ` Christian König
  2023-02-06 15:39 ` Michel Dänzer
  9 siblings, 1 reply; 50+ messages in thread
From: Dave Airlie @ 2023-02-06  0:52 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, amd-gfx

On Sat, 4 Feb 2023 at 07:54, Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This patch series introduces AMDGPU usermode graphics queues.
> User queues is a method of GPU workload submission into the graphics
> hardware without any interaction with kernel/DRM schedulers. In this
> method, a userspace graphics application can create its own workqueue
> and submit it directly in the GPU HW.
>
> The general idea of how this is supposed to work:
> - The application creates the following GPU objects:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
> - Kernel picks any 32-bit offset in the doorbell page for this queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the GPU addresses of these objects (read
>   ptr, write ptr, queue base address and doorbell address)
> - The kernel creates the queue and maps it in the HW.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - Once the data is filled in the queue, the app must write the number of
>   dwords in the doorbell offset, and the GPU will start fetching the data.

So I just have one question about forward progress here, let's call it
the 51% of VRAM problem.

You have two apps they both have working sets that allocate > 51% of VRAM.

Application (a) has the VRAM and mapping for the user queues and is
submitting work
Application (b) wants to submit work, but it has no queue mapping, as it
was previously evicted. Does (b) have to call an ioctl to get its
mapping back?
When (b) calls the ioctl, (a) loses its mapping. Control returns to (b),
but before it submits any work on the ring mapping it has, (a) gets
control and notices it has no queues, so it calls the ioctl, and (b)
loses its mapping, and around and around they go, never making forward
progress.

What's the exit strategy for something like that, fall back to kernel
submit so you can get memory objects validated and submit some work?

Dave.


* Re: [PATCH 0/8] AMDGPU usermode queues
  2023-02-06  0:52 ` [PATCH 0/8] AMDGPU usermode queues Dave Airlie
@ 2023-02-06  8:57   ` Christian König
  0 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2023-02-06  8:57 UTC (permalink / raw)
  To: Dave Airlie, Shashank Sharma
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, amd-gfx

Am 06.02.23 um 01:52 schrieb Dave Airlie:
> On Sat, 4 Feb 2023 at 07:54, Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch series introduces AMDGPU usermode graphics queues.
>> User queues is a method of GPU workload submission into the graphics
>> hardware without any interaction with kernel/DRM schedulers. In this
>> method, a userspace graphics application can create its own workqueue
>> and submit it directly in the GPU HW.
>>
>> The general idea of how this is supposed to work:
>> - The application creates the following GPU objects:
>>    - A queue object to hold the workload packets.
>>    - A read pointer object.
>>    - A write pointer object.
>>    - A doorbell page.
>> - Kernel picks any 32-bit offset in the doorbell page for this queue.
>> - The application uses the usermode_queue_create IOCTL introduced in
>>    this patch, by passing the GPU addresses of these objects (read
>>    ptr, write ptr, queue base address and doorbell address)
>> - The kernel creates the queue and maps it in the HW.
>> - The application can start submitting the data in the queue as soon as
>>    the kernel IOCTL returns.
>> - Once the data is filled in the queue, the app must write the number of
>>    dwords in the doorbell offset, and the GPU will start fetching the data.
> So I just have one question about forward progress here, let's call it
> the 51% of VRAM problem.
>
> You have two apps they both have working sets that allocate > 51% of VRAM.

Marek and I have been working on this quite extensively.

> Application (a) has the VRAM and mapping for the user queues and is
> submitting work
> Application (b) wants to submit work, it has no queue mapping as it
> was previously evicted, does (b) have to call an ioctl to get its
> mapping back?

Long story short: No, but that's a bit more complicated to explain.

> When (b) calls the ioctl, (a) loses its mapping. Control returns to (b),
> but before it submits any work on the ring mapping it has, (a) gets
> control and notices it has no queues, so it calls the ioctl, and (b)
> loses its mapping, and around and around they go, never making forward
> progress.
>
> What's the exit strategy for something like that, fall back to kernel
> submit so you can get memory objects validated and submit some work?

First of all, the fw makes sure that processes can only be evicted after 
they have used up their time slice. So when you have two processes fighting 
over a shared resource (be it memory, locks or whatever) they will 
always run until the end of their time slice before they are pushed away 
from the hw.

Then when a process is evicted we take a look at what the process has 
already scheduled as work on the hw. If the process isn't idle we start 
a delayed work item to get it going again (similar to what the KFD is 
doing at the moment). When the process is idle we unmap the doorbell 
page(s) from the CPU and wait for the page fault which signals that the 
process wants to submit something again.

And the last component is a static resource management scheme which distributes 
the available resources equally among the different active processes 
fighting over them. Activity of a process is determined by the periodic 
interrupts sent by the hw for running processes.

I call the memory management algorithm based on this Robin Hood 
(https://drive.google.com/file/d/1vIrX37c3B2IgWFtZ2UpeKxh0-YMlV6NU/view) 
and simulated it a bit in some spreadsheets, but it isn't fully 
implemented yet. I'm working on this for a couple of years now and 
slowly pushing DRM/TTM into the direction we need for this to work.

Christian.

>
> Dave.



* Re: [PATCH 0/8] AMDGPU usermode queues
  2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
                   ` (8 preceding siblings ...)
  2023-02-06  0:52 ` [PATCH 0/8] AMDGPU usermode queues Dave Airlie
@ 2023-02-06 15:39 ` Michel Dänzer
  2023-02-06 16:11   ` Alex Deucher
  9 siblings, 1 reply; 50+ messages in thread
From: Michel Dänzer @ 2023-02-06 15:39 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: alexander.deucher, Shashank Sharma, christian.koenig

On 2/3/23 22:54, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
> 
> This patch series introduces AMDGPU usermode graphics queues.
> User queues is a method of GPU workload submission into the graphics
> hardware without any interaction with kernel/DRM schedulers. In this
> method, a userspace graphics application can create its own workqueue
> and submit it directly in the GPU HW.
> 
> The general idea of how this is supposed to work:
> - The application creates the following GPU objects:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
> - Kernel picks any 32-bit offset in the doorbell page for this queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the GPU addresses of these objects (read
>   ptr, write ptr, queue base address and doorbell address)
> - The kernel creates the queue and maps it in the HW.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - Once the data is filled in the queue, the app must write the number of
>   dwords in the doorbell offset, and the GPU will start fetching the data.
> 
> libDRM changes for this series and a sample DRM test program can be found
> in the MESA merge request here:
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

I hope everyone's clear these libdrm_amdgpu changes won't be sufficient uAPI validation to allow the kernel bits to be merged upstream.

This will require an implementation in the Mesa radeonsi / RADV driver, ideally with working implicit synchronization for BOs shared via dma-buf.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer



* Re: [PATCH 0/8] AMDGPU usermode queues
  2023-02-06 15:39 ` Michel Dänzer
@ 2023-02-06 16:11   ` Alex Deucher
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Deucher @ 2023-02-06 16:11 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, amd-gfx,
	Shashank Sharma

On Mon, Feb 6, 2023 at 10:39 AM Michel Dänzer
<michel.daenzer@mailbox.org> wrote:
>
> On 2/3/23 22:54, Shashank Sharma wrote:
> > From: Shashank Sharma <contactshashanksharma@gmail.com>
> >
> > This patch series introduces AMDGPU usermode graphics queues.
> > User queues is a method of GPU workload submission into the graphics
> > hardware without any interaction with kernel/DRM schedulers. In this
> > method, a userspace graphics application can create its own workqueue
> > and submit it directly in the GPU HW.
> >
> > The general idea of how this is supposed to work:
> > - The application creates the following GPU objects:
> >   - A queue object to hold the workload packets.
> >   - A read pointer object.
> >   - A write pointer object.
> >   - A doorbell page.
> > - Kernel picks any 32-bit offset in the doorbell page for this queue.
> > - The application uses the usermode_queue_create IOCTL introduced in
> >   this patch, by passing the GPU addresses of these objects (read
> >   ptr, write ptr, queue base address and doorbell address)
> > - The kernel creates the queue and maps it in the HW.
> > - The application can start submitting the data in the queue as soon as
> >   the kernel IOCTL returns.
> > - Once the data is filled in the queue, the app must write the number of
> >   dwords in the doorbell offset, and the GPU will start fetching the data.
> >
> > libDRM changes for this series and a sample DRM test program can be found
> > in the MESA merge request here:
> > https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>
> I hope everyone's clear these libdrm_amdgpu changes won't be sufficient uAPI validation to allow the kernel bits to be merged upstream.

Right, this is just what we have been using to bring up the feature so far.

Alex

>
> This will require an implementation in the Mesa radeonsi / RADV driver, ideally with working implicit synchronization for BOs shared via dma-buf.
>
>
> --
> Earthling Michel Dänzer            |                  https://redhat.com
> Libre software enthusiast          |         Mesa and Xwayland developer
>


* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-03 22:26     ` Shashank Sharma
@ 2023-02-06 16:56       ` Alex Deucher
  2023-02-06 17:01         ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-06 16:56 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: alexander.deucher, christian.koenig, amd-gfx

On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> Hey Alex,
>
> On 03/02/2023 23:07, Alex Deucher wrote:
> > On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >> From: Alex Deucher <alexander.deucher@amd.com>
> >>
> >> This patch introduces new UAPI/IOCTL for usermode graphics
> >> queue. The userspace app will fill this structure and request
> >> the graphics driver to add a graphics work queue for it. The
> >> output of this UAPI is a queue id.
> >>
> >> This UAPI maps the queue into GPU, so the graphics app can start
> >> submitting work to the queue as soon as the call returns.
> >>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: Christian Koenig <christian.koenig@amd.com>
> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> ---
> >>   include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
> >>   1 file changed, 53 insertions(+)
> >>
> >> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> >> index 4038abe8505a..6c5235d107b3 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -54,6 +54,7 @@ extern "C" {
> >>   #define DRM_AMDGPU_VM                  0x13
> >>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>   #define DRM_AMDGPU_SCHED               0x15
> >> +#define DRM_AMDGPU_USERQ               0x16
> >>
> >>   #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
> >>   #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >> @@ -71,6 +72,7 @@ extern "C" {
> >>   #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
> >>   #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>
> >>   /**
> >>    * DOC: memory domains
> >> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
> >>          union drm_amdgpu_ctx_out out;
> >>   };
> >>
> >> +/* user queue IOCTL */
> >> +#define AMDGPU_USERQ_OP_CREATE 1
> >> +#define AMDGPU_USERQ_OP_FREE   2
> >> +
> >> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >> +
> >> +struct drm_amdgpu_userq_mqd {
> >> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> >> +       __u32   flags;
> >> +       /** IP type: AMDGPU_HW_IP_* */
> >> +       __u32   ip_type;
> >> +       /** GEM object handle */
> >> +       __u32   doorbell_handle;
> >> +       /** Doorbell offset in dwords */
> >> +       __u32   doorbell_offset;
> > Since doorbells are 64 bit, maybe this offset should be in qwords.
>
> Can you please help to cross-check this information? All the existing
> kernel doorbell calculations keep the doorbell size as sizeof(u32)

Doorbells on pre-vega hardware are 32 bits so that is where that comes
from, but from vega onward most doorbells are 64 bit.  I think some
versions of VCN may still use 32 bit doorbells.  Internally in the
kernel driver we just use two slots for newer hardware, but for the
UAPI, I think we can just stick with 64 bit slots to avoid confusion.
Even if an engine only uses a 32 bit one, I don't know that there is
much value in trying to support variable doorbell sizes.

Alex

>
> >
> >> +       /** GPU virtual address of the queue */
> >> +       __u64   queue_va;
> >> +       /** Size of the queue in bytes */
> >> +       __u64   queue_size;
> >> +       /** GPU virtual address of the rptr */
> >> +       __u64   rptr_va;
> >> +       /** GPU virtual address of the wptr */
> >> +       __u64   wptr_va;
> >> +};
> >> +
> >> +struct drm_amdgpu_userq_in {
> >> +       /** AMDGPU_USERQ_OP_* */
> >> +       __u32   op;
> >> +       /** Flags */
> >> +       __u32   flags;
> >> +       /** Queue handle to associate the queue free call with,
> >> +        * unused for queue create calls */
> >> +       __u32   queue_id;
> >> +       __u32   pad;
> >> +       /** Queue descriptor */
> >> +       struct drm_amdgpu_userq_mqd mqd;
> >> +};
> >> +
> >> +struct drm_amdgpu_userq_out {
> >> +       /** Queue handle */
> >> +       __u32   q_id;
> > Maybe this should be queue_id to match the input.
>
> Agree.
>
> - Shashank
>
> > Alex
> >
> >> +       /** Flags */
> >> +       __u32   flags;
> >> +};
> >> +
> >> +union drm_amdgpu_userq {
> >> +       struct drm_amdgpu_userq_in in;
> >> +       struct drm_amdgpu_userq_out out;
> >> +};
> >> +
> >>   /* vm ioctl */
> >>   #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >> --
> >> 2.34.1
> >>


* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-06 16:56       ` Alex Deucher
@ 2023-02-06 17:01         ` Christian König
  2023-02-06 21:03           ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-06 17:01 UTC (permalink / raw)
  To: Alex Deucher, Shashank Sharma; +Cc: alexander.deucher, amd-gfx

Am 06.02.23 um 17:56 schrieb Alex Deucher:
> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> Hey Alex,
>>
>> On 03/02/2023 23:07, Alex Deucher wrote:
>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>
>>>> This patch introduces new UAPI/IOCTL for usermode graphics
>>>> queue. The userspace app will fill this structure and request
>>>> the graphics driver to add a graphics work queue for it. The
>>>> output of this UAPI is a queue id.
>>>>
>>>> This UAPI maps the queue into GPU, so the graphics app can start
>>>> submitting work to the queue as soon as the call returns.
>>>>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>>    include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 53 insertions(+)
>>>>
>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>> index 4038abe8505a..6c5235d107b3 100644
>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>    #define DRM_AMDGPU_VM                  0x13
>>>>    #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>    #define DRM_AMDGPU_SCHED               0x15
>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>
>>>>    #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>>>    #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>    #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>    #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>>>    #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>
>>>>    /**
>>>>     * DOC: memory domains
>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>>>           union drm_amdgpu_ctx_out out;
>>>>    };
>>>>
>>>> +/* user queue IOCTL */
>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>> +
>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>> +
>>>> +struct drm_amdgpu_userq_mqd {
>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>> +       __u32   flags;
>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>> +       __u32   ip_type;
>>>> +       /** GEM object handle */
>>>> +       __u32   doorbell_handle;
>>>> +       /** Doorbell offset in dwords */
>>>> +       __u32   doorbell_offset;
>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
>> Can you please help to cross-check this information? All the existing
>> kernel doorbell calculations keep the doorbell size as sizeof(u32)
> Doorbells on pre-vega hardware are 32 bits so that is where that comes
> from, but from vega onward most doorbells are 64 bit.  I think some
> versions of VCN may still use 32 bit doorbells.  Internally in the
> kernel driver we just use two slots for newer hardware, but for the
> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
> Even if an engine only uses a 32 bit one, I don't know that there is
> much value to trying to support variable doorbell sizes.

I think we can stick with using __u32 because this is *not* the size of 
the doorbell entries.

Instead this is the offset into the BO at which to find the doorbell for 
this queue (which in turn is 64 bits wide).

Since we will probably never have more than 4 GiB of doorbells we should be 
pretty safe to use 32 bits here.

Christian.

>
> Alex
>
>>>> +       /** GPU virtual address of the queue */
>>>> +       __u64   queue_va;
>>>> +       /** Size of the queue in bytes */
>>>> +       __u64   queue_size;
>>>> +       /** GPU virtual address of the rptr */
>>>> +       __u64   rptr_va;
>>>> +       /** GPU virtual address of the wptr */
>>>> +       __u64   wptr_va;
>>>> +};
>>>> +
>>>> +struct drm_amdgpu_userq_in {
>>>> +       /** AMDGPU_USERQ_OP_* */
>>>> +       __u32   op;
>>>> +       /** Flags */
>>>> +       __u32   flags;
>>>> +       /** Queue handle to associate the queue free call with,
>>>> +        * unused for queue create calls */
>>>> +       __u32   queue_id;
>>>> +       __u32   pad;
>>>> +       /** Queue descriptor */
>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>> +};
>>>> +
>>>> +struct drm_amdgpu_userq_out {
>>>> +       /** Queue handle */
>>>> +       __u32   q_id;
>>> Maybe this should be queue_id to match the input.
>> Agree.
>>
>> - Shashank
>>
>>> Alex
>>>
>>>> +       /** Flags */
>>>> +       __u32   flags;
>>>> +};
>>>> +
>>>> +union drm_amdgpu_userq {
>>>> +       struct drm_amdgpu_userq_in in;
>>>> +       struct drm_amdgpu_userq_out out;
>>>> +};
>>>> +
>>>>    /* vm ioctl */
>>>>    #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>    #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>> --
>>>> 2.34.1
>>>>



* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-06 17:01         ` Christian König
@ 2023-02-06 21:03           ` Alex Deucher
  2023-02-07  7:03             ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-06 21:03 UTC (permalink / raw)
  To: Christian König; +Cc: alexander.deucher, amd-gfx, Shashank Sharma

On Mon, Feb 6, 2023 at 12:01 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 06.02.23 um 17:56 schrieb Alex Deucher:
> > On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >> Hey Alex,
> >>
> >> On 03/02/2023 23:07, Alex Deucher wrote:
> >>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>> From: Alex Deucher <alexander.deucher@amd.com>
> >>>>
> >>>> This patch introduces new UAPI/IOCTL for usermode graphics
> >>>> queue. The userspace app will fill this structure and request
> >>>> the graphics driver to add a graphics work queue for it. The
> >>>> output of this UAPI is a queue id.
> >>>>
> >>>> This UAPI maps the queue into GPU, so the graphics app can start
> >>>> submitting work to the queue as soon as the call returns.
> >>>>
> >>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>> ---
> >>>>    include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
> >>>>    1 file changed, 53 insertions(+)
> >>>>
> >>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> >>>> index 4038abe8505a..6c5235d107b3 100644
> >>>> --- a/include/uapi/drm/amdgpu_drm.h
> >>>> +++ b/include/uapi/drm/amdgpu_drm.h
> >>>> @@ -54,6 +54,7 @@ extern "C" {
> >>>>    #define DRM_AMDGPU_VM                  0x13
> >>>>    #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>>>    #define DRM_AMDGPU_SCHED               0x15
> >>>> +#define DRM_AMDGPU_USERQ               0x16
> >>>>
> >>>>    #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
> >>>>    #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >>>> @@ -71,6 +72,7 @@ extern "C" {
> >>>>    #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>>>    #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
> >>>>    #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >>>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>>>
> >>>>    /**
> >>>>     * DOC: memory domains
> >>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
> >>>>           union drm_amdgpu_ctx_out out;
> >>>>    };
> >>>>
> >>>> +/* user queue IOCTL */
> >>>> +#define AMDGPU_USERQ_OP_CREATE 1
> >>>> +#define AMDGPU_USERQ_OP_FREE   2
> >>>> +
> >>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >>>> +
> >>>> +struct drm_amdgpu_userq_mqd {
> >>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> >>>> +       __u32   flags;
> >>>> +       /** IP type: AMDGPU_HW_IP_* */
> >>>> +       __u32   ip_type;
> >>>> +       /** GEM object handle */
> >>>> +       __u32   doorbell_handle;
> >>>> +       /** Doorbell offset in dwords */
> >>>> +       __u32   doorbell_offset;
> >>> Since doorbells are 64 bit, maybe this offset should be in qwords.
> >> Can you please help to cross-check this information? All the existing
> >> kernel doorbell calculations keep the doorbell size as sizeof(u32).
> > Doorbells on pre-vega hardware are 32 bits so that is where that comes
> > from, but from vega onward most doorbells are 64 bit.  I think some
> > versions of VCN may still use 32 bit doorbells.  Internally in the
> > kernel driver we just use two slots for newer hardware, but for the
> > UAPI, I think we can just stick with 64 bit slots to avoid confusion.
> > Even if an engine only uses a 32 bit one, I don't know that there is
> > much value to trying to support variable doorbell sizes.
>
> I think we can stick with using __u32 because this is *not* the size of
> the doorbell entries.
>
> Instead, this is the offset into the BO at which to find the doorbell for
> this queue (which in turn is 64 bits wide).
>
> Since we will probably never have more than 4GiB of doorbells we should be
> pretty safe using 32 bits here.

Yes, the offset would still be 32 bits, but the units would be qwords.  E.g.,

+       /** Doorbell offset in qwords */
+       __u32   doorbell_offset;

That way you couldn't accidentally specify an overlapping doorbell.

Alex

>
> Christian.
>
> >
> > Alex
> >
> >>>> +       /** GPU virtual address of the queue */
> >>>> +       __u64   queue_va;
> >>>> +       /** Size of the queue in bytes */
> >>>> +       __u64   queue_size;
> >>>> +       /** GPU virtual address of the rptr */
> >>>> +       __u64   rptr_va;
> >>>> +       /** GPU virtual address of the wptr */
> >>>> +       __u64   wptr_va;
> >>>> +};
> >>>> +
> >>>> +struct drm_amdgpu_userq_in {
> >>>> +       /** AMDGPU_USERQ_OP_* */
> >>>> +       __u32   op;
> >>>> +       /** Flags */
> >>>> +       __u32   flags;
> >>>> +       /** Queue handle to associate the queue free call with,
> >>>> +        * unused for queue create calls */
> >>>> +       __u32   queue_id;
> >>>> +       __u32   pad;
> >>>> +       /** Queue descriptor */
> >>>> +       struct drm_amdgpu_userq_mqd mqd;
> >>>> +};
> >>>> +
> >>>> +struct drm_amdgpu_userq_out {
> >>>> +       /** Queue handle */
> >>>> +       __u32   q_id;
> >>> Maybe this should be queue_id to match the input.
> >> Agree.
> >>
> >> - Shashank
> >>
> >>> Alex
> >>>
> >>>> +       /** Flags */
> >>>> +       __u32   flags;
> >>>> +};
> >>>> +
> >>>> +union drm_amdgpu_userq {
> >>>> +       struct drm_amdgpu_userq_in in;
> >>>> +       struct drm_amdgpu_userq_out out;
> >>>> +};
> >>>> +
> >>>>    /* vm ioctl */
> >>>>    #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>>>    #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >>>> --
> >>>> 2.34.1
> >>>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread
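[Editorial note: Alex's point about qword units can be illustrated with a small sketch. The helper names below are made up for illustration and are not part of the UAPI; the arithmetic simply shows that with 4-byte (dword) units two adjacent offsets can land inside the same 64-bit doorbell, while with 8-byte (qword) units every offset names a distinct slot.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: translate a queue's doorbell_offset into a byte
 * address inside the doorbell BO, for the two candidate unit sizes. */

static uint64_t doorbell_byte_addr_dwords(uint64_t bo_base, uint32_t offset)
{
	/* dword units: 4 bytes per step, so offsets 0 and 1 overlap the
	 * same 64-bit doorbell (bytes 0..7). */
	return bo_base + (uint64_t)offset * sizeof(uint32_t);
}

static uint64_t doorbell_byte_addr_qwords(uint64_t bo_base, uint32_t offset)
{
	/* qword units: 8 bytes per step, so every offset is a distinct
	 * 64-bit slot and overlap is impossible by construction. */
	return bo_base + (uint64_t)offset * sizeof(uint64_t);
}
```

Note that the offset field itself stays `__u32` either way, as Christian points out; only the unit the comment documents changes.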

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-06 21:03           ` Alex Deucher
@ 2023-02-07  7:03             ` Christian König
  2023-02-07  7:38               ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07  7:03 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: alexander.deucher, amd-gfx, Shashank Sharma

On 06.02.23 at 22:03, Alex Deucher wrote:
> On Mon, Feb 6, 2023 at 12:01 PM Christian König
> <christian.koenig@amd.com> wrote:
>> On 06.02.23 at 17:56, Alex Deucher wrote:
>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> Hey Alex,
>>>>
>>>> On 03/02/2023 23:07, Alex Deucher wrote:
>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>>>
>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
>>>>>> queue. The userspace app will fill this structure and request
>>>>>> the graphics driver to add a graphics work queue for it. The
>>>>>> output of this UAPI is a queue id.
>>>>>>
>>>>>> This UAPI maps the queue into GPU, so the graphics app can start
>>>>>> submitting work to the queue as soon as the call returns.
>>>>>>
>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> ---
>>>>>>     include/uapi/drm/amdgpu_drm.h | 53 +++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 53 insertions(+)
>>>>>>
>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>>> index 4038abe8505a..6c5235d107b3 100644
>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>>>     #define DRM_AMDGPU_VM                  0x13
>>>>>>     #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>>>     #define DRM_AMDGPU_SCHED               0x15
>>>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>>>
>>>>>>     #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>>>>>     #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>>>     #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>>>     #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>>>>>     #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>>>
>>>>>>     /**
>>>>>>      * DOC: memory domains
>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>>>>>            union drm_amdgpu_ctx_out out;
>>>>>>     };
>>>>>>
>>>>>> +/* user queue IOCTL */
>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>>>> +
>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>>>> +
>>>>>> +struct drm_amdgpu_userq_mqd {
>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>>>> +       __u32   flags;
>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>>>> +       __u32   ip_type;
>>>>>> +       /** GEM object handle */
>>>>>> +       __u32   doorbell_handle;
>>>>>> +       /** Doorbell offset in dwords */
>>>>>> +       __u32   doorbell_offset;
>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
>>>> Can you please help to cross-check this information? All the existing
>>>> kernel doorbell calculations keep the doorbell size as sizeof(u32).
>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
>>> from, but from vega onward most doorbells are 64 bit.  I think some
>>> versions of VCN may still use 32 bit doorbells.  Internally in the
>>> kernel driver we just use two slots for newer hardware, but for the
>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
>>> Even if an engine only uses a 32 bit one, I don't know that there is
>>> much value to trying to support variable doorbell sizes.
>> I think we can stick with using __u32 because this is *not* the size of
>> the doorbell entries.
>>
>> Instead, this is the offset into the BO at which to find the doorbell for
>> this queue (which in turn is 64 bits wide).
>>
>> Since we will probably never have more than 4GiB of doorbells we should be
>> pretty safe using 32 bits here.
> Yes, the offset would still be 32 bits, but the units would be qwords.  E.g.,
>
> +       /** Doorbell offset in qwords */
> +       __u32   doorbell_offset;
>
> That way you couldn't accidentally specify an overlapping doorbell.

Ah, so you only wanted to fix the comment. That was absolutely not clear 
from the discussion.

Christian.

>
> Alex
>
>> Christian.
>>
>>> Alex
>>>
>>>>>> +       /** GPU virtual address of the queue */
>>>>>> +       __u64   queue_va;
>>>>>> +       /** Size of the queue in bytes */
>>>>>> +       __u64   queue_size;
>>>>>> +       /** GPU virtual address of the rptr */
>>>>>> +       __u64   rptr_va;
>>>>>> +       /** GPU virtual address of the wptr */
>>>>>> +       __u64   wptr_va;
>>>>>> +};
>>>>>> +
>>>>>> +struct drm_amdgpu_userq_in {
>>>>>> +       /** AMDGPU_USERQ_OP_* */
>>>>>> +       __u32   op;
>>>>>> +       /** Flags */
>>>>>> +       __u32   flags;
>>>>>> +       /** Queue handle to associate the queue free call with,
>>>>>> +        * unused for queue create calls */
>>>>>> +       __u32   queue_id;
>>>>>> +       __u32   pad;
>>>>>> +       /** Queue descriptor */
>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>>>> +};
>>>>>> +
>>>>>> +struct drm_amdgpu_userq_out {
>>>>>> +       /** Queue handle */
>>>>>> +       __u32   q_id;
>>>>> Maybe this should be queue_id to match the input.
>>>> Agree.
>>>>
>>>> - Shashank
>>>>
>>>>> Alex
>>>>>
>>>>>> +       /** Flags */
>>>>>> +       __u32   flags;
>>>>>> +};
>>>>>> +
>>>>>> +union drm_amdgpu_userq {
>>>>>> +       struct drm_amdgpu_userq_in in;
>>>>>> +       struct drm_amdgpu_userq_out out;
>>>>>> +};
>>>>>> +
>>>>>>     /* vm ioctl */
>>>>>>     #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>>>     #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>>>> --
>>>>>> 2.34.1
>>>>>>


^ permalink raw reply	[flat|nested] 50+ messages in thread
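[Editorial note: a rough sketch of how userspace might fill the proposed union for a create call. The struct layouts are copied from the patch above (with the `q_id` -> `queue_id` rename the review agreed on); the actual `ioctl(fd, DRM_IOCTL_AMDGPU_USERQ, &args)` call is elided since the UAPI is still under review.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layouts copied from the proposed UAPI above; the canonical
 * definitions belong in include/uapi/drm/amdgpu_drm.h once merged. */
#define AMDGPU_USERQ_OP_CREATE 1
#define AMDGPU_USERQ_OP_FREE   2

struct drm_amdgpu_userq_mqd {
	uint32_t flags;
	uint32_t ip_type;
	uint32_t doorbell_handle;
	uint32_t doorbell_offset;
	uint64_t queue_va;
	uint64_t queue_size;
	uint64_t rptr_va;
	uint64_t wptr_va;
};

struct drm_amdgpu_userq_in {
	uint32_t op;
	uint32_t flags;
	uint32_t queue_id;	/* only used for OP_FREE */
	uint32_t pad;
	struct drm_amdgpu_userq_mqd mqd;
};

struct drm_amdgpu_userq_out {
	uint32_t queue_id;	/* renamed from q_id per review */
	uint32_t flags;
};

union drm_amdgpu_userq {
	struct drm_amdgpu_userq_in in;
	struct drm_amdgpu_userq_out out;
};

/* Fill the create request; the caller would then issue the ioctl and
 * read back args->out.queue_id. */
static void userq_prepare_create(union drm_amdgpu_userq *args,
				 uint32_t doorbell_handle,
				 uint32_t doorbell_offset,
				 uint64_t queue_va, uint64_t queue_size,
				 uint64_t rptr_va, uint64_t wptr_va)
{
	memset(args, 0, sizeof(*args));
	args->in.op = AMDGPU_USERQ_OP_CREATE;
	args->in.mqd.ip_type = 0;	/* e.g. AMDGPU_HW_IP_GFX */
	args->in.mqd.doorbell_handle = doorbell_handle;
	args->in.mqd.doorbell_offset = doorbell_offset;
	args->in.mqd.queue_va = queue_va;
	args->in.mqd.queue_size = queue_size;
	args->in.mqd.rptr_va = rptr_va;
	args->in.mqd.wptr_va = wptr_va;
}
```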

* Re: [PATCH 2/8] drm/amdgpu: add usermode queues
  2023-02-03 21:54 ` [PATCH 2/8] drm/amdgpu: add usermode queues Shashank Sharma
@ 2023-02-07  7:08   ` Christian König
  2023-02-07  7:40     ` Shashank Sharma
  2023-02-07 14:54   ` Alex Deucher
  1 sibling, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07  7:08 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: alexander.deucher, Shashank Sharma

On 03.02.23 at 22:54, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This patch adds skeleton code for usermode queue creation. It
> typically contains:
> - A new structure to keep all the user queue data in one place.
> - An IOCTL function to create/free a usermode queue.
> - A function to generate unique index for the queue.
> - A queue context manager in driver private data.
>
> V1: Worked on design review comments from RFC patch series:
> (https://patchwork.freedesktop.org/series/112214/)
>
> - Alex: Keep a list of queues, instead of single queue per process.
> - Christian: Use the queue manager instead of global ptrs,
>             Don't keep the queue structure in amdgpu_ctx
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   5 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 155 ++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  64 ++++++++
>   6 files changed, 230 insertions(+)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 798d0e9a60b7..764801cc8203 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -210,6 +210,8 @@ amdgpu-y += \
>   # add amdkfd interfaces
>   amdgpu-y += amdgpu_amdkfd.o
>   
> +# add usermode queue
> +amdgpu-y += amdgpu_userqueue.o
>   
>   ifneq ($(CONFIG_HSA_AMD),)
>   AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 6b74df446694..0625d6bdadf4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -109,6 +109,7 @@
>   #include "amdgpu_fdinfo.h"
>   #include "amdgpu_mca.h"
>   #include "amdgpu_ras.h"
> +#include "amdgpu_userqueue.h"
>   
>   #define MAX_GPU_INSTANCE		16
>   
> @@ -482,6 +483,7 @@ struct amdgpu_fpriv {
>   	struct mutex		bo_list_lock;
>   	struct idr		bo_list_handles;
>   	struct amdgpu_ctx_mgr	ctx_mgr;
> +	struct amdgpu_userq_mgr	userq_mgr;
>   };
>   
>   int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b4f2d61ea0d5..229976a2d0e7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -52,6 +52,7 @@
>   #include "amdgpu_ras.h"
>   #include "amdgpu_xgmi.h"
>   #include "amdgpu_reset.h"
> +#include "amdgpu_userqueue.h"
>   
>   /*
>    * KMS wrapper.
> @@ -2748,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   };
>   
>   static const struct drm_driver amdgpu_kms_driver = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 7aa7e52ca784..52e61e339a88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1187,6 +1187,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>   
>   	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>   
> +	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
> +	if (r)
> +		DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
> +
>   	file_priv->driver_priv = fpriv;
>   	goto out_suspend;
>   
> @@ -1254,6 +1258,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>   
>   	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>   	amdgpu_vm_fini(adev, &fpriv->vm);
> +	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>   
>   	if (pasid)
>   		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> new file mode 100644
> index 000000000000..d5bc7fe81750
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -0,0 +1,155 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu.h"
> +#include "amdgpu_vm.h"
> +
> +static inline int
> +amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
> +}
> +
> +static inline void
> +amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
> +{
> +    idr_remove(&uq_mgr->userq_idr, queue_id);
> +}
> +
> +static struct amdgpu_usermode_queue
> +*amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)

Please put the * on the previous line; it took me a moment to realize 
that you do not return the queue by value here.

> +{
> +    return idr_find(&uq_mgr->userq_idr, qid);
> +}
> +
> +static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> +{
> +    int r, pasid;
> +    struct amdgpu_usermode_queue *queue;
> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +    struct amdgpu_vm *vm = &fpriv->vm;
> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;

We usually put variables like i and r last. The PCI maintainers even 
require that you sort the variables in reverse xmas tree.

> +
> +    pasid = vm->pasid;
> +    if (vm->pasid < 0) {
> +        DRM_WARN("No PASID info found\n");
> +        pasid = 0;
> +    }
> +
> +    mutex_lock(&uq_mgr->userq_mutex);
> +
> +    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
> +    if (!queue) {
> +        DRM_ERROR("Failed to allocate memory for queue\n");
> +        mutex_unlock(&uq_mgr->userq_mutex);
> +        return -ENOMEM;
> +    }
> +
> +    queue->vm = vm;
> +    queue->pasid = pasid;
> +    queue->wptr_gpu_addr = mqd_in->wptr_va;
> +    queue->rptr_gpu_addr = mqd_in->rptr_va;
> +    queue->queue_size = mqd_in->queue_size;
> +    queue->queue_type = mqd_in->ip_type;
> +    queue->queue_gpu_addr = mqd_in->queue_va;
> +    queue->flags = mqd_in->flags;
> +    queue->use_doorbell = true;
> +    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
> +    if (queue->queue_id < 0) {
> +        DRM_ERROR("Failed to allocate a queue id\n");
> +        r = queue->queue_id;
> +        goto free_queue;
> +    }
> +
> +    list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
> +    args->out.q_id = queue->queue_id;
> +    args->out.flags = 0;
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    return 0;
> +
> +free_queue:
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    kfree(queue);
> +    return r;
> +}
> +
> +static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
> +{
> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +    struct amdgpu_usermode_queue *queue;
> +
> +    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
> +    if (!queue) {
> +        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
> +        return;
> +    }
> +
> +    mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> +    list_del(&queue->userq_node);
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    kfree(queue);
> +}
> +
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> +		       struct drm_file *filp)
> +{
> +    union drm_amdgpu_userq *args = data;
> +    int r = 0;
> +
> +    switch (args->in.op) {
> +    case AMDGPU_USERQ_OP_CREATE:
> +        r = amdgpu_userqueue_create(filp, args);
> +        if (r)
> +            DRM_ERROR("Failed to create usermode queue\n");
> +        break;
> +
> +    case AMDGPU_USERQ_OP_FREE:
> +        amdgpu_userqueue_destroy(filp, args->in.queue_id);
> +        break;
> +
> +    default:
> +        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
> +        return -EINVAL;
> +    }
> +
> +    return r;
> +}
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> +{
> +    mutex_init(&userq_mgr->userq_mutex);
> +    idr_init_base(&userq_mgr->userq_idr, 1);
> +    INIT_LIST_HEAD(&userq_mgr->userq_list);

Why do you need an extra list when you already have the idr?

Apart from those nit picks looks good to me.

Christian.

> +    userq_mgr->adev = adev;
> +
> +    return 0;
> +}
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
> +{
> +    idr_destroy(&userq_mgr->userq_idr);
> +    mutex_destroy(&userq_mgr->userq_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> new file mode 100644
> index 000000000000..9557588fe34f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -0,0 +1,64 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef AMDGPU_USERQUEUE_H_
> +#define AMDGPU_USERQUEUE_H_
> +
> +#define AMDGPU_MAX_USERQ 512
> +
> +struct amdgpu_userq_mgr {
> +	struct idr userq_idr;
> +	struct mutex userq_mutex;
> +	struct list_head userq_list;
> +	struct amdgpu_device *adev;
> +};
> +
> +struct amdgpu_usermode_queue {
> +	int		queue_id;
> +	int		queue_type;
> +	int		queue_size;
> +	int		pasid;
> +	int		doorbell_index;
> +	int 		use_doorbell;
> +
> +	uint64_t	wptr_gpu_addr;
> +	uint64_t	rptr_gpu_addr;
> +	uint64_t	queue_gpu_addr;
> +	uint64_t	flags;
> +
> +	uint64_t	mqd_gpu_addr;
> +	void 		*mqd_cpu_ptr;
> +
> +	struct amdgpu_bo	*mqd_obj;
> +	struct amdgpu_vm    	*vm;
> +	struct amdgpu_userq_mgr *userq_mgr;
> +	struct list_head 	userq_node;
> +};
> +
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> +
> +#endif


^ permalink raw reply	[flat|nested] 50+ messages in thread
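[Editorial note: Christian's question about the extra `userq_list` can be illustrated with a small userspace model of the idr semantics. This is not kernel code; it only shows that a single id-to-pointer map already supports allocation, lookup, removal and iteration, so a separate list duplicates state.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USERQ 512	/* mirrors AMDGPU_MAX_USERQ from the patch */

struct queue { int id; };

/* One map, modelling the idr: index == queue id, value == queue ptr. */
static struct queue *userq_map[MAX_USERQ];

static int userq_alloc(struct queue *q)	/* like idr_alloc(..., 1, MAX, ...) */
{
	for (int id = 1; id < MAX_USERQ; id++) {
		if (!userq_map[id]) {
			userq_map[id] = q;
			q->id = id;
			return id;
		}
	}
	return -1;
}

static struct queue *userq_find(int id)	/* like idr_find() */
{
	return (id > 0 && id < MAX_USERQ) ? userq_map[id] : NULL;
}

static void userq_remove(int id)	/* like idr_remove() */
{
	if (id > 0 && id < MAX_USERQ)
		userq_map[id] = NULL;
}

static int userq_count(void)	/* iteration, like idr_for_each() */
{
	int n = 0;
	for (int id = 1; id < MAX_USERQ; id++)
		if (userq_map[id])
			n++;
	return n;
}
```

In the kernel the idr provides exactly these operations (`idr_alloc`, `idr_find`, `idr_remove`, `idr_for_each`), which is why the review suggests the list head in `amdgpu_userq_mgr` may be redundant.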

* Re: [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers
  2023-02-03 21:54 ` [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers Shashank Sharma
@ 2023-02-07  7:11   ` Christian König
  2023-02-07  7:41     ` Shashank Sharma
  2023-02-07 14:59   ` Alex Deucher
  1 sibling, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07  7:11 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: alexander.deucher, Shashank Sharma

On 03.02.23 at 22:54, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> A memory queue descriptor (MQD) of a userqueue defines it in the hardware's
> context. As the method of forming an MQD and its format can vary between
> different graphics IPs, we need GFX generation-specific handlers to create MQDs.
>
> This patch:
> - Introduces MQD handler functions for the usermode queues
> - A general function to create and destroy MQD for a userqueue.
>
> V1: Worked on review comments from Alex on RFC patches:
>      MQD creation should be gen and IP specific.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 64 +++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  9 +++
>   2 files changed, 73 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index d5bc7fe81750..625c2fe1e84a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -42,6 +42,60 @@ static struct amdgpu_usermode_queue
>       return idr_find(&uq_mgr->userq_idr, qid);
>   }
>   
> +static int
> +amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +    int size;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +
> +    if (!uq_mgr->userq_mqd_funcs) {
> +        DRM_ERROR("Userqueue not initialized\n");
> +        return -EINVAL;
> +    }
> +
> +    size = uq_mgr->userq_mqd_funcs->mqd_size(uq_mgr);
> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &queue->mqd_obj,
> +                                &queue->mqd_gpu_addr,
> +                                &queue->mqd_cpu_ptr);

We can't use amdgpu_bo_create_kernel() here, this pins the BO.

Instead all BOs of the process must be fenced with some eviction fence.

Christian.

> +    if (r) {
> +        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
> +        return r;
> +    }
> +
> +    memset(queue->mqd_cpu_ptr, 0, size);
> +    r = amdgpu_bo_reserve(queue->mqd_obj, false);
> +    if (unlikely(r != 0)) {
> +        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
> +        goto free_mqd;
> +    }
> +
> +    r = uq_mgr->userq_mqd_funcs->mqd_create(uq_mgr, queue);
> +    amdgpu_bo_unreserve(queue->mqd_obj);
> +    if (r) {
> +        DRM_ERROR("Failed to create MQD for queue\n");
> +        goto free_mqd;
> +    }
> +    return 0;
> +
> +free_mqd:
> +    amdgpu_bo_free_kernel(&queue->mqd_obj,
> +			   &queue->mqd_gpu_addr,
> +			   &queue->mqd_cpu_ptr);
> +   return r;
> +}
> +
> +static void
> +amdgpu_userqueue_destroy_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    uq_mgr->userq_mqd_funcs->mqd_destroy(uq_mgr, queue);
> +    amdgpu_bo_free_kernel(&queue->mqd_obj,
> +			   &queue->mqd_gpu_addr,
> +			   &queue->mqd_cpu_ptr);
> +}
> +
>   static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>   {
>       int r, pasid;
> @@ -82,12 +136,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>           goto free_queue;
>       }
>   
> +    r = amdgpu_userqueue_create_mqd(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create MQD\n");
> +        goto free_qid;
> +    }
> +
>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>       args->out.q_id = queue->queue_id;
>       args->out.flags = 0;
>       mutex_unlock(&uq_mgr->userq_mutex);
>       return 0;
>   
> +free_qid:
> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> +
>   free_queue:
>       mutex_unlock(&uq_mgr->userq_mutex);
>       kfree(queue);
> @@ -107,6 +170,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>       }
>   
>       mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>       list_del(&queue->userq_node);
>       mutex_unlock(&uq_mgr->userq_mutex);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 9557588fe34f..a6abdfd5cb74 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -26,10 +26,13 @@
>   
>   #define AMDGPU_MAX_USERQ 512
>   
> +struct amdgpu_userq_mqd_funcs;
> +
>   struct amdgpu_userq_mgr {
>   	struct idr userq_idr;
>   	struct mutex userq_mutex;
>   	struct list_head userq_list;
> +	const struct amdgpu_userq_mqd_funcs *userq_mqd_funcs;
>   	struct amdgpu_device *adev;
>   };
>   
> @@ -57,6 +60,12 @@ struct amdgpu_usermode_queue {
>   
>   int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
>   
> +struct amdgpu_userq_mqd_funcs {
> +	int (*mqd_size)(struct amdgpu_userq_mgr *);
> +	int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +};
> +
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>   
>   void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);


^ permalink raw reply	[flat|nested] 50+ messages in thread
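[Editorial note: the dispatch pattern this patch introduces with `amdgpu_userq_mqd_funcs` can be sketched in miniature. The names below mirror the patch, but the bodies are placeholders, not real hardware programming; the point is only the per-IP ops table and the "not initialized" guard in the core path.]

```c
#include <assert.h>
#include <stddef.h>

/* Per-IP/generation ops table, as in struct amdgpu_userq_mqd_funcs. */
struct userq_mqd_funcs {
	int (*mqd_size)(void);
	int (*mqd_create)(void *mqd);
	void (*mqd_destroy)(void *mqd);
};

static int gfx_v11_mqd_size(void) { return 4096; } /* one page */
static int gfx_v11_mqd_create(void *mqd) { (void)mqd; return 0; }
static void gfx_v11_mqd_destroy(void *mqd) { (void)mqd; }

static const struct userq_mqd_funcs gfx_v11_funcs = {
	.mqd_size    = gfx_v11_mqd_size,
	.mqd_create  = gfx_v11_mqd_create,
	.mqd_destroy = gfx_v11_mqd_destroy,
};

/* Core path: reject a manager with no ops table, mirroring the
 * "Userqueue not initialized" check in amdgpu_userqueue_create_mqd(). */
static int userq_create_mqd(const struct userq_mqd_funcs *funcs, void *mqd)
{
	if (!funcs)
		return -1;	/* -EINVAL in the patch */
	if (funcs->mqd_size() <= 0)
		return -1;
	return funcs->mqd_create(mqd);
}
```

Registering a different table (e.g. a future gfx_v12 one) changes only the initialization of the manager, not the core create/destroy flow, which is the motivation given in the commit message.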

* Re: [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-03 21:54 ` [PATCH 5/8] drm/amdgpu: Create context for usermode queue Shashank Sharma
@ 2023-02-07  7:14   ` Christian König
  2023-02-07  7:51     ` Shashank Sharma
  2023-02-07 16:51   ` Alex Deucher
  1 sibling, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07  7:14 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: alexander.deucher

On 03.02.23 at 22:54, Shashank Sharma wrote:
> The FW expects us to allocate at least one page as context space to
> handle gang, process, shadow, GDS and FW_space related work. This
> patch creates objects for these, and adds IP-specific
> functions to do this.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 ++++++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
>   3 files changed, 171 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 9f3490a91776..18281b3a51f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
>       return idr_find(&uq_mgr->userq_idr, qid);
>   }
>   
> +static void
> +amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                   struct amdgpu_usermode_queue *queue)
> +{
> +    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
> +}
> +
> +static int
> +amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                  struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +
> +    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create context space for queue\n");
> +        return r;
> +    }
> +
> +    return 0;
> +}
> +
>   static int
>   amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>   {
> @@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>           goto free_qid;
>       }
>   
> +    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create context space\n");
> +        goto free_mqd;
> +    }
> +
>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>       args->out.q_id = queue->queue_id;
>       args->out.flags = 0;
>       mutex_unlock(&uq_mgr->userq_mutex);
>       return 0;
>   
> +free_mqd:
> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
> +
>   free_qid:
>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>   
> @@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>       }
>   
>       mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>       list_del(&queue->userq_node);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> index 57889729d635..687f90a587e3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> @@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>   
>   }
>   
> +static int amdgpu_userq_gfx_v11_ctx_create(struct amdgpu_userq_mgr *uq_mgr,
> +                                           struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
> +
> +    /*
> +     * The FW expects at least one page of space allocated for
> +     * process-context-related work, and one for the gang context.
> +     */
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &pctx->obj,
> +                                &pctx->gpu_addr,
> +                                &pctx->cpu_ptr);

Again, don't use amdgpu_bo_create_kernel() for any of this.

> +    if (r) {
> +        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", r);
> +        return r;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &gctx->obj,
> +                                &gctx->gpu_addr,
> +                                &gctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", r);
> +        goto err_gangctx;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &gdsctx->obj,
> +                                &gdsctx->gpu_addr,
> +                                &gdsctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
> +        goto err_gdsctx;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &fwctx->obj,
> +                                &fwctx->gpu_addr,
> +                                &fwctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
> +        goto err_fwctx;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &sctx->obj,
> +                                &sctx->gpu_addr,
> +                                &sctx->cpu_ptr);

Why the heck should we allocate so many different BOs for that? Can't we 
put all of this into one?

Christian.

> +    if (r) {
> +        DRM_ERROR("Failed to allocate shadow bo for userqueue (%d)", r);
> +        goto err_sctx;
> +    }
> +
> +    return 0;
> +
> +err_sctx:
> +    amdgpu_bo_free_kernel(&fwctx->obj,
> +                          &fwctx->gpu_addr,
> +                          &fwctx->cpu_ptr);
> +
> +err_fwctx:
> +    amdgpu_bo_free_kernel(&gdsctx->obj,
> +                          &gdsctx->gpu_addr,
> +                          &gdsctx->cpu_ptr);
> +
> +err_gdsctx:
> +    amdgpu_bo_free_kernel(&gctx->obj,
> +                          &gctx->gpu_addr,
> +                          &gctx->cpu_ptr);
> +
> +err_gangctx:
> +    amdgpu_bo_free_kernel(&pctx->obj,
> +                          &pctx->gpu_addr,
> +                          &pctx->cpu_ptr);
> +    return r;
> +}
> +
> +static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
> +                                            struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
> +
> +    amdgpu_bo_free_kernel(&sctx->obj,
> +                          &sctx->gpu_addr,
> +                          &sctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&fwctx->obj,
> +                          &fwctx->gpu_addr,
> +                          &fwctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&gdsctx->obj,
> +                          &gdsctx->gpu_addr,
> +                          &gdsctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&gctx->obj,
> +                          &gctx->gpu_addr,
> +                          &gctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&pctx->obj,
> +                          &pctx->gpu_addr,
> +                          &pctx->cpu_ptr);
> +}
> +
>   static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>   {
>       return sizeof(struct v11_gfx_mqd);
> @@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>       .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>       .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>       .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> +    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
> +    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
>   };
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index a6abdfd5cb74..3adcd31618f7 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -25,9 +25,19 @@
>   #define AMDGPU_USERQUEUE_H_
>   
>   #define AMDGPU_MAX_USERQ 512
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>   
>   struct amdgpu_userq_mqd_funcs;
>   
> +struct amdgpu_userq_ctx {
> +	struct amdgpu_bo *obj;
> +	uint64_t gpu_addr;
> +	void	*cpu_ptr;
> +};
> +
>   struct amdgpu_userq_mgr {
>   	struct idr userq_idr;
>   	struct mutex userq_mutex;
> @@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
>   	uint64_t	mqd_gpu_addr;
>   	void 		*mqd_cpu_ptr;
>   
> +	struct amdgpu_userq_ctx	proc_ctx;
> +	struct amdgpu_userq_ctx	gang_ctx;
> +	struct amdgpu_userq_ctx	gds_ctx;
> +	struct amdgpu_userq_ctx	fw_ctx;
> +	struct amdgpu_userq_ctx	shadow_ctx;
> +
>   	struct amdgpu_bo	*mqd_obj;
>   	struct amdgpu_vm    	*vm;
>   	struct amdgpu_userq_mgr *userq_mgr;
> @@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
>   	int (*mqd_size)(struct amdgpu_userq_mgr *);
>   	int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>   	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +	int (*ctx_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +	void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>   };
>   
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);



* Re: [PATCH 6/8] drm/amdgpu: Map userqueue into HW
  2023-02-03 21:54 ` [PATCH 6/8] drm/amdgpu: Map userqueue into HW Shashank Sharma
@ 2023-02-07  7:20   ` Christian König
  2023-02-07  7:55     ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07  7:20 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: alexander.deucher, Shashank Sharma



Am 03.02.23 um 22:54 schrieb Shashank Sharma:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This patch adds new fptrs to prepare the usermode queue to be
> mapped into or unmapped from the HW. After this mapping, the queue
> will be ready to accept the workload.
>
> V1: Addressed review comments from Alex on the RFC patch series
>      - Map/Unmap should be IP specific.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 57 +++++++++++++++++++
>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 47 +++++++++++++++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  2 +
>   3 files changed, 106 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 18281b3a51f1..cbfe2608c040 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -42,6 +42,53 @@ static struct amdgpu_usermode_queue
>       return idr_find(&uq_mgr->userq_idr, qid);
>   }
>   
> +static void
> +amdgpu_userqueue_unmap(struct amdgpu_userq_mgr *uq_mgr,
> +                     struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct mes_remove_queue_input remove_request;
> +
> +    uq_mgr->userq_mqd_funcs->prepare_unmap(uq_mgr, queue, (void *)&remove_request);
> +
> +    amdgpu_mes_lock(&adev->mes);
> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &remove_request);
> +    amdgpu_mes_unlock(&adev->mes);
> +    if (r) {
> +        DRM_ERROR("Failed to unmap usermode queue %d\n", queue->queue_id);
> +        return;
> +    }
> +
> +    DRM_DEBUG_DRIVER("Usermode queue %d unmapped\n", queue->queue_id);
> +}
> +
> +static int
> +amdgpu_userqueue_map(struct amdgpu_userq_mgr *uq_mgr,
> +                     struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct mes_add_queue_input add_request;
> +
> +    r = uq_mgr->userq_mqd_funcs->prepare_map(uq_mgr, queue, (void *)&add_request);
> +    if (r) {
> +        DRM_ERROR("Failed to map userqueue\n");
> +        return r;
> +    }
> +
> +    amdgpu_mes_lock(&adev->mes);
> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &add_request);
> +    amdgpu_mes_unlock(&adev->mes);
> +    if (r) {
> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
> +        return r;
> +    }
> +
> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
> +    return 0;
> +}
> +
>   static void
>   amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>                                      struct amdgpu_usermode_queue *queue)
> @@ -170,12 +217,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>           goto free_mqd;
>       }
>   
> +    r = amdgpu_userqueue_map(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to map userqueue\n");
> +        goto free_ctx;
> +    }
> +
>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>       args->out.q_id = queue->queue_id;
>       args->out.flags = 0;
>       mutex_unlock(&uq_mgr->userq_mutex);
>       return 0;
>   
> +free_ctx:
> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
> +
>   free_mqd:
>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>   
> @@ -201,6 +257,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>       }
>   
>       mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_unmap(uq_mgr, queue);
>       amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> index 687f90a587e3..d317bb600fd9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> @@ -24,6 +24,7 @@
>   #include "amdgpu_userqueue.h"
>   #include "v11_structs.h"
>   #include "amdgpu_mes.h"
> +#include "mes_api_def.h"
>   #include "gc/gc_11_0_0_offset.h"
>   #include "gc/gc_11_0_0_sh_mask.h"
>   
> @@ -239,6 +240,50 @@ static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
>                             &pctx->cpu_ptr);
>   }
>   
> +static int
> +amdgpu_userq_gfx_v11_prepare_map(struct amdgpu_userq_mgr *uq_mgr,
> +                                 struct amdgpu_usermode_queue *queue,
> +                                 void *q_input)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct mes_add_queue_input *queue_input = q_input;
> +
> +    memset(queue_input, 0x0, sizeof(struct mes_add_queue_input));
> +
> +    queue_input->process_va_start = 0;
> +    queue_input->process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
> +    queue_input->process_quantum = 100000; /* 10ms */
> +    queue_input->gang_quantum = 10000; /* 1ms */
> +    queue_input->paging = false;
> +
> +    queue_input->gang_context_addr = queue->gang_ctx.gpu_addr;
> +    queue_input->process_context_addr = queue->proc_ctx.gpu_addr;
> +    queue_input->inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +    queue_input->gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +
> +    queue_input->process_id = queue->pasid;
> +    queue_input->queue_type = queue->queue_type;
> +    queue_input->mqd_addr = queue->mqd_gpu_addr;

> +    queue_input->wptr_addr = queue->wptr_gpu_addr;
> +    queue_input->wptr_mc_addr = queue->wptr_gpu_addr << AMDGPU_GPU_PAGE_SHIFT;

Well, that doesn't make sense at all.

Christian.

> +    queue_input->queue_size = queue->queue_size >> 2;
> +    queue_input->doorbell_offset = queue->doorbell_index;
> +    queue_input->page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +    return 0;
> +}
> +
> +static void
> +amdgpu_userq_gfx_v11_prepare_unmap(struct amdgpu_userq_mgr *uq_mgr,
> +                                   struct amdgpu_usermode_queue *queue,
> +                                   void *q_input)
> +{
> +    struct mes_remove_queue_input *queue_input = q_input;
> +
> +    memset(queue_input, 0x0, sizeof(struct mes_remove_queue_input));
> +    queue_input->doorbell_offset = queue->doorbell_index;
> +    queue_input->gang_context_addr = queue->gang_ctx.gpu_addr;
> +}
> +
>   static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>   {
>       return sizeof(struct v11_gfx_mqd);
> @@ -250,4 +295,6 @@ const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>       .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>       .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
>       .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
> +    .prepare_map = amdgpu_userq_gfx_v11_prepare_map,
> +    .prepare_unmap = amdgpu_userq_gfx_v11_prepare_unmap,
>   };
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 3adcd31618f7..202fac237501 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -82,6 +82,8 @@ struct amdgpu_userq_mqd_funcs {
>   	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>   	int (*ctx_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>   	void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +	int (*prepare_map)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *, void *);
> +	void (*prepare_unmap)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *, void *);
>   };
>   
>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);



* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07  7:03             ` Christian König
@ 2023-02-07  7:38               ` Shashank Sharma
  2023-02-07 14:07                 ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07  7:38 UTC (permalink / raw)
  To: Christian König, Alex Deucher, Christian König
  Cc: alexander.deucher, amd-gfx


On 07/02/2023 08:03, Christian König wrote:
> Am 06.02.23 um 22:03 schrieb Alex Deucher:
>> On Mon, Feb 6, 2023 at 12:01 PM Christian König
>> <christian.koenig@amd.com> wrote:
>>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
>>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma 
>>>> <shashank.sharma@amd.com> wrote:
>>>>> Hey Alex,
>>>>>
>>>>> On 03/02/2023 23:07, Alex Deucher wrote:
>>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma 
>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>
>>>>>>> This patch introduces a new UAPI/IOCTL for the usermode graphics
>>>>>>> queue. The userspace app will fill this structure and request
>>>>>>> the graphics driver to add a graphics work queue for it. The
>>>>>>> output of this UAPI is a queue id.
>>>>>>>
>>>>>>> This UAPI maps the queue into GPU, so the graphics app can start
>>>>>>> submitting work to the queue as soon as the call returns.
>>>>>>>
>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>> ---
>>>>>>>     include/uapi/drm/amdgpu_drm.h | 53 
>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 53 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h 
>>>>>>> b/include/uapi/drm/amdgpu_drm.h
>>>>>>> index 4038abe8505a..6c5235d107b3 100644
>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>>>>     #define DRM_AMDGPU_VM                  0x13
>>>>>>>     #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>>>>     #define DRM_AMDGPU_SCHED               0x15
>>>>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>>>>
>>>>>>>     #define DRM_IOCTL_AMDGPU_GEM_CREATE 
>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union 
>>>>>>> drm_amdgpu_gem_create)
>>>>>>>     #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE 
>>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>>>>     #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE + 
>>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>>>>     #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE 
>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union 
>>>>>>> drm_amdgpu_fence_to_handle)
>>>>>>>     #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
>>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE + 
>>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>>>>
>>>>>>>     /**
>>>>>>>      * DOC: memory domains
>>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>>>>>>            union drm_amdgpu_ctx_out out;
>>>>>>>     };
>>>>>>>
>>>>>>> +/* user queue IOCTL */
>>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>>>>> +
>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>>>>> +
>>>>>>> +struct drm_amdgpu_userq_mqd {
>>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>>>>> +       __u32   flags;
>>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>>>>> +       __u32   ip_type;
>>>>>>> +       /** GEM object handle */
>>>>>>> +       __u32   doorbell_handle;
>>>>>>> +       /** Doorbell offset in dwords */
>>>>>>> +       __u32   doorbell_offset;
>>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
>>>>> Can you please help cross-check this information? All the existing
>>>>> kernel doorbell calculations treat the doorbell size as sizeof(u32).
>>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
>>>> from, but from vega onward most doorbells are 64 bit.  I think some
>>>> versions of VCN may still use 32 bit doorbells.  Internally in the
>>>> kernel driver we just use two slots for newer hardware, but for the
>>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
>>>> Even if an engine only uses a 32 bit one, I don't know that there is
>>>> much value to trying to support variable doorbell sizes.
>>> I think we can stick with using __u32 because this is *not* the size of
>>> the doorbell entries.
>>>
>>> Instead this is the offset into the BO where to find the doorbell for
>>> this queue (which then in turn is 64bits wide).
>>>
>>> Since we will probably never have more than 4 GiB of doorbells, we
>>> should be pretty safe to use 32 bits here.
>> Yes, the offset would still be 32 bits, but the units would be 
>> qwords.  E.g.,
>>
>> +       /** Doorbell offset in qwords */
>> +       __u32   doorbell_offset;
>>
>> That way you couldn't accidentally specify an overlapping doorbell.
>
> Ah, so you only wanted to fix the comment. That was absolutely not 
> clear from the discussion.

If I understand this correctly, the offset of the doorbell in the BO is 
still 32-bit, but its width (size in bytes) is 64 bits. Am I getting 
that right?

- Shashank

>
> Christian.
>
>>
>> Alex
>>
>>> Christian.
>>>
>>>> Alex
>>>>
>>>>>>> +       /** GPU virtual address of the queue */
>>>>>>> +       __u64   queue_va;
>>>>>>> +       /** Size of the queue in bytes */
>>>>>>> +       __u64   queue_size;
>>>>>>> +       /** GPU virtual address of the rptr */
>>>>>>> +       __u64   rptr_va;
>>>>>>> +       /** GPU virtual address of the wptr */
>>>>>>> +       __u64   wptr_va;
>>>>>>> +};
>>>>>>> +
>>>>>>> +struct drm_amdgpu_userq_in {
>>>>>>> +       /** AMDGPU_USERQ_OP_* */
>>>>>>> +       __u32   op;
>>>>>>> +       /** Flags */
>>>>>>> +       __u32   flags;
>>>>>>> +       /** Queue handle to associate the queue free call with,
>>>>>>> +        * unused for queue create calls */
>>>>>>> +       __u32   queue_id;
>>>>>>> +       __u32   pad;
>>>>>>> +       /** Queue descriptor */
>>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>>>>> +};
>>>>>>> +
>>>>>>> +struct drm_amdgpu_userq_out {
>>>>>>> +       /** Queue handle */
>>>>>>> +       __u32   q_id;
>>>>>> Maybe this should be queue_id to match the input.
>>>>> Agree.
>>>>>
>>>>> - Shashank
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>> +       /** Flags */
>>>>>>> +       __u32   flags;
>>>>>>> +};
>>>>>>> +
>>>>>>> +union drm_amdgpu_userq {
>>>>>>> +       struct drm_amdgpu_userq_in in;
>>>>>>> +       struct drm_amdgpu_userq_out out;
>>>>>>> +};
>>>>>>> +
>>>>>>>     /* vm ioctl */
>>>>>>>     #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>>>>     #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>>>>> -- 
>>>>>>> 2.34.1
>>>>>>>
>


* Re: [PATCH 2/8] drm/amdgpu: add usermode queues
  2023-02-07  7:08   ` Christian König
@ 2023-02-07  7:40     ` Shashank Sharma
  0 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07  7:40 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher, Shashank Sharma


On 07/02/2023 08:08, Christian König wrote:
> Am 03.02.23 um 22:54 schrieb Shashank Sharma:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch adds skeleton code for usermode queue creation. It
>> typically contains:
>> - A new structure to keep all the user queue data in one place.
>> - An IOCTL function to create/free a usermode queue.
>> - A function to generate unique index for the queue.
>> - A queue context manager in driver private data.
>>
>> V1: Worked on design review comments from RFC patch series:
>> (https://patchwork.freedesktop.org/series/112214/)
>>
>> - Alex: Keep a list of queues, instead of single queue per process.
>> - Christian: Use the queue manager instead of global ptrs,
>>             Don't keep the queue structure in amdgpu_ctx
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   5 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 155 ++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  64 ++++++++
>>   6 files changed, 230 insertions(+)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 798d0e9a60b7..764801cc8203 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -210,6 +210,8 @@ amdgpu-y += \
>>   # add amdkfd interfaces
>>   amdgpu-y += amdgpu_amdkfd.o
>>   +# add usermode queue
>> +amdgpu-y += amdgpu_userqueue.o
>>     ifneq ($(CONFIG_HSA_AMD),)
>>   AMDKFD_PATH := ../amdkfd
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 6b74df446694..0625d6bdadf4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -109,6 +109,7 @@
>>   #include "amdgpu_fdinfo.h"
>>   #include "amdgpu_mca.h"
>>   #include "amdgpu_ras.h"
>> +#include "amdgpu_userqueue.h"
>>     #define MAX_GPU_INSTANCE        16
>>   @@ -482,6 +483,7 @@ struct amdgpu_fpriv {
>>       struct mutex        bo_list_lock;
>>       struct idr        bo_list_handles;
>>       struct amdgpu_ctx_mgr    ctx_mgr;
>> +    struct amdgpu_userq_mgr    userq_mgr;
>>   };
>>     int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv 
>> **fpriv);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index b4f2d61ea0d5..229976a2d0e7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -52,6 +52,7 @@
>>   #include "amdgpu_ras.h"
>>   #include "amdgpu_xgmi.h"
>>   #include "amdgpu_reset.h"
>> +#include "amdgpu_userqueue.h"
>>     /*
>>    * KMS wrapper.
>> @@ -2748,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] 
>> = {
>>       DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>>       DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>>       DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>>   };
>>     static const struct drm_driver amdgpu_kms_driver = {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 7aa7e52ca784..52e61e339a88 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -1187,6 +1187,10 @@ int amdgpu_driver_open_kms(struct drm_device 
>> *dev, struct drm_file *file_priv)
>>         amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>>   +    r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
>> +    if (r)
>> +        DRM_WARN("Can't setup usermode queues, only legacy workload 
>> submission will work\n");
>> +
>>       file_priv->driver_priv = fpriv;
>>       goto out_suspend;
>>   @@ -1254,6 +1258,7 @@ void amdgpu_driver_postclose_kms(struct 
>> drm_device *dev,
>>         amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>>       amdgpu_vm_fini(adev, &fpriv->vm);
>> +    amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>>         if (pasid)
>>           amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> new file mode 100644
>> index 000000000000..d5bc7fe81750
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -0,0 +1,155 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include "amdgpu.h"
>> +#include "amdgpu_vm.h"
>> +
>> +static inline int
>> +amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct 
>> amdgpu_usermode_queue *queue)
>> +{
>> +    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, 
>> GFP_KERNEL);
>> +}
>> +
>> +static inline void
>> +amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int 
>> queue_id)
>> +{
>> +    idr_remove(&uq_mgr->userq_idr, queue_id);
>> +}
>> +
>> +static struct amdgpu_usermode_queue
>> +*amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>
> Please put the * on the previous line, it took me a moment to realize 
> that you do not return the queue by value here.
>
Noted,
>> +{
>> +    return idr_find(&uq_mgr->userq_idr, qid);
>> +}
>> +
>> +static int amdgpu_userqueue_create(struct drm_file *filp, union 
>> drm_amdgpu_userq *args)
>> +{
>> +    int r, pasid;
>> +    struct amdgpu_usermode_queue *queue;
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>> +    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
>
> We usually put variables like i and r last. The PCI maintainers even 
> require that you sort the variables in reverse xmas tree.
>
Noted,
>> +
>> +    pasid = vm->pasid;
>> +    if (vm->pasid < 0) {
>> +        DRM_WARN("No PASID info found\n");
>> +        pasid = 0;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +
>> +    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
>> +    if (!queue) {
>> +        DRM_ERROR("Failed to allocate memory for queue\n");
>> +        mutex_unlock(&uq_mgr->userq_mutex);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    queue->vm = vm;
>> +    queue->pasid = pasid;
>> +    queue->wptr_gpu_addr = mqd_in->wptr_va;
>> +    queue->rptr_gpu_addr = mqd_in->rptr_va;
>> +    queue->queue_size = mqd_in->queue_size;
>> +    queue->queue_type = mqd_in->ip_type;
>> +    queue->queue_gpu_addr = mqd_in->queue_va;
>> +    queue->flags = mqd_in->flags;
>> +    queue->use_doorbell = true;
>> +    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
>> +    if (queue->queue_id < 0) {
>> +        DRM_ERROR("Failed to allocate a queue id\n");
>> +        r = queue->queue_id;
>> +        goto free_queue;
>> +    }
>> +
>> +    list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>> +    args->out.q_id = queue->queue_id;
>> +    args->out.flags = 0;
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    return 0;
>> +
>> +free_queue:
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    kfree(queue);
>> +    return r;
>> +}
>> +
>> +static void amdgpu_userqueue_destroy(struct drm_file *filp, int 
>> queue_id)
>> +{
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>> +    struct amdgpu_usermode_queue *queue;
>> +
>> +    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
>> +    if (!queue) {
>> +        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
>> +        return;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>> +    list_del(&queue->userq_node);
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    kfree(queue);
>> +}
>> +
>> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>> +               struct drm_file *filp)
>> +{
>> +    union drm_amdgpu_userq *args = data;
>> +    int r = 0;
>> +
>> +    switch (args->in.op) {
>> +    case AMDGPU_USERQ_OP_CREATE:
>> +        r = amdgpu_userqueue_create(filp, args);
>> +        if (r)
>> +            DRM_ERROR("Failed to create usermode queue\n");
>> +        break;
>> +
>> +    case AMDGPU_USERQ_OP_FREE:
>> +        amdgpu_userqueue_destroy(filp, args->in.queue_id);
>> +        break;
>> +
>> +    default:
>> +        DRM_ERROR("Invalid user queue op specified: %d\n", 
>> args->in.op);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct 
>> amdgpu_device *adev)
>> +{
>> +    mutex_init(&userq_mgr->userq_mutex);
>> +    idr_init_base(&userq_mgr->userq_idr, 1);
>> +    INIT_LIST_HEAD(&userq_mgr->userq_list);
>
> Why do you need an extra list when you already have the idr?
>
> Apart from those nit picks looks good to me.
>
The idea was to keep all the userq base pointers in a list, but you are 
right: with the IDR we will never need it. I will remove it.

- Shashank

> Christian.
>
>> +    userq_mgr->adev = adev;
>> +
>> +    return 0;
>> +}
>> +
>> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>> +{
>> +    idr_destroy(&userq_mgr->userq_idr);
>> +    mutex_destroy(&userq_mgr->userq_mutex);
>> +}
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> new file mode 100644
>> index 000000000000..9557588fe34f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -0,0 +1,64 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef AMDGPU_USERQUEUE_H_
>> +#define AMDGPU_USERQUEUE_H_
>> +
>> +#define AMDGPU_MAX_USERQ 512
>> +
>> +struct amdgpu_userq_mgr {
>> +    struct idr userq_idr;
>> +    struct mutex userq_mutex;
>> +    struct list_head userq_list;
>> +    struct amdgpu_device *adev;
>> +};
>> +
>> +struct amdgpu_usermode_queue {
>> +    int        queue_id;
>> +    int        queue_type;
>> +    int        queue_size;
>> +    int        pasid;
>> +    int        doorbell_index;
>> +    int         use_doorbell;
>> +
>> +    uint64_t    wptr_gpu_addr;
>> +    uint64_t    rptr_gpu_addr;
>> +    uint64_t    queue_gpu_addr;
>> +    uint64_t    flags;
>> +
>> +    uint64_t    mqd_gpu_addr;
>> +    void         *mqd_cpu_ptr;
>> +
>> +    struct amdgpu_bo    *mqd_obj;
>> +    struct amdgpu_vm        *vm;
>> +    struct amdgpu_userq_mgr *userq_mgr;
>> +    struct list_head     userq_node;
>> +};
>> +
>> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct 
>> drm_file *filp);
>> +
>> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct 
>> amdgpu_device *adev);
>> +
>> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
>> +
>> +#endif
>


* Re: [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers
  2023-02-07  7:11   ` Christian König
@ 2023-02-07  7:41     ` Shashank Sharma
  0 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07  7:41 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher, Shashank Sharma


On 07/02/2023 08:11, Christian König wrote:
> Am 03.02.23 um 22:54 schrieb Shashank Sharma:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> A Memory Queue Descriptor (MQD) of a userqueue defines it in the
>> hardware's context. As both the method of creating an MQD and its
>> format can vary between different graphics IPs, we need GFX
>> generation specific handlers to create MQDs.
>>
>> This patch:
>> - Introduces MQD handler functions for the usermode queues
>> - Adds a general function to create and destroy an MQD for a userqueue.
>>
>> V1: Worked on review comments from Alex on RFC patches:
>>      MQD creation should be gen and IP specific.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 64 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  9 +++
>>   2 files changed, 73 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index d5bc7fe81750..625c2fe1e84a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -42,6 +42,60 @@ static struct amdgpu_usermode_queue
>>       return idr_find(&uq_mgr->userq_idr, qid);
>>   }
>>   +static int
>> +amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct 
>> amdgpu_usermode_queue *queue)
>> +{
>> +    int r;
>> +    int size;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +
>> +    if (!uq_mgr->userq_mqd_funcs) {
>> +        DRM_ERROR("Userqueue not initialized\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    size = uq_mgr->userq_mqd_funcs->mqd_size(uq_mgr);
>> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>> +                                &queue->mqd_obj,
>> +                                &queue->mqd_gpu_addr,
>> +                                &queue->mqd_cpu_ptr);
>
> We can't use amdgpu_bo_create_kernel() here, this pins the BO.
>
> Instead all BOs of the process must be fenced with some eviction fence.


Noted,

- Shashank

>
> Christian.
>
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
>> +        return r;
>> +    }
>> +
>> +    memset(queue->mqd_cpu_ptr, 0, size);
>> +    r = amdgpu_bo_reserve(queue->mqd_obj, false);
>> +    if (unlikely(r != 0)) {
>> +        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
>> +        goto free_mqd;
>> +    }
>> +
>> +    r = uq_mgr->userq_mqd_funcs->mqd_create(uq_mgr, queue);
>> +    amdgpu_bo_unreserve(queue->mqd_obj);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create MQD for queue\n");
>> +        goto free_mqd;
>> +    }
>> +    return 0;
>> +
>> +free_mqd:
>> +    amdgpu_bo_free_kernel(&queue->mqd_obj,
>> +               &queue->mqd_gpu_addr,
>> +               &queue->mqd_cpu_ptr);
>> +   return r;
>> +}
>> +
>> +static void
>> +amdgpu_userqueue_destroy_mqd(struct amdgpu_userq_mgr *uq_mgr, struct 
>> amdgpu_usermode_queue *queue)
>> +{
>> +    uq_mgr->userq_mqd_funcs->mqd_destroy(uq_mgr, queue);
>> +    amdgpu_bo_free_kernel(&queue->mqd_obj,
>> +               &queue->mqd_gpu_addr,
>> +               &queue->mqd_cpu_ptr);
>> +}
>> +
>>   static int amdgpu_userqueue_create(struct drm_file *filp, union 
>> drm_amdgpu_userq *args)
>>   {
>>       int r, pasid;
>> @@ -82,12 +136,21 @@ static int amdgpu_userqueue_create(struct 
>> drm_file *filp, union drm_amdgpu_userq
>>           goto free_queue;
>>       }
>>   +    r = amdgpu_userqueue_create_mqd(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create MQD\n");
>> +        goto free_qid;
>> +    }
>> +
>>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>>       args->out.q_id = queue->queue_id;
>>       args->out.flags = 0;
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       return 0;
>>   +free_qid:
>> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>> +
>>   free_queue:
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       kfree(queue);
>> @@ -107,6 +170,7 @@ static void amdgpu_userqueue_destroy(struct 
>> drm_file *filp, int queue_id)
>>       }
>>         mutex_lock(&uq_mgr->userq_mutex);
>> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>       list_del(&queue->userq_node);
>>       mutex_unlock(&uq_mgr->userq_mutex);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 9557588fe34f..a6abdfd5cb74 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -26,10 +26,13 @@
>>     #define AMDGPU_MAX_USERQ 512
>>   +struct amdgpu_userq_mqd_funcs;
>> +
>>   struct amdgpu_userq_mgr {
>>       struct idr userq_idr;
>>       struct mutex userq_mutex;
>>       struct list_head userq_list;
>> +    const struct amdgpu_userq_mqd_funcs *userq_mqd_funcs;
>>       struct amdgpu_device *adev;
>>   };
>>   @@ -57,6 +60,12 @@ struct amdgpu_usermode_queue {
>>     int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct 
>> drm_file *filp);
>>   +struct amdgpu_userq_mqd_funcs {
>> +    int (*mqd_size)(struct amdgpu_userq_mgr *);
>> +    int (*mqd_create)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>> +    void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>> +};
>> +
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>> struct amdgpu_device *adev);
>>     void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
>


* Re: [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-07  7:14   ` Christian König
@ 2023-02-07  7:51     ` Shashank Sharma
  2023-02-07  7:55       ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07  7:51 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher


On 07/02/2023 08:14, Christian König wrote:
> Am 03.02.23 um 22:54 schrieb Shashank Sharma:
>> The FW expects us to allocate at least one page of context space to
>> handle gang, process, shadow, GDS and FW_space related work. This
>> patch creates the corresponding objects and adds IP specific
>> functions to do this.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
>>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 ++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
>>   3 files changed, 171 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 9f3490a91776..18281b3a51f1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
>>       return idr_find(&uq_mgr->userq_idr, qid);
>>   }
>>   +static void
>> +amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                   struct amdgpu_usermode_queue *queue)
>> +{
>> +    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
>> +}
>> +
>> +static int
>> +amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                  struct amdgpu_usermode_queue *queue)
>> +{
>> +    int r;
>> +
>> +    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create context space for queue\n");
>> +        return r;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static int
>>   amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct 
>> amdgpu_usermode_queue *queue)
>>   {
>> @@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct 
>> drm_file *filp, union drm_amdgpu_userq
>>           goto free_qid;
>>       }
>>   +    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create context space\n");
>> +        goto free_mqd;
>> +    }
>> +
>>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>>       args->out.q_id = queue->queue_id;
>>       args->out.flags = 0;
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       return 0;
>>   +free_mqd:
>> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>> +
>>   free_qid:
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>   @@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct 
>> drm_file *filp, int queue_id)
>>       }
>>         mutex_lock(&uq_mgr->userq_mutex);
>> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>       list_del(&queue->userq_node);
>> diff --git 
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> index 57889729d635..687f90a587e3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> @@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct 
>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>     }
>>   +static int amdgpu_userq_gfx_v11_ctx_create(struct amdgpu_userq_mgr 
>> *uq_mgr,
>> +                                           struct 
>> amdgpu_usermode_queue *queue)
>> +{
>> +    int r;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
>> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
>> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
>> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
>> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
>> +
>> +    /*
>> +     * The FW expects at least one page of space allocated for
>> +     * process context related work, and one for gang context.
>> +     */
>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, 
>> PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>> +                                &pctx->obj,
>> +                                &pctx->gpu_addr,
>> +                                &pctx->cpu_ptr);
>
> Again, don't use amdgpu_bo_create_kernel() for any of this.
Noted,
>
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", r);
>> +        return r;
>> +    }
>> +
>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, 
>> PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>> +                                &gctx->obj,
>> +                                &gctx->gpu_addr,
>> +                                &gctx->cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", r);
>> +        goto err_gangctx;
>> +    }
>> +
>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, 
>> PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>> +                                &gdsctx->obj,
>> +                                &gdsctx->gpu_addr,
>> +                                &gdsctx->cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
>> +        goto err_gdsctx;
>> +    }
>> +
>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, 
>> PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>> +                                &fwctx->obj,
>> +                                &fwctx->gpu_addr,
>> +                                &fwctx->cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
>> +        goto err_fwctx;
>> +    }
>> +
>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, 
>> PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>> +                                &sctx->obj,
>> +                                &sctx->gpu_addr,
>> +                                &sctx->cpu_ptr);
>
> Why the heck should we allocate so many different BOs for that? Can't 
> we put all of this into one?
If you mean why we don't create one object of 5 * PAGE_SIZE and hand out 
1-page-spaced offsets for all of this: yes, that would further simplify 
things.

The reason we kept them separate is that these objects could have 
different sizes on different IPs/platforms, so we thought defining a 
separate size macro and object for each of them would make it easier to 
see how many FW page objects we create for a given GEN/IP. It can work 
either way.

- Shashank

>
> Christian.
>
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate shadow bo for userqueue (%d)", 
>> r);
>> +        goto err_sctx;
>> +    }
>> +
>> +    return 0;
>> +
>> +err_sctx:
>> +    amdgpu_bo_free_kernel(&fwctx->obj,
>> +                          &fwctx->gpu_addr,
>> +                          &fwctx->cpu_ptr);
>> +
>> +err_fwctx:
>> +    amdgpu_bo_free_kernel(&gdsctx->obj,
>> +                          &gdsctx->gpu_addr,
>> +                          &gdsctx->cpu_ptr);
>> +
>> +err_gdsctx:
>> +    amdgpu_bo_free_kernel(&gctx->obj,
>> +                          &gctx->gpu_addr,
>> +                          &gctx->cpu_ptr);
>> +
>> +err_gangctx:
>> +    amdgpu_bo_free_kernel(&pctx->obj,
>> +                          &pctx->gpu_addr,
>> +                          &pctx->cpu_ptr);
>> +    return r;
>> +}
>> +
>> +static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr 
>> *uq_mgr,
>> +                                            struct 
>> amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
>> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
>> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
>> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
>> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
>> +
>> +    amdgpu_bo_free_kernel(&sctx->obj,
>> +                          &sctx->gpu_addr,
>> +                          &sctx->cpu_ptr);
>> +
>> +    amdgpu_bo_free_kernel(&fwctx->obj,
>> +                          &fwctx->gpu_addr,
>> +                          &fwctx->cpu_ptr);
>> +
>> +    amdgpu_bo_free_kernel(&gdsctx->obj,
>> +                          &gdsctx->gpu_addr,
>> +                          &gdsctx->cpu_ptr);
>> +
>> +    amdgpu_bo_free_kernel(&gctx->obj,
>> +                          &gctx->gpu_addr,
>> +                          &gctx->cpu_ptr);
>> +
>> +    amdgpu_bo_free_kernel(&pctx->obj,
>> +                          &pctx->gpu_addr,
>> +                          &pctx->cpu_ptr);
>> +}
>> +
>>   static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr 
>> *uq_mgr)
>>   {
>>       return sizeof(struct v11_gfx_mqd);
>> @@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs 
>> userq_gfx_v11_mqd_funcs = {
>>       .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>>       .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>>       .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>> +    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
>> +    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
>>   };
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index a6abdfd5cb74..3adcd31618f7 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -25,9 +25,19 @@
>>   #define AMDGPU_USERQUEUE_H_
>>     #define AMDGPU_MAX_USERQ 512
>> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>     struct amdgpu_userq_mqd_funcs;
>>   +struct amdgpu_userq_ctx {
>> +    struct amdgpu_bo *obj;
>> +    uint64_t gpu_addr;
>> +    void    *cpu_ptr;
>> +};
>> +
>>   struct amdgpu_userq_mgr {
>>       struct idr userq_idr;
>>       struct mutex userq_mutex;
>> @@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
>>       uint64_t    mqd_gpu_addr;
>>       void         *mqd_cpu_ptr;
>>   +    struct amdgpu_userq_ctx    proc_ctx;
>> +    struct amdgpu_userq_ctx    gang_ctx;
>> +    struct amdgpu_userq_ctx    gds_ctx;
>> +    struct amdgpu_userq_ctx    fw_ctx;
>> +    struct amdgpu_userq_ctx    shadow_ctx;
>> +
>>       struct amdgpu_bo    *mqd_obj;
>>       struct amdgpu_vm        *vm;
>>       struct amdgpu_userq_mgr *userq_mgr;
>> @@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
>>       int (*mqd_size)(struct amdgpu_userq_mgr *);
>>       int (*mqd_create)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>>       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>> +    int (*ctx_create)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>> +    void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>>   };
>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>> struct amdgpu_device *adev);
>


* Re: [PATCH 6/8] drm/amdgpu: Map userqueue into HW
  2023-02-07  7:20   ` Christian König
@ 2023-02-07  7:55     ` Shashank Sharma
  0 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07  7:55 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher, Shashank Sharma


On 07/02/2023 08:20, Christian König wrote:
>
>
> Am 03.02.23 um 22:54 schrieb Shashank Sharma:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch adds new function pointers to prepare the usermode queue
>> to be mapped into or unmapped from the HW. After this mapping, the
>> queue will be ready to accept workloads.
>>
>> V1: Addressed review comments from Alex on the RFC patch series
>>      - Map/Unmap should be IP specific.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 57 +++++++++++++++++++
>>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 47 +++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  2 +
>>   3 files changed, 106 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 18281b3a51f1..cbfe2608c040 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -42,6 +42,53 @@ static struct amdgpu_usermode_queue
>>       return idr_find(&uq_mgr->userq_idr, qid);
>>   }
>>   +static void
>> +amdgpu_userqueue_unmap(struct amdgpu_userq_mgr *uq_mgr,
>> +                     struct amdgpu_usermode_queue *queue)
>> +{
>> +    int r;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct mes_remove_queue_input remove_request;
>> +
>> +    uq_mgr->userq_mqd_funcs->prepare_unmap(uq_mgr, queue, (void 
>> *)&remove_request);
>> +
>> +    amdgpu_mes_lock(&adev->mes);
>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &remove_request);
>> +    amdgpu_mes_unlock(&adev->mes);
>> +    if (r) {
>> +        DRM_ERROR("Failed to unmap usermode queue %d\n", 
>> queue->queue_id);
>> +        return;
>> +    }
>> +
>> +    DRM_DEBUG_DRIVER("Usermode queue %d unmapped\n", queue->queue_id);
>> +}
>> +
>> +static int
>> +amdgpu_userqueue_map(struct amdgpu_userq_mgr *uq_mgr,
>> +                     struct amdgpu_usermode_queue *queue)
>> +{
>> +    int r;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct mes_add_queue_input add_request;
>> +
>> +    r = uq_mgr->userq_mqd_funcs->prepare_map(uq_mgr, queue, (void 
>> *)&add_request);
>> +    if (r) {
>> +        DRM_ERROR("Failed to map userqueue\n");
>> +        return r;
>> +    }
>> +
>> +    amdgpu_mes_lock(&adev->mes);
>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &add_request);
>> +    amdgpu_mes_unlock(&adev->mes);
>> +    if (r) {
>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>> +        return r;
>> +    }
>> +
>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", 
>> queue->queue_id);
>> +    return 0;
>> +}
>> +
>>   static void
>>   amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>                                      struct amdgpu_usermode_queue 
>> *queue)
>> @@ -170,12 +217,21 @@ static int amdgpu_userqueue_create(struct 
>> drm_file *filp, union drm_amdgpu_userq
>>           goto free_mqd;
>>       }
>>   +    r = amdgpu_userqueue_map(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to map userqueue\n");
>> +        goto free_ctx;
>> +    }
>> +
>>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>>       args->out.q_id = queue->queue_id;
>>       args->out.flags = 0;
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       return 0;
>>   +free_ctx:
>> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>> +
>>   free_mqd:
>>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>   @@ -201,6 +257,7 @@ static void amdgpu_userqueue_destroy(struct 
>> drm_file *filp, int queue_id)
>>       }
>>         mutex_lock(&uq_mgr->userq_mutex);
>> +    amdgpu_userqueue_unmap(uq_mgr, queue);
>>       amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>> diff --git 
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> index 687f90a587e3..d317bb600fd9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> @@ -24,6 +24,7 @@
>>   #include "amdgpu_userqueue.h"
>>   #include "v11_structs.h"
>>   #include "amdgpu_mes.h"
>> +#include "mes_api_def.h"
>>   #include "gc/gc_11_0_0_offset.h"
>>   #include "gc/gc_11_0_0_sh_mask.h"
>>   @@ -239,6 +240,50 @@ static void 
>> amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
>>                             &pctx->cpu_ptr);
>>   }
>>   +static int
>> +amdgpu_userq_gfx_v11_prepare_map(struct amdgpu_userq_mgr *uq_mgr,
>> +                                 struct amdgpu_usermode_queue *queue,
>> +                                 void *q_input)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct mes_add_queue_input *queue_input = q_input;
>> +
>> +    memset(queue_input, 0x0, sizeof(struct mes_add_queue_input));
>> +
>> +    queue_input->process_va_start = 0;
>> +    queue_input->process_va_end = (adev->vm_manager.max_pfn - 1) << 
>> AMDGPU_GPU_PAGE_SHIFT;
>> +    queue_input->process_quantum = 100000; /* 10ms */
>> +    queue_input->gang_quantum = 10000; /* 1ms */
>> +    queue_input->paging = false;
>> +
>> +    queue_input->gang_context_addr = queue->gang_ctx.gpu_addr;
>> +    queue_input->process_context_addr = queue->proc_ctx.gpu_addr;
>> +    queue_input->inprocess_gang_priority = 
>> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>> +    queue_input->gang_global_priority_level = 
>> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>> +
>> +    queue_input->process_id = queue->pasid;
>> +    queue_input->queue_type = queue->queue_type;
>> +    queue_input->mqd_addr = queue->mqd_gpu_addr;
>
>> +    queue_input->wptr_addr = queue->wptr_gpu_addr;
>> +    queue_input->wptr_mc_addr = queue->wptr_gpu_addr << 
>> AMDGPU_GPU_PAGE_SHIFT;
>
> Well that here doesn't make sense at all.

Yes, please ignore this; it was based on an old interpretation of 
wptr_mc_addr. We have prepared a separate patch that maps the WPTR in 
the GART and then calculates wptr_mc_addr from that mapping, so this 
line will be removed and replaced by that calculation.

- Shashank

>
> Christian.
>
>> +    queue_input->queue_size = queue->queue_size >> 2;
>> +    queue_input->doorbell_offset = queue->doorbell_index;
>> +    queue_input->page_table_base_addr = 
>> amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +    return 0;
>> +}
>> +
>> +static void
>> +amdgpu_userq_gfx_v11_prepare_unmap(struct amdgpu_userq_mgr *uq_mgr,
>> +                                   struct amdgpu_usermode_queue *queue,
>> +                                   void *q_input)
>> +{
>> +    struct mes_remove_queue_input *queue_input = q_input;
>> +
>> +    memset(queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>> +    queue_input->doorbell_offset = queue->doorbell_index;
>> +    queue_input->gang_context_addr = queue->gang_ctx.gpu_addr;
>> +}
>> +
>>   static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr 
>> *uq_mgr)
>>   {
>>       return sizeof(struct v11_gfx_mqd);
>> @@ -250,4 +295,6 @@ const struct amdgpu_userq_mqd_funcs 
>> userq_gfx_v11_mqd_funcs = {
>>       .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>>       .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
>>       .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
>> +    .prepare_map = amdgpu_userq_gfx_v11_prepare_map,
>> +    .prepare_unmap = amdgpu_userq_gfx_v11_prepare_unmap,
>>   };
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 3adcd31618f7..202fac237501 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -82,6 +82,8 @@ struct amdgpu_userq_mqd_funcs {
>>       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>>       int (*ctx_create)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>>       void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *);
>> +    int (*prepare_map)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *, void *);
>> +    void (*prepare_unmap)(struct amdgpu_userq_mgr *, struct 
>> amdgpu_usermode_queue *, void *);
>>   };
>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>> struct amdgpu_device *adev);
>


* Re: [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-07  7:51     ` Shashank Sharma
@ 2023-02-07  7:55       ` Christian König
  2023-02-07  8:13         ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07  7:55 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: alexander.deucher

Am 07.02.23 um 08:51 schrieb Shashank Sharma:
>
> On 07/02/2023 08:14, Christian König wrote:
>> Am 03.02.23 um 22:54 schrieb Shashank Sharma:
>>> The FW expects us to allocate at least one page as context space to
>>> process gang, process, shadow, GDS and FW_space related work. This
>>> patch creates objects for the same, and adds IP-specific
>>> functions to do this.
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
>>>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 
>>> ++++++++++++++++++
>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
>>>   3 files changed, 171 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> index 9f3490a91776..18281b3a51f1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> @@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
>>>       return idr_find(&uq_mgr->userq_idr, qid);
>>>   }
>>>   +static void
>>> +amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>> +                                   struct amdgpu_usermode_queue 
>>> *queue)
>>> +{
>>> +    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
>>> +}
>>> +
>>> +static int
>>> +amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>> +                                  struct amdgpu_usermode_queue *queue)
>>> +{
>>> +    int r;
>>> +
>>> +    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to create context space for queue\n");
>>> +        return r;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   static int
>>>   amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, 
>>> struct amdgpu_usermode_queue *queue)
>>>   {
>>> @@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct 
>>> drm_file *filp, union drm_amdgpu_userq
>>>           goto free_qid;
>>>       }
>>>   +    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to create context space\n");
>>> +        goto free_mqd;
>>> +    }
>>> +
>>>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>>>       args->out.q_id = queue->queue_id;
>>>       args->out.flags = 0;
>>>       mutex_unlock(&uq_mgr->userq_mutex);
>>>       return 0;
>>>   +free_mqd:
>>> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>> +
>>>   free_qid:
>>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>>   @@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct 
>>> drm_file *filp, int queue_id)
>>>       }
>>>         mutex_lock(&uq_mgr->userq_mutex);
>>> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>>>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>>       list_del(&queue->userq_node);
>>> diff --git 
>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>> index 57889729d635..687f90a587e3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>> @@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct 
>>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>     }
>>>   +static int amdgpu_userq_gfx_v11_ctx_create(struct 
>>> amdgpu_userq_mgr *uq_mgr,
>>> +                                           struct 
>>> amdgpu_usermode_queue *queue)
>>> +{
>>> +    int r;
>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
>>> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
>>> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
>>> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
>>> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
>>> +
>>> +    /*
>>> +     * The FW expects at least one page of space allocated for
>>> +     * process context related work, and one for gang context.
>>> +     */
>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, 
>>> PAGE_SIZE,
>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>> +                                &pctx->obj,
>>> +                                &pctx->gpu_addr,
>>> +                                &pctx->cpu_ptr);
>>
>> Again, don't use amdgpu_bo_create_kernel() for any of this.
> Noted,
>>
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", r);
>>> +        return r;
>>> +    }
>>> +
>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, 
>>> PAGE_SIZE,
>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>> +                                &gctx->obj,
>>> +                                &gctx->gpu_addr,
>>> +                                &gctx->cpu_ptr);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", r);
>>> +        goto err_gangctx;
>>> +    }
>>> +
>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, 
>>> PAGE_SIZE,
>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>> +                                &gdsctx->obj,
>>> +                                &gdsctx->gpu_addr,
>>> +                                &gdsctx->cpu_ptr);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
>>> +        goto err_gdsctx;
>>> +    }
>>> +
>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, 
>>> PAGE_SIZE,
>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>> +                                &fwctx->obj,
>>> +                                &fwctx->gpu_addr,
>>> +                                &fwctx->cpu_ptr);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
>>> +        goto err_fwctx;
>>> +    }
>>> +
>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, 
>>> PAGE_SIZE,
>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>> +                                &sctx->obj,
>>> +                                &sctx->gpu_addr,
>>> +                                &sctx->cpu_ptr);
>>
>> Why the heck should we allocate so many different BOs for that? Can't 
>> we put all of this into one?
> If you mean why don't we create one object of 5 * PAGE_SIZE and give
> 1-page spaced offsets for all of this, yes, that would further
> simplify things.
>
> The reason why we kept it separate is that these objects could be of
> different sizes on different IPs/platforms, so we thought defining a
> separate size macro and object for each of these would make it easier
> to understand how many FW page objects we are creating for this GEN
> IP. It could be done either way.

But this is completely uninteresting for common code, isn't it?

I strongly think we should just create a single BO for each queue and 
put all the data (MQD, gang, GDS, FW, shadow) in it at different offsets.

This handling here is just overkill and rather error prone (BTW, you
used AMDGPU_USERQ_FW_CTX_SZ twice).

Christian.

>
> - Shashank
>
>>
>> Christian.
>>
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to allocate shadow bo for userqueue 
>>> (%d)", r);
>>> +        goto err_sctx;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +err_sctx:
>>> +    amdgpu_bo_free_kernel(&fwctx->obj,
>>> +                          &fwctx->gpu_addr,
>>> +                          &fwctx->cpu_ptr);
>>> +
>>> +err_fwctx:
>>> +    amdgpu_bo_free_kernel(&gdsctx->obj,
>>> +                          &gdsctx->gpu_addr,
>>> +                          &gdsctx->cpu_ptr);
>>> +
>>> +err_gdsctx:
>>> +    amdgpu_bo_free_kernel(&gctx->obj,
>>> +                          &gctx->gpu_addr,
>>> +                          &gctx->cpu_ptr);
>>> +
>>> +err_gangctx:
>>> +    amdgpu_bo_free_kernel(&pctx->obj,
>>> +                          &pctx->gpu_addr,
>>> +                          &pctx->cpu_ptr);
>>> +    return r;
>>> +}
>>> +
>>> +static void amdgpu_userq_gfx_v11_ctx_destroy(struct 
>>> amdgpu_userq_mgr *uq_mgr,
>>> +                                            struct 
>>> amdgpu_usermode_queue *queue)
>>> +{
>>> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
>>> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
>>> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
>>> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
>>> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
>>> +
>>> +    amdgpu_bo_free_kernel(&sctx->obj,
>>> +                          &sctx->gpu_addr,
>>> +                          &sctx->cpu_ptr);
>>> +
>>> +    amdgpu_bo_free_kernel(&fwctx->obj,
>>> +                          &fwctx->gpu_addr,
>>> +                          &fwctx->cpu_ptr);
>>> +
>>> +    amdgpu_bo_free_kernel(&gdsctx->obj,
>>> +                          &gdsctx->gpu_addr,
>>> +                          &gdsctx->cpu_ptr);
>>> +
>>> +    amdgpu_bo_free_kernel(&gctx->obj,
>>> +                          &gctx->gpu_addr,
>>> +                          &gctx->cpu_ptr);
>>> +
>>> +    amdgpu_bo_free_kernel(&pctx->obj,
>>> +                          &pctx->gpu_addr,
>>> +                          &pctx->cpu_ptr);
>>> +}
>>> +
>>>   static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr 
>>> *uq_mgr)
>>>   {
>>>       return sizeof(struct v11_gfx_mqd);
>>> @@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs 
>>> userq_gfx_v11_mqd_funcs = {
>>>       .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>>>       .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>>>       .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>>> +    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
>>> +    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
>>>   };
>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> index a6abdfd5cb74..3adcd31618f7 100644
>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> @@ -25,9 +25,19 @@
>>>   #define AMDGPU_USERQUEUE_H_
>>>     #define AMDGPU_MAX_USERQ 512
>>> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>     struct amdgpu_userq_mqd_funcs;
>>>   +struct amdgpu_userq_ctx {
>>> +    struct amdgpu_bo *obj;
>>> +    uint64_t gpu_addr;
>>> +    void    *cpu_ptr;
>>> +};
>>> +
>>>   struct amdgpu_userq_mgr {
>>>       struct idr userq_idr;
>>>       struct mutex userq_mutex;
>>> @@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
>>>       uint64_t    mqd_gpu_addr;
>>>       void         *mqd_cpu_ptr;
>>>   +    struct amdgpu_userq_ctx    proc_ctx;
>>> +    struct amdgpu_userq_ctx    gang_ctx;
>>> +    struct amdgpu_userq_ctx    gds_ctx;
>>> +    struct amdgpu_userq_ctx    fw_ctx;
>>> +    struct amdgpu_userq_ctx    shadow_ctx;
>>> +
>>>       struct amdgpu_bo    *mqd_obj;
>>>       struct amdgpu_vm        *vm;
>>>       struct amdgpu_userq_mgr *userq_mgr;
>>> @@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
>>>       int (*mqd_size)(struct amdgpu_userq_mgr *);
>>>       int (*mqd_create)(struct amdgpu_userq_mgr *, struct 
>>> amdgpu_usermode_queue *);
>>>       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct 
>>> amdgpu_usermode_queue *);
>>> +    int (*ctx_create)(struct amdgpu_userq_mgr *, struct 
>>> amdgpu_usermode_queue *);
>>> +    void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct 
>>> amdgpu_usermode_queue *);
>>>   };
>>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>>> struct amdgpu_device *adev);
>>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-07  7:55       ` Christian König
@ 2023-02-07  8:13         ` Shashank Sharma
  0 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07  8:13 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher


On 07/02/2023 08:55, Christian König wrote:
> Am 07.02.23 um 08:51 schrieb Shashank Sharma:
>>
>> On 07/02/2023 08:14, Christian König wrote:
>>> Am 03.02.23 um 22:54 schrieb Shashank Sharma:
>>>> The FW expects us to allocate at least one page as context space to
>>>> process gang, process, shadow, GDS and FW_space related work. This
>>>> patch creates objects for the same, and adds IP-specific
>>>> functions to do this.
>>>>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
>>>>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 
>>>> ++++++++++++++++++
>>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
>>>>   3 files changed, 171 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> index 9f3490a91776..18281b3a51f1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> @@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
>>>>       return idr_find(&uq_mgr->userq_idr, qid);
>>>>   }
>>>>   +static void
>>>> +amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>> +                                   struct amdgpu_usermode_queue 
>>>> *queue)
>>>> +{
>>>> +    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
>>>> +}
>>>> +
>>>> +static int
>>>> +amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>> +                                  struct amdgpu_usermode_queue 
>>>> *queue)
>>>> +{
>>>> +    int r;
>>>> +
>>>> +    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to create context space for queue\n");
>>>> +        return r;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>   static int
>>>>   amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, 
>>>> struct amdgpu_usermode_queue *queue)
>>>>   {
>>>> @@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct 
>>>> drm_file *filp, union drm_amdgpu_userq
>>>>           goto free_qid;
>>>>       }
>>>>   +    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to create context space\n");
>>>> +        goto free_mqd;
>>>> +    }
>>>> +
>>>>       list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>>>>       args->out.q_id = queue->queue_id;
>>>>       args->out.flags = 0;
>>>>       mutex_unlock(&uq_mgr->userq_mutex);
>>>>       return 0;
>>>>   +free_mqd:
>>>> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>>> +
>>>>   free_qid:
>>>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>>>   @@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct 
>>>> drm_file *filp, int queue_id)
>>>>       }
>>>>         mutex_lock(&uq_mgr->userq_mutex);
>>>> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>>>>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>>>       list_del(&queue->userq_node);
>>>> diff --git 
>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>> index 57889729d635..687f90a587e3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>> @@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct 
>>>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>>     }
>>>>   +static int amdgpu_userq_gfx_v11_ctx_create(struct 
>>>> amdgpu_userq_mgr *uq_mgr,
>>>> +                                           struct 
>>>> amdgpu_usermode_queue *queue)
>>>> +{
>>>> +    int r;
>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
>>>> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
>>>> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
>>>> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
>>>> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
>>>> +
>>>> +    /*
>>>> +     * The FW expects at least one page of space allocated for
>>>> +     * process context related work, and one for gang context.
>>>> +     */
>>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, 
>>>> PAGE_SIZE,
>>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>>> +                                &pctx->obj,
>>>> +                                &pctx->gpu_addr,
>>>> +                                &pctx->cpu_ptr);
>>>
>>> Again, don't use amdgpu_bo_create_kernel() for any of this.
>> Noted,
>>>
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", 
>>>> r);
>>>> +        return r;
>>>> +    }
>>>> +
>>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, 
>>>> PAGE_SIZE,
>>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>>> +                                &gctx->obj,
>>>> +                                &gctx->gpu_addr,
>>>> +                                &gctx->cpu_ptr);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", 
>>>> r);
>>>> +        goto err_gangctx;
>>>> +    }
>>>> +
>>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, 
>>>> PAGE_SIZE,
>>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>>> +                                &gdsctx->obj,
>>>> +                                &gdsctx->gpu_addr,
>>>> +                                &gdsctx->cpu_ptr);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
>>>> +        goto err_gdsctx;
>>>> +    }
>>>> +
>>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, 
>>>> PAGE_SIZE,
>>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>>> +                                &fwctx->obj,
>>>> +                                &fwctx->gpu_addr,
>>>> +                                &fwctx->cpu_ptr);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
>>>> +        goto err_fwctx;
>>>> +    }
>>>> +
>>>> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, 
>>>> PAGE_SIZE,
>>>> +                                AMDGPU_GEM_DOMAIN_VRAM,
>>>> +                                &sctx->obj,
>>>> +                                &sctx->gpu_addr,
>>>> +                                &sctx->cpu_ptr);
>>>
>>> Why the heck should we allocate so many different BOs for that? 
>>> Can't we put all of this into one?
>> If you mean why don't we create one object of 5 * PAGE_SIZE and give
>> 1-page spaced offsets for all of this, yes, that would further
>> simplify things.
>>
>> The reason why we kept it separate is that these objects could be of
>> different sizes on different IPs/platforms, so we thought defining a
>> separate size macro and object for each of these would make it easier
>> to understand how many FW page objects we are creating for this GEN
>> IP. It could be done either way.
>
> But this is completely uninteresting for common code, isn't it?
>
> I strongly think we should just create a single BO for each queue and 
> put all the data (MQD, gang, GDS, FW, shadow) in it at different offsets.
>
> This handling here is just overkill and rather error prone (BTW, you
> used AMDGPU_USERQ_FW_CTX_SZ twice).
>

Agree, we will fix this.

- Shashank

> Christian.
>
>>
>> - Shashank
>>
>>>
>>> Christian.
>>>
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to allocate shadow bo for userqueue 
>>>> (%d)", r);
>>>> +        goto err_sctx;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +err_sctx:
>>>> +    amdgpu_bo_free_kernel(&fwctx->obj,
>>>> +                          &fwctx->gpu_addr,
>>>> +                          &fwctx->cpu_ptr);
>>>> +
>>>> +err_fwctx:
>>>> +    amdgpu_bo_free_kernel(&gdsctx->obj,
>>>> +                          &gdsctx->gpu_addr,
>>>> +                          &gdsctx->cpu_ptr);
>>>> +
>>>> +err_gdsctx:
>>>> +    amdgpu_bo_free_kernel(&gctx->obj,
>>>> +                          &gctx->gpu_addr,
>>>> +                          &gctx->cpu_ptr);
>>>> +
>>>> +err_gangctx:
>>>> +    amdgpu_bo_free_kernel(&pctx->obj,
>>>> +                          &pctx->gpu_addr,
>>>> +                          &pctx->cpu_ptr);
>>>> +    return r;
>>>> +}
>>>> +
>>>> +static void amdgpu_userq_gfx_v11_ctx_destroy(struct 
>>>> amdgpu_userq_mgr *uq_mgr,
>>>> +                                            struct 
>>>> amdgpu_usermode_queue *queue)
>>>> +{
>>>> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
>>>> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
>>>> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
>>>> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
>>>> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
>>>> +
>>>> +    amdgpu_bo_free_kernel(&sctx->obj,
>>>> +                          &sctx->gpu_addr,
>>>> +                          &sctx->cpu_ptr);
>>>> +
>>>> +    amdgpu_bo_free_kernel(&fwctx->obj,
>>>> +                          &fwctx->gpu_addr,
>>>> +                          &fwctx->cpu_ptr);
>>>> +
>>>> +    amdgpu_bo_free_kernel(&gdsctx->obj,
>>>> +                          &gdsctx->gpu_addr,
>>>> +                          &gdsctx->cpu_ptr);
>>>> +
>>>> +    amdgpu_bo_free_kernel(&gctx->obj,
>>>> +                          &gctx->gpu_addr,
>>>> +                          &gctx->cpu_ptr);
>>>> +
>>>> +    amdgpu_bo_free_kernel(&pctx->obj,
>>>> +                          &pctx->gpu_addr,
>>>> +                          &pctx->cpu_ptr);
>>>> +}
>>>> +
>>>>   static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr 
>>>> *uq_mgr)
>>>>   {
>>>>       return sizeof(struct v11_gfx_mqd);
>>>> @@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs 
>>>> userq_gfx_v11_mqd_funcs = {
>>>>       .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>>>>       .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>>>>       .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>>>> +    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
>>>> +    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
>>>>   };
>>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>> index a6abdfd5cb74..3adcd31618f7 100644
>>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>> @@ -25,9 +25,19 @@
>>>>   #define AMDGPU_USERQUEUE_H_
>>>>     #define AMDGPU_MAX_USERQ 512
>>>> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>>     struct amdgpu_userq_mqd_funcs;
>>>>   +struct amdgpu_userq_ctx {
>>>> +    struct amdgpu_bo *obj;
>>>> +    uint64_t gpu_addr;
>>>> +    void    *cpu_ptr;
>>>> +};
>>>> +
>>>>   struct amdgpu_userq_mgr {
>>>>       struct idr userq_idr;
>>>>       struct mutex userq_mutex;
>>>> @@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
>>>>       uint64_t    mqd_gpu_addr;
>>>>       void         *mqd_cpu_ptr;
>>>>   +    struct amdgpu_userq_ctx    proc_ctx;
>>>> +    struct amdgpu_userq_ctx    gang_ctx;
>>>> +    struct amdgpu_userq_ctx    gds_ctx;
>>>> +    struct amdgpu_userq_ctx    fw_ctx;
>>>> +    struct amdgpu_userq_ctx    shadow_ctx;
>>>> +
>>>>       struct amdgpu_bo    *mqd_obj;
>>>>       struct amdgpu_vm        *vm;
>>>>       struct amdgpu_userq_mgr *userq_mgr;
>>>> @@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
>>>>       int (*mqd_size)(struct amdgpu_userq_mgr *);
>>>>       int (*mqd_create)(struct amdgpu_userq_mgr *, struct 
>>>> amdgpu_usermode_queue *);
>>>>       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct 
>>>> amdgpu_usermode_queue *);
>>>> +    int (*ctx_create)(struct amdgpu_userq_mgr *, struct 
>>>> amdgpu_usermode_queue *);
>>>> +    void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct 
>>>> amdgpu_usermode_queue *);
>>>>   };
>>>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, 
>>>> struct amdgpu_device *adev);
>>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07  7:38               ` Shashank Sharma
@ 2023-02-07 14:07                 ` Alex Deucher
  2023-02-07 14:11                   ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 14:07 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Christian König, Christian König, amd-gfx

On Tue, Feb 7, 2023 at 2:38 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 07/02/2023 08:03, Christian König wrote:
> > Am 06.02.23 um 22:03 schrieb Alex Deucher:
> >> On Mon, Feb 6, 2023 at 12:01 PM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
> >>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma
> >>>> <shashank.sharma@amd.com> wrote:
> >>>>> Hey Alex,
> >>>>>
> >>>>> On 03/02/2023 23:07, Alex Deucher wrote:
> >>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma
> >>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>
> >>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
> >>>>>>> queues. The userspace app will fill this structure and request
> >>>>>>> the graphics driver to add a graphics work queue for it. The
> >>>>>>> output of this UAPI is a queue id.
> >>>>>>>
> >>>>>>> This UAPI maps the queue into GPU, so the graphics app can start
> >>>>>>> submitting work to the queue as soon as the call returns.
> >>>>>>>
> >>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>>> ---
> >>>>>>>     include/uapi/drm/amdgpu_drm.h | 53
> >>>>>>> +++++++++++++++++++++++++++++++++++
> >>>>>>>     1 file changed, 53 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
> >>>>>>> b/include/uapi/drm/amdgpu_drm.h
> >>>>>>> index 4038abe8505a..6c5235d107b3 100644
> >>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
> >>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
> >>>>>>> @@ -54,6 +54,7 @@ extern "C" {
> >>>>>>>     #define DRM_AMDGPU_VM                  0x13
> >>>>>>>     #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>>>>>>     #define DRM_AMDGPU_SCHED               0x15
> >>>>>>> +#define DRM_AMDGPU_USERQ               0x16
> >>>>>>>
> >>>>>>>     #define DRM_IOCTL_AMDGPU_GEM_CREATE
> >>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union
> >>>>>>> drm_amdgpu_gem_create)
> >>>>>>>     #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE
> >>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >>>>>>> @@ -71,6 +72,7 @@ extern "C" {
> >>>>>>>     #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
> >>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>>>>>>     #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE
> >>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union
> >>>>>>> drm_amdgpu_fence_to_handle)
> >>>>>>>     #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
> >>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
> >>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>>>>>>
> >>>>>>>     /**
> >>>>>>>      * DOC: memory domains
> >>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
> >>>>>>>            union drm_amdgpu_ctx_out out;
> >>>>>>>     };
> >>>>>>>
> >>>>>>> +/* user queue IOCTL */
> >>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
> >>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
> >>>>>>> +
> >>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >>>>>>> +
> >>>>>>> +struct drm_amdgpu_userq_mqd {
> >>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> >>>>>>> +       __u32   flags;
> >>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
> >>>>>>> +       __u32   ip_type;
> >>>>>>> +       /** GEM object handle */
> >>>>>>> +       __u32   doorbell_handle;
> >>>>>>> +       /** Doorbell offset in dwords */
> >>>>>>> +       __u32   doorbell_offset;
> >>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
> >>>>> Can you please help to cross-check this information? All the
> >>>>> existing kernel doorbell calculations keep the doorbell size
> >>>>> as sizeof(u32).
> >>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
> >>>> from, but from vega onward most doorbells are 64 bit.  I think some
> >>>> versions of VCN may still use 32 bit doorbells.  Internally in the
> >>>> kernel driver we just use two slots for newer hardware, but for the
> >>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
> >>>> Even if an engine only uses a 32 bit one, I don't know that there is
> >>>> much value to trying to support variable doorbell sizes.
> >>> I think we can stick with using __u32 because this is *not* the size of
> >>> the doorbell entries.
> >>>
> >>> Instead this is the offset into the BO where to find the doorbell for
> >>> this queue (which then in turn is 64bits wide).
> >>>
> >>> Since we will probably never have more than 4GiB doorbells we should be
> >>> pretty safe to use 32 bits here.
> >> Yes, the offset would still be 32 bits, but the units would be
> >> qwords.  E.g.,
> >>
> >> +       /** Doorbell offset in qwords */
> >> +       __u32   doorbell_offset;
> >>
> >> That way you couldn't accidently specify an overlapping doorbell.
> >
> > Ah, so you only wanted to fix the comment. That was absolutely not
> > clear from the discussion.
>
> If I understand this correctly, the offset of the doorbell in the BO is
> still 32-bit, but its width (size in bytes) is 64 bits. Am I getting
> that right?

Right.  Each doorbell is 64 bits (8 bytes), so this value would
basically be an index into the doorbell BO.  Having it be a 64-bit
index rather than a 32-bit index would avoid the possibility of users
specifying overlapping doorbells.  E.g.,
offset in bytes
0 - doorbell
4 - doorbell
Would be incorrect, while
offset in bytes
0 - doorbell
8 - doorbell
Would be correct.

I.e., u64 doorbell_page[512] vs u32 doorbell_page[1024]

Alex

>
> - Shashank
>
> >
> > Christian.
> >
> >>
> >> Alex
> >>
> >>> Christian.
> >>>
> >>>> Alex
> >>>>
> >>>>>>> +       /** GPU virtual address of the queue */
> >>>>>>> +       __u64   queue_va;
> >>>>>>> +       /** Size of the queue in bytes */
> >>>>>>> +       __u64   queue_size;
> >>>>>>> +       /** GPU virtual address of the rptr */
> >>>>>>> +       __u64   rptr_va;
> >>>>>>> +       /** GPU virtual address of the wptr */
> >>>>>>> +       __u64   wptr_va;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +struct drm_amdgpu_userq_in {
> >>>>>>> +       /** AMDGPU_USERQ_OP_* */
> >>>>>>> +       __u32   op;
> >>>>>>> +       /** Flags */
> >>>>>>> +       __u32   flags;
> >>>>>>> +       /** Queue handle to associate the queue free call with,
> >>>>>>> +        * unused for queue create calls */
> >>>>>>> +       __u32   queue_id;
> >>>>>>> +       __u32   pad;
> >>>>>>> +       /** Queue descriptor */
> >>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +struct drm_amdgpu_userq_out {
> >>>>>>> +       /** Queue handle */
> >>>>>>> +       __u32   q_id;
> >>>>>> Maybe this should be queue_id to match the input.
> >>>>> Agree.
> >>>>>
> >>>>> - Shashank
> >>>>>
> >>>>>> Alex
> >>>>>>
> >>>>>>> +       /** Flags */
> >>>>>>> +       __u32   flags;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +union drm_amdgpu_userq {
> >>>>>>> +       struct drm_amdgpu_userq_in in;
> >>>>>>> +       struct drm_amdgpu_userq_out out;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>>     /* vm ioctl */
> >>>>>>>     #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>>>>>>     #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >>>>>>> --
> >>>>>>> 2.34.1
> >>>>>>>
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07 14:07                 ` Alex Deucher
@ 2023-02-07 14:11                   ` Christian König
  2023-02-07 14:17                     ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07 14:11 UTC (permalink / raw)
  To: Alex Deucher, Shashank Sharma
  Cc: alexander.deucher, Christian König, amd-gfx

Am 07.02.23 um 15:07 schrieb Alex Deucher:
> On Tue, Feb 7, 2023 at 2:38 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 07/02/2023 08:03, Christian König wrote:
>>> Am 06.02.23 um 22:03 schrieb Alex Deucher:
>>>> On Mon, Feb 6, 2023 at 12:01 PM Christian König
>>>> <christian.koenig@amd.com> wrote:
>>>>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
>>>>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma
>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>> Hey Alex,
>>>>>>>
>>>>>>> On 03/02/2023 23:07, Alex Deucher wrote:
>>>>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma
>>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>
>>>>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
>>>>>>>>> queue. The userspace app will fill this structure and request
>>>>>>>>> the graphics driver to add a graphics work queue for it. The
>>>>>>>>> output of this UAPI is a queue id.
>>>>>>>>>
>>>>>>>>> This UAPI maps the queue into the GPU, so the graphics app can start
>>>>>>>>> submitting work to the queue as soon as the call returns.
>>>>>>>>>
>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>>>> ---
>>>>>>>>>      include/uapi/drm/amdgpu_drm.h | 53
>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>      1 file changed, 53 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>> b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>> index 4038abe8505a..6c5235d107b3 100644
>>>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>>>>>>      #define DRM_AMDGPU_VM                  0x13
>>>>>>>>>      #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>>>>>>      #define DRM_AMDGPU_SCHED               0x15
>>>>>>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>>>>>>
>>>>>>>>>      #define DRM_IOCTL_AMDGPU_GEM_CREATE
>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union
>>>>>>>>> drm_amdgpu_gem_create)
>>>>>>>>>      #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE
>>>>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>>>>>>      #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
>>>>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>>>>>>      #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE
>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union
>>>>>>>>> drm_amdgpu_fence_to_handle)
>>>>>>>>>      #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
>>>>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
>>>>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>>>>>>
>>>>>>>>>      /**
>>>>>>>>>       * DOC: memory domains
>>>>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>>>>>>>>             union drm_amdgpu_ctx_out out;
>>>>>>>>>      };
>>>>>>>>>
>>>>>>>>> +/* user queue IOCTL */
>>>>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>>>>>>> +
>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>>>>>>> +
>>>>>>>>> +struct drm_amdgpu_userq_mqd {
>>>>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>>>>>>> +       __u32   flags;
>>>>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>>>>>>> +       __u32   ip_type;
>>>>>>>>> +       /** GEM object handle */
>>>>>>>>> +       __u32   doorbell_handle;
>>>>>>>>> +       /** Doorbell offset in dwords */
>>>>>>>>> +       __u32   doorbell_offset;
>>>>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
>>>>>>> Can you please help to cross check this information ? All the
>>>>>>> existing
>>>>>>> kernel doorbell calculations are keeping doorbells size as
>>>>>>> sizeof(u32)
>>>>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
>>>>>> from, but from vega onward most doorbells are 64 bit.  I think some
>>>>>> versions of VCN may still use 32 bit doorbells.  Internally in the
>>>>>> kernel driver we just use two slots for newer hardware, but for the
>>>>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
>>>>>> Even if an engine only uses a 32 bit one, I don't know that there is
>>>>>> much value to trying to support variable doorbell sizes.
>>>>> I think we can stick with using __u32 because this is *not* the size of
>>>>> the doorbell entries.
>>>>>
>>>>> Instead this is the offset into the BO at which to find the doorbell for
>>>>> this queue (which in turn is 64 bits wide).
>>>>>
>>>>> Since we will probably never have more than 4 GiB of doorbells we should be
>>>>> pretty safe to use 32 bits here.
>>>> Yes, the offset would still be 32 bits, but the units would be
>>>> qwords.  E.g.,
>>>>
>>>> +       /** Doorbell offset in qwords */
>>>> +       __u32   doorbell_offset;
>>>>
>>>> That way you couldn't accidentally specify an overlapping doorbell.
>>> Ah, so you only wanted to fix the comment. That was absolutely not
>>> clear from the discussion.
>> If I understand this correctly, the offset of the doorbell in the BO is
>> still 32-bit, but its width (size in bytes) is 64 bits. Am I getting
>> that right?
> Right.  Each doorbell is 64 bits (8 bytes) so this value would
> basically be an index into the doorbell bo.  Having it be a 64 bit
> index rather than a 32 bit index would avoid the possibility of users
> specifying overlapping doorbells.  E.g.,
> offset in bytes
> 0 - doorbell
> 4 - doorbell
> Would be incorrect, while
> offset in bytes
> 0 - doorbell
> 8 - doorbell
> Would be correct.
>
> I.e., u64 doorbell_page[512] vs u32 doorbell_page[1024]

Well I usually prefer just straight byte offsets, but I think the main 
question is what does the underlying hw/fw use?

If that's a dword index we should probably stick with that in the UAPI 
as well. If it's in qword then stick to that, if it's in bytes then use
that.

Otherwise we will just confuse people when we convert between the 
different API levels.

Christian.

>
> Alex
>
>> - Shashank
>>
>>> Christian.
>>>
>>>> Alex
>>>>
>>>>> Christian.
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>>>> +       /** GPU virtual address of the queue */
>>>>>>>>> +       __u64   queue_va;
>>>>>>>>> +       /** Size of the queue in bytes */
>>>>>>>>> +       __u64   queue_size;
>>>>>>>>> +       /** GPU virtual address of the rptr */
>>>>>>>>> +       __u64   rptr_va;
>>>>>>>>> +       /** GPU virtual address of the wptr */
>>>>>>>>> +       __u64   wptr_va;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct drm_amdgpu_userq_in {
>>>>>>>>> +       /** AMDGPU_USERQ_OP_* */
>>>>>>>>> +       __u32   op;
>>>>>>>>> +       /** Flags */
>>>>>>>>> +       __u32   flags;
>>>>>>>>> +       /** Queue handle to associate the queue free call with,
>>>>>>>>> +        * unused for queue create calls */
>>>>>>>>> +       __u32   queue_id;
>>>>>>>>> +       __u32   pad;
>>>>>>>>> +       /** Queue descriptor */
>>>>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct drm_amdgpu_userq_out {
>>>>>>>>> +       /** Queue handle */
>>>>>>>>> +       __u32   q_id;
>>>>>>>> Maybe this should be queue_id to match the input.
>>>>>>> Agree.
>>>>>>>
>>>>>>> - Shashank
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> +       /** Flags */
>>>>>>>>> +       __u32   flags;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +union drm_amdgpu_userq {
>>>>>>>>> +       struct drm_amdgpu_userq_in in;
>>>>>>>>> +       struct drm_amdgpu_userq_out out;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>      /* vm ioctl */
>>>>>>>>>      #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>>>>>>      #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>>>>>>> --
>>>>>>>>> 2.34.1
>>>>>>>>>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07 14:11                   ` Christian König
@ 2023-02-07 14:17                     ` Alex Deucher
  2023-02-07 14:19                       ` Christian König
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 14:17 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher, Christian König, amd-gfx, Shashank Sharma

On Tue, Feb 7, 2023 at 9:11 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 07.02.23 um 15:07 schrieb Alex Deucher:
> > On Tue, Feb 7, 2023 at 2:38 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>
> >> On 07/02/2023 08:03, Christian König wrote:
> >>> Am 06.02.23 um 22:03 schrieb Alex Deucher:
> >>>> On Mon, Feb 6, 2023 at 12:01 PM Christian König
> >>>> <christian.koenig@amd.com> wrote:
> >>>>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
> >>>>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma
> >>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>> Hey Alex,
> >>>>>>>
> >>>>>>> On 03/02/2023 23:07, Alex Deucher wrote:
> >>>>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma
> >>>>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>>
> >>>>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
> >>>>>>>>> queue. The userspace app will fill this structure and request
> >>>>>>>>> the graphics driver to add a graphics work queue for it. The
> >>>>>>>>> output of this UAPI is a queue id.
> >>>>>>>>>
> >>>>>>>>> This UAPI maps the queue into the GPU, so the graphics app can start
> >>>>>>>>> submitting work to the queue as soon as the call returns.
> >>>>>>>>>
> >>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>>>>> ---
> >>>>>>>>>      include/uapi/drm/amdgpu_drm.h | 53
> >>>>>>>>> +++++++++++++++++++++++++++++++++++
> >>>>>>>>>      1 file changed, 53 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>> b/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>> index 4038abe8505a..6c5235d107b3 100644
> >>>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
> >>>>>>>>>      #define DRM_AMDGPU_VM                  0x13
> >>>>>>>>>      #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>>>>>>>>      #define DRM_AMDGPU_SCHED               0x15
> >>>>>>>>> +#define DRM_AMDGPU_USERQ               0x16
> >>>>>>>>>
> >>>>>>>>>      #define DRM_IOCTL_AMDGPU_GEM_CREATE
> >>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union
> >>>>>>>>> drm_amdgpu_gem_create)
> >>>>>>>>>      #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE
> >>>>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >>>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
> >>>>>>>>>      #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
> >>>>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>>>>>>>>      #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE
> >>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union
> >>>>>>>>> drm_amdgpu_fence_to_handle)
> >>>>>>>>>      #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
> >>>>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >>>>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
> >>>>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>>>>>>>>
> >>>>>>>>>      /**
> >>>>>>>>>       * DOC: memory domains
> >>>>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
> >>>>>>>>>             union drm_amdgpu_ctx_out out;
> >>>>>>>>>      };
> >>>>>>>>>
> >>>>>>>>> +/* user queue IOCTL */
> >>>>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
> >>>>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
> >>>>>>>>> +
> >>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >>>>>>>>> +
> >>>>>>>>> +struct drm_amdgpu_userq_mqd {
> >>>>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> >>>>>>>>> +       __u32   flags;
> >>>>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
> >>>>>>>>> +       __u32   ip_type;
> >>>>>>>>> +       /** GEM object handle */
> >>>>>>>>> +       __u32   doorbell_handle;
> >>>>>>>>> +       /** Doorbell offset in dwords */
> >>>>>>>>> +       __u32   doorbell_offset;
> >>>>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
> >>>>>>> Can you please help to cross check this information ? All the
> >>>>>>> existing
> >>>>>>> kernel doorbell calculations are keeping doorbells size as
> >>>>>>> sizeof(u32)
> >>>>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
> >>>>>> from, but from vega onward most doorbells are 64 bit.  I think some
> >>>>>> versions of VCN may still use 32 bit doorbells.  Internally in the
> >>>>>> kernel driver we just use two slots for newer hardware, but for the
> >>>>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
> >>>>>> Even if an engine only uses a 32 bit one, I don't know that there is
> >>>>>> much value to trying to support variable doorbell sizes.
> >>>>> I think we can stick with using __u32 because this is *not* the size of
> >>>>> the doorbell entries.
> >>>>>
> >>>>> Instead this is the offset into the BO at which to find the doorbell for
> >>>>> this queue (which in turn is 64 bits wide).
> >>>>>
> >>>>> Since we will probably never have more than 4 GiB of doorbells we should be
> >>>>> pretty safe to use 32 bits here.
> >>>> Yes, the offset would still be 32 bits, but the units would be
> >>>> qwords.  E.g.,
> >>>>
> >>>> +       /** Doorbell offset in qwords */
> >>>> +       __u32   doorbell_offset;
> >>>>
> >>>> That way you couldn't accidentally specify an overlapping doorbell.
> >>> Ah, so you only wanted to fix the comment. That was absolutely not
> >>> clear from the discussion.
> >> If I understand this correctly, the offset of the doorbell in the BO is
> >> still 32-bit, but its width (size in bytes) is 64 bits. Am I getting
> >> that right?
> > Right.  Each doorbell is 64 bits (8 bytes) so this value would
> > basically be an index into the doorbell bo.  Having it be a 64 bit
> > index rather than a 32 bit index would avoid the possibility of users
> > specifying overlapping doorbells.  E.g.,
> > offset in bytes
> > 0 - doorbell
> > 4 - doorbell
> > Would be incorrect, while
> > offset in bytes
> > 0 - doorbell
> > 8 - doorbell
> > Would be correct.
> >
> > I.e., u64 doorbell_page[512] vs u32 doorbell_page[1024]
>
> Well I usually prefer just straight byte offsets, but I think the main
> question is what does the underlying hw/fw use?
>
> If that's a dword index we should probably stick with that in the UAPI
> as well. If it's in qword then stick to that, if it's in bytes then use
> that.

The MQD takes a dword offset from the start of the BAR, but the
doorbell is 64 bits wide so we have to be careful that we check for
overlapping doorbells.

Alex

>
> Otherwise we will just confuse people when we convert between the
> different API levels.
>
> Christian.
>
> >
> > Alex
> >
> >> - Shashank
> >>
> >>> Christian.
> >>>
> >>>> Alex
> >>>>
> >>>>> Christian.
> >>>>>
> >>>>>> Alex
> >>>>>>
> >>>>>>>>> +       /** GPU virtual address of the queue */
> >>>>>>>>> +       __u64   queue_va;
> >>>>>>>>> +       /** Size of the queue in bytes */
> >>>>>>>>> +       __u64   queue_size;
> >>>>>>>>> +       /** GPU virtual address of the rptr */
> >>>>>>>>> +       __u64   rptr_va;
> >>>>>>>>> +       /** GPU virtual address of the wptr */
> >>>>>>>>> +       __u64   wptr_va;
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> +struct drm_amdgpu_userq_in {
> >>>>>>>>> +       /** AMDGPU_USERQ_OP_* */
> >>>>>>>>> +       __u32   op;
> >>>>>>>>> +       /** Flags */
> >>>>>>>>> +       __u32   flags;
> >>>>>>>>> +       /** Queue handle to associate the queue free call with,
> >>>>>>>>> +        * unused for queue create calls */
> >>>>>>>>> +       __u32   queue_id;
> >>>>>>>>> +       __u32   pad;
> >>>>>>>>> +       /** Queue descriptor */
> >>>>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> +struct drm_amdgpu_userq_out {
> >>>>>>>>> +       /** Queue handle */
> >>>>>>>>> +       __u32   q_id;
> >>>>>>>> Maybe this should be queue_id to match the input.
> >>>>>>> Agree.
> >>>>>>>
> >>>>>>> - Shashank
> >>>>>>>
> >>>>>>>> Alex
> >>>>>>>>
> >>>>>>>>> +       /** Flags */
> >>>>>>>>> +       __u32   flags;
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> +union drm_amdgpu_userq {
> >>>>>>>>> +       struct drm_amdgpu_userq_in in;
> >>>>>>>>> +       struct drm_amdgpu_userq_out out;
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>>      /* vm ioctl */
> >>>>>>>>>      #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>>>>>>>>      #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >>>>>>>>> --
> >>>>>>>>> 2.34.1
> >>>>>>>>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07 14:17                     ` Alex Deucher
@ 2023-02-07 14:19                       ` Christian König
  2023-02-07 14:20                         ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2023-02-07 14:19 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: alexander.deucher, amd-gfx, Shashank Sharma

Am 07.02.23 um 15:17 schrieb Alex Deucher:
> On Tue, Feb 7, 2023 at 9:11 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 07.02.23 um 15:07 schrieb Alex Deucher:
>>> On Tue, Feb 7, 2023 at 2:38 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> On 07/02/2023 08:03, Christian König wrote:
>>>>> Am 06.02.23 um 22:03 schrieb Alex Deucher:
>>>>>> On Mon, Feb 6, 2023 at 12:01 PM Christian König
>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
>>>>>>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma
>>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>>> Hey Alex,
>>>>>>>>>
>>>>>>>>> On 03/02/2023 23:07, Alex Deucher wrote:
>>>>>>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma
>>>>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>>
>>>>>>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
>>>>>>>>>>> queue. The userspace app will fill this structure and request
>>>>>>>>>>> the graphics driver to add a graphics work queue for it. The
>>>>>>>>>>> output of this UAPI is a queue id.
>>>>>>>>>>>
>>>>>>>>>>> This UAPI maps the queue into the GPU, so the graphics app can start
>>>>>>>>>>> submitting work to the queue as soon as the call returns.
>>>>>>>>>>>
>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>>>>>> ---
>>>>>>>>>>>       include/uapi/drm/amdgpu_drm.h | 53
>>>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>>       1 file changed, 53 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>> b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>> index 4038abe8505a..6c5235d107b3 100644
>>>>>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>>>>>>>>       #define DRM_AMDGPU_VM                  0x13
>>>>>>>>>>>       #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>>>>>>>>       #define DRM_AMDGPU_SCHED               0x15
>>>>>>>>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>>>>>>>>
>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_GEM_CREATE
>>>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union
>>>>>>>>>>> drm_amdgpu_gem_create)
>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE
>>>>>>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>>>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
>>>>>>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE
>>>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union
>>>>>>>>>>> drm_amdgpu_fence_to_handle)
>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
>>>>>>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>>>>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
>>>>>>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>>>>>>>>
>>>>>>>>>>>       /**
>>>>>>>>>>>        * DOC: memory domains
>>>>>>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>>>>>>>>>>              union drm_amdgpu_ctx_out out;
>>>>>>>>>>>       };
>>>>>>>>>>>
>>>>>>>>>>> +/* user queue IOCTL */
>>>>>>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>>>>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>>>>>>>>> +
>>>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>>>>>>>>> +
>>>>>>>>>>> +struct drm_amdgpu_userq_mqd {
>>>>>>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>>>>>>>>> +       __u32   flags;
>>>>>>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>>>>>>>>> +       __u32   ip_type;
>>>>>>>>>>> +       /** GEM object handle */
>>>>>>>>>>> +       __u32   doorbell_handle;
>>>>>>>>>>> +       /** Doorbell offset in dwords */
>>>>>>>>>>> +       __u32   doorbell_offset;
>>>>>>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
>>>>>>>>> Can you please help to cross check this information ? All the
>>>>>>>>> existing
>>>>>>>>> kernel doorbell calculations are keeping doorbells size as
>>>>>>>>> sizeof(u32)
>>>>>>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
>>>>>>>> from, but from vega onward most doorbells are 64 bit.  I think some
>>>>>>>> versions of VCN may still use 32 bit doorbells.  Internally in the
>>>>>>>> kernel driver we just use two slots for newer hardware, but for the
>>>>>>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
>>>>>>>> Even if an engine only uses a 32 bit one, I don't know that there is
>>>>>>>> much value to trying to support variable doorbell sizes.
>>>>>>> I think we can stick with using __u32 because this is *not* the size of
>>>>>>> the doorbell entries.
>>>>>>>
>>>>>>> Instead this is the offset into the BO at which to find the doorbell for
>>>>>>> this queue (which in turn is 64 bits wide).
>>>>>>>
>>>>>>> Since we will probably never have more than 4 GiB of doorbells we should be
>>>>>>> pretty safe to use 32 bits here.
>>>>>> Yes, the offset would still be 32 bits, but the units would be
>>>>>> qwords.  E.g.,
>>>>>>
>>>>>> +       /** Doorbell offset in qwords */
>>>>>> +       __u32   doorbell_offset;
>>>>>>
>>>>>> That way you couldn't accidentally specify an overlapping doorbell.
>>>>> Ah, so you only wanted to fix the comment. That was absolutely not
>>>>> clear from the discussion.
>>>> If I understand this correctly, the offset of the doorbell in the BO is
>>>> still 32-bit, but its width (size in bytes) is 64 bits. Am I getting
>>>> that right?
>>> Right.  Each doorbell is 64 bits (8 bytes) so this value would
>>> basically be an index into the doorbell bo.  Having it be a 64 bit
>>> index rather than a 32 bit index would avoid the possibility of users
>>> specifying overlapping doorbells.  E.g.,
>>> offset in bytes
>>> 0 - doorbell
>>> 4 - doorbell
>>> Would be incorrect, while
>>> offset in bytes
>>> 0 - doorbell
>>> 8 - doorbell
>>> Would be correct.
>>>
>>> I.e., u64 doorbell_page[512] vs u32 doorbell_page[1024]
>> Well I usually prefer just straight byte offsets, but I think the main
>> question is what does the underlying hw/fw use?
>>
>> If that's a dword index we should probably stick with that in the UAPI
>> as well. If it's in qword then stick to that, if it's in bytes then use
>> that.
> The MQD takes a dword offset from the start of the BAR, but the
> doorbell is 64 bits wide so we have to be careful that we check for
> overlapping doorbells.

Well then let's just add an "if (doorbell_idx & 0x1) return -EINVAL;" to 
the kernel instead.

That's far less confusing than having dword in the MQD and qword in the
UAPI.

Christian.

>
> Alex
>
>> Otherwise we will just confuse people when we convert between the
>> different API levels.
>>
>> Christian.
>>
>>> Alex
>>>
>>>> - Shashank
>>>>
>>>>> Christian.
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>>>> +       /** GPU virtual address of the queue */
>>>>>>>>>>> +       __u64   queue_va;
>>>>>>>>>>> +       /** Size of the queue in bytes */
>>>>>>>>>>> +       __u64   queue_size;
>>>>>>>>>>> +       /** GPU virtual address of the rptr */
>>>>>>>>>>> +       __u64   rptr_va;
>>>>>>>>>>> +       /** GPU virtual address of the wptr */
>>>>>>>>>>> +       __u64   wptr_va;
>>>>>>>>>>> +};
>>>>>>>>>>> +
>>>>>>>>>>> +struct drm_amdgpu_userq_in {
>>>>>>>>>>> +       /** AMDGPU_USERQ_OP_* */
>>>>>>>>>>> +       __u32   op;
>>>>>>>>>>> +       /** Flags */
>>>>>>>>>>> +       __u32   flags;
>>>>>>>>>>> +       /** Queue handle to associate the queue free call with,
>>>>>>>>>>> +        * unused for queue create calls */
>>>>>>>>>>> +       __u32   queue_id;
>>>>>>>>>>> +       __u32   pad;
>>>>>>>>>>> +       /** Queue descriptor */
>>>>>>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>>>>>>>>> +};
>>>>>>>>>>> +
>>>>>>>>>>> +struct drm_amdgpu_userq_out {
>>>>>>>>>>> +       /** Queue handle */
>>>>>>>>>>> +       __u32   q_id;
>>>>>>>>>> Maybe this should be queue_id to match the input.
>>>>>>>>> Agree.
>>>>>>>>>
>>>>>>>>> - Shashank
>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>>> +       /** Flags */
>>>>>>>>>>> +       __u32   flags;
>>>>>>>>>>> +};
>>>>>>>>>>> +
>>>>>>>>>>> +union drm_amdgpu_userq {
>>>>>>>>>>> +       struct drm_amdgpu_userq_in in;
>>>>>>>>>>> +       struct drm_amdgpu_userq_out out;
>>>>>>>>>>> +};
>>>>>>>>>>> +
>>>>>>>>>>>       /* vm ioctl */
>>>>>>>>>>>       #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>>>>>>>>       #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>>>>>>>>> --
>>>>>>>>>>> 2.34.1
>>>>>>>>>>>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07 14:19                       ` Christian König
@ 2023-02-07 14:20                         ` Alex Deucher
  2023-02-07 14:36                           ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 14:20 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher, Christian König, amd-gfx, Shashank Sharma

On Tue, Feb 7, 2023 at 9:19 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 07.02.23 um 15:17 schrieb Alex Deucher:
> > On Tue, Feb 7, 2023 at 9:11 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> Am 07.02.23 um 15:07 schrieb Alex Deucher:
> >>> On Tue, Feb 7, 2023 at 2:38 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>> On 07/02/2023 08:03, Christian König wrote:
> >>>>> Am 06.02.23 um 22:03 schrieb Alex Deucher:
> >>>>>> On Mon, Feb 6, 2023 at 12:01 PM Christian König
> >>>>>> <christian.koenig@amd.com> wrote:
> >>>>>>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
> >>>>>>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma
> >>>>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>>>> Hey Alex,
> >>>>>>>>>
> >>>>>>>>> On 03/02/2023 23:07, Alex Deucher wrote:
> >>>>>>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma
> >>>>>>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>>>>
> >>>>>>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
> >>>>>>>>>>> queue. The userspace app will fill this structure and request
> >>>>>>>>>>> the graphics driver to add a graphics work queue for it. The
> >>>>>>>>>>> output of this UAPI is a queue id.
> >>>>>>>>>>>
> >>>>>>>>>>> This UAPI maps the queue into the GPU, so the graphics app can start
> >>>>>>>>>>> submitting work to the queue as soon as the call returns.
> >>>>>>>>>>>
> >>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>       include/uapi/drm/amdgpu_drm.h | 53
> >>>>>>>>>>> +++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>       1 file changed, 53 insertions(+)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>>>> b/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>>>> index 4038abe8505a..6c5235d107b3 100644
> >>>>>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
> >>>>>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
> >>>>>>>>>>>       #define DRM_AMDGPU_VM                  0x13
> >>>>>>>>>>>       #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>>>>>>>>>>       #define DRM_AMDGPU_SCHED               0x15
> >>>>>>>>>>> +#define DRM_AMDGPU_USERQ               0x16
> >>>>>>>>>>>
> >>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_GEM_CREATE
> >>>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union
> >>>>>>>>>>> drm_amdgpu_gem_create)
> >>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE
> >>>>>>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >>>>>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
> >>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
> >>>>>>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE
> >>>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union
> >>>>>>>>>>> drm_amdgpu_fence_to_handle)
> >>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
> >>>>>>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >>>>>>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
> >>>>>>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>>>>>>>>>>
> >>>>>>>>>>>       /**
> >>>>>>>>>>>        * DOC: memory domains
> >>>>>>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
> >>>>>>>>>>>              union drm_amdgpu_ctx_out out;
> >>>>>>>>>>>       };
> >>>>>>>>>>>
> >>>>>>>>>>> +/* user queue IOCTL */
> >>>>>>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
> >>>>>>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
> >>>>>>>>>>> +
> >>>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >>>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >>>>>>>>>>> +
> >>>>>>>>>>> +struct drm_amdgpu_userq_mqd {
> >>>>>>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> >>>>>>>>>>> +       __u32   flags;
> >>>>>>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
> >>>>>>>>>>> +       __u32   ip_type;
> >>>>>>>>>>> +       /** GEM object handle */
> >>>>>>>>>>> +       __u32   doorbell_handle;
> >>>>>>>>>>> +       /** Doorbell offset in dwords */
> >>>>>>>>>>> +       __u32   doorbell_offset;
> >>>>>>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
> >>>>>>>>> Can you please help to cross check this information ? All the
> >>>>>>>>> existing
> >>>>>>>>> kernel doorbell calculations are keeping doorbells size as
> >>>>>>>>> sizeof(u32)
> >>>>>>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
> >>>>>>>> from, but from vega onward most doorbells are 64 bit.  I think some
> >>>>>>>> versions of VCN may still use 32 bit doorbells.  Internally in the
> >>>>>>>> kernel driver we just use two slots for newer hardware, but for the
> >>>>>>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
> >>>>>>>> Even if an engine only uses a 32 bit one, I don't know that there is
> >>>>>>>> much value to trying to support variable doorbell sizes.
> >>>>>>> I think we can stick with using __u32 because this is *not* the size of
> >>>>>>> the doorbell entries.
> >>>>>>>
> >>>>>>> Instead this is the offset into the BO where to find the doorbell for
> >>>>>>> this queue (which then in turn is 64bits wide).
> >>>>>>>
> >>>>>>> Since we will probably never have more than 4GiB doorbells we should be
> >>>>>>> pretty safe to use 32 bits here.
> >>>>>> Yes, the offset would still be 32 bits, but the units would be
> >>>>>> qwords.  E.g.,
> >>>>>>
> >>>>>> +       /** Doorbell offset in qwords */
> >>>>>> +       __u32   doorbell_offset;
> >>>>>>
> >>>>>> That way you couldn't accidentally specify an overlapping doorbell.
> >>>>> Ah, so you only wanted to fix the comment. That was absolutely not
> >>>>> clear from the discussion.
> >>>> If I understand this correctly, the offset of the doorbell in the BO is
> >>>> still 32-bit, but its width (size in bytes) is 64 bits. Am I getting
> >>>> that right ?
> >>> Right.  Each doorbell is 64 bits (8 bytes) so this value would
> >>> basically be an index into the doorbell bo.  Having it be a 64 bit
> >>> index rather than a 32 bit index would avoid the possibility of users
> >>> specifying overlapping doorbells.  E.g.,
> >>> offset in bytes
> >>> 0 - doorbell
> >>> 4 - doorbell
> >>> Would be incorrect, while
> >>> offset in bytes
> >>> 0 - doorbell
> >>> 8 - doorbell
> >>> Would be correct.
> >>>
> >>> I.e., u64 doorbell_page[512] vs u32 doorbell_page[1024]
> >> Well I usually prefer just straight byte offsets, but I think the main
> >> question is what does the underlying hw/fw use?
> >>
> >> If that's a dword index we should probably stick with that in the UAPI
> >> as well. If it's in qword then stick to that, if it's in bytes then use
> >> that.
> > The MQD takes a dword offset from the start of the BAR, but the
> > doorbell is 64 bits wide so we have to be careful that we check for
> > overlapping doorbells.
>
> Well then let's just add an "if (doorbell_idx & 0x1) return -EINVAL;" to
> the kernel instead.
>
> That's far less confusing than having dword in the MQD and qword in the
> UAPI.

Yes, agreed.

Alex

>
> Christian.
>
> >
> > Alex
> >
> >> Otherwise we will just confuse people when we convert between the
> >> different API levels.
> >>
> >> Christian.
> >>
> >>> Alex
> >>>
> >>>> - Shashank
> >>>>
> >>>>> Christian.
> >>>>>
> >>>>>> Alex
> >>>>>>
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>>> Alex
> >>>>>>>>
> >>>>>>>>>>> +       /** GPU virtual address of the queue */
> >>>>>>>>>>> +       __u64   queue_va;
> >>>>>>>>>>> +       /** Size of the queue in bytes */
> >>>>>>>>>>> +       __u64   queue_size;
> >>>>>>>>>>> +       /** GPU virtual address of the rptr */
> >>>>>>>>>>> +       __u64   rptr_va;
> >>>>>>>>>>> +       /** GPU virtual address of the wptr */
> >>>>>>>>>>> +       __u64   wptr_va;
> >>>>>>>>>>> +};
> >>>>>>>>>>> +
> >>>>>>>>>>> +struct drm_amdgpu_userq_in {
> >>>>>>>>>>> +       /** AMDGPU_USERQ_OP_* */
> >>>>>>>>>>> +       __u32   op;
> >>>>>>>>>>> +       /** Flags */
> >>>>>>>>>>> +       __u32   flags;
> >>>>>>>>>>> +       /** Queue handle to associate the queue free call with,
> >>>>>>>>>>> +        * unused for queue create calls */
> >>>>>>>>>>> +       __u32   queue_id;
> >>>>>>>>>>> +       __u32   pad;
> >>>>>>>>>>> +       /** Queue descriptor */
> >>>>>>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
> >>>>>>>>>>> +};
> >>>>>>>>>>> +
> >>>>>>>>>>> +struct drm_amdgpu_userq_out {
> >>>>>>>>>>> +       /** Queue handle */
> >>>>>>>>>>> +       __u32   q_id;
> >>>>>>>>>> Maybe this should be queue_id to match the input.
> >>>>>>>>> Agree.
> >>>>>>>>>
> >>>>>>>>> - Shashank
> >>>>>>>>>
> >>>>>>>>>> Alex
> >>>>>>>>>>
> >>>>>>>>>>> +       /** Flags */
> >>>>>>>>>>> +       __u32   flags;
> >>>>>>>>>>> +};
> >>>>>>>>>>> +
> >>>>>>>>>>> +union drm_amdgpu_userq {
> >>>>>>>>>>> +       struct drm_amdgpu_userq_in in;
> >>>>>>>>>>> +       struct drm_amdgpu_userq_out out;
> >>>>>>>>>>> +};
> >>>>>>>>>>> +
> >>>>>>>>>>>       /* vm ioctl */
> >>>>>>>>>>>       #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>>>>>>>>>>       #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >>>>>>>>>>> --
> >>>>>>>>>>> 2.34.1
> >>>>>>>>>>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/amdgpu: UAPI for user queue management
  2023-02-07 14:20                         ` Alex Deucher
@ 2023-02-07 14:36                           ` Shashank Sharma
  0 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07 14:36 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: alexander.deucher, Christian König, amd-gfx


On 07/02/2023 15:20, Alex Deucher wrote:
> On Tue, Feb 7, 2023 at 9:19 AM Christian König <christian.koenig@amd.com> wrote:
>> Am 07.02.23 um 15:17 schrieb Alex Deucher:
>>> On Tue, Feb 7, 2023 at 9:11 AM Christian König
>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>> Am 07.02.23 um 15:07 schrieb Alex Deucher:
>>>>> On Tue, Feb 7, 2023 at 2:38 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>>>> On 07/02/2023 08:03, Christian König wrote:
>>>>>>> Am 06.02.23 um 22:03 schrieb Alex Deucher:
>>>>>>>> On Mon, Feb 6, 2023 at 12:01 PM Christian König
>>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>>> Am 06.02.23 um 17:56 schrieb Alex Deucher:
>>>>>>>>>> On Fri, Feb 3, 2023 at 5:26 PM Shashank Sharma
>>>>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>>>>> Hey Alex,
>>>>>>>>>>>
>>>>>>>>>>> On 03/02/2023 23:07, Alex Deucher wrote:
>>>>>>>>>>>> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma
>>>>>>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>>>>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
>>>>>>>>>>>>> queue. The userspace app will fill this structure and request
>>>>>>>>>>>>> the graphics driver to add a graphics work queue for it. The
>>>>>>>>>>>>> output of this UAPI is a queue id.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This UAPI maps the queue into GPU, so the graphics app can start
>>>>>>>>>>>>> submitting work to the queue as soon as the call returns.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>        include/uapi/drm/amdgpu_drm.h | 53
>>>>>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>        1 file changed, 53 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> index 4038abe8505a..6c5235d107b3 100644
>>>>>>>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>>>>>>>>>>        #define DRM_AMDGPU_VM                  0x13
>>>>>>>>>>>>>        #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>>>>>>>>>>        #define DRM_AMDGPU_SCHED               0x15
>>>>>>>>>>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>>>>>>>>>>
>>>>>>>>>>>>>        #define DRM_IOCTL_AMDGPU_GEM_CREATE
>>>>>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union
>>>>>>>>>>>>> drm_amdgpu_gem_create)
>>>>>>>>>>>>>        #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE
>>>>>>>>>>>>> + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>>>>>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>>>>>>>>>>        #define DRM_IOCTL_AMDGPU_VM DRM_IOWR(DRM_COMMAND_BASE +
>>>>>>>>>>>>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>>>>>>>>>>        #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE
>>>>>>>>>>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union
>>>>>>>>>>>>> drm_amdgpu_fence_to_handle)
>>>>>>>>>>>>>        #define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE +
>>>>>>>>>>>>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>>>>>>>>>>> +#define DRM_IOCTL_AMDGPU_USERQ DRM_IOW(DRM_COMMAND_BASE +
>>>>>>>>>>>>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>>>>>>>>>>
>>>>>>>>>>>>>        /**
>>>>>>>>>>>>>         * DOC: memory domains
>>>>>>>>>>>>> @@ -302,6 +304,57 @@ union drm_amdgpu_ctx {
>>>>>>>>>>>>>               union drm_amdgpu_ctx_out out;
>>>>>>>>>>>>>        };
>>>>>>>>>>>>>
>>>>>>>>>>>>> +/* user queue IOCTL */
>>>>>>>>>>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>>>>>>>>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>>>>>>>>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +struct drm_amdgpu_userq_mqd {
>>>>>>>>>>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>>>>>>>>>>> +       __u32   flags;
>>>>>>>>>>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>>>>>>>>>>> +       __u32   ip_type;
>>>>>>>>>>>>> +       /** GEM object handle */
>>>>>>>>>>>>> +       __u32   doorbell_handle;
>>>>>>>>>>>>> +       /** Doorbell offset in dwords */
>>>>>>>>>>>>> +       __u32   doorbell_offset;
>>>>>>>>>>>> Since doorbells are 64 bit, maybe this offset should be in qwords.
>>>>>>>>>>> Can you please help to cross check this information ? All the
>>>>>>>>>>> existing
>>>>>>>>>>> kernel doorbell calculations are keeping doorbells size as
>>>>>>>>>>> sizeof(u32)
>>>>>>>>>> Doorbells on pre-vega hardware are 32 bits so that is where that comes
>>>>>>>>>> from, but from vega onward most doorbells are 64 bit.  I think some
>>>>>>>>>> versions of VCN may still use 32 bit doorbells.  Internally in the
>>>>>>>>>> kernel driver we just use two slots for newer hardware, but for the
>>>>>>>>>> UAPI, I think we can just stick with 64 bit slots to avoid confusion.
>>>>>>>>>> Even if an engine only uses a 32 bit one, I don't know that there is
>>>>>>>>>> much value to trying to support variable doorbell sizes.
>>>>>>>>> I think we can stick with using __u32 because this is *not* the size of
>>>>>>>>> the doorbell entries.
>>>>>>>>>
>>>>>>>>> Instead this is the offset into the BO where to find the doorbell for
>>>>>>>>> this queue (which then in turn is 64bits wide).
>>>>>>>>>
>>>>>>>>> Since we will probably never have more than 4GiB doorbells we should be
>>>>>>>>> pretty safe to use 32 bits here.
>>>>>>>> Yes, the offset would still be 32 bits, but the units would be
>>>>>>>> qwords.  E.g.,
>>>>>>>>
>>>>>>>> +       /** Doorbell offset in qwords */
>>>>>>>> +       __u32   doorbell_offset;
>>>>>>>>
>>>>>>>> That way you couldn't accidentally specify an overlapping doorbell.
>>>>>>> Ah, so you only wanted to fix the comment. That was absolutely not
>>>>>>> clear from the discussion.
>>>>>> If I understand this correctly, the offset of the doorbell in the BO is
>>>>>> still 32-bit, but its width (size in bytes) is 64 bits. Am I getting
>>>>>> that right ?
>>>>> Right.  Each doorbell is 64 bits (8 bytes) so this value would
>>>>> basically be an index into the doorbell bo.  Having it be a 64 bit
>>>>> index rather than a 32 bit index would avoid the possibility of users
>>>>> specifying overlapping doorbells.  E.g.,
>>>>> offset in bytes
>>>>> 0 - doorbell
>>>>> 4 - doorbell
>>>>> Would be incorrect, while
>>>>> offset in bytes
>>>>> 0 - doorbell
>>>>> 8 - doorbell
>>>>> Would be correct.
>>>>>
>>>>> I.e., u64 doorbell_page[512] vs u32 doorbell_page[1024]
>>>> Well I usually prefer just straight byte offsets, but I think the main
>>>> question is what does the underlying hw/fw use?
>>>>
>>>> If that's a dword index we should probably stick with that in the UAPI
>>>> as well. If it's in qword then stick to that, if it's in bytes then use
>>>> that.
>>> The MQD takes a dword offset from the start of the BAR, but the
>>> doorbell is 64 bits wide so we have to be careful that we check for
>>> overlapping doorbells.
>> Well then let's just add an "if (doorbell_idx & 0x1) return -EINVAL;" to
>> the kernel instead.
>>
>> That's far less confusing than having dword in the MQD and qword in the
>> UAPI.
> Yes, agreed.

Got it, Thanks.

- Shashank

>
> Alex
>
>> Christian.
>>
>>> Alex
>>>
>>>> Otherwise we will just confuse people when we convert between the
>>>> different API levels.
>>>>
>>>> Christian.
>>>>
>>>>> Alex
>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>>>>> +       /** GPU virtual address of the queue */
>>>>>>>>>>>>> +       __u64   queue_va;
>>>>>>>>>>>>> +       /** Size of the queue in bytes */
>>>>>>>>>>>>> +       __u64   queue_size;
>>>>>>>>>>>>> +       /** GPU virtual address of the rptr */
>>>>>>>>>>>>> +       __u64   rptr_va;
>>>>>>>>>>>>> +       /** GPU virtual address of the wptr */
>>>>>>>>>>>>> +       __u64   wptr_va;
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +struct drm_amdgpu_userq_in {
>>>>>>>>>>>>> +       /** AMDGPU_USERQ_OP_* */
>>>>>>>>>>>>> +       __u32   op;
>>>>>>>>>>>>> +       /** Flags */
>>>>>>>>>>>>> +       __u32   flags;
>>>>>>>>>>>>> +       /** Queue handle to associate the queue free call with,
>>>>>>>>>>>>> +        * unused for queue create calls */
>>>>>>>>>>>>> +       __u32   queue_id;
>>>>>>>>>>>>> +       __u32   pad;
>>>>>>>>>>>>> +       /** Queue descriptor */
>>>>>>>>>>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +struct drm_amdgpu_userq_out {
>>>>>>>>>>>>> +       /** Queue handle */
>>>>>>>>>>>>> +       __u32   q_id;
>>>>>>>>>>>> Maybe this should be queue_id to match the input.
>>>>>>>>>>> Agree.
>>>>>>>>>>>
>>>>>>>>>>> - Shashank
>>>>>>>>>>>
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>>> +       /** Flags */
>>>>>>>>>>>>> +       __u32   flags;
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +union drm_amdgpu_userq {
>>>>>>>>>>>>> +       struct drm_amdgpu_userq_in in;
>>>>>>>>>>>>> +       struct drm_amdgpu_userq_out out;
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>>        /* vm ioctl */
>>>>>>>>>>>>>        #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>>>>>>>>>>        #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.34.1
>>>>>>>>>>>>>


* Re: [PATCH 2/8] drm/amdgpu: add usermode queues
  2023-02-03 21:54 ` [PATCH 2/8] drm/amdgpu: add usermode queues Shashank Sharma
  2023-02-07  7:08   ` Christian König
@ 2023-02-07 14:54   ` Alex Deucher
  2023-02-07 15:02     ` Shashank Sharma
  1 sibling, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 14:54 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, amd-gfx

On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This patch adds skeleton code for usermode queue creation. It
> contains:
> - A new structure to keep all the user queue data in one place.
> - An IOCTL function to create/free a usermode queue.
> - A function to generate unique index for the queue.
> - A queue context manager in driver private data.
>
> V1: Worked on design review comments from RFC patch series:
> (https://patchwork.freedesktop.org/series/112214/)
>
> - Alex: Keep a list of queues, instead of single queue per process.
> - Christian: Use the queue manager instead of global ptrs,
>            Don't keep the queue structure in amdgpu_ctx
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   5 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 155 ++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  64 ++++++++
>  6 files changed, 230 insertions(+)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 798d0e9a60b7..764801cc8203 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -210,6 +210,8 @@ amdgpu-y += \
>  # add amdkfd interfaces
>  amdgpu-y += amdgpu_amdkfd.o
>
> +# add usermode queue
> +amdgpu-y += amdgpu_userqueue.o
>
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 6b74df446694..0625d6bdadf4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -109,6 +109,7 @@
>  #include "amdgpu_fdinfo.h"
>  #include "amdgpu_mca.h"
>  #include "amdgpu_ras.h"
> +#include "amdgpu_userqueue.h"
>
>  #define MAX_GPU_INSTANCE               16
>
> @@ -482,6 +483,7 @@ struct amdgpu_fpriv {
>         struct mutex            bo_list_lock;
>         struct idr              bo_list_handles;
>         struct amdgpu_ctx_mgr   ctx_mgr;
> +       struct amdgpu_userq_mgr userq_mgr;
>  };
>
>  int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b4f2d61ea0d5..229976a2d0e7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -52,6 +52,7 @@
>  #include "amdgpu_ras.h"
>  #include "amdgpu_xgmi.h"
>  #include "amdgpu_reset.h"
> +#include "amdgpu_userqueue.h"
>
>  /*
>   * KMS wrapper.
> @@ -2748,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>  };
>
>  static const struct drm_driver amdgpu_kms_driver = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 7aa7e52ca784..52e61e339a88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1187,6 +1187,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>
>         amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>
> +       r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
> +       if (r)
> +               DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
> +
>         file_priv->driver_priv = fpriv;
>         goto out_suspend;
>
> @@ -1254,6 +1258,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>
>         amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>         amdgpu_vm_fini(adev, &fpriv->vm);
> +       amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>
>         if (pasid)
>                 amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> new file mode 100644
> index 000000000000..d5bc7fe81750
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -0,0 +1,155 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu.h"
> +#include "amdgpu_vm.h"
> +
> +static inline int
> +amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
> +}
> +
> +static inline void
> +amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
> +{
> +    idr_remove(&uq_mgr->userq_idr, queue_id);
> +}
> +
> +static struct amdgpu_usermode_queue
> +*amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
> +{
> +    return idr_find(&uq_mgr->userq_idr, qid);
> +}
> +
> +static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> +{
> +    int r, pasid;
> +    struct amdgpu_usermode_queue *queue;
> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +    struct amdgpu_vm *vm = &fpriv->vm;
> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
> +
> +    pasid = vm->pasid;
> +    if (vm->pasid < 0) {
> +        DRM_WARN("No PASID info found\n");
> +        pasid = 0;
> +    }
> +
> +    mutex_lock(&uq_mgr->userq_mutex);
> +
> +    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
> +    if (!queue) {
> +        DRM_ERROR("Failed to allocate memory for queue\n");
> +        mutex_unlock(&uq_mgr->userq_mutex);
> +        return -ENOMEM;
> +    }
> +
> +    queue->vm = vm;
> +    queue->pasid = pasid;
> +    queue->wptr_gpu_addr = mqd_in->wptr_va;
> +    queue->rptr_gpu_addr = mqd_in->rptr_va;
> +    queue->queue_size = mqd_in->queue_size;
> +    queue->queue_type = mqd_in->ip_type;
> +    queue->queue_gpu_addr = mqd_in->queue_va;
> +    queue->flags = mqd_in->flags;
> +    queue->use_doorbell = true;

I think we can drop use_doorbell.  All user queues require a doorbell.

Alex

> +    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
> +    if (queue->queue_id < 0) {
> +        DRM_ERROR("Failed to allocate a queue id\n");
> +        r = queue->queue_id;
> +        goto free_queue;
> +    }
> +
> +    list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
> +    args->out.q_id = queue->queue_id;
> +    args->out.flags = 0;
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    return 0;
> +
> +free_queue:
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    kfree(queue);
> +    return r;
> +}
> +
> +static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
> +{
> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +    struct amdgpu_usermode_queue *queue;
> +
> +    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
> +    if (!queue) {
> +        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
> +        return;
> +    }
> +
> +    mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> +    list_del(&queue->userq_node);
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    kfree(queue);
> +}
> +
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> +                      struct drm_file *filp)
> +{
> +    union drm_amdgpu_userq *args = data;
> +    int r = 0;
> +
> +    switch (args->in.op) {
> +    case AMDGPU_USERQ_OP_CREATE:
> +        r = amdgpu_userqueue_create(filp, args);
> +        if (r)
> +            DRM_ERROR("Failed to create usermode queue\n");
> +        break;
> +
> +    case AMDGPU_USERQ_OP_FREE:
> +        amdgpu_userqueue_destroy(filp, args->in.queue_id);
> +        break;
> +
> +    default:
> +        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
> +        return -EINVAL;
> +    }
> +
> +    return r;
> +}
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> +{
> +    mutex_init(&userq_mgr->userq_mutex);
> +    idr_init_base(&userq_mgr->userq_idr, 1);
> +    INIT_LIST_HEAD(&userq_mgr->userq_list);
> +    userq_mgr->adev = adev;
> +
> +    return 0;
> +}
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
> +{
> +    idr_destroy(&userq_mgr->userq_idr);
> +    mutex_destroy(&userq_mgr->userq_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> new file mode 100644
> index 000000000000..9557588fe34f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -0,0 +1,64 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef AMDGPU_USERQUEUE_H_
> +#define AMDGPU_USERQUEUE_H_
> +
> +#define AMDGPU_MAX_USERQ 512
> +
> +struct amdgpu_userq_mgr {
> +       struct idr userq_idr;
> +       struct mutex userq_mutex;
> +       struct list_head userq_list;
> +       struct amdgpu_device *adev;
> +};
> +
> +struct amdgpu_usermode_queue {
> +       int             queue_id;
> +       int             queue_type;
> +       int             queue_size;
> +       int             pasid;
> +       int             doorbell_index;
> +       int             use_doorbell;
> +
> +       uint64_t        wptr_gpu_addr;
> +       uint64_t        rptr_gpu_addr;
> +       uint64_t        queue_gpu_addr;
> +       uint64_t        flags;
> +
> +       uint64_t        mqd_gpu_addr;
> +       void            *mqd_cpu_ptr;
> +
> +       struct amdgpu_bo        *mqd_obj;
> +       struct amdgpu_vm        *vm;
> +       struct amdgpu_userq_mgr *userq_mgr;
> +       struct list_head        userq_node;
> +};
> +
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> +
> +#endif
> --
> 2.34.1
>


* Re: [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers
  2023-02-03 21:54 ` [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers Shashank Sharma
  2023-02-07  7:11   ` Christian König
@ 2023-02-07 14:59   ` Alex Deucher
  1 sibling, 0 replies; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 14:59 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, amd-gfx

On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> A Memory queue descriptor (MQD) of a userqueue defines the queue in the
> hardware's context. As the method of creating an MQD and its format can
> vary between different graphics IPs, we need gfx GEN specific handlers to
> create MQDs.
>
> This patch:
> - Introduces MQD handler functions for the usermode queues
> - A general function to create and destroy an MQD for a userqueue.
>
> V1: Worked on review comments from Alex on RFC patches:
>     MQD creation should be gen and IP specific.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 64 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  9 +++
>  2 files changed, 73 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index d5bc7fe81750..625c2fe1e84a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -42,6 +42,60 @@ static struct amdgpu_usermode_queue
>      return idr_find(&uq_mgr->userq_idr, qid);
>  }
>
> +static int
> +amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +    int size;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +
> +    if (!uq_mgr->userq_mqd_funcs) {
> +        DRM_ERROR("Userqueue not initialized\n");
> +        return -EINVAL;
> +    }
> +
> +    size = uq_mgr->userq_mqd_funcs->mqd_size(uq_mgr);
> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &queue->mqd_obj,
> +                                &queue->mqd_gpu_addr,
> +                                &queue->mqd_cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
> +        return r;
> +    }
> +
> +    memset(queue->mqd_cpu_ptr, 0, size);
> +    r = amdgpu_bo_reserve(queue->mqd_obj, false);
> +    if (unlikely(r != 0)) {
> +        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
> +        goto free_mqd;
> +    }
> +
> +    r = uq_mgr->userq_mqd_funcs->mqd_create(uq_mgr, queue);
> +    amdgpu_bo_unreserve(queue->mqd_obj);
> +    if (r) {
> +        DRM_ERROR("Failed to create MQD for queue\n");
> +        goto free_mqd;
> +    }
> +    return 0;
> +
> +free_mqd:
> +    amdgpu_bo_free_kernel(&queue->mqd_obj,
> +                          &queue->mqd_gpu_addr,
> +                          &queue->mqd_cpu_ptr);
> +   return r;
> +}
> +
> +static void
> +amdgpu_userqueue_destroy_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    uq_mgr->userq_mqd_funcs->mqd_destroy(uq_mgr, queue);
> +    amdgpu_bo_free_kernel(&queue->mqd_obj,
> +                          &queue->mqd_gpu_addr,
> +                          &queue->mqd_cpu_ptr);
> +}
> +
>  static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>  {
>      int r, pasid;
> @@ -82,12 +136,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>          goto free_queue;
>      }
>
> +    r = amdgpu_userqueue_create_mqd(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create MQD\n");
> +        goto free_qid;
> +    }
> +
>      list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>      args->out.q_id = queue->queue_id;
>      args->out.flags = 0;
>      mutex_unlock(&uq_mgr->userq_mutex);
>      return 0;
>
> +free_qid:
> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> +
>  free_queue:
>      mutex_unlock(&uq_mgr->userq_mutex);
>      kfree(queue);
> @@ -107,6 +170,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>      }
>
>      mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>      list_del(&queue->userq_node);
>      mutex_unlock(&uq_mgr->userq_mutex);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 9557588fe34f..a6abdfd5cb74 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -26,10 +26,13 @@
>
>  #define AMDGPU_MAX_USERQ 512
>
> +struct amdgpu_userq_mqd_funcs;
> +
>  struct amdgpu_userq_mgr {
>         struct idr userq_idr;
>         struct mutex userq_mutex;
>         struct list_head userq_list;
> +       const struct amdgpu_userq_mqd_funcs *userq_mqd_funcs;
>         struct amdgpu_device *adev;
>  };
>
> @@ -57,6 +60,12 @@ struct amdgpu_usermode_queue {
>
>  int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
>
> +struct amdgpu_userq_mqd_funcs {
> +       int (*mqd_size)(struct amdgpu_userq_mgr *);
> +       int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);

I think all we need is create and destroy callbacks.  All memory
allocations and metadata required for a specific engine's queue
management should be handled internally in the engine specific code.

Alex
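A minimal user-space sketch of what Alex is describing (all names here are hypothetical, not the actual amdgpu API): the generic layer only knows a create/destroy pair, and the engine backend allocates and frees its own metadata internally, so no mqd_size() callback is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced interface: only create and destroy. */
struct uq_funcs {
    int  (*mqd_create)(void **priv);
    void (*mqd_destroy)(void *priv);
};

/* A stub "engine" backend: the MQD size and buffer stay private to it. */
struct engine_mqd {
    size_t size;
    void *buf;
};

static int engine_mqd_create(void **priv)
{
    struct engine_mqd *m = calloc(1, sizeof(*m));

    if (!m)
        return -1;
    m->size = 512;              /* engine-specific size, hidden in here */
    m->buf = calloc(1, m->size);
    if (!m->buf) {
        free(m);
        return -1;
    }
    *priv = m;
    return 0;
}

static void engine_mqd_destroy(void *priv)
{
    struct engine_mqd *m = priv;

    free(m->buf);
    free(m);
}

const struct uq_funcs engine_funcs = {
    .mqd_create  = engine_mqd_create,
    .mqd_destroy = engine_mqd_destroy,
};

/* Generic layer: no knowledge of engine allocations or sizes. */
int generic_queue_setup(const struct uq_funcs *f, void **priv)
{
    return f->mqd_create(priv);
}
```

The point of the sketch is only the division of responsibility: the generic code never sees the engine's sizes or buffers.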

> +};
> +
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>
>  void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> --
> 2.34.1
>


* Re: [PATCH 2/8] drm/amdgpu: add usermode queues
  2023-02-07 14:54   ` Alex Deucher
@ 2023-02-07 15:02     ` Shashank Sharma
  0 siblings, 0 replies; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07 15:02 UTC (permalink / raw)
  To: Alex Deucher
  Cc: alexander.deucher, Shashank Sharma, christian.koenig, amd-gfx


On 07/02/2023 15:54, Alex Deucher wrote:
> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch adds skeleton code for usermode queue creation. It
>> typically contains:
>> - A new structure to keep all the user queue data in one place.
>> - An IOCTL function to create/free a usermode queue.
>> - A function to generate unique index for the queue.
>> - A queue context manager in driver private data.
>>
>> V1: Worked on design review comments from RFC patch series:
>> (https://patchwork.freedesktop.org/series/112214/)
>>
>> - Alex: Keep a list of queues, instead of single queue per process.
>> - Christian: Use the queue manager instead of global ptrs,
>>             Don't keep the queue structure in amdgpu_ctx
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   5 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 155 ++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  64 ++++++++
>>   6 files changed, 230 insertions(+)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 798d0e9a60b7..764801cc8203 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -210,6 +210,8 @@ amdgpu-y += \
>>   # add amdkfd interfaces
>>   amdgpu-y += amdgpu_amdkfd.o
>>
>> +# add usermode queue
>> +amdgpu-y += amdgpu_userqueue.o
>>
>>   ifneq ($(CONFIG_HSA_AMD),)
>>   AMDKFD_PATH := ../amdkfd
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 6b74df446694..0625d6bdadf4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -109,6 +109,7 @@
>>   #include "amdgpu_fdinfo.h"
>>   #include "amdgpu_mca.h"
>>   #include "amdgpu_ras.h"
>> +#include "amdgpu_userqueue.h"
>>
>>   #define MAX_GPU_INSTANCE               16
>>
>> @@ -482,6 +483,7 @@ struct amdgpu_fpriv {
>>          struct mutex            bo_list_lock;
>>          struct idr              bo_list_handles;
>>          struct amdgpu_ctx_mgr   ctx_mgr;
>> +       struct amdgpu_userq_mgr userq_mgr;
>>   };
>>
>>   int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index b4f2d61ea0d5..229976a2d0e7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -52,6 +52,7 @@
>>   #include "amdgpu_ras.h"
>>   #include "amdgpu_xgmi.h"
>>   #include "amdgpu_reset.h"
>> +#include "amdgpu_userqueue.h"
>>
>>   /*
>>    * KMS wrapper.
>> @@ -2748,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>>          DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>          DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>          DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>> +       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>   };
>>
>>   static const struct drm_driver amdgpu_kms_driver = {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 7aa7e52ca784..52e61e339a88 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -1187,6 +1187,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>>
>>          amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>>
>> +       r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
>> +       if (r)
>> +               DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
>> +
>>          file_priv->driver_priv = fpriv;
>>          goto out_suspend;
>>
>> @@ -1254,6 +1258,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>
>>          amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>>          amdgpu_vm_fini(adev, &fpriv->vm);
>> +       amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>>
>>          if (pasid)
>>                  amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> new file mode 100644
>> index 000000000000..d5bc7fe81750
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -0,0 +1,155 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include "amdgpu.h"
>> +#include "amdgpu_vm.h"
>> +
>> +static inline int
>> +amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>> +{
>> +    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
>> +}
>> +
>> +static inline void
>> +amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
>> +{
>> +    idr_remove(&uq_mgr->userq_idr, queue_id);
>> +}
>> +
>> +static struct amdgpu_usermode_queue
>> +*amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>> +{
>> +    return idr_find(&uq_mgr->userq_idr, qid);
>> +}
>> +
>> +static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>> +{
>> +    int r, pasid;
>> +    struct amdgpu_usermode_queue *queue;
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>> +    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
>> +
>> +    pasid = vm->pasid;
>> +    if (vm->pasid < 0) {
>> +        DRM_WARN("No PASID info found\n");
>> +        pasid = 0;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +
>> +    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
>> +    if (!queue) {
>> +        DRM_ERROR("Failed to allocate memory for queue\n");
>> +        mutex_unlock(&uq_mgr->userq_mutex);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    queue->vm = vm;
>> +    queue->pasid = pasid;
>> +    queue->wptr_gpu_addr = mqd_in->wptr_va;
>> +    queue->rptr_gpu_addr = mqd_in->rptr_va;
>> +    queue->queue_size = mqd_in->queue_size;
>> +    queue->queue_type = mqd_in->ip_type;
>> +    queue->queue_gpu_addr = mqd_in->queue_va;
>> +    queue->flags = mqd_in->flags;
>> +    queue->use_doorbell = true;
> I think we can drop use_doorbell.  All user queues require a doorbell.

Noted,

- Shashank

>
> Alex
>
>> +    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
>> +    if (queue->queue_id < 0) {
>> +        DRM_ERROR("Failed to allocate a queue id\n");
>> +        r = queue->queue_id;
>> +        goto free_queue;
>> +    }
>> +
>> +    list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>> +    args->out.q_id = queue->queue_id;
>> +    args->out.flags = 0;
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    return 0;
>> +
>> +free_queue:
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    kfree(queue);
>> +    return r;
>> +}
>> +
>> +static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>> +{
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>> +    struct amdgpu_usermode_queue *queue;
>> +
>> +    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
>> +    if (!queue) {
>> +        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
>> +        return;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>> +    list_del(&queue->userq_node);
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    kfree(queue);
>> +}
>> +
>> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>> +                      struct drm_file *filp)
>> +{
>> +    union drm_amdgpu_userq *args = data;
>> +    int r = 0;
>> +
>> +    switch (args->in.op) {
>> +    case AMDGPU_USERQ_OP_CREATE:
>> +        r = amdgpu_userqueue_create(filp, args);
>> +        if (r)
>> +            DRM_ERROR("Failed to create usermode queue\n");
>> +        break;
>> +
>> +    case AMDGPU_USERQ_OP_FREE:
>> +        amdgpu_userqueue_destroy(filp, args->in.queue_id);
>> +        break;
>> +
>> +    default:
>> +        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>> +{
>> +    mutex_init(&userq_mgr->userq_mutex);
>> +    idr_init_base(&userq_mgr->userq_idr, 1);
>> +    INIT_LIST_HEAD(&userq_mgr->userq_list);
>> +    userq_mgr->adev = adev;
>> +
>> +    return 0;
>> +}
>> +
>> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>> +{
>> +    idr_destroy(&userq_mgr->userq_idr);
>> +    mutex_destroy(&userq_mgr->userq_mutex);
>> +}
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> new file mode 100644
>> index 000000000000..9557588fe34f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -0,0 +1,64 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef AMDGPU_USERQUEUE_H_
>> +#define AMDGPU_USERQUEUE_H_
>> +
>> +#define AMDGPU_MAX_USERQ 512
>> +
>> +struct amdgpu_userq_mgr {
>> +       struct idr userq_idr;
>> +       struct mutex userq_mutex;
>> +       struct list_head userq_list;
>> +       struct amdgpu_device *adev;
>> +};
>> +
>> +struct amdgpu_usermode_queue {
>> +       int             queue_id;
>> +       int             queue_type;
>> +       int             queue_size;
>> +       int             pasid;
>> +       int             doorbell_index;
>> +       int             use_doorbell;
>> +
>> +       uint64_t        wptr_gpu_addr;
>> +       uint64_t        rptr_gpu_addr;
>> +       uint64_t        queue_gpu_addr;
>> +       uint64_t        flags;
>> +
>> +       uint64_t        mqd_gpu_addr;
>> +       void            *mqd_cpu_ptr;
>> +
>> +       struct amdgpu_bo        *mqd_obj;
>> +       struct amdgpu_vm        *vm;
>> +       struct amdgpu_userq_mgr *userq_mgr;
>> +       struct list_head        userq_node;
>> +};
>> +
>> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
>> +
>> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>> +
>> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
>> +
>> +#endif
>> --
>> 2.34.1
>>


* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-03 21:54 ` [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions Shashank Sharma
@ 2023-02-07 15:17   ` Alex Deucher
  2023-02-07 15:43     ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 15:17 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx

On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> MQD describes the properties of a user queue to the HW, and allows it to
> accurately configure the queue while mapping it in GPU HW. This patch
> adds:
> - A new header file which contains the userqueue MQD definition for
>   V11 graphics engine.
> - A new function which fills it with userqueue data and prepares MQD
> - A function which sets-up the MQD function ptrs in the generic userqueue
>   creation code.
>
> V1: Addressed review comments from RFC patch series
>     - Reuse the existing MQD structure instead of creating a new one
>     - MQD format and creation can be IP specific, keep it like that
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
>  .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
>  drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
>  4 files changed, 169 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 764801cc8203..6ae9d5792791 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>
>  # add usermode queue
>  amdgpu-y += amdgpu_userqueue.o
> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
>
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 625c2fe1e84a..9f3490a91776 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>      return r;
>  }
>
> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
> +
> +static int
> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +    int maj;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> +
> +    maj = IP_VERSION_MAJ(version);
> +    if (maj == 11) {
> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
> +    } else {
> +        DRM_WARN("This IP doesn't support usermode queues\n");
> +        return -EINVAL;
> +    }
> +

I think it would be cleaner to just store these callbacks in adev.
Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
in early_init for each IP, we can register the callbacks.  When the
user goes to create a new user_queue, we can check to see if the
function pointer is NULL or not for the queue type:

if (!adev->user_queue_funcs[ip_type])
  return -EINVAL

r = adev->user_queue_funcs[ip_type]->create_queue();

Actually, there is already an mqd manager interface (adev->mqds[]).
Maybe you can leverage that interface.
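As a standalone illustration of the lookup being suggested (the table name, the index values, and the array size are all stand-ins, not the real amdgpu definitions): a per-IP table of callbacks where each IP's early_init registers its slot, and unregistered IP types fail the NULL check with -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL 22
#define HW_IP_NUM 9     /* stand-in for AMDGPU_HW_IP_NUM */

struct user_queue_funcs {
    int (*create_queue)(void);
};

static int gfx_create_queue(void)
{
    return 0;
}

static const struct user_queue_funcs gfx_funcs = {
    .create_queue = gfx_create_queue,
};

/* Stand-in for adev->user_queue_funcs[]; only the GFX slot is
 * populated, as that IP's early_init would do. */
static const struct user_queue_funcs *user_queue_funcs[HW_IP_NUM] = {
    [0] = &gfx_funcs,   /* assuming the GFX IP type is index 0 */
};

int create_user_queue(unsigned int ip_type)
{
    if (ip_type >= HW_IP_NUM || !user_queue_funcs[ip_type])
        return -EINVAL;

    return user_queue_funcs[ip_type]->create_queue();
}
```

This keeps the IP-version switch out of the generic code entirely: adding a new engine is just filling in another slot of the table.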

> +    return 0;
> +}
> +
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>  {
> +    int r;
> +
>      mutex_init(&userq_mgr->userq_mutex);
>      idr_init_base(&userq_mgr->userq_idr, 1);
>      INIT_LIST_HEAD(&userq_mgr->userq_list);
>      userq_mgr->adev = adev;
>
> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
> +    if (r) {
> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
> +        return r;
> +    }
> +
>      return 0;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> new file mode 100644
> index 000000000000..57889729d635
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> @@ -0,0 +1,132 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include "amdgpu.h"
> +#include "amdgpu_userqueue.h"
> +#include "v11_structs.h"
> +#include "amdgpu_mes.h"
> +#include "gc/gc_11_0_0_offset.h"
> +#include "gc/gc_11_0_0_sh_mask.h"
> +
> +static int
> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    uint32_t tmp, rb_bufsz;
> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +
> +    /* set up gfx hqd wptr */
> +    mqd->cp_gfx_hqd_wptr = 0;
> +    mqd->cp_gfx_hqd_wptr_hi = 0;
> +
> +    /* set the pointer to the MQD */
> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
> +
> +    /* set up mqd control */
> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
> +    mqd->cp_gfx_mqd_control = tmp;
> +
> +    /* set up gfx_hqd_vmid with 0x0 to indicate the ring buffer's vmid */
> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
> +    mqd->cp_gfx_hqd_vmid = 0;
> +
> +    /* set up default queue priority level
> +    * 0x0 = low priority, 0x1 = high priority */
> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
> +    mqd->cp_gfx_hqd_queue_priority = tmp;
> +
> +    /* set up time quantum */
> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
> +    mqd->cp_gfx_hqd_quantum = tmp;
> +
> +    /* set up gfx hqd base. this is similar as CP_RB_BASE */
> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
> +
> +    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
> +    wb_gpu_addr = queue->rptr_gpu_addr;
> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
> +    mqd->cp_gfx_hqd_rptr_addr_hi =
> +    upper_32_bits(wb_gpu_addr) & 0xffff;
> +
> +    /* set up rb_wptr_poll addr */
> +    wb_gpu_addr = queue->wptr_gpu_addr;
> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
> +
> +    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
> +#ifdef __BIG_ENDIAN
> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
> +#endif
> +    mqd->cp_gfx_hqd_cntl = tmp;
> +
> +    /* set up cp_doorbell_control */
> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
> +    if (queue->use_doorbell) {
> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> +                    DOORBELL_OFFSET, queue->doorbell_index);
> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> +                    DOORBELL_EN, 1);
> +    } else {
> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> +                    DOORBELL_EN, 0);
> +    }
> +    mqd->cp_rb_doorbell_control = tmp;
> +
> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
> +
> +    /* activate the queue */
> +    mqd->cp_gfx_hqd_active = 1;
> +

Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
directly or leverage adev->mqds[]?

Alex

> +    return 0;
> +}
> +
> +static void
> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +
> +}
> +
> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +    return sizeof(struct v11_gfx_mqd);
> +}
> +
> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> +};
> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
> index b8ff7456ae0b..f8008270f813 100644
> --- a/drivers/gpu/drm/amd/include/v11_structs.h
> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
> @@ -25,14 +25,14 @@
>  #define V11_STRUCTS_H_
>
>  struct v11_gfx_mqd {
> -       uint32_t reserved_0; // offset: 0  (0x0)
> -       uint32_t reserved_1; // offset: 1  (0x1)
> -       uint32_t reserved_2; // offset: 2  (0x2)
> -       uint32_t reserved_3; // offset: 3  (0x3)
> -       uint32_t reserved_4; // offset: 4  (0x4)
> -       uint32_t reserved_5; // offset: 5  (0x5)
> -       uint32_t reserved_6; // offset: 6  (0x6)
> -       uint32_t reserved_7; // offset: 7  (0x7)
> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
> +       uint32_t shadow_initialized; // offset: 6  (0x6)
> +       uint32_t ib_vmid; // offset: 7  (0x7)
>         uint32_t reserved_8; // offset: 8  (0x8)
>         uint32_t reserved_9; // offset: 9  (0x9)
>         uint32_t reserved_10; // offset: 10  (0xA)
> --
> 2.34.1
>


* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 15:17   ` Alex Deucher
@ 2023-02-07 15:43     ` Shashank Sharma
  2023-02-07 16:05       ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07 15:43 UTC (permalink / raw)
  To: Alex Deucher
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx


On 07/02/2023 16:17, Alex Deucher wrote:
> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> MQD describes the properties of a user queue to the HW, and allows it to
>> accurately configure the queue while mapping it in GPU HW. This patch
>> adds:
>> - A new header file which contains the userqueue MQD definition for
>>    V11 graphics engine.
>> - A new function which fills it with userqueue data and prepares MQD
>> - A function which sets-up the MQD function ptrs in the generic userqueue
>>    creation code.
>>
>> V1: Addressed review comments from RFC patch series
>>      - Reuse the existing MQD structure instead of creating a new one
>>      - MQD format and creation can be IP specific, keep it like that
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
>>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
>>   drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
>>   4 files changed, 169 insertions(+), 8 deletions(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 764801cc8203..6ae9d5792791 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>>
>>   # add usermode queue
>>   amdgpu-y += amdgpu_userqueue.o
>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
>>
>>   ifneq ($(CONFIG_HSA_AMD),)
>>   AMDKFD_PATH := ../amdkfd
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 625c2fe1e84a..9f3490a91776 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>>       return r;
>>   }
>>
>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
>> +
>> +static int
>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    int maj;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>> +
>> +    maj = IP_VERSION_MAJ(version);
>> +    if (maj == 11) {
>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
>> +    } else {
>> +        DRM_WARN("This IP doesn't support usermode queues\n");
>> +        return -EINVAL;
>> +    }
>> +
> I think it would be cleaner to just store these callbacks in adev.
> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
> in early_init for each IP, we can register the callbacks.  When the
> user goes to create a new user_queue, we can check to see if the
> function pointer is NULL or not for the queue type:
>
> if (!adev->user_queue_funcs[ip_type])
>    return -EINVAL
>
> r = adev->user_queue_funcs[ip_type]->create_queue();

Sounds like a good idea, we can do this.

>
> Actually, there is already an mqd manager interface (adev->mqds[]).
> Maybe you can leverage that interface.

Yep, I saw that and initially even tried to work on that interface
itself, but then realized that it doesn't allow us to pass some
additional parameters (like queue->vm, and various BOs like proc_ctx_bo,
gang_ctx_bo and so on). All of these are required in the MQD and will
need to be added into it. I even thought of expanding this structure
with additional parameters, but it felt like that defeats the purpose
of the MQD properties structure. If you feel strongly about it, though,
we can work around that.

>> +    return 0;
>> +}
>> +
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>>   {
>> +    int r;
>> +
>>       mutex_init(&userq_mgr->userq_mutex);
>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>       INIT_LIST_HEAD(&userq_mgr->userq_list);
>>       userq_mgr->adev = adev;
>>
>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
>> +        return r;
>> +    }
>> +
>>       return 0;
>>   }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> new file mode 100644
>> index 000000000000..57889729d635
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>> @@ -0,0 +1,132 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +#include "amdgpu.h"
>> +#include "amdgpu_userqueue.h"
>> +#include "v11_structs.h"
>> +#include "amdgpu_mes.h"
>> +#include "gc/gc_11_0_0_offset.h"
>> +#include "gc/gc_11_0_0_sh_mask.h"
>> +
>> +static int
>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>> +{
>> +    uint32_t tmp, rb_bufsz;
>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +
>> +    /* set up gfx hqd wptr */
>> +    mqd->cp_gfx_hqd_wptr = 0;
>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
>> +
>> +    /* set the pointer to the MQD */
>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
>> +
>> +    /* set up mqd control */
>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
>> +    mqd->cp_gfx_mqd_control = tmp;
>> +
>> +    /* set up gfx_hqd_vmid with 0x0 to indicate the ring buffer's vmid */
>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
>> +    mqd->cp_gfx_hqd_vmid = 0;
>> +
>> +    /* set up default queue priority level
>> +    * 0x0 = low priority, 0x1 = high priority */
>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
>> +
>> +    /* set up time quantum */
>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
>> +    mqd->cp_gfx_hqd_quantum = tmp;
>> +
>> +    /* set up gfx hqd base. this is similar to CP_RB_BASE */
>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
>> +
>> +    /* set up hqd_rptr_addr/_hi, similar to CP_RB_RPTR */
>> +    wb_gpu_addr = queue->rptr_gpu_addr;
>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
>> +
>> +    /* set up rb_wptr_poll addr */
>> +    wb_gpu_addr = queue->wptr_gpu_addr;
>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
>> +
>> +    /* set up the gfx_hqd_control, similar to CP_RB0_CNTL */
>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
>> +#ifdef __BIG_ENDIAN
>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
>> +#endif
>> +    mqd->cp_gfx_hqd_cntl = tmp;
>> +
>> +    /* set up cp_doorbell_control */
>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
>> +    if (queue->use_doorbell) {
>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>> +                    DOORBELL_OFFSET, queue->doorbell_index);
>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>> +                    DOORBELL_EN, 1);
>> +    } else {
>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>> +                    DOORBELL_EN, 0);
>> +    }
>> +    mqd->cp_rb_doorbell_control = tmp;
>> +
>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
>> +
>> +    /* activate the queue */
>> +    mqd->cp_gfx_hqd_active = 1;
>> +
> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
> directly or leverage adev->mqds[]?

Let us try this out and come back.

- Shashank


>
> Alex
>
>> +    return 0;
>> +}
>> +
>> +static void
>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>> +{
>> +
>> +}
>> +
>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    return sizeof(struct v11_gfx_mqd);
>> +}
>> +
>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>> +};
>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
>> index b8ff7456ae0b..f8008270f813 100644
>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
>> @@ -25,14 +25,14 @@
>>   #define V11_STRUCTS_H_
>>
>>   struct v11_gfx_mqd {
>> -       uint32_t reserved_0; // offset: 0  (0x0)
>> -       uint32_t reserved_1; // offset: 1  (0x1)
>> -       uint32_t reserved_2; // offset: 2  (0x2)
>> -       uint32_t reserved_3; // offset: 3  (0x3)
>> -       uint32_t reserved_4; // offset: 4  (0x4)
>> -       uint32_t reserved_5; // offset: 5  (0x5)
>> -       uint32_t reserved_6; // offset: 6  (0x6)
>> -       uint32_t reserved_7; // offset: 7  (0x7)
>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
>> +       uint32_t ib_vmid; // offset: 7  (0x7)
>>          uint32_t reserved_8; // offset: 8  (0x8)
>>          uint32_t reserved_9; // offset: 9  (0x9)
>>          uint32_t reserved_10; // offset: 10  (0xA)
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 15:43     ` Shashank Sharma
@ 2023-02-07 16:05       ` Alex Deucher
  2023-02-07 16:37         ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 16:05 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx

On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 07/02/2023 16:17, Alex Deucher wrote:
> > On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >> From: Shashank Sharma <contactshashanksharma@gmail.com>
> >>
> >> MQD describes the properties of a user queue to the HW, and allows it to
> >> accurately configure the queue while mapping it in GPU HW. This patch
> >> adds:
> >> - A new header file which contains the userqueue MQD definition for
> >>    V11 graphics engine.
> >> - A new function which fills it with userqueue data and prepares MQD
> >> - A function which sets up the MQD function ptrs in the generic userqueue
> >>    creation code.
> >>
> >> V1: Addressed review comments from RFC patch series
> >>      - Reuse the existing MQD structure instead of creating a new one
> >>      - MQD format and creation can be IP specific, keep it like that
> >>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: Christian Koenig <christian.koenig@amd.com>
> >> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
> >>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
> >>   drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
> >>   4 files changed, 169 insertions(+), 8 deletions(-)
> >>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> >> index 764801cc8203..6ae9d5792791 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> >> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> >> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
> >>
> >>   # add usermode queue
> >>   amdgpu-y += amdgpu_userqueue.o
> >> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
> >>
> >>   ifneq ($(CONFIG_HSA_AMD),)
> >>   AMDKFD_PATH := ../amdkfd
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> index 625c2fe1e84a..9f3490a91776 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> >>       return r;
> >>   }
> >>
> >> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
> >> +
> >> +static int
> >> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
> >> +{
> >> +    int maj;
> >> +    struct amdgpu_device *adev = uq_mgr->adev;
> >> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> >> +
> >> +    maj = IP_VERSION_MAJ(version);
> >> +    if (maj == 11) {
> >> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
> >> +    } else {
> >> +        DRM_WARN("This IP doesn't support usermode queues\n");
> >> +        return -EINVAL;
> >> +    }
> >> +
> > I think it would be cleaner to just store these callbacks in adev.
> > Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
> > in early_init for each IP, we can register the callbacks.  When the
> > user goes to create a new user_queue, we can check to see if the
> > function pointer is NULL or not for the queue type:
> >
> > if (!adev->user_queue_funcs[ip_type])
> >    return -EINVAL
> >
> > r = adev->user_queue_funcs[ip_type]->create_queue();
>
> Sounds like a good idea, we can do this.
>
> >
> > Actually, there is already an mqd manager interface (adev->mqds[]).
> > Maybe you can leverage that interface.
>
> Yep, I saw that and initially even tried to work on that interface
> itself, and then realized that it doesn't allow us to pass some
> additional parameters (like queue->vm, various BOs like proc_ctx_bo,
> gang_ctx_bo's and so on). All of these are required in the MQD, so we
> will need them to be added into the MQD. I even thought of expanding
> this structure with additional parameters, but I felt like it defeats
> the purpose of these MQD properties. But if you feel strongly about
> that, we can work around it.

I think it would be cleaner to just add whatever additional mqd
properties you need to amdgpu_mqd_prop, and then you can share
gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
configuration, we only have to change one function.

Alex
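
A minimal sketch of this direction: one shared init routine consuming an extended properties struct, modeled loosely on `amdgpu_mqd_prop`. The stub types and the `proc_ctx`/`gang_ctx` field names are assumptions for illustration, not the real layout:

```c
/* Hypothetical sketch, not actual amdgpu code: extend one shared
 * properties struct with the extra user-queue fields, so a single
 * mqd_init serves both kernel and user queues and MQD changes land
 * in exactly one place. */
#include <assert.h>
#include <stdint.h>

struct stub_mqd_prop {
	uint64_t hqd_base_gpu_addr;
	uint64_t rptr_gpu_addr;
	uint64_t wptr_gpu_addr;
	uint32_t queue_size;
	/* hypothetical additions for user queues: */
	uint64_t proc_ctx_gpu_addr;
	uint64_t gang_ctx_gpu_addr;
};

struct stub_gfx_mqd {
	uint32_t cp_gfx_hqd_base;
	uint32_t cp_gfx_hqd_base_hi;
	uint32_t cp_gfx_hqd_rptr_addr;
};

/* One init routine shared by both queue types; it only reads the
 * properties struct, so callers differ merely in how they fill it. */
static void stub_gfx_mqd_init(struct stub_gfx_mqd *mqd,
			      const struct stub_mqd_prop *prop)
{
	uint64_t hqd = prop->hqd_base_gpu_addr >> 8; /* like CP_RB_BASE */

	mqd->cp_gfx_hqd_base = (uint32_t)hqd;
	mqd->cp_gfx_hqd_base_hi = (uint32_t)(hqd >> 32);
	mqd->cp_gfx_hqd_rptr_addr = (uint32_t)(prop->rptr_gpu_addr & ~0x3ull);
}
```

The sketch shows only the shape of the refactor; the real shared functions would be gfx_v11_0_gfx_mqd_init(), gfx_v11_0_compute_mqd_init() and sdma_v6_0_mqd_init() as named above.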

>
> >> +    return 0;
> >> +}
> >> +
> >>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> >>   {
> >> +    int r;
> >> +
> >>       mutex_init(&userq_mgr->userq_mutex);
> >>       idr_init_base(&userq_mgr->userq_idr, 1);
> >>       INIT_LIST_HEAD(&userq_mgr->userq_list);
> >>       userq_mgr->adev = adev;
> >>
> >> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
> >> +    if (r) {
> >> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
> >> +        return r;
> >> +    }
> >> +
> >>       return 0;
> >>   }
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >> new file mode 100644
> >> index 000000000000..57889729d635
> >> --- /dev/null
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >> @@ -0,0 +1,132 @@
> >> +/*
> >> + * Copyright 2022 Advanced Micro Devices, Inc.
> >> + *
> >> + * Permission is hereby granted, free of charge, to any person obtaining a
> >> + * copy of this software and associated documentation files (the "Software"),
> >> + * to deal in the Software without restriction, including without limitation
> >> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >> + * and/or sell copies of the Software, and to permit persons to whom the
> >> + * Software is furnished to do so, subject to the following conditions:
> >> + *
> >> + * The above copyright notice and this permission notice shall be included in
> >> + * all copies or substantial portions of the Software.
> >> + *
> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> >> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> >> + * OTHER DEALINGS IN THE SOFTWARE.
> >> + *
> >> + */
> >> +#include "amdgpu.h"
> >> +#include "amdgpu_userqueue.h"
> >> +#include "v11_structs.h"
> >> +#include "amdgpu_mes.h"
> >> +#include "gc/gc_11_0_0_offset.h"
> >> +#include "gc/gc_11_0_0_sh_mask.h"
> >> +
> >> +static int
> >> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >> +{
> >> +    uint32_t tmp, rb_bufsz;
> >> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
> >> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
> >> +    struct amdgpu_device *adev = uq_mgr->adev;
> >> +
> >> +    /* set up gfx hqd wptr */
> >> +    mqd->cp_gfx_hqd_wptr = 0;
> >> +    mqd->cp_gfx_hqd_wptr_hi = 0;
> >> +
> >> +    /* set the pointer to the MQD */
> >> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
> >> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
> >> +
> >> +    /* set up mqd control */
> >> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
> >> +    mqd->cp_gfx_mqd_control = tmp;
> >> +
> >> +    /* set up gfx_hqd_vmid with 0x0 to indicate the ring buffer's vmid */
> >> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
> >> +    mqd->cp_gfx_hqd_vmid = 0;
> >> +
> >> +    /* set up default queue priority level
> >> +    * 0x0 = low priority, 0x1 = high priority */
> >> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
> >> +    mqd->cp_gfx_hqd_queue_priority = tmp;
> >> +
> >> +    /* set up time quantum */
> >> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
> >> +    mqd->cp_gfx_hqd_quantum = tmp;
> >> +
> >> +    /* set up gfx hqd base. this is similar to CP_RB_BASE */
> >> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
> >> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
> >> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
> >> +
> >> +    /* set up hqd_rptr_addr/_hi, similar to CP_RB_RPTR */
> >> +    wb_gpu_addr = queue->rptr_gpu_addr;
> >> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
> >> +    mqd->cp_gfx_hqd_rptr_addr_hi =
> >> +    upper_32_bits(wb_gpu_addr) & 0xffff;
> >> +
> >> +    /* set up rb_wptr_poll addr */
> >> +    wb_gpu_addr = queue->wptr_gpu_addr;
> >> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
> >> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
> >> +
> >> +    /* set up the gfx_hqd_control, similar to CP_RB0_CNTL */
> >> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
> >> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
> >> +#ifdef __BIG_ENDIAN
> >> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
> >> +#endif
> >> +    mqd->cp_gfx_hqd_cntl = tmp;
> >> +
> >> +    /* set up cp_doorbell_control */
> >> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
> >> +    if (queue->use_doorbell) {
> >> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >> +                    DOORBELL_OFFSET, queue->doorbell_index);
> >> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >> +                    DOORBELL_EN, 1);
> >> +    } else {
> >> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >> +                    DOORBELL_EN, 0);
> >> +    }
> >> +    mqd->cp_rb_doorbell_control = tmp;
> >> +
> >> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
> >> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
> >> +
> >> +    /* activate the queue */
> >> +    mqd->cp_gfx_hqd_active = 1;
> >> +
> > Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
> > directly or leverage adev->mqds[]?
>
> Let us try this out and come back.
>
> - Shashank
>
>
> >
> > Alex
> >
> >> +    return 0;
> >> +}
> >> +
> >> +static void
> >> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >> +{
> >> +
> >> +}
> >> +
> >> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
> >> +{
> >> +    return sizeof(struct v11_gfx_mqd);
> >> +}
> >> +
> >> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
> >> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
> >> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> >> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> >> +};
> >> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
> >> index b8ff7456ae0b..f8008270f813 100644
> >> --- a/drivers/gpu/drm/amd/include/v11_structs.h
> >> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
> >> @@ -25,14 +25,14 @@
> >>   #define V11_STRUCTS_H_
> >>
> >>   struct v11_gfx_mqd {
> >> -       uint32_t reserved_0; // offset: 0  (0x0)
> >> -       uint32_t reserved_1; // offset: 1  (0x1)
> >> -       uint32_t reserved_2; // offset: 2  (0x2)
> >> -       uint32_t reserved_3; // offset: 3  (0x3)
> >> -       uint32_t reserved_4; // offset: 4  (0x4)
> >> -       uint32_t reserved_5; // offset: 5  (0x5)
> >> -       uint32_t reserved_6; // offset: 6  (0x6)
> >> -       uint32_t reserved_7; // offset: 7  (0x7)
> >> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
> >> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
> >> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
> >> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
> >> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
> >> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
> >> +       uint32_t shadow_initialized; // offset: 6  (0x6)
> >> +       uint32_t ib_vmid; // offset: 7  (0x7)
> >>          uint32_t reserved_8; // offset: 8  (0x8)
> >>          uint32_t reserved_9; // offset: 9  (0x9)
> >>          uint32_t reserved_10; // offset: 10  (0xA)
> >> --
> >> 2.34.1
> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 16:05       ` Alex Deucher
@ 2023-02-07 16:37         ` Shashank Sharma
  2023-02-07 16:54           ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07 16:37 UTC (permalink / raw)
  To: Alex Deucher
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx


On 07/02/2023 17:05, Alex Deucher wrote:
> On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 07/02/2023 16:17, Alex Deucher wrote:
>>> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>>
>>>> MQD describes the properties of a user queue to the HW, and allows it to
>>>> accurately configure the queue while mapping it in GPU HW. This patch
>>>> adds:
>>>> - A new header file which contains the userqueue MQD definition for
>>>>     V11 graphics engine.
>>>> - A new function which fills it with userqueue data and prepares MQD
>>>> - A function which sets up the MQD function ptrs in the generic userqueue
>>>>     creation code.
>>>>
>>>> V1: Addressed review comments from RFC patch series
>>>>       - Reuse the existing MQD structure instead of creating a new one
>>>>       - MQD format and creation can be IP specific, keep it like that
>>>>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
>>>>    .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
>>>>    drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
>>>>    4 files changed, 169 insertions(+), 8 deletions(-)
>>>>    create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>> index 764801cc8203..6ae9d5792791 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>>>>
>>>>    # add usermode queue
>>>>    amdgpu-y += amdgpu_userqueue.o
>>>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
>>>>
>>>>    ifneq ($(CONFIG_HSA_AMD),)
>>>>    AMDKFD_PATH := ../amdkfd
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> index 625c2fe1e84a..9f3490a91776 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>>>>        return r;
>>>>    }
>>>>
>>>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
>>>> +
>>>> +static int
>>>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
>>>> +{
>>>> +    int maj;
>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>>> +
>>>> +    maj = IP_VERSION_MAJ(version);
>>>> +    if (maj == 11) {
>>>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
>>>> +    } else {
>>>> +        DRM_WARN("This IP doesn't support usermode queues\n");
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>> I think it would be cleaner to just store these callbacks in adev.
>>> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
>>> in early_init for each IP, we can register the callbacks.  When the
>>> user goes to create a new user_queue, we can check to see if the
>>> function pointer is NULL or not for the queue type:
>>>
>>> if (!adev->user_queue_funcs[ip_type])
>>>     return -EINVAL
>>>
>>> r = adev->user_queue_funcs[ip_type]->create_queue();
>> Sounds like a good idea, we can do this.
>>
>>> Actually, there is already an mqd manager interface (adev->mqds[]).
>>> Maybe you can leverage that interface.
>> Yep, I saw that and initially even tried to work on that interface
>> itself, and then realized that it doesn't allow us to pass some
>> additional parameters (like queue->vm, various BOs like proc_ctx_bo,
>> gang_ctx_bo's and so on). All of these are required in the MQD, so we
>> will need them to be added into the MQD. I even thought of expanding
>> this structure with additional parameters, but I felt like it defeats
>> the purpose of these MQD properties. But if you feel strongly about
>> that, we can work around it.
> I think it would be cleaner to just add whatever additional mqd
> properties you need to amdgpu_mqd_prop, and then you can share
> gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
> sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
> configuration, we only have to change one function.
>
> Alex

Noted,

We might have to add some additional fptrs for .prepare_map() and
.prepare_unmap() in the mqd funcs.

These are required to prepare data for MES HW queue mapping.

- Shashank
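
A rough sketch of how such optional .prepare_map()/.prepare_unmap() hooks could sit alongside the mqd funcs. Only the hook names come from the message above; every type here is a stand-in, and treating a NULL hook as a no-op is an assumed convention that keeps the hooks optional per IP:

```c
/* Illustrative sketch only, not actual amdgpu code: optional per-IP
 * hooks invoked around MES queue mapping. IPs that need no extra
 * preparation simply leave the pointers NULL. */
#include <assert.h>
#include <stddef.h>

struct stub_queue {
	int map_ready;
};

struct stub_mqd_funcs {
	int (*mqd_create)(struct stub_queue *q);
	/* hypothetical additions: */
	int (*prepare_map)(struct stub_queue *q);
	int (*prepare_unmap)(struct stub_queue *q);
};

static int gfx_prepare_map(struct stub_queue *q)
{
	q->map_ready = 1; /* e.g. fill MES queue properties here */
	return 0;
}

static const struct stub_mqd_funcs gfx_funcs = {
	.prepare_map = gfx_prepare_map,
	/* .prepare_unmap left NULL: nothing to do for this sketch */
};

/* Generic mapping path: run the hook if present, then map via MES. */
static int queue_map(const struct stub_mqd_funcs *f, struct stub_queue *q)
{
	if (f->prepare_map) {
		int r = f->prepare_map(q);
		if (r)
			return r;
	}
	/* ...then hand the queue over to MES for mapping... */
	return 0;
}
```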

>
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>>>>    {
>>>> +    int r;
>>>> +
>>>>        mutex_init(&userq_mgr->userq_mutex);
>>>>        idr_init_base(&userq_mgr->userq_idr, 1);
>>>>        INIT_LIST_HEAD(&userq_mgr->userq_list);
>>>>        userq_mgr->adev = adev;
>>>>
>>>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
>>>> +        return r;
>>>> +    }
>>>> +
>>>>        return 0;
>>>>    }
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>> new file mode 100644
>>>> index 000000000000..57889729d635
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>> @@ -0,0 +1,132 @@
>>>> +/*
>>>> + * Copyright 2022 Advanced Micro Devices, Inc.
>>>> + *
>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>> + * copy of this software and associated documentation files (the "Software"),
>>>> + * to deal in the Software without restriction, including without limitation
>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>> + * Software is furnished to do so, subject to the following conditions:
>>>> + *
>>>> + * The above copyright notice and this permission notice shall be included in
>>>> + * all copies or substantial portions of the Software.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>>> + *
>>>> + */
>>>> +#include "amdgpu.h"
>>>> +#include "amdgpu_userqueue.h"
>>>> +#include "v11_structs.h"
>>>> +#include "amdgpu_mes.h"
>>>> +#include "gc/gc_11_0_0_offset.h"
>>>> +#include "gc/gc_11_0_0_sh_mask.h"
>>>> +
>>>> +static int
>>>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>>> +{
>>>> +    uint32_t tmp, rb_bufsz;
>>>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
>>>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>> +
>>>> +    /* set up gfx hqd wptr */
>>>> +    mqd->cp_gfx_hqd_wptr = 0;
>>>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
>>>> +
>>>> +    /* set the pointer to the MQD */
>>>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
>>>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
>>>> +
>>>> +    /* set up mqd control */
>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
>>>> +    mqd->cp_gfx_mqd_control = tmp;
>>>> +
>>>> +    /* set up gfx_hqd_vmid with 0x0 to indicate the ring buffer's vmid */
>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
>>>> +    mqd->cp_gfx_hqd_vmid = 0;
>>>> +
>>>> +    /* set up default queue priority level
>>>> +    * 0x0 = low priority, 0x1 = high priority */
>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
>>>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
>>>> +
>>>> +    /* set up time quantum */
>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
>>>> +    mqd->cp_gfx_hqd_quantum = tmp;
>>>> +
>>>> +    /* set up gfx hqd base. this is similar to CP_RB_BASE */
>>>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
>>>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
>>>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
>>>> +
>>>> +    /* set up hqd_rptr_addr/_hi, similar to CP_RB_RPTR */
>>>> +    wb_gpu_addr = queue->rptr_gpu_addr;
>>>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
>>>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
>>>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
>>>> +
>>>> +    /* set up rb_wptr_poll addr */
>>>> +    wb_gpu_addr = queue->wptr_gpu_addr;
>>>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
>>>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
>>>> +
>>>> +    /* set up the gfx_hqd_control, similar to CP_RB0_CNTL */
>>>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
>>>> +#ifdef __BIG_ENDIAN
>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
>>>> +#endif
>>>> +    mqd->cp_gfx_hqd_cntl = tmp;
>>>> +
>>>> +    /* set up cp_doorbell_control */
>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
>>>> +    if (queue->use_doorbell) {
>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>> +                    DOORBELL_OFFSET, queue->doorbell_index);
>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>> +                    DOORBELL_EN, 1);
>>>> +    } else {
>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>> +                    DOORBELL_EN, 0);
>>>> +    }
>>>> +    mqd->cp_rb_doorbell_control = tmp;
>>>> +
>>>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
>>>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
>>>> +
>>>> +    /* activate the queue */
>>>> +    mqd->cp_gfx_hqd_active = 1;
>>>> +
>>> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
>>> directly or leverage adev->mqds[]?
>> Let us try this out and come back.
>>
>> - Shashank
>>
>>
>>> Alex
>>>
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void
>>>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>>> +{
>>>> +
>>>> +}
>>>> +
>>>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>>>> +{
>>>> +    return sizeof(struct v11_gfx_mqd);
>>>> +}
>>>> +
>>>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>>>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>>>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>>>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>>>> +};
>>>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
>>>> index b8ff7456ae0b..f8008270f813 100644
>>>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
>>>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
>>>> @@ -25,14 +25,14 @@
>>>>    #define V11_STRUCTS_H_
>>>>
>>>>    struct v11_gfx_mqd {
>>>> -       uint32_t reserved_0; // offset: 0  (0x0)
>>>> -       uint32_t reserved_1; // offset: 1  (0x1)
>>>> -       uint32_t reserved_2; // offset: 2  (0x2)
>>>> -       uint32_t reserved_3; // offset: 3  (0x3)
>>>> -       uint32_t reserved_4; // offset: 4  (0x4)
>>>> -       uint32_t reserved_5; // offset: 5  (0x5)
>>>> -       uint32_t reserved_6; // offset: 6  (0x6)
>>>> -       uint32_t reserved_7; // offset: 7  (0x7)
>>>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
>>>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
>>>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
>>>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
>>>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
>>>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
>>>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
>>>> +       uint32_t ib_vmid; // offset: 7  (0x7)
>>>>           uint32_t reserved_8; // offset: 8  (0x8)
>>>>           uint32_t reserved_9; // offset: 9  (0x9)
>>>>           uint32_t reserved_10; // offset: 10  (0xA)
>>>> --
>>>> 2.34.1
>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-03 21:54 ` [PATCH 5/8] drm/amdgpu: Create context for usermode queue Shashank Sharma
  2023-02-07  7:14   ` Christian König
@ 2023-02-07 16:51   ` Alex Deucher
  2023-02-07 16:53     ` Alex Deucher
  1 sibling, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 16:51 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: alexander.deucher, christian.koenig, amd-gfx

On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> The FW expects us to allocate at least one page as context space to
> process gang, process, shadow, GDS and FW_space related work. This
> patch creates objects for these, and adds IP-specific
> functions to do this.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
>  .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 ++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
>  3 files changed, 171 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 9f3490a91776..18281b3a51f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
>      return idr_find(&uq_mgr->userq_idr, qid);
>  }
>
> +static void
> +amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                   struct amdgpu_usermode_queue *queue)
> +{
> +    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
> +}
> +
> +static int
> +amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                  struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +
> +    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create context space for queue\n");
> +        return r;
> +    }
> +
> +    return 0;
> +}
> +
>  static int
>  amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>  {
> @@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>          goto free_qid;
>      }
>
> +    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create context space\n");
> +        goto free_mqd;
> +    }
> +
>      list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
>      args->out.q_id = queue->queue_id;
>      args->out.flags = 0;
>      mutex_unlock(&uq_mgr->userq_mutex);
>      return 0;
>
> +free_mqd:
> +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
> +
>  free_qid:
>      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>
> @@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>      }
>
>      mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
>      amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>      list_del(&queue->userq_node);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> index 57889729d635..687f90a587e3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> @@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>
>  }
>
> +static int amdgpu_userq_gfx_v11_ctx_create(struct amdgpu_userq_mgr *uq_mgr,
> +                                           struct amdgpu_usermode_queue *queue)
> +{
> +    int r;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
> +
> +    /*
> +     * The FW expects at least one page of space allocated for
> +     * process context related work, and one for gang context.
> +     */
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &pctx->obj,
> +                                &pctx->gpu_addr,
> +                                &pctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", r);
> +        return r;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &gctx->obj,
> +                                &gctx->gpu_addr,
> +                                &gctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", r);
> +        goto err_gangctx;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &gdsctx->obj,
> +                                &gdsctx->gpu_addr,
> +                                &gdsctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
> +        goto err_gdsctx;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &fwctx->obj,
> +                                &fwctx->gpu_addr,
> +                                &fwctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
> +        goto err_fwctx;
> +    }
> +
> +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_VRAM,
> +                                &sctx->obj,
> +                                &sctx->gpu_addr,
> +                                &sctx->cpu_ptr);


Unless there is a specific need for VRAM, we should probably put these in GTT.

Alex

> +    if (r) {
> +        DRM_ERROR("Failed to allocate shadow bo for userqueue (%d)", r);
> +        goto err_sctx;
> +    }
> +
> +    return 0;
> +
> +err_sctx:
> +    amdgpu_bo_free_kernel(&fwctx->obj,
> +                          &fwctx->gpu_addr,
> +                          &fwctx->cpu_ptr);
> +
> +err_fwctx:
> +    amdgpu_bo_free_kernel(&gdsctx->obj,
> +                          &gdsctx->gpu_addr,
> +                          &gdsctx->cpu_ptr);
> +
> +err_gdsctx:
> +    amdgpu_bo_free_kernel(&gctx->obj,
> +                          &gctx->gpu_addr,
> +                          &gctx->cpu_ptr);
> +
> +err_gangctx:
> +    amdgpu_bo_free_kernel(&pctx->obj,
> +                          &pctx->gpu_addr,
> +                          &pctx->cpu_ptr);
> +    return r;
> +}
> +
> +static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
> +                                            struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
> +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
> +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
> +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
> +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
> +
> +    amdgpu_bo_free_kernel(&sctx->obj,
> +                          &sctx->gpu_addr,
> +                          &sctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&fwctx->obj,
> +                          &fwctx->gpu_addr,
> +                          &fwctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&gdsctx->obj,
> +                          &gdsctx->gpu_addr,
> +                          &gdsctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&gctx->obj,
> +                          &gctx->gpu_addr,
> +                          &gctx->cpu_ptr);
> +
> +    amdgpu_bo_free_kernel(&pctx->obj,
> +                          &pctx->gpu_addr,
> +                          &pctx->cpu_ptr);
> +}
> +
>  static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>  {
>      return sizeof(struct v11_gfx_mqd);
> @@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>      .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>      .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>      .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> +    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
> +    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
>  };
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index a6abdfd5cb74..3adcd31618f7 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -25,9 +25,19 @@
>  #define AMDGPU_USERQUEUE_H_
>
>  #define AMDGPU_MAX_USERQ 512
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>
>  struct amdgpu_userq_mqd_funcs;
>
> +struct amdgpu_userq_ctx {
> +       struct amdgpu_bo *obj;
> +       uint64_t gpu_addr;
> +       void    *cpu_ptr;
> +};
> +
>  struct amdgpu_userq_mgr {
>         struct idr userq_idr;
>         struct mutex userq_mutex;
> @@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
>         uint64_t        mqd_gpu_addr;
>         void            *mqd_cpu_ptr;
>
> +       struct amdgpu_userq_ctx proc_ctx;
> +       struct amdgpu_userq_ctx gang_ctx;
> +       struct amdgpu_userq_ctx gds_ctx;
> +       struct amdgpu_userq_ctx fw_ctx;
> +       struct amdgpu_userq_ctx shadow_ctx;
> +
>         struct amdgpu_bo        *mqd_obj;
>         struct amdgpu_vm        *vm;
>         struct amdgpu_userq_mgr *userq_mgr;
> @@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
>         int (*mqd_size)(struct amdgpu_userq_mgr *);
>         int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>         void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +       int (*ctx_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +       void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>  };
>
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/8] drm/amdgpu: Create context for usermode queue
  2023-02-07 16:51   ` Alex Deucher
@ 2023-02-07 16:53     ` Alex Deucher
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 16:53 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: alexander.deucher, christian.koenig, amd-gfx

On Tue, Feb 7, 2023 at 11:51 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Fri, Feb 3, 2023 at 4:54 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >
> > The FW expects us to allocate at least one page as context space to
> > process gang, process, shadow, GDS and FW_space related work. This
> > patch creates objects for these, and adds IP-specific
> > functions to do this.
> >
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  32 +++++
> >  .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 121 ++++++++++++++++++
> >  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  18 +++
> >  3 files changed, 171 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> > index 9f3490a91776..18281b3a51f1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> > @@ -42,6 +42,28 @@ static struct amdgpu_usermode_queue
> >      return idr_find(&uq_mgr->userq_idr, qid);
> >  }
> >
> > +static void
> > +amdgpu_userqueue_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> > +                                   struct amdgpu_usermode_queue *queue)
> > +{
> > +    uq_mgr->userq_mqd_funcs->ctx_destroy(uq_mgr, queue);
> > +}
> > +
> > +static int
> > +amdgpu_userqueue_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> > +                                  struct amdgpu_usermode_queue *queue)
> > +{
> > +    int r;
> > +
> > +    r = uq_mgr->userq_mqd_funcs->ctx_create(uq_mgr, queue);
> > +    if (r) {
> > +        DRM_ERROR("Failed to create context space for queue\n");
> > +        return r;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static int
> >  amdgpu_userqueue_create_mqd(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >  {
> > @@ -142,12 +164,21 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
> >          goto free_qid;
> >      }
> >
> > +    r = amdgpu_userqueue_create_ctx_space(uq_mgr, queue);
> > +    if (r) {
> > +        DRM_ERROR("Failed to create context space\n");
> > +        goto free_mqd;
> > +    }
> > +
> >      list_add_tail(&queue->userq_node, &uq_mgr->userq_list);
> >      args->out.q_id = queue->queue_id;
> >      args->out.flags = 0;
> >      mutex_unlock(&uq_mgr->userq_mutex);
> >      return 0;
> >
> > +free_mqd:
> > +    amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
> > +
> >  free_qid:
> >      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> >
> > @@ -170,6 +201,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
> >      }
> >
> >      mutex_lock(&uq_mgr->userq_mutex);
> > +    amdgpu_userqueue_destroy_ctx_space(uq_mgr, queue);
> >      amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
> >      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> >      list_del(&queue->userq_node);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> > index 57889729d635..687f90a587e3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> > @@ -120,6 +120,125 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
> >
> >  }
> >
> > +static int amdgpu_userq_gfx_v11_ctx_create(struct amdgpu_userq_mgr *uq_mgr,
> > +                                           struct amdgpu_usermode_queue *queue)
> > +{
> > +    int r;
> > +    struct amdgpu_device *adev = uq_mgr->adev;
> > +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
> > +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
> > +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
> > +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
> > +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
> > +
> > +    /*
> > +     * The FW expects at least one page of space allocated for
> > +     * process context related work, and one for gang context.
> > +     */
> > +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_PROC_CTX_SZ, PAGE_SIZE,
> > +                                AMDGPU_GEM_DOMAIN_VRAM,
> > +                                &pctx->obj,
> > +                                &pctx->gpu_addr,
> > +                                &pctx->cpu_ptr);
> > +    if (r) {
> > +        DRM_ERROR("Failed to allocate proc bo for userqueue (%d)", r);
> > +        return r;
> > +    }
> > +
> > +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GANG_CTX_SZ, PAGE_SIZE,
> > +                                AMDGPU_GEM_DOMAIN_VRAM,
> > +                                &gctx->obj,
> > +                                &gctx->gpu_addr,
> > +                                &gctx->cpu_ptr);
> > +    if (r) {
> > +        DRM_ERROR("Failed to allocate gang bo for userqueue (%d)", r);
> > +        goto err_gangctx;
> > +    }
> > +
> > +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_GDS_CTX_SZ, PAGE_SIZE,
> > +                                AMDGPU_GEM_DOMAIN_VRAM,
> > +                                &gdsctx->obj,
> > +                                &gdsctx->gpu_addr,
> > +                                &gdsctx->cpu_ptr);
> > +    if (r) {
> > +        DRM_ERROR("Failed to allocate GDS bo for userqueue (%d)", r);
> > +        goto err_gdsctx;
> > +    }
> > +
> > +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
> > +                                AMDGPU_GEM_DOMAIN_VRAM,
> > +                                &fwctx->obj,
> > +                                &fwctx->gpu_addr,
> > +                                &fwctx->cpu_ptr);
> > +    if (r) {
> > +        DRM_ERROR("Failed to allocate FW bo for userqueue (%d)", r);
> > +        goto err_fwctx;
> > +    }
> > +
> > +    r = amdgpu_bo_create_kernel(adev, AMDGPU_USERQ_FW_CTX_SZ, PAGE_SIZE,
> > +                                AMDGPU_GEM_DOMAIN_VRAM,
> > +                                &sctx->obj,
> > +                                &sctx->gpu_addr,
> > +                                &sctx->cpu_ptr);
>
>
> Unless there is a specific need for VRAM, we should probably put these in GTT.
>
> Alex
>
> > +    if (r) {
> > +        DRM_ERROR("Failed to allocate shadow bo for userqueue (%d)", r);
> > +        goto err_sctx;
> > +    }
> > +
> > +    return 0;
> > +
> > +err_sctx:
> > +    amdgpu_bo_free_kernel(&fwctx->obj,
> > +                          &fwctx->gpu_addr,
> > +                          &fwctx->cpu_ptr);
> > +
> > +err_fwctx:
> > +    amdgpu_bo_free_kernel(&gdsctx->obj,
> > +                          &gdsctx->gpu_addr,
> > +                          &gdsctx->cpu_ptr);
> > +
> > +err_gdsctx:
> > +    amdgpu_bo_free_kernel(&gctx->obj,
> > +                          &gctx->gpu_addr,
> > +                          &gctx->cpu_ptr);
> > +
> > +err_gangctx:
> > +    amdgpu_bo_free_kernel(&pctx->obj,
> > +                          &pctx->gpu_addr,
> > +                          &pctx->cpu_ptr);
> > +    return r;
> > +}
> > +
> > +static void amdgpu_userq_gfx_v11_ctx_destroy(struct amdgpu_userq_mgr *uq_mgr,
> > +                                            struct amdgpu_usermode_queue *queue)
> > +{
> > +    struct amdgpu_userq_ctx *pctx = &queue->proc_ctx;
> > +    struct amdgpu_userq_ctx *gctx = &queue->gang_ctx;
> > +    struct amdgpu_userq_ctx *gdsctx = &queue->gds_ctx;
> > +    struct amdgpu_userq_ctx *fwctx = &queue->fw_ctx;
> > +    struct amdgpu_userq_ctx *sctx = &queue->shadow_ctx;
> > +
> > +    amdgpu_bo_free_kernel(&sctx->obj,
> > +                          &sctx->gpu_addr,
> > +                          &sctx->cpu_ptr);
> > +
> > +    amdgpu_bo_free_kernel(&fwctx->obj,
> > +                          &fwctx->gpu_addr,
> > +                          &fwctx->cpu_ptr);
> > +
> > +    amdgpu_bo_free_kernel(&gdsctx->obj,
> > +                          &gdsctx->gpu_addr,
> > +                          &gdsctx->cpu_ptr);
> > +
> > +    amdgpu_bo_free_kernel(&gctx->obj,
> > +                          &gctx->gpu_addr,
> > +                          &gctx->cpu_ptr);
> > +
> > +    amdgpu_bo_free_kernel(&pctx->obj,
> > +                          &pctx->gpu_addr,
> > +                          &pctx->cpu_ptr);
> > +}
> > +
> >  static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
> >  {
> >      return sizeof(struct v11_gfx_mqd);
> > @@ -129,4 +248,6 @@ const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
> >      .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
> >      .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> >      .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> > +    .ctx_create = amdgpu_userq_gfx_v11_ctx_create,
> > +    .ctx_destroy = amdgpu_userq_gfx_v11_ctx_destroy,
> >  };
> > diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> > index a6abdfd5cb74..3adcd31618f7 100644
> > --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> > +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> > @@ -25,9 +25,19 @@
> >  #define AMDGPU_USERQUEUE_H_
> >
> >  #define AMDGPU_MAX_USERQ 512
> > +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> > +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> > +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
> > +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
> >
> >  struct amdgpu_userq_mqd_funcs;
> >
> > +struct amdgpu_userq_ctx {
> > +       struct amdgpu_bo *obj;
> > +       uint64_t gpu_addr;
> > +       void    *cpu_ptr;
> > +};
> > +
> >  struct amdgpu_userq_mgr {
> >         struct idr userq_idr;
> >         struct mutex userq_mutex;
> > @@ -52,6 +62,12 @@ struct amdgpu_usermode_queue {
> >         uint64_t        mqd_gpu_addr;
> >         void            *mqd_cpu_ptr;
> >
> > +       struct amdgpu_userq_ctx proc_ctx;
> > +       struct amdgpu_userq_ctx gang_ctx;
> > +       struct amdgpu_userq_ctx gds_ctx;
> > +       struct amdgpu_userq_ctx fw_ctx;
> > +       struct amdgpu_userq_ctx shadow_ctx;

These should be an implementation detail for the specific IP.  There
is no need to have these at the userq level.  Different engines may
have different requirements.

Alex

> > +
> >         struct amdgpu_bo        *mqd_obj;
> >         struct amdgpu_vm        *vm;
> >         struct amdgpu_userq_mgr *userq_mgr;
> > @@ -64,6 +80,8 @@ struct amdgpu_userq_mqd_funcs {
> >         int (*mqd_size)(struct amdgpu_userq_mgr *);
> >         int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> >         void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> > +       int (*ctx_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> > +       void (*ctx_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> >  };
> >
> >  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 16:37         ` Shashank Sharma
@ 2023-02-07 16:54           ` Alex Deucher
  2023-02-07 17:13             ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 16:54 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx

On Tue, Feb 7, 2023 at 11:37 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 07/02/2023 17:05, Alex Deucher wrote:
> > On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>
> >> On 07/02/2023 16:17, Alex Deucher wrote:
> >>> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
> >>>>
> >>>> MQD describes the properties of a user queue to the HW, and allows it to
> >>>> accurately configure the queue while mapping it in GPU HW. This patch
> >>>> adds:
> >>>> - A new header file which contains the userqueue MQD definition for
> >>>>     V11 graphics engine.
> >>>> - A new function which fills it with userqueue data and prepares MQD
> >>>> - A function which sets-up the MQD function ptrs in the generic userqueue
> >>>>     creation code.
> >>>>
> >>>> V1: Addressed review comments from RFC patch series
> >>>>       - Reuse the existing MQD structure instead of creating a new one
> >>>>       - MQD format and creation can be IP specific, keep it like that
> >>>>
> >>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>> ---
> >>>>    drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
> >>>>    .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
> >>>>    drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
> >>>>    4 files changed, 169 insertions(+), 8 deletions(-)
> >>>>    create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>> index 764801cc8203..6ae9d5792791 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
> >>>>
> >>>>    # add usermode queue
> >>>>    amdgpu-y += amdgpu_userqueue.o
> >>>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
> >>>>
> >>>>    ifneq ($(CONFIG_HSA_AMD),)
> >>>>    AMDKFD_PATH := ../amdkfd
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>> index 625c2fe1e84a..9f3490a91776 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> >>>>        return r;
> >>>>    }
> >>>>
> >>>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
> >>>> +
> >>>> +static int
> >>>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
> >>>> +{
> >>>> +    int maj;
> >>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> >>>> +
> >>>> +    maj = IP_VERSION_MAJ(version);
> >>>> +    if (maj == 11) {
> >>>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
> >>>> +    } else {
> >>>> +        DRM_WARN("This IP doesn't support usermode queues\n");
> >>>> +        return -EINVAL;
> >>>> +    }
> >>>> +
> >>> I think it would be cleaner to just store these callbacks in adev.
> >>> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
> >>> in early_init for each IP, we can register the callbacks.  When the
> >>> user goes to create a new user_queue, we can check check to see if the
> >>> function pointer is NULL or not for the queue type:
> >>>
> >>> if (!adev->user_queue_funcs[ip_type])
> >>>     return -EINVAL
> >>>
> >>> r = adev->user_queue_funcs[ip_type]->create_queue();
> >> Sounds like a good idea, we can do this.
> >>
> >>> Actually, there is already an mqd manager interface (adev->mqds[]).
> >>> Maybe you can leverage that interface.
> >> Yep, I saw that and initially even tried to work on that interface
> >> itself, and then realized that it doesn't allow us to pass some
> >> additional parameters (like queue->vm, and various BOs like proc_ctx_bo,
> >> gang_ctx_bo, and so on). All of these are required in the MQD,
> >> and we will need them to be added into the MQD. I even thought of
> >> expanding this structure with additional parameters, but I felt like
> >> it defeats the purpose of the MQD properties. But if you feel strongly
> >> about that, we can work around it.
> > I think it would be cleaner to just add whatever additional mqd
> > properties you need to amdgpu_mqd_prop, and then you can share
> > gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
> > sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
> > configuration, we only have to change one function.
> >
> > Alex
>
> Noted,
>
> We might have to add some additional fptrs for .prepare_map() and
> .prepare_unmap() in the mqd funcs.
>
> These are required to prepare the data for MES HW queue mapping.

OK.  I think we could start with just using the existing init_mqd
callbacks from your create/destroy queue functions for now.  That
said, do we need the prepare_(un)map callbacks?  I think just
create/destroy callbacks should be fine.  In the create callback, we
can init the mqd and map it, then in destroy, we can unmap and free.

Alex



>
> - Shashank
>
> >
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>>    int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> >>>>    {
> >>>> +    int r;
> >>>> +
> >>>>        mutex_init(&userq_mgr->userq_mutex);
> >>>>        idr_init_base(&userq_mgr->userq_idr, 1);
> >>>>        INIT_LIST_HEAD(&userq_mgr->userq_list);
> >>>>        userq_mgr->adev = adev;
> >>>>
> >>>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
> >>>> +    if (r) {
> >>>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
> >>>> +        return r;
> >>>> +    }
> >>>> +
> >>>>        return 0;
> >>>>    }
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>> new file mode 100644
> >>>> index 000000000000..57889729d635
> >>>> --- /dev/null
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>> @@ -0,0 +1,132 @@
> >>>> +/*
> >>>> + * Copyright 2022 Advanced Micro Devices, Inc.
> >>>> + *
> >>>> + * Permission is hereby granted, free of charge, to any person obtaining a
> >>>> + * copy of this software and associated documentation files (the "Software"),
> >>>> + * to deal in the Software without restriction, including without limitation
> >>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >>>> + * and/or sell copies of the Software, and to permit persons to whom the
> >>>> + * Software is furnished to do so, subject to the following conditions:
> >>>> + *
> >>>> + * The above copyright notice and this permission notice shall be included in
> >>>> + * all copies or substantial portions of the Software.
> >>>> + *
> >>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> >>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> >>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> >>>> + * OTHER DEALINGS IN THE SOFTWARE.
> >>>> + *
> >>>> + */
> >>>> +#include "amdgpu.h"
> >>>> +#include "amdgpu_userqueue.h"
> >>>> +#include "v11_structs.h"
> >>>> +#include "amdgpu_mes.h"
> >>>> +#include "gc/gc_11_0_0_offset.h"
> >>>> +#include "gc/gc_11_0_0_sh_mask.h"
> >>>> +
> >>>> +static int
> >>>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >>>> +{
> >>>> +    uint32_t tmp, rb_bufsz;
> >>>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
> >>>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
> >>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>> +
> >>>> +    /* set up gfx hqd wptr */
> >>>> +    mqd->cp_gfx_hqd_wptr = 0;
> >>>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
> >>>> +
> >>>> +    /* set the pointer to the MQD */
> >>>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
> >>>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
> >>>> +
> >>>> +    /* set up mqd control */
> >>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
> >>>> +    mqd->cp_gfx_mqd_control = tmp;
> >>>> +
> >>>> +    /* set up gfx_hqd_vimd with 0x0 to indicate the ring buffer's vmid */
> >>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
> >>>> +    mqd->cp_gfx_hqd_vmid = 0;
> >>>> +
> >>>> +    /* set up default queue priority level
> >>>> +    * 0x0 = low priority, 0x1 = high priority */
> >>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
> >>>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
> >>>> +
> >>>> +    /* set up time quantum */
> >>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
> >>>> +    mqd->cp_gfx_hqd_quantum = tmp;
> >>>> +
> >>>> +    /* set up gfx hqd base. this is similar as CP_RB_BASE */
> >>>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
> >>>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
> >>>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
> >>>> +
> >>>> +    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
> >>>> +    wb_gpu_addr = queue->rptr_gpu_addr;
> >>>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
> >>>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
> >>>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
> >>>> +
> >>>> +    /* set up rb_wptr_poll addr */
> >>>> +    wb_gpu_addr = queue->wptr_gpu_addr;
> >>>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
> >>>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
> >>>> +
> >>>> +    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
> >>>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
> >>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
> >>>> +#ifdef __BIG_ENDIAN
> >>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
> >>>> +#endif
> >>>> +    mqd->cp_gfx_hqd_cntl = tmp;
> >>>> +
> >>>> +    /* set up cp_doorbell_control */
> >>>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
> >>>> +    if (queue->use_doorbell) {
> >>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>> +                    DOORBELL_OFFSET, queue->doorbell_index);
> >>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>> +                    DOORBELL_EN, 1);
> >>>> +    } else {
> >>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>> +                    DOORBELL_EN, 0);
> >>>> +    }
> >>>> +    mqd->cp_rb_doorbell_control = tmp;
> >>>> +
> >>>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
> >>>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
> >>>> +
> >>>> +    /* activate the queue */
> >>>> +    mqd->cp_gfx_hqd_active = 1;
> >>>> +
> >>> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
> >>> directly or leverage adev->mqds[]?
> >> Let us try this out and come back.
> >>
> >> - Shashank
> >>
> >>
> >>> Alex
> >>>
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +static void
> >>>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >>>> +{
> >>>> +
> >>>> +}
> >>>> +
> >>>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
> >>>> +{
> >>>> +    return sizeof(struct v11_gfx_mqd);
> >>>> +}
> >>>> +
> >>>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
> >>>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
> >>>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> >>>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> >>>> +};
> >>>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
> >>>> index b8ff7456ae0b..f8008270f813 100644
> >>>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
> >>>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
> >>>> @@ -25,14 +25,14 @@
> >>>>    #define V11_STRUCTS_H_
> >>>>
> >>>>    struct v11_gfx_mqd {
> >>>> -       uint32_t reserved_0; // offset: 0  (0x0)
> >>>> -       uint32_t reserved_1; // offset: 1  (0x1)
> >>>> -       uint32_t reserved_2; // offset: 2  (0x2)
> >>>> -       uint32_t reserved_3; // offset: 3  (0x3)
> >>>> -       uint32_t reserved_4; // offset: 4  (0x4)
> >>>> -       uint32_t reserved_5; // offset: 5  (0x5)
> >>>> -       uint32_t reserved_6; // offset: 6  (0x6)
> >>>> -       uint32_t reserved_7; // offset: 7  (0x7)
> >>>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
> >>>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
> >>>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
> >>>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
> >>>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
> >>>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
> >>>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
> >>>> +       uint32_t ib_vmid; // offset: 7  (0x7)
> >>>>           uint32_t reserved_8; // offset: 8  (0x8)
> >>>>           uint32_t reserved_9; // offset: 9  (0x9)
> >>>>           uint32_t reserved_10; // offset: 10  (0xA)
> >>>> --
> >>>> 2.34.1
> >>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 16:54           ` Alex Deucher
@ 2023-02-07 17:13             ` Shashank Sharma
  2023-02-07 17:57               ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07 17:13 UTC (permalink / raw)
  To: Alex Deucher
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx


On 07/02/2023 17:54, Alex Deucher wrote:
> On Tue, Feb 7, 2023 at 11:37 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 07/02/2023 17:05, Alex Deucher wrote:
>>> On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> On 07/02/2023 16:17, Alex Deucher wrote:
>>>>> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>>>>
> >>>>>> MQD describes the properties of a user queue to the HW, and allows it to
>>>>>> accurately configure the queue while mapping it in GPU HW. This patch
>>>>>> adds:
>>>>>> - A new header file which contains the userqueue MQD definition for
>>>>>>      V11 graphics engine.
>>>>>> - A new function which fills it with userqueue data and prepares MQD
>>>>>> - A function which sets-up the MQD function ptrs in the generic userqueue
>>>>>>      creation code.
>>>>>>
>>>>>> V1: Addressed review comments from RFC patch series
>>>>>>        - Reuse the existing MQD structure instead of creating a new one
>>>>>>        - MQD format and creation can be IP specific, keep it like that
>>>>>>
>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
>>>>>>     .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
>>>>>>     drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
>>>>>>     4 files changed, 169 insertions(+), 8 deletions(-)
>>>>>>     create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>> index 764801cc8203..6ae9d5792791 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>>>>>>
>>>>>>     # add usermode queue
>>>>>>     amdgpu-y += amdgpu_userqueue.o
>>>>>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
>>>>>>
>>>>>>     ifneq ($(CONFIG_HSA_AMD),)
>>>>>>     AMDKFD_PATH := ../amdkfd
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>> index 625c2fe1e84a..9f3490a91776 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>>>>>>         return r;
>>>>>>     }
>>>>>>
>>>>>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
>>>>>> +
>>>>>> +static int
>>>>>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
>>>>>> +{
>>>>>> +    int maj;
>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>>>>> +
>>>>>> +    maj = IP_VERSION_MAJ(version);
>>>>>> +    if (maj == 11) {
>>>>>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
>>>>>> +    } else {
>>>>>> +        DRM_WARN("This IP doesn't support usermode queues\n");
>>>>>> +        return -EINVAL;
>>>>>> +    }
>>>>>> +
>>>>> I think it would be cleaner to just store these callbacks in adev.
>>>>> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
>>>>> in early_init for each IP, we can register the callbacks.  When the
> >>>>> user goes to create a new user_queue, we can check to see if the
>>>>> function pointer is NULL or not for the queue type:
>>>>>
>>>>> if (!adev->user_queue_funcs[ip_type])
>>>>>      return -EINVAL
>>>>>
>>>>> r = adev->user_queue_funcs[ip_type]->create_queue();
>>>> Sounds like a good idea, we can do this.
>>>>
>>>>> Actually, there is already an mqd manager interface (adev->mqds[]).
>>>>> Maybe you can leverage that interface.
>>>> Yep, I saw that and initially even tried to work on that interface
>>>> itself, and then realized that it doesn't allow us to pass some
>>>>
>>>> additional parameters (like queue->vm, various BOs like proc_ctx_bo,
>>>> gang_ctx_bo's and so on). All of these are required in the MQD
>>>>
>>>> and we will need them to be added into MQD. I even thought of expanding
>>>> this structure with additional parameters, but I felt like
>>>>
>>>> it defeats the purpose of this MQD properties. But if you feel strongly
>>>> about that, we can work around it.
>>> I think it would be cleaner to just add whatever additional mqd
>>> properties you need to amdgpu_mqd_prop, and then you can share
>>> gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
>>> sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
>>> configuration, we only have to change one function.
>>>
>>> Alex
>> Noted,
>>
>> We might have to add some additional fptrs for .prepare_map() and
>> .prepare_unmap() in the mqd funcs.
>>
>> These are required to prepare data for MES HW queue mapping.
> OK.  I think we could start with just using the existing init_mqd
> callbacks from your create/destroy queue functions for now.
Ok,
> That
> said, do we need the prepare_(un)map callbacks?  I think just
> > create/destroy callbacks should be fine.  In the create callback, we
> can init the mqd and map it, then in destroy, we can unmap and free.

If you observe the kernel MES framework, it expects the data to be fed
in a particular format, in the form of queue_properties, and creates the
map_queue_packet using those. So we need to re-arrange the data we have
in the MQD or drm_mqd_in into that properties format, which is being
done in prepare_map/unmap. Now, as the MQD is IP specific, we will need
this function to be IP specific as well, so I added a new fptr callback.


So the idea here is, IP-specific stuff like:

- preparing the MQD
- preparing the properties for map_queue_packet
- preparing the context BOs

is being done in IP-specific functions in amdgpu_vxx_userqueue.c, and:

- initializing the queue
- handling the IOCTL
- adding/mapping the queue to MES
- any bookkeeping

is being done from the IP-independent amdgpu_userqueue.c functions.
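This split could be sketched roughly as follows, with a per-IP callback table registered once at init time and the IP-independent IOCTL path only validating and dispatching. All sketch_* names are hypothetical, not the real amdgpu symbols, and -22 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HW_IP_GFX	0
#define SKETCH_HW_IP_NUM	2

struct sketch_userq_funcs {
	int (*create_queue)(void);	/* IP-specific: MQD, ctx BOs, map */
};

/* NULL entry means the IP has no usermode-queue support. */
static const struct sketch_userq_funcs *sketch_user_queue_funcs[SKETCH_HW_IP_NUM];

/* IP-specific side: would fill the MQD and prepare the
 * map_queue_packet properties for MES. */
static int sketch_gfx_v11_create_queue(void)
{
	return 0;
}

static const struct sketch_userq_funcs sketch_gfx_v11_funcs = {
	.create_queue = sketch_gfx_v11_create_queue,
};

/* Done once, analogous to registering callbacks in early_init. */
static void sketch_register_ips(void)
{
	sketch_user_queue_funcs[SKETCH_HW_IP_GFX] = &sketch_gfx_v11_funcs;
}

/* IP-independent IOCTL path: bookkeeping plus dispatch. */
static int sketch_create_userq_ioctl(int ip_type)
{
	if (ip_type < 0 || ip_type >= SKETCH_HW_IP_NUM ||
	    !sketch_user_queue_funcs[ip_type])
		return -22;	/* unsupported IP: -EINVAL */
	return sketch_user_queue_funcs[ip_type]->create_queue();
}
```

The generic path never touches IP-specific data; unsupported IP types fail the NULL check before any dispatch happens.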

- Shashank
> Alex
>
>
>
>> - Shashank
>>
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>>>>>>     {
>>>>>> +    int r;
>>>>>> +
>>>>>>         mutex_init(&userq_mgr->userq_mutex);
>>>>>>         idr_init_base(&userq_mgr->userq_idr, 1);
>>>>>>         INIT_LIST_HEAD(&userq_mgr->userq_list);
>>>>>>         userq_mgr->adev = adev;
>>>>>>
>>>>>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>>>>>> +    if (r) {
>>>>>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
>>>>>> +        return r;
>>>>>> +    }
>>>>>> +
>>>>>>         return 0;
>>>>>>     }
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>>> new file mode 100644
>>>>>> index 000000000000..57889729d635
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>>> @@ -0,0 +1,132 @@
>>>>>> +/*
>>>>>> + * Copyright 2022 Advanced Micro Devices, Inc.
>>>>>> + *
>>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>>>> + * copy of this software and associated documentation files (the "Software"),
>>>>>> + * to deal in the Software without restriction, including without limitation
>>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>>>> + * Software is furnished to do so, subject to the following conditions:
>>>>>> + *
>>>>>> + * The above copyright notice and this permission notice shall be included in
>>>>>> + * all copies or substantial portions of the Software.
>>>>>> + *
>>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>>>>> + *
>>>>>> + */
>>>>>> +#include "amdgpu.h"
>>>>>> +#include "amdgpu_userqueue.h"
>>>>>> +#include "v11_structs.h"
>>>>>> +#include "amdgpu_mes.h"
>>>>>> +#include "gc/gc_11_0_0_offset.h"
>>>>>> +#include "gc/gc_11_0_0_sh_mask.h"
>>>>>> +
>>>>>> +static int
>>>>>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>>>>> +{
>>>>>> +    uint32_t tmp, rb_bufsz;
>>>>>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
>>>>>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>> +
>>>>>> +    /* set up gfx hqd wptr */
>>>>>> +    mqd->cp_gfx_hqd_wptr = 0;
>>>>>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
>>>>>> +
>>>>>> +    /* set the pointer to the MQD */
>>>>>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
>>>>>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
>>>>>> +
>>>>>> +    /* set up mqd control */
>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
>>>>>> +    mqd->cp_gfx_mqd_control = tmp;
>>>>>> +
>>>>>> +    /* set up gfx_hqd_vimd with 0x0 to indicate the ring buffer's vmid */
>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
>>>>>> +    mqd->cp_gfx_hqd_vmid = 0;
>>>>>> +
>>>>>> +    /* set up default queue priority level
>>>>>> +    * 0x0 = low priority, 0x1 = high priority */
>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
>>>>>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
>>>>>> +
>>>>>> +    /* set up time quantum */
>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
>>>>>> +    mqd->cp_gfx_hqd_quantum = tmp;
>>>>>> +
>>>>>> +    /* set up gfx hqd base. this is similar as CP_RB_BASE */
>>>>>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
>>>>>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
>>>>>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
>>>>>> +
>>>>>> +    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
>>>>>> +    wb_gpu_addr = queue->rptr_gpu_addr;
>>>>>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
>>>>>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
>>>>>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
>>>>>> +
>>>>>> +    /* set up rb_wptr_poll addr */
>>>>>> +    wb_gpu_addr = queue->wptr_gpu_addr;
>>>>>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
>>>>>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
>>>>>> +
>>>>>> +    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
>>>>>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
>>>>>> +#ifdef __BIG_ENDIAN
>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
>>>>>> +#endif
>>>>>> +    mqd->cp_gfx_hqd_cntl = tmp;
>>>>>> +
>>>>>> +    /* set up cp_doorbell_control */
>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
>>>>>> +    if (queue->use_doorbell) {
>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>>>> +                    DOORBELL_OFFSET, queue->doorbell_index);
>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>>>> +                    DOORBELL_EN, 1);
>>>>>> +    } else {
>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>>>> +                    DOORBELL_EN, 0);
>>>>>> +    }
>>>>>> +    mqd->cp_rb_doorbell_control = tmp;
>>>>>> +
>>>>>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
>>>>>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
>>>>>> +
>>>>>> +    /* activate the queue */
>>>>>> +    mqd->cp_gfx_hqd_active = 1;
>>>>>> +
>>>>> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
>>>>> directly or leverage adev->mqds[]?
>>>> Let us try this out and come back.
>>>>
>>>> - Shashank
>>>>
>>>>
>>>>> Alex
>>>>>
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>>>>> +{
>>>>>> +
>>>>>> +}
>>>>>> +
>>>>>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>>>>>> +{
>>>>>> +    return sizeof(struct v11_gfx_mqd);
>>>>>> +}
>>>>>> +
>>>>>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>>>>>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>>>>>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>>>>>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>>>>>> +};
>>>>>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
>>>>>> index b8ff7456ae0b..f8008270f813 100644
>>>>>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
>>>>>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
>>>>>> @@ -25,14 +25,14 @@
>>>>>>     #define V11_STRUCTS_H_
>>>>>>
>>>>>>     struct v11_gfx_mqd {
>>>>>> -       uint32_t reserved_0; // offset: 0  (0x0)
>>>>>> -       uint32_t reserved_1; // offset: 1  (0x1)
>>>>>> -       uint32_t reserved_2; // offset: 2  (0x2)
>>>>>> -       uint32_t reserved_3; // offset: 3  (0x3)
>>>>>> -       uint32_t reserved_4; // offset: 4  (0x4)
>>>>>> -       uint32_t reserved_5; // offset: 5  (0x5)
>>>>>> -       uint32_t reserved_6; // offset: 6  (0x6)
>>>>>> -       uint32_t reserved_7; // offset: 7  (0x7)
>>>>>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
>>>>>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
>>>>>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
>>>>>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
>>>>>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
>>>>>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
>>>>>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
>>>>>> +       uint32_t ib_vmid; // offset: 7  (0x7)
>>>>>>            uint32_t reserved_8; // offset: 8  (0x8)
>>>>>>            uint32_t reserved_9; // offset: 9  (0x9)
>>>>>>            uint32_t reserved_10; // offset: 10  (0xA)
>>>>>> --
>>>>>> 2.34.1
>>>>>>


* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 17:13             ` Shashank Sharma
@ 2023-02-07 17:57               ` Alex Deucher
  2023-02-07 18:28                 ` Shashank Sharma
  0 siblings, 1 reply; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 17:57 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx

On Tue, Feb 7, 2023 at 12:14 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 07/02/2023 17:54, Alex Deucher wrote:
> > On Tue, Feb 7, 2023 at 11:37 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>
> >> On 07/02/2023 17:05, Alex Deucher wrote:
> >>> On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>> On 07/02/2023 16:17, Alex Deucher wrote:
> >>>>> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
> >>>>>>
> >>>>>> MQD describes the properties of a user queue to the HW, and allows it to
> >>>>>> accurately configure the queue while mapping it in GPU HW. This patch
> >>>>>> adds:
> >>>>>> - A new header file which contains the userqueue MQD definition for
> >>>>>>      V11 graphics engine.
> >>>>>> - A new function which fills it with userqueue data and prepares MQD
> >>>>>> - A function which sets-up the MQD function ptrs in the generic userqueue
> >>>>>>      creation code.
> >>>>>>
> >>>>>> V1: Addressed review comments from RFC patch series
> >>>>>>        - Reuse the existing MQD structure instead of creating a new one
> >>>>>>        - MQD format and creation can be IP specific, keep it like that
> >>>>>>
> >>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
> >>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
> >>>>>>     .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
> >>>>>>     drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
> >>>>>>     4 files changed, 169 insertions(+), 8 deletions(-)
> >>>>>>     create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>>>> index 764801cc8203..6ae9d5792791 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>>>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
> >>>>>>
> >>>>>>     # add usermode queue
> >>>>>>     amdgpu-y += amdgpu_userqueue.o
> >>>>>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
> >>>>>>
> >>>>>>     ifneq ($(CONFIG_HSA_AMD),)
> >>>>>>     AMDKFD_PATH := ../amdkfd
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>> index 625c2fe1e84a..9f3490a91776 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> >>>>>>         return r;
> >>>>>>     }
> >>>>>>
> >>>>>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
> >>>>>> +
> >>>>>> +static int
> >>>>>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
> >>>>>> +{
> >>>>>> +    int maj;
> >>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> >>>>>> +
> >>>>>> +    maj = IP_VERSION_MAJ(version);
> >>>>>> +    if (maj == 11) {
> >>>>>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
> >>>>>> +    } else {
> >>>>>> +        DRM_WARN("This IP doesn't support usermode queues\n");
> >>>>>> +        return -EINVAL;
> >>>>>> +    }
> >>>>>> +
> >>>>> I think it would be cleaner to just store these callbacks in adev.
> >>>>> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
> >>>>> in early_init for each IP, we can register the callbacks.  When the
> >>>>> user goes to create a new user_queue, we can check to see if the
> >>>>> function pointer is NULL or not for the queue type:
> >>>>>
> >>>>> if (!adev->user_queue_funcs[ip_type])
> >>>>>      return -EINVAL
> >>>>>
> >>>>> r = adev->user_queue_funcs[ip_type]->create_queue();
> >>>> Sounds like a good idea, we can do this.
> >>>>
> >>>>> Actually, there is already an mqd manager interface (adev->mqds[]).
> >>>>> Maybe you can leverage that interface.
> >>>> Yep, I saw that and initially even tried to work on that interface
> >>>> itself, and then realized that it doesn't allow us to pass some
> >>>>
> >>>> additional parameters (like queue->vm, various BOs like proc_ctx_bo,
> >>>> gang_ctx_bo's and so on). All of these are required in the MQD
> >>>>
> >>>> and we will need them to be added into MQD. I even thought of expanding
> >>>> this structure with additional parameters, but I felt like
> >>>>
> >>>> it defeats the purpose of this MQD properties. But if you feel strongly
> >>>> about that, we can work around it.
> >>> I think it would be cleaner to just add whatever additional mqd
> >>> properties you need to amdgpu_mqd_prop, and then you can share
> >>> gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
> >>> sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
> >>> configuration, we only have to change one function.
> >>>
> >>> Alex
> >> Noted,
> >>
> >> We might have to add some additional fptrs for .prepare_map() and
> >> .prepare_unmap() in the mqd funcs.
> >>
> >> These are required to prepare data for MES HW queue mapping.
> > OK.  I think we could start with just using the existing init_mqd
> > callbacks from your create/destroy queue functions for now.
> Ok,
> > That
> > said, do we need the prepare_(un)map callbacks?  I think just
> > create/destroy callbacks should be fine.  In the create callback, we
> > can init the mqd and map it, then in destroy, we can unmap and free.
>
> If you observe the kernel MES framework, it expects the data to be fed
> in a particular format, in the form of queue_properties, and creates
> the map_queue_packet using those. So we need to re-arrange the data we
> have in the MQD or drm_mqd_in into that properties format, which is
> being done in prepare_map/unmap. Now, as the MQD is IP specific, we
> will need this function to be IP specific as well, so I added a new
> fptr callback.
>
>
> So the idea here is, IP specific stuff like:
>
> - preparing the MQD
>
> - preparing the properties for map_queue_packet
>
> - preparing the context BOs
>
> is being done in IP specific functions in amdgpu_vxx_userqueue.c
>
>
> and
>
> - initializing the queue
>
> - handling the IOCTL
>
> - adding/mapping the queue to MES

This seems weird to me.  Why have this in the asic independent code?
I was thinking the IOCTL would mostly just be a wrapper around IP
specific callbacks for create and destroy.  The callback would take a
generic mqd struct as a parameter, that was basically just a
passthrough from the IOCTL mqd struct.

struct amdgpu_user_queue_mqd {
    u32 flags;
    struct amdgpu_bo doorbell_bo;
    u32 doorbell_offset;
    struct amdgpu_bo queue_bo;
    struct amdgpu_bo rptr_bo;
    struct amdgpu_bo wptr_bo;
    u64 queue_gpu_va;
    u64 rptr_gpu_va;
    u64 wptr_gpu_va;
    int gang;
    ...
};

Then something like:

static int gfx_v11_0_create_gfx_user_queue(struct amdgpu_device *adev,
struct amdgpu_user_queue_mqd *user_mqd)
{
    struct gfx_v11_mqd *mqd;

    mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
    ...
    // allocate any meta data, ctx buffers, etc.
    mqd->ctx_bo = amdgpu_bo_create();
    ...
    // populate the IP specific mqd with the generic stuff
    mqd->mqd_gpu_addr = user_mqd->queue_gpu_va;
    ...
    // init mqd
    r = adev->mqds[AMDGPU_HW_IP_GFX].init_mqd();
    // add gang, or increase ref count
    r = amdgpu_mes_add_gang();
    // map mqd
    r = amdgpu_mes_add_ring();
}

static int gfx_v11_0_destroy_gfx_user_queue(struct amdgpu_device
*adev, struct amdgpu_user_queue_mqd *user_mqd)
{
    // unmap mqd
    amdgpu_mes_remove_ring();
    // drop reference to the gang
    amdgpu_mes_remove_gang();

    // free any meta data, ctx buffers, etc.
    amdgpu_bo_unref(mqd->ctx_bo);
    kfree(mqd);
}

>
> - any bookkeeping
>
> is being done from the IP independent amdgpu_userqueue.c functions.
>
> - Shashank
> > Alex
> >
> >
> > Alex
> >
> >> - Shashank
> >>
> >>>>>> +    return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>>     int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> >>>>>>     {
> >>>>>> +    int r;
> >>>>>> +
> >>>>>>         mutex_init(&userq_mgr->userq_mutex);
> >>>>>>         idr_init_base(&userq_mgr->userq_idr, 1);
> >>>>>>         INIT_LIST_HEAD(&userq_mgr->userq_list);
> >>>>>>         userq_mgr->adev = adev;
> >>>>>>
> >>>>>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
> >>>>>> +    if (r) {
> >>>>>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
> >>>>>> +        return r;
> >>>>>> +    }
> >>>>>> +
> >>>>>>         return 0;
> >>>>>>     }
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>>> new file mode 100644
> >>>>>> index 000000000000..57889729d635
> >>>>>> --- /dev/null
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>>> @@ -0,0 +1,132 @@
> >>>>>> +/*
> >>>>>> + * Copyright 2022 Advanced Micro Devices, Inc.
> >>>>>> + *
> >>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
> >>>>>> + * copy of this software and associated documentation files (the "Software"),
> >>>>>> + * to deal in the Software without restriction, including without limitation
> >>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
> >>>>>> + * Software is furnished to do so, subject to the following conditions:
> >>>>>> + *
> >>>>>> + * The above copyright notice and this permission notice shall be included in
> >>>>>> + * all copies or substantial portions of the Software.
> >>>>>> + *
> >>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> >>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >>>>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> >>>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >>>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> >>>>>> + * OTHER DEALINGS IN THE SOFTWARE.
> >>>>>> + *
> >>>>>> + */
> >>>>>> +#include "amdgpu.h"
> >>>>>> +#include "amdgpu_userqueue.h"
> >>>>>> +#include "v11_structs.h"
> >>>>>> +#include "amdgpu_mes.h"
> >>>>>> +#include "gc/gc_11_0_0_offset.h"
> >>>>>> +#include "gc/gc_11_0_0_sh_mask.h"
> >>>>>> +
> >>>>>> +static int
> >>>>>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >>>>>> +{
> >>>>>> +    uint32_t tmp, rb_bufsz;
> >>>>>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
> >>>>>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
> >>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>>>> +
> >>>>>> +    /* set up gfx hqd wptr */
> >>>>>> +    mqd->cp_gfx_hqd_wptr = 0;
> >>>>>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
> >>>>>> +
> >>>>>> +    /* set the pointer to the MQD */
> >>>>>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
> >>>>>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
> >>>>>> +
> >>>>>> +    /* set up mqd control */
> >>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
> >>>>>> +    mqd->cp_gfx_mqd_control = tmp;
> >>>>>> +
> >>>>>> +    /* set up gfx_hqd_vimd with 0x0 to indicate the ring buffer's vmid */
> >>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
> >>>>>> +    mqd->cp_gfx_hqd_vmid = 0;
> >>>>>> +
> >>>>>> +    /* set up default queue priority level
> >>>>>> +    * 0x0 = low priority, 0x1 = high priority */
> >>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
> >>>>>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
> >>>>>> +
> >>>>>> +    /* set up time quantum */
> >>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
> >>>>>> +    mqd->cp_gfx_hqd_quantum = tmp;
> >>>>>> +
> >>>>>> +    /* set up gfx hqd base. this is similar as CP_RB_BASE */
> >>>>>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
> >>>>>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
> >>>>>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
> >>>>>> +
> >>>>>> +    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
> >>>>>> +    wb_gpu_addr = queue->rptr_gpu_addr;
> >>>>>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
> >>>>>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
> >>>>>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
> >>>>>> +
> >>>>>> +    /* set up rb_wptr_poll addr */
> >>>>>> +    wb_gpu_addr = queue->wptr_gpu_addr;
> >>>>>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
> >>>>>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
> >>>>>> +
> >>>>>> +    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
> >>>>>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
> >>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
> >>>>>> +#ifdef __BIG_ENDIAN
> >>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
> >>>>>> +#endif
> >>>>>> +    mqd->cp_gfx_hqd_cntl = tmp;
> >>>>>> +
> >>>>>> +    /* set up cp_doorbell_control */
> >>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
> >>>>>> +    if (queue->use_doorbell) {
> >>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>>>> +                    DOORBELL_OFFSET, queue->doorbell_index);
> >>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>>>> +                    DOORBELL_EN, 1);
> >>>>>> +    } else {
> >>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>>>> +                    DOORBELL_EN, 0);
> >>>>>> +    }
> >>>>>> +    mqd->cp_rb_doorbell_control = tmp;
> >>>>>> +
> >>>>>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
> >>>>>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
> >>>>>> +
> >>>>>> +    /* activate the queue */
> >>>>>> +    mqd->cp_gfx_hqd_active = 1;
> >>>>>> +
> >>>>> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
> >>>>> directly or leverage adev->mqds[]?
> >>>> Let us try this out and come back.
> >>>>
> >>>> - Shashank
> >>>>
> >>>>
> >>>>> Alex
> >>>>>
> >>>>>> +    return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void
> >>>>>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >>>>>> +{
> >>>>>> +
> >>>>>> +}
> >>>>>> +
> >>>>>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
> >>>>>> +{
> >>>>>> +    return sizeof(struct v11_gfx_mqd);
> >>>>>> +}
> >>>>>> +
> >>>>>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
> >>>>>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
> >>>>>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> >>>>>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> >>>>>> +};
> >>>>>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
> >>>>>> index b8ff7456ae0b..f8008270f813 100644
> >>>>>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
> >>>>>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
> >>>>>> @@ -25,14 +25,14 @@
> >>>>>>     #define V11_STRUCTS_H_
> >>>>>>
> >>>>>>     struct v11_gfx_mqd {
> >>>>>> -       uint32_t reserved_0; // offset: 0  (0x0)
> >>>>>> -       uint32_t reserved_1; // offset: 1  (0x1)
> >>>>>> -       uint32_t reserved_2; // offset: 2  (0x2)
> >>>>>> -       uint32_t reserved_3; // offset: 3  (0x3)
> >>>>>> -       uint32_t reserved_4; // offset: 4  (0x4)
> >>>>>> -       uint32_t reserved_5; // offset: 5  (0x5)
> >>>>>> -       uint32_t reserved_6; // offset: 6  (0x6)
> >>>>>> -       uint32_t reserved_7; // offset: 7  (0x7)
> >>>>>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
> >>>>>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
> >>>>>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
> >>>>>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
> >>>>>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
> >>>>>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
> >>>>>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
> >>>>>> +       uint32_t ib_vmid; // offset: 7  (0x7)
> >>>>>>            uint32_t reserved_8; // offset: 8  (0x8)
> >>>>>>            uint32_t reserved_9; // offset: 9  (0x9)
> >>>>>>            uint32_t reserved_10; // offset: 10  (0xA)
> >>>>>> --
> >>>>>> 2.34.1
> >>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 17:57               ` Alex Deucher
@ 2023-02-07 18:28                 ` Shashank Sharma
  2023-02-07 18:32                   ` Alex Deucher
  0 siblings, 1 reply; 50+ messages in thread
From: Shashank Sharma @ 2023-02-07 18:28 UTC (permalink / raw)
  To: Alex Deucher
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx


On 07/02/2023 18:57, Alex Deucher wrote:
> On Tue, Feb 7, 2023 at 12:14 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 07/02/2023 17:54, Alex Deucher wrote:
>>> On Tue, Feb 7, 2023 at 11:37 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> On 07/02/2023 17:05, Alex Deucher wrote:
>>>>> On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>>>> On 07/02/2023 16:17, Alex Deucher wrote:
>>>>>>> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>>>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>>>>>>
>>>>>>>> MQD describes the properies of a user queue to the HW, and allows it to
>>>>>>>> accurately configure the queue while mapping it in GPU HW. This patch
>>>>>>>> adds:
>>>>>>>> - A new header file which contains the userqueue MQD definition for
>>>>>>>>       V11 graphics engine.
>>>>>>>> - A new function which fills it with userqueue data and prepares MQD
>>>>>>>> - A function which sets-up the MQD function ptrs in the generic userqueue
>>>>>>>>       creation code.
>>>>>>>>
>>>>>>>> V1: Addressed review comments from RFC patch series
>>>>>>>>         - Reuse the existing MQD structure instead of creating a new one
>>>>>>>>         - MQD format and creation can be IP specific, keep it like that
>>>>>>>>
>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
>>>>>>>>      .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
>>>>>>>>      drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
>>>>>>>>      4 files changed, 169 insertions(+), 8 deletions(-)
>>>>>>>>      create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>>> index 764801cc8203..6ae9d5792791 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>>>>>>>>
>>>>>>>>      # add usermode queue
>>>>>>>>      amdgpu-y += amdgpu_userqueue.o
>>>>>>>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
>>>>>>>>
>>>>>>>>      ifneq ($(CONFIG_HSA_AMD),)
>>>>>>>>      AMDKFD_PATH := ../amdkfd
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> index 625c2fe1e84a..9f3490a91776 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>>>>>>>>          return r;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
>>>>>>>> +{
>>>>>>>> +    int maj;
>>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>>>>>>>> +
>>>>>>>> +    maj = IP_VERSION_MAJ(version);
>>>>>>>> +    if (maj == 11) {
>>>>>>>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
>>>>>>>> +    } else {
>>>>>>>> +        DRM_WARN("This IP doesn't support usermode queues\n");
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>> I think it would be cleaner to just store these callbacks in adev.
>>>>>>> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
>>>>>>> in early_init for each IP, we can register the callbacks.  When the
>>>>>>> user goes to create a new user_queue, we can check to see if the
>>>>>>> function pointer is NULL or not for the queue type:
>>>>>>>
>>>>>>> if (!adev->user_queue_funcs[ip_type])
>>>>>>>       return -EINVAL
>>>>>>>
>>>>>>> r = adev->user_queue_funcs[ip_type]->create_queue();
>>>>>> Sounds like a good idea, we can do this.
>>>>>>
>>>>>>> Actually, there is already an mqd manager interface (adev->mqds[]).
>>>>>>> Maybe you can leverage that interface.
>>>>>> Yep, I saw that and initially even tried to work on that interface
>>>>>> itself, and then realized that it doesn't allow us to pass some
>>>>>>
>>>>>> additional parameters (like queue->vm, various BOs like proc_ctx_bo,
>>>>>> gang_ctx_bo's and so on). All of these are required in the MQD
>>>>>>
>>>>>> and we will need them to be added into MQD. I even thought of expanding
>>>>>> this structure with additional parameters, but I felt like
>>>>>>
>>>>>> it defeats the purpose of this MQD properties. But if you feel strongly
>>>>>> about that, we can work around it.
>>>>> I think it would be cleaner to just add whatever additional mqd
>>>>> properties you need to amdgpu_mqd_prop, and then you can share
>>>>> gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
>>>>> sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
>>>>> configuration, we only have to change one function.
>>>>>
>>>>> Alex
>>>> Noted,
>>>>
>>>> We might have to add some additional fptrs for .prepare_map() and
>>>> .prepare_unmap(). in the mqd funcs.
>>>>
>>>> These are the required to prepare data for MES HW queue mapping.
>>> OK.  I think we could start with just using the existing init_mqd
>>> callbacks from your create/destroy queue functions for now.
>> Ok,
>>> That
>>> said, do we need the prepare_(un)map callbacks?  I think just
>>> create/destory callbacks should be fine.  In the create callback, we
>>> can init the mqd and map it, then in destroy, we can unmap and free.
>> If you observe the kernel MES framework, it expects the data to be fed
>> in a particular format, in form of queue_properties, and
>>
>> creates the map_queue_packet using those. So we need to re-arrange the
>> data we have in MQD or drm_mqd_in in format
>>
>> of properties, which is being done in prepare_map/unmap. Now, as the MQD
>> is IP specific, we will need this
>>
>> function to be IP specific as well, so I added a new fptr callback.
>>
>>
>> So the idea here is, IP specific stuff like:
>>
>> - preparing the MQD
>>
>> - preparing the properties for map_queue_packet
>>
>> - preparing the context BOs
>>
>> is being done in IP specific functions in amdgpu_vxx_userqueue.c
>>
>>
>> and
>>
>> - initializing the queue
>>
>> - handling the IOCTL
>>
>> - adding/mapping the queue to MES
> This seems weird to me.  Why have this in the asic independent code?
> I was thinking the IOCTL would mostly just be a wrapper around IP
> specific callbacks for create and destroy.  The callback would take a
> generic mqd struct as a parameter, that was basically just a
> passthrough from the IOCTL mqd struct.
>
> struct amdgpu_user_queue_mqd {
>      u32 flags;
>      struct amdgpu_bo doorbell_bo;
>      u32 doorbell_offset;
>      struct amdgpu_bo queue_bo;
>      struct amdgpu_bo rptr_bo;
>      struct amdgpu_bo wptr_bo;
>      u64 queue_gpu_va;
>      u64 rptr_gpu_va;
>      u64 wptr_gpu_va;
>      int gang;
>      ...
> };
>
> Then something like:
>
> static int gfx_v11_0_create_gfx_user_queue(struct amdgpu_device *adev,
> struct amdgpu_user_queue_mqd *user_mqd)
> {
>      struct gfx_v11_mqd *mqd;
>
>      mqd = kmalloc(sizeof(*mqd), GFP_KERNEL);
>      ...
>      // allocate any meta data, ctx buffers, etc.
>      mqd->ctx_bo = amdgpu_bo_create();
>      ...
>      // populate the IP specific mqd with the generic stuff
>      mqd->mqd_gpu_addr = user_mqd->queue_gpu_va;
>      ...
>      // init mqd
>      r = adev->mqds[AMDGPU_HW_IP_GFX].init_mqd();

Actually, we are doing the same thing, but instead of doing it in one 
large function we are doing it in 3 smaller functions.

Instead of having one big create_v11_mqd function, we have split its 
functionality into 3 parts:

- create_mqd_v11, create_ctx_mqd_v11, prepare_map_mqd_v11

I thought it would be easier to read, maintain and review modular 
functions, each handling a specific part. But I can probably get rid of 
the fptrs for them.

The IP independent functions are mostly passthroughs that arrange the 
data, handle memory management changes and call these IP-specific 
functions.

>      // add gang, or increase ref count
>      r = amdgpu_mes_add_gang();
>      // map mqd
>      r = amdgpu_mes_add_ring();

We can't directly use most of these amdgpu_mes_* APIs, as they expect 
the data to be created and arranged in the MES metadata format, which is 
not aligned with how we are getting/preparing the data.


But I am getting your design points, and based on your inputs, I can try 
to re-arrange the code like this:

- Reuse the existing MQD mgr and its fptrs (create/destroy mqd) for the 
add/destroy queue functionality.

- mqd_create() can reuse the existing mqd_init() call and then 
internally call the create_ctx() and prepare_map() functions from the 
same file, so we don't need separate fptrs.

- The amdgpu_usermode_queue.c file can contain just the following:
    - the init/fini functions

    - the IOCTL handler

    - two wrappers, to call the IP specific create/destroy functions.

This should take us close to what you are expecting.

- Shashank


> }
> static int gfx_v11_0_destroy_gfx_user_queue(struct amdgpu_device
> *adev, struct amdgpu_user_queue_mqd *user_mqd)
> {
>      // unmap mqd
>      amdgpu_mes_remove_ring();
>      // drop reference to the gang
>      amdgpu_mes_remove_gang();
>
>      // free any meta data, ctx buffers, etc.
>      amdgpu_bo_unref(mqd->ctx_bo);
>     kfree(mqd);
> }
>
>> - any bookkeeping
>>
>> is being done from the IP independent amdgpu_userqueue.c functions.
>>
>> - Shashank
>>> Alex
>>>
>>>
>>> Alex
>>>
>>>> - Shashank
>>>>
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>>>>>>>>      {
>>>>>>>> +    int r;
>>>>>>>> +
>>>>>>>>          mutex_init(&userq_mgr->userq_mutex);
>>>>>>>>          idr_init_base(&userq_mgr->userq_idr, 1);
>>>>>>>>          INIT_LIST_HEAD(&userq_mgr->userq_list);
>>>>>>>>          userq_mgr->adev = adev;
>>>>>>>>
>>>>>>>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>>>>>>>> +    if (r) {
>>>>>>>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
>>>>>>>> +        return r;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          return 0;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..57889729d635
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
>>>>>>>> @@ -0,0 +1,132 @@
>>>>>>>> +/*
>>>>>>>> + * Copyright 2022 Advanced Micro Devices, Inc.
>>>>>>>> + *
>>>>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>>>>>> + * copy of this software and associated documentation files (the "Software"),
>>>>>>>> + * to deal in the Software without restriction, including without limitation
>>>>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>>>>>> + * Software is furnished to do so, subject to the following conditions:
>>>>>>>> + *
>>>>>>>> + * The above copyright notice and this permission notice shall be included in
>>>>>>>> + * all copies or substantial portions of the Software.
>>>>>>>> + *
>>>>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>>>>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>>>>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>>>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>>>>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>>>>>>> + *
>>>>>>>> + */
>>>>>>>> +#include "amdgpu.h"
>>>>>>>> +#include "amdgpu_userqueue.h"
>>>>>>>> +#include "v11_structs.h"
>>>>>>>> +#include "amdgpu_mes.h"
>>>>>>>> +#include "gc/gc_11_0_0_offset.h"
>>>>>>>> +#include "gc/gc_11_0_0_sh_mask.h"
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>>>>>>> +{
>>>>>>>> +    uint32_t tmp, rb_bufsz;
>>>>>>>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
>>>>>>>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
>>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>>> +
>>>>>>>> +    /* set up gfx hqd wptr */
>>>>>>>> +    mqd->cp_gfx_hqd_wptr = 0;
>>>>>>>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
>>>>>>>> +
>>>>>>>> +    /* set the pointer to the MQD */
>>>>>>>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
>>>>>>>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
>>>>>>>> +
>>>>>>>> +    /* set up mqd control */
>>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
>>>>>>>> +    mqd->cp_gfx_mqd_control = tmp;
>>>>>>>> +
>>>>>>>> +    /* set up gfx_hqd_vimd with 0x0 to indicate the ring buffer's vmid */
>>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
>>>>>>>> +    mqd->cp_gfx_hqd_vmid = 0;
>>>>>>>> +
>>>>>>>> +    /* set up default queue priority level
>>>>>>>> +    * 0x0 = low priority, 0x1 = high priority */
>>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
>>>>>>>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
>>>>>>>> +
>>>>>>>> +    /* set up time quantum */
>>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
>>>>>>>> +    mqd->cp_gfx_hqd_quantum = tmp;
>>>>>>>> +
>>>>>>>> +    /* set up gfx hqd base. this is similar as CP_RB_BASE */
>>>>>>>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
>>>>>>>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
>>>>>>>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
>>>>>>>> +
>>>>>>>> +    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
>>>>>>>> +    wb_gpu_addr = queue->rptr_gpu_addr;
>>>>>>>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
>>>>>>>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
>>>>>>>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
>>>>>>>> +
>>>>>>>> +    /* set up rb_wptr_poll addr */
>>>>>>>> +    wb_gpu_addr = queue->wptr_gpu_addr;
>>>>>>>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
>>>>>>>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
>>>>>>>> +
>>>>>>>> +    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
>>>>>>>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
>>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
>>>>>>>> +#ifdef __BIG_ENDIAN
>>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
>>>>>>>> +#endif
>>>>>>>> +    mqd->cp_gfx_hqd_cntl = tmp;
>>>>>>>> +
>>>>>>>> +    /* set up cp_doorbell_control */
>>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
>>>>>>>> +    if (queue->use_doorbell) {
>>>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>>>>>> +                    DOORBELL_OFFSET, queue->doorbell_index);
>>>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>>>>>> +                    DOORBELL_EN, 1);
>>>>>>>> +    } else {
>>>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
>>>>>>>> +                    DOORBELL_EN, 0);
>>>>>>>> +    }
>>>>>>>> +    mqd->cp_rb_doorbell_control = tmp;
>>>>>>>> +
>>>>>>>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
>>>>>>>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
>>>>>>>> +
>>>>>>>> +    /* activate the queue */
>>>>>>>> +    mqd->cp_gfx_hqd_active = 1;
>>>>>>>> +
>>>>>>> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
>>>>>>> directly or leverage adev->mqds[]?
>>>>>> Let us try this out and come back.
>>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>>>>>>> +{
>>>>>>>> +
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
>>>>>>>> +{
>>>>>>>> +    return sizeof(struct v11_gfx_mqd);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
>>>>>>>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
>>>>>>>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>>>>>>>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>>>>>>>> +};
>>>>>>>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
>>>>>>>> index b8ff7456ae0b..f8008270f813 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
>>>>>>>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
>>>>>>>> @@ -25,14 +25,14 @@
>>>>>>>>      #define V11_STRUCTS_H_
>>>>>>>>
>>>>>>>>      struct v11_gfx_mqd {
>>>>>>>> -       uint32_t reserved_0; // offset: 0  (0x0)
>>>>>>>> -       uint32_t reserved_1; // offset: 1  (0x1)
>>>>>>>> -       uint32_t reserved_2; // offset: 2  (0x2)
>>>>>>>> -       uint32_t reserved_3; // offset: 3  (0x3)
>>>>>>>> -       uint32_t reserved_4; // offset: 4  (0x4)
>>>>>>>> -       uint32_t reserved_5; // offset: 5  (0x5)
>>>>>>>> -       uint32_t reserved_6; // offset: 6  (0x6)
>>>>>>>> -       uint32_t reserved_7; // offset: 7  (0x7)
>>>>>>>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
>>>>>>>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
>>>>>>>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
>>>>>>>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
>>>>>>>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
>>>>>>>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
>>>>>>>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
>>>>>>>> +       uint32_t ib_vmid; // offset: 7  (0x7)
>>>>>>>>             uint32_t reserved_8; // offset: 8  (0x8)
>>>>>>>>             uint32_t reserved_9; // offset: 9  (0x9)
>>>>>>>>             uint32_t reserved_10; // offset: 10  (0xA)
>>>>>>>> --
>>>>>>>> 2.34.1
>>>>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions
  2023-02-07 18:28                 ` Shashank Sharma
@ 2023-02-07 18:32                   ` Alex Deucher
  0 siblings, 0 replies; 50+ messages in thread
From: Alex Deucher @ 2023-02-07 18:32 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: alexander.deucher, Arvind Yadav, Shashank Sharma,
	christian.koenig, amd-gfx

On Tue, Feb 7, 2023 at 1:28 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 07/02/2023 18:57, Alex Deucher wrote:
> > On Tue, Feb 7, 2023 at 12:14 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>
> >> On 07/02/2023 17:54, Alex Deucher wrote:
> >>> On Tue, Feb 7, 2023 at 11:37 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>> On 07/02/2023 17:05, Alex Deucher wrote:
> >>>>> On Tue, Feb 7, 2023 at 10:43 AM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>>>> On 07/02/2023 16:17, Alex Deucher wrote:
> >>>>>>> On Fri, Feb 3, 2023 at 4:55 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>>>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
> >>>>>>>>
> >>>>>>>> MQD describes the properies of a user queue to the HW, and allows it to
> >>>>>>>> accurately configure the queue while mapping it in GPU HW. This patch
> >>>>>>>> adds:
> >>>>>>>> - A new header file which contains the userqueue MQD definition for
> >>>>>>>>       V11 graphics engine.
> >>>>>>>> - A new function which fills it with userqueue data and prepares MQD
> >>>>>>>> - A function which sets-up the MQD function ptrs in the generic userqueue
> >>>>>>>>       creation code.
> >>>>>>>>
> >>>>>>>> V1: Addressed review comments from RFC patch series
> >>>>>>>>         - Reuse the existing MQD structure instead of creating a new one
> >>>>>>>>         - MQD format and creation can be IP specific, keep it like that
> >>>>>>>>
> >>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>>>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> >>>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>>>> ---
> >>>>>>>>      drivers/gpu/drm/amd/amdgpu/Makefile           |   1 +
> >>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  28 ++++
> >>>>>>>>      .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 132 ++++++++++++++++++
> >>>>>>>>      drivers/gpu/drm/amd/include/v11_structs.h     |  16 +--
> >>>>>>>>      4 files changed, 169 insertions(+), 8 deletions(-)
> >>>>>>>>      create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>>>>>> index 764801cc8203..6ae9d5792791 100644
> >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>>>>>>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
> >>>>>>>>
> >>>>>>>>      # add usermode queue
> >>>>>>>>      amdgpu-y += amdgpu_userqueue.o
> >>>>>>>> +amdgpu-y += amdgpu_userqueue_mqd_gfx_v11.o
> >>>>>>>>
> >>>>>>>>      ifneq ($(CONFIG_HSA_AMD),)
> >>>>>>>>      AMDKFD_PATH := ../amdkfd
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>>>> index 625c2fe1e84a..9f3490a91776 100644
> >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>>>> @@ -202,13 +202,41 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> >>>>>>>>          return r;
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>> +extern const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs;
> >>>>>>>> +
> >>>>>>>> +static int
> >>>>>>>> +amdgpu_userqueue_setup_mqd_funcs(struct amdgpu_userq_mgr *uq_mgr)
> >>>>>>>> +{
> >>>>>>>> +    int maj;
> >>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>>>>>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> >>>>>>>> +
> >>>>>>>> +    maj = IP_VERSION_MAJ(version);
> >>>>>>>> +    if (maj == 11) {
> >>>>>>>> +        uq_mgr->userq_mqd_funcs = &userq_gfx_v11_mqd_funcs;
> >>>>>>>> +    } else {
> >>>>>>>> +        DRM_WARN("This IP doesn't support usermode queues\n");
> >>>>>>>> +        return -EINVAL;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>> I think it would be cleaner to just store these callbacks in adev.
> >>>>>>> Maybe something like adev->user_queue_funcs[AMDGPU_HW_IP_NUM].  Then
> >>>>>>> in early_init for each IP, we can register the callbacks.  When the
> >>>>>>> user goes to create a new user_queue, we can check to see if the
> >>>>>>> function pointer is NULL or not for the queue type:
> >>>>>>>
> >>>>>>> if (!adev->user_queue_funcs[ip_type])
> >>>>>>>       return -EINVAL
> >>>>>>>
> >>>>>>> r = adev->user_queue_funcs[ip_type]->create_queue();
> >>>>>> Sounds like a good idea, we can do this.
> >>>>>>
> >>>>>>> Actually, there is already an mqd manager interface (adev->mqds[]).
> >>>>>>> Maybe you can leverage that interface.
> >>>>>> Yep, I saw that and initially even tried to work with that interface
> >>>>>> itself, but then realized that it doesn't allow us to pass some
> >>>>>> additional parameters (like queue->vm and various BOs like proc_ctx_bo,
> >>>>>> gang_ctx_bo and so on). All of these are required in the MQD, and we
> >>>>>> will need them to be added into it. I even thought of expanding this
> >>>>>> structure with additional parameters, but I felt like it defeats the
> >>>>>> purpose of the MQD properties. But if you feel strongly about that,
> >>>>>> we can work around it.
> >>>>> I think it would be cleaner to just add whatever additional mqd
> >>>>> properties you need to amdgpu_mqd_prop, and then you can share
> >>>>> gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()  for GFX and
> >>>>> sdma_v6_0_mqd_init() for SDMA.  That way if we make changes to the MQD
> >>>>> configuration, we only have to change one function.
> >>>>>
> >>>>> Alex
> >>>> Noted,
> >>>>
> >>>> We might have to add some additional fptrs for .prepare_map() and
> >>>> .prepare_unmap() in the mqd funcs.
> >>>>
> >>>> These are required to prepare the data for MES HW queue mapping.
> >>> OK.  I think we could start with just using the existing init_mqd
> >>> callbacks from your create/destroy queue functions for now.
> >> Ok,
> >>> That
> >>> said, do we need the prepare_(un)map callbacks?  I think just
> >>> create/destroy callbacks should be fine.  In the create callback, we
> >>> can init the mqd and map it, then in destroy, we can unmap and free.
> >> If you observe the kernel MES framework, it expects the data to be fed
> >> in a particular format, in the form of queue_properties, and creates the
> >> map_queue_packet using those. So we need to re-arrange the data we have
> >> in the MQD or drm_mqd_in into the format of those properties, which is
> >> done in prepare_map/unmap. Now, as the MQD is IP specific, we will need
> >> this function to be IP specific as well, so I added a new fptr callback.
> >>
> >>
> >> So the idea here is, IP specific stuff like:
> >> - preparing the MQD
> >> - preparing the properties for map_queue_packet
> >> - preparing the context BOs
> >> is being done in IP specific functions in amdgpu_vxx_userqueue.c,
> >>
> >> and
> >> - initializing the queue
> >> - handling the IOCTL
> >> - adding/mapping the queue to MES
> > This seems weird to me.  Why have this in the asic independent code?
> > I was thinking the IOCTL would mostly just be a wrapper around IP
> > specific callbacks for create and destroy.  The callback would take a
> > generic mqd struct as a parameter, that was basically just a
> > passthrough from the IOCTL mqd struct.
> >
> > struct amdgpu_user_queue_mqd {
> >      u32 flags;
> >      struct amdgpu_bo doorbell_bo;
> >      u32 doorbell_offset;
> >      struct amdgpu_bo queue_bo;
> >      struct amdgpu_bo rptr_bo;
> >      struct amdgpu_bo wptr_bo;
> >      u64 queue_gpu_va;
> >      u64 rptr_gpu_va;
> >      u64 wptr_gpu_va;
> >      int gang;
> >      ...
> > };
> >
> > Then something like:
> >
> > static int gfx_v11_0_create_gfx_user_queue(struct amdgpu_device *adev,
> > struct amdgpu_user_queue_mqd *user_mqd)
> > {
> >      struct gfx_v11_mqd *mqd;
> >
> >      mqd = kmalloc(sizeof(*mqd), GFP_KERNEL);
> >      ...
> >      // allocate any meta data, ctx buffers, etc.
> >      mqd->ctx_bo = amdgpu_bo_create();
> >      ...
> >      // populate the IP specific mqd with the generic stuff
> >      mqd->mqd_gpu_addr = user_mqd->queue_gpu_va;
> >      ...
> >      // init mqd
> >      r = adev->mqds[AMDGPU_HW_IP_GFX].init_mqd();
>
> Actually, we are doing the same thing, but instead of doing it in one
> large function we are doing it in 3 smaller functions: instead of having
> one big create_v11_mqd function, we have split its functionality into 3
> parts:
>
> - create_mqd_v11, create_ctx_mqd_v11, prepare_map_mqd_v11
>
> I thought it would be easier to read, maintain and review modular
> functions for each specific part. But probably I can get rid of the
> fptrs for them.
>
> The IP independent functions are mostly passthroughs that arrange data,
> handle memory management changes, and call these IP functions.
>
> >      // add gang, or increase ref count
> >      r = amdgpu_mes_add_gang();
> >      // map mqd
> >      r = amdgpu_mes_add_ring();
>
> We can't directly use most of these amdgpu_mes_* APIs, as they expect
> the data to be created and arranged in the MES metadata format, which is
> not aligned with how we are getting/preparing the data.
>
>
> But I am getting your design points, and based on your inputs, I can try
> to re-arrange the code like this:
>
> - Reuse the existing MQD mgr and its fptrs (create/destroy mqd) for the
> add/destroy queue functionality.
>
> - mqd_create() can reuse the existing mqd_init() call and then
> internally call the create_ctx() and prepare_map() functions from the
> same file; we don't need separate fptrs.
>
> - The amdgpu_usermode_queue.c file can contain just the following:
>     - the init/fini functions
>     - the IOCTL handler
>     - two wrappers, to call the IP specific create/destroy functions.
>
> This should take us close to what you are expecting.

Yes, I think we are on the same page.

Thanks,

Alex

>
> - Shashank
>
>
> > }
> > static int gfx_v11_0_destroy_gfx_user_queue(struct amdgpu_device
> > *adev, struct amdgpu_user_queue_mqd *user_mqd)
> > {
> >      // unmap mqd
> >      amdgpu_mes_remove_ring();
> >      // drop reference to the gang
> >      amdgpu_mes_remove_gang();
> >
> >      // free any meta data, ctx buffers, etc.
> >      amdgpu_bo_unref(mqd->ctx_bo);
> >     kfree(mqd);
> > }
> >
> >> - any bookkeeping
> >> is being done from the IP independent amdgpu_userqueue.c functions.
> >>
> >> - Shashank
> >>> Alex
> >>>
> >>>
> >>> Alex
> >>>
> >>>> - Shashank
> >>>>
> >>>>>>>> +    return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>      int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> >>>>>>>>      {
> >>>>>>>> +    int r;
> >>>>>>>> +
> >>>>>>>>          mutex_init(&userq_mgr->userq_mutex);
> >>>>>>>>          idr_init_base(&userq_mgr->userq_idr, 1);
> >>>>>>>>          INIT_LIST_HEAD(&userq_mgr->userq_list);
> >>>>>>>>          userq_mgr->adev = adev;
> >>>>>>>>
> >>>>>>>> +    r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
> >>>>>>>> +    if (r) {
> >>>>>>>> +        DRM_ERROR("Failed to setup MQD functions for usermode queue\n");
> >>>>>>>> +        return r;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>>          return 0;
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>>>>> new file mode 100644
> >>>>>>>> index 000000000000..57889729d635
> >>>>>>>> --- /dev/null
> >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> >>>>>>>> @@ -0,0 +1,132 @@
> >>>>>>>> +/*
> >>>>>>>> + * Copyright 2022 Advanced Micro Devices, Inc.
> >>>>>>>> + *
> >>>>>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
> >>>>>>>> + * copy of this software and associated documentation files (the "Software"),
> >>>>>>>> + * to deal in the Software without restriction, including without limitation
> >>>>>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >>>>>>>> + * and/or sell copies of the Software, and to permit persons to whom the
> >>>>>>>> + * Software is furnished to do so, subject to the following conditions:
> >>>>>>>> + *
> >>>>>>>> + * The above copyright notice and this permission notice shall be included in
> >>>>>>>> + * all copies or substantial portions of the Software.
> >>>>>>>> + *
> >>>>>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> >>>>>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >>>>>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >>>>>>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> >>>>>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >>>>>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> >>>>>>>> + * OTHER DEALINGS IN THE SOFTWARE.
> >>>>>>>> + *
> >>>>>>>> + */
> >>>>>>>> +#include "amdgpu.h"
> >>>>>>>> +#include "amdgpu_userqueue.h"
> >>>>>>>> +#include "v11_structs.h"
> >>>>>>>> +#include "amdgpu_mes.h"
> >>>>>>>> +#include "gc/gc_11_0_0_offset.h"
> >>>>>>>> +#include "gc/gc_11_0_0_sh_mask.h"
> >>>>>>>> +
> >>>>>>>> +static int
> >>>>>>>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >>>>>>>> +{
> >>>>>>>> +    uint32_t tmp, rb_bufsz;
> >>>>>>>> +    uint64_t hqd_gpu_addr, wb_gpu_addr;
> >>>>>>>> +    struct v11_gfx_mqd *mqd = queue->mqd_cpu_ptr;
> >>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
> >>>>>>>> +
> >>>>>>>> +    /* set up gfx hqd wptr */
> >>>>>>>> +    mqd->cp_gfx_hqd_wptr = 0;
> >>>>>>>> +    mqd->cp_gfx_hqd_wptr_hi = 0;
> >>>>>>>> +
> >>>>>>>> +    /* set the pointer to the MQD */
> >>>>>>>> +    mqd->cp_mqd_base_addr = queue->mqd_gpu_addr & 0xfffffffc;
> >>>>>>>> +    mqd->cp_mqd_base_addr_hi = upper_32_bits(queue->mqd_gpu_addr);
> >>>>>>>> +
> >>>>>>>> +    /* set up mqd control */
> >>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_MQD_CONTROL);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, VMID, 0);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, PRIV_STATE, 1);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_MQD_CONTROL, CACHE_POLICY, 0);
> >>>>>>>> +    mqd->cp_gfx_mqd_control = tmp;
> >>>>>>>> +
> >>>>>>>> +    /* set up gfx_hqd_vmid with 0x0 to indicate the ring buffer's vmid */
> >>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_VMID);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_VMID, VMID, 0);
> >>>>>>>> +    mqd->cp_gfx_hqd_vmid = 0;
> >>>>>>>> +
> >>>>>>>> +    /* set up default queue priority level
> >>>>>>>> +    * 0x0 = low priority, 0x1 = high priority */
> >>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUEUE_PRIORITY);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUEUE_PRIORITY, PRIORITY_LEVEL, 0);
> >>>>>>>> +    mqd->cp_gfx_hqd_queue_priority = tmp;
> >>>>>>>> +
> >>>>>>>> +    /* set up time quantum */
> >>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_QUANTUM);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_QUANTUM, QUANTUM_EN, 1);
> >>>>>>>> +    mqd->cp_gfx_hqd_quantum = tmp;
> >>>>>>>> +
> >>>>>>>> +    /* set up gfx hqd base. this is similar as CP_RB_BASE */
> >>>>>>>> +    hqd_gpu_addr = queue->queue_gpu_addr >> 8;
> >>>>>>>> +    mqd->cp_gfx_hqd_base = hqd_gpu_addr;
> >>>>>>>> +    mqd->cp_gfx_hqd_base_hi = upper_32_bits(hqd_gpu_addr);
> >>>>>>>> +
> >>>>>>>> +    /* set up hqd_rptr_addr/_hi, similar as CP_RB_RPTR */
> >>>>>>>> +    wb_gpu_addr = queue->rptr_gpu_addr;
> >>>>>>>> +    mqd->cp_gfx_hqd_rptr_addr = wb_gpu_addr & 0xfffffffc;
> >>>>>>>> +    mqd->cp_gfx_hqd_rptr_addr_hi =
> >>>>>>>> +    upper_32_bits(wb_gpu_addr) & 0xffff;
> >>>>>>>> +
> >>>>>>>> +    /* set up rb_wptr_poll addr */
> >>>>>>>> +    wb_gpu_addr = queue->wptr_gpu_addr;
> >>>>>>>> +    mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
> >>>>>>>> +    mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
> >>>>>>>> +
> >>>>>>>> +    /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
> >>>>>>>> +    rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
> >>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BUFSZ, rb_bufsz);
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, RB_BLKSZ, rb_bufsz - 2);
> >>>>>>>> +#ifdef __BIG_ENDIAN
> >>>>>>>> +    tmp = REG_SET_FIELD(tmp, CP_GFX_HQD_CNTL, BUF_SWAP, 1);
> >>>>>>>> +#endif
> >>>>>>>> +    mqd->cp_gfx_hqd_cntl = tmp;
> >>>>>>>> +
> >>>>>>>> +    /* set up cp_doorbell_control */
> >>>>>>>> +    tmp = RREG32_SOC15(GC, 0, regCP_RB_DOORBELL_CONTROL);
> >>>>>>>> +    if (queue->use_doorbell) {
> >>>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>>>>>> +                    DOORBELL_OFFSET, queue->doorbell_index);
> >>>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>>>>>> +                    DOORBELL_EN, 1);
> >>>>>>>> +    } else {
> >>>>>>>> +        tmp = REG_SET_FIELD(tmp, CP_RB_DOORBELL_CONTROL,
> >>>>>>>> +                    DOORBELL_EN, 0);
> >>>>>>>> +    }
> >>>>>>>> +    mqd->cp_rb_doorbell_control = tmp;
> >>>>>>>> +
> >>>>>>>> +    /* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
> >>>>>>>> +    mqd->cp_gfx_hqd_rptr = RREG32_SOC15(GC, 0, regCP_GFX_HQD_RPTR);
> >>>>>>>> +
> >>>>>>>> +    /* activate the queue */
> >>>>>>>> +    mqd->cp_gfx_hqd_active = 1;
> >>>>>>>> +
> >>>>>>> Can you use gfx_v11_0_gfx_mqd_init() and gfx_v11_0_compute_mqd_init()
> >>>>>>> directly or leverage adev->mqds[]?
> >>>>>> Let us try this out and come back.
> >>>>>>
> >>>>>> - Shashank
> >>>>>>
> >>>>>>
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>> +    return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static void
> >>>>>>>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> >>>>>>>> +{
> >>>>>>>> +
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static int amdgpu_userq_gfx_v11_mqd_size(struct amdgpu_userq_mgr *uq_mgr)
> >>>>>>>> +{
> >>>>>>>> +    return sizeof(struct v11_gfx_mqd);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +const struct amdgpu_userq_mqd_funcs userq_gfx_v11_mqd_funcs = {
> >>>>>>>> +    .mqd_size = amdgpu_userq_gfx_v11_mqd_size,
> >>>>>>>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> >>>>>>>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> >>>>>>>> +};
> >>>>>>>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
> >>>>>>>> index b8ff7456ae0b..f8008270f813 100644
> >>>>>>>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
> >>>>>>>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
> >>>>>>>> @@ -25,14 +25,14 @@
> >>>>>>>>      #define V11_STRUCTS_H_
> >>>>>>>>
> >>>>>>>>      struct v11_gfx_mqd {
> >>>>>>>> -       uint32_t reserved_0; // offset: 0  (0x0)
> >>>>>>>> -       uint32_t reserved_1; // offset: 1  (0x1)
> >>>>>>>> -       uint32_t reserved_2; // offset: 2  (0x2)
> >>>>>>>> -       uint32_t reserved_3; // offset: 3  (0x3)
> >>>>>>>> -       uint32_t reserved_4; // offset: 4  (0x4)
> >>>>>>>> -       uint32_t reserved_5; // offset: 5  (0x5)
> >>>>>>>> -       uint32_t reserved_6; // offset: 6  (0x6)
> >>>>>>>> -       uint32_t reserved_7; // offset: 7  (0x7)
> >>>>>>>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
> >>>>>>>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
> >>>>>>>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
> >>>>>>>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
> >>>>>>>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
> >>>>>>>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
> >>>>>>>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
> >>>>>>>> +       uint32_t ib_vmid; // offset: 7  (0x7)
> >>>>>>>>             uint32_t reserved_8; // offset: 8  (0x8)
> >>>>>>>>             uint32_t reserved_9; // offset: 9  (0x9)
> >>>>>>>>             uint32_t reserved_10; // offset: 10  (0xA)
> >>>>>>>> --
> >>>>>>>> 2.34.1
> >>>>>>>>


Thread overview: 50+ messages
2023-02-03 21:54 [PATCH 0/8] AMDGPU usermode queues Shashank Sharma
2023-02-03 21:54 ` [PATCH 1/8] drm/amdgpu: UAPI for user queue management Shashank Sharma
2023-02-03 22:07   ` Alex Deucher
2023-02-03 22:26     ` Shashank Sharma
2023-02-06 16:56       ` Alex Deucher
2023-02-06 17:01         ` Christian König
2023-02-06 21:03           ` Alex Deucher
2023-02-07  7:03             ` Christian König
2023-02-07  7:38               ` Shashank Sharma
2023-02-07 14:07                 ` Alex Deucher
2023-02-07 14:11                   ` Christian König
2023-02-07 14:17                     ` Alex Deucher
2023-02-07 14:19                       ` Christian König
2023-02-07 14:20                         ` Alex Deucher
2023-02-07 14:36                           ` Shashank Sharma
2023-02-03 21:54 ` [PATCH 2/8] drm/amdgpu: add usermode queues Shashank Sharma
2023-02-07  7:08   ` Christian König
2023-02-07  7:40     ` Shashank Sharma
2023-02-07 14:54   ` Alex Deucher
2023-02-07 15:02     ` Shashank Sharma
2023-02-03 21:54 ` [PATCH 3/8] drm/amdgpu: introduce userqueue MQD handlers Shashank Sharma
2023-02-07  7:11   ` Christian König
2023-02-07  7:41     ` Shashank Sharma
2023-02-07 14:59   ` Alex Deucher
2023-02-03 21:54 ` [PATCH 4/8] drm/amdgpu: Add V11 graphics MQD functions Shashank Sharma
2023-02-07 15:17   ` Alex Deucher
2023-02-07 15:43     ` Shashank Sharma
2023-02-07 16:05       ` Alex Deucher
2023-02-07 16:37         ` Shashank Sharma
2023-02-07 16:54           ` Alex Deucher
2023-02-07 17:13             ` Shashank Sharma
2023-02-07 17:57               ` Alex Deucher
2023-02-07 18:28                 ` Shashank Sharma
2023-02-07 18:32                   ` Alex Deucher
2023-02-03 21:54 ` [PATCH 5/8] drm/amdgpu: Create context for usermode queue Shashank Sharma
2023-02-07  7:14   ` Christian König
2023-02-07  7:51     ` Shashank Sharma
2023-02-07  7:55       ` Christian König
2023-02-07  8:13         ` Shashank Sharma
2023-02-07 16:51   ` Alex Deucher
2023-02-07 16:53     ` Alex Deucher
2023-02-03 21:54 ` [PATCH 6/8] drm/amdgpu: Map userqueue into HW Shashank Sharma
2023-02-07  7:20   ` Christian König
2023-02-07  7:55     ` Shashank Sharma
2023-02-03 21:54 ` [PATCH 7/8] drm/amdgpu: DO-NOT-MERGE add busy-waiting delay Shashank Sharma
2023-02-03 21:54 ` [PATCH 8/8] drm/amdgpu: DO-NOT-MERGE doorbell hack Shashank Sharma
2023-02-06  0:52 ` [PATCH 0/8] AMDGPU usermode queues Dave Airlie
2023-02-06  8:57   ` Christian König
2023-02-06 15:39 ` Michel Dänzer
2023-02-06 16:11   ` Alex Deucher
