* [PATCH v3 0/9] AMDGPU Usermode queues
@ 2023-03-29 16:04 Shashank Sharma
  2023-03-29 16:04 ` [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
                   ` (9 more replies)
  0 siblings, 10 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, Shashank Sharma

This patch series introduces AMDGPU usermode queues for gfx workloads.
Usermode queues are a method of submitting GPU workloads to the graphics
hardware without any interaction with the kernel/DRM schedulers. In this
method, a userspace graphics application can create its own workqueue
and submit it directly to the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objects:
  - A queue object to hold the workload packets.
  - A read pointer object.
  - A write pointer object.
  - A doorbell page.
- The application picks a 32-bit offset in the doorbell page for this queue.
- The application uses the usermode_queue_create IOCTL introduced in
  this patch, passing the GPU addresses of these objects (read
  ptr, write ptr, queue base address) and the 32-bit doorbell offset in
  the doorbell page.
- The kernel creates the queue and maps it in the HW.
- The application can start submitting the data in the queue as soon as
  the kernel IOCTL returns.
- After filling the workload data in the queue, the app must write the
  number of dwords added in the queue into the doorbell offset, and the
  GPU will start fetching the data.
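The submission model described above can be sketched in plain userspace C. This is an editor's illustration of the ring/doorbell flow only: the `sim_` names are made up for this sketch, and the doorbell is simulated with ordinary memory rather than a mapped doorbell page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a user queue: a dword ring plus a doorbell slot.
 * In the real design these live in GPU-visible BOs; here they are plain
 * memory so the flow can be shown without a GPU. */
#define QUEUE_DWORDS 256

struct sim_userq {
    uint32_t ring[QUEUE_DWORDS]; /* queue object (workload packets) */
    uint64_t rptr;               /* read pointer object (HW consumes) */
    uint64_t wptr;               /* write pointer object (app produces) */
    uint32_t doorbell;           /* chosen 32-bit slot in the doorbell page */
};

/* The app fills packets into the ring, then writes the updated dword
 * count to the doorbell slot; that write is what tells the GPU to
 * start fetching. */
static void sim_submit(struct sim_userq *q, const uint32_t *pkt, uint32_t ndw)
{
    for (uint32_t i = 0; i < ndw; i++)
        q->ring[(q->wptr + i) % QUEUE_DWORDS] = pkt[i];
    q->wptr += ndw;
    q->doorbell = (uint32_t)q->wptr; /* "ring" the doorbell */
}
```

The key point the sketch captures is that no kernel call sits on the submission path: after queue creation, the app only writes packets and the doorbell value.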

libDRM changes for this series and a sample DRM test program can be found
in the MESA merge request here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

This patch series depends on the doorbell-manager changes, which are being
reviewed here:
https://patchwork.freedesktop.org/series/115802/

Alex Deucher (1):
  drm/amdgpu: UAPI for user queue management

Arvind Yadav (2):
  drm/amdgpu: add new parameters in v11_struct
  drm/amdgpu: map wptr BO into GART

Shashank Sharma (6):
  drm/amdgpu: add usermode queue base code
  drm/amdgpu: add new IOCTL for usermode queue
  drm/amdgpu: create GFX-gen11 MQD for userqueue
  drm/amdgpu: create context space for usermode queue
  drm/amdgpu: map usermode queue into MES
  drm/amdgpu: generate doorbell index for userqueue

 drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
 .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
 drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
 include/uapi/drm/amdgpu_drm.h                 |  55 ++++
 9 files changed, 677 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

-- 
2.40.0


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-03-29 17:25   ` Christian König
                     ` (2 more replies)
  2023-03-29 16:04 ` [PATCH v3 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, Shashank Sharma

From: Alex Deucher <alexander.deucher@amd.com>

This patch introduces a new UAPI/IOCTL for usermode graphics
queues. The userspace app will fill this structure and request
the graphics driver to add a graphics work queue for it. The
output of this UAPI is a queue id.

This UAPI also maps the queue into the GPU, so the graphics app can
start submitting work to the queue as soon as the call returns.
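For reference, the layout of the new structures can be sanity-checked with a standalone mirror. This is an editor's sketch, not part of the patch: it redeclares the structures from the amdgpu_drm.h hunk below with stdint types substituted for __u32/__u64, so the sizes and offsets can be verified without kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the UAPI structures introduced by this patch
 * (field names and order copied from the diff; types swapped to
 * stdint equivalents for a standalone layout check). */
struct drm_amdgpu_userq_mqd {
    uint32_t flags;            /* AMDGPU_USERQ_MQD_FLAGS_* */
    uint32_t ip_type;          /* AMDGPU_HW_IP_* */
    uint32_t doorbell_handle;  /* GEM handle of the doorbell BO */
    uint32_t doorbell_offset;  /* 32-bit offset within that BO */
    uint64_t queue_va;
    uint64_t queue_size;
    uint64_t rptr_va;
    uint64_t wptr_va;
    uint64_t shadow_va;
};

struct drm_amdgpu_userq_in {
    uint32_t op;       /* AMDGPU_USERQ_OP_* */
    uint32_t flags;
    uint32_t queue_id; /* used by OP_FREE, unused by OP_CREATE */
    uint32_t pad;      /* keeps the embedded mqd 8-byte aligned */
    struct drm_amdgpu_userq_mqd mqd;
};
```

The explicit `pad` member means the 64-bit fields start at an 8-byte boundary on every ABI, so the struct has no implicit padding and the same layout on 32-bit and 64-bit userspace.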

V2: Addressed review comments from Alex and Christian
    - Make the doorbell offset's comment clearer
    - Change the output parameter name to queue_id
V3: Integration with doorbell manager

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index cc5d551abda5..e4943099b9d2 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DRM_AMDGPU_VM			0x13
 #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
 #define DRM_AMDGPU_SCHED		0x15
+#define DRM_AMDGPU_USERQ		0x16
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -71,6 +72,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
 #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
+#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 
 /**
  * DOC: memory domains
@@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* user queue IOCTL */
+#define AMDGPU_USERQ_OP_CREATE	1
+#define AMDGPU_USERQ_OP_FREE	2
+
+#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
+#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
+
+struct drm_amdgpu_userq_mqd {
+	/** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
+	__u32	flags;
+	/** IP type: AMDGPU_HW_IP_* */
+	__u32	ip_type;
+	/** GEM object handle */
+	__u32   doorbell_handle;
+	/** Doorbell's offset in the doorbell bo */
+	__u32   doorbell_offset;
+	/** GPU virtual address of the queue */
+	__u64   queue_va;
+	/** Size of the queue in bytes */
+	__u64   queue_size;
+	/** GPU virtual address of the rptr */
+	__u64   rptr_va;
+	/** GPU virtual address of the wptr */
+	__u64   wptr_va;
+	/** GPU virtual address of the shadow context space */
+	__u64	shadow_va;
+};
+
+struct drm_amdgpu_userq_in {
+	/** AMDGPU_USERQ_OP_* */
+	__u32	op;
+	/** Flags */
+	__u32	flags;
+	/** Queue handle to associate the queue free call with,
+	 * unused for queue create calls */
+	__u32	queue_id;
+	__u32	pad;
+	/** Queue descriptor */
+	struct drm_amdgpu_userq_mqd mqd;
+};
+
+struct drm_amdgpu_userq_out {
+	/** Queue handle */
+	__u32	queue_id;
+	/** Flags */
+	__u32	flags;
+};
+
+union drm_amdgpu_userq {
+	struct drm_amdgpu_userq_in in;
+	struct drm_amdgpu_userq_out out;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.40.0



* [PATCH v3 2/9] drm/amdgpu: add usermode queue base code
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
  2023-03-29 16:04 ` [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-03-30 21:15   ` Alex Deucher
  2023-04-04 16:05   ` Luben Tuikov
  2023-03-29 16:04 ` [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

From: Shashank Sharma <contactshashanksharma@gmail.com>

This patch adds skeleton code for amdgpu usermode queues. It contains:
- A new file with init functions for usermode queues.
- A queue context manager in the driver private data.

V1: Worked on design review comments from RFC patch series:
(https://patchwork.freedesktop.org/series/112214/)
- Alex: Keep a list of queues, instead of single queue per process.
- Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2:
 - Reformatted code, split the big patch into two

V3:
- Integration with doorbell manager

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    | 49 +++++++++++++++++++
 6 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 204665f20319..2d90ba618e5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -210,6 +210,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o
 
+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6b74df446694..c5f9af0e74ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -438,6 +438,14 @@ struct amdgpu_sa_manager {
 	uint32_t		align;
 };
 
+/* Gfx usermode queues */
+struct amdgpu_userq_mgr {
+	struct idr userq_idr;
+	struct mutex userq_mutex;
+	struct amdgpu_device *adev;
+	const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
+};
+
 /* sub-allocation buffer */
 struct amdgpu_sa_bo {
 	struct list_head		olist;
@@ -470,7 +478,6 @@ struct amdgpu_flip_work {
 	bool				async;
 };
 
-
 /*
  * file private structure
  */
@@ -482,6 +489,7 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	struct amdgpu_userq_mgr	userq_mgr;
 };
 
 int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b4f2d61ea0d5..2d6bcfd727c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -52,6 +52,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_userqueue.h"
 
 /*
  * KMS wrapper.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 7aa7e52ca784..b16b8155a157 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -43,6 +43,7 @@
 #include "amdgpu_gem.h"
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_userqueue.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -1187,6 +1188,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
 
+	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
+	if (r)
+		DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
+
 	file_priv->driver_priv = fpriv;
 	goto out_suspend;
 
@@ -1254,6 +1259,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
+	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 
 	if (pasid)
 		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index 000000000000..13e1eebc1cb6
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,39 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
+{
+    mutex_init(&userq_mgr->userq_mutex);
+    idr_init_base(&userq_mgr->userq_idr, 1);
+    userq_mgr->adev = adev;
+
+    return 0;
+}
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
+{
+    idr_destroy(&userq_mgr->userq_idr);
+    mutex_destroy(&userq_mgr->userq_mutex);
+}
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
new file mode 100644
index 000000000000..7eeb8c9e6575
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDGPU_USERQUEUE_H_
+#define AMDGPU_USERQUEUE_H_
+
+#include "amdgpu.h"
+#define AMDGPU_MAX_USERQ 512
+
+struct amdgpu_usermode_queue {
+	int queue_id;
+	int queue_type;
+	uint64_t flags;
+	uint64_t doorbell_handle;
+	struct amdgpu_vm *vm;
+	struct amdgpu_userq_mgr *userq_mgr;
+	struct amdgpu_mqd_prop userq_prop;
+};
+
+struct amdgpu_userq_funcs {
+	int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
+};
+
+int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
+
+void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
+
+#endif
-- 
2.40.0



* [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
  2023-03-29 16:04 ` [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
  2023-03-29 16:04 ` [PATCH v3 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-04-10  0:02   ` Bas Nieuwenhuizen
  2023-03-29 16:04 ` [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue Shashank Sharma
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

From: Shashank Sharma <contactshashanksharma@gmail.com>

This patch adds:
- A new IOCTL function to create and destroy usermode queues.
- A new structure to keep all the user queue data in one place.
- A function to generate a unique index for the queue.
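The unique-index scheme can be sketched with a tiny stand-in for the kernel's idr, as this patch uses it (ids start at 1, bounded by AMDGPU_MAX_USERQ). This is an editor's illustration with `sim_` names, not the kernel API itself:

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_USERQ 512

/* Minimal stand-in for idr_alloc()/idr_find()/idr_remove() as used by
 * the patch: ids run from 1 to SIM_MAX_USERQ - 1, and 0 is never
 * handed out (idr_init_base(&idr, 1) in the patch has the same effect). */
static void *sim_idr[SIM_MAX_USERQ];

static int sim_idr_alloc(void *ptr)
{
    for (int id = 1; id < SIM_MAX_USERQ; id++) {
        if (!sim_idr[id]) {
            sim_idr[id] = ptr;
            return id;
        }
    }
    return -1; /* table full, akin to idr_alloc() returning an error */
}

static void *sim_idr_find(int id)
{
    return (id > 0 && id < SIM_MAX_USERQ) ? sim_idr[id] : NULL;
}

static void sim_idr_remove(int id)
{
    if (id > 0 && id < SIM_MAX_USERQ)
        sim_idr[id] = NULL;
}
```

With the idr holding the queue pointers, no separate list of queues is needed: create allocates an id, destroy looks the queue up by id and removes it, exactly the two paths the IOCTL below dispatches.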

V1: Worked on review comments from RFC patch series:
  - Alex: Keep a list of queues, instead of single queue per process.
  - Christian: Use the queue manager instead of global ptrs,
           Don't keep the queue structure in amdgpu_ctx

V2: Worked on review comments:
 - Christian:
   - Formatting of text
   - There is no need for queuing of userqueues, with idr in place
 - Alex:
   - Remove use_doorbell, its unnecessary
   - Reuse amdgpu_mqd_props for saving mqd fields

 - Code formatting and re-arrangement

V3:
 - Integration with doorbell manager

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 113 ++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 3 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2d6bcfd727c8..229976a2d0e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2749,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 13e1eebc1cb6..353f57c5a772 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -22,6 +22,119 @@
  */
 
 #include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_userqueue.h"
+
+static inline int
+amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
+}
+
+static inline void
+amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
+{
+    idr_remove(&uq_mgr->userq_idr, queue_id);
+}
+
+static struct amdgpu_usermode_queue *
+amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
+{
+    return idr_find(&uq_mgr->userq_idr, qid);
+}
+
+static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
+{
+    struct amdgpu_usermode_queue *queue;
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
+    int r;
+
+    /* Do we support usermode queues for this IP? */
+    if (!uq_mgr->userq_funcs[mqd_in->ip_type]) {
+        DRM_ERROR("GFX User queues not supported for this IP: %d\n", mqd_in->ip_type);
+        return -EINVAL;
+    }
+
+    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
+    if (!queue) {
+        DRM_ERROR("Failed to allocate memory for queue\n");
+        return -ENOMEM;
+    }
+
+    mutex_lock(&uq_mgr->userq_mutex);
+    queue->userq_prop.wptr_gpu_addr = mqd_in->wptr_va;
+    queue->userq_prop.rptr_gpu_addr = mqd_in->rptr_va;
+    queue->userq_prop.queue_size = mqd_in->queue_size;
+    queue->userq_prop.hqd_base_gpu_addr = mqd_in->queue_va;
+
+    queue->doorbell_handle = mqd_in->doorbell_handle;
+    queue->queue_type = mqd_in->ip_type;
+    queue->flags = mqd_in->flags;
+    queue->vm = &fpriv->vm;
+    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
+    if (queue->queue_id < 0) {
+        DRM_ERROR("Failed to allocate a queue id\n");
+        r = queue->queue_id;
+        goto free_queue;
+    }
+
+    args->out.queue_id = queue->queue_id;
+    args->out.flags = 0;
+    mutex_unlock(&uq_mgr->userq_mutex);
+    return 0;
+
+free_queue:
+    mutex_unlock(&uq_mgr->userq_mutex);
+    kfree(queue);
+    return r;
+}
+
+static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
+{
+    struct amdgpu_fpriv *fpriv = filp->driver_priv;
+    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
+    struct amdgpu_usermode_queue *queue;
+
+    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
+    if (!queue) {
+        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
+        return;
+    }
+
+    mutex_lock(&uq_mgr->userq_mutex);
+    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
+    mutex_unlock(&uq_mgr->userq_mutex);
+    kfree(queue);
+}
+
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *filp)
+{
+    union drm_amdgpu_userq *args = data;
+    int r = 0;
+
+    switch (args->in.op) {
+    case AMDGPU_USERQ_OP_CREATE:
+        r = amdgpu_userqueue_create(filp, args);
+        if (r)
+            DRM_ERROR("Failed to create usermode queue\n");
+        break;
+
+    case AMDGPU_USERQ_OP_FREE:
+        amdgpu_userqueue_destroy(filp, args->in.queue_id);
+        break;
+
+    default:
+        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
+        return -EINVAL;
+    }
+
+    return r;
+}
+
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 7eeb8c9e6575..7625a862b1fc 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -42,6 +42,8 @@ struct amdgpu_userq_funcs {
 	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
 };
 
+int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
+
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
 
 void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
-- 
2.40.0



* [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (2 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-03-30 21:18   ` Alex Deucher
  2023-04-04 16:21   ` Luben Tuikov
  2023-03-29 16:04 ` [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue Shashank Sharma
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig,
	Arvind Yadav

From: Shashank Sharma <contactshashanksharma@gmail.com>

A memory queue descriptor (MQD) of a userqueue defines it in the hardware's
context. As the MQD format can vary between different graphics IPs, we need
GFX-generation-specific handlers to create MQDs.

This patch:
- Introduces MQD handler functions for the usermode queues.
- Adds new functions to create and destroy MQDs for the GFX GEN-11 IP.
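The per-IP dispatch this patch introduces can be sketched as a funcs table indexed by IP type, filled in only for generations the driver can handle. The `sim_` names below are stand-ins mirroring the patch's shape (an `amdgpu_userq_funcs`-style ops table registered per GC major version), not the real driver code:

```c
#include <assert.h>
#include <stddef.h>

#define SIM_HW_IP_GFX 0
#define SIM_HW_IP_NUM 9

struct sim_queue { int mqd_inited; };

/* Mirrors the shape of struct amdgpu_userq_funcs: per-IP MQD ops. */
struct sim_userq_funcs {
    int (*mqd_create)(struct sim_queue *q);
};

/* Stand-in for the GFX v11 handler; the real one allocates and
 * initializes an MQD BO via adev->mqds[ip].init_mqd(). */
static int sim_gfx_v11_mqd_create(struct sim_queue *q)
{
    q->mqd_inited = 1;
    return 0;
}

static const struct sim_userq_funcs sim_gfx_v11_funcs = {
    .mqd_create = sim_gfx_v11_mqd_create,
};

/* Mirrors amdgpu_userqueue_setup_ip_funcs(): register handlers only
 * for known GC major versions; unknown IPs stay NULL, which is what
 * makes the IOCTL's "not supported for this IP" check work. */
static void sim_setup_ip_funcs(const struct sim_userq_funcs *funcs[SIM_HW_IP_NUM],
                               int gc_major)
{
    for (int i = 0; i < SIM_HW_IP_NUM; i++)
        funcs[i] = NULL;
    if (gc_major == 11)
        funcs[SIM_HW_IP_GFX] = &sim_gfx_v11_funcs;
}
```

A NULL slot is the supported/unsupported signal: the create IOCTL rejects the request with -EINVAL before allocating anything if no handler was registered for the requested IP.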

V1: Worked on review comments from Alex:
    - Make MQD functions GEN and IP specific

V2: Worked on review comments from Alex:
    - Reuse the existing adev->mqd[ip] for MQD creation
    - Formatting and arrangement of code

V3:
    - Integration with doorbell manager

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>

Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 21 +++++
 .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 84 +++++++++++++++++++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
 4 files changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2d90ba618e5d..2cc7897de7e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
 
 # add usermode queue
 amdgpu-y += amdgpu_userqueue.o
+amdgpu-y += amdgpu_userqueue_gfx_v11.o
 
 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 353f57c5a772..052c2c1e8aed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -81,6 +81,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
         goto free_queue;
     }
 
+    r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to create/map userqueue MQD\n");
+        goto free_queue;
+    }
+
     args->out.queue_id = queue->queue_id;
     args->out.flags = 0;
     mutex_unlock(&uq_mgr->userq_mutex);
@@ -105,6 +111,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
     }
 
     mutex_lock(&uq_mgr->userq_mutex);
+    uq_mgr->userq_funcs[queue->queue_type]->mqd_destroy(uq_mgr, queue);
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
     mutex_unlock(&uq_mgr->userq_mutex);
     kfree(queue);
@@ -135,6 +142,19 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
     return r;
 }
 
+extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
+
+static void
+amdgpu_userqueue_setup_ip_funcs(struct amdgpu_userq_mgr *uq_mgr)
+{
+    int maj;
+    struct amdgpu_device *adev = uq_mgr->adev;
+    uint32_t version = adev->ip_versions[GC_HWIP][0];
+
+    maj = IP_VERSION_MAJ(version);
+    if (maj == 11)
+        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
+}
 
 int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
 {
@@ -142,6 +162,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
     idr_init_base(&userq_mgr->userq_idr, 1);
     userq_mgr->adev = adev;
 
+    amdgpu_userqueue_setup_ip_funcs(userq_mgr);
     return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
new file mode 100644
index 000000000000..12e1a785b65a
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+#include "amdgpu_userqueue.h"
+
+static int
+amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
+    struct amdgpu_mqd *gfx_v11_mqd = &adev->mqds[queue->queue_type];
+    int size = gfx_v11_mqd->mqd_size;
+    int r;
+
+    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_GTT,
+                                &mqd->obj,
+                                &mqd->gpu_addr,
+                                &mqd->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
+        return r;
+    }
+
+    memset(mqd->cpu_ptr, 0, size);
+    r = amdgpu_bo_reserve(mqd->obj, false);
+    if (unlikely(r != 0)) {
+        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
+        goto free_mqd;
+    }
+
+    queue->userq_prop.use_doorbell = true;
+    queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
+    r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
+    amdgpu_bo_unreserve(mqd->obj);
+    if (r) {
+        DRM_ERROR("Failed to init MQD for queue\n");
+        goto free_mqd;
+    }
+
+    DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
+    return 0;
+
+free_mqd:
+    amdgpu_bo_free_kernel(&mqd->obj,
+			   &mqd->gpu_addr,
+			   &mqd->cpu_ptr);
+    return r;
+}
+
+static void
+amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
+
+    amdgpu_bo_free_kernel(&mqd->obj,
+			   &mqd->gpu_addr,
+			   &mqd->cpu_ptr);
+}
+
+const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
+    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
+    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
+};
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 7625a862b1fc..2911c88d0fed 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -27,6 +27,12 @@
 #include "amdgpu.h"
 #define AMDGPU_MAX_USERQ 512
 
+struct amdgpu_userq_ctx_space {
+	struct amdgpu_bo *obj;
+	uint64_t gpu_addr;
+	void *cpu_ptr;
+};
+
 struct amdgpu_usermode_queue {
 	int queue_id;
 	int queue_type;
@@ -35,6 +41,7 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_vm *vm;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_mqd_prop userq_prop;
+	struct amdgpu_userq_ctx_space mqd;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.40.0



* [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (3 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-03-30 21:23   ` Alex Deucher
  2023-04-04 16:24   ` Luben Tuikov
  2023-03-29 16:04 ` [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct Shashank Sharma
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig,
	Shashank Sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

The FW expects us to allocate at least one page of context space each
for gang, process, shadow, GDS and FW-related work. This patch creates
a single joint object for all of them, and calculates the GPU address
offsets for each of these spaces.

V1: Addressed review comments on RFC patch:
    Alex: Make this function IP specific

V2: Addressed review comments from Christian
    - Allocate only one object for total FW space, and calculate
      offsets for each of these objects.

V3: Integration with doorbell manager

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  1 +
 .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 60 ++++++++++++++++++-
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 +++
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 052c2c1e8aed..5672efcbcffc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -71,6 +71,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
     queue->userq_prop.queue_size = mqd_in->queue_size;
 
     queue->doorbell_handle = mqd_in->doorbell_handle;
+    queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;
     queue->queue_type = mqd_in->ip_type;
     queue->flags = mqd_in->flags;
     queue->vm = &fpriv->vm;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
index 12e1a785b65a..52de96727f98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
@@ -23,6 +23,51 @@
 #include "amdgpu.h"
 #include "amdgpu_userqueue.h"
 
+#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
+#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
+
+static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+                                                 struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
+    int r, size;
+
+    /*
+     * The FW expects at least one page of space allocated for
+     * process ctx, gang ctx, gds ctx, fw ctx and shadow ctx each.
+     */
+    size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ +
+           AMDGPU_USERQ_FW_CTX_SZ + AMDGPU_USERQ_GDS_CTX_SZ;
+    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
+                                AMDGPU_GEM_DOMAIN_GTT,
+                                &ctx->obj,
+                                &ctx->gpu_addr,
+                                &ctx->cpu_ptr);
+    if (r) {
+        DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
+        return r;
+    }
+
+    queue->proc_ctx_gpu_addr = ctx->gpu_addr;
+    queue->gang_ctx_gpu_addr = queue->proc_ctx_gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
+    queue->fw_ctx_gpu_addr = queue->gang_ctx_gpu_addr + AMDGPU_USERQ_GANG_CTX_SZ;
+    queue->gds_ctx_gpu_addr = queue->fw_ctx_gpu_addr + AMDGPU_USERQ_FW_CTX_SZ;
+    return 0;
+}
+
+static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+                                                   struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
+
+    amdgpu_bo_free_kernel(&ctx->obj,
+                          &ctx->gpu_addr,
+                          &ctx->cpu_ptr);
+}
+
 static int
 amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
 {
@@ -43,10 +88,17 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
     }
 
     memset(mqd->cpu_ptr, 0, size);
+
+    r = amdgpu_userq_gfx_v11_create_ctx_space(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to create CTX space for userqueue (%d)\n", r);
+        goto free_mqd;
+    }
+
     r = amdgpu_bo_reserve(mqd->obj, false);
     if (unlikely(r != 0)) {
         DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
-        goto free_mqd;
+        goto free_ctx;
     }
 
     queue->userq_prop.use_doorbell = true;
@@ -55,12 +107,15 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
     amdgpu_bo_unreserve(mqd->obj);
     if (r) {
         DRM_ERROR("Failed to init MQD for queue\n");
-        goto free_mqd;
+        goto free_ctx;
     }
 
     DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
     return 0;
 
+free_ctx:
+    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
+
 free_mqd:
     amdgpu_bo_free_kernel(&mqd->obj,
 			   &mqd->gpu_addr,
@@ -73,6 +128,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
 {
     struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
 
+    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
     amdgpu_bo_free_kernel(&mqd->obj,
 			   &mqd->gpu_addr,
 			   &mqd->cpu_ptr);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 2911c88d0fed..8b62ef77cd26 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -38,10 +38,17 @@ struct amdgpu_usermode_queue {
 	int queue_type;
 	uint64_t flags;
 	uint64_t doorbell_handle;
+	uint64_t proc_ctx_gpu_addr;
+	uint64_t gang_ctx_gpu_addr;
+	uint64_t gds_ctx_gpu_addr;
+	uint64_t fw_ctx_gpu_addr;
+	uint64_t shadow_ctx_gpu_addr;
+
 	struct amdgpu_vm *vm;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct amdgpu_mqd_prop userq_prop;
 	struct amdgpu_userq_ctx_space mqd;
+	struct amdgpu_userq_ctx_space fw_space;
 };
 
 struct amdgpu_userq_funcs {
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (4 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-03-30 21:25   ` Alex Deucher
  2023-03-29 16:04 ` [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Christian Koenig, Arvind Yadav,
	Shashank Sharma

From: Arvind Yadav <arvind.yadav@amd.com>

This patch:
- adds some new parameters to the v11_mqd_struct, defined for the
  gfx usermode queue use cases.
- sets those parameters to the respective allocated GPU context
  space addresses.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 21 ++++++++++++++++++-
 drivers/gpu/drm/amd/include/v11_structs.h     | 16 +++++++-------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
index 52de96727f98..39e90ea32fcb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
@@ -22,6 +22,7 @@
  */
 #include "amdgpu.h"
 #include "amdgpu_userqueue.h"
+#include "v11_structs.h"
 
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
@@ -68,6 +69,22 @@ static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_m
                           &ctx->cpu_ptr);
 }
 
+static void
+amdgpu_userq_set_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
+                           struct amdgpu_usermode_queue *queue)
+{
+    struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
+
+    mqd->shadow_base_lo = queue->shadow_ctx_gpu_addr & 0xfffffffc;
+    mqd->shadow_base_hi = upper_32_bits(queue->shadow_ctx_gpu_addr);
+
+    mqd->gds_bkup_base_lo = queue->gds_ctx_gpu_addr & 0xfffffffc;
+    mqd->gds_bkup_base_hi = upper_32_bits(queue->gds_ctx_gpu_addr);
+
+    mqd->fw_work_area_base_lo = queue->fw_ctx_gpu_addr & 0xfffffffc;
+    mqd->fw_work_area_base_hi = upper_32_bits(queue->fw_ctx_gpu_addr);
+}
+
 static int
 amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
 {
@@ -104,12 +121,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
     queue->userq_prop.use_doorbell = true;
     queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
     r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
-    amdgpu_bo_unreserve(mqd->obj);
     if (r) {
+        amdgpu_bo_unreserve(mqd->obj);
         DRM_ERROR("Failed to init MQD for queue\n");
         goto free_ctx;
     }
 
+    amdgpu_userq_set_ctx_space(uq_mgr, queue);
+    amdgpu_bo_unreserve(mqd->obj);
     DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
     return 0;
 
diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
index b8ff7456ae0b..f8008270f813 100644
--- a/drivers/gpu/drm/amd/include/v11_structs.h
+++ b/drivers/gpu/drm/amd/include/v11_structs.h
@@ -25,14 +25,14 @@
 #define V11_STRUCTS_H_
 
 struct v11_gfx_mqd {
-	uint32_t reserved_0; // offset: 0  (0x0)
-	uint32_t reserved_1; // offset: 1  (0x1)
-	uint32_t reserved_2; // offset: 2  (0x2)
-	uint32_t reserved_3; // offset: 3  (0x3)
-	uint32_t reserved_4; // offset: 4  (0x4)
-	uint32_t reserved_5; // offset: 5  (0x5)
-	uint32_t reserved_6; // offset: 6  (0x6)
-	uint32_t reserved_7; // offset: 7  (0x7)
+	uint32_t shadow_base_lo; // offset: 0  (0x0)
+	uint32_t shadow_base_hi; // offset: 1  (0x1)
+	uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
+	uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
+	uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
+	uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
+	uint32_t shadow_initialized; // offset: 6  (0x6)
+	uint32_t ib_vmid; // offset: 7  (0x7)
 	uint32_t reserved_8; // offset: 8  (0x8)
 	uint32_t reserved_9; // offset: 9  (0x9)
 	uint32_t reserved_10; // offset: 10  (0xA)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (5 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-04-04 16:30   ` Luben Tuikov
  2023-03-29 16:04 ` [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig,
	Shashank Sharma

From: Shashank Sharma <contactshashanksharma@gmail.com>

This patch adds new functions to map/unmap a usermode queue into
the FW, using the MES ring. As soon as this mapping is done, the
queue is considered ready to accept workloads.

V1: Addressed review comments from Alex on the RFC patch series
    - Map/Unmap should be IP specific.
V2:
    Addressed review comments from Christian:
    - Fix the wptr_mc_addr calculation (moved into another patch)
    Addressed review comments from Alex:
    - Do not add fptrs for map/unmap

V3: Integration with doorbell manager

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
index 39e90ea32fcb..1627641a4a4e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
@@ -23,12 +23,73 @@
 #include "amdgpu.h"
 #include "amdgpu_userqueue.h"
 #include "v11_structs.h"
+#include "amdgpu_mes.h"
 
 #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
 #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
 
+static int
+amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
+                         struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct mes_add_queue_input queue_input;
+    int r;
+
+    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
+
+    queue_input.process_va_start = 0;
+    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
+    queue_input.process_quantum = 100000; /* 10ms */
+    queue_input.gang_quantum = 10000; /* 1ms */
+    queue_input.paging = false;
+
+    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
+    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
+    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
+
+    queue_input.process_id = queue->vm->pasid;
+    queue_input.queue_type = queue->queue_type;
+    queue_input.mqd_addr = queue->mqd.gpu_addr;
+    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
+    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
+    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
+    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+
+    amdgpu_mes_lock(&adev->mes);
+    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
+    amdgpu_mes_unlock(&adev->mes);
+    if (r) {
+        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
+        return r;
+    }
+
+    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
+    return 0;
+}
+
+static void
+amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
+                           struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_device *adev = uq_mgr->adev;
+    struct mes_remove_queue_input queue_input;
+    int r;
+
+    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
+    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
+    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
+
+    amdgpu_mes_lock(&adev->mes);
+    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
+    amdgpu_mes_unlock(&adev->mes);
+    if (r)
+        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
+}
+
 static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
                                                  struct amdgpu_usermode_queue *queue)
 {
@@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
 
     amdgpu_userq_set_ctx_space(uq_mgr, queue);
     amdgpu_bo_unreserve(mqd->obj);
+
+    /* Map the queue in HW using MES ring */
+    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
+    if (r) {
+        DRM_ERROR("Failed to map userqueue (%d)\n", r);
+        goto free_ctx;
+    }
+
     DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
     return 0;
 
@@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
 {
     struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
 
+    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
     amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
     amdgpu_bo_free_kernel(&mqd->obj,
 			   &mqd->gpu_addr,
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (6 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-04-10  0:00   ` Bas Nieuwenhuizen
  2023-03-29 16:04 ` [PATCH v3 9/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
  2023-04-10  0:36 ` [PATCH v3 0/9] AMDGPU Usermode queues Bas Nieuwenhuizen
  9 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Christian Koenig, Arvind Yadav,
	Shashank Sharma

From: Arvind Yadav <arvind.yadav@amd.com>

To support oversubscription, MES expects WPTR BOs to be mapped into
GART before they are submitted to usermode queues.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 89 +++++++++++++++++++
 .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |  1 +
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
 3 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 5672efcbcffc..7409a4ae55da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -43,6 +43,89 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
     return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+static int
+amdgpu_userqueue_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+{
+    int ret;
+
+    ret = amdgpu_bo_reserve(bo, true);
+    if (ret) {
+        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
+        goto err_reserve_bo_failed;
+    }
+
+    ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
+    if (ret) {
+        DRM_ERROR("Failed to pin bo. ret %d\n", ret);
+        goto err_pin_bo_failed;
+    }
+
+    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
+    if (ret) {
+        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
+        goto err_map_bo_gart_failed;
+    }
+
+
+    amdgpu_bo_unreserve(bo);
+    bo = amdgpu_bo_ref(bo);
+
+    return 0;
+
+err_map_bo_gart_failed:
+    amdgpu_bo_unpin(bo);
+err_pin_bo_failed:
+    amdgpu_bo_unreserve(bo);
+err_reserve_bo_failed:
+
+    return ret;
+}
+
+
+static int
+amdgpu_userqueue_create_wptr_mapping(struct amdgpu_device *adev,
+				     struct drm_file *filp,
+				     struct amdgpu_usermode_queue *queue)
+{
+    struct amdgpu_bo_va_mapping *wptr_mapping;
+    struct amdgpu_vm *wptr_vm;
+    struct amdgpu_bo *wptr_bo = NULL;
+    uint64_t wptr = queue->userq_prop.wptr_gpu_addr;
+    int ret;
+
+    wptr_vm = queue->vm;
+    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
+    if (ret)
+        goto err_wptr_map_gart;
+
+    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
+    amdgpu_bo_unreserve(wptr_vm->root.bo);
+    if (!wptr_mapping) {
+        DRM_ERROR("Failed to lookup wptr bo\n");
+        ret = -EINVAL;
+        goto err_wptr_map_gart;
+    }
+
+    wptr_bo = wptr_mapping->bo_va->base.bo;
+    if (wptr_bo->tbo.base.size > PAGE_SIZE) {
+        DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
+        ret = -EINVAL;
+        goto err_wptr_map_gart;
+    }
+
+    ret = amdgpu_userqueue_map_gtt_bo_to_gart(adev, wptr_bo);
+    if (ret) {
+        DRM_ERROR("Failed to map wptr bo to GART\n");
+        goto err_wptr_map_gart;
+    }
+
+    queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
+    return 0;
+
+err_wptr_map_gart:
+    return ret;
+}
+
 static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
 {
     struct amdgpu_usermode_queue *queue;
@@ -82,6 +165,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
         goto free_queue;
     }
 
+    r = amdgpu_userqueue_create_wptr_mapping(uq_mgr->adev, filp, queue);
+    if (r) {
+        DRM_ERROR("Failed to map WPTR (0x%llx) for userqueue\n", queue->userq_prop.wptr_gpu_addr);
+        goto free_queue;
+    }
+
     r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
     if (r) {
         DRM_ERROR("Failed to create/map userqueue MQD\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
index 1627641a4a4e..274e78826334 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
@@ -58,6 +58,7 @@ amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
     queue_input.queue_size = queue->userq_prop.queue_size >> 2;
     queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
     queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
+    queue_input.wptr_mc_addr = queue->wptr_mc_addr;
 
     amdgpu_mes_lock(&adev->mes);
     r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index 8b62ef77cd26..eaab7cf5fff6 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -38,6 +38,7 @@ struct amdgpu_usermode_queue {
 	int queue_type;
 	uint64_t flags;
 	uint64_t doorbell_handle;
+	uint64_t wptr_mc_addr;
 	uint64_t proc_ctx_gpu_addr;
 	uint64_t gang_ctx_gpu_addr;
 	uint64_t gds_ctx_gpu_addr;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v3 9/9] drm/amdgpu: generate doorbell index for userqueue
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (7 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2023-03-29 16:04 ` Shashank Sharma
  2023-04-10  0:36 ` [PATCH v3 0/9] AMDGPU Usermode queues Bas Nieuwenhuizen
  9 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 16:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, Shashank Sharma

The userspace sends us the doorbell object and the doorbell index
to be used for the usermode queue, but the FW expects the absolute
doorbell index on the PCI BAR in the MQD. This patch adds a function
to convert this relative doorbell index to the absolute doorbell index.

This patch is dependent on the doorbell manager series being reviewed
here: https://patchwork.freedesktop.org/series/115802/

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 7409a4ae55da..fd4a2ca3302d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -43,6 +43,32 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
     return idr_find(&uq_mgr->userq_idr, qid);
 }
 
+static uint64_t
+amdgpu_userqueue_get_doorbell_index(struct amdgpu_userq_mgr *uq_mgr,
+                                    struct amdgpu_usermode_queue *queue,
+                                    struct drm_file *filp,
+                                    uint32_t doorbell_index)
+{
+    struct drm_gem_object *gobj;
+    struct amdgpu_bo *db_bo;
+    uint64_t index;
+
+    gobj = drm_gem_object_lookup(filp, queue->doorbell_handle);
+    if (gobj == NULL) {
+        DRM_ERROR("Can't find GEM object for doorbell\n");
+        return -EINVAL;
+    }
+
+    db_bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
+    drm_gem_object_put(gobj);
+
+    index = amdgpu_doorbell_index_on_bar(uq_mgr->adev, db_bo, doorbell_index);
+
+    DRM_DEBUG_DRIVER("[Usermode queues] doorbell index=%lld\n", index);
+
+    return index;
+}
+
 static int
 amdgpu_userqueue_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
 {
@@ -132,6 +158,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
     struct amdgpu_fpriv *fpriv = filp->driver_priv;
     struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
     struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
+    uint64_t index;
     int r;
 
     /* Do we have support userqueues for this IP ? */
@@ -154,6 +181,14 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
     queue->userq_prop.queue_size = mqd_in->queue_size;
 
     queue->doorbell_handle = mqd_in->doorbell_handle;
+    index = amdgpu_userqueue_get_doorbell_index(uq_mgr, queue, filp, mqd_in->doorbell_offset);
+    if (index == (uint64_t)-EINVAL) {
+        DRM_ERROR("Invalid doorbell object\n");
+        r = -EINVAL;
+        goto free_queue;
+    }
+
+    queue->userq_prop.doorbell_index = index;
     queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;
     queue->queue_type = mqd_in->ip_type;
     queue->flags = mqd_in->flags;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-29 16:04 ` [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
@ 2023-03-29 17:25   ` Christian König
  2023-03-29 17:57   ` Alex Deucher
       [not found]   ` <71fc098c-c0cb-3097-4e11-c2d9bd9b4783@damsy.net>
  2 siblings, 0 replies; 56+ messages in thread
From: Christian König @ 2023-03-29 17:25 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx; +Cc: Alex Deucher, Felix Kuehling

Am 29.03.23 um 18:04 schrieb Shashank Sharma:
> From: Alex Deucher <alexander.deucher@amd.com>
>
> This patch introduces new UAPI/IOCTL for usermode graphics
> queue. The userspace app will fill this structure and request
> the graphics driver to add a graphics work queue for it. The
> output of this UAPI is a queue id.
>
> This UAPI maps the queue into GPU, so the graphics app can start
> submitting work to the queue as soon as the call returns.
>
> V2: Addressed review comments from Alex and Christian
>      - Make the doorbell offset's comment clearer
>      - Change the output parameter name to queue_id
> V3: Integration with doorbell manager
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
>   1 file changed, 55 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index cc5d551abda5..e4943099b9d2 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -54,6 +54,7 @@ extern "C" {
>   #define DRM_AMDGPU_VM			0x13
>   #define DRM_AMDGPU_FENCE_TO_HANDLE	0x14
>   #define DRM_AMDGPU_SCHED		0x15
> +#define DRM_AMDGPU_USERQ		0x16
>   
>   #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>   #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -71,6 +72,7 @@ extern "C" {
>   #define DRM_IOCTL_AMDGPU_VM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>   #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> +#define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>   
>   /**
>    * DOC: memory domains
> @@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
>   	union drm_amdgpu_ctx_out out;
>   };
>   
> +/* user queue IOCTL */
> +#define AMDGPU_USERQ_OP_CREATE	1
> +#define AMDGPU_USERQ_OP_FREE	2
> +
> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE	(1 << 0)
> +#define AMDGPU_USERQ_MQD_FLAGS_AQL	(1 << 1)
> +
> +struct drm_amdgpu_userq_mqd {
> +	/** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> +	__u32	flags;
> +	/** IP type: AMDGPU_HW_IP_* */
> +	__u32	ip_type;
> +	/** GEM object handle */
> +	__u32   doorbell_handle;
> +	/** Doorbell's offset in the doorbell bo */
> +	__u32   doorbell_offset;
> +	/** GPU virtual address of the queue */
> +	__u64   queue_va;
> +	/** Size of the queue in bytes */
> +	__u64   queue_size;
> +	/** GPU virtual address of the rptr */
> +	__u64   rptr_va;
> +	/** GPU virtual address of the wptr */
> +	__u64   wptr_va;
> +	/** GPU virtual address of the shadow context space */
> +	__u64	shadow_va;
> +};
> +
> +struct drm_amdgpu_userq_in {
> +	/** AMDGPU_USERQ_OP_* */
> +	__u32	op;
> +	/** Flags */
> +	__u32	flags;
> +	/** Queue handle to associate the queue free call with,
> +	 * unused for queue create calls */
> +	__u32	queue_id;
> +	__u32	pad;
> +	/** Queue descriptor */
> +	struct drm_amdgpu_userq_mqd mqd;
> +};
> +
> +struct drm_amdgpu_userq_out {
> +	/** Queue handle */
> +	__u32	queue_id;
> +	/** Flags */
> +	__u32	flags;
> +};
> +
> +union drm_amdgpu_userq {
> +	struct drm_amdgpu_userq_in in;
> +	struct drm_amdgpu_userq_out out;
> +};
> +
>   /* vm ioctl */
>   #define AMDGPU_VM_OP_RESERVE_VMID	1
>   #define AMDGPU_VM_OP_UNRESERVE_VMID	2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-29 16:04 ` [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
  2023-03-29 17:25   ` Christian König
@ 2023-03-29 17:57   ` Alex Deucher
  2023-03-29 19:21     ` Shashank Sharma
       [not found]   ` <71fc098c-c0cb-3097-4e11-c2d9bd9b4783@damsy.net>
  2 siblings, 1 reply; 56+ messages in thread
From: Alex Deucher @ 2023-03-29 17:57 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Alex Deucher <alexander.deucher@amd.com>
>
> This patch introduces new UAPI/IOCTL for usermode graphics
> queue. The userspace app will fill this structure and request
> the graphics driver to add a graphics work queue for it. The
> output of this UAPI is a queue id.
>
> This UAPI maps the queue into GPU, so the graphics app can start
> submitting work to the queue as soon as the call returns.
>
> V2: Addressed review comments from Alex and Christian
>     - Make the doorbell offset's comment clearer
>     - Change the output parameter name to queue_id
> V3: Integration with doorbell manager
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index cc5d551abda5..e4943099b9d2 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -54,6 +54,7 @@ extern "C" {
>  #define DRM_AMDGPU_VM                  0x13
>  #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>  #define DRM_AMDGPU_SCHED               0x15
> +#define DRM_AMDGPU_USERQ               0x16
>
>  #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>  #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -71,6 +72,7 @@ extern "C" {
>  #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>  #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>  #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>
>  /**
>   * DOC: memory domains
> @@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
>         union drm_amdgpu_ctx_out out;
>  };
>
> +/* user queue IOCTL */
> +#define AMDGPU_USERQ_OP_CREATE 1
> +#define AMDGPU_USERQ_OP_FREE   2
> +
> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> +
> +struct drm_amdgpu_userq_mqd {
> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> +       __u32   flags;
> +       /** IP type: AMDGPU_HW_IP_* */
> +       __u32   ip_type;
> +       /** GEM object handle */
> +       __u32   doorbell_handle;
> +       /** Doorbell's offset in the doorbell bo */
> +       __u32   doorbell_offset;
> +       /** GPU virtual address of the queue */
> +       __u64   queue_va;
> +       /** Size of the queue in bytes */
> +       __u64   queue_size;
> +       /** GPU virtual address of the rptr */
> +       __u64   rptr_va;
> +       /** GPU virtual address of the wptr */
> +       __u64   wptr_va;
> +       /** GPU virtual address of the shadow context space */
> +       __u64   shadow_va;
> +};

We may want to make the MQD engine specific.  E.g., shadow is gfx
specific.  We also probably need the csa and gds buffers for gfx as
well.  Other engines may have their own additional buffer
requirements.

Alex


> +
> +struct drm_amdgpu_userq_in {
> +       /** AMDGPU_USERQ_OP_* */
> +       __u32   op;
> +       /** Flags */
> +       __u32   flags;
> +       /** Queue handle to associate the queue free call with,
> +        * unused for queue create calls */
> +       __u32   queue_id;
> +       __u32   pad;
> +       /** Queue descriptor */
> +       struct drm_amdgpu_userq_mqd mqd;
> +};
> +
> +struct drm_amdgpu_userq_out {
> +       /** Queue handle */
> +       __u32   queue_id;
> +       /** Flags */
> +       __u32   flags;
> +};
> +
> +union drm_amdgpu_userq {
> +       struct drm_amdgpu_userq_in in;
> +       struct drm_amdgpu_userq_out out;
> +};
> +
>  /* vm ioctl */
>  #define AMDGPU_VM_OP_RESERVE_VMID      1
>  #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> --
> 2.40.0
>


* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-29 17:57   ` Alex Deucher
@ 2023-03-29 19:21     ` Shashank Sharma
  2023-03-29 19:46       ` Alex Deucher
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-03-29 19:21 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, amd-gfx

Hey Alex,

On 29/03/2023 19:57, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> This patch introduces a new UAPI/IOCTL for usermode graphics
>> queue. The userspace app will fill this structure and request
>> the graphics driver to add a graphics work queue for it. The
>> output of this UAPI is a queue id.
>>
>> This UAPI maps the queue into GPU, so the graphics app can start
>> submitting work to the queue as soon as the call returns.
>>
>> V2: Addressed review comments from Alex and Christian
>>      - Make the doorbell offset's comment clearer
>>      - Change the output parameter name to queue_id
>> V3: Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 55 insertions(+)
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index cc5d551abda5..e4943099b9d2 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -54,6 +54,7 @@ extern "C" {
>>   #define DRM_AMDGPU_VM                  0x13
>>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>   #define DRM_AMDGPU_SCHED               0x15
>> +#define DRM_AMDGPU_USERQ               0x16
>>
>>   #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>> @@ -71,6 +72,7 @@ extern "C" {
>>   #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>   #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>
>>   /**
>>    * DOC: memory domains
>> @@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
>>          union drm_amdgpu_ctx_out out;
>>   };
>>
>> +/* user queue IOCTL */
>> +#define AMDGPU_USERQ_OP_CREATE 1
>> +#define AMDGPU_USERQ_OP_FREE   2
>> +
>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>> +
>> +struct drm_amdgpu_userq_mqd {
>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>> +       __u32   flags;
>> +       /** IP type: AMDGPU_HW_IP_* */
>> +       __u32   ip_type;
>> +       /** GEM object handle */
>> +       __u32   doorbell_handle;
>> +       /** Doorbell's offset in the doorbell bo */
>> +       __u32   doorbell_offset;
>> +       /** GPU virtual address of the queue */
>> +       __u64   queue_va;
>> +       /** Size of the queue in bytes */
>> +       __u64   queue_size;
>> +       /** GPU virtual address of the rptr */
>> +       __u64   rptr_va;
>> +       /** GPU virtual address of the wptr */
>> +       __u64   wptr_va;
>> +       /** GPU virtual address of the shadow context space */
>> +       __u64   shadow_va;
>> +};
> We may want to make the MQD engine specific.  E.g., shadow is gfx
> specific.  We also probably need the csa and gds buffers for gfx as
> well.  Other engines may have their own additional buffer
> requirements.
>
> Alex

Sure, we can call it drm_amdgpu_userq_mqd_gfx to clarify that this MQD 
is specific to the GFX engine.

- Shashank

>
>
>> +
>> +struct drm_amdgpu_userq_in {
>> +       /** AMDGPU_USERQ_OP_* */
>> +       __u32   op;
>> +       /** Flags */
>> +       __u32   flags;
>> +       /** Queue handle to associate the queue free call with,
>> +        * unused for queue create calls */
>> +       __u32   queue_id;
>> +       __u32   pad;
>> +       /** Queue descriptor */
>> +       struct drm_amdgpu_userq_mqd mqd;
>> +};
>> +
>> +struct drm_amdgpu_userq_out {
>> +       /** Queue handle */
>> +       __u32   queue_id;
>> +       /** Flags */
>> +       __u32   flags;
>> +};
>> +
>> +union drm_amdgpu_userq {
>> +       struct drm_amdgpu_userq_in in;
>> +       struct drm_amdgpu_userq_out out;
>> +};
>> +
>>   /* vm ioctl */
>>   #define AMDGPU_VM_OP_RESERVE_VMID      1
>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>> --
>> 2.40.0
>>


* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-29 19:21     ` Shashank Sharma
@ 2023-03-29 19:46       ` Alex Deucher
  2023-03-30  6:13         ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Deucher @ 2023-03-29 19:46 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 3:21 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> Hey Alex,
>
> On 29/03/2023 19:57, Alex Deucher wrote:
> > On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> > <shashank.sharma@amd.com> wrote:
> >> From: Alex Deucher <alexander.deucher@amd.com>
> >>
> >> This patch introduces a new UAPI/IOCTL for usermode graphics
> >> queue. The userspace app will fill this structure and request
> >> the graphics driver to add a graphics work queue for it. The
> >> output of this UAPI is a queue id.
> >>
> >> This UAPI maps the queue into GPU, so the graphics app can start
> >> submitting work to the queue as soon as the call returns.
> >>
> >> V2: Addressed review comments from Alex and Christian
> >>      - Make the doorbell offset's comment clearer
> >>      - Change the output parameter name to queue_id
> >> V3: Integration with doorbell manager
> >>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: Christian Koenig <christian.koenig@amd.com>
> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >> ---
> >>   include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
> >>   1 file changed, 55 insertions(+)
> >>
> >> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> >> index cc5d551abda5..e4943099b9d2 100644
> >> --- a/include/uapi/drm/amdgpu_drm.h
> >> +++ b/include/uapi/drm/amdgpu_drm.h
> >> @@ -54,6 +54,7 @@ extern "C" {
> >>   #define DRM_AMDGPU_VM                  0x13
> >>   #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
> >>   #define DRM_AMDGPU_SCHED               0x15
> >> +#define DRM_AMDGPU_USERQ               0x16
> >>
> >>   #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
> >>   #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> >> @@ -71,6 +72,7 @@ extern "C" {
> >>   #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
> >>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
> >>   #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
> >> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
> >>
> >>   /**
> >>    * DOC: memory domains
> >> @@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
> >>          union drm_amdgpu_ctx_out out;
> >>   };
> >>
> >> +/* user queue IOCTL */
> >> +#define AMDGPU_USERQ_OP_CREATE 1
> >> +#define AMDGPU_USERQ_OP_FREE   2
> >> +
> >> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
> >> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
> >> +
> >> +struct drm_amdgpu_userq_mqd {
> >> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
> >> +       __u32   flags;
> >> +       /** IP type: AMDGPU_HW_IP_* */
> >> +       __u32   ip_type;
> >> +       /** GEM object handle */
> >> +       __u32   doorbell_handle;
> >> +       /** Doorbell's offset in the doorbell bo */
> >> +       __u32   doorbell_offset;
> >> +       /** GPU virtual address of the queue */
> >> +       __u64   queue_va;
> >> +       /** Size of the queue in bytes */
> >> +       __u64   queue_size;
> >> +       /** GPU virtual address of the rptr */
> >> +       __u64   rptr_va;
> >> +       /** GPU virtual address of the wptr */
> >> +       __u64   wptr_va;
> >> +       /** GPU virtual address of the shadow context space */
> >> +       __u64   shadow_va;
> >> +};
> > We may want to make the MQD engine specific.  E.g., shadow is gfx
> > specific.  We also probably need the csa and gds buffers for gfx as
> > well.  Other engines may have their own additional buffer
> > requirements.
> >
> > Alex
>
> Sure, we can call it drm_amdgpu_userq_mqd_gfx to clarify that this MQD
> is specific to GFX engine.

We can make it a union and then add additional entries for SDMA,
compute, and VCN.  We should also move the IP type into struct
drm_amdgpu_userq_in so we know how to interpret the union.  Or make it
a u64 and handle it similarly to the chunks interface in
drm_amdgpu_cs_chunk.

Alex

>
> - Shashank
>
> >
> >
> >> +
> >> +struct drm_amdgpu_userq_in {
> >> +       /** AMDGPU_USERQ_OP_* */
> >> +       __u32   op;
> >> +       /** Flags */
> >> +       __u32   flags;
> >> +       /** Queue handle to associate the queue free call with,
> >> +        * unused for queue create calls */
> >> +       __u32   queue_id;
> >> +       __u32   pad;
> >> +       /** Queue descriptor */
> >> +       struct drm_amdgpu_userq_mqd mqd;
> >> +};
> >> +
> >> +struct drm_amdgpu_userq_out {
> >> +       /** Queue handle */
> >> +       __u32   queue_id;
> >> +       /** Flags */
> >> +       __u32   flags;
> >> +};
> >> +
> >> +union drm_amdgpu_userq {
> >> +       struct drm_amdgpu_userq_in in;
> >> +       struct drm_amdgpu_userq_out out;
> >> +};
> >> +
> >>   /* vm ioctl */
> >>   #define AMDGPU_VM_OP_RESERVE_VMID      1
> >>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2
> >> --
> >> 2.40.0
> >>


* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-29 19:46       ` Alex Deucher
@ 2023-03-30  6:13         ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-30  6:13 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, amd-gfx


On 29/03/2023 21:46, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 3:21 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> Hey Alex,
>>
>> On 29/03/2023 19:57, Alex Deucher wrote:
>>> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
>>> <shashank.sharma@amd.com> wrote:
>>>> From: Alex Deucher <alexander.deucher@amd.com>
>>>>
>>>> This patch introduces a new UAPI/IOCTL for usermode graphics
>>>> queue. The userspace app will fill this structure and request
>>>> the graphics driver to add a graphics work queue for it. The
>>>> output of this UAPI is a queue id.
>>>>
>>>> This UAPI maps the queue into GPU, so the graphics app can start
>>>> submitting work to the queue as soon as the call returns.
>>>>
>>>> V2: Addressed review comments from Alex and Christian
>>>>       - Make the doorbell offset's comment clearer
>>>>       - Change the output parameter name to queue_id
>>>> V3: Integration with doorbell manager
>>>>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>>    include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 55 insertions(+)
>>>>
>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>> index cc5d551abda5..e4943099b9d2 100644
>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>    #define DRM_AMDGPU_VM                  0x13
>>>>    #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>    #define DRM_AMDGPU_SCHED               0x15
>>>> +#define DRM_AMDGPU_USERQ               0x16
>>>>
>>>>    #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>>>    #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>    #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>    #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>>>    #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>> +#define DRM_IOCTL_AMDGPU_USERQ         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>>>
>>>>    /**
>>>>     * DOC: memory domains
>>>> @@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
>>>>           union drm_amdgpu_ctx_out out;
>>>>    };
>>>>
>>>> +/* user queue IOCTL */
>>>> +#define AMDGPU_USERQ_OP_CREATE 1
>>>> +#define AMDGPU_USERQ_OP_FREE   2
>>>> +
>>>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE  (1 << 0)
>>>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL     (1 << 1)
>>>> +
>>>> +struct drm_amdgpu_userq_mqd {
>>>> +       /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>>>> +       __u32   flags;
>>>> +       /** IP type: AMDGPU_HW_IP_* */
>>>> +       __u32   ip_type;
>>>> +       /** GEM object handle */
>>>> +       __u32   doorbell_handle;
>>>> +       /** Doorbell's offset in the doorbell bo */
>>>> +       __u32   doorbell_offset;
>>>> +       /** GPU virtual address of the queue */
>>>> +       __u64   queue_va;
>>>> +       /** Size of the queue in bytes */
>>>> +       __u64   queue_size;
>>>> +       /** GPU virtual address of the rptr */
>>>> +       __u64   rptr_va;
>>>> +       /** GPU virtual address of the wptr */
>>>> +       __u64   wptr_va;
>>>> +       /** GPU virtual address of the shadow context space */
>>>> +       __u64   shadow_va;
>>>> +};
>>> We may want to make the MQD engine specific.  E.g., shadow is gfx
>>> specific.  We also probably need the csa and gds buffers for gfx as
>>> well.  Other engines may have their own additional buffer
>>> requirements.
>>>
>>> Alex
>> Sure, we can call it drm_amdgpu_userq_mqd_gfx to clarify that this MQD
>> is specific to GFX engine.
> We can make it a union and then add additional entries for SDMA,
> compute, and VCN.  We should also move the IP type into struct
> drm_amdgpu_userq_in so we know how to interpret the union.

I was thinking of doing exactly this :), it would be a small change.

Please have a look at the rest of the series as well, considering this done.

- Shashank

>    Or make it
> a u64 and handle it similarly to the chunks interface in
> drm_amdgpu_cs_chunk.
> Alex
>
>> - Shashank
>>
>>>
>>>> +
>>>> +struct drm_amdgpu_userq_in {
>>>> +       /** AMDGPU_USERQ_OP_* */
>>>> +       __u32   op;
>>>> +       /** Flags */
>>>> +       __u32   flags;
>>>> +       /** Queue handle to associate the queue free call with,
>>>> +        * unused for queue create calls */
>>>> +       __u32   queue_id;
>>>> +       __u32   pad;
>>>> +       /** Queue descriptor */
>>>> +       struct drm_amdgpu_userq_mqd mqd;
>>>> +};
>>>> +
>>>> +struct drm_amdgpu_userq_out {
>>>> +       /** Queue handle */
>>>> +       __u32   queue_id;
>>>> +       /** Flags */
>>>> +       __u32   flags;
>>>> +};
>>>> +
>>>> +union drm_amdgpu_userq {
>>>> +       struct drm_amdgpu_userq_in in;
>>>> +       struct drm_amdgpu_userq_out out;
>>>> +};
>>>> +
>>>>    /* vm ioctl */
>>>>    #define AMDGPU_VM_OP_RESERVE_VMID      1
>>>>    #define AMDGPU_VM_OP_UNRESERVE_VMID    2
>>>> --
>>>> 2.40.0
>>>>


* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
       [not found]   ` <71fc098c-c0cb-3097-4e11-c2d9bd9b4783@damsy.net>
@ 2023-03-30  8:15     ` Shashank Sharma
  2023-03-30 10:40       ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-03-30  8:15 UTC (permalink / raw)
  To: Pierre-Eric Pelloux-Prayer, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Christian Koenig

Hello Pierre-Eric,

Thanks for your review, my comments inline.


On 30/03/2023 10:02, Pierre-Eric Pelloux-Prayer wrote:
> Hi Shashank,
>
> On 29/03/2023 18:04, Shashank Sharma wrote:
>> From: Alex Deucher <alexander.deucher@amd.com>
>>
>> This patch introduces a new UAPI/IOCTL for usermode graphics
>> queue. The userspace app will fill this structure and request
>> the graphics driver to add a graphics work queue for it. The
>> output of this UAPI is a queue id.
>>
>> This UAPI maps the queue into GPU, so the graphics app can start
>> submitting work to the queue as soon as the call returns.
>>
>> V2: Addressed review comments from Alex and Christian
>>      - Make the doorbell offset's comment clearer
>>      - Change the output parameter name to queue_id
>> V3: Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   include/uapi/drm/amdgpu_drm.h | 55 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 55 insertions(+)
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h 
>> b/include/uapi/drm/amdgpu_drm.h
>> index cc5d551abda5..e4943099b9d2 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -54,6 +54,7 @@ extern "C" {
>>   #define DRM_AMDGPU_VM            0x13
>>   #define DRM_AMDGPU_FENCE_TO_HANDLE    0x14
>>   #define DRM_AMDGPU_SCHED        0x15
>> +#define DRM_AMDGPU_USERQ        0x16
>>     #define DRM_IOCTL_AMDGPU_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>   #define DRM_IOCTL_AMDGPU_GEM_MMAP    DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>> @@ -71,6 +72,7 @@ extern "C" {
>>   #define DRM_IOCTL_AMDGPU_VM        DRM_IOWR(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>   #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE 
>> + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>   #define DRM_IOCTL_AMDGPU_SCHED        DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>> +#define DRM_IOCTL_AMDGPU_USERQ        DRM_IOW(DRM_COMMAND_BASE + 
>> DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>>     /**
>>    * DOC: memory domains
>> @@ -307,6 +309,59 @@ union drm_amdgpu_ctx {
>>       union drm_amdgpu_ctx_out out;
>>   };
>>   +/* user queue IOCTL */
>> +#define AMDGPU_USERQ_OP_CREATE    1
>> +#define AMDGPU_USERQ_OP_FREE    2
>> +
>> +#define AMDGPU_USERQ_MQD_FLAGS_SECURE    (1 << 0)
>> +#define AMDGPU_USERQ_MQD_FLAGS_AQL    (1 << 1)
>
> What is the purpose of these flags?
> Could you add some documentation?
Noted,
>
>> +
>> +struct drm_amdgpu_userq_mqd {
>> +    /** Flags: AMDGPU_USERQ_MQD_FLAGS_* */
>> +    __u32    flags;
>> +    /** IP type: AMDGPU_HW_IP_* */
>> +    __u32    ip_type;
>> +    /** GEM object handle */
>> +    __u32   doorbell_handle;
>> +    /** Doorbell's offset in the doorbell bo */
>> +    __u32   doorbell_offset;
>> +    /** GPU virtual address of the queue */
>> +    __u64   queue_va;
>> +    /** Size of the queue in bytes */
>> +    __u64   queue_size;
>> +    /** GPU virtual address of the rptr */
>> +    __u64   rptr_va;
>> +    /** GPU virtual address of the wptr */
>> +    __u64   wptr_va;
>> +    /** GPU virtual address of the shadow context space */
>> +    __u64    shadow_va;
>> +};
>
> The comments inside drm_amdgpu_userq_mqd could be improved.
> Here are some questions I have looking at the API:
> * what is a doorbell in this context?
> * what's the purpose of the doorbell offset?
> * how UMD should size each buffer? I assume doorbell, rptr, wptr
>   have min size requirements?
We are planning to cover all of this in detailed documentation, in the 
form of a cover letter/text comment in the follow-up libDRM UAPI version, 
as discussed internally.
>
> I'm also wondering why the doorbell needs a handle+offset but
> other buffers are passed in as virtual addresses?
>
As you know, the doorbell offset here is a relative offset within the 
doorbell page, but the MQD needs the absolute offset into the doorbell PCI 
BAR.

So the kernel needs both the object and the relative offset to calculate 
the absolute offset:

absolute offset = base offset of this doorbell page + relative offset of 
this doorbell.

>> +
>> +struct drm_amdgpu_userq_in {
>> +    /** AMDGPU_USERQ_OP_* */
>> +    __u32    op;
>> +    /** Flags */
>> +    __u32    flags;
>
> What are these flags?

We have kept these flags to indicate special conditions like secure 
display, encryption, etc. We will start utilizing them once the base 
code (this series) is merged.

>
>> +    /** Queue handle to associate the queue free call with,
>> +     * unused for queue create calls */
>> +    __u32    queue_id;
>> +    __u32    pad;
>> +    /** Queue descriptor */
>> +    struct drm_amdgpu_userq_mqd mqd;
>> +};
>
> I'm not familiar with ioctl design but would a union work to
> identify the parameters of each operation?
>
> union {
>    struct {
>       struct drm_amdgpu_userq_mqd mqd;
>    } create;
>    struct {
>       __u32 queue_id;
>       __u32 pad;
>    } free;
> };


I think it might work. I can check this out.

>
>> +
>> +struct drm_amdgpu_userq_out {
>> +    /** Queue handle */
>> +    __u32    queue_id;
>> +    /** Flags */
>> +    __u32    flags;
>
> What are these flags?
>
These are not utilized yet. We have kept these flags to indicate special 
output conditions like oversubscription, or failure because no more queue 
slots are available, etc.

Noted; this will be covered in the doc as well.

- Shashank


> Thanks,
> Pierre-Eric
>
>> +};
>> +
>> +union drm_amdgpu_userq {
>> +    struct drm_amdgpu_userq_in in;
>> +    struct drm_amdgpu_userq_out out;
>> +};
>> +
>>   /* vm ioctl */
>>   #define AMDGPU_VM_OP_RESERVE_VMID    1
>>   #define AMDGPU_VM_OP_UNRESERVE_VMID    2


* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-30  8:15     ` Shashank Sharma
@ 2023-03-30 10:40       ` Christian König
  2023-03-30 15:08         ` Alex Deucher
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2023-03-30 10:40 UTC (permalink / raw)
  To: Shashank Sharma, Pierre-Eric Pelloux-Prayer, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Christian Koenig

Am 30.03.23 um 10:15 schrieb Shashank Sharma:
> Hello Pierre-Eric,
> [SNIP]
>> I'm also wondering why the doorbell needs a handle+offset but
>> other buffers are passed in as virtual addresses?
>>
> As you know, the doorbell offset here will be a relative offset in this 
> doorbell page, but the MQD needs the absolute offset on the doorbell 
> PCI BAR.
>
> So kernel needs both the object as well as relative offset to 
> calculate absolute offset.
>
> something like: absolute offset = base offset of this doorbell page + 
> relative offset of this doorbell.

Another much more obvious reason is that the doorbell doesn't have a 
virtual address.

At least for GFX the doorbell is used to signal to the hw that new 
commands are available. So as long as we don't want a shader to kick off 
other work, we don't need to map the doorbell into the GPUVM address space.

Christian.


* Re: [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management
  2023-03-30 10:40       ` Christian König
@ 2023-03-30 15:08         ` Alex Deucher
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Deucher @ 2023-03-30 15:08 UTC (permalink / raw)
  To: Christian König
  Cc: Pierre-Eric Pelloux-Prayer, Shashank Sharma, Felix Kuehling,
	amd-gfx, Alex Deucher, Christian Koenig

On Thu, Mar 30, 2023 at 6:40 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 30.03.23 um 10:15 schrieb Shashank Sharma:
> > Hello Pierre-Eric,
> > [SNIP]
> >> I'm also wondering why the doorbell needs a handle+offset but
> >> other buffers are passed in as virtual addresses?
> >>
> > As you know, doorbell offset here will be an relative offset in this
> > doorbell page, but the MQD needs the absolute offset on the doorbell
> > PCI BAR.
> >
> > So kernel needs both the object as well as relative offset to
> > calculate absolute offset.
> >
> > something like: absolute offset = base offset of this doorbell page +
> > relative offset of this doorbell.
>
> Another much more obvious reason is that the doorbell doesn't have a
> virtual address.
>
> At least for GFX the doorbell is used to signal to the hw that new
> commands are available. So as long as we don't want a shader to kick off
> other work, we don't need to map the doorbell into the GPUVM address space.

This certainly could be done if the app wanted to, but to set up the
queue in the firmware API we need the index within the doorbell BAR.

Alex


* Re: [PATCH v3 2/9] drm/amdgpu: add usermode queue base code
  2023-03-29 16:04 ` [PATCH v3 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
@ 2023-03-30 21:15   ` Alex Deucher
  2023-03-31  8:52     ` Shashank Sharma
  2023-04-04 16:05   ` Luben Tuikov
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Deucher @ 2023-03-30 21:15 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This patch adds skeleton code for amdgpu usermode queue. It contains:
> - A new file with init functions for usermode queues.
> - A queue context manager in driver private data.
>
> V1: Worked on design review comments from RFC patch series:
> (https://patchwork.freedesktop.org/series/112214/)
> - Alex: Keep a list of queues, instead of single queue per process.
> - Christian: Use the queue manager instead of global ptrs,
>            Don't keep the queue structure in amdgpu_ctx
>
> V2:
>  - Reformatted code, split the big patch into two
>
> V3:
> - Integration with doorbell manager
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    | 49 +++++++++++++++++++
>  6 files changed, 106 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 204665f20319..2d90ba618e5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -210,6 +210,8 @@ amdgpu-y += \
>  # add amdkfd interfaces
>  amdgpu-y += amdgpu_amdkfd.o
>
> +# add usermode queue
> +amdgpu-y += amdgpu_userqueue.o
>
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 6b74df446694..c5f9af0e74ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -438,6 +438,14 @@ struct amdgpu_sa_manager {
>         uint32_t                align;
>  };
>
> +/* Gfx usermode queues */
> +struct amdgpu_userq_mgr {
> +       struct idr userq_idr;
> +       struct mutex userq_mutex;
> +       struct amdgpu_device *adev;
> +       const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];

These function pointers can be per device rather than per userq_mgr.
Just hang them off of adev and then each IP can fill them in during
their init functions.

Alex

> +};
> +
>  /* sub-allocation buffer */
>  struct amdgpu_sa_bo {
>         struct list_head                olist;
> @@ -470,7 +478,6 @@ struct amdgpu_flip_work {
>         bool                            async;
>  };
>
> -
>  /*
>   * file private structure
>   */
> @@ -482,6 +489,7 @@ struct amdgpu_fpriv {
>         struct mutex            bo_list_lock;
>         struct idr              bo_list_handles;
>         struct amdgpu_ctx_mgr   ctx_mgr;
> +       struct amdgpu_userq_mgr userq_mgr;
>  };
>
>  int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b4f2d61ea0d5..2d6bcfd727c8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -52,6 +52,7 @@
>  #include "amdgpu_ras.h"
>  #include "amdgpu_xgmi.h"
>  #include "amdgpu_reset.h"
> +#include "amdgpu_userqueue.h"
>
>  /*
>   * KMS wrapper.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 7aa7e52ca784..b16b8155a157 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -43,6 +43,7 @@
>  #include "amdgpu_gem.h"
>  #include "amdgpu_display.h"
>  #include "amdgpu_ras.h"
> +#include "amdgpu_userqueue.h"
>
>  void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
>  {
> @@ -1187,6 +1188,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>
>         amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>
> +       r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
> +       if (r)
> +               DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
> +
>         file_priv->driver_priv = fpriv;
>         goto out_suspend;
>
> @@ -1254,6 +1259,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>
>         amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>         amdgpu_vm_fini(adev, &fpriv->vm);
> +       amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>
>         if (pasid)
>                 amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> new file mode 100644
> index 000000000000..13e1eebc1cb6
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu.h"
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> +{
> +    mutex_init(&userq_mgr->userq_mutex);
> +    idr_init_base(&userq_mgr->userq_idr, 1);
> +    userq_mgr->adev = adev;
> +
> +    return 0;
> +}
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
> +{
> +    idr_destroy(&userq_mgr->userq_idr);
> +    mutex_destroy(&userq_mgr->userq_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> new file mode 100644
> index 000000000000..7eeb8c9e6575
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -0,0 +1,49 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef AMDGPU_USERQUEUE_H_
> +#define AMDGPU_USERQUEUE_H_
> +
> +#include "amdgpu.h"
> +#define AMDGPU_MAX_USERQ 512
> +
> +struct amdgpu_usermode_queue {
> +       int queue_id;
> +       int queue_type;
> +       uint64_t flags;
> +       uint64_t doorbell_handle;
> +       struct amdgpu_vm *vm;
> +       struct amdgpu_userq_mgr *userq_mgr;
> +       struct amdgpu_mqd_prop userq_prop;
> +};
> +
> +struct amdgpu_userq_funcs {
> +       int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +};
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> +
> +#endif
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue
  2023-03-29 16:04 ` [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue Shashank Sharma
@ 2023-03-30 21:18   ` Alex Deucher
  2023-03-31  8:49     ` Shashank Sharma
  2023-04-04 16:21   ` Luben Tuikov
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Deucher @ 2023-03-30 21:18 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Felix Kuehling, Arvind Yadav, amd-gfx, Alex Deucher,
	Shashank Sharma, Christian Koenig

On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> A Memory queue descriptor (MQD) of a userqueue defines it in the hardware's
> context. As MQD format can vary between different graphics IPs, we need gfx
> GEN specific handlers to create MQDs.
>
> This patch:
> - Introduces MQD handler functions for the usermode queues.
> - Adds new functions to create and destroy MQD for GFX-GEN-11-IP
>
> V1: Worked on review comments from Alex:
>     - Make MQD functions GEN and IP specific
>
> V2: Worked on review comments from Alex:
>     - Reuse the existing adev->mqd[ip] for MQD creation
>     - Formatting and arrangement of code
>
> V3:
>     - Integration with doorbell manager
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 21 +++++
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 84 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>  4 files changed, 113 insertions(+)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 2d90ba618e5d..2cc7897de7e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>
>  # add usermode queue
>  amdgpu-y += amdgpu_userqueue.o
> +amdgpu-y += amdgpu_userqueue_gfx_v11.o
>
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 353f57c5a772..052c2c1e8aed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -81,6 +81,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>          goto free_queue;
>      }
>
> +    r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create/map userqueue MQD\n");
> +        goto free_queue;
> +    }
> +
>      args->out.queue_id = queue->queue_id;
>      args->out.flags = 0;
>      mutex_unlock(&uq_mgr->userq_mutex);
> @@ -105,6 +111,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>      }
>
>      mutex_lock(&uq_mgr->userq_mutex);
> +    uq_mgr->userq_funcs[queue->queue_type]->mqd_destroy(uq_mgr, queue);
>      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>      mutex_unlock(&uq_mgr->userq_mutex);
>      kfree(queue);
> @@ -135,6 +142,19 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>      return r;
>  }
>
> +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
> +
> +static void
> +amdgpu_userqueue_setup_ip_funcs(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +    int maj;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> +
> +    maj = IP_VERSION_MAJ(version);
> +    if (maj == 11)
> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
> +}

These can be per device and done in each IP's init code.

>
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>  {
> @@ -142,6 +162,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>      idr_init_base(&userq_mgr->userq_idr, 1);
>      userq_mgr->adev = adev;
>
> +    amdgpu_userqueue_setup_ip_funcs(userq_mgr);
>      return 0;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> new file mode 100644
> index 000000000000..12e1a785b65a
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -0,0 +1,84 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include "amdgpu.h"
> +#include "amdgpu_userqueue.h"
> +
> +static int
> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
> +    struct amdgpu_mqd *gfx_v11_mqd = &adev->mqds[queue->queue_type];
> +    int size = gfx_v11_mqd->mqd_size;
> +    int r;
> +
> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_GTT,
> +                                &mqd->obj,
> +                                &mqd->gpu_addr,
> +                                &mqd->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
> +        return r;
> +    }
> +
> +    memset(mqd->cpu_ptr, 0, size);
> +    r = amdgpu_bo_reserve(mqd->obj, false);
> +    if (unlikely(r != 0)) {
> +        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
> +        goto free_mqd;
> +    }
> +
> +    queue->userq_prop.use_doorbell = true;
> +    queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
> +    r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
> +    amdgpu_bo_unreserve(mqd->obj);
> +    if (r) {
> +        DRM_ERROR("Failed to init MQD for queue\n");
> +        goto free_mqd;
> +    }
> +
> +    DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
> +    return 0;
> +
> +free_mqd:
> +    amdgpu_bo_free_kernel(&mqd->obj,
> +                          &mqd->gpu_addr,
> +                          &mqd->cpu_ptr);
> +   return r;
> +}
> +
> +static void
> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
> +
> +    amdgpu_bo_free_kernel(&mqd->obj,
> +                          &mqd->gpu_addr,
> +                          &mqd->cpu_ptr);
> +}
> +
> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> +};

We can just stick these in gfx_v11_0.c.  No need for a new file.

Alex

> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 7625a862b1fc..2911c88d0fed 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -27,6 +27,12 @@
>  #include "amdgpu.h"
>  #define AMDGPU_MAX_USERQ 512
>
> +struct amdgpu_userq_ctx_space {
> +       struct amdgpu_bo *obj;
> +       uint64_t gpu_addr;
> +       void *cpu_ptr;
> +};
> +
>  struct amdgpu_usermode_queue {
>         int queue_id;
>         int queue_type;
> @@ -35,6 +41,7 @@ struct amdgpu_usermode_queue {
>         struct amdgpu_vm *vm;
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_mqd_prop userq_prop;
> +       struct amdgpu_userq_ctx_space mqd;
>  };
>
>  struct amdgpu_userq_funcs {
> --
> 2.40.0
>


* Re: [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue
  2023-03-29 16:04 ` [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue Shashank Sharma
@ 2023-03-30 21:23   ` Alex Deucher
  2023-03-31  8:42     ` Shashank Sharma
  2023-04-04 16:24   ` Luben Tuikov
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Deucher @ 2023-03-30 21:23 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> The FW expects us to allocate at least one page of context space to
> process gang, process, shadow, GDS and FW related work. This patch
> creates a joint object for the same, and calculates GPU space offsets
> for each of these spaces.

The shadow bo, at least, should come from user space since userspace
will want to mess with it to optimize its register handling, at least
for gfx.  The gds and csa could also come from userspace.  That would
simplify things.  The UMD would just specify them in the MQD
descriptor for GFX in the IOCTL.  We could allocate them in the
kernel, but then we'd need to make sure they were mapped into the
GPUVM space for the UMD.  That could get pretty big if they have a lot
of queues.

Alex

>
> V1: Addressed review comments on RFC patch:
>     Alex: Make this function IP specific
>
> V2: Addressed review comments from Christian
>     - Allocate only one object for total FW space, and calculate
>       offsets for each of these objects.
>
> V3: Integration with doorbell manager
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  1 +
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 60 ++++++++++++++++++-
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 +++
>  3 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 052c2c1e8aed..5672efcbcffc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -71,6 +71,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>      queue->userq_prop.queue_size = mqd_in->queue_size;
>
>      queue->doorbell_handle = mqd_in->doorbell_handle;
> +    queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;
>      queue->queue_type = mqd_in->ip_type;
>      queue->flags = mqd_in->flags;
>      queue->vm = &fpriv->vm;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> index 12e1a785b65a..52de96727f98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -23,6 +23,51 @@
>  #include "amdgpu.h"
>  #include "amdgpu_userqueue.h"
>
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
> +
> +static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                                 struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
> +    int r, size;
> +
> +    /*
> +     * The FW expects at least one page of space allocated for
> +     * process ctx, gang ctx, gds ctx, fw ctx and shadow ctx each.
> +     */
> +    size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ +
> +           AMDGPU_USERQ_FW_CTX_SZ + AMDGPU_USERQ_GDS_CTX_SZ;
> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_GTT,
> +                                &ctx->obj,
> +                                &ctx->gpu_addr,
> +                                &ctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> +        return r;
> +    }
> +
> +    queue->proc_ctx_gpu_addr = ctx->gpu_addr;
> +    queue->gang_ctx_gpu_addr = queue->proc_ctx_gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
> +    queue->fw_ctx_gpu_addr = queue->gang_ctx_gpu_addr + AMDGPU_USERQ_GANG_CTX_SZ;
> +    queue->gds_ctx_gpu_addr = queue->fw_ctx_gpu_addr + AMDGPU_USERQ_FW_CTX_SZ;
> +    return 0;
> +}
> +
> +static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                                   struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
> +
> +    amdgpu_bo_free_kernel(&ctx->obj,
> +                          &ctx->gpu_addr,
> +                          &ctx->cpu_ptr);
> +}
> +
>  static int
>  amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>  {
> @@ -43,10 +88,17 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>      }
>
>      memset(mqd->cpu_ptr, 0, size);
> +
> +    r = amdgpu_userq_gfx_v11_create_ctx_space(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create CTX space for userqueue (%d)\n", r);
> +        goto free_mqd;
> +    }
> +
>      r = amdgpu_bo_reserve(mqd->obj, false);
>      if (unlikely(r != 0)) {
>          DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
> -        goto free_mqd;
> +        goto free_ctx;
>      }
>
>      queue->userq_prop.use_doorbell = true;
> @@ -55,12 +107,15 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>      amdgpu_bo_unreserve(mqd->obj);
>      if (r) {
>          DRM_ERROR("Failed to init MQD for queue\n");
> -        goto free_mqd;
> +        goto free_ctx;
>      }
>
>      DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>      return 0;
>
> +free_ctx:
> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
> +
>  free_mqd:
>      amdgpu_bo_free_kernel(&mqd->obj,
>                            &mqd->gpu_addr,
> @@ -73,6 +128,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>  {
>      struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>
> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>      amdgpu_bo_free_kernel(&mqd->obj,
>                            &mqd->gpu_addr,
>                            &mqd->cpu_ptr);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 2911c88d0fed..8b62ef77cd26 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -38,10 +38,17 @@ struct amdgpu_usermode_queue {
>         int queue_type;
>         uint64_t flags;
>         uint64_t doorbell_handle;
> +       uint64_t proc_ctx_gpu_addr;
> +       uint64_t gang_ctx_gpu_addr;
> +       uint64_t gds_ctx_gpu_addr;
> +       uint64_t fw_ctx_gpu_addr;
> +       uint64_t shadow_ctx_gpu_addr;
> +
>         struct amdgpu_vm *vm;
>         struct amdgpu_userq_mgr *userq_mgr;
>         struct amdgpu_mqd_prop userq_prop;
>         struct amdgpu_userq_ctx_space mqd;
> +       struct amdgpu_userq_ctx_space fw_space;
>  };
>
>  struct amdgpu_userq_funcs {
> --
> 2.40.0
>


* Re: [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct
  2023-03-29 16:04 ` [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct Shashank Sharma
@ 2023-03-30 21:25   ` Alex Deucher
  2023-03-31  6:39     ` Yadav, Arvind
  2023-03-31  8:30     ` Shashank Sharma
  0 siblings, 2 replies; 56+ messages in thread
From: Alex Deucher @ 2023-03-30 21:25 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Arvind Yadav, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
> From: Arvind Yadav <arvind.yadav@amd.com>
>
> This patch:
> - adds some new parameters defined for the gfx usermode queues
>   use cases in the v11_mqd_struct.
> - sets those parameters with the respective allocated gpu context
>   space addresses.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 21 ++++++++++++++++++-
>  drivers/gpu/drm/amd/include/v11_structs.h     | 16 +++++++-------
>  2 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> index 52de96727f98..39e90ea32fcb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -22,6 +22,7 @@
>   */
>  #include "amdgpu.h"
>  #include "amdgpu_userqueue.h"
> +#include "v11_structs.h"
>
>  #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>  #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> @@ -68,6 +69,22 @@ static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_m
>                            &ctx->cpu_ptr);
>  }
>
> +static void
> +amdgpu_userq_set_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                           struct amdgpu_usermode_queue *queue)
> +{
> +    struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
> +
> +    mqd->shadow_base_lo = queue->shadow_ctx_gpu_addr & 0xfffffffc;
> +    mqd->shadow_base_hi = upper_32_bits(queue->shadow_ctx_gpu_addr);
> +
> +    mqd->gds_bkup_base_lo = queue->gds_ctx_gpu_addr & 0xfffffffc;
> +    mqd->gds_bkup_base_hi = upper_32_bits(queue->gds_ctx_gpu_addr);
> +
> +    mqd->fw_work_area_base_lo = queue->fw_ctx_gpu_addr & 0xfffffffc;
> +    mqd->fw_work_area_base_hi = upper_32_bits(queue->fw_ctx_gpu_addr);
> +}
> +
>  static int
>  amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>  {
> @@ -104,12 +121,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>      queue->userq_prop.use_doorbell = true;
>      queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
>      r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
> -    amdgpu_bo_unreserve(mqd->obj);
>      if (r) {
> +        amdgpu_bo_unreserve(mqd->obj);
>          DRM_ERROR("Failed to init MQD for queue\n");
>          goto free_ctx;
>      }
>
> +    amdgpu_userq_set_ctx_space(uq_mgr, queue);
> +    amdgpu_bo_unreserve(mqd->obj);
>      DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>      return 0;
>
> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
> index b8ff7456ae0b..f8008270f813 100644
> --- a/drivers/gpu/drm/amd/include/v11_structs.h
> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
> @@ -25,14 +25,14 @@
>  #define V11_STRUCTS_H_
>
>  struct v11_gfx_mqd {
> -       uint32_t reserved_0; // offset: 0  (0x0)
> -       uint32_t reserved_1; // offset: 1  (0x1)
> -       uint32_t reserved_2; // offset: 2  (0x2)
> -       uint32_t reserved_3; // offset: 3  (0x3)
> -       uint32_t reserved_4; // offset: 4  (0x4)
> -       uint32_t reserved_5; // offset: 5  (0x5)
> -       uint32_t reserved_6; // offset: 6  (0x6)
> -       uint32_t reserved_7; // offset: 7  (0x7)
> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
> +       uint32_t shadow_initialized; // offset: 6  (0x6)
> +       uint32_t ib_vmid; // offset: 7  (0x7)
>         uint32_t reserved_8; // offset: 8  (0x8)
>         uint32_t reserved_9; // offset: 9  (0x9)
>         uint32_t reserved_10; // offset: 10  (0xA)

We should split this hunk out as a separate patch and upstream it now.

Alex

> --
> 2.40.0
>


* Re: [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct
  2023-03-30 21:25   ` Alex Deucher
@ 2023-03-31  6:39     ` Yadav, Arvind
  2023-03-31  8:30     ` Shashank Sharma
  1 sibling, 0 replies; 56+ messages in thread
From: Yadav, Arvind @ 2023-03-31  6:39 UTC (permalink / raw)
  To: Alex Deucher, Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Arvind Yadav, Christian Koenig, amd-gfx


On 3/31/2023 2:55 AM, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Arvind Yadav <arvind.yadav@amd.com>
>>
>> This patch:
>> - adds some new parameters defined for the gfx usermode queues
>>    use cases in the v11_mqd_struct.
>> - sets those parameters with the respective allocated gpu context
>>    space addresses.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 21 ++++++++++++++++++-
>>   drivers/gpu/drm/amd/include/v11_structs.h     | 16 +++++++-------
>>   2 files changed, 28 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> index 52de96727f98..39e90ea32fcb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -22,6 +22,7 @@
>>    */
>>   #include "amdgpu.h"
>>   #include "amdgpu_userqueue.h"
>> +#include "v11_structs.h"
>>
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>> @@ -68,6 +69,22 @@ static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_m
>>                             &ctx->cpu_ptr);
>>   }
>>
>> +static void
>> +amdgpu_userq_set_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                           struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
>> +
>> +    mqd->shadow_base_lo = queue->shadow_ctx_gpu_addr & 0xfffffffc;
>> +    mqd->shadow_base_hi = upper_32_bits(queue->shadow_ctx_gpu_addr);
>> +
>> +    mqd->gds_bkup_base_lo = queue->gds_ctx_gpu_addr & 0xfffffffc;
>> +    mqd->gds_bkup_base_hi = upper_32_bits(queue->gds_ctx_gpu_addr);
>> +
>> +    mqd->fw_work_area_base_lo = queue->fw_ctx_gpu_addr & 0xfffffffc;
>> +    mqd->fw_work_area_base_hi = upper_32_bits(queue->fw_ctx_gpu_addr);
>> +}
>> +
>>   static int
>>   amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>   {
>> @@ -104,12 +121,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>       queue->userq_prop.use_doorbell = true;
>>       queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
>>       r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
>> -    amdgpu_bo_unreserve(mqd->obj);
>>       if (r) {
>> +        amdgpu_bo_unreserve(mqd->obj);
>>           DRM_ERROR("Failed to init MQD for queue\n");
>>           goto free_ctx;
>>       }
>>
>> +    amdgpu_userq_set_ctx_space(uq_mgr, queue);
>> +    amdgpu_bo_unreserve(mqd->obj);
>>       DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>       return 0;
>>
>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
>> index b8ff7456ae0b..f8008270f813 100644
>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
>> @@ -25,14 +25,14 @@
>>   #define V11_STRUCTS_H_
>>
>>   struct v11_gfx_mqd {
>> -       uint32_t reserved_0; // offset: 0  (0x0)
>> -       uint32_t reserved_1; // offset: 1  (0x1)
>> -       uint32_t reserved_2; // offset: 2  (0x2)
>> -       uint32_t reserved_3; // offset: 3  (0x3)
>> -       uint32_t reserved_4; // offset: 4  (0x4)
>> -       uint32_t reserved_5; // offset: 5  (0x5)
>> -       uint32_t reserved_6; // offset: 6  (0x6)
>> -       uint32_t reserved_7; // offset: 7  (0x7)
>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
>> +       uint32_t ib_vmid; // offset: 7  (0x7)
>>          uint32_t reserved_8; // offset: 8  (0x8)
>>          uint32_t reserved_9; // offset: 9  (0x9)
>>          uint32_t reserved_10; // offset: 10  (0xA)
> We should split this hunk out as a separate patch and upstream it now.

Sure, we will send this as a separate patch.

~arvind

> Alex
>
>> --
>> 2.40.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct
  2023-03-30 21:25   ` Alex Deucher
  2023-03-31  6:39     ` Yadav, Arvind
@ 2023-03-31  8:30     ` Shashank Sharma
  1 sibling, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-31  8:30 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Alex Deucher, Felix Kuehling, Arvind Yadav, Christian Koenig, amd-gfx


On 30/03/2023 23:25, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Arvind Yadav <arvind.yadav@amd.com>
>>
>> This patch:
>> - adds some new parameters defined for the gfx usermode queues
>>    use cases in the v11_mqd_struct.
>> - sets those parameters with the respective allocated gpu context
>>    space addresses.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 21 ++++++++++++++++++-
>>   drivers/gpu/drm/amd/include/v11_structs.h     | 16 +++++++-------
>>   2 files changed, 28 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> index 52de96727f98..39e90ea32fcb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -22,6 +22,7 @@
>>    */
>>   #include "amdgpu.h"
>>   #include "amdgpu_userqueue.h"
>> +#include "v11_structs.h"
>>
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>> @@ -68,6 +69,22 @@ static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_m
>>                             &ctx->cpu_ptr);
>>   }
>>
>> +static void
>> +amdgpu_userq_set_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                           struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct v11_gfx_mqd *mqd = queue->mqd.cpu_ptr;
>> +
>> +    mqd->shadow_base_lo = queue->shadow_ctx_gpu_addr & 0xfffffffc;
>> +    mqd->shadow_base_hi = upper_32_bits(queue->shadow_ctx_gpu_addr);
>> +
>> +    mqd->gds_bkup_base_lo = queue->gds_ctx_gpu_addr & 0xfffffffc;
>> +    mqd->gds_bkup_base_hi = upper_32_bits(queue->gds_ctx_gpu_addr);
>> +
>> +    mqd->fw_work_area_base_lo = queue->fw_ctx_gpu_addr & 0xfffffffc;
>> +    mqd->fw_work_area_base_hi = upper_32_bits(queue->fw_ctx_gpu_addr);
>> +}
>> +
>>   static int
>>   amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>   {
>> @@ -104,12 +121,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>       queue->userq_prop.use_doorbell = true;
>>       queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
>>       r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
>> -    amdgpu_bo_unreserve(mqd->obj);
>>       if (r) {
>> +        amdgpu_bo_unreserve(mqd->obj);
>>           DRM_ERROR("Failed to init MQD for queue\n");
>>           goto free_ctx;
>>       }
>>
>> +    amdgpu_userq_set_ctx_space(uq_mgr, queue);
>> +    amdgpu_bo_unreserve(mqd->obj);
>>       DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>       return 0;
>>
>> diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
>> index b8ff7456ae0b..f8008270f813 100644
>> --- a/drivers/gpu/drm/amd/include/v11_structs.h
>> +++ b/drivers/gpu/drm/amd/include/v11_structs.h
>> @@ -25,14 +25,14 @@
>>   #define V11_STRUCTS_H_
>>
>>   struct v11_gfx_mqd {
>> -       uint32_t reserved_0; // offset: 0  (0x0)
>> -       uint32_t reserved_1; // offset: 1  (0x1)
>> -       uint32_t reserved_2; // offset: 2  (0x2)
>> -       uint32_t reserved_3; // offset: 3  (0x3)
>> -       uint32_t reserved_4; // offset: 4  (0x4)
>> -       uint32_t reserved_5; // offset: 5  (0x5)
>> -       uint32_t reserved_6; // offset: 6  (0x6)
>> -       uint32_t reserved_7; // offset: 7  (0x7)
>> +       uint32_t shadow_base_lo; // offset: 0  (0x0)
>> +       uint32_t shadow_base_hi; // offset: 1  (0x1)
>> +       uint32_t gds_bkup_base_lo; // offset: 2  (0x2)
>> +       uint32_t gds_bkup_base_hi; // offset: 3  (0x3)
>> +       uint32_t fw_work_area_base_lo; // offset: 4  (0x4)
>> +       uint32_t fw_work_area_base_hi; // offset: 5  (0x5)
>> +       uint32_t shadow_initialized; // offset: 6  (0x6)
>> +       uint32_t ib_vmid; // offset: 7  (0x7)
>>          uint32_t reserved_8; // offset: 8  (0x8)
>>          uint32_t reserved_9; // offset: 9  (0x9)
>>          uint32_t reserved_10; // offset: 10  (0xA)
> We should split this hunk out as a separate patch and upstream it now.

Got it,

- Shashank

>
> Alex
>
>> --
>> 2.40.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue
  2023-03-30 21:23   ` Alex Deucher
@ 2023-03-31  8:42     ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-31  8:42 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig, amd-gfx


On 30/03/2023 23:23, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> The FW expects us to allocate at least one page as context space to
>> process gang, process, shadow, GDS and FW related work. This patch
>> creates a joint object for the same, and calculates GPU space offsets
>> for each of these spaces.
> The shadow bo, at least, should come from user space since userspace
> will want to mess with it to optimize it's register handling at least
> for gfx.  The gds and csa could also come from userspace.  That would
> simplify things.  The UMD would just specify them in the MQD
> descriptor for GFX in the IOCTL.  We could allocate them in the
> kernel, but then we'd need to make sure they were mapped into the
> GPUVM space for the UMD,  That could get pretty big if they have a lot
> of queues.

The shadow space is indeed getting created in userspace in the working
solution; I just forgot to update the commit message here. Should I move
the GDS bo to userspace as well?

- Shashank

>
> Alex
>
>> V1: Addressed review comments on RFC patch:
>>      Alex: Make this function IP specific
>>
>> V2: Addressed review comments from Christian
>>      - Allocate only one object for total FW space, and calculate
>>        offsets for each of these objects.
>>
>> V3: Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  1 +
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 60 ++++++++++++++++++-
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 +++
>>   3 files changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 052c2c1e8aed..5672efcbcffc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -71,6 +71,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>>       queue->userq_prop.queue_size = mqd_in->queue_size;
>>
>>       queue->doorbell_handle = mqd_in->doorbell_handle;
>> +    queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;
>>       queue->queue_type = mqd_in->ip_type;
>>       queue->flags = mqd_in->flags;
>>       queue->vm = &fpriv->vm;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> index 12e1a785b65a..52de96727f98 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -23,6 +23,51 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_userqueue.h"
>>
>> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>> +
>> +static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                                 struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
>> +    int r, size;
>> +
>> +    /*
>> +     * The FW expects atleast one page space allocated for
>> +     * process ctx, gang ctx, gds ctx, fw ctx and shadow ctx each.
>> +     */
>> +    size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ +
>> +           AMDGPU_USERQ_FW_CTX_SZ + AMDGPU_USERQ_GDS_CTX_SZ;
>> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_GTT,
>> +                                &ctx->obj,
>> +                                &ctx->gpu_addr,
>> +                                &ctx->cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
>> +        return r;
>> +    }
>> +
>> +    queue->proc_ctx_gpu_addr = ctx->gpu_addr;
>> +    queue->gang_ctx_gpu_addr = queue->proc_ctx_gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
>> +    queue->fw_ctx_gpu_addr = queue->gang_ctx_gpu_addr + AMDGPU_USERQ_GANG_CTX_SZ;
>> +    queue->gds_ctx_gpu_addr = queue->fw_ctx_gpu_addr + AMDGPU_USERQ_FW_CTX_SZ;
>> +    return 0;
>> +}
>> +
>> +static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                                   struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
>> +
>> +    amdgpu_bo_free_kernel(&ctx->obj,
>> +                          &ctx->gpu_addr,
>> +                          &ctx->cpu_ptr);
>> +}
>> +
>>   static int
>>   amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>   {
>> @@ -43,10 +88,17 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>       }
>>
>>       memset(mqd->cpu_ptr, 0, size);
>> +
>> +    r = amdgpu_userq_gfx_v11_create_ctx_space(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create CTX space for userqueue (%d)\n", r);
>> +        goto free_mqd;
>> +    }
>> +
>>       r = amdgpu_bo_reserve(mqd->obj, false);
>>       if (unlikely(r != 0)) {
>>           DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
>> -        goto free_mqd;
>> +        goto free_ctx;
>>       }
>>
>>       queue->userq_prop.use_doorbell = true;
>> @@ -55,12 +107,15 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>       amdgpu_bo_unreserve(mqd->obj);
>>       if (r) {
>>           DRM_ERROR("Failed to init MQD for queue\n");
>> -        goto free_mqd;
>> +        goto free_ctx;
>>       }
>>
>>       DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>       return 0;
>>
>> +free_ctx:
>> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>> +
>>   free_mqd:
>>       amdgpu_bo_free_kernel(&mqd->obj,
>>                             &mqd->gpu_addr,
>> @@ -73,6 +128,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>   {
>>       struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>
>> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>       amdgpu_bo_free_kernel(&mqd->obj,
>>                             &mqd->gpu_addr,
>>                             &mqd->cpu_ptr);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 2911c88d0fed..8b62ef77cd26 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -38,10 +38,17 @@ struct amdgpu_usermode_queue {
>>          int queue_type;
>>          uint64_t flags;
>>          uint64_t doorbell_handle;
>> +       uint64_t proc_ctx_gpu_addr;
>> +       uint64_t gang_ctx_gpu_addr;
>> +       uint64_t gds_ctx_gpu_addr;
>> +       uint64_t fw_ctx_gpu_addr;
>> +       uint64_t shadow_ctx_gpu_addr;
>> +
>>          struct amdgpu_vm *vm;
>>          struct amdgpu_userq_mgr *userq_mgr;
>>          struct amdgpu_mqd_prop userq_prop;
>>          struct amdgpu_userq_ctx_space mqd;
>> +       struct amdgpu_userq_ctx_space fw_space;
>>   };
>>
>>   struct amdgpu_userq_funcs {
>> --
>> 2.40.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue
  2023-03-30 21:18   ` Alex Deucher
@ 2023-03-31  8:49     ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-31  8:49 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Felix Kuehling, Arvind Yadav, amd-gfx, Alex Deucher,
	Shashank Sharma, Christian Koenig


On 30/03/2023 23:18, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> A Memory Queue Descriptor (MQD) of a userqueue defines it in the hardware's
>> context. As the MQD format can vary between different graphics IPs, we need
>> gfx GEN-specific handlers to create MQDs.
>>
>> This patch:
>> - Introduces MQD hander functions for the usermode queues.
>> - Adds new functions to create and destroy MQD for GFX-GEN-11-IP
>>
>> V1: Worked on review comments from Alex:
>>      - Make MQD functions GEN and IP specific
>>
>> V2: Worked on review comments from Alex:
>>      - Reuse the existing adev->mqd[ip] for MQD creation
>>      - Formatting and arrangement of code
>>
>> V3:
>>      - Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 21 +++++
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 84 +++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>>   4 files changed, 113 insertions(+)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 2d90ba618e5d..2cc7897de7e6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>>
>>   # add usermode queue
>>   amdgpu-y += amdgpu_userqueue.o
>> +amdgpu-y += amdgpu_userqueue_gfx_v11.o
>>
>>   ifneq ($(CONFIG_HSA_AMD),)
>>   AMDKFD_PATH := ../amdkfd
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 353f57c5a772..052c2c1e8aed 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -81,6 +81,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>>           goto free_queue;
>>       }
>>
>> +    r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create/map userqueue MQD\n");
>> +        goto free_queue;
>> +    }
>> +
>>       args->out.queue_id = queue->queue_id;
>>       args->out.flags = 0;
>>       mutex_unlock(&uq_mgr->userq_mutex);
>> @@ -105,6 +111,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>>       }
>>
>>       mutex_lock(&uq_mgr->userq_mutex);
>> +    uq_mgr->userq_funcs[queue->queue_type]->mqd_destroy(uq_mgr, queue);
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       kfree(queue);
>> @@ -135,6 +142,19 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>>       return r;
>>   }
>>
>> +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
>> +
>> +static void
>> +amdgpu_userqueue_setup_ip_funcs(struct amdgpu_userq_mgr *uq_mgr)
>> +{
>> +    int maj;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
>> +
>> +    maj = IP_VERSION_MAJ(version);
>> +    if (maj == 11)
>> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
>> +}
> These can be per device and done in each IP's init code.

Agreed, but as we have validated usermode queues only on the gfx IP
(gen 11) so far, we deliberately enabled this only for that IP.

Once this code gets stable, we can gradually validate and add more
engines and IPs, and then we could also move this whole initialization
code into IP->sw_init() as you suggested.

>
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>>   {
>> @@ -142,6 +162,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>       userq_mgr->adev = adev;
>>
>> +    amdgpu_userqueue_setup_ip_funcs(userq_mgr);
>>       return 0;
>>   }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> new file mode 100644
>> index 000000000000..12e1a785b65a
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -0,0 +1,84 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +#include "amdgpu.h"
>> +#include "amdgpu_userqueue.h"
>> +
>> +static int
>> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>> +    struct amdgpu_mqd *gfx_v11_mqd = &adev->mqds[queue->queue_type];
>> +    int size = gfx_v11_mqd->mqd_size;
>> +    int r;
>> +
>> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_GTT,
>> +                                &mqd->obj,
>> +                                &mqd->gpu_addr,
>> +                                &mqd->cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
>> +        return r;
>> +    }
>> +
>> +    memset(mqd->cpu_ptr, 0, size);
>> +    r = amdgpu_bo_reserve(mqd->obj, false);
>> +    if (unlikely(r != 0)) {
>> +        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
>> +        goto free_mqd;
>> +    }
>> +
>> +    queue->userq_prop.use_doorbell = true;
>> +    queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
>> +    r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
>> +    amdgpu_bo_unreserve(mqd->obj);
>> +    if (r) {
>> +        DRM_ERROR("Failed to init MQD for queue\n");
>> +        goto free_mqd;
>> +    }
>> +
>> +    DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>> +    return 0;
>> +
>> +free_mqd:
>> +    amdgpu_bo_free_kernel(&mqd->obj,
>> +                          &mqd->gpu_addr,
>> +                          &mqd->cpu_ptr);
>> +   return r;
>> +}
>> +
>> +static void
>> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>> +
>> +    amdgpu_bo_free_kernel(&mqd->obj,
>> +                          &mqd->gpu_addr,
>> +                          &mqd->cpu_ptr);
>> +}
>> +
>> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
>> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
>> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
>> +};
> We can just stick these in gfx_v11_0.c.  No need for a new file.

Noted,

- Shashank

>
> Alex
>
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 7625a862b1fc..2911c88d0fed 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -27,6 +27,12 @@
>>   #include "amdgpu.h"
>>   #define AMDGPU_MAX_USERQ 512
>>
>> +struct amdgpu_userq_ctx_space {
>> +       struct amdgpu_bo *obj;
>> +       uint64_t gpu_addr;
>> +       void *cpu_ptr;
>> +};
>> +
>>   struct amdgpu_usermode_queue {
>>          int queue_id;
>>          int queue_type;
>> @@ -35,6 +41,7 @@ struct amdgpu_usermode_queue {
>>          struct amdgpu_vm *vm;
>>          struct amdgpu_userq_mgr *userq_mgr;
>>          struct amdgpu_mqd_prop userq_prop;
>> +       struct amdgpu_userq_ctx_space mqd;
>>   };
>>
>>   struct amdgpu_userq_funcs {
>> --
>> 2.40.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/9] drm/amdgpu: add usermode queue base code
  2023-03-30 21:15   ` Alex Deucher
@ 2023-03-31  8:52     ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-03-31  8:52 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig, amd-gfx


On 30/03/2023 23:15, Alex Deucher wrote:
> On Wed, Mar 29, 2023 at 12:05 PM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch adds skeleton code for amdgpu usermode queue. It contains:
>> - A new file with the init functions of usermode queues.
>> - A queue context manager in driver private data.
>>
>> V1: Worked on design review comments from RFC patch series:
>> (https://patchwork.freedesktop.org/series/112214/)
>> - Alex: Keep a list of queues, instead of single queue per process.
>> - Christian: Use the queue manager instead of global ptrs,
>>             Don't keep the queue structure in amdgpu_ctx
>>
>> V2:
>>   - Reformatted code, split the big patch into two
>>
>> V3:
>> - Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    | 49 +++++++++++++++++++
>>   6 files changed, 106 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 204665f20319..2d90ba618e5d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -210,6 +210,8 @@ amdgpu-y += \
>>   # add amdkfd interfaces
>>   amdgpu-y += amdgpu_amdkfd.o
>>
>> +# add usermode queue
>> +amdgpu-y += amdgpu_userqueue.o
>>
>>   ifneq ($(CONFIG_HSA_AMD),)
>>   AMDKFD_PATH := ../amdkfd
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 6b74df446694..c5f9af0e74ee 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -438,6 +438,14 @@ struct amdgpu_sa_manager {
>>          uint32_t                align;
>>   };
>>
>> +/* Gfx usermode queues */
>> +struct amdgpu_userq_mgr {
>> +       struct idr userq_idr;
>> +       struct mutex userq_mutex;
>> +       struct amdgpu_device *adev;
>> +       const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
> These function pointers can be per device rather than per userq_mgr.
> Just hang them off of adev and then each IP can fill them in during
> their init functions.

We kept these functions in adev in V1, and then moved them to uq_mgr due
to one of the previous code review comments.

Which way should I go now?

> Alex
>
>> +};
>> +
>>   /* sub-allocation buffer */
>>   struct amdgpu_sa_bo {
>>          struct list_head                olist;
>> @@ -470,7 +478,6 @@ struct amdgpu_flip_work {
>>          bool                            async;
>>   };
>>
>> -
>>   /*
>>    * file private structure
>>    */
>> @@ -482,6 +489,7 @@ struct amdgpu_fpriv {
>>          struct mutex            bo_list_lock;
>>          struct idr              bo_list_handles;
>>          struct amdgpu_ctx_mgr   ctx_mgr;
>> +       struct amdgpu_userq_mgr userq_mgr;
>>   };
>>
>>   int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index b4f2d61ea0d5..2d6bcfd727c8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -52,6 +52,7 @@
>>   #include "amdgpu_ras.h"
>>   #include "amdgpu_xgmi.h"
>>   #include "amdgpu_reset.h"
>> +#include "amdgpu_userqueue.h"
>>
>>   /*
>>    * KMS wrapper.
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 7aa7e52ca784..b16b8155a157 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -43,6 +43,7 @@
>>   #include "amdgpu_gem.h"
>>   #include "amdgpu_display.h"
>>   #include "amdgpu_ras.h"
>> +#include "amdgpu_userqueue.h"
>>
>>   void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
>>   {
>> @@ -1187,6 +1188,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>>
>>          amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>>
>> +       r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
>> +       if (r)
>> +               DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
>> +
>>          file_priv->driver_priv = fpriv;
>>          goto out_suspend;
>>
>> @@ -1254,6 +1259,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>
>>          amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>>          amdgpu_vm_fini(adev, &fpriv->vm);
>> +       amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>>
>>          if (pasid)
>>                  amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> new file mode 100644
>> index 000000000000..13e1eebc1cb6
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -0,0 +1,39 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include "amdgpu.h"
>> +
>> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>> +{
>> +    mutex_init(&userq_mgr->userq_mutex);
>> +    idr_init_base(&userq_mgr->userq_idr, 1);
>> +    userq_mgr->adev = adev;
>> +
>> +    return 0;
>> +}
>> +
>> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>> +{
>> +    idr_destroy(&userq_mgr->userq_idr);
>> +    mutex_destroy(&userq_mgr->userq_mutex);
>> +}
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> new file mode 100644
>> index 000000000000..7eeb8c9e6575
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -0,0 +1,49 @@
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef AMDGPU_USERQUEUE_H_
>> +#define AMDGPU_USERQUEUE_H_
>> +
>> +#include "amdgpu.h"
>> +#define AMDGPU_MAX_USERQ 512
>> +
>> +struct amdgpu_usermode_queue {
>> +       int queue_id;
>> +       int queue_type;
>> +       uint64_t flags;
>> +       uint64_t doorbell_handle;
>> +       struct amdgpu_vm *vm;
>> +       struct amdgpu_userq_mgr *userq_mgr;
>> +       struct amdgpu_mqd_prop userq_prop;
>> +};
>> +
>> +struct amdgpu_userq_funcs {
>> +       int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>> +       void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>> +};
>> +
>> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>> +
>> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
>> +
>> +#endif
>> --
>> 2.40.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/9] drm/amdgpu: add usermode queue base code
  2023-03-29 16:04 ` [PATCH v3 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
  2023-03-30 21:15   ` Alex Deucher
@ 2023-04-04 16:05   ` Luben Tuikov
  1 sibling, 0 replies; 56+ messages in thread
From: Luben Tuikov @ 2023-04-04 16:05 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

On 2023-03-29 12:04, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
> 
> This patch adds skeleton code for amdgpu usermode queue. It contains:
> - A new file with init functions for usermode queues.
> - A queue context manager in driver private data.
> 
> V1: Worked on design review comments from RFC patch series:
> (https://patchwork.freedesktop.org/series/112214/)
> - Alex: Keep a list of queues, instead of single queue per process.
> - Christian: Use the queue manager instead of global ptrs,
>            Don't keep the queue structure in amdgpu_ctx
> 
> V2:
>  - Reformatted code, split the big patch into two
> 
> V3:
> - Integration with doorbell manager
> 
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  6 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    | 49 +++++++++++++++++++
>  6 files changed, 106 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 204665f20319..2d90ba618e5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -210,6 +210,8 @@ amdgpu-y += \
>  # add amdkfd interfaces
>  amdgpu-y += amdgpu_amdkfd.o
>  
> +# add usermode queue
> +amdgpu-y += amdgpu_userqueue.o
>  
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 6b74df446694..c5f9af0e74ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -438,6 +438,14 @@ struct amdgpu_sa_manager {
>  	uint32_t		align;
>  };
>  
> +/* Gfx usermode queues */
> +struct amdgpu_userq_mgr {
> +	struct idr userq_idr;
> +	struct mutex userq_mutex;
> +	struct amdgpu_device *adev;
> +	const struct amdgpu_userq_funcs *userq_funcs[AMDGPU_HW_IP_NUM];
> +};
> +

Could you align the member names to the largest member's column,
as opposed to having only a single space in between?

It makes it so much more readable.

>  /* sub-allocation buffer */
>  struct amdgpu_sa_bo {
>  	struct list_head		olist;
> @@ -470,7 +478,6 @@ struct amdgpu_flip_work {
>  	bool				async;
>  };
>  
> -
>  /*
>   * file private structure
>   */
> @@ -482,6 +489,7 @@ struct amdgpu_fpriv {
>  	struct mutex		bo_list_lock;
>  	struct idr		bo_list_handles;
>  	struct amdgpu_ctx_mgr	ctx_mgr;
> +	struct amdgpu_userq_mgr	userq_mgr;
>  };

Like here, and pretty much the rest of the kernel code.

>  
>  int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b4f2d61ea0d5..2d6bcfd727c8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -52,6 +52,7 @@
>  #include "amdgpu_ras.h"
>  #include "amdgpu_xgmi.h"
>  #include "amdgpu_reset.h"
> +#include "amdgpu_userqueue.h"
>  
>  /*
>   * KMS wrapper.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 7aa7e52ca784..b16b8155a157 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -43,6 +43,7 @@
>  #include "amdgpu_gem.h"
>  #include "amdgpu_display.h"
>  #include "amdgpu_ras.h"
> +#include "amdgpu_userqueue.h"
>  
>  void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
>  {
> @@ -1187,6 +1188,10 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>  
>  	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr, adev);
>  
> +	r = amdgpu_userq_mgr_init(&fpriv->userq_mgr, adev);
> +	if (r)
> +		DRM_WARN("Can't setup usermode queues, only legacy workload submission will work\n");
> +
>  	file_priv->driver_priv = fpriv;
>  	goto out_suspend;
>  
> @@ -1254,6 +1259,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  
>  	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>  	amdgpu_vm_fini(adev, &fpriv->vm);
> +	amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>  
>  	if (pasid)
>  		amdgpu_pasid_free_delayed(pd->tbo.base.resv, pasid);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> new file mode 100644
> index 000000000000..13e1eebc1cb6
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu.h"
> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
> +{
> +    mutex_init(&userq_mgr->userq_mutex);
> +    idr_init_base(&userq_mgr->userq_idr, 1);
> +    userq_mgr->adev = adev;
> +
> +    return 0;
> +}
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
> +{
> +    idr_destroy(&userq_mgr->userq_idr);
> +    mutex_destroy(&userq_mgr->userq_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> new file mode 100644
> index 000000000000..7eeb8c9e6575
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -0,0 +1,49 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef AMDGPU_USERQUEUE_H_
> +#define AMDGPU_USERQUEUE_H_
> +
> +#include "amdgpu.h"
> +#define AMDGPU_MAX_USERQ 512
> +
> +struct amdgpu_usermode_queue {
> +	int queue_id;
> +	int queue_type;
> +	uint64_t flags;
> +	uint64_t doorbell_handle;
> +	struct amdgpu_vm *vm;
> +	struct amdgpu_userq_mgr *userq_mgr;
> +	struct amdgpu_mqd_prop userq_prop;
> +};

Could you align the member names to the largest member's column,
as opposed to having only a single space in between?

It makes it so much more readable.

> +
> +struct amdgpu_userq_funcs {
> +	int (*mqd_create)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +	void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
> +};

Here too, align to the largest column.

Regards,
Luben

> +
> +int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
> +
> +void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> +
> +#endif


* Re: [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue
  2023-03-29 16:04 ` [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue Shashank Sharma
  2023-03-30 21:18   ` Alex Deucher
@ 2023-04-04 16:21   ` Luben Tuikov
  1 sibling, 0 replies; 56+ messages in thread
From: Luben Tuikov @ 2023-04-04 16:21 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig,
	Arvind Yadav

On 2023-03-29 12:04, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
> 
> A Memory queue descriptor (MQD) of a userqueue defines it in the hardware's
> context. As MQD format can vary between different graphics IPs, we need gfx
> GEN specific handlers to create MQDs.
> 
> This patch:
> - Introduces MQD handler functions for the usermode queues.
> - Adds new functions to create and destroy MQD for GFX-GEN-11-IP
> 
> V1: Worked on review comments from Alex:
>     - Make MQD functions GEN and IP specific
> 
> V2: Worked on review comments from Alex:
>     - Reuse the existing adev->mqd[ip] for MQD creation
>     - Formatting and arrangement of code
> 
> V3:
>     - Integration with doorbell manager
> 
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> 
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---

Don't break up the Cc tag list and the Sob tag list.

Check out the following resources:
https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs
https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format


>  drivers/gpu/drm/amd/amdgpu/Makefile           |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 21 +++++
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 84 +++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 ++
>  4 files changed, 113 insertions(+)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 2d90ba618e5d..2cc7897de7e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -212,6 +212,7 @@ amdgpu-y += amdgpu_amdkfd.o
>  
>  # add usermode queue
>  amdgpu-y += amdgpu_userqueue.o
> +amdgpu-y += amdgpu_userqueue_gfx_v11.o
>  
>  ifneq ($(CONFIG_HSA_AMD),)
>  AMDKFD_PATH := ../amdkfd
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 353f57c5a772..052c2c1e8aed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -81,6 +81,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>          goto free_queue;
>      }
>  
> +    r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create/map userqueue MQD\n");
> +        goto free_queue;
> +    }
> +
>      args->out.queue_id = queue->queue_id;
>      args->out.flags = 0;
>      mutex_unlock(&uq_mgr->userq_mutex);
> @@ -105,6 +111,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>      }
>  
>      mutex_lock(&uq_mgr->userq_mutex);
> +    uq_mgr->userq_funcs[queue->queue_type]->mqd_destroy(uq_mgr, queue);
>      amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>      mutex_unlock(&uq_mgr->userq_mutex);
>      kfree(queue);
> @@ -135,6 +142,19 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>      return r;
>  }
>  
> +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs;
> +
> +static void
> +amdgpu_userqueue_setup_ip_funcs(struct amdgpu_userq_mgr *uq_mgr)
> +{
> +    int maj;
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    uint32_t version = adev->ip_versions[GC_HWIP][0];
> +
> +    maj = IP_VERSION_MAJ(version);
> +    if (maj == 11)
> +        uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs;
> +}
>  
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>  {
> @@ -142,6 +162,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>      idr_init_base(&userq_mgr->userq_idr, 1);
>      userq_mgr->adev = adev;
>  
> +    amdgpu_userqueue_setup_ip_funcs(userq_mgr);
>      return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> new file mode 100644
> index 000000000000..12e1a785b65a
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -0,0 +1,84 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include "amdgpu.h"
> +#include "amdgpu_userqueue.h"
> +
> +static int
> +amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
> +    struct amdgpu_mqd *gfx_v11_mqd = &adev->mqds[queue->queue_type];
> +    int size = gfx_v11_mqd->mqd_size;
> +    int r;
> +
> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_GTT,
> +                                &mqd->obj,
> +                                &mqd->gpu_addr,
> +                                &mqd->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate bo for userqueue (%d)", r);
> +        return r;
> +    }
> +
> +    memset(mqd->cpu_ptr, 0, size);
> +    r = amdgpu_bo_reserve(mqd->obj, false);
> +    if (unlikely(r != 0)) {
> +        DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
> +        goto free_mqd;
> +    }
> +
> +    queue->userq_prop.use_doorbell = true;
> +    queue->userq_prop.mqd_gpu_addr = mqd->gpu_addr;
> +    r = gfx_v11_mqd->init_mqd(adev, (void *)mqd->cpu_ptr, &queue->userq_prop);
> +    amdgpu_bo_unreserve(mqd->obj);
> +    if (r) {
> +        DRM_ERROR("Failed to init MQD for queue\n");
> +        goto free_mqd;
> +    }
> +
> +    DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
> +    return 0;
> +
> +free_mqd:
> +    amdgpu_bo_free_kernel(&mqd->obj,
> +			   &mqd->gpu_addr,
> +			   &mqd->cpu_ptr);
> +   return r;
> +}
> +
> +static void
> +amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
> +
> +    amdgpu_bo_free_kernel(&mqd->obj,
> +			   &mqd->gpu_addr,
> +			   &mqd->cpu_ptr);
> +}
> +
> +const struct amdgpu_userq_funcs userq_gfx_v11_funcs = {
> +    .mqd_create = amdgpu_userq_gfx_v11_mqd_create,
> +    .mqd_destroy = amdgpu_userq_gfx_v11_mqd_destroy,
> +};
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 7625a862b1fc..2911c88d0fed 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -27,6 +27,12 @@
>  #include "amdgpu.h"
>  #define AMDGPU_MAX_USERQ 512
>  
> +struct amdgpu_userq_ctx_space {
> +	struct amdgpu_bo *obj;
> +	uint64_t gpu_addr;
> +	void *cpu_ptr;
> +};
> +

Code is very readable when the name columns are aligned:

struct amdgpu_userq_ctx_space {
	struct amdgpu_bo   *obj;
	uint64_t            gpu_addr;
	void               *cpu_ptr;
};

And for the rest of your patches.

Regards,
Luben

>  struct amdgpu_usermode_queue {
>  	int queue_id;
>  	int queue_type;
> @@ -35,6 +41,7 @@ struct amdgpu_usermode_queue {
>  	struct amdgpu_vm *vm;
>  	struct amdgpu_userq_mgr *userq_mgr;
>  	struct amdgpu_mqd_prop userq_prop;
> +	struct amdgpu_userq_ctx_space mqd;
>  };
>  
>  struct amdgpu_userq_funcs {


* Re: [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue
  2023-03-29 16:04 ` [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue Shashank Sharma
  2023-03-30 21:23   ` Alex Deucher
@ 2023-04-04 16:24   ` Luben Tuikov
  2023-04-04 16:37     ` Shashank Sharma
  1 sibling, 1 reply; 56+ messages in thread
From: Luben Tuikov @ 2023-04-04 16:24 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

On 2023-03-29 12:04, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
> 
> The FW expects us to allocate at least one page of context space to
> process gang, process, shadow, GDS and FW-related work. This patch
> creates a single joint object for all of them, and calculates GPU space
> offsets for each of these spaces.
> 
> V1: Addressed review comments on RFC patch:
>     Alex: Make this function IP specific
> 
> V2: Addressed review comments from Christian
>     - Allocate only one object for total FW space, and calculate
>       offsets for each of these objects.
> 
> V3: Integration with doorbell manager
> 
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  1 +
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 60 ++++++++++++++++++-
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 +++
>  3 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 052c2c1e8aed..5672efcbcffc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -71,6 +71,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>      queue->userq_prop.queue_size = mqd_in->queue_size;
>  
>      queue->doorbell_handle = mqd_in->doorbell_handle;
> +    queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;
>      queue->queue_type = mqd_in->ip_type;
>      queue->flags = mqd_in->flags;
>      queue->vm = &fpriv->vm;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> index 12e1a785b65a..52de96727f98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -23,6 +23,51 @@
>  #include "amdgpu.h"
>  #include "amdgpu_userqueue.h"
>  
> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
> +

Leaving only a single space between the macro name and its value
makes it hard to read. Please align the value columns
and leave at least three spaces; this would make it readable.
For instance,

#define AMDGPU_USERQ_PROC_CTX_SZ   PAGE_SIZE
#define AMDGPU_USERQ_GANG_CTX_SZ   PAGE_SIZE
#define AMDGPU_USERQ_FW_CTX_SZ     PAGE_SIZE
#define AMDGPU_USERQ_GDS_CTX_SZ    PAGE_SIZE

Regards,
Luben

> +static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                                 struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
> +    int r, size;
> +
> +    /*
> +     * The FW expects at least one page of space allocated for
> +     * process ctx, gang ctx, gds ctx, fw ctx and shadow ctx each.
> +     */
> +    size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ +
> +           AMDGPU_USERQ_FW_CTX_SZ + AMDGPU_USERQ_GDS_CTX_SZ;
> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
> +                                AMDGPU_GEM_DOMAIN_GTT,
> +                                &ctx->obj,
> +                                &ctx->gpu_addr,
> +                                &ctx->cpu_ptr);
> +    if (r) {
> +        DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
> +        return r;
> +    }
> +
> +    queue->proc_ctx_gpu_addr = ctx->gpu_addr;
> +    queue->gang_ctx_gpu_addr = queue->proc_ctx_gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
> +    queue->fw_ctx_gpu_addr = queue->gang_ctx_gpu_addr + AMDGPU_USERQ_GANG_CTX_SZ;
> +    queue->gds_ctx_gpu_addr = queue->fw_ctx_gpu_addr + AMDGPU_USERQ_FW_CTX_SZ;
> +    return 0;
> +}
> +
> +static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
> +                                                   struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
> +
> +    amdgpu_bo_free_kernel(&ctx->obj,
> +                          &ctx->gpu_addr,
> +                          &ctx->cpu_ptr);
> +}
> +
>  static int
>  amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>  {
> @@ -43,10 +88,17 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>      }
>  
>      memset(mqd->cpu_ptr, 0, size);
> +
> +    r = amdgpu_userq_gfx_v11_create_ctx_space(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to create CTX space for userqueue (%d)\n", r);
> +        goto free_mqd;
> +    }
> +
>      r = amdgpu_bo_reserve(mqd->obj, false);
>      if (unlikely(r != 0)) {
>          DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
> -        goto free_mqd;
> +        goto free_ctx;
>      }
>  
>      queue->userq_prop.use_doorbell = true;
> @@ -55,12 +107,15 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>      amdgpu_bo_unreserve(mqd->obj);
>      if (r) {
>          DRM_ERROR("Failed to init MQD for queue\n");
> -        goto free_mqd;
> +        goto free_ctx;
>      }
>  
>      DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>      return 0;
>  
> +free_ctx:
> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
> +
>  free_mqd:
>      amdgpu_bo_free_kernel(&mqd->obj,
>  			   &mqd->gpu_addr,
> @@ -73,6 +128,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>  {
>      struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>  
> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>      amdgpu_bo_free_kernel(&mqd->obj,
>  			   &mqd->gpu_addr,
>  			   &mqd->cpu_ptr);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 2911c88d0fed..8b62ef77cd26 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -38,10 +38,17 @@ struct amdgpu_usermode_queue {
>  	int queue_type;
>  	uint64_t flags;
>  	uint64_t doorbell_handle;
> +	uint64_t proc_ctx_gpu_addr;
> +	uint64_t gang_ctx_gpu_addr;
> +	uint64_t gds_ctx_gpu_addr;
> +	uint64_t fw_ctx_gpu_addr;
> +	uint64_t shadow_ctx_gpu_addr;
> +
>  	struct amdgpu_vm *vm;
>  	struct amdgpu_userq_mgr *userq_mgr;
>  	struct amdgpu_mqd_prop userq_prop;
>  	struct amdgpu_userq_ctx_space mqd;
> +	struct amdgpu_userq_ctx_space fw_space;
>  };
>  
>  struct amdgpu_userq_funcs {


* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-03-29 16:04 ` [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
@ 2023-04-04 16:30   ` Luben Tuikov
  2023-04-04 16:36     ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Luben Tuikov @ 2023-04-04 16:30 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

On 2023-03-29 12:04, Shashank Sharma wrote:
> From: Shashank Sharma <contactshashanksharma@gmail.com>
> 
> This patch adds new functions to map/unmap a usermode queue into
> the FW, using the MES ring. As soon as this mapping is done, the
> queue would be considered ready to accept the workload.
> 
> V1: Addressed review comments from Alex on the RFC patch series
>     - Map/Unmap should be IP specific.
> V2:
>     Addressed review comments from Christian:
>     - Fix the wptr_mc_addr calculation (moved into another patch)
>     Addressed review comments from Alex:
>     - Do not add fptrs for map/unmap
> 
> V3: Integration with doorbell manager
> 
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---

Just add all your Cc right here, and let git-send-email figure it out.
Between the Cc tags and the SMTP CC list, Felix is the only one missing.

Regards,
Luben

>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> index 39e90ea32fcb..1627641a4a4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -23,12 +23,73 @@
>  #include "amdgpu.h"
>  #include "amdgpu_userqueue.h"
>  #include "v11_structs.h"
> +#include "amdgpu_mes.h"
>  
>  #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>  #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>  #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>  #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>  
> +static int
> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
> +                         struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct mes_add_queue_input queue_input;
> +    int r;
> +
> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
> +
> +    queue_input.process_va_start = 0;
> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
> +    queue_input.process_quantum = 100000; /* 10ms */
> +    queue_input.gang_quantum = 10000; /* 1ms */
> +    queue_input.paging = false;
> +
> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
> +
> +    queue_input.process_id = queue->vm->pasid;
> +    queue_input.queue_type = queue->queue_type;
> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +
> +    amdgpu_mes_lock(&adev->mes);
> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> +    amdgpu_mes_unlock(&adev->mes);
> +    if (r) {
> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
> +        return r;
> +    }
> +
> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
> +    return 0;
> +}
> +
> +static void
> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
> +                           struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_device *adev = uq_mgr->adev;
> +    struct mes_remove_queue_input queue_input;
> +    int r;
> +
> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
> +
> +    amdgpu_mes_lock(&adev->mes);
> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
> +    amdgpu_mes_unlock(&adev->mes);
> +    if (r)
> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
> +}
> +
>  static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>                                                   struct amdgpu_usermode_queue *queue)
>  {
> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>  
>      amdgpu_userq_set_ctx_space(uq_mgr, queue);
>      amdgpu_bo_unreserve(mqd->obj);
> +
> +    /* Map the queue in HW using MES ring */
> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
> +        goto free_ctx;
> +    }
> +
>      DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>      return 0;
>  
> @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>  {
>      struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>  
> +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>      amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>      amdgpu_bo_free_kernel(&mqd->obj,
>  			   &mqd->gpu_addr,


* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-04 16:30   ` Luben Tuikov
@ 2023-04-04 16:36     ` Shashank Sharma
  2023-04-04 20:58       ` Luben Tuikov
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-04-04 16:36 UTC (permalink / raw)
  To: Luben Tuikov, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig


On 04/04/2023 18:30, Luben Tuikov wrote:
> On 2023-03-29 12:04, Shashank Sharma wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch adds new functions to map/unmap a usermode queue into
>> the FW, using the MES ring. As soon as this mapping is done, the
>> queue would  be considered ready to accept the workload.
>>
>> V1: Addressed review comments from Alex on the RFC patch series
>>      - Map/Unmap should be IP specific.
>> V2:
>>      Addressed review comments from Christian:
>>      - Fix the wptr_mc_addr calculation (moved into another patch)
>>      Addressed review comments from Alex:
>>      - Do not add fptrs for map/unmap
>>
>> V3: Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
> Just add all your Cc right here, and let git-send-email figure it out.
> Between the Cc tags and the SMTP CC list, Felix is the only one missing.

No, that's not how it is.

You keep people Cc'ed in the cover letter so that they stay informed
every time this patch is pushed or used on any open-source branch.

People added manually to Cc are the ones required for this code-review
session.

- Shashank

> Regards,
> Luben
>
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>>   1 file changed, 70 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> index 39e90ea32fcb..1627641a4a4e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -23,12 +23,73 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_userqueue.h"
>>   #include "v11_structs.h"
>> +#include "amdgpu_mes.h"
>>   
>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>   #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>   
>> +static int
>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>> +                         struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct mes_add_queue_input queue_input;
>> +    int r;
>> +
>> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
>> +
>> +    queue_input.process_va_start = 0;
>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
>> +    queue_input.process_quantum = 100000; /* 10ms */
>> +    queue_input.gang_quantum = 10000; /* 1ms */
>> +    queue_input.paging = false;
>> +
>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>> +
>> +    queue_input.process_id = queue->vm->pasid;
>> +    queue_input.queue_type = queue->queue_type;
>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +
>> +    amdgpu_mes_lock(&adev->mes);
>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>> +    amdgpu_mes_unlock(&adev->mes);
>> +    if (r) {
>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>> +        return r;
>> +    }
>> +
>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
>> +    return 0;
>> +}
>> +
>> +static void
>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>> +                           struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct mes_remove_queue_input queue_input;
>> +    int r;
>> +
>> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>> +
>> +    amdgpu_mes_lock(&adev->mes);
>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
>> +    amdgpu_mes_unlock(&adev->mes);
>> +    if (r)
>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>> +}
>> +
>>   static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>                                                    struct amdgpu_usermode_queue *queue)
>>   {
>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>   
>>       amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>       amdgpu_bo_unreserve(mqd->obj);
>> +
>> +    /* Map the queue in HW using MES ring */
>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>> +        goto free_ctx;
>> +    }
>> +
>>       DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>       return 0;
>>   
>> @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>   {
>>       struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>   
>> +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>       amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>       amdgpu_bo_free_kernel(&mqd->obj,
>>   			   &mqd->gpu_addr,

^ permalink raw reply	[flat|nested] 56+ messages in thread
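For context, the userspace half of the flow described in the cover letter
(fill the ring, then write the new dword count to the doorbell) can be
sketched as follows. The function and buffer names are hypothetical; a
real client would go through the libdrm wrappers from the linked MESA
merge request:

```python
def submit(ring, wptr, doorbell, packets):
    """Hypothetical sketch of a userspace submission: copy dword
    packets into the ring buffer, advance the write pointer, then
    ring the doorbell so the GPU starts fetching."""
    for dword in packets:
        ring[wptr % len(ring)] = dword  # the ring buffer wraps around
        wptr += 1
    # The doorbell write publishes the new dword count to the HW.
    doorbell[0] = wptr
    return wptr
```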

* Re: [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue
  2023-04-04 16:24   ` Luben Tuikov
@ 2023-04-04 16:37     ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-04-04 16:37 UTC (permalink / raw)
  To: Luben Tuikov, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig


On 04/04/2023 18:24, Luben Tuikov wrote:
> On 2023-03-29 12:04, Shashank Sharma wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> The FW expects us to allocate at least one page as context space to
>> process gang, process, shadow, GDS and FW  related work. This patch
>> creates a joint object for the same, and calculates GPU space offsets
>> for each of these spaces.
>>
>> V1: Addressed review comments on RFC patch:
>>      Alex: Make this function IP specific
>>
>> V2: Addressed review comments from Christian
>>      - Allocate only one object for total FW space, and calculate
>>        offsets for each of these objects.
>>
>> V3: Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  1 +
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 60 ++++++++++++++++++-
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  7 +++
>>   3 files changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 052c2c1e8aed..5672efcbcffc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -71,6 +71,7 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>>       queue->userq_prop.queue_size = mqd_in->queue_size;
>>   
>>       queue->doorbell_handle = mqd_in->doorbell_handle;
>> +    queue->shadow_ctx_gpu_addr = mqd_in->shadow_va;
>>       queue->queue_type = mqd_in->ip_type;
>>       queue->flags = mqd_in->flags;
>>       queue->vm = &fpriv->vm;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> index 12e1a785b65a..52de96727f98 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -23,6 +23,51 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_userqueue.h"
>>   
>> +#define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>> +#define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>> +
> Leaving a single space after the macro name and its value
> makes it hard to read. Please align the value columns
> and leave at least 3 spaces--this would make it readable.
> For instance,
>
> #define AMDGPU_USERQ_PROC_CTX_SZ   PAGE_SIZE
> #define AMDGPU_USERQ_GANG_CTX_SZ   PAGE_SIZE
> #define AMDGPU_USERQ_FW_CTX_SZ     PAGE_SIZE
> #define AMDGPU_USERQ_GDS_CTX_SZ    PAGE_SIZE


Noted,

Shashank

>
> Regards,
> Luben
>
>> +static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                                 struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
>> +    int r, size;
>> +
>> +    /*
>> +     * The FW expects at least one page of space allocated for
>> +     * process ctx, gang ctx, gds ctx, fw ctx and shadow ctx each.
>> +     */
>> +    size = AMDGPU_USERQ_PROC_CTX_SZ + AMDGPU_USERQ_GANG_CTX_SZ +
>> +           AMDGPU_USERQ_FW_CTX_SZ + AMDGPU_USERQ_GDS_CTX_SZ;
>> +    r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
>> +                                AMDGPU_GEM_DOMAIN_GTT,
>> +                                &ctx->obj,
>> +                                &ctx->gpu_addr,
>> +                                &ctx->cpu_ptr);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate ctx space bo for userqueue, err:%d\n", r);
>> +        return r;
>> +    }
>> +
>> +    queue->proc_ctx_gpu_addr = ctx->gpu_addr;
>> +    queue->gang_ctx_gpu_addr = queue->proc_ctx_gpu_addr + AMDGPU_USERQ_PROC_CTX_SZ;
>> +    queue->fw_ctx_gpu_addr = queue->gang_ctx_gpu_addr + AMDGPU_USERQ_GANG_CTX_SZ;
>> +    queue->gds_ctx_gpu_addr = queue->fw_ctx_gpu_addr + AMDGPU_USERQ_FW_CTX_SZ;
>> +    return 0;
>> +}
>> +
>> +static void amdgpu_userq_gfx_v11_destroy_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>> +                                                   struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_userq_ctx_space *ctx = &queue->fw_space;
>> +
>> +    amdgpu_bo_free_kernel(&ctx->obj,
>> +                          &ctx->gpu_addr,
>> +                          &ctx->cpu_ptr);
>> +}
>> +
>>   static int
>>   amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>>   {
>> @@ -43,10 +88,17 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>       }
>>   
>>       memset(mqd->cpu_ptr, 0, size);
>> +
>> +    r = amdgpu_userq_gfx_v11_create_ctx_space(uq_mgr, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to create CTX space for userqueue (%d)\n", r);
>> +        goto free_mqd;
>> +    }
>> +
>>       r = amdgpu_bo_reserve(mqd->obj, false);
>>       if (unlikely(r != 0)) {
>>           DRM_ERROR("Failed to reserve mqd for userqueue (%d)", r);
>> -        goto free_mqd;
>> +        goto free_ctx;
>>       }
>>   
>>       queue->userq_prop.use_doorbell = true;
>> @@ -55,12 +107,15 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>       amdgpu_bo_unreserve(mqd->obj);
>>       if (r) {
>>           DRM_ERROR("Failed to init MQD for queue\n");
>> -        goto free_mqd;
>> +        goto free_ctx;
>>       }
>>   
>>       DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>       return 0;
>>   
>> +free_ctx:
>> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>> +
>>   free_mqd:
>>       amdgpu_bo_free_kernel(&mqd->obj,
>>   			   &mqd->gpu_addr,
>> @@ -73,6 +128,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>   {
>>       struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>   
>> +    amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>       amdgpu_bo_free_kernel(&mqd->obj,
>>   			   &mqd->gpu_addr,
>>   			   &mqd->cpu_ptr);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 2911c88d0fed..8b62ef77cd26 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -38,10 +38,17 @@ struct amdgpu_usermode_queue {
>>   	int queue_type;
>>   	uint64_t flags;
>>   	uint64_t doorbell_handle;
>> +	uint64_t proc_ctx_gpu_addr;
>> +	uint64_t gang_ctx_gpu_addr;
>> +	uint64_t gds_ctx_gpu_addr;
>> +	uint64_t fw_ctx_gpu_addr;
>> +	uint64_t shadow_ctx_gpu_addr;
>> +
>>   	struct amdgpu_vm *vm;
>>   	struct amdgpu_userq_mgr *userq_mgr;
>>   	struct amdgpu_mqd_prop userq_prop;
>>   	struct amdgpu_userq_ctx_space mqd;
>> +	struct amdgpu_userq_ctx_space fw_space;
>>   };
>>   
>>   struct amdgpu_userq_funcs {

^ permalink raw reply	[flat|nested] 56+ messages in thread
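The offset math in amdgpu_userq_gfx_v11_create_ctx_space() quoted above
packs four one-page contexts into a single BO. A small Python model of
that layout (assuming 4 KiB pages; not driver code):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# One page per context, matching the AMDGPU_USERQ_*_CTX_SZ macros.
PROC_CTX_SZ = GANG_CTX_SZ = FW_CTX_SZ = GDS_CTX_SZ = PAGE_SIZE

def ctx_space_layout(base_gpu_addr):
    """Model of how the four contexts sit back to back in the single
    BO allocated by create_ctx_space()."""
    proc = base_gpu_addr
    gang = proc + PROC_CTX_SZ
    fw = gang + GANG_CTX_SZ
    gds = fw + FW_CTX_SZ
    total = PROC_CTX_SZ + GANG_CTX_SZ + FW_CTX_SZ + GDS_CTX_SZ
    return proc, gang, fw, gds, total
```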

* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-04 16:36     ` Shashank Sharma
@ 2023-04-04 20:58       ` Luben Tuikov
  2023-04-05 10:06         ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Luben Tuikov @ 2023-04-04 20:58 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

On 2023-04-04 12:36, Shashank Sharma wrote:
> 
> On 04/04/2023 18:30, Luben Tuikov wrote:
>> On 2023-03-29 12:04, Shashank Sharma wrote:
>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>
>>> This patch adds new functions to map/unmap a usermode queue into
>>> the FW, using the MES ring. As soon as this mapping is done, the
>>> queue would  be considered ready to accept the workload.
>>>
>>> V1: Addressed review comments from Alex on the RFC patch series
>>>      - Map/Unmap should be IP specific.
>>> V2:
>>>      Addressed review comments from Christian:
>>>      - Fix the wptr_mc_addr calculation (moved into another patch)
>>>      Addressed review comments from Alex:
>>>      - Do not add fptrs for map/unmap
>>>
>>> V3: Integration with doorbell manager
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> ---
>> Just add all your Cc right here, and let git-send-email figure it out.
>> Between the Cc tags and the SMTP CC list, Felix is the only one missing.
> 
> No, that's not how it is.
> 
> You keep people cc'ed in the cover letter so that they get informed 
> every time this patch is pushed/used on any opensource branch.

The cover letter is optional, and you can add Cc tags
into the cover letter and then git-send-email would extract those
(any and all) tags from the cover letter too.

When you pick and choose whom to add in the Cc tags, and whom to
add to the SMTP CC list, it creates division.

> People who are added manually in cc are required for this code review 
> session.

No such rule exists. It is best to add all the Cc into the commit message,
so that it is preserved in Git history.

For instance, I just randomly did "git log drivers/gpu/drm/*.[hc]" in
amd-staging-drm-next, and this is the first commit which came up,

commit 097ee58f3ddf7d
Author: Harry Wentland <harry.wentland@amd.com>
Date:   Fri Jan 13 11:24:09 2023 -0500

    drm/connector: print max_requested_bpc in state debugfs
    
    This is useful to understand the bpc defaults and
    support of a driver.
    
    Signed-off-by: Harry Wentland <harry.wentland@amd.com>
    Cc: Pekka Paalanen <ppaalanen@gmail.com>
    Cc: Sebastian Wick <sebastian.wick@redhat.com>
    Cc: Vitaly.Prosyak@amd.com
    Cc: Uma Shankar <uma.shankar@intel.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Joshua Ashton <joshua@froggi.es>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: dri-devel@lists.freedesktop.org
    Cc: amd-gfx@lists.freedesktop.org
    Reviewed-By: Joshua Ashton <joshua@froggi.es>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230113162428.33874-3-harry.wentland@amd.com
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

As you can see the whole Cc list and the MLs are part of the Cc tags.
And the rest of the commits are also good examples of how to do it.
(Don't worry about the Link tag--it is automatically added by tools
maintainers use, although some use Lore.)
This preserves things in Git history, and it's a good thing if we need
to data mine and brainstorm later on on patches, design, and so on.

A good tool to use is "scripts/get_maintainer.pl" which works
on a file, directory and even patch files.

I usually include everyone get_maintainer.pl prints, and on subsequent patch
revisions, also people who have previously commented on the patchset, as they
might be interested to follow up. It's a good thing to do.

Here are a couple of resources, in addition to DRM commits in the tree,
https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs
https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format
(The second link includes links to more resources on good patch writing.)

Hope this helps.

Regards,
Luben


> 
> - Shashank
> 
>> Regards,
>> Luben
>>
>>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>>>   1 file changed, 70 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>> index 39e90ea32fcb..1627641a4a4e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>> @@ -23,12 +23,73 @@
>>>   #include "amdgpu.h"
>>>   #include "amdgpu_userqueue.h"
>>>   #include "v11_structs.h"
>>> +#include "amdgpu_mes.h"
>>>   
>>>   #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>   #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>   #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>   #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>   
>>> +static int
>>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>> +                         struct amdgpu_usermode_queue *queue)
>>> +{
>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>> +    struct mes_add_queue_input queue_input;
>>> +    int r;
>>> +
>>> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
>>> +
>>> +    queue_input.process_va_start = 0;
>>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
>>> +    queue_input.process_quantum = 100000; /* 10ms */
>>> +    queue_input.gang_quantum = 10000; /* 1ms */
>>> +    queue_input.paging = false;
>>> +
>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>>> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>> +
>>> +    queue_input.process_id = queue->vm->pasid;
>>> +    queue_input.queue_type = queue->queue_type;
>>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>> +
>>> +    amdgpu_mes_lock(&adev->mes);
>>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>> +    amdgpu_mes_unlock(&adev->mes);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>>> +        return r;
>>> +    }
>>> +
>>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
>>> +    return 0;
>>> +}
>>> +
>>> +static void
>>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>> +                           struct amdgpu_usermode_queue *queue)
>>> +{
>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>> +    struct mes_remove_queue_input queue_input;
>>> +    int r;
>>> +
>>> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>> +
>>> +    amdgpu_mes_lock(&adev->mes);
>>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
>>> +    amdgpu_mes_unlock(&adev->mes);
>>> +    if (r)
>>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>>> +}
>>> +
>>>   static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>                                                    struct amdgpu_usermode_queue *queue)
>>>   {
>>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>>   
>>>       amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>>       amdgpu_bo_unreserve(mqd->obj);
>>> +
>>> +    /* Map the queue in HW using MES ring */
>>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>>> +        goto free_ctx;
>>> +    }
>>> +
>>>       DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>>       return 0;
>>>   
>>> @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>   {
>>>       struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>>   
>>> +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>>       amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>>       amdgpu_bo_free_kernel(&mqd->obj,
>>>   			   &mqd->gpu_addr,


^ permalink raw reply	[flat|nested] 56+ messages in thread
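The git-send-email behaviour discussed above (extracting Cc: trailers
from the commit message and adding them to the SMTP recipient list) can
be approximated with a short sketch; the real tool handles quoting,
de-duplication and the --suppress-cc options as well:

```python
import re

def extract_cc_trailers(commit_message):
    """Simplified sketch of how git send-email collects 'Cc:' trailer
    lines from a commit message (the real implementation does much
    more: name quoting, de-duplication, --suppress-cc, etc.)."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*Cc:\s*(.+)$", commit_message,
                                 re.MULTILINE)]
```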

* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-04 20:58       ` Luben Tuikov
@ 2023-04-05 10:06         ` Shashank Sharma
  2023-04-05 21:18           ` Luben Tuikov
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-04-05 10:06 UTC (permalink / raw)
  To: Luben Tuikov, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig


On 04/04/2023 22:58, Luben Tuikov wrote:
> On 2023-04-04 12:36, Shashank Sharma wrote:
>> On 04/04/2023 18:30, Luben Tuikov wrote:
>>> On 2023-03-29 12:04, Shashank Sharma wrote:
>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>>
>>>> This patch adds new functions to map/unmap a usermode queue into
>>>> the FW, using the MES ring. As soon as this mapping is done, the
>>>> queue would  be considered ready to accept the workload.
>>>>
>>>> V1: Addressed review comments from Alex on the RFC patch series
>>>>       - Map/Unmap should be IP specific.
>>>> V2:
>>>>       Addressed review comments from Christian:
>>>>       - Fix the wptr_mc_addr calculation (moved into another patch)
>>>>       Addressed review comments from Alex:
>>>>       - Do not add fptrs for map/unmap
>>>>
>>>> V3: Integration with doorbell manager
>>>>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>> Just add all your Cc right here, and let git-send-email figure it out.
>>> Between the Cc tags and the SMTP CC list, Felix is the only one missing.
>> No, that's not how it is.
>>
>> You keep people cc'ed in the cover letter so that they get informed
>> every time this patch is pushed/used on any opensource branch.
> The cover letter is optional, and you can add Cc tags
> into the cover letter and then git-send-email would extract those
> (any and all) tags from the cover letter too.
>
> When you pick and choose whom to add in the Cc tags, and whom to
> add to the SMTP CC list, it creates division.


Exactly my point: there is no guideline on whom to add to Cc in the
cover letter and whom to add manually; it's all preference.

Different people have different preferences, and a review comment about
your preference for what to keep in the cover letter does seem like a
nitpick.

>
>> People who are added manually in cc are required for this code review
>> session.
> No such rule exists. It is best to add all the Cc into the commit message,
> so that it is preserved in Git history.
I believe this is also not a rule; we are discussing preferences only.
It is my preference to keep only Alex and Christian in Cc.
>
> For instance, I just randomly did "git log drivers/gpu/drm/*.[hc]" in
> amd-staging-drm-next, and this is the first commit which came up,
>
> commit 097ee58f3ddf7d
> Author: Harry Wentland <harry.wentland@amd.com>
> Date:   Fri Jan 13 11:24:09 2023 -0500
>
>      drm/connector: print max_requested_bpc in state debugfs
>      
>      This is useful to understand the bpc defaults and
>      support of a driver.
>      
>      Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>      Cc: Pekka Paalanen <ppaalanen@gmail.com>
>      Cc: Sebastian Wick <sebastian.wick@redhat.com>
>      Cc: Vitaly.Prosyak@amd.com
>      Cc: Uma Shankar <uma.shankar@intel.com>
>      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>      Cc: Joshua Ashton <joshua@froggi.es>
>      Cc: Jani Nikula <jani.nikula@linux.intel.com>
>      Cc: dri-devel@lists.freedesktop.org
>      Cc: amd-gfx@lists.freedesktop.org
>      Reviewed-By: Joshua Ashton <joshua@froggi.es>
>      Link: https://patchwork.freedesktop.org/patch/msgid/20230113162428.33874-3-harry.wentland@amd.com
>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> As you can see the whole Cc list and the MLs are part of the Cc tags.
> And the rest of the commits are also good examples of how to do it.
> (Don't worry about the Link tag--it is automatically added by tools
> maintainers use, although some use Lore.)
> This preserves things in Git history, and it's a good thing if we need
> to data mine and brainstorm later on on patches, design, and so on.

No, this is not random; it is actually well planned. All of the people
in Cc here are either maintainers or the respective domain experts and
contributors of the color-management feature, and keeping them in Cc is
about how that feature is carried forward, so this is exactly aligned
with my point. Do note that this is a DRM-level change (not an
AMDGPU-level one).


Also, in my opinion we are spending way too much time discussing
'preferences' around cover letters, words, comments and variable names.
No cover letter is perfect, no commit message fully explains what is
happening in the code, no variable name is flawless, and no comment
makes the code clear to everyone. These things are subjective to the
author and the reader; the only deciding factor is whether a community
rule or guideline exists on the matter.

I appreciate your time and suggestions, but I certainly do not want to
spend this much time discussing how we should add people to Cc.

Let's keep preferences separate from the code-review process.

- Shashank

>
> A good tool to use is "scripts/get_maintainer.pl" which works
> on a file, directory and even patch files.
>
> I usually include everyone get_maintainer.pl prints, and on subsequent patch
> revisions, also people who have previously commented on the patchset, as they
> might be interested to follow up. It's a good thing to do.
>
> Here are a couple of resources, in addition to DRM commits in the tree,
> https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs
> https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format
> (The second link includes links to more resources on good patch writing.)
>
> Hope this helps.
>
> Regards,
> Luben
>
>
>> - Shashank
>>
>>> Regards,
>>> Luben
>>>
>>>>    .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>>>>    1 file changed, 70 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>> index 39e90ea32fcb..1627641a4a4e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>> @@ -23,12 +23,73 @@
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_userqueue.h"
>>>>    #include "v11_structs.h"
>>>> +#include "amdgpu_mes.h"
>>>>    
>>>>    #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>>    #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>>    #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>>    #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>>    
>>>> +static int
>>>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>>> +                         struct amdgpu_usermode_queue *queue)
>>>> +{
>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>> +    struct mes_add_queue_input queue_input;
>>>> +    int r;
>>>> +
>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
>>>> +
>>>> +    queue_input.process_va_start = 0;
>>>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
>>>> +    queue_input.process_quantum = 100000; /* 10ms */
>>>> +    queue_input.gang_quantum = 10000; /* 1ms */
>>>> +    queue_input.paging = false;
>>>> +
>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>>>> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>> +
>>>> +    queue_input.process_id = queue->vm->pasid;
>>>> +    queue_input.queue_type = queue->queue_type;
>>>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>>>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>>>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>>> +
>>>> +    amdgpu_mes_lock(&adev->mes);
>>>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>>>> +        return r;
>>>> +    }
>>>> +
>>>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void
>>>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>>> +                           struct amdgpu_usermode_queue *queue)
>>>> +{
>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>> +    struct mes_remove_queue_input queue_input;
>>>> +    int r;
>>>> +
>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>> +
>>>> +    amdgpu_mes_lock(&adev->mes);
>>>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>> +    if (r)
>>>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>>>> +}
>>>> +
>>>>    static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>>                                                     struct amdgpu_usermode_queue *queue)
>>>>    {
>>>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>>>    
>>>>        amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>>>        amdgpu_bo_unreserve(mqd->obj);
>>>> +
>>>> +    /* Map the queue in HW using MES ring */
>>>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>>>> +        goto free_ctx;
>>>> +    }
>>>> +
>>>>        DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>>>        return 0;
>>>>    
>>>> @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>>    {
>>>>        struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>>>    
>>>> +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>>>        amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>>>        amdgpu_bo_free_kernel(&mqd->obj,
>>>>    			   &mqd->gpu_addr,

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-05 10:06         ` Shashank Sharma
@ 2023-04-05 21:18           ` Luben Tuikov
  2023-04-06  7:45             ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Luben Tuikov @ 2023-04-05 21:18 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig

On 2023-04-05 06:06, Shashank Sharma wrote:
> 
> On 04/04/2023 22:58, Luben Tuikov wrote:
>> On 2023-04-04 12:36, Shashank Sharma wrote:
>>> On 04/04/2023 18:30, Luben Tuikov wrote:
>>>> On 2023-03-29 12:04, Shashank Sharma wrote:
>>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>>>
>>>>> This patch adds new functions to map/unmap a usermode queue into
>>>>> the FW, using the MES ring. As soon as this mapping is done, the
>>>>> queue would  be considered ready to accept the workload.
>>>>>
>>>>> V1: Addressed review comments from Alex on the RFC patch series
>>>>>       - Map/Unmap should be IP specific.
>>>>> V2:
>>>>>       Addressed review comments from Christian:
>>>>>       - Fix the wptr_mc_addr calculation (moved into another patch)
>>>>>       Addressed review comments from Alex:
>>>>>       - Do not add fptrs for map/unmap
>>>>>
>>>>> V3: Integration with doorbell manager
>>>>>
>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>> ---
>>>> Just add all your Cc right here, and let git-send-email figure it out.
>>>> Between the Cc tags and the SMTP CC list, Felix is the only one missing.
>>> No, that's not how it is.
>>>
>>> You keep people cc'ed in the cover letter so that they get informed
>>> every time this patch is pushed/used on any opensource branch.
>> The cover letter is optional, and you can add Cc tags
>> into the cover letter and then git-send-email would extract those
>> (any and all) tags from the cover letter too.
>>
>> When you pick and choose whom to add in the Cc tags, and whom to
>> add to the SMTP CC list, it creates division.
> 
> 
> Exactly my point, there is no guideline on whom to add in the Cc of the
> cover letter and whom to add manually; it's all preference.
> 
> Now different people can have different preferences, and a review comment
> about your preference for what to keep in the cover letter does seem like
> a nitpick.

I am describing consensus. Take a look at DRM commits to see what
people do. It'd be nice if you followed that.

> 
>>
>>> People who are added manually in cc are required for this code review
>>> session.
>> No such rule exists. It is best to add all the Cc into the commit message,
>> so that it is preserved in Git history.
> I believe this is also not a rule; we are discussing preferences only.
> It is my preference that I want to keep only Alex and Christian in Cc.
>>
>> For instance, I just randomly did "git log drivers/gpu/drm/*.[hc]" in
>> amd-staging-drm-next, and this is the first commit which came up,
>>
>> commit 097ee58f3ddf7d
>> Author: Harry Wentland <harry.wentland@amd.com>
>> Date:   Fri Jan 13 11:24:09 2023 -0500
>>
>>      drm/connector: print max_requested_bpc in state debugfs
>>      
>>      This is useful to understand the bpc defaults and
>>      support of a driver.
>>      
>>      Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>>      Cc: Pekka Paalanen <ppaalanen@gmail.com>
>>      Cc: Sebastian Wick <sebastian.wick@redhat.com>
>>      Cc: Vitaly.Prosyak@amd.com
>>      Cc: Uma Shankar <uma.shankar@intel.com>
>>      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>      Cc: Joshua Ashton <joshua@froggi.es>
>>      Cc: Jani Nikula <jani.nikula@linux.intel.com>
>>      Cc: dri-devel@lists.freedesktop.org
>>      Cc: amd-gfx@lists.freedesktop.org
>>      Reviewed-By: Joshua Ashton <joshua@froggi.es>
>>      Link: https://patchwork.freedesktop.org/patch/msgid/20230113162428.33874-3-harry.wentland@amd.com
>>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>> As you can see the whole Cc list and the MLs are part of the Cc tags.
>> And the rest of the commits are also good examples of how to do it.
>> (Don't worry about the Link tag--it is automatically added by tools
>> maintainers use, although some use Lore.)
>> This preserves things in Git history, and it's a good thing if we need
>> to data mine and brainstorm later on on patches, design, and so on.
> 
> No, this is not random; this is actually well planned. All of these

I never said it is "random"--it is not, it is well planned because
everyone submitting to DRM does this--it's common practice.

> people here in Cc are either the maintainers, respective domain experts,
> or contributors of the color management feature, and keeping them in CC
> is about how this color management feature is being carried forward, so
> this is exactly aligned with my point. Do note that this is a DRM-level
> change (not an AMDGPU-level one).

So, then why isn't Felix in the Cc tags? Doorbell changes touch that area too.
He's actually the only one you left out, other than the MLs emails.
Either add everyone to the Cc tags in the commit message, or only add
your Sob line and leave everyone to a --cc= on the command line. Both are
common practice and acceptable.

> Also, I would like to express that in my opinion we are spending way too 
> much time in discussing the 'preferences' around cover letter,

I agree. But those aren't "preferences," they are common practices,
like for instance not separating the Cc tags and the Sob tags with
an empty line, or shifting struct member names to the same column
for readability, and so on. Use "git log -- drivers/gpu/drm".

Regards,
Luben

> 
> words, comments, and variable names. No cover letter is perfect, no
> commit message is good enough to explain what is happening in code, no
> variable name is flawless, and no comment explains the code in a way
> that is clear to everyone. These are very subjective to the author and
> the reader. The only deciding factor is whether there is a community
> rule/guideline on that.
> 
> 
> I appreciate your time and suggestions, but I also certainly do not want
> to spend this much time discussing how we should add people to Cc.
> 
> Let's keep preferences separate from code review process.
> 
> - Shashank
> 
>>
>> A good tool to use is "scripts/get_maintainer.pl" which works
>> on a file, directory and even patch files.
>>
>> I usually include everyone get_maintainer.pl prints, and on subsequent patch
>> revisions, also people who have previously commented on the patchset, as they
>> might be interested to follow up. It's a good thing to do.
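The workflow described above can be sketched as a quick shell session. Everything here is made up for illustration (the patch file, the addresses); the get_maintainer.pl step is shown only as a comment, since it needs a kernel tree checkout:

```shell
# Hypothetical patch file, used only to illustrate the trailers
# that git-send-email extracts automatically.
cat > /tmp/example.patch <<'EOF'
Subject: [PATCH] drm/amdgpu: map usermode queue into MES

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
EOF

# In a kernel tree, one would typically ask who should review it:
#   ./scripts/get_maintainer.pl /tmp/example.patch
# git-send-email then picks up the Cc/Signed-off-by trailers itself:
grep -E '^(Cc|Signed-off-by):' /tmp/example.patch
```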
>>
>> Here are a couple of resources, in addition to DRM commits in the tree,
>> https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs
>> https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format
>> (The second link includes links to more resources on good patch writing.)
>>
>> Hope this helps.
>>
>> Regards,
>> Luben
>>
>>
>>> - Shashank
>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>>    .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>>>>>    1 file changed, 70 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>> index 39e90ea32fcb..1627641a4a4e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>> @@ -23,12 +23,73 @@
>>>>>    #include "amdgpu.h"
>>>>>    #include "amdgpu_userqueue.h"
>>>>>    #include "v11_structs.h"
>>>>> +#include "amdgpu_mes.h"
>>>>>    
>>>>>    #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>>>    #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>>>    #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>>>    #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>>>    
>>>>> +static int
>>>>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>>>> +                         struct amdgpu_usermode_queue *queue)
>>>>> +{
>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>> +    struct mes_add_queue_input queue_input;
>>>>> +    int r;
>>>>> +
>>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
>>>>> +
>>>>> +    queue_input.process_va_start = 0;
>>>>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
>>>>> +    queue_input.process_quantum = 100000; /* 10ms */
>>>>> +    queue_input.gang_quantum = 10000; /* 1ms */
>>>>> +    queue_input.paging = false;
>>>>> +
>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>>>>> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>> +
>>>>> +    queue_input.process_id = queue->vm->pasid;
>>>>> +    queue_input.queue_type = queue->queue_type;
>>>>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>>>>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>>>>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>>> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>>>> +
>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>> +    if (r) {
>>>>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>>>>> +        return r;
>>>>> +    }
>>>>> +
>>>>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>>>> +                           struct amdgpu_usermode_queue *queue)
>>>>> +{
>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>> +    struct mes_remove_queue_input queue_input;
>>>>> +    int r;
>>>>> +
>>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>> +
>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>> +    if (r)
>>>>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>>>>> +}
>>>>> +
>>>>>    static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>>>                                                     struct amdgpu_usermode_queue *queue)
>>>>>    {
>>>>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>>>>    
>>>>>        amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>>>>        amdgpu_bo_unreserve(mqd->obj);
>>>>> +
>>>>> +    /* Map the queue in HW using MES ring */
>>>>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>>>>> +    if (r) {
>>>>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>>>>> +        goto free_ctx;
>>>>> +    }
>>>>> +
>>>>>        DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>>>>        return 0;
>>>>>    
>>>>> @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>>>    {
>>>>>        struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>>>>    
>>>>> +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>>>>        amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>>>>        amdgpu_bo_free_kernel(&mqd->obj,
>>>>>    			   &mqd->gpu_addr,



* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-05 21:18           ` Luben Tuikov
@ 2023-04-06  7:45             ` Shashank Sharma
  2023-04-06 15:16               ` Felix Kuehling
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-04-06  7:45 UTC (permalink / raw)
  To: Luben Tuikov, amd-gfx
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig


On 05/04/2023 23:18, Luben Tuikov wrote:
> On 2023-04-05 06:06, Shashank Sharma wrote:
>> On 04/04/2023 22:58, Luben Tuikov wrote:
>>> On 2023-04-04 12:36, Shashank Sharma wrote:
>>>> On 04/04/2023 18:30, Luben Tuikov wrote:
>>>>> On 2023-03-29 12:04, Shashank Sharma wrote:
>>>>>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>>>>>
>>>>>> This patch adds new functions to map/unmap a usermode queue into
>>>>>> the FW, using the MES ring. As soon as this mapping is done, the
>>>>>> queue would  be considered ready to accept the workload.
>>>>>>
>>>>>> V1: Addressed review comments from Alex on the RFC patch series
>>>>>>        - Map/Unmap should be IP specific.
>>>>>> V2:
>>>>>>        Addressed review comments from Christian:
>>>>>>        - Fix the wptr_mc_addr calculation (moved into another patch)
>>>>>>        Addressed review comments from Alex:
>>>>>>        - Do not add fptrs for map/unmap
>>>>>>
>>>>>> V3: Integration with doorbell manager
>>>>>>
>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> ---
>>>>> Just add all your Cc right here, and let git-send-email figure it out.
>>>>> Between the Cc tags and the SMTP CC list, Felix is the only one missing.
>>>> No, that's not how it is.
>>>>
>>>> You keep people cc'ed in the cover letter so that they get informed
>>>> every time this patch is pushed/used on any opensource branch.
>>> The cover letter is optional, and you can add Cc tags
>>> into the cover letter and then git-send-email would extract those
>>> (any and all) tags from the cover letter too.
>>>
>>> When you pick and choose whom to add in the Cc tags, and whom to
>>> add to the SMTP CC list, it creates division.
>>
>> Exactly my point, there is no guideline on whom to add in the Cc of the
>> cover letter and whom to add manually; it's all preference.
>>
>> Now different people can have different preferences, and a review comment
>> about your preference for what to keep in the cover letter does seem like
>> a nitpick.
> I am describing consensus. Take a look at DRM commits to see what
> people do. It'd be nice if you followed that.
No, not every DRM patch needs to be like that. I have added some 
examples below.
>>>> People who are added manually in cc are required for this code review
>>>> session.
>>> No such rule exists. It is best to add all the Cc into the commit message,
>>> so that it is preserved in Git history.
>> I believe this is also not a rule; we are discussing preferences only.
>> It is my preference that I want to keep only Alex and Christian in Cc.
>>> For instance, I just randomly did "git log drivers/gpu/drm/*.[hc]" in
>>> amd-staging-drm-next, and this is the first commit which came up,
>>>
>>> commit 097ee58f3ddf7d
>>> Author: Harry Wentland <harry.wentland@amd.com>
>>> Date:   Fri Jan 13 11:24:09 2023 -0500
>>>
>>>       drm/connector: print max_requested_bpc in state debugfs
>>>       
>>>       This is useful to understand the bpc defaults and
>>>       support of a driver.
>>>       
>>>       Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>>>       Cc: Pekka Paalanen <ppaalanen@gmail.com>
>>>       Cc: Sebastian Wick <sebastian.wick@redhat.com>
>>>       Cc: Vitaly.Prosyak@amd.com
>>>       Cc: Uma Shankar <uma.shankar@intel.com>
>>>       Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>       Cc: Joshua Ashton <joshua@froggi.es>
>>>       Cc: Jani Nikula <jani.nikula@linux.intel.com>
>>>       Cc: dri-devel@lists.freedesktop.org
>>>       Cc: amd-gfx@lists.freedesktop.org
>>>       Reviewed-By: Joshua Ashton <joshua@froggi.es>
>>>       Link: https://patchwork.freedesktop.org/patch/msgid/20230113162428.33874-3-harry.wentland@amd.com
>>>       Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>
>>> As you can see the whole Cc list and the MLs are part of the Cc tags.
>>> And the rest of the commits are also good examples of how to do it.
>>> (Don't worry about the Link tag--it is automatically added by tools
>>> maintainers use, although some use Lore.)
>>> This preserves things in Git history, and it's a good thing if we need
>>> to data mine and brainstorm later on on patches, design, and so on.
>> No, this is not random; this is actually well planned. All of these
> I never said it is "random"--it is not, it is well planned because
> everyone submitting to DRM does this--it's common practice.
>> people here in Cc are either the maintainers, respective domain experts,
>> or contributors of the color management feature, and keeping them in CC
>> is about how this color management feature is being carried forward, so
>> this is exactly aligned with my point. Do note that this is a DRM-level
>> change (not an AMDGPU-level one).
> So, then why isn't Felix in the Cc tags? Doorbell changes touch that area too.
> He's actually the only one you left out, other than the MLs emails.
> Either add everyone to the Cc tags in the commit message, or only add
> your Sob line and leave everyone to a --cc= on the command line. Both are
> common practice and acceptable.

This code touches KFD code, and that's why Felix needs to be involved in
the code-review process. But once code review is done, I don't want to
spam his email every time this patch is pushed into some branch, so he is
not in the cover-letter CC. This is how I prefer to drive the code review
of this patch, and I don't see a problem here.


>> Also, I would like to express that in my opinion we are spending way too
>> much time in discussing the 'preferences' around cover letter,
> I agree. But those aren't "preferences," they are common practices,

This is not a common practice; it's your interpretation of it.

I also picked a few examples:


https://patchwork.freedesktop.org/patch/531143/?series=116163&rev=1

This patch has multiple communities in Cc, none in the cover letter (it
was also R-B'ed by you).


https://patchwork.freedesktop.org/patch/505571/

https://patchwork.freedesktop.org/patch/442410/

These are some patches which have multiple people in the email CC who are
missing from the cover-letter CC.


https://patchwork.freedesktop.org/patch/530652/?series=116055&rev=1

This patch has multiple people in email-cc but none in cover-letter CC.


There could be tons of such examples for both, and the maintainers do not
have any problem with that. So I don't consider this a common practice in
the DRM community; it's just a preference, and hence it consumed a lot
more time and effort in this discussion than it should have.

- Shashank

> like for instance not separating the Cc tags and the Sob tags with
> an empty line, or shifting struct member names to the same column
> for readability, and so on. Use "git log -- drivers/gpu/drm".
>
> Regards,
> Luben
>
>> words, comments, and variable names. No cover letter is perfect, no
>> commit message is good enough to explain what is happening in code, no
>> variable name is flawless, and no comment explains the code in a way
>> that is clear to everyone. These are very subjective to the author and
>> the reader. The only deciding factor is whether there is a community
>> rule/guideline on that.
>>
>>
>> I appreciate your time and suggestions, but I also certainly do not want
>> to spend this much time discussing how we should add people to Cc.
>>
>> Let's keep preferences separate from code review process.
>>
>> - Shashank
>>
>>> A good tool to use is "scripts/get_maintainer.pl" which works
>>> on a file, directory and even patch files.
>>>
>>> I usually include everyone get_maintainer.pl prints, and on subsequent patch
>>> revisions, also people who have previously commented on the patchset, as they
>>> might be interested to follow up. It's a good thing to do.
>>>
>>> Here are a couple of resources, in addition to DRM commits in the tree,
>>> https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs
>>> https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format
>>> (The second link includes links to more resources on good patch writing.)
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Luben
>>>
>>>
>>>> - Shashank
>>>>
>>>>> Regards,
>>>>> Luben
>>>>>
>>>>>>     .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>>>>>>     1 file changed, 70 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>> index 39e90ea32fcb..1627641a4a4e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>> @@ -23,12 +23,73 @@
>>>>>>     #include "amdgpu.h"
>>>>>>     #include "amdgpu_userqueue.h"
>>>>>>     #include "v11_structs.h"
>>>>>> +#include "amdgpu_mes.h"
>>>>>>     
>>>>>>     #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>>>>     #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>>>>     #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>>>>     #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>>>>     
>>>>>> +static int
>>>>>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>>>>> +                         struct amdgpu_usermode_queue *queue)
>>>>>> +{
>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>> +    struct mes_add_queue_input queue_input;
>>>>>> +    int r;
>>>>>> +
>>>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
>>>>>> +
>>>>>> +    queue_input.process_va_start = 0;
>>>>>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
>>>>>> +    queue_input.process_quantum = 100000; /* 10ms */
>>>>>> +    queue_input.gang_quantum = 10000; /* 1ms */
>>>>>> +    queue_input.paging = false;
>>>>>> +
>>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>>>>>> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>>> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>>> +
>>>>>> +    queue_input.process_id = queue->vm->pasid;
>>>>>> +    queue_input.queue_type = queue->queue_type;
>>>>>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>>>>>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>>>>>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>>>> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>>>>> +
>>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>>> +    if (r) {
>>>>>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>>>>>> +        return r;
>>>>>> +    }
>>>>>> +
>>>>>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>>>>> +                           struct amdgpu_usermode_queue *queue)
>>>>>> +{
>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>> +    struct mes_remove_queue_input queue_input;
>>>>>> +    int r;
>>>>>> +
>>>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>>>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>>> +
>>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
>>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>>> +    if (r)
>>>>>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>>>>>> +}
>>>>>> +
>>>>>>     static int amdgpu_userq_gfx_v11_create_ctx_space(struct amdgpu_userq_mgr *uq_mgr,
>>>>>>                                                      struct amdgpu_usermode_queue *queue)
>>>>>>     {
>>>>>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>>>>>     
>>>>>>         amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>>>>>         amdgpu_bo_unreserve(mqd->obj);
>>>>>> +
>>>>>> +    /* Map the queue in HW using MES ring */
>>>>>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>>>>>> +    if (r) {
>>>>>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>>>>>> +        goto free_ctx;
>>>>>> +    }
>>>>>> +
>>>>>>         DRM_DEBUG_DRIVER("MQD for queue %d created\n", queue->queue_id);
>>>>>>         return 0;
>>>>>>     
>>>>>> @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>>>>     {
>>>>>>         struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>>>>>     
>>>>>> +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>>>>>         amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>>>>>         amdgpu_bo_free_kernel(&mqd->obj,
>>>>>>     			   &mqd->gpu_addr,


* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-06  7:45             ` Shashank Sharma
@ 2023-04-06 15:16               ` Felix Kuehling
  2023-04-07  6:41                 ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Felix Kuehling @ 2023-04-06 15:16 UTC (permalink / raw)
  To: Shashank Sharma, Luben Tuikov, amd-gfx
  Cc: Alex Deucher, Shashank Sharma, Christian Koenig

Am 2023-04-06 um 03:45 schrieb Shashank Sharma:
>
> On 05/04/2023 23:18, Luben Tuikov wrote:
>> So, then why isn't Felix in the Cc tags? Doorbell changes touch that
>> area too.
>> He's actually the only one you left out, other than the MLs emails.
>> Either add everyone to the Cc tags in the commit message, or only add
>> your Sob line and leave everyone to a --cc= on the command line. Both 
>> are
>> common practice and acceptable.
>
> This code touches KFD code, and that's why Felix needs to be involved
> in the code-review process. But once code review is done, I don't want
> to spam his email every time this patch is pushed into some branch, so
> he is not in the cover-letter CC. This is how I prefer to drive the code
> review of this patch, and I don't see a problem here.

It's a big patch series, and I don't have time to review the whole thing 
in detail. A CC tag on the patches that need my attention would help.

Thanks,
   Felix


>
>
>>> Also, I would like to express that in my opinion we are spending way 
>>> too
>>> much time in discussing the 'preferences' around cover letter,
>> I agree. But those aren't "preferences," they are common practices,
>
> This is not a common practice; it's your interpretation of it.
>
> I also picked a few examples:
>
>
> https://patchwork.freedesktop.org/patch/531143/?series=116163&rev=1
>
> This patch has multiple communities in Cc, none in the cover letter (it
> was also R-B'ed by you).
>
>
> https://patchwork.freedesktop.org/patch/505571/
>
> https://patchwork.freedesktop.org/patch/442410/
>
> These are some patches which have multiple people in the email CC who
> are missing from the cover-letter CC.
>
>
> https://patchwork.freedesktop.org/patch/530652/?series=116055&rev=1
>
> This patch has multiple people in email-cc but none in cover-letter CC.
>
>
> There could be tons of such examples for both, and the maintainers do
> not have any problem with that. So I don't consider this a common
> practice in the DRM community; it's just a preference, and hence it
> consumed a lot more time and effort in this discussion than it should
> have.
>
> - Shashank
>
>> like for instance not separating the Cc tags and the Sob tags with
>> an empty line, or shifting struct member names to the same column
>> for readability, and so on. Use "git log -- drivers/gpu/drm".
>>
>> Regards,
>> Luben
>>
>>> words, comments, and variable names. No cover letter is perfect, no
>>> commit message is good enough to explain what is happening in code, no
>>> variable name is flawless, and no comment explains the code in a way
>>> that is clear to everyone. These are very subjective to the author and
>>> the reader. The only deciding factor is whether there is a community
>>> rule/guideline on that.
>>>
>>>
>>> I appreciate your time and suggestions, but I also certainly do not want
>>> to spend this much time discussing how we should add people to Cc.
>>>
>>> Let's keep preferences separate from code review process.
>>>
>>> - Shashank
>>>
>>>> A good tool to use is "scripts/get_maintainer.pl" which works
>>>> on a file, directory and even patch files.
>>>>
>>>> I usually include everyone get_maintainer.pl prints, and on 
>>>> subsequent patch
>>>> revisions, also people who have previously commented on the 
>>>> patchset, as they
>>>> might be interested to follow up. It's a good thing to do.
>>>>
>>>> Here are a couple of resources, in addition to DRM commits in the 
>>>> tree,
>>>> https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs 
>>>>
>>>> https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format 
>>>>
>>>> (The second link includes links to more resources on good patch 
>>>> writing.)
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>
>>>>> - Shashank
>>>>>
>>>>>> Regards,
>>>>>> Luben
>>>>>>
>>>>>>>     .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 +++++++++++++++++++
>>>>>>>     1 file changed, 70 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>> index 39e90ea32fcb..1627641a4a4e 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>> @@ -23,12 +23,73 @@
>>>>>>>     #include "amdgpu.h"
>>>>>>>     #include "amdgpu_userqueue.h"
>>>>>>>     #include "v11_structs.h"
>>>>>>> +#include "amdgpu_mes.h"
>>>>>>>
>>>>>>>     #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>>>>>     #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>>>>>     #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>>>>>     #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>>>>>
>>>>>>> +static int
>>>>>>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>>>>>> +                         struct amdgpu_usermode_queue *queue)
>>>>>>> +{
>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>> +    struct mes_add_queue_input queue_input;
>>>>>>> +    int r;
>>>>>>> +
>>>>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input));
>>>>>>> +
>>>>>>> +    queue_input.process_va_start = 0;
>>>>>>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 1) << AMDGPU_GPU_PAGE_SHIFT;
>>>>>>> +    queue_input.process_quantum = 100000; /* 10ms */
>>>>>>> +    queue_input.gang_quantum = 10000; /* 1ms */
>>>>>>> +    queue_input.paging = false;
>>>>>>> +
>>>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>>>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>>>>>>> +    queue_input.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>>>> +    queue_input.gang_global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>>>> +
>>>>>>> +    queue_input.process_id = queue->vm->pasid;
>>>>>>> +    queue_input.queue_type = queue->queue_type;
>>>>>>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>>>>>>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>>>>>>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>>>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>>>>> +    queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>>>>>> +
>>>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>>>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>>>> +    if (r) {
>>>>>>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>>>>>>> +        return r;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", queue->queue_id);
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>>>>>> +                           struct amdgpu_usermode_queue *queue)
>>>>>>> +{
>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>> +    struct mes_remove_queue_input queue_input;
>>>>>>> +    int r;
>>>>>>> +
>>>>>>> +    memset(&queue_input, 0x0, sizeof(struct mes_remove_queue_input));
>>>>>>> +    queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>>>> +
>>>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>>>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, &queue_input);
>>>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>>>> +    if (r)
>>>>>>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>>>>>>> +}
>>>>>>> +
>>>>>>>     static int amdgpu_userq_gfx_v11_create_ctx_space(struct 
>>>>>>> amdgpu_userq_mgr *uq_mgr,
>>>>>>> struct amdgpu_usermode_queue *queue)
>>>>>>>     {
>>>>>>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct 
>>>>>>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>>>>>>             amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>>>>>>         amdgpu_bo_unreserve(mqd->obj);
>>>>>>> +
>>>>>>> +    /* Map the queue in HW using MES ring */
>>>>>>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>>>>>>> +    if (r) {
>>>>>>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>>>>>>> +        goto free_ctx;
>>>>>>> +    }
>>>>>>> +
>>>>>>>         DRM_DEBUG_DRIVER("MQD for queue %d created\n", 
>>>>>>> queue->queue_id);
>>>>>>>         return 0;
>>>>>>>     @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct 
>>>>>>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>>>>>     {
>>>>>>>         struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>>>>>>     +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>>>>>>         amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>>>>>>         amdgpu_bo_free_kernel(&mqd->obj,
>>>>>>>                    &mqd->gpu_addr,

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES
  2023-04-06 15:16               ` Felix Kuehling
@ 2023-04-07  6:41                 ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-04-07  6:41 UTC (permalink / raw)
  To: Felix Kuehling, Luben Tuikov, amd-gfx
  Cc: Alex Deucher, Shashank Sharma, Christian Koenig


On 06/04/2023 17:16, Felix Kuehling wrote:
> Am 2023-04-06 um 03:45 schrieb Shashank Sharma:
>>
>> On 05/04/2023 23:18, Luben Tuikov wrote:
>>> So, then why isn't Felix in the Cc tags? Doorbell changes touch that
>>> area too.
>>> He's actually the only one you left out, other than the MLs emails.
>>> Either add everyone to the Cc tags in the commit message, or only add
>>> your Sob line and leave everyone to a --cc= on the command line. 
>>> Both are
>>> common practice and acceptable.
>>
>> This code touches KFD code, and that's why Felix needs to be involved 
>> in the code-review process. But
>>
>> once code review is done, I don't want to spam his email every time 
>> this patch is pushed into some branch,
>>
>> so he is not in cover-letter CC. This is how I prefer to drive the 
>> code review of this patch, and I don't see a problem here.
>
> It's a big patch series, and I don't have time to review the whole 
> thing in detail. A CC tag on the patches that need my attention would 
> help.
>
Hey Felix,

noted. There are only 2 patches which touch the KFD code, I will keep 
you in CC only for those.

- Shashank


> Thanks,
>   Felix
>
>
>>
>>
>>>> Also, I would like to express that in my opinion we are spending 
>>>> way too
>>>> much time in discussing the 'preferences' around cover letter,
>>> I agree. But those aren't "preferences," they are common practices,
>>
>> This is not a common practice; it's your interpretation of it.
>>
>> I also picked a few examples:
>>
>>
>> https://patchwork.freedesktop.org/patch/531143/?series=116163&rev=1
>>
>> This patch has multiple communities in Cc, none in cover-letter (also 
>> R-B'ed by you).
>>
>>
>> https://patchwork.freedesktop.org/patch/505571/
>>
>> https://patchwork.freedesktop.org/patch/442410/
>>
>> These are some patches which have multiple people in the email CC 
>> who are missing from the cover-letter CC.
>>
>>
>> https://patchwork.freedesktop.org/patch/530652/?series=116055&rev=1
>>
>> This patch has multiple people in email-cc but none in cover-letter CC.
>>
>>
>> There could be tons of such examples for both, and the maintainers do 
>> not have any problem with that,
>>
>> So I don't consider this a common practice in the DRM community; it's 
>> just a preference, and hence it consumed
>>
>> a lot more time and effort in this discussion than it should have.
>>
>> - Shashank
>>
>>> like for instance not separating the Cc tags and the Sob tags with
>>> an empty line, or shifting struct member names to the same column
>>> for readability, and so on. Use "git log -- drivers/gpu/drm".
>>>
>>> Regards,
>>> Luben
>>>
>>>> words, comments and variable names. No cover letter is perfect, no
>>>> commit message is good enough to explain what is happening in code,
>>>>
>>>> no variable name is flawless and no comment explains what is going 
>>>> on in
>>>> code which is clear to everyone. These are very subjective to the
>>>>
>>>> author and the reader. The only deciding factor is if there is a
>>>> community rule/guideline on that.
>>>>
>>>>
>>>> I appreciate your time and suggestions but I also certainly do not 
>>>> want
>>>> to spend this much time to discuss how should we add people in Cc.
>>>>
>>>> Let's keep preferences separate from code review process.
>>>>
>>>> - Shashank
>>>>
>>>>> A good tool to use is "scripts/get_maintainer.pl" which works
>>>>> on a file, directory and even patch files.
>>>>>
>>>>> I usually include everyone get_maintainer.pl prints, and on 
>>>>> subsequent patch
>>>>> revisions, also people who have previously commented on the 
>>>>> patchset, as they
>>>>> might be interested to follow up. It's a good thing to do.
>>>>>
>>>>> Here are a couple of resources, in addition to DRM commits in the 
>>>>> tree,
>>>>> https://www.kernel.org/doc/html/v4.12/process/5.Posting.html#patch-formatting-and-changelogs 
>>>>>
>>>>> https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#the-canonical-patch-format 
>>>>>
>>>>> (The second link includes links to more resources on good patch 
>>>>> writing.)
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Regards,
>>>>> Luben
>>>>>
>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>> Regards,
>>>>>>> Luben
>>>>>>>
>>>>>>>> .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 70 
>>>>>>>> +++++++++++++++++++
>>>>>>>>     1 file changed, 70 insertions(+)
>>>>>>>>
>>>>>>>> diff --git 
>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c 
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>> index 39e90ea32fcb..1627641a4a4e 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>> @@ -23,12 +23,73 @@
>>>>>>>>     #include "amdgpu.h"
>>>>>>>>     #include "amdgpu_userqueue.h"
>>>>>>>>     #include "v11_structs.h"
>>>>>>>> +#include "amdgpu_mes.h"
>>>>>>>>         #define AMDGPU_USERQ_PROC_CTX_SZ PAGE_SIZE
>>>>>>>>     #define AMDGPU_USERQ_GANG_CTX_SZ PAGE_SIZE
>>>>>>>>     #define AMDGPU_USERQ_FW_CTX_SZ PAGE_SIZE
>>>>>>>>     #define AMDGPU_USERQ_GDS_CTX_SZ PAGE_SIZE
>>>>>>>>     +static int
>>>>>>>> +amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>>>>>>> +                         struct amdgpu_usermode_queue *queue)
>>>>>>>> +{
>>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>>> +    struct mes_add_queue_input queue_input;
>>>>>>>> +    int r;
>>>>>>>> +
>>>>>>>> +    memset(&queue_input, 0x0, sizeof(struct 
>>>>>>>> mes_add_queue_input));
>>>>>>>> +
>>>>>>>> +    queue_input.process_va_start = 0;
>>>>>>>> +    queue_input.process_va_end = (adev->vm_manager.max_pfn - 
>>>>>>>> 1) << AMDGPU_GPU_PAGE_SHIFT;
>>>>>>>> +    queue_input.process_quantum = 100000; /* 10ms */
>>>>>>>> +    queue_input.gang_quantum = 10000; /* 1ms */
>>>>>>>> +    queue_input.paging = false;
>>>>>>>> +
>>>>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>>>>> +    queue_input.process_context_addr = queue->proc_ctx_gpu_addr;
>>>>>>>> +    queue_input.inprocess_gang_priority = 
>>>>>>>> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>>>>> +    queue_input.gang_global_priority_level = 
>>>>>>>> AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
>>>>>>>> +
>>>>>>>> +    queue_input.process_id = queue->vm->pasid;
>>>>>>>> +    queue_input.queue_type = queue->queue_type;
>>>>>>>> +    queue_input.mqd_addr = queue->mqd.gpu_addr;
>>>>>>>> +    queue_input.wptr_addr = queue->userq_prop.wptr_gpu_addr;
>>>>>>>> +    queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>>>>>>> +    queue_input.doorbell_offset = 
>>>>>>>> queue->userq_prop.doorbell_index;
>>>>>>>> +    queue_input.page_table_base_addr = 
>>>>>>>> amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>>>>>>> +
>>>>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>>>>> +    r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>>>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>>>>> +    if (r) {
>>>>>>>> +        DRM_ERROR("Failed to map queue in HW, err (%d)\n", r);
>>>>>>>> +        return r;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    DRM_DEBUG_DRIVER("Queue %d mapped successfully\n", 
>>>>>>>> queue->queue_id);
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +amdgpu_userq_gfx_v11_unmap(struct amdgpu_userq_mgr *uq_mgr,
>>>>>>>> +                           struct amdgpu_usermode_queue *queue)
>>>>>>>> +{
>>>>>>>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>>>>>>> +    struct mes_remove_queue_input queue_input;
>>>>>>>> +    int r;
>>>>>>>> +
>>>>>>>> +    memset(&queue_input, 0x0, sizeof(struct 
>>>>>>>> mes_remove_queue_input));
>>>>>>>> +    queue_input.doorbell_offset = 
>>>>>>>> queue->userq_prop.doorbell_index;
>>>>>>>> +    queue_input.gang_context_addr = queue->gang_ctx_gpu_addr;
>>>>>>>> +
>>>>>>>> +    amdgpu_mes_lock(&adev->mes);
>>>>>>>> +    r = adev->mes.funcs->remove_hw_queue(&adev->mes, 
>>>>>>>> &queue_input);
>>>>>>>> +    amdgpu_mes_unlock(&adev->mes);
>>>>>>>> +    if (r)
>>>>>>>> +        DRM_ERROR("Failed to unmap queue in HW, err (%d)\n", r);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static int amdgpu_userq_gfx_v11_create_ctx_space(struct 
>>>>>>>> amdgpu_userq_mgr *uq_mgr,
>>>>>>>> struct amdgpu_usermode_queue *queue)
>>>>>>>>     {
>>>>>>>> @@ -129,6 +190,14 @@ amdgpu_userq_gfx_v11_mqd_create(struct 
>>>>>>>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
>>>>>>>>             amdgpu_userq_set_ctx_space(uq_mgr, queue);
>>>>>>>>         amdgpu_bo_unreserve(mqd->obj);
>>>>>>>> +
>>>>>>>> +    /* Map the queue in HW using MES ring */
>>>>>>>> +    r = amdgpu_userq_gfx_v11_map(uq_mgr, queue);
>>>>>>>> +    if (r) {
>>>>>>>> +        DRM_ERROR("Failed to map userqueue (%d)\n", r);
>>>>>>>> +        goto free_ctx;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         DRM_DEBUG_DRIVER("MQD for queue %d created\n", 
>>>>>>>> queue->queue_id);
>>>>>>>>         return 0;
>>>>>>>>     @@ -147,6 +216,7 @@ amdgpu_userq_gfx_v11_mqd_destroy(struct 
>>>>>>>> amdgpu_userq_mgr *uq_mgr, struct amdgpu_
>>>>>>>>     {
>>>>>>>>         struct amdgpu_userq_ctx_space *mqd = &queue->mqd;
>>>>>>>>     +    amdgpu_userq_gfx_v11_unmap(uq_mgr, queue);
>>>>>>>> amdgpu_userq_gfx_v11_destroy_ctx_space(uq_mgr, queue);
>>>>>>>>         amdgpu_bo_free_kernel(&mqd->obj,
>>>>>>>>                    &mqd->gpu_addr,

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART
  2023-03-29 16:04 ` [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
@ 2023-04-10  0:00   ` Bas Nieuwenhuizen
  2023-04-11  9:29     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-10  0:00 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Arvind Yadav, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Arvind Yadav <arvind.yadav@amd.com>
>
> To support oversubscription, MES expects WPTR BOs to be mapped
> to GART before work is submitted to the usermode queues.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 89 +++++++++++++++++++
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |  1 +
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>  3 files changed, 91 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 5672efcbcffc..7409a4ae55da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -43,6 +43,89 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>      return idr_find(&uq_mgr->userq_idr, qid);
>  }
>
> +static int
> +amdgpu_userqueue_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
> +{
> +    int ret;
> +
> +    ret = amdgpu_bo_reserve(bo, true);
> +    if (ret) {
> +        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
> +        goto err_reserve_bo_failed;
> +    }
> +
> +    ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
> +    if (ret) {
> +        DRM_ERROR("Failed to pin bo. ret %d\n", ret);
> +        goto err_pin_bo_failed;
> +    }
> +
> +    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
> +    if (ret) {
> +        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
> +        goto err_map_bo_gart_failed;
> +    }
> +
> +
> +    amdgpu_bo_unreserve(bo);
> +    bo = amdgpu_bo_ref(bo);
> +
> +    return 0;
> +
> +err_map_bo_gart_failed:
> +    amdgpu_bo_unpin(bo);
> +err_pin_bo_failed:
> +    amdgpu_bo_unreserve(bo);
> +err_reserve_bo_failed:
> +
> +    return ret;
> +}
> +
> +
> +static int
> +amdgpu_userqueue_create_wptr_mapping(struct amdgpu_device *adev,
> +                                    struct drm_file *filp,
> +                                    struct amdgpu_usermode_queue *queue)
> +{
> +    struct amdgpu_bo_va_mapping *wptr_mapping;
> +    struct amdgpu_vm *wptr_vm;
> +    struct amdgpu_bo *wptr_bo = NULL;
> +    uint64_t wptr = queue->userq_prop.wptr_gpu_addr;
> +    int ret;
> +
> +    wptr_vm = queue->vm;
> +    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
> +    if (ret)
> +        goto err_wptr_map_gart;
> +
> +    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
> +    amdgpu_bo_unreserve(wptr_vm->root.bo);
> +    if (!wptr_mapping) {
> +        DRM_ERROR("Failed to lookup wptr bo\n");
> +        ret = -EINVAL;
> +        goto err_wptr_map_gart;
> +    }

This triggers for wptr BOs mapped to the high half of address space,
may need some mangling wrt the top bits?

> +
> +    wptr_bo = wptr_mapping->bo_va->base.bo;
> +    if (wptr_bo->tbo.base.size > PAGE_SIZE) {
> +        DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
> +        ret = -EINVAL;
> +        goto err_wptr_map_gart;
> +    }
> +
> +    ret = amdgpu_userqueue_map_gtt_bo_to_gart(adev, wptr_bo);
> +    if (ret) {
> +        DRM_ERROR("Failed to map wptr bo to GART\n");
> +        goto err_wptr_map_gart;
> +    }
> +
> +    queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
> +    return 0;
> +
> +err_wptr_map_gart:
> +    return ret;
> +}
> +
>  static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>  {
>      struct amdgpu_usermode_queue *queue;
> @@ -82,6 +165,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>          goto free_queue;
>      }
>
> +    r = amdgpu_userqueue_create_wptr_mapping(uq_mgr->adev, filp, queue);
> +    if (r) {
> +        DRM_ERROR("Failed to map WPTR (0x%llx) for userqueue\n", queue->userq_prop.wptr_gpu_addr);
> +        goto free_queue;
> +    }
> +
>      r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
>      if (r) {
>          DRM_ERROR("Failed to create/map userqueue MQD\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> index 1627641a4a4e..274e78826334 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> @@ -58,6 +58,7 @@ amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>      queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>      queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>      queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
> +    queue_input.wptr_mc_addr = queue->wptr_mc_addr;
>
>      amdgpu_mes_lock(&adev->mes);
>      r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 8b62ef77cd26..eaab7cf5fff6 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -38,6 +38,7 @@ struct amdgpu_usermode_queue {
>         int queue_type;
>         uint64_t flags;
>         uint64_t doorbell_handle;
> +       uint64_t wptr_mc_addr;
>         uint64_t proc_ctx_gpu_addr;
>         uint64_t gang_ctx_gpu_addr;
>         uint64_t gds_ctx_gpu_addr;
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue
  2023-03-29 16:04 ` [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
@ 2023-04-10  0:02   ` Bas Nieuwenhuizen
  2023-04-10 14:28     ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-10  0:02 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig, amd-gfx

On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> From: Shashank Sharma <contactshashanksharma@gmail.com>
>
> This patch adds:
> - A new IOCTL function to create and destroy usermode queues.
> - A new structure to keep all the user queue data in one place.
> - A function to generate unique index for the queue.
>
> V1: Worked on review comments from RFC patch series:
>   - Alex: Keep a list of queues, instead of single queue per process.
>   - Christian: Use the queue manager instead of global ptrs,
>            Don't keep the queue structure in amdgpu_ctx
>
> V2: Worked on review comments:
>  - Christian:
>    - Formatting of text
>    - There is no need for queuing of userqueues, with idr in place
>  - Alex:
>    - Remove use_doorbell, its unnecessary
>    - Reuse amdgpu_mqd_props for saving mqd fields
>
>  - Code formatting and re-arrangement
>
> V3:
>  - Integration with doorbell manager
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 113 ++++++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
>  3 files changed, 116 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 2d6bcfd727c8..229976a2d0e7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2749,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>  };
>
>  static const struct drm_driver amdgpu_kms_driver = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 13e1eebc1cb6..353f57c5a772 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -22,6 +22,119 @@
>   */
>
>  #include "amdgpu.h"
> +#include "amdgpu_vm.h"
> +#include "amdgpu_userqueue.h"
> +
> +static inline int
> +amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
> +{
> +    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
> +}
> +
> +static inline void
> +amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
> +{
> +    idr_remove(&uq_mgr->userq_idr, queue_id);
> +}
> +
> +static struct amdgpu_usermode_queue *
> +amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
> +{
> +    return idr_find(&uq_mgr->userq_idr, qid);
> +}
> +
> +static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> +{
> +    struct amdgpu_usermode_queue *queue;
> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
> +    int r;
> +
> +    /* Do we have support userqueues for this IP ? */
> +    if (!uq_mgr->userq_funcs[mqd_in->ip_type]) {
> +        DRM_ERROR("GFX User queues not supported for this IP: %d\n", mqd_in->ip_type);
> +        return -EINVAL;
> +    }
> +
> +    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
> +    if (!queue) {
> +        DRM_ERROR("Failed to allocate memory for queue\n");
> +        return -ENOMEM;
> +    }
> +
> +    mutex_lock(&uq_mgr->userq_mutex);
> +    queue->userq_prop.wptr_gpu_addr = mqd_in->wptr_va;
> +    queue->userq_prop.rptr_gpu_addr = mqd_in->rptr_va;
> +    queue->userq_prop.queue_size = mqd_in->queue_size;
> +    queue->userq_prop.hqd_base_gpu_addr = mqd_in->queue_va;
> +    queue->userq_prop.queue_size = mqd_in->queue_size;

This sets queue_size twice.

> +
> +    queue->doorbell_handle = mqd_in->doorbell_handle;
> +    queue->queue_type = mqd_in->ip_type;
> +    queue->flags = mqd_in->flags;
> +    queue->vm = &fpriv->vm;
> +    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
> +    if (queue->queue_id < 0) {
> +        DRM_ERROR("Failed to allocate a queue id\n");
> +        r = queue->queue_id;
> +        goto free_queue;
> +    }
> +
> +    args->out.queue_id = queue->queue_id;
> +    args->out.flags = 0;
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    return 0;
> +
> +free_queue:
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    kfree(queue);
> +    return r;
> +}
> +
> +static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
> +{
> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
> +    struct amdgpu_usermode_queue *queue;
> +
> +    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
> +    if (!queue) {
> +        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
> +        return;
> +    }
> +
> +    mutex_lock(&uq_mgr->userq_mutex);
> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
> +    mutex_unlock(&uq_mgr->userq_mutex);
> +    kfree(queue);
> +}
> +
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
> +                      struct drm_file *filp)
> +{
> +    union drm_amdgpu_userq *args = data;
> +    int r = 0;
> +
> +    switch (args->in.op) {
> +    case AMDGPU_USERQ_OP_CREATE:
> +        r = amdgpu_userqueue_create(filp, args);
> +        if (r)
> +            DRM_ERROR("Failed to create usermode queue\n");
> +        break;
> +
> +    case AMDGPU_USERQ_OP_FREE:
> +        amdgpu_userqueue_destroy(filp, args->in.queue_id);
> +        break;
> +
> +    default:
> +        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
> +        return -EINVAL;
> +    }
> +
> +    return r;
> +}
> +
>
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>  {
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index 7eeb8c9e6575..7625a862b1fc 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -42,6 +42,8 @@ struct amdgpu_userq_funcs {
>         void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>  };
>
> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
> +
>  int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>
>  void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
                   ` (8 preceding siblings ...)
  2023-03-29 16:04 ` [PATCH v3 9/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
@ 2023-04-10  0:36 ` Bas Nieuwenhuizen
  2023-04-10  7:32   ` Sharma, Shashank
  9 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-10  0:36 UTC (permalink / raw)
  To: Shashank Sharma; +Cc: Alex Deucher, Felix Kuehling, Christian Koenig, amd-gfx

Hi Shashank,

I tried writing a program to experiment with usermode queues and I
found some weird behavior: The first run of the program works as
expected, while subsequent runs don't seem to do anything (and I
allocate everything in GTT, so it should be zero initialized
consistently). Is this a known issue?

The linked libdrm code for the uapi still does a doorbell ioctl, so it
could very well be that I do the doorbell wrong (especially since the
ioctl implementation was never shared, AFAICT?), but it seems I submit
the same way the kernel does (i.e. writing the wptr in dwords to the
wptr va and to the doorbell). Is it possible to update the test in
libdrm?

Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue

Thanks,
Bas

On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch series introduces AMDGPU usermode queues for gfx workloads.
> Usermode queues are a method of GPU workload submission into the graphics
> hardware without any interaction with kernel/DRM schedulers. In this
> method, a userspace graphics application can create its own workqueue
> and submit it directly in the GPU HW.
>
> The general idea of how this is supposed to work:
> - The application creates the following GPU objects:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
> - The application picks a 32-bit offset in the doorbell page for this queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the the GPU addresses of these objects (read
>   ptr, write ptr, queue base address and 32-bit doorbell offset from
>   the doorbell page)
> - The kernel creates the queue and maps it in the HW.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - After filling the workload data in the queue, the app must write the
>   number of dwords added in the queue into the doorbell offset, and the
>   GPU will start fetching the data.
>
> libDRM changes for this series and a sample DRM test program can be found
> in the MESA merge request here:
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>
> This patch series depends on the doorbell-manager changes, which are being
> reviewed here:
> https://patchwork.freedesktop.org/series/115802/
>
> Alex Deucher (1):
>   drm/amdgpu: UAPI for user queue management
>
> Arvind Yadav (2):
>   drm/amdgpu: add new parameters in v11_struct
>   drm/amdgpu: map wptr BO into GART
>
> Shashank Sharma (6):
>   drm/amdgpu: add usermode queue base code
>   drm/amdgpu: add new IOCTL for usermode queue
>   drm/amdgpu: create GFX-gen11 MQD for userqueue
>   drm/amdgpu: create context space for usermode queue
>   drm/amdgpu: map usermode queue into MES
>   drm/amdgpu: generate doorbell index for userqueue
>
>  drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
>  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
>  drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
>  include/uapi/drm/amdgpu_drm.h                 |  55 ++++
>  9 files changed, 677 insertions(+), 9 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> --
> 2.40.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10  0:36 ` [PATCH v3 0/9] AMDGPU Usermode queues Bas Nieuwenhuizen
@ 2023-04-10  7:32   ` Sharma, Shashank
  2023-04-10  9:25     ` Bas Nieuwenhuizen
  0 siblings, 1 reply; 56+ messages in thread
From: Sharma, Shashank @ 2023-04-10  7:32 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx

Hello Bas, 
Thanks for trying this out. 

This could be due to the doorbell, as you mentioned; the usermode queue uses the doorbell manager internally.
This week we are planning to publish the latest libDRM sample code, which uses a doorbell object (instead of the doorbell hack IOCTL); adapting to that should fix your problem, in my opinion.
We have tested this full stack (libDRM test + usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.

You can use this integrated kernel stack (1+2) from my gitlab to build your kernel: https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tree/integrated-db-and-uq-v3 
Please stay tuned for updated libDRM changes with doorbell objects.

Regards
Shashank
-----Original Message-----
From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> 
Sent: 10 April 2023 02:37
To: Sharma, Shashank <Shashank.Sharma@amd.com>
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues

Hi Shashank,

I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?

The linked libdrm code for the uapi still does a doorbell ioctl so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like the kernel submissions (i.e. write wptr in dwords to the wptr va and to the doorbell). Is it possible to update the test in libdrm?

Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue

Thanks,
Bas

On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
> This patch series introduces AMDGPU usermode queues for gfx workloads.
> Usermode queues is a method of GPU workload submission into the 
> graphics hardware without any interaction with kernel/DRM schedulers. 
> In this method, a userspace graphics application can create its own 
> workqueue and submit it directly in the GPU HW.
>
> The general idea of how this is supposed to work:
> - The application creates the following GPU objetcs:
>   - A queue object to hold the workload packets.
>   - A read pointer object.
>   - A write pointer object.
>   - A doorbell page.
> - The application picks a 32-bit offset in the doorbell page for this queue.
> - The application uses the usermode_queue_create IOCTL introduced in
>   this patch, by passing the the GPU addresses of these objects (read
>   ptr, write ptr, queue base address and 32-bit doorbell offset from
>   the doorbell page)
> - The kernel creates the queue and maps it in the HW.
> - The application can start submitting the data in the queue as soon as
>   the kernel IOCTL returns.
> - After filling the workload data in the queue, the app must write the
>   number of dwords added in the queue into the doorbell offset, and the
>   GPU will start fetching the data.
>
> libDRM changes for this series and a sample DRM test program can be 
> found in the MESA merge request here:
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>
> This patch series depends on the doorbell-manager changes, which are 
> being reviewed here:
> https://patchwork.freedesktop.org/series/115802/
>
> Alex Deucher (1):
>   drm/amdgpu: UAPI for user queue management
>
> Arvind Yadav (2):
>   drm/amdgpu: add new parameters in v11_struct
>   drm/amdgpu: map wptr BO into GART
>
> Shashank Sharma (6):
>   drm/amdgpu: add usermode queue base code
>   drm/amdgpu: add new IOCTL for usermode queue
>   drm/amdgpu: create GFX-gen11 MQD for userqueue
>   drm/amdgpu: create context space for usermode queue
>   drm/amdgpu: map usermode queue into MES
>   drm/amdgpu: generate doorbell index for userqueue
>
>  drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 
> ++++++++++++++++++  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
>  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
>  drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
>  include/uapi/drm/amdgpu_drm.h                 |  55 ++++
>  9 files changed, 677 insertions(+), 9 deletions(-)  create mode 
> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>  create mode 100644 
> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>
> --
> 2.40.0
>


* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10  7:32   ` Sharma, Shashank
@ 2023-04-10  9:25     ` Bas Nieuwenhuizen
  2023-04-10 13:40       ` Sharma, Shashank
  0 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-10  9:25 UTC (permalink / raw)
  To: Sharma, Shashank
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx

Hi Shashank,

I think I found the issue: I wasn't destroying the user queue in my
program and the kernel doesn't clean up any remaining user queues in
the postclose hook. I think we need something like
https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
?

While running things multiple times now works, I still have problems
doing multiple submissions from the same queue. Looking forward to the
updated test/sample.

Thanks,
Bas

On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
<Shashank.Sharma@amd.com> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Bas,
> Thanks for trying this out.
>
> This could be due to the doorbell as you mentioned, Usermode queue uses doorbell manager internally.
> This week, we are planning to publis the latest libDRM sample code which uses a doorbell object (instead of the doorbell hack IOCTL), adapting to that should fix your problem in my opinion.
> We have tested this full stack (libDRM test + Usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.
>
> You can use this integrated kernel stack (1+2) from my gitlab to build your kernel: https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tree/integrated-db-and-uq-v3
> Please stay tuned for updated libDRM changes with doorbell objects.
>
> Regards
> Shashank
> -----Original Message-----
> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Sent: 10 April 2023 02:37
> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>
> Hi Shashank,
>
> I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?
>
> The linked libdrm code for the uapi still does a doorbell ioctl so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like the kernel submissions (i.e. write wptr in dwords to the wptr va and to the doorbell). Is it possible to update the test in libdrm?
>
> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>
> Thanks,
> Bas
>
> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >
> > This patch series introduces AMDGPU usermode queues for gfx workloads.
> > Usermode queues is a method of GPU workload submission into the
> > graphics hardware without any interaction with kernel/DRM schedulers.
> > In this method, a userspace graphics application can create its own
> > workqueue and submit it directly in the GPU HW.
> >
> > The general idea of how this is supposed to work:
> > - The application creates the following GPU objetcs:
> >   - A queue object to hold the workload packets.
> >   - A read pointer object.
> >   - A write pointer object.
> >   - A doorbell page.
> > - The application picks a 32-bit offset in the doorbell page for this queue.
> > - The application uses the usermode_queue_create IOCTL introduced in
> >   this patch, by passing the the GPU addresses of these objects (read
> >   ptr, write ptr, queue base address and 32-bit doorbell offset from
> >   the doorbell page)
> > - The kernel creates the queue and maps it in the HW.
> > - The application can start submitting the data in the queue as soon as
> >   the kernel IOCTL returns.
> > - After filling the workload data in the queue, the app must write the
> >   number of dwords added in the queue into the doorbell offset, and the
> >   GPU will start fetching the data.
> >
> > libDRM changes for this series and a sample DRM test program can be
> > found in the MESA merge request here:
> > https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
> >
> > This patch series depends on the doorbell-manager changes, which are
> > being reviewed here:
> > https://patchwork.freedesktop.org/series/115802/
> >
> > Alex Deucher (1):
> >   drm/amdgpu: UAPI for user queue management
> >
> > Arvind Yadav (2):
> >   drm/amdgpu: add new parameters in v11_struct
> >   drm/amdgpu: map wptr BO into GART
> >
> > Shashank Sharma (6):
> >   drm/amdgpu: add usermode queue base code
> >   drm/amdgpu: add new IOCTL for usermode queue
> >   drm/amdgpu: create GFX-gen11 MQD for userqueue
> >   drm/amdgpu: create context space for usermode queue
> >   drm/amdgpu: map usermode queue into MES
> >   drm/amdgpu: generate doorbell index for userqueue
> >
> >  drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
> > ++++++++++++++++++  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
> >  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
> >  drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
> >  include/uapi/drm/amdgpu_drm.h                 |  55 ++++
> >  9 files changed, 677 insertions(+), 9 deletions(-)  create mode
> > 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >  create mode 100644
> > drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> >  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >
> > --
> > 2.40.0
> >


* RE: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10  9:25     ` Bas Nieuwenhuizen
@ 2023-04-10 13:40       ` Sharma, Shashank
  2023-04-10 13:46         ` Bas Nieuwenhuizen
  0 siblings, 1 reply; 56+ messages in thread
From: Sharma, Shashank @ 2023-04-10 13:40 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx

[AMD Official Use Only - General]

Hello Bas, 

This is not the correct interpretation of the code: the USERQ IOCTL has both OPs (create and destroy), but the user has to call it explicitly.

Please see the sample test program in the existing libDRM series (userq_test.c); it specifically calls amdgpu_free_userq, which issues the destroy OP for the IOCTL.

- Shashank

-----Original Message-----
From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> 
Sent: 10 April 2023 11:26
To: Sharma, Shashank <Shashank.Sharma@amd.com>
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues

Hi Shashank,

I think I found the issue: I wasn't destroying the user queue in my program and the kernel doesn't clean up any remaining user queues in the postclose hook. I think we need something like
https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
?

While running things multiple times now works, I still have problems doing multiple submissions from the same queue. Looking forward to the updated test/sample

Thanks,
Bas

On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank <Shashank.Sharma@amd.com> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Bas,
> Thanks for trying this out.
>
> This could be due to the doorbell as you mentioned, Usermode queue uses doorbell manager internally.
> This week, we are planning to publis the latest libDRM sample code which uses a doorbell object (instead of the doorbell hack IOCTL), adapting to that should fix your problem in my opinion.
> We have tested this full stack (libDRM test + Usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.
>
> You can use this integrated kernel stack (1+2) from my gitlab to build 
> your kernel: 
> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM 
> changes with doorbell objects.
>
> Regards
> Shashank
> -----Original Message-----
> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Sent: 10 April 2023 02:37
> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; 
> Koenig, Christian <Christian.Koenig@amd.com>
> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>
> Hi Shashank,
>
> I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?
>
> The linked libdrm code for the uapi still does a doorbell ioctl so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like the kernel submissions (i.e. write wptr in dwords to the wptr va and to the doorbell). Is it possible to update the test in libdrm?
>
> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>
> Thanks,
> Bas
>
> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >
> > This patch series introduces AMDGPU usermode queues for gfx workloads.
> > Usermode queues is a method of GPU workload submission into the 
> > graphics hardware without any interaction with kernel/DRM schedulers.
> > In this method, a userspace graphics application can create its own 
> > workqueue and submit it directly in the GPU HW.
> >
> > The general idea of how this is supposed to work:
> > - The application creates the following GPU objetcs:
> >   - A queue object to hold the workload packets.
> >   - A read pointer object.
> >   - A write pointer object.
> >   - A doorbell page.
> > - The application picks a 32-bit offset in the doorbell page for this queue.
> > - The application uses the usermode_queue_create IOCTL introduced in
> >   this patch, by passing the the GPU addresses of these objects (read
> >   ptr, write ptr, queue base address and 32-bit doorbell offset from
> >   the doorbell page)
> > - The kernel creates the queue and maps it in the HW.
> > - The application can start submitting the data in the queue as soon as
> >   the kernel IOCTL returns.
> > - After filling the workload data in the queue, the app must write the
> >   number of dwords added in the queue into the doorbell offset, and the
> >   GPU will start fetching the data.
> >
> > libDRM changes for this series and a sample DRM test program can be 
> > found in the MESA merge request here:
> > https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
> >
> > This patch series depends on the doorbell-manager changes, which are 
> > being reviewed here:
> > https://patchwork.freedesktop.org/series/115802/
> >
> > Alex Deucher (1):
> >   drm/amdgpu: UAPI for user queue management
> >
> > Arvind Yadav (2):
> >   drm/amdgpu: add new parameters in v11_struct
> >   drm/amdgpu: map wptr BO into GART
> >
> > Shashank Sharma (6):
> >   drm/amdgpu: add usermode queue base code
> >   drm/amdgpu: add new IOCTL for usermode queue
> >   drm/amdgpu: create GFX-gen11 MQD for userqueue
> >   drm/amdgpu: create context space for usermode queue
> >   drm/amdgpu: map usermode queue into MES
> >   drm/amdgpu: generate doorbell index for userqueue
> >
> >  drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
> > ++++++++++++++++++  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 
> > ++++++++++++++++++ 230 ++++++++++++++
> >  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
> >  drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
> >  include/uapi/drm/amdgpu_drm.h                 |  55 ++++
> >  9 files changed, 677 insertions(+), 9 deletions(-)  create mode
> > 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >  create mode 100644
> > drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> >  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >
> > --
> > 2.40.0
> >


* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10 13:40       ` Sharma, Shashank
@ 2023-04-10 13:46         ` Bas Nieuwenhuizen
  2023-04-10 14:01           ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-10 13:46 UTC (permalink / raw)
  To: Sharma, Shashank
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx

On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
<Shashank.Sharma@amd.com> wrote:
>
> [AMD Official Use Only - General]
>
> Hello Bas,
>
> This is not the correct interpretation of the code, the USERQ_IOCTL has both the OPs (create and destroy), but th euser has to exclusively call  it.
>
> Please see the sample test program in the existing libDRM series (userq_test.c, it specifically calls amdgpu_free_userq, which does the destroy_OP
>
> for the IOCTL.

In the presence of crashes, the kernel should always be able to clean
this up, no? Otherwise there is a resource leak?

>
> - Shashank
>
> -----Original Message-----
> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Sent: 10 April 2023 11:26
> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>
> Hi Shashank,
>
> I think I found the issue: I wasn't destroying the user queue in my program and the kernel doesn't clean up any remaining user queues in the postclose hook. I think we need something like
> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
> ?
>
> While running things multiple times now works, I still have problems doing multiple submissions from the same queue. Looking forward to the updated test/sample
>
> Thanks,
> Bas
>
> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank <Shashank.Sharma@amd.com> wrote:
> >
> > [AMD Official Use Only - General]
> >
> > Hello Bas,
> > Thanks for trying this out.
> >
> > This could be due to the doorbell as you mentioned, Usermode queue uses doorbell manager internally.
> > This week, we are planning to publis the latest libDRM sample code which uses a doorbell object (instead of the doorbell hack IOCTL), adapting to that should fix your problem in my opinion.
> > We have tested this full stack (libDRM test + Usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.
> >
> > You can use this integrated kernel stack (1+2) from my gitlab to build
> > your kernel:
> > https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
> > ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
> > changes with doorbell objects.
> >
> > Regards
> > Shashank
> > -----Original Message-----
> > From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > Sent: 10 April 2023 02:37
> > To: Sharma, Shashank <Shashank.Sharma@amd.com>
> > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>;
> > Koenig, Christian <Christian.Koenig@amd.com>
> > Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >
> > Hi Shashank,
> >
> > I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?
> >
> > The linked libdrm code for the uapi still does a doorbell ioctl so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like the kernel submissions (i.e. write wptr in dwords to the wptr va and to the doorbell). Is it possible to update the test in libdrm?
> >
> > Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
> >
> > Thanks,
> > Bas
> >
> > On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> > >
> > > This patch series introduces AMDGPU usermode queues for gfx workloads.
> > > Usermode queues is a method of GPU workload submission into the
> > > graphics hardware without any interaction with kernel/DRM schedulers.
> > > In this method, a userspace graphics application can create its own
> > > workqueue and submit it directly in the GPU HW.
> > >
> > > The general idea of how this is supposed to work:
> > > - The application creates the following GPU objetcs:
> > >   - A queue object to hold the workload packets.
> > >   - A read pointer object.
> > >   - A write pointer object.
> > >   - A doorbell page.
> > > - The application picks a 32-bit offset in the doorbell page for this queue.
> > > - The application uses the usermode_queue_create IOCTL introduced in
> > >   this patch, by passing the the GPU addresses of these objects (read
> > >   ptr, write ptr, queue base address and 32-bit doorbell offset from
> > >   the doorbell page)
> > > - The kernel creates the queue and maps it in the HW.
> > > - The application can start submitting the data in the queue as soon as
> > >   the kernel IOCTL returns.
> > > - After filling the workload data in the queue, the app must write the
> > >   number of dwords added in the queue into the doorbell offset, and the
> > >   GPU will start fetching the data.
> > >
> > > libDRM changes for this series and a sample DRM test program can be
> > > found in the MESA merge request here:
> > > https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
> > >
> > > This patch series depends on the doorbell-manager changes, which are
> > > being reviewed here:
> > > https://patchwork.freedesktop.org/series/115802/
> > >
> > > Alex Deucher (1):
> > >   drm/amdgpu: UAPI for user queue management
> > >
> > > Arvind Yadav (2):
> > >   drm/amdgpu: add new parameters in v11_struct
> > >   drm/amdgpu: map wptr BO into GART
> > >
> > > Shashank Sharma (6):
> > >   drm/amdgpu: add usermode queue base code
> > >   drm/amdgpu: add new IOCTL for usermode queue
> > >   drm/amdgpu: create GFX-gen11 MQD for userqueue
> > >   drm/amdgpu: create context space for usermode queue
> > >   drm/amdgpu: map usermode queue into MES
> > >   drm/amdgpu: generate doorbell index for userqueue
> > >
> > >  drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
> > > ++++++++++++++++++  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
> > > ++++++++++++++++++ 230 ++++++++++++++
> > >  .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
> > >  drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
> > >  include/uapi/drm/amdgpu_drm.h                 |  55 ++++
> > >  9 files changed, 677 insertions(+), 9 deletions(-)  create mode
> > > 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> > >  create mode 100644
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> > >  create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> > >
> > > --
> > > 2.40.0
> > >


* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10 13:46         ` Bas Nieuwenhuizen
@ 2023-04-10 14:01           ` Shashank Sharma
  2023-04-10 14:04             ` Bas Nieuwenhuizen
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-04-10 14:01 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx


On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
> <Shashank.Sharma@amd.com> wrote:
>> [AMD Official Use Only - General]
>>
>> Hello Bas,
>>
>> This is not the correct interpretation of the code, the USERQ_IOCTL has both the OPs (create and destroy), but th euser has to exclusively call  it.
>>
>> Please see the sample test program in the existing libDRM series (userq_test.c, it specifically calls amdgpu_free_userq, which does the destroy_OP
>>
>> for the IOCTL.
> In the presence of crashes the kernel should always be able to clean
> this up no? Otherwise there is a resource leak?

The crash handling is the same as for any of the existing GPU resources
which are allocated and freed with IOCTL OPs.

To be honest, crash handling can be very elaborate and complex, and
hence IMO can only be done at driver unload, which doesn't help at that
stage, because the driver will re-allocate the resources on the next
load anyway.

- Shashank

>
>> - Shashank
>>
>> -----Original Message-----
>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>> Sent: 10 April 2023 11:26
>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>
>> Hi Shashank,
>>
>> I think I found the issue: I wasn't destroying the user queue in my program and the kernel doesn't clean up any remaining user queues in the postclose hook. I think we need something like
>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>> ?
>>
>> While running things multiple times now works, I still have problems doing multiple submissions from the same queue. Looking forward to the updated test/sample
>>
>> Thanks,
>> Bas
>>
>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank <Shashank.Sharma@amd.com> wrote:
>>> [AMD Official Use Only - General]
>>>
>>> Hello Bas,
>>> Thanks for trying this out.
>>>
>>> This could be due to the doorbell as you mentioned, Usermode queue uses doorbell manager internally.
>>> This week, we are planning to publis the latest libDRM sample code which uses a doorbell object (instead of the doorbell hack IOCTL), adapting to that should fix your problem in my opinion.
>>> We have tested this full stack (libDRM test + Usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.
>>>
>>> You can use this integrated kernel stack (1+2) from my gitlab to build
>>> your kernel:
>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>> changes with doorbell objects.
>>>
>>> Regards
>>> Shashank
>>> -----Original Message-----
>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>> Sent: 10 April 2023 02:37
>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>> <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>;
>>> Koenig, Christian <Christian.Koenig@amd.com>
>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>
>>> Hi Shashank,
>>>
>>> I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?
>>>
>>> The linked libdrm code for the uapi still does a doorbell ioctl so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like the kernel submissions (i.e. write wptr in dwords to the wptr va and to the doorbell). Is it possible to update the test in libdrm?
>>>
>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>
>>> Thanks,
>>> Bas
>>>
>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>> This patch series introduces AMDGPU usermode queues for gfx workloads.
>>>> Usermode queues is a method of GPU workload submission into the
>>>> graphics hardware without any interaction with kernel/DRM schedulers.
>>>> In this method, a userspace graphics application can create its own
>>>> workqueue and submit it directly in the GPU HW.
>>>>
>>>> The general idea of how this is supposed to work:
>>>> - The application creates the following GPU objetcs:
>>>>    - A queue object to hold the workload packets.
>>>>    - A read pointer object.
>>>>    - A write pointer object.
>>>>    - A doorbell page.
>>>> - The application picks a 32-bit offset in the doorbell page for this queue.
>>>> - The application uses the usermode_queue_create IOCTL introduced in
>>>>    this patch, by passing the the GPU addresses of these objects (read
>>>>    ptr, write ptr, queue base address and 32-bit doorbell offset from
>>>>    the doorbell page)
>>>> - The kernel creates the queue and maps it in the HW.
>>>> - The application can start submitting the data in the queue as soon as
>>>>    the kernel IOCTL returns.
>>>> - After filling the workload data in the queue, the app must write the
>>>>    number of dwords added in the queue into the doorbell offset, and the
>>>>    GPU will start fetching the data.
>>>>
>>>> libDRM changes for this series and a sample DRM test program can be
>>>> found in the MESA merge request here:
>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>
>>>> This patch series depends on the doorbell-manager changes, which are
>>>> being reviewed here:
>>>> https://patchwork.freedesktop.org/series/115802/
>>>>
>>>> Alex Deucher (1):
>>>>    drm/amdgpu: UAPI for user queue management
>>>>
>>>> Arvind Yadav (2):
>>>>    drm/amdgpu: add new parameters in v11_struct
>>>>    drm/amdgpu: map wptr BO into GART
>>>>
>>>> Shashank Sharma (6):
>>>>    drm/amdgpu: add usermode queue base code
>>>>    drm/amdgpu: add new IOCTL for usermode queue
>>>>    drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>    drm/amdgpu: create context space for usermode queue
>>>>    drm/amdgpu: map usermode queue into MES
>>>>    drm/amdgpu: generate doorbell index for userqueue
>>>>
>>>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
>>>> ++++++++++++++++++  .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
>>>> ++++++++++++++++++ 230 ++++++++++++++
>>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
>>>>   drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
>>>>   include/uapi/drm/amdgpu_drm.h                 |  55 ++++
>>>>   9 files changed, 677 insertions(+), 9 deletions(-)  create mode
>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>   create mode 100644
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>
>>>> --
>>>> 2.40.0
>>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10 14:01           ` Shashank Sharma
@ 2023-04-10 14:04             ` Bas Nieuwenhuizen
  2023-04-10 14:26               ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-10 14:04 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx

On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>
>
> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
> > On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
> > <Shashank.Sharma@amd.com> wrote:
> >> [AMD Official Use Only - General]
> >>
> >> Hello Bas,
> >>
> >> This is not the correct interpretation of the code: the USERQ IOCTL has both OPs (create and destroy), but the user has to explicitly call it.
> >>
> >> Please see the sample test program in the existing libDRM series (userq_test.c); it specifically calls amdgpu_free_userq, which issues the destroy OP for the IOCTL.
> > In the presence of crashes the kernel should always be able to clean
> > this up no? Otherwise there is a resource leak?
>
> The crash handling is the same as any of the existing GPU resource which
> are allocated and freed with IOCTL_OPs.

Most of those are handled when the DRM fd gets closed (i.e.
when the process exits):

- buffers through drm_gem_release()
- mappings in amdgpu_vm_fini
- contexts in amdgpu_ctx_mgr_fini

etc.

Why would we do things differently for userspace queues? It doesn't
look complicated looking at the above patch (which does seem to work).


>
> To be honest, crash handling can be very elaborate and complex, and
> hence IMO can only be done at driver unload, which doesn't help at
> that stage,
>
> because the driver will re-allocate the resources on the next load anyway.
>
> - Shashank
>
> >
> >> - Shashank
> >>
> >> -----Original Message-----
> >> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> >> Sent: 10 April 2023 11:26
> >> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> >> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
> >> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >>
> >> Hi Shashank,
> >>
> >> I think I found the issue: I wasn't destroying the user queue in my program and the kernel doesn't clean up any remaining user queues in the postclose hook. I think we need something like
> >> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
> >> ?
> >>
> >> While running things multiple times now works, I still have problems doing multiple submissions from the same queue. Looking forward to the updated test/sample
> >>
> >> Thanks,
> >> Bas
> >>
> >> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank <Shashank.Sharma@amd.com> wrote:
> >>> [AMD Official Use Only - General]
> >>>
> >>> Hello Bas,
> >>> Thanks for trying this out.
> >>>
> >>> This could be due to the doorbell as you mentioned, Usermode queue uses doorbell manager internally.
> >>> This week, we are planning to publish the latest libDRM sample code which uses a doorbell object (instead of the doorbell hack IOCTL); adapting to that should fix your problem, in my opinion.
> >>> We have tested this full stack (libDRM test + Usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.
> >>>
> >>> You can use this integrated kernel stack (1+2) from my gitlab to build
> >>> your kernel:
> >>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tree/integrated-db-and-uq-v3
> >>> Please stay tuned for updated libDRM changes with doorbell objects.
> >>>
> >>> Regards
> >>> Shashank
> >>> -----Original Message-----
> >>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> >>> Sent: 10 April 2023 02:37
> >>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> >>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> >>> <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>;
> >>> Koenig, Christian <Christian.Koenig@amd.com>
> >>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >>>
> >>> Hi Shashank,
> >>>
> >>> I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?
> >>>
> >>> The linked libdrm code for the uapi still does a doorbell ioctl, so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like I follow the kernel submission scheme (i.e. write the wptr in dwords to the wptr VA and to the doorbell). Is it possible to update the test in libdrm?
> >>>
> >>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
> >>>
> >>> Thanks,
> >>> Bas
> >>>
> >>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
> >>>> This patch series introduces AMDGPU usermode queues for gfx workloads.
> >>>> Usermode queues is a method of GPU workload submission into the
> >>>> graphics hardware without any interaction with kernel/DRM schedulers.
> >>>> In this method, a userspace graphics application can create its own
> >>>> workqueue and submit it directly in the GPU HW.
> >>>>
> >>>> The general idea of how this is supposed to work:
> >>>> - The application creates the following GPU objects:
> >>>>    - A queue object to hold the workload packets.
> >>>>    - A read pointer object.
> >>>>    - A write pointer object.
> >>>>    - A doorbell page.
> >>>> - The application picks a 32-bit offset in the doorbell page for this queue.
> >>>> - The application uses the usermode_queue_create IOCTL introduced in
> >>>>    this patch, by passing the GPU addresses of these objects (read
> >>>>    ptr, write ptr, queue base address and 32-bit doorbell offset from
> >>>>    the doorbell page)
> >>>> - The kernel creates the queue and maps it in the HW.
> >>>> - The application can start submitting the data in the queue as soon as
> >>>>    the kernel IOCTL returns.
> >>>> - After filling the workload data in the queue, the app must write the
> >>>>    number of dwords added in the queue into the doorbell offset, and the
> >>>>    GPU will start fetching the data.
> >>>>
> >>>> libDRM changes for this series and a sample DRM test program can be
> >>>> found in the MESA merge request here:
> >>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
> >>>>
> >>>> This patch series depends on the doorbell-manager changes, which are
> >>>> being reviewed here:
> >>>> https://patchwork.freedesktop.org/series/115802/
> >>>>
> >>>> Alex Deucher (1):
> >>>>    drm/amdgpu: UAPI for user queue management
> >>>>
> >>>> Arvind Yadav (2):
> >>>>    drm/amdgpu: add new parameters in v11_struct
> >>>>    drm/amdgpu: map wptr BO into GART
> >>>>
> >>>> Shashank Sharma (6):
> >>>>    drm/amdgpu: add usermode queue base code
> >>>>    drm/amdgpu: add new IOCTL for usermode queue
> >>>>    drm/amdgpu: create GFX-gen11 MQD for userqueue
> >>>>    drm/amdgpu: create context space for usermode queue
> >>>>    drm/amdgpu: map usermode queue into MES
> >>>>    drm/amdgpu: generate doorbell index for userqueue
> >>>>
> >>>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
> >>>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
> >>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
> >>>>   drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
> >>>>   include/uapi/drm/amdgpu_drm.h                 |  55 ++++
> >>>>   9 files changed, 677 insertions(+), 9 deletions(-)
> >>>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> >>>>   create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>
> >>>> --
> >>>> 2.40.0
> >>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10 14:04             ` Bas Nieuwenhuizen
@ 2023-04-10 14:26               ` Shashank Sharma
  2023-04-11  9:37                 ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-04-10 14:26 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, Yadav, Arvind, Koenig,
	Christian, amd-gfx


On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>
>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>> <Shashank.Sharma@amd.com> wrote:
>>>> [AMD Official Use Only - General]
>>>>
>>>> Hello Bas,
>>>>
>>>> This is not the correct interpretation of the code: the USERQ IOCTL has both OPs (create and destroy), but the user has to explicitly call it.
>>>>
>>>> Please see the sample test program in the existing libDRM series (userq_test.c); it specifically calls amdgpu_free_userq, which issues the destroy OP for the IOCTL.
>>> In the presence of crashes the kernel should always be able to clean
>>> this up no? Otherwise there is a resource leak?
>> The crash handling is the same as any of the existing GPU resource which
>> are allocated and freed with IOCTL_OPs.
> Most of those are handled when the DRM fd gets closed (i.e.
> when the process exits):
>
> - buffers through drm_gem_release()
> - mappings in amdgpu_vm_fini
> - contexts in amdgpu_ctx_mgr_fini
>
> etc.
>
> Why would we do things differently for userspace queues? It doesn't
> look complicated looking at the above patch (which does seem to work).

As the code is in an initial stage, I have not given much thought to
handling resource leaks due to an app crash, but this seems like a good
suggestion.

I am taking a note and will try to accommodate this in an upcoming 
version of the series.

- Shashank

>> To be honest, crash handling can be very elaborate and complex, and
>> hence IMO can only be done at driver unload, which doesn't help at
>> that stage,
>>
>> because the driver will re-allocate the resources on the next load anyway.
>>
>> - Shashank
>>
>>>> - Shashank
>>>>
>>>> -----Original Message-----
>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>> Sent: 10 April 2023 11:26
>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>
>>>> Hi Shashank,
>>>>
>>>> I think I found the issue: I wasn't destroying the user queue in my program and the kernel doesn't clean up any remaining user queues in the postclose hook. I think we need something like
>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>>>> ?
>>>>
>>>> While running things multiple times now works, I still have problems doing multiple submissions from the same queue. Looking forward to the updated test/sample
>>>>
>>>> Thanks,
>>>> Bas
>>>>
>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank <Shashank.Sharma@amd.com> wrote:
>>>>> [AMD Official Use Only - General]
>>>>>
>>>>> Hello Bas,
>>>>> Thanks for trying this out.
>>>>>
>>>>> This could be due to the doorbell as you mentioned, Usermode queue uses doorbell manager internally.
>>>>> This week, we are planning to publish the latest libDRM sample code which uses a doorbell object (instead of the doorbell hack IOCTL); adapting to that should fix your problem, in my opinion.
>>>>> We have tested this full stack (libDRM test + Usermode queue + doorbell manager) for 500+ consecutive runs, and it worked well for us.
>>>>>
>>>>> You can use this integrated kernel stack (1+2) from my gitlab to build
>>>>> your kernel:
>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tree/integrated-db-and-uq-v3
>>>>> Please stay tuned for updated libDRM changes with doorbell objects.
>>>>>
>>>>> Regards
>>>>> Shashank
>>>>> -----Original Message-----
>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>> Sent: 10 April 2023 02:37
>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>;
>>>>> Koenig, Christian <Christian.Koenig@amd.com>
>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>
>>>>> Hi Shashank,
>>>>>
>>>>> I tried writing a program to experiment with usermode queues and I found some weird behavior: The first run of the program works as expected, while subsequent runs don't seem to do anything (and I allocate everything in GTT, so it should be zero initialized consistently). Is this a known issue?
>>>>>
>>>>> The linked libdrm code for the uapi still does a doorbell ioctl, so it could very well be that I do the doorbell wrong (especially since the ioctl implementation was never shared AFAICT?), but it seems like I follow the kernel submission scheme (i.e. write the wptr in dwords to the wptr VA and to the doorbell). Is it possible to update the test in libdrm?
>>>>>
>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>
>>>>> Thanks,
>>>>> Bas
>>>>>
>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>>>>>> This patch series introduces AMDGPU usermode queues for gfx workloads.
>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>> graphics hardware without any interaction with kernel/DRM schedulers.
>>>>>> In this method, a userspace graphics application can create its own
>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>
>>>>>> The general idea of how this is supposed to work:
>>>>>> - The application creates the following GPU objects:
>>>>>>     - A queue object to hold the workload packets.
>>>>>>     - A read pointer object.
>>>>>>     - A write pointer object.
>>>>>>     - A doorbell page.
>>>>>> - The application picks a 32-bit offset in the doorbell page for this queue.
>>>>>> - The application uses the usermode_queue_create IOCTL introduced in
>>>>>>     this patch, by passing the GPU addresses of these objects (read
>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell offset from
>>>>>>     the doorbell page)
>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>> - The application can start submitting the data in the queue as soon as
>>>>>>     the kernel IOCTL returns.
>>>>>> - After filling the workload data in the queue, the app must write the
>>>>>>     number of dwords added in the queue into the doorbell offset, and the
>>>>>>     GPU will start fetching the data.
>>>>>>
>>>>>> libDRM changes for this series and a sample DRM test program can be
>>>>>> found in the MESA merge request here:
>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>
>>>>>> This patch series depends on the doorbell-manager changes, which are
>>>>>> being reviewed here:
>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>
>>>>>> Alex Deucher (1):
>>>>>>     drm/amdgpu: UAPI for user queue management
>>>>>>
>>>>>> Arvind Yadav (2):
>>>>>>     drm/amdgpu: add new parameters in v11_struct
>>>>>>     drm/amdgpu: map wptr BO into GART
>>>>>>
>>>>>> Shashank Sharma (6):
>>>>>>     drm/amdgpu: add usermode queue base code
>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>     drm/amdgpu: create context space for usermode queue
>>>>>>     drm/amdgpu: map usermode queue into MES
>>>>>>     drm/amdgpu: generate doorbell index for userqueue
>>>>>>
>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
>>>>>>    .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
>>>>>>    include/uapi/drm/amdgpu_drm.h                 |  55 ++++
>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-)
>>>>>>    create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>    create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>    create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>
>>>>>> --
>>>>>> 2.40.0
>>>>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue
  2023-04-10  0:02   ` Bas Nieuwenhuizen
@ 2023-04-10 14:28     ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-04-10 14:28 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Alex Deucher, Felix Kuehling, Shashank Sharma, Christian Koenig, amd-gfx


On 10/04/2023 02:02, Bas Nieuwenhuizen wrote:
> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Shashank Sharma <contactshashanksharma@gmail.com>
>>
>> This patch adds:
>> - A new IOCTL function to create and destroy
>> - A new structure to keep all the user queue data in one place.
>> - A function to generate unique index for the queue.
>>
>> V1: Worked on review comments from RFC patch series:
>>    - Alex: Keep a list of queues, instead of single queue per process.
>>    - Christian: Use the queue manager instead of global ptrs,
>>             Don't keep the queue structure in amdgpu_ctx
>>
>> V2: Worked on review comments:
>>   - Christian:
>>     - Formatting of text
>>     - There is no need for queuing of userqueues, with idr in place
>>   - Alex:
>>     - Remove use_doorbell, its unnecessary
>>     - Reuse amdgpu_mqd_props for saving mqd fields
>>
>>   - Code formatting and re-arrangement
>>
>> V3:
>>   - Integration with doorbell manager
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <contactshashanksharma@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 113 ++++++++++++++++++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
>>   3 files changed, 116 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 2d6bcfd727c8..229976a2d0e7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2749,6 +2749,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>>          DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>          DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>          DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>> +       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>   };
>>
>>   static const struct drm_driver amdgpu_kms_driver = {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 13e1eebc1cb6..353f57c5a772 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -22,6 +22,119 @@
>>    */
>>
>>   #include "amdgpu.h"
>> +#include "amdgpu_vm.h"
>> +#include "amdgpu_userqueue.h"
>> +
>> +static inline int
>> +amdgpu_userqueue_index(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_usermode_queue *queue)
>> +{
>> +    return idr_alloc(&uq_mgr->userq_idr, queue, 1, AMDGPU_MAX_USERQ, GFP_KERNEL);
>> +}
>> +
>> +static inline void
>> +amdgpu_userqueue_free_index(struct amdgpu_userq_mgr *uq_mgr, int queue_id)
>> +{
>> +    idr_remove(&uq_mgr->userq_idr, queue_id);
>> +}
>> +
>> +static struct amdgpu_usermode_queue *
>> +amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>> +{
>> +    return idr_find(&uq_mgr->userq_idr, qid);
>> +}
>> +
>> +static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>> +{
>> +    struct amdgpu_usermode_queue *queue;
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>> +    struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
>> +    int r;
>> +
>> +    /* Do we have support userqueues for this IP ? */
>> +    if (!uq_mgr->userq_funcs[mqd_in->ip_type]) {
>> +        DRM_ERROR("GFX User queues not supported for this IP: %d\n", mqd_in->ip_type);
>> +        return -EINVAL;
>> +    }
>> +
>> +    queue = kzalloc(sizeof(struct amdgpu_usermode_queue), GFP_KERNEL);
>> +    if (!queue) {
>> +        DRM_ERROR("Failed to allocate memory for queue\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +    queue->userq_prop.wptr_gpu_addr = mqd_in->wptr_va;
>> +    queue->userq_prop.rptr_gpu_addr = mqd_in->rptr_va;
>> +    queue->userq_prop.queue_size = mqd_in->queue_size;
>> +    queue->userq_prop.hqd_base_gpu_addr = mqd_in->queue_va;
>> +    queue->userq_prop.queue_size = mqd_in->queue_size;
> This sets queue_size twice.

Noted,

- Shashank

>
>> +
>> +    queue->doorbell_handle = mqd_in->doorbell_handle;
>> +    queue->queue_type = mqd_in->ip_type;
>> +    queue->flags = mqd_in->flags;
>> +    queue->vm = &fpriv->vm;
>> +    queue->queue_id = amdgpu_userqueue_index(uq_mgr, queue);
>> +    if (queue->queue_id < 0) {
>> +        DRM_ERROR("Failed to allocate a queue id\n");
>> +        r = queue->queue_id;
>> +        goto free_queue;
>> +    }
>> +
>> +    args->out.queue_id = queue->queue_id;
>> +    args->out.flags = 0;
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    return 0;
>> +
>> +free_queue:
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    kfree(queue);
>> +    return r;
>> +}
>> +
>> +static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>> +{
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>> +    struct amdgpu_usermode_queue *queue;
>> +
>> +    queue = amdgpu_userqueue_find(uq_mgr, queue_id);
>> +    if (!queue) {
>> +        DRM_DEBUG_DRIVER("Invalid queue id to destroy\n");
>> +        return;
>> +    }
>> +
>> +    mutex_lock(&uq_mgr->userq_mutex);
>> +    amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>> +    mutex_unlock(&uq_mgr->userq_mutex);
>> +    kfree(queue);
>> +}
>> +
>> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data,
>> +                      struct drm_file *filp)
>> +{
>> +    union drm_amdgpu_userq *args = data;
>> +    int r = 0;
>> +
>> +    switch (args->in.op) {
>> +    case AMDGPU_USERQ_OP_CREATE:
>> +        r = amdgpu_userqueue_create(filp, args);
>> +        if (r)
>> +            DRM_ERROR("Failed to create usermode queue\n");
>> +        break;
>> +
>> +    case AMDGPU_USERQ_OP_FREE:
>> +        amdgpu_userqueue_destroy(filp, args->in.queue_id);
>> +        break;
>> +
>> +    default:
>> +        DRM_ERROR("Invalid user queue op specified: %d\n", args->in.op);
>> +        return -EINVAL;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>>
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev)
>>   {
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 7eeb8c9e6575..7625a862b1fc 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -42,6 +42,8 @@ struct amdgpu_userq_funcs {
>>          void (*mqd_destroy)(struct amdgpu_userq_mgr *, struct amdgpu_usermode_queue *);
>>   };
>>
>> +int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
>> +
>>   int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev);
>>
>>   void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr);
>> --
>> 2.40.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART
  2023-04-10  0:00   ` Bas Nieuwenhuizen
@ 2023-04-11  9:29     ` Christian König
  2023-04-11 16:02       ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2023-04-11  9:29 UTC (permalink / raw)
  To: Bas Nieuwenhuizen, Shashank Sharma
  Cc: Alex Deucher, Felix Kuehling, Arvind Yadav, amd-gfx

Am 10.04.23 um 02:00 schrieb Bas Nieuwenhuizen:
> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma <shashank.sharma@amd.com> wrote:
>> From: Arvind Yadav <arvind.yadav@amd.com>
>>
>> To support oversubscription, MES expects WPTR BOs to be mapped
>> to GART, before they are submitted to usermode queues.
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 89 +++++++++++++++++++
>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |  1 +
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>   3 files changed, 91 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 5672efcbcffc..7409a4ae55da 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -43,6 +43,89 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr *uq_mgr, int qid)
>>       return idr_find(&uq_mgr->userq_idr, qid);
>>   }
>>
>> +static int
>> +amdgpu_userqueue_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct amdgpu_bo *bo)
>> +{
>> +    int ret;
>> +
>> +    ret = amdgpu_bo_reserve(bo, true);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>> +        goto err_reserve_bo_failed;
>> +    }
>> +
>> +    ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to pin bo. ret %d\n", ret);
>> +        goto err_pin_bo_failed;
>> +    }
>> +
>> +    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>> +        goto err_map_bo_gart_failed;
>> +    }
>> +
>> +
>> +    amdgpu_bo_unreserve(bo);
>> +    bo = amdgpu_bo_ref(bo);
>> +
>> +    return 0;
>> +
>> +err_map_bo_gart_failed:
>> +    amdgpu_bo_unpin(bo);
>> +err_pin_bo_failed:
>> +    amdgpu_bo_unreserve(bo);
>> +err_reserve_bo_failed:
>> +
>> +    return ret;
>> +}
>> +
>> +
>> +static int
>> +amdgpu_userqueue_create_wptr_mapping(struct amdgpu_device *adev,
>> +                                    struct drm_file *filp,
>> +                                    struct amdgpu_usermode_queue *queue)
>> +{
>> +    struct amdgpu_bo_va_mapping *wptr_mapping;
>> +    struct amdgpu_vm *wptr_vm;
>> +    struct amdgpu_bo *wptr_bo = NULL;
>> +    uint64_t wptr = queue->userq_prop.wptr_gpu_addr;
>> +    int ret;
>> +
>> +    wptr_vm = queue->vm;
>> +    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>> +    if (ret)
>> +        goto err_wptr_map_gart;
>> +
>> +    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> PAGE_SHIFT);
>> +    amdgpu_bo_unreserve(wptr_vm->root.bo);
>> +    if (!wptr_mapping) {
>> +        DRM_ERROR("Failed to lookup wptr bo\n");
>> +        ret = -EINVAL;
>> +        goto err_wptr_map_gart;
>> +    }
> This triggers for wptr BOs mapped to the high half of the address space;
> it may need some mangling w.r.t. the top bits?

Yeah, correct. Shashank, this needs to apply the hole mask before looking
up the address.

Regards,
Christian.
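As a rough illustration of the hole-mask point (a sketch only: the mask value mirrors AMDGPU_GMC_HOLE_MASK from amdgpu_gmc.h, and the helper name is invented): GPU VAs in the high half of the address space carry sign-extended upper bits, so those bits must be stripped before deriving the page number that the mapping lookup in the patch expects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: canonicalize a possibly sign-extended GPU VA
 * before converting it to a page number for the mapping lookup.
 * GMC_HOLE_MASK mirrors AMDGPU_GMC_HOLE_MASK; PAGE_SHIFT_4K assumes
 * the common 4K page size. */
#define GMC_HOLE_MASK 0x0000ffffffffffffULL
#define PAGE_SHIFT_4K 12

static inline uint64_t wptr_lookup_pfn(uint64_t wptr_va)
{
    /* strip the sign-extension bits, then convert to a page number */
    return (wptr_va & GMC_HOLE_MASK) >> PAGE_SHIFT_4K;
}
```

With this, a high-half VA such as 0xffff800000001000 resolves to the same page number as its canonical form, instead of the lookup failing as Bas observed.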

>
>> +
>> +    wptr_bo = wptr_mapping->bo_va->base.bo;
>> +    if (wptr_bo->tbo.base.size > PAGE_SIZE) {
>> +        DRM_ERROR("Requested GART mapping for wptr bo larger than one page\n");
>> +        ret = -EINVAL;
>> +        goto err_wptr_map_gart;
>> +    }
>> +
>> +    ret = amdgpu_userqueue_map_gtt_bo_to_gart(adev, wptr_bo);
>> +    if (ret) {
>> +        DRM_ERROR("Failed to map wptr bo to GART\n");
>> +        goto err_wptr_map_gart;
>> +    }
>> +
>> +    queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
>> +    return 0;
>> +
>> +err_wptr_map_gart:
>> +    return ret;
>> +}
>> +
>>   static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
>>   {
>>       struct amdgpu_usermode_queue *queue;
>> @@ -82,6 +165,12 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>>           goto free_queue;
>>       }
>>
>> +    r = amdgpu_userqueue_create_wptr_mapping(uq_mgr->adev, filp, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to map WPTR (0x%llx) for userqueue\n", queue->userq_prop.wptr_gpu_addr);
>> +        goto free_queue;
>> +    }
>> +
>>       r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, queue);
>>       if (r) {
>>           DRM_ERROR("Failed to create/map userqueue MQD\n");
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> index 1627641a4a4e..274e78826334 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>> @@ -58,6 +58,7 @@ amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr *uq_mgr,
>>       queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>       queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>       queue_input.page_table_base_addr = amdgpu_gmc_pd_addr(queue->vm->root.bo);
>> +    queue_input.wptr_mc_addr = queue->wptr_mc_addr;
>>
>>       amdgpu_mes_lock(&adev->mes);
>>       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index 8b62ef77cd26..eaab7cf5fff6 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -38,6 +38,7 @@ struct amdgpu_usermode_queue {
>>          int queue_type;
>>          uint64_t flags;
>>          uint64_t doorbell_handle;
>> +       uint64_t wptr_mc_addr;
>>          uint64_t proc_ctx_gpu_addr;
>>          uint64_t gang_ctx_gpu_addr;
>>          uint64_t gds_ctx_gpu_addr;
>> --
>> 2.40.0
>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-10 14:26               ` Shashank Sharma
@ 2023-04-11  9:37                 ` Christian König
  2023-04-11  9:48                   ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2023-04-11  9:37 UTC (permalink / raw)
  To: Shashank Sharma, Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx, Koenig, Christian,
	Yadav, Arvind

Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>
> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma 
>> <shashank.sharma@amd.com> wrote:
>>>
>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>> <Shashank.Sharma@amd.com> wrote:
>>>>> [AMD Official Use Only - General]
>>>>>
>>>>> Hello Bas,
>>>>>
>>>>> This is not the correct interpretation of the code, the 
>>>>> USERQ_IOCTL has both the OPs (create and destroy), but the user 
>>>>> has to explicitly call it.
>>>>>
>>>>> Please see the sample test program in the existing libDRM series 
>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which does 
>>>>> the destroy_OP
>>>>>
>>>>> for the IOCTL.
>>>> In the presence of crashes the kernel should always be able to clean
>>>> this up no? Otherwise there is a resource leak?
>>> The crash handling is the same as any of the existing GPU resource 
>>> which
>>> are allocated and freed with IOCTL_OPs.
>> Most of those are handled in the when the DRM fd gets closed (i.e.
>> when the process exits):
>>
>> - buffers through drm_gem_release()
>> - mappings in amdgpu_vm_fini
>> - contexts in amdgpu_ctx_mgr_fini
>>
>> etc.
>>
>> Why would we do things differently for userspace queues? It doesn't
>> look complicated looking at the above patch (which does seem to work).
>
> As the code is in initial stage, I have not given much thoughts about 
> handling resource leak due to app crash, but this seems like a good 
> suggestion.
>
> I am taking a note and will try to accommodate this in an upcoming 
> version of the series.

Bas is right, the application doesn't necessarily need to clean up on 
exit (but it's still good practice to do so).

See amdgpu_driver_postclose_kms() for how we cleanup (for example) the 
ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.

Regards,
Christian.

>
> - Shashank
>
>>> To be honest a crash handling can be very elaborated and complex one,
>>> and hence only can be done at the driver unload IMO, which doesn't help
>>> at that stage,
>>>
>>> coz anyways driver will re-allocate the resources on next load.
>>>
>>> - Shashank
>>>
>>>>> - Shashank
>>>>>
>>>>> -----Original Message-----
>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>> Sent: 10 April 2023 11:26
>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>> <Felix.Kuehling@amd.com>; Koenig, Christian 
>>>>> <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>
>>>>> Hi Shashank,
>>>>>
>>>>> I think I found the issue: I wasn't destroying the user queue in 
>>>>> my program and the kernel doesn't clean up any remaining user 
>>>>> queues in the postclose hook. I think we need something like
>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29 
>>>>>
>>>>> ?
>>>>>
>>>>> While running things multiple times now works, I still have 
>>>>> problems doing multiple submissions from the same queue. Looking 
>>>>> forward to the updated test/sample
>>>>>
>>>>> Thanks,
>>>>> Bas
>>>>>
>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank 
>>>>> <Shashank.Sharma@amd.com> wrote:
>>>>>> [AMD Official Use Only - General]
>>>>>>
>>>>>> Hello Bas,
>>>>>> Thanks for trying this out.
>>>>>>
>>>>>> This could be due to the doorbell as you mentioned, Usermode 
>>>>>> queue uses doorbell manager internally.
>>>>>> This week, we are planning to publis the latest libDRM sample 
>>>>>> code which uses a doorbell object (instead of the doorbell hack 
>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>> We have tested this full stack (libDRM test + Usermode queue + 
>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well 
>>>>>> for us.
>>>>>>
>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to 
>>>>>> build
>>>>>> your kernel:
>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr 
>>>>>>
>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>> changes with doorbell objects.
>>>>>>
>>>>>> Regards
>>>>>> Shashank
>>>>>> -----Original Message-----
>>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>>> Sent: 10 April 2023 02:37
>>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>> <Felix.Kuehling@amd.com>;
>>>>>> Koenig, Christian <Christian.Koenig@amd.com>
>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>
>>>>>> Hi Shashank,
>>>>>>
>>>>>> I tried writing a program to experiment with usermode queues and 
>>>>>> I found some weird behavior: The first run of the program works 
>>>>>> as expected, while subsequent runs don't seem to do anything (and 
>>>>>> I allocate everything in GTT, so it should be zero initialized 
>>>>>> consistently). Is this a known issue?
>>>>>>
>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl 
>>>>>> so it could very well be that I do the doorbell wrong (especially 
>>>>>> since the ioctl implementation was never shared AFAICT?), but it 
>>>>>> seems like the kernel submissions (i.e. write wptr in dwords to 
>>>>>> the wptr va and to the doorbell). Is it possible to update the 
>>>>>> test in libdrm?
>>>>>>
>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>
>>>>>> Thanks,
>>>>>> Bas
>>>>>>
>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma 
>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>> This patch series introduces AMDGPU usermode queues for gfx 
>>>>>>> workloads.
>>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>>> graphics hardware without any interaction with kernel/DRM 
>>>>>>> schedulers.
>>>>>>> In this method, a userspace graphics application can create its own
>>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>>
>>>>>>> The general idea of how this is supposed to work:
>>>>>>> - The application creates the following GPU objetcs:
>>>>>>>     - A queue object to hold the workload packets.
>>>>>>>     - A read pointer object.
>>>>>>>     - A write pointer object.
>>>>>>>     - A doorbell page.
>>>>>>> - The application picks a 32-bit offset in the doorbell page for 
>>>>>>> this queue.
>>>>>>> - The application uses the usermode_queue_create IOCTL 
>>>>>>> introduced in
>>>>>>>     this patch, by passing the the GPU addresses of these 
>>>>>>> objects (read
>>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell 
>>>>>>> offset from
>>>>>>>     the doorbell page)
>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>> - The application can start submitting the data in the queue as 
>>>>>>> soon as
>>>>>>>     the kernel IOCTL returns.
>>>>>>> - After filling the workload data in the queue, the app must 
>>>>>>> write the
>>>>>>>     number of dwords added in the queue into the doorbell 
>>>>>>> offset, and the
>>>>>>>     GPU will start fetching the data.
>>>>>>>
>>>>>>> libDRM changes for this series and a sample DRM test program can be
>>>>>>> found in the MESA merge request here:
>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>
>>>>>>> This patch series depends on the doorbell-manager changes, which 
>>>>>>> are
>>>>>>> being reviewed here:
>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>
>>>>>>> Alex Deucher (1):
>>>>>>>     drm/amdgpu: UAPI for user queue management
>>>>>>>
>>>>>>> Arvind Yadav (2):
>>>>>>>     drm/amdgpu: add new parameters in v11_struct
>>>>>>>     drm/amdgpu: map wptr BO into GART
>>>>>>>
>>>>>>> Shashank Sharma (6):
>>>>>>>     drm/amdgpu: add usermode queue base code
>>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>     drm/amdgpu: create context space for usermode queue
>>>>>>>     drm/amdgpu: map usermode queue into MES
>>>>>>>     drm/amdgpu: generate doorbell index for userqueue
>>>>>>>
>>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
>>>>>>>    .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++++++
>>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
>>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
>>>>>>>    include/uapi/drm/amdgpu_drm.h                 |  55 ++++
>>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>    create mode 100644
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>    create mode 100644 
>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>
>>>>>>> -- 
>>>>>>> 2.40.0
>>>>>>>



* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-11  9:37                 ` Christian König
@ 2023-04-11  9:48                   ` Shashank Sharma
  2023-04-11 10:00                     ` Bas Nieuwenhuizen
  0 siblings, 1 reply; 56+ messages in thread
From: Shashank Sharma @ 2023-04-11  9:48 UTC (permalink / raw)
  To: Christian König, Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx, Yadav, Arvind


On 11/04/2023 11:37, Christian König wrote:
> Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>>
>> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma 
>>> <shashank.sharma@amd.com> wrote:
>>>>
>>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>>> <Shashank.Sharma@amd.com> wrote:
>>>>>> [AMD Official Use Only - General]
>>>>>>
>>>>>> Hello Bas,
>>>>>>
>>>>>> This is not the correct interpretation of the code, the 
>>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser 
>>>>>> has to exclusively call  it.
>>>>>>
>>>>>> Please see the sample test program in the existing libDRM series 
>>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which 
>>>>>> does the destroy_OP
>>>>>>
>>>>>> for the IOCTL.
>>>>> In the presence of crashes the kernel should always be able to clean
>>>>> this up no? Otherwise there is a resource leak?
>>>> The crash handling is the same as any of the existing GPU resource 
>>>> which
>>>> are allocated and freed with IOCTL_OPs.
>>> Most of those are handled in the when the DRM fd gets closed (i.e.
>>> when the process exits):
>>>
>>> - buffers through drm_gem_release()
>>> - mappings in amdgpu_vm_fini
>>> - contexts in amdgpu_ctx_mgr_fini
>>>
>>> etc.
>>>
>>> Why would we do things differently for userspace queues? It doesn't
>>> look complicated looking at the above patch (which does seem to work).
>>
>> As the code is in initial stage, I have not given much thoughts about 
>> handling resource leak due to app crash, but this seems like a good 
>> suggestion.
>>
>> I am taking a note and will try to accommodate this in an upcoming 
>> version of the series.
>
> Bas is right, the application doesn't necessary needs to clean up on 
> exit (but it's still good custody to do so).
>
> See amdgpu_driver_postclose_kms() for how we cleanup (for example) the 
> ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
>
Thanks for the pointers, Christian.

I also feel that it's good to have this cleanup for those apps which 
did not clean up after themselves (due to a crash or coding error).

So something like, on close_fd:

for_idr_each {
    queue = get_queue()
    if (queue)
        free_queue(queue)
}

But we will also keep the queue_free_OP, so that an app which allocates 
multiple queues and wants to free some of them in between can still do so.
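To illustrate the lifecycle being discussed, here is a minimal userland C model (all names hypothetical; the real driver would track queues in an idr inside the file-private data): the explicit destroy OP stays available for apps that free queues mid-run, and the fd-close sweep reclaims whatever is left.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_QUEUES 8

/* Stand-in for the per-fd queue table; the real driver would keep
 * the queues in an idr hanging off the DRM file-private data. */
static void *queues[MAX_QUEUES];
static int active_queues;

/* Models the create OP of the userq IOCTL: allocate queue state,
 * publish it in the table, return its id (or -1 when full). */
static int userq_create(void)
{
    for (int id = 0; id < MAX_QUEUES; id++) {
        if (!queues[id]) {
            queues[id] = malloc(64); /* placeholder for MQD/ctx state */
            active_queues++;
            return id;
        }
    }
    return -1;
}

/* Models the destroy OP: unpublish from the table, then free.
 * Safe to call on an id that was already freed. */
static void userq_destroy(int id)
{
    if (id < 0 || id >= MAX_QUEUES || !queues[id])
        return;
    free(queues[id]);
    queues[id] = NULL;
    active_queues--;
}

/* Models the postclose sweep: reclaim everything still published
 * when the DRM fd is closed (normal exit or crash alike). */
static void userq_postclose(void)
{
    for (int id = 0; id < MAX_QUEUES; id++)
        userq_destroy(id);
}
```

An app can destroy some queues itself and leave the rest to the close-time sweep; because destroy unpublishes the entry, the sweep never touches a queue that was already freed.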

- Shashank

> Regards,
> Christian.
>
>>
>> - Shashank
>>
>>>> To be honest a crash handling can be very elaborated and complex one,
>>>> and hence only can be done at the driver unload IMO, which doesn't 
>>>> help
>>>> at that stage,
>>>>
>>>> coz anyways driver will re-allocate the resources on next load.
>>>>
>>>> - Shashank
>>>>
>>>>>> - Shashank
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>>> Sent: 10 April 2023 11:26
>>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>> <Felix.Kuehling@amd.com>; Koenig, Christian 
>>>>>> <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>
>>>>>> Hi Shashank,
>>>>>>
>>>>>> I think I found the issue: I wasn't destroying the user queue in 
>>>>>> my program and the kernel doesn't clean up any remaining user 
>>>>>> queues in the postclose hook. I think we need something like
>>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29 
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> While running things multiple times now works, I still have 
>>>>>> problems doing multiple submissions from the same queue. Looking 
>>>>>> forward to the updated test/sample
>>>>>>
>>>>>> Thanks,
>>>>>> Bas
>>>>>>
>>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank 
>>>>>> <Shashank.Sharma@amd.com> wrote:
>>>>>>> [AMD Official Use Only - General]
>>>>>>>
>>>>>>> Hello Bas,
>>>>>>> Thanks for trying this out.
>>>>>>>
>>>>>>> This could be due to the doorbell as you mentioned, Usermode 
>>>>>>> queue uses doorbell manager internally.
>>>>>>> This week, we are planning to publis the latest libDRM sample 
>>>>>>> code which uses a doorbell object (instead of the doorbell hack 
>>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>>> We have tested this full stack (libDRM test + Usermode queue + 
>>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well 
>>>>>>> for us.
>>>>>>>
>>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to 
>>>>>>> build
>>>>>>> your kernel:
>>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr 
>>>>>>>
>>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>>> changes with doorbell objects.
>>>>>>>
>>>>>>> Regards
>>>>>>> Shashank
>>>>>>> -----Original Message-----
>>>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>>>> Sent: 10 April 2023 02:37
>>>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>>> <Felix.Kuehling@amd.com>;
>>>>>>> Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>
>>>>>>> Hi Shashank,
>>>>>>>
>>>>>>> I tried writing a program to experiment with usermode queues and 
>>>>>>> I found some weird behavior: The first run of the program works 
>>>>>>> as expected, while subsequent runs don't seem to do anything 
>>>>>>> (and I allocate everything in GTT, so it should be zero 
>>>>>>> initialized consistently). Is this a known issue?
>>>>>>>
>>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl 
>>>>>>> so it could very well be that I do the doorbell wrong 
>>>>>>> (especially since the ioctl implementation was never shared 
>>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write 
>>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it 
>>>>>>> possible to update the test in libdrm?
>>>>>>>
>>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bas
>>>>>>>
>>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma 
>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>> This patch series introduces AMDGPU usermode queues for gfx 
>>>>>>>> workloads.
>>>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>>>> graphics hardware without any interaction with kernel/DRM 
>>>>>>>> schedulers.
>>>>>>>> In this method, a userspace graphics application can create its 
>>>>>>>> own
>>>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>>>
>>>>>>>> The general idea of how this is supposed to work:
>>>>>>>> - The application creates the following GPU objetcs:
>>>>>>>>     - A queue object to hold the workload packets.
>>>>>>>>     - A read pointer object.
>>>>>>>>     - A write pointer object.
>>>>>>>>     - A doorbell page.
>>>>>>>> - The application picks a 32-bit offset in the doorbell page 
>>>>>>>> for this queue.
>>>>>>>> - The application uses the usermode_queue_create IOCTL 
>>>>>>>> introduced in
>>>>>>>>     this patch, by passing the the GPU addresses of these 
>>>>>>>> objects (read
>>>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell 
>>>>>>>> offset from
>>>>>>>>     the doorbell page)
>>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>>> - The application can start submitting the data in the queue as 
>>>>>>>> soon as
>>>>>>>>     the kernel IOCTL returns.
>>>>>>>> - After filling the workload data in the queue, the app must 
>>>>>>>> write the
>>>>>>>>     number of dwords added in the queue into the doorbell 
>>>>>>>> offset, and the
>>>>>>>>     GPU will start fetching the data.
>>>>>>>>
>>>>>>>> libDRM changes for this series and a sample DRM test program 
>>>>>>>> can be
>>>>>>>> found in the MESA merge request here:
>>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>>
>>>>>>>> This patch series depends on the doorbell-manager changes, 
>>>>>>>> which are
>>>>>>>> being reviewed here:
>>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>>
>>>>>>>> Alex Deucher (1):
>>>>>>>>     drm/amdgpu: UAPI for user queue management
>>>>>>>>
>>>>>>>> Arvind Yadav (2):
>>>>>>>>     drm/amdgpu: add new parameters in v11_struct
>>>>>>>>     drm/amdgpu: map wptr BO into GART
>>>>>>>>
>>>>>>>> Shashank Sharma (6):
>>>>>>>>     drm/amdgpu: add usermode queue base code
>>>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>>     drm/amdgpu: create context space for usermode queue
>>>>>>>>     drm/amdgpu: map usermode queue into MES
>>>>>>>>     drm/amdgpu: generate doorbell index for userqueue
>>>>>>>>
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
>>>>>>>>    .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++++++
>>>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
>>>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
>>>>>>>>    include/uapi/drm/amdgpu_drm.h                 | 55 ++++
>>>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>>    create mode 100644
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>>    create mode 100644 
>>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 2.40.0
>>>>>>>>
>


* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-11  9:48                   ` Shashank Sharma
@ 2023-04-11 10:00                     ` Bas Nieuwenhuizen
  2023-04-11 10:55                       ` Shashank Sharma
  0 siblings, 1 reply; 56+ messages in thread
From: Bas Nieuwenhuizen @ 2023-04-11 10:00 UTC (permalink / raw)
  To: Shashank Sharma
  Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx,
	Christian König, Yadav, Arvind

On Tue, Apr 11, 2023 at 11:48 AM Shashank Sharma
<shashank.sharma@amd.com> wrote:
>
>
> On 11/04/2023 11:37, Christian König wrote:
> > Am 10.04.23 um 16:26 schrieb Shashank Sharma:
> >>
> >> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
> >>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma
> >>> <shashank.sharma@amd.com> wrote:
> >>>>
> >>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
> >>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
> >>>>> <Shashank.Sharma@amd.com> wrote:
> >>>>>> [AMD Official Use Only - General]
> >>>>>>
> >>>>>> Hello Bas,
> >>>>>>
> >>>>>> This is not the correct interpretation of the code, the
> >>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser
> >>>>>> has to exclusively call  it.
> >>>>>>
> >>>>>> Please see the sample test program in the existing libDRM series
> >>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which
> >>>>>> does the destroy_OP
> >>>>>>
> >>>>>> for the IOCTL.
> >>>>> In the presence of crashes the kernel should always be able to clean
> >>>>> this up no? Otherwise there is a resource leak?
> >>>> The crash handling is the same as any of the existing GPU resource
> >>>> which
> >>>> are allocated and freed with IOCTL_OPs.
> >>> Most of those are handled in the when the DRM fd gets closed (i.e.
> >>> when the process exits):
> >>>
> >>> - buffers through drm_gem_release()
> >>> - mappings in amdgpu_vm_fini
> >>> - contexts in amdgpu_ctx_mgr_fini
> >>>
> >>> etc.
> >>>
> >>> Why would we do things differently for userspace queues? It doesn't
> >>> look complicated looking at the above patch (which does seem to work).
> >>
> >> As the code is in initial stage, I have not given much thoughts about
> >> handling resource leak due to app crash, but this seems like a good
> >> suggestion.
> >>
> >> I am taking a note and will try to accommodate this in an upcoming
> >> version of the series.
> >
> > Bas is right, the application doesn't necessary needs to clean up on
> > exit (but it's still good custody to do so).
> >
> > See amdgpu_driver_postclose_kms() for how we cleanup (for example) the
> > ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
> >
> Thanks for the pointers Christian,
>
> I also feel like that its good to have this cleanup for those apps which
> did not clean-up themselves (due to crash or coding error).

I think the patch I linked earlier does exactly this: keep the IOCTL,
but on fini go through the list and destroy any remaining queues:
https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>
> So something like,
>
> on close_fd,
>
> for_idr_each {
>
>     get_queue()
>
>     if (queue)
>
>         free_queue
>
> }
>
> But we will also keep the queue_free_OP as well, so that if an app
> allocate multiple queues, and wants to free some in between, it can do it.
>
> - Shashank
>
> > Regards,
> > Christian.
> >
> >>
> >> - Shashank
> >>
> >>>> To be honest a crash handling can be very elaborated and complex one,
> >>>> and hence only can be done at the driver unload IMO, which doesn't
> >>>> help
> >>>> at that stage,
> >>>>
> >>>> coz anyways driver will re-allocate the resources on next load.
> >>>>
> >>>> - Shashank
> >>>>
> >>>>>> - Shashank
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> >>>>>> Sent: 10 April 2023 11:26
> >>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> >>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> >>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix
> >>>>>> <Felix.Kuehling@amd.com>; Koenig, Christian
> >>>>>> <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
> >>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >>>>>>
> >>>>>> Hi Shashank,
> >>>>>>
> >>>>>> I think I found the issue: I wasn't destroying the user queue in
> >>>>>> my program and the kernel doesn't clean up any remaining user
> >>>>>> queues in the postclose hook. I think we need something like
> >>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
> >>>>>>
> >>>>>> ?
> >>>>>>
> >>>>>> While running things multiple times now works, I still have
> >>>>>> problems doing multiple submissions from the same queue. Looking
> >>>>>> forward to the updated test/sample
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bas
> >>>>>>
> >>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
> >>>>>> <Shashank.Sharma@amd.com> wrote:
> >>>>>>> [AMD Official Use Only - General]
> >>>>>>>
> >>>>>>> Hello Bas,
> >>>>>>> Thanks for trying this out.
> >>>>>>>
> >>>>>>> This could be due to the doorbell as you mentioned, Usermode
> >>>>>>> queue uses doorbell manager internally.
> >>>>>>> This week, we are planning to publis the latest libDRM sample
> >>>>>>> code which uses a doorbell object (instead of the doorbell hack
> >>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
> >>>>>>> We have tested this full stack (libDRM test + Usermode queue +
> >>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well
> >>>>>>> for us.
> >>>>>>>
> >>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to
> >>>>>>> build
> >>>>>>> your kernel:
> >>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
> >>>>>>>
> >>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
> >>>>>>> changes with doorbell objects.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Shashank
> >>>>>>> -----Original Message-----
> >>>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> >>>>>>> Sent: 10 April 2023 02:37
> >>>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
> >>>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> >>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix
> >>>>>>> <Felix.Kuehling@amd.com>;
> >>>>>>> Koenig, Christian <Christian.Koenig@amd.com>
> >>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >>>>>>>
> >>>>>>> Hi Shashank,
> >>>>>>>
> >>>>>>> I tried writing a program to experiment with usermode queues and
> >>>>>>> I found some weird behavior: The first run of the program works
> >>>>>>> as expected, while subsequent runs don't seem to do anything
> >>>>>>> (and I allocate everything in GTT, so it should be zero
> >>>>>>> initialized consistently). Is this a known issue?
> >>>>>>>
> >>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl
> >>>>>>> so it could very well be that I do the doorbell wrong
> >>>>>>> (especially since the ioctl implementation was never shared
> >>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write
> >>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it
> >>>>>>> possible to update the test in libdrm?
> >>>>>>>
> >>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Bas
> >>>>>>>
> >>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma
> >>>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>>> This patch series introduces AMDGPU usermode queues for gfx
> >>>>>>>> workloads.
> >>>>>>>> Usermode queues is a method of GPU workload submission into the
> >>>>>>>> graphics hardware without any interaction with kernel/DRM
> >>>>>>>> schedulers.
> >>>>>>>> In this method, a userspace graphics application can create its
> >>>>>>>> own
> >>>>>>>> workqueue and submit it directly in the GPU HW.
> >>>>>>>>
> >>>>>>>> The general idea of how this is supposed to work:
> >>>>>>>> - The application creates the following GPU objetcs:
> >>>>>>>>     - A queue object to hold the workload packets.
> >>>>>>>>     - A read pointer object.
> >>>>>>>>     - A write pointer object.
> >>>>>>>>     - A doorbell page.
> >>>>>>>> - The application picks a 32-bit offset in the doorbell page
> >>>>>>>> for this queue.
> >>>>>>>> - The application uses the usermode_queue_create IOCTL
> >>>>>>>> introduced in
> >>>>>>>>     this patch, by passing the the GPU addresses of these
> >>>>>>>> objects (read
> >>>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell
> >>>>>>>> offset from
> >>>>>>>>     the doorbell page)
> >>>>>>>> - The kernel creates the queue and maps it in the HW.
> >>>>>>>> - The application can start submitting the data in the queue as
> >>>>>>>> soon as
> >>>>>>>>     the kernel IOCTL returns.
> >>>>>>>> - After filling the workload data in the queue, the app must
> >>>>>>>> write the
> >>>>>>>>     number of dwords added in the queue into the doorbell
> >>>>>>>> offset, and the
> >>>>>>>>     GPU will start fetching the data.
> >>>>>>>>
> >>>>>>>> libDRM changes for this series and a sample DRM test program
> >>>>>>>> can be
> >>>>>>>> found in the MESA merge request here:
> >>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
> >>>>>>>>
> >>>>>>>> This patch series depends on the doorbell-manager changes,
> >>>>>>>> which are
> >>>>>>>> being reviewed here:
> >>>>>>>> https://patchwork.freedesktop.org/series/115802/
> >>>>>>>>
> >>>>>>>> Alex Deucher (1):
> >>>>>>>>     drm/amdgpu: UAPI for user queue management
> >>>>>>>>
> >>>>>>>> Arvind Yadav (2):
> >>>>>>>>     drm/amdgpu: add new parameters in v11_struct
> >>>>>>>>     drm/amdgpu: map wptr BO into GART
> >>>>>>>>
> >>>>>>>> Shashank Sharma (6):
> >>>>>>>>     drm/amdgpu: add usermode queue base code
> >>>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
> >>>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
> >>>>>>>>     drm/amdgpu: create context space for usermode queue
> >>>>>>>>     drm/amdgpu: map usermode queue into MES
> >>>>>>>>     drm/amdgpu: generate doorbell index for userqueue
> >>>>>>>>
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
> >>>>>>>>    .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++++++
> >>>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
> >>>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
> >>>>>>>>    include/uapi/drm/amdgpu_drm.h                 | 55 ++++
> >>>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-) create mode
> >>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>>>>    create mode 100644
> >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> >>>>>>>>    create mode 100644
> >>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 2.40.0
> >>>>>>>>
> >


* Re: [PATCH v3 0/9] AMDGPU Usermode queues
  2023-04-11 10:00                     ` Bas Nieuwenhuizen
@ 2023-04-11 10:55                       ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-04-11 10:55 UTC (permalink / raw)
  To: Bas Nieuwenhuizen
  Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx,
	Christian König, Yadav, Arvind


On 11/04/2023 12:00, Bas Nieuwenhuizen wrote:
> On Tue, Apr 11, 2023 at 11:48 AM Shashank Sharma
> <shashank.sharma@amd.com> wrote:
>>
>> On 11/04/2023 11:37, Christian König wrote:
>>> Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>>>> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>>>>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma
>>>>> <shashank.sharma@amd.com> wrote:
>>>>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>>>>> <Shashank.Sharma@amd.com> wrote:
>>>>>>>> [AMD Official Use Only - General]
>>>>>>>>
>>>>>>>> Hello Bas,
>>>>>>>>
>>>>>>>> This is not the correct interpretation of the code, the
>>>>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser
>>>>>>>> has to exclusively call  it.
>>>>>>>>
>>>>>>>> Please see the sample test program in the existing libDRM series
>>>>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which
>>>>>>>> does the destroy_OP
>>>>>>>>
>>>>>>>> for the IOCTL.
>>>>>>> In the presence of crashes the kernel should always be able to clean
>>>>>>> this up no? Otherwise there is a resource leak?
>>>>>> The crash handling is the same as for any of the existing GPU
>>>>>> resources which are allocated and freed with IOCTL OPs.
>>>>> Most of those are handled when the DRM fd gets closed (i.e.
>>>>> when the process exits):
>>>>>
>>>>> - buffers through drm_gem_release()
>>>>> - mappings in amdgpu_vm_fini
>>>>> - contexts in amdgpu_ctx_mgr_fini
>>>>>
>>>>> etc.
>>>>>
>>>>> Why would we do things differently for userspace queues? It doesn't
>>>>> look complicated looking at the above patch (which does seem to work).
>>>> As the code is in its initial stage, I have not given much thought
>>>> to handling resource leaks due to an app crash, but this seems like
>>>> a good suggestion.
>>>>
>>>> I am taking a note and will try to accommodate this in an upcoming
>>>> version of the series.
>>> Bas is right, the application doesn't necessarily need to clean up on
>>> exit (but it's still good practice to do so).
>>>
>>> See amdgpu_driver_postclose_kms() for how we cleanup (for example) the
>>> ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
>>>
>> Thanks for the pointers Christian,
>>
>> I also feel it's good to have this cleanup for those apps which did
>> not clean up after themselves (due to a crash or coding error).
> I think the patch I linked earlier does exactly this: keep the IOCTL,
> but on fini goes through the list and destroys the queue:
> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29

Yep, it just needs an additional check to free only when it's not already
freed, like a double-free check. Will try to reuse most of it.

- Shashank

>> So something like,
>>
>> on close_fd,
>>
>> for_idr_each {
>>     get_queue()
>>     if (queue)
>>         free_queue()
>> }
>>
>> But we will also keep the queue_free OP, so that if an app allocates
>> multiple queues and wants to free some in between, it can do so.
>>
>> - Shashank
>>
>>> Regards,
>>> Christian.
>>>
>>>> - Shashank
>>>>
>>>>>> To be honest, crash handling can be a very elaborate and complex
>>>>>> task, and hence IMO can only be done at driver unload, which
>>>>>> doesn't help at that stage,
>>>>>>
>>>>>> because the driver will re-allocate the resources on the next
>>>>>> load anyway.
>>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>>> - Shashank
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>>>>> Sent: 10 April 2023 11:26
>>>>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix
>>>>>>>> <Felix.Kuehling@amd.com>; Koenig, Christian
>>>>>>>> <Christian.Koenig@amd.com>; Yadav, Arvind <Arvind.Yadav@amd.com>
>>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>>
>>>>>>>> Hi Shashank,
>>>>>>>>
>>>>>>>> I think I found the issue: I wasn't destroying the user queue in
>>>>>>>> my program and the kernel doesn't clean up any remaining user
>>>>>>>> queues in the postclose hook. I think we need something like
>>>>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>>>>>>>>
>>>>>>>> ?
>>>>>>>>
>>>>>>>> While running things multiple times now works, I still have
>>>>>>>> problems doing multiple submissions from the same queue. Looking
>>>>>>>> forward to the updated test/sample
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bas
>>>>>>>>
>>>>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
>>>>>>>> <Shashank.Sharma@amd.com> wrote:
>>>>>>>>> [AMD Official Use Only - General]
>>>>>>>>>
>>>>>>>>> Hello Bas,
>>>>>>>>> Thanks for trying this out.
>>>>>>>>>
>>>>>>>>> This could be due to the doorbell, as you mentioned; the
>>>>>>>>> usermode queue uses the doorbell manager internally.
>>>>>>>>> This week, we are planning to publish the latest libDRM sample
>>>>>>>>> code which uses a doorbell object (instead of the doorbell hack
>>>>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>>>>> We have tested this full stack (libDRM test + Usermode queue +
>>>>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well
>>>>>>>>> for us.
>>>>>>>>>
>>>>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to
>>>>>>>>> build
>>>>>>>>> your kernel:
>>>>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
>>>>>>>>>
>>>>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>>>>> changes with doorbell objects.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Shashank
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>>>>>> Sent: 10 April 2023 02:37
>>>>>>>>> To: Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>>>>>> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>>>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix
>>>>>>>>> <Felix.Kuehling@amd.com>;
>>>>>>>>> Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>>>
>>>>>>>>> Hi Shashank,
>>>>>>>>>
>>>>>>>>> I tried writing a program to experiment with usermode queues and
>>>>>>>>> I found some weird behavior: The first run of the program works
>>>>>>>>> as expected, while subsequent runs don't seem to do anything
>>>>>>>>> (and I allocate everything in GTT, so it should be zero
>>>>>>>>> initialized consistently). Is this a known issue?
>>>>>>>>>
>>>>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl,
>>>>>>>>> so it could very well be that I do the doorbell wrong
>>>>>>>>> (especially since the ioctl implementation was never shared
>>>>>>>>> AFAICT?), but it seems like I match the kernel submissions
>>>>>>>>> (i.e. I write the wptr in dwords to the wptr VA and to the
>>>>>>>>> doorbell). Is it possible to update the test in libdrm?
>>>>>>>>>
>>>>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Bas
>>>>>>>>>
>>>>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma
>>>>>>>>> <shashank.sharma@amd.com> wrote:
>>>>>>>>>> This patch series introduces AMDGPU usermode queues for gfx
>>>>>>>>>> workloads.
>>>>>>>>>> Usermode queues are a method of GPU workload submission into the
>>>>>>>>>> graphics hardware without any interaction with kernel/DRM
>>>>>>>>>> schedulers.
>>>>>>>>>> In this method, a userspace graphics application can create its
>>>>>>>>>> own workqueue and submit it directly to the GPU HW.
>>>>>>>>>>
>>>>>>>>>> The general idea of how this is supposed to work:
>>>>>>>>>> - The application creates the following GPU objects:
>>>>>>>>>>      - A queue object to hold the workload packets.
>>>>>>>>>>      - A read pointer object.
>>>>>>>>>>      - A write pointer object.
>>>>>>>>>>      - A doorbell page.
>>>>>>>>>> - The application picks a 32-bit offset in the doorbell page
>>>>>>>>>> for this queue.
>>>>>>>>>> - The application uses the usermode_queue_create IOCTL
>>>>>>>>>> introduced in
>>>>>>>>>>      this patch, by passing the GPU addresses of these
>>>>>>>>>> objects (read
>>>>>>>>>>      ptr, write ptr, queue base address and 32-bit doorbell
>>>>>>>>>> offset from
>>>>>>>>>>      the doorbell page)
>>>>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>>>>> - The application can start submitting the data in the queue as
>>>>>>>>>> soon as
>>>>>>>>>>      the kernel IOCTL returns.
>>>>>>>>>> - After filling the workload data in the queue, the app must
>>>>>>>>>> write the
>>>>>>>>>>      number of dwords added in the queue into the doorbell
>>>>>>>>>> offset, and the
>>>>>>>>>>      GPU will start fetching the data.
>>>>>>>>>>
>>>>>>>>>> libDRM changes for this series and a sample DRM test program
>>>>>>>>>> can be
>>>>>>>>>> found in the MESA merge request here:
>>>>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>>>>
>>>>>>>>>> This patch series depends on the doorbell-manager changes,
>>>>>>>>>> which are
>>>>>>>>>> being reviewed here:
>>>>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>>>>
>>>>>>>>>> Alex Deucher (1):
>>>>>>>>>>      drm/amdgpu: UAPI for user queue management
>>>>>>>>>>
>>>>>>>>>> Arvind Yadav (2):
>>>>>>>>>>      drm/amdgpu: add new parameters in v11_struct
>>>>>>>>>>      drm/amdgpu: map wptr BO into GART
>>>>>>>>>>
>>>>>>>>>> Shashank Sharma (6):
>>>>>>>>>>      drm/amdgpu: add usermode queue base code
>>>>>>>>>>      drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>>>>      drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>>>>      drm/amdgpu: create context space for usermode queue
>>>>>>>>>>      drm/amdgpu: map usermode queue into MES
>>>>>>>>>>      drm/amdgpu: generate doorbell index for userqueue
>>>>>>>>>>
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 ++++++++++++++++++
>>>>>>>>>>     .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | 230 ++++++++++++++
>>>>>>>>>>     .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
>>>>>>>>>>     drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
>>>>>>>>>>     include/uapi/drm/amdgpu_drm.h                 | 55 ++++
>>>>>>>>>>     9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>>>>     create mode 100644
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>>>>     create mode 100644
>>>>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 2.40.0
>>>>>>>>>>


* Re: [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART
  2023-04-11  9:29     ` Christian König
@ 2023-04-11 16:02       ` Shashank Sharma
  0 siblings, 0 replies; 56+ messages in thread
From: Shashank Sharma @ 2023-04-11 16:02 UTC (permalink / raw)
  To: Christian König, Bas Nieuwenhuizen
  Cc: Alex Deucher, Felix Kuehling, Arvind Yadav, amd-gfx


On 11/04/2023 11:29, Christian König wrote:
> Am 10.04.23 um 02:00 schrieb Bas Nieuwenhuizen:
>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma 
>> <shashank.sharma@amd.com> wrote:
>>> From: Arvind Yadav <arvind.yadav@amd.com>
>>>
>>> To support oversubscription, MES expects WPTR BOs to be mapped
>>> to GART, before they are submitted to usermode queues.
>>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 89 
>>> +++++++++++++++++++
>>>   .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |  1 +
>>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |  1 +
>>>   3 files changed, 91 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> index 5672efcbcffc..7409a4ae55da 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> @@ -43,6 +43,89 @@ amdgpu_userqueue_find(struct amdgpu_userq_mgr 
>>> *uq_mgr, int qid)
>>>       return idr_find(&uq_mgr->userq_idr, qid);
>>>   }
>>>
>>> +static int
>>> +amdgpu_userqueue_map_gtt_bo_to_gart(struct amdgpu_device *adev, 
>>> struct amdgpu_bo *bo)
>>> +{
>>> +    int ret;
>>> +
>>> +    ret = amdgpu_bo_reserve(bo, true);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to reserve bo. ret %d\n", ret);
>>> +        goto err_reserve_bo_failed;
>>> +    }
>>> +
>>> +    ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to pin bo. ret %d\n", ret);
>>> +        goto err_pin_bo_failed;
>>> +    }
>>> +
>>> +    ret = amdgpu_ttm_alloc_gart(&bo->tbo);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to bind bo to GART. ret %d\n", ret);
>>> +        goto err_map_bo_gart_failed;
>>> +    }
>>> +
>>> +
>>> +    amdgpu_bo_unreserve(bo);
>>> +    bo = amdgpu_bo_ref(bo);
>>> +
>>> +    return 0;
>>> +
>>> +err_map_bo_gart_failed:
>>> +    amdgpu_bo_unpin(bo);
>>> +err_pin_bo_failed:
>>> +    amdgpu_bo_unreserve(bo);
>>> +err_reserve_bo_failed:
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +
>>> +static int
>>> +amdgpu_userqueue_create_wptr_mapping(struct amdgpu_device *adev,
>>> +                                    struct drm_file *filp,
>>> +                                    struct amdgpu_usermode_queue 
>>> *queue)
>>> +{
>>> +    struct amdgpu_bo_va_mapping *wptr_mapping;
>>> +    struct amdgpu_vm *wptr_vm;
>>> +    struct amdgpu_bo *wptr_bo = NULL;
>>> +    uint64_t wptr = queue->userq_prop.wptr_gpu_addr;
>>> +    int ret;
>>> +
>>> +    wptr_vm = queue->vm;
>>> +    ret = amdgpu_bo_reserve(wptr_vm->root.bo, false);
>>> +    if (ret)
>>> +        goto err_wptr_map_gart;
>>> +
>>> +    wptr_mapping = amdgpu_vm_bo_lookup_mapping(wptr_vm, wptr >> 
>>> PAGE_SHIFT);
>>> +    amdgpu_bo_unreserve(wptr_vm->root.bo);
>>> +    if (!wptr_mapping) {
>>> +        DRM_ERROR("Failed to lookup wptr bo\n");
>>> +        ret = -EINVAL;
>>> +        goto err_wptr_map_gart;
>>> +    }
>> This triggers for wptr BOs mapped to the high half of the address
>> space, so it may need some mangling wrt the top bits?
>
> Yeah, correct. Shashank, this needs to apply the hole mask before
> looking up the address.


Noted, will update that.

- Shashank

>
> Regards,
> Christian.
>
>>
>>> +
>>> +    wptr_bo = wptr_mapping->bo_va->base.bo;
>>> +    if (wptr_bo->tbo.base.size > PAGE_SIZE) {
>>> +        DRM_ERROR("Requested GART mapping for wptr bo larger than 
>>> one page\n");
>>> +        ret = -EINVAL;
>>> +        goto err_wptr_map_gart;
>>> +    }
>>> +
>>> +    ret = amdgpu_userqueue_map_gtt_bo_to_gart(adev, wptr_bo);
>>> +    if (ret) {
>>> +        DRM_ERROR("Failed to map wptr bo to GART\n");
>>> +        goto err_wptr_map_gart;
>>> +    }
>>> +
>>> +    queue->wptr_mc_addr = wptr_bo->tbo.resource->start << PAGE_SHIFT;
>>> +    return 0;
>>> +
>>> +err_wptr_map_gart:
>>> +    return ret;
>>> +}
>>> +
>>>   static int amdgpu_userqueue_create(struct drm_file *filp, union 
>>> drm_amdgpu_userq *args)
>>>   {
>>>       struct amdgpu_usermode_queue *queue;
>>> @@ -82,6 +165,12 @@ static int amdgpu_userqueue_create(struct 
>>> drm_file *filp, union drm_amdgpu_userq
>>>           goto free_queue;
>>>       }
>>>
>>> +    r = amdgpu_userqueue_create_wptr_mapping(uq_mgr->adev, filp, 
>>> queue);
>>> +    if (r) {
>>> +        DRM_ERROR("Failed to map WPTR (0x%llx) for userqueue\n", 
>>> queue->userq_prop.wptr_gpu_addr);
>>> +        goto free_queue;
>>> +    }
>>> +
>>>       r = uq_mgr->userq_funcs[queue->queue_type]->mqd_create(uq_mgr, 
>>> queue);
>>>       if (r) {
>>>           DRM_ERROR("Failed to create/map userqueue MQD\n");
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>> index 1627641a4a4e..274e78826334 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>> @@ -58,6 +58,7 @@ amdgpu_userq_gfx_v11_map(struct amdgpu_userq_mgr 
>>> *uq_mgr,
>>>       queue_input.queue_size = queue->userq_prop.queue_size >> 2;
>>>       queue_input.doorbell_offset = queue->userq_prop.doorbell_index;
>>>       queue_input.page_table_base_addr = 
>>> amdgpu_gmc_pd_addr(queue->vm->root.bo);
>>> +    queue_input.wptr_mc_addr = queue->wptr_mc_addr;
>>>
>>>       amdgpu_mes_lock(&adev->mes);
>>>       r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input);
>>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> index 8b62ef77cd26..eaab7cf5fff6 100644
>>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>> @@ -38,6 +38,7 @@ struct amdgpu_usermode_queue {
>>>          int queue_type;
>>>          uint64_t flags;
>>>          uint64_t doorbell_handle;
>>> +       uint64_t wptr_mc_addr;
>>>          uint64_t proc_ctx_gpu_addr;
>>>          uint64_t gang_ctx_gpu_addr;
>>>          uint64_t gds_ctx_gpu_addr;
>>> -- 
>>> 2.40.0
>>>
>


end of thread, other threads:[~2023-04-11 16:02 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-29 16:04 [PATCH v3 0/9] AMDGPU Usermode queues Shashank Sharma
2023-03-29 16:04 ` [PATCH v3 1/9] drm/amdgpu: UAPI for user queue management Shashank Sharma
2023-03-29 17:25   ` Christian König
2023-03-29 17:57   ` Alex Deucher
2023-03-29 19:21     ` Shashank Sharma
2023-03-29 19:46       ` Alex Deucher
2023-03-30  6:13         ` Shashank Sharma
     [not found]   ` <71fc098c-c0cb-3097-4e11-c2d9bd9b4783@damsy.net>
2023-03-30  8:15     ` Shashank Sharma
2023-03-30 10:40       ` Christian König
2023-03-30 15:08         ` Alex Deucher
2023-03-29 16:04 ` [PATCH v3 2/9] drm/amdgpu: add usermode queue base code Shashank Sharma
2023-03-30 21:15   ` Alex Deucher
2023-03-31  8:52     ` Shashank Sharma
2023-04-04 16:05   ` Luben Tuikov
2023-03-29 16:04 ` [PATCH v3 3/9] drm/amdgpu: add new IOCTL for usermode queue Shashank Sharma
2023-04-10  0:02   ` Bas Nieuwenhuizen
2023-04-10 14:28     ` Shashank Sharma
2023-03-29 16:04 ` [PATCH v3 4/9] drm/amdgpu: create GFX-gen11 MQD for userqueue Shashank Sharma
2023-03-30 21:18   ` Alex Deucher
2023-03-31  8:49     ` Shashank Sharma
2023-04-04 16:21   ` Luben Tuikov
2023-03-29 16:04 ` [PATCH v3 5/9] drm/amdgpu: create context space for usermode queue Shashank Sharma
2023-03-30 21:23   ` Alex Deucher
2023-03-31  8:42     ` Shashank Sharma
2023-04-04 16:24   ` Luben Tuikov
2023-04-04 16:37     ` Shashank Sharma
2023-03-29 16:04 ` [PATCH v3 6/9] drm/amdgpu: add new parameters in v11_struct Shashank Sharma
2023-03-30 21:25   ` Alex Deucher
2023-03-31  6:39     ` Yadav, Arvind
2023-03-31  8:30     ` Shashank Sharma
2023-03-29 16:04 ` [PATCH v3 7/9] drm/amdgpu: map usermode queue into MES Shashank Sharma
2023-04-04 16:30   ` Luben Tuikov
2023-04-04 16:36     ` Shashank Sharma
2023-04-04 20:58       ` Luben Tuikov
2023-04-05 10:06         ` Shashank Sharma
2023-04-05 21:18           ` Luben Tuikov
2023-04-06  7:45             ` Shashank Sharma
2023-04-06 15:16               ` Felix Kuehling
2023-04-07  6:41                 ` Shashank Sharma
2023-03-29 16:04 ` [PATCH v3 8/9] drm/amdgpu: map wptr BO into GART Shashank Sharma
2023-04-10  0:00   ` Bas Nieuwenhuizen
2023-04-11  9:29     ` Christian König
2023-04-11 16:02       ` Shashank Sharma
2023-03-29 16:04 ` [PATCH v3 9/9] drm/amdgpu: generate doorbell index for userqueue Shashank Sharma
2023-04-10  0:36 ` [PATCH v3 0/9] AMDGPU Usermode queues Bas Nieuwenhuizen
2023-04-10  7:32   ` Sharma, Shashank
2023-04-10  9:25     ` Bas Nieuwenhuizen
2023-04-10 13:40       ` Sharma, Shashank
2023-04-10 13:46         ` Bas Nieuwenhuizen
2023-04-10 14:01           ` Shashank Sharma
2023-04-10 14:04             ` Bas Nieuwenhuizen
2023-04-10 14:26               ` Shashank Sharma
2023-04-11  9:37                 ` Christian König
2023-04-11  9:48                   ` Shashank Sharma
2023-04-11 10:00                     ` Bas Nieuwenhuizen
2023-04-11 10:55                       ` Shashank Sharma

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.