* [PATCH 0/6] Usermode queue fencing synchronization
@ 2023-02-26 16:54 Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 1/6] drm/amdgpu: Implement a new 64bit sequence memory driver Arunpravin Paneer Selvam
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

This patch series introduces fence drivers for usermode graphics
queue synchronization.

The idea here is: we insert a hardware fence signal command at the end of the
user process's graphics rendering commands and create a fence using the current
wptr. On the other hand, a process that needs to access these shared resources
should wait on the fence address/wptr value provided by the driver through the
wait ioctl function. The hardware/firmware is expected to write the read pointer
into the fence address, so the process waiting on the fence checks that
rptr >= wptr (the fence value) before it starts consuming the buffers. This way
we achieve implicit synchronization among userspace processes for the shared
resources.
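
In code terms, the check a waiter performs reduces to a 64-bit sequence
comparison. A minimal sketch for illustration only (the function name is
hypothetical and not part of this series):

	/*
	 * Illustration only: the fence is signaled once the firmware has
	 * written a read pointer at the fence address that has caught up
	 * with the fence's wptr-based sequence number.
	 */
	static bool userq_fence_signaled(const u64 *fence_cpu_addr, u64 wptr)
	{
		u64 rptr = READ_ONCE(*fence_cpu_addr); /* written by firmware */

		return rptr >= wptr;
	}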

The core usermode queue and doorbell design patches, which are prerequisites
for this work, are under review here:
Task 1: https://patchwork.freedesktop.org/series/114065/
Task 2: https://patchwork.freedesktop.org/series/113669/#rev2

Alex Deucher (1):
  drm/amdgpu: UAPI headers for userqueue Secure semaphore

Arunpravin Paneer Selvam (5):
  drm/amdgpu: Implement a new 64bit sequence memory driver
  drm/amdgpu: Implement a new userqueue fence driver
  drm/amdgpu: Add mqd support for the fence address
  drm/amdgpu: Implement userqueue signal/wait IOCTL functions
  drm/amdgpu: Enable userqueue fence interrupt handling support

 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c     | 158 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h     |  48 ++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 512 ++++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  68 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  21 +
 .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c |   4 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  20 +-
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 drivers/gpu/drm/amd/include/v11_structs.h     |   4 +-
 include/uapi/drm/amdgpu_drm.h                 |  46 ++
 15 files changed, 919 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h

-- 
2.25.1



* [PATCH 1/6] drm/amdgpu: Implement a new 64bit sequence memory driver
  2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
@ 2023-02-26 16:54 ` Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver Arunpravin Paneer Selvam
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

Developed a new driver which allocates a 64-bit memory slot on
each request, in sequence order. At the moment, user queue
fence memory is the main consumer of this seq64 driver.
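
A sketch of the intended calling pattern, based only on the interfaces
added in this patch (error handling trimmed; illustration only):

	u64 seq64_va;
	u64 *seq64_cpu;
	int r;

	r = amdgpu_seq64_get(adev, &seq64_va, &seq64_cpu);
	if (r)
		return r;

	/* seq64_va is handed to the firmware as the fence address,
	 * the CPU reads the current value through *seq64_cpu.
	 */

	amdgpu_seq64_free(adev, seq64_va);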

v2: Worked on review comments from Christian for the following
    modifications

    - Rename the driver from "semaphore" to "seq64"
    - Remove unnecessary PT/PD mapping
    - Move enable_mes check into init/fini functions.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile        |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c  | 158 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h  |  48 +++++++
 6 files changed, 232 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 6ae9d5792791..a239533a895f 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
 	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
 	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-	amdgpu_ring_mux.o
+	amdgpu_ring_mux.o amdgpu_seq64.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0625d6bdadf4..1c3eba2d0390 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -110,6 +110,7 @@
 #include "amdgpu_mca.h"
 #include "amdgpu_ras.h"
 #include "amdgpu_userqueue.h"
+#include "amdgpu_seq64.h"
 
 #define MAX_GPU_INSTANCE		16
 
@@ -480,6 +481,7 @@ struct amdgpu_fpriv {
 	struct amdgpu_vm	vm;
 	struct amdgpu_bo_va	*prt_va;
 	struct amdgpu_bo_va	*csa_va;
+	struct amdgpu_bo_va	*seq64_va;
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
@@ -944,6 +946,9 @@ struct amdgpu_device {
 	/* GDS */
 	struct amdgpu_gds		gds;
 
+	/* for userq and VM fences */
+	struct amdgpu_seq64		seq64;
+
 	/* KFD */
 	struct amdgpu_kfd_dev		kfd;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index afe6af9c0138..88097d12ced3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2417,6 +2417,12 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 					goto init_failed;
 				}
 			}
+
+			r = amdgpu_seq64_init(adev);
+			if (r) {
+				DRM_ERROR("allocate seq64 failed %d\n", r);
+				goto init_failed;
+			}
 		}
 	}
 
@@ -2873,6 +2879,7 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 			amdgpu_device_wb_fini(adev);
 			amdgpu_device_vram_scratch_fini(adev);
 			amdgpu_ib_pool_fini(adev);
+			amdgpu_seq64_fini(adev);
 		}
 
 		r = adev->ip_blocks[i].version->funcs->sw_fini((void *)adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 52e61e339a88..d1198ca5aa7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1182,6 +1182,12 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 			goto error_vm;
 	}
 
+	r = amdgpu_seq64_map(adev, &fpriv->vm, &fpriv->seq64_va,
+			     AMDGPU_SEQ64_VADDR_START,
+			     AMDGPU_SEQ64_SIZE);
+	if (r)
+		goto error_vm;
+
 	mutex_init(&fpriv->bo_list_lock);
 	idr_init_base(&fpriv->bo_list_handles, 1);
 
@@ -1249,6 +1255,13 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 		amdgpu_bo_unreserve(adev->virt.csa_obj);
 	}
 
+	if (fpriv->seq64_va) {
+		WARN_ON(amdgpu_bo_reserve(adev->seq64.sbo, true));
+		amdgpu_vm_bo_del(adev, fpriv->seq64_va);
+		fpriv->seq64_va = NULL;
+		amdgpu_bo_unreserve(adev->seq64.sbo);
+	}
+
 	pasid = fpriv->vm.pasid;
 	pd = amdgpu_bo_ref(fpriv->vm.root.bo);
 	if (!WARN_ON(amdgpu_bo_reserve(pd, true))) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
new file mode 100644
index 000000000000..bf43856cebbc
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+#include "amdgpu_seq64.h"
+
+void amdgpu_seq64_fini(struct amdgpu_device *adev)
+{
+	if (!adev->enable_mes)
+		return;
+
+	amdgpu_bo_free_kernel(&adev->seq64.sbo,
+			      NULL,
+			      (void **)&adev->seq64.cpu_base_addr);
+}
+
+int amdgpu_seq64_init(struct amdgpu_device *adev)
+{
+	int r;
+
+	if (!adev->enable_mes)
+		return -EINVAL; 
+
+	if (adev->seq64.sbo)
+		return 0;
+	
+	/*
+	 * AMDGPU_SEQ64_SIZE / (sizeof(u64) * 8) = AMDGPU_MAX_SEQ64_SLOTS,
+	 * i.e. one 64-byte slot for each 64-bit sequence value
+	 */
+	r = amdgpu_bo_create_kernel(adev, AMDGPU_SEQ64_SIZE,
+				    PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
+				    &adev->seq64.sbo, NULL,
+				    (void **)&adev->seq64.cpu_base_addr);
+	if (r) {
+		dev_warn(adev->dev, "(%d) create seq64 failed\n", r);
+		return r;
+	}
+	
+	memset(adev->seq64.cpu_base_addr, 0, AMDGPU_SEQ64_SIZE);
+	
+	adev->seq64.num_sem = AMDGPU_MAX_SEQ64_SLOTS;
+	memset(&adev->seq64.used, 0, sizeof(adev->seq64.used));
+	
+	return 0;
+}
+
+int amdgpu_seq64_map(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+		     struct amdgpu_bo_va **bo_va, u64 seq64_addr,
+		     uint32_t size)
+{
+	struct ttm_validate_buffer seq64_tv;
+	struct amdgpu_bo_list_entry pd;
+	struct ww_acquire_ctx ticket;
+	struct list_head list;
+	struct amdgpu_bo *bo;
+	int r;
+
+	bo = adev->seq64.sbo;
+	if (!bo)
+		return -EINVAL;
+
+	INIT_LIST_HEAD(&list);
+	INIT_LIST_HEAD(&seq64_tv.head);
+
+	seq64_tv.bo = &bo->tbo;
+	seq64_tv.num_shared = 1;
+
+	list_add(&seq64_tv.head, &list);
+	amdgpu_vm_get_pd_bo(vm, &list, &pd);
+
+	r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
+	if (r)
+		return r;
+
+	*bo_va = amdgpu_vm_bo_add(adev, vm, bo);
+	if (!*bo_va) {
+		r = -ENOMEM;
+		goto error_vm;
+	}
+
+	r = amdgpu_vm_bo_map(adev, *bo_va, seq64_addr, 0, size,
+			     AMDGPU_PTE_READABLE | AMDGPU_PTE_WRITEABLE |
+			     AMDGPU_PTE_EXECUTABLE);
+	if (r) {
+		DRM_ERROR("failed to do bo_map on userq sem, err=%d\n", r);
+		goto error_map;
+	}
+
+	r = amdgpu_vm_bo_update(adev, *bo_va, false);
+	if (r) {
+		DRM_ERROR("failed to do vm_bo_update on userq sem\n");
+		goto error_map;
+	}
+
+	ttm_eu_backoff_reservation(&ticket, &list);
+
+	return 0;
+
+error_map:
+	amdgpu_vm_bo_del(adev, *bo_va);
+error_vm:
+	ttm_eu_backoff_reservation(&ticket, &list);
+	return r;
+}
+
+int amdgpu_seq64_get(struct amdgpu_device *adev, u64 *gpu_addr,
+		     u64 **cpu_addr)
+{
+	unsigned long bit_pos;
+	u32 offset;
+
+	bit_pos = find_first_zero_bit(adev->seq64.used, adev->seq64.num_sem);
+
+	if (bit_pos < adev->seq64.num_sem) {
+		__set_bit(bit_pos, adev->seq64.used);
+		offset = bit_pos << 6; /* convert the slot index to a byte offset */
+	} else {
+		return -EINVAL;
+	}
+
+	*gpu_addr = offset + AMDGPU_SEQ64_VADDR_START;
+	*cpu_addr = adev->seq64.cpu_base_addr + (offset >> 3); /* u64 units */
+
+	return 0;
+}
+
+void amdgpu_seq64_free(struct amdgpu_device *adev, u64 gpu_addr)
+{
+	u32 offset;
+
+	offset = gpu_addr - AMDGPU_SEQ64_VADDR_START;
+
+	offset >>= 6;
+	if (offset < adev->seq64.num_sem)
+		__clear_bit(offset, adev->seq64.used);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
new file mode 100644
index 000000000000..e9b0afa9c5aa
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __AMDGPU_SEQ64_H__
+#define __AMDGPU_SEQ64_H__
+
+#define AMDGPU_SEQ64_SIZE		(2ULL << 20)
+#define AMDGPU_MAX_SEQ64_SLOTS		(AMDGPU_SEQ64_SIZE / (sizeof(u64) * 8))
+#define AMDGPU_SEQ64_VADDR_OFFSET	0x50000
+#define AMDGPU_SEQ64_VADDR_START	(AMDGPU_VA_RESERVED_SIZE + AMDGPU_SEQ64_VADDR_OFFSET)
+
+struct amdgpu_seq64 {
+	struct amdgpu_bo *sbo;
+	u32 num_sem;
+	u64 *cpu_base_addr;
+	unsigned long used[DIV_ROUND_UP(AMDGPU_MAX_SEQ64_SLOTS, BITS_PER_LONG)];
+};
+
+void amdgpu_seq64_fini(struct amdgpu_device *adev);
+int amdgpu_seq64_init(struct amdgpu_device *adev);
+int amdgpu_seq64_get(struct amdgpu_device *adev, u64 *gpu_addr, u64 **cpu_addr);
+void amdgpu_seq64_free(struct amdgpu_device *adev, u64 gpu_addr);
+int amdgpu_seq64_map(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+		     struct amdgpu_bo_va **bo_va, u64 seq64_addr, uint32_t size);
+
+#endif
+
-- 
2.25.1



* [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver
  2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 1/6] drm/amdgpu: Implement a new 64bit sequence memory driver Arunpravin Paneer Selvam
@ 2023-02-26 16:54 ` Arunpravin Paneer Selvam
  2023-02-27 12:42   ` Christian König
  2023-02-26 16:54 ` [PATCH 3/6] drm/amdgpu: Add mqd support for the fence address Arunpravin Paneer Selvam
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

Developed a userqueue fence driver for the userqueue process's shared
BO synchronization.

Create a dma fence with the write pointer as the seqno and allocate
seq64 memory for each user queue process; this memory address is fed
into the firmware/hardware so that the firmware writes the read pointer
into the given address when the process completes its execution.
Compare wptr and rptr: if rptr >= wptr, signal the fences so that the
waiting processes can consume the buffers.
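
A sketch of how the two sides pair up, using only the functions added
by this patch (error handling omitted; illustration only):

	struct dma_fence *f;

	/* submission side: fence the work up to the current wptr */
	amdgpu_userq_fence_create(queue, wptr, &f);

	/* interrupt side: signal every pending fence whose
	 * seqno (wptr) is covered by the rptr the firmware wrote
	 */
	amdgpu_userq_fence_process(queue->fence_drv);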

v2: Worked on review comments from Christian for the following
    modifications

    - Add wptr as the sequence number into the fence
    - Add a reference count for the fence driver
    - Add dma_fence_put below the list_del as it might free the userq fence
    - Trim unnecessary code in the interrupt handler
    - Check the dma fence signaled state in the dma fence creation function
      for a potential problem of hardware completing the job processing
      beforehand
    - Add necessary locks
    - Create a list and process all the unsignaled fences
    - Clean up fences in the destroy function
    - Implement the .signaled callback function

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 +
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 251 ++++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  61 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  20 ++
 .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
 6 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index a239533a895f..ea09273b585f 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
 	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
 	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-	amdgpu_ring_mux.o amdgpu_seq64.o
+	amdgpu_ring_mux.o amdgpu_seq64.o amdgpu_userq_fence.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index bd3462d0da5f..6b7ac1ebd04c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -53,6 +53,7 @@
 #include "amdgpu_xgmi.h"
 #include "amdgpu_reset.h"
 #include "amdgpu_userqueue.h"
+#include "amdgpu_userq_fence.h"
 
 /*
  * KMS wrapper.
@@ -2827,6 +2828,10 @@ static int __init amdgpu_init(void)
 	if (r)
 		goto error_fence;
 
+	r = amdgpu_userq_fence_slab_init();
+	if (r)
+		goto error_fence;
+
 	DRM_INFO("amdgpu kernel modesetting enabled.\n");
 	amdgpu_register_atpx_handler();
 	amdgpu_acpi_detect();
@@ -2851,6 +2856,7 @@ static void __exit amdgpu_exit(void)
 	amdgpu_unregister_atpx_handler();
 	amdgpu_sync_fini();
 	amdgpu_fence_slab_fini();
+	amdgpu_userq_fence_slab_fini();
 	mmu_notifier_synchronize();
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
new file mode 100644
index 000000000000..609a7328e9a6
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/kref.h>
+#include <linux/slab.h>
+
+#include <drm/drm_syncobj.h>
+
+#include "amdgpu.h"
+#include "amdgpu_userq_fence.h"
+#include "amdgpu_userqueue.h"
+
+static struct kmem_cache *amdgpu_userq_fence_slab;
+
+int amdgpu_userq_fence_slab_init(void)
+{
+	amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
+						    sizeof(struct amdgpu_userq_fence),
+						    0,
+						    SLAB_HWCACHE_ALIGN,
+						    NULL);
+	if (!amdgpu_userq_fence_slab)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void amdgpu_userq_fence_slab_fini(void)
+{
+	rcu_barrier();
+	kmem_cache_destroy(amdgpu_userq_fence_slab);
+}
+
+static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
+{
+	struct amdgpu_userq_fence *__f = container_of(f, struct amdgpu_userq_fence, base);
+
+	if (!__f)
+		return NULL;
+
+	if (__f->base.ops == &amdgpu_userq_fence_ops)
+		return __f;
+
+	return NULL;
+}
+
+static u64 amdgpu_userq_fence_read(struct amdgpu_userq_fence_driver *fence_drv)
+{
+	return le64_to_cpu(*fence_drv->cpu_addr);
+}
+
+int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev,
+				  struct amdgpu_usermode_queue *userq)
+{
+	struct amdgpu_userq_fence_driver *fence_drv;
+	int r;
+
+	fence_drv = userq->fence_drv;
+	if (!fence_drv)
+		return -EINVAL;
+
+	/* Acquire seq64 memory */
+	r = amdgpu_seq64_get(adev, &fence_drv->gpu_addr,
+			     &fence_drv->cpu_addr);
+	if (r)
+		return -ENOMEM;
+	
+	kref_init(&fence_drv->refcount);
+	INIT_LIST_HEAD(&fence_drv->fences);
+	spin_lock_init(&fence_drv->fence_list_lock);
+
+	fence_drv->adev = adev;
+	fence_drv->context = dma_fence_context_alloc(1);
+
+	get_task_comm(fence_drv->timeline_name, current);
+
+	return 0;
+}
+
+void amdgpu_userq_fence_driver_destroy(struct kref *ref)
+{
+	struct amdgpu_userq_fence_driver *fence_drv = container_of(ref,
+					 struct amdgpu_userq_fence_driver,
+					 refcount);
+	struct amdgpu_device *adev = fence_drv->adev;
+	struct amdgpu_userq_fence *fence, *tmp;
+	struct dma_fence *f;
+	
+	spin_lock(&fence_drv->fence_list_lock);
+	list_for_each_entry_safe(fence, tmp, &fence_drv->fences, link) {
+		f = &fence->base;
+		
+		if (!dma_fence_is_signaled(f)) {
+			dma_fence_set_error(f, -ECANCELED);
+			dma_fence_signal(f);
+		}
+		
+		list_del(&fence->link);
+		dma_fence_put(f);
+	}
+	
+	WARN_ON_ONCE(!list_empty(&fence_drv->fences));
+	spin_unlock(&fence_drv->fence_list_lock);
+	
+	/* Free seq64 memory */
+	amdgpu_seq64_free(adev, fence_drv->gpu_addr);
+	kfree(fence_drv);
+}
+
+void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
+{
+	kref_put(&fence_drv->refcount, amdgpu_userq_fence_driver_destroy);
+}
+
+int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
+			      u64 seq, struct dma_fence **f)
+{
+	struct amdgpu_userq_fence_driver *fence_drv;
+	struct amdgpu_userq_fence *userq_fence;
+	struct dma_fence *fence;
+
+	fence_drv = userq->fence_drv;
+	if (!fence_drv)
+		return -EINVAL;
+
+	userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
+	if (!userq_fence)
+		return -ENOMEM;
+
+	spin_lock_init(&userq_fence->lock);
+	INIT_LIST_HEAD(&userq_fence->link);
+	fence = &userq_fence->base;
+	userq_fence->fence_drv = fence_drv;
+
+	dma_fence_init(fence, &amdgpu_userq_fence_ops, &userq_fence->lock,
+		       fence_drv->context, seq);
+
+	kref_get(&fence_drv->refcount);
+
+	spin_lock(&fence_drv->fence_list_lock);
+	/* Check if hardware has already processed the job */
+	if (!dma_fence_is_signaled(fence)) {
+		dma_fence_get(fence);
+		list_add_tail(&userq_fence->link, &fence_drv->fences);
+		dma_fence_put(fence);
+	}
+	spin_unlock(&fence_drv->fence_list_lock);	
+	/* Release the fence driver reference */
+	amdgpu_userq_fence_driver_put(fence_drv);
+
+	*f = fence;
+
+	return 0;
+}
+
+bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver *fence_drv)
+{
+	struct amdgpu_userq_fence *userq_fence, *tmp;
+	struct dma_fence *fence;
+	u64 rptr;
+
+	if (!fence_drv)
+		return false;
+
+	rptr = amdgpu_userq_fence_read(fence_drv);
+
+	spin_lock(&fence_drv->fence_list_lock);
+	list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, link) {
+		fence = &userq_fence->base;
+
+		if (rptr >= fence->seqno) {
+			dma_fence_signal(fence);
+			list_del(&userq_fence->link);
+
+			dma_fence_put(fence);
+		} else {
+			break;
+		}
+	}
+	spin_unlock(&fence_drv->fence_list_lock);
+
+	return true;
+}
+
+static const char *amdgpu_userq_fence_get_driver_name(struct dma_fence *f)
+{
+	return "amdgpu_userqueue_fence";
+}
+
+static const char *amdgpu_userq_fence_get_timeline_name(struct dma_fence *f)
+{
+	struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);
+
+	return fence->fence_drv->timeline_name;
+}
+
+static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
+{
+	struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);
+	struct amdgpu_userq_fence_driver *fence_drv = fence->fence_drv;
+	u64 rptr, wptr;
+
+	rptr = amdgpu_userq_fence_read(fence_drv);
+	wptr = fence->base.seqno;
+
+	if (rptr >= wptr)
+		return true;
+
+	return false;
+}
+
+static void amdgpu_userq_fence_free(struct rcu_head *rcu)
+{
+	struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
+
+	kmem_cache_free(amdgpu_userq_fence_slab, to_amdgpu_userq_fence(f));
+}
+
+static void amdgpu_userq_fence_release(struct dma_fence *f)
+{
+	call_rcu(&f->rcu, amdgpu_userq_fence_free);
+}
+
+static const struct dma_fence_ops amdgpu_userq_fence_ops = {
+	.use_64bit_seqno = true,
+	.get_driver_name = amdgpu_userq_fence_get_driver_name,
+	.get_timeline_name = amdgpu_userq_fence_get_timeline_name,
+	.signaled = amdgpu_userq_fence_signaled,
+	.release = amdgpu_userq_fence_release,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
new file mode 100644
index 000000000000..230dd54b4cd3
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __AMDGPU_USERQ_FENCE_H__
+#define __AMDGPU_USERQ_FENCE_H__
+
+#include <linux/types.h>
+
+struct amdgpu_userq_fence {
+	struct dma_fence base;
+	/* userq fence lock */
+	spinlock_t lock;
+	struct list_head link;
+	struct amdgpu_userq_fence_driver *fence_drv;
+};
+
+struct amdgpu_userq_fence_driver {
+	struct kref refcount;
+	u64 gpu_addr;
+	u64 *cpu_addr;
+	u64 context;
+	/* fence list lock */
+	spinlock_t fence_list_lock;
+	struct list_head fences;
+	struct amdgpu_device *adev;
+	char timeline_name[TASK_COMM_LEN];
+};
+
+static const struct dma_fence_ops amdgpu_userq_fence_ops;
+
+int amdgpu_userq_fence_slab_init(void);
+void amdgpu_userq_fence_slab_fini(void);
+int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev, struct amdgpu_usermode_queue *userq);
+void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
+int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
+			      u64 seq, struct dma_fence **f);
+bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver *fence_drv);
+void amdgpu_userq_fence_driver_destroy(struct kref *ref);
+
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index 2f1aba1e9792..d4317b0f6487 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -23,6 +23,7 @@
 
 #include "amdgpu.h"
 #include "amdgpu_vm.h"
+#include "amdgpu_userq_fence.h"
 
 #define AMDGPU_USERQ_DOORBELL_INDEX (AMDGPU_NAVI10_DOORBELL_GFX_USERQUEUE_START + 4)
 
@@ -264,6 +265,8 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
     struct amdgpu_vm *vm = &fpriv->vm;
     struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
     struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
+    struct amdgpu_userq_fence_driver *fence_drv;
+    struct amdgpu_device *adev = uq_mgr->adev;
 
     pasid = vm->pasid;
     if (vm->pasid < 0) {
@@ -280,6 +283,19 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
         return -ENOMEM;
     }
 
+    fence_drv = kzalloc(sizeof(struct amdgpu_userq_fence_driver), GFP_KERNEL);
+    if (!fence_drv) {
+	    DRM_ERROR("Failed to allocate memory for fence driver\n");
+	    return -ENOMEM;
+    }
+
+    queue->fence_drv = fence_drv;
+    r = amdgpu_userq_fence_driver_get(adev, queue);
+    if (r) {
+	    DRM_ERROR("Failed to get fence driver\n");
+	    goto free_fence_drv;
+    }
+
     queue->vm = vm;
     queue->pasid = pasid;
     queue->wptr_gpu_addr = mqd_in->wptr_va;
@@ -339,6 +355,9 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
 free_qid:
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
 
+free_fence_drv:
+    amdgpu_userq_fence_driver_put(queue->fence_drv);
+
 free_queue:
     mutex_unlock(&uq_mgr->userq_mutex);
     kfree(queue);
@@ -363,6 +382,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
     amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
     amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
     list_del(&queue->userq_node);
+    amdgpu_userq_fence_driver_put(queue->fence_drv);
     mutex_unlock(&uq_mgr->userq_mutex);
     kfree(queue);
 }
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index bda27583b58c..cf7264df8c46 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -73,6 +73,8 @@ struct amdgpu_usermode_queue {
 	struct amdgpu_vm    	*vm;
 	struct amdgpu_userq_mgr *userq_mgr;
 	struct list_head 	userq_node;
+
+	struct amdgpu_userq_fence_driver *fence_drv;
 };
 
 int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
-- 
2.25.1



* [PATCH 3/6] drm/amdgpu: Add mqd support for the fence address
  2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 1/6] drm/amdgpu: Implement a new 64bit sequence memory driver Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver Arunpravin Paneer Selvam
@ 2023-02-26 16:54 ` Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 4/6] drm/amdgpu: UAPI headers for userqueue Secure semaphore Arunpravin Paneer Selvam
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

- Add a field in struct v11_gfx_mqd for the userqueue
  fence address.

- Assign the fence GPU VA to the userqueue MQD
  fence address fields.

v2: Remove the mask and replace it with lower_32_bits (Christian)

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c | 3 +++
 drivers/gpu/drm/amd/include/v11_structs.h                 | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
index d2e5a42e1f75..b8943e6aea22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
@@ -83,6 +83,9 @@ amdgpu_userq_gfx_v11_mqd_create(struct amdgpu_userq_mgr *uq_mgr, struct amdgpu_u
     mqd->cp_rb_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
     mqd->cp_rb_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
 
+    mqd->fenceaddress_lo = lower_32_bits(queue->fence_drv->gpu_addr);
+    mqd->fenceaddress_hi = upper_32_bits(queue->fence_drv->gpu_addr);
+
     /* set up the gfx_hqd_control, similar as CP_RB0_CNTL */
     rb_bufsz = order_base_2(queue->queue_size / 4) - 1;
     tmp = RREG32_SOC15(GC, 0, regCP_GFX_HQD_CNTL);
diff --git a/drivers/gpu/drm/amd/include/v11_structs.h b/drivers/gpu/drm/amd/include/v11_structs.h
index f8008270f813..797ce6a1e56e 100644
--- a/drivers/gpu/drm/amd/include/v11_structs.h
+++ b/drivers/gpu/drm/amd/include/v11_structs.h
@@ -535,8 +535,8 @@ struct v11_gfx_mqd {
 	uint32_t reserved_507; // offset: 507  (0x1FB)
 	uint32_t reserved_508; // offset: 508  (0x1FC)
 	uint32_t reserved_509; // offset: 509  (0x1FD)
-	uint32_t reserved_510; // offset: 510  (0x1FE)
-	uint32_t reserved_511; // offset: 511  (0x1FF)
+	uint32_t fenceaddress_lo; // offset: 510  (0x1FE)
+	uint32_t fenceaddress_hi; // offset: 511  (0x1FF)
 };
 
 struct v11_sdma_mqd {
-- 
2.25.1



* [PATCH 4/6] drm/amdgpu: UAPI headers for userqueue Secure semaphore
  2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
                   ` (2 preceding siblings ...)
  2023-02-26 16:54 ` [PATCH 3/6] drm/amdgpu: Add mqd support for the fence address Arunpravin Paneer Selvam
@ 2023-02-26 16:54 ` Arunpravin Paneer Selvam
  2023-02-27 12:52   ` Christian König
  2023-02-26 16:54 ` [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions Arunpravin Paneer Selvam
  2023-02-26 16:54 ` [PATCH 6/6] drm/amdgpu: Enable userqueue fence interrupt handling support Arunpravin Paneer Selvam
  5 siblings, 1 reply; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

 - Add UAPI header support for the userqueue secure semaphore

   v2: (Christian)
     - Add BO handles, BO flags and padding fields
     - Include the value/va in a combined array
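
A hedged sketch of how userspace might drive the new signal ioctl
(illustration only; fd, queue_id, syncobj_handle, num_bos and bo_handles
are assumed to be set up beforehand):

	#include <stdint.h>
	#include <xf86drm.h>
	#include <drm/amdgpu_drm.h>

	struct drm_amdgpu_userq_signal signal_args = {
		.queue_id = queue_id,
		.handle = syncobj_handle,
		.num_bo_handles = num_bos,
		.bo_handles_array = (uintptr_t)bo_handles,
		.bo_flags = 0x1, /* AMDGPU_USERQ_BO_READ in the kernel header */
	};

	drmIoctl(fd, DRM_IOCTL_AMDGPU_USERQ_SIGNAL, &signal_args);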

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c |  1 +
 include/uapi/drm/amdgpu_drm.h                 | 46 +++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
index b8943e6aea22..5cb255a39732 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
@@ -22,6 +22,7 @@
  */
 #include "amdgpu.h"
 #include "amdgpu_userqueue.h"
+#include "amdgpu_userq_fence.h"
 #include "v11_structs.h"
 #include "amdgpu_mes.h"
 #include "mes_api_def.h"
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 2d94cca566e0..bd37c715f5a7 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -56,6 +56,8 @@ extern "C" {
 #define DRM_AMDGPU_SCHED		0x15
 #define DRM_AMDGPU_USERQ		0x16
 #define DRM_AMDGPU_USERQ_DOORBELL_RING		0x17
+#define DRM_AMDGPU_USERQ_SIGNAL		0x18
+#define DRM_AMDGPU_USERQ_WAIT		0x19
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -75,6 +77,8 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
 #define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
 #define DRM_IOCTL_AMDGPU_USERQ_DOORBELL_RING		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_DOORBELL_RING, struct drm_amdgpu_db_ring)
+#define DRM_IOCTL_AMDGPU_USERQ_SIGNAL	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_SIGNAL, struct drm_amdgpu_userq_signal)
+#define DRM_IOCTL_AMDGPU_USERQ_WAIT	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_WAIT, struct drm_amdgpu_userq_wait)
 
 /**
  * DOC: memory domains
@@ -361,6 +365,48 @@ union drm_amdgpu_userq {
 	struct drm_amdgpu_userq_out out;
 };
 
+/* userq signal/wait ioctl */
+struct drm_amdgpu_userq_signal {
+	/** Queue ID */
+	__u32	queue_id;
+	/** Flags */
+	__u32	flags;
+	/** Sync obj handle */
+	__u32	handle;
+	__u32	pad;
+	/** Sync obj timeline point */
+	__u64	point;
+	/** number of BO handles */
+	__u64	num_bo_handles;
+	/** array of BO handles */
+	__u64	bo_handles_array;
+	/** bo flags */
+	__u32	bo_flags;
+	__u32	pad2;
+};
+
+struct drm_amdgpu_userq_fence_info {
+	__u64	va;
+	__u64	value;
+};
+
+struct drm_amdgpu_userq_wait {
+	/** Flags */
+	__u32	flags;
+	__u32	pad;
+	/** array of Sync obj handles */
+	__u64	handles;
+	/** number of Sync obj handles */
+	__u64	count_handles;
+	/** number of BO handles */
+	__u64	num_bo_handles;
+	/** array of BO handles */
+	__u64	bo_handles_array;
+	/** bo flags */
+	__u32	bo_wait_flags;
+	__u32	pad2;
+	/** array of addr/values */
+	__u64	userq_fence_info;
+};
+
 /* vm ioctl */
 #define AMDGPU_VM_OP_RESERVE_VMID	1
 #define AMDGPU_VM_OP_UNRESERVE_VMID	2
-- 
2.25.1



* [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions
  2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
                   ` (3 preceding siblings ...)
  2023-02-26 16:54 ` [PATCH 4/6] drm/amdgpu: UAPI headers for userqueue Secure semaphore Arunpravin Paneer Selvam
@ 2023-02-26 16:54 ` Arunpravin Paneer Selvam
  2023-02-27 12:59   ` Christian König
  2023-02-26 16:54 ` [PATCH 6/6] drm/amdgpu: Enable userqueue fence interrupt handling support Arunpravin Paneer Selvam
  5 siblings, 1 reply; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

This patch introduces new IOCTLs for the userqueue secure semaphore.

The signal IOCTL, called from a userspace application, takes a drm
syncobj handle and an array of GEM BO handles as parameters, and the
driver installs the newly created fence into them.

The wait IOCTL gets an array of drm syncobjs, finds the fences
attached to those syncobjs and returns the array of
memory_address/fence_value combinations to userspace.
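
A sketch of how userspace might consume the returned va/value pairs
(illustration only; map_gpu_va() is a hypothetical helper that resolves
a GPU VA to a CPU pointer through the corresponding mapped BO):

	struct drm_amdgpu_userq_fence_info *info = fence_info_array;
	uint32_t i;

	for (i = 0; i < num_fences; i++) {
		volatile uint64_t *fence = map_gpu_va(info[i].va);

		/* Done once the value at the address reaches the fence
		 * value; a user queue can instead encode an equivalent
		 * wait packet rather than polling on the CPU.
		 */
		while (*fence < info[i].value)
			;
	}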

v2: Worked on review comments from Christian for the following
    modifications

    - Install the fence into the GEM BO object
    - Lock all BOs using the dma resv subsystem
    - Reorder the sequence in the signal IOCTL function
    - Get the write pointer from the shadow wptr
    - Use userq_fence to fetch the va/value in the wait IOCTL

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   3 +
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 258 ++++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |   1 +
 5 files changed, 270 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 1c3eba2d0390..255d73795493 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -964,6 +964,8 @@ struct amdgpu_device {
 	struct amdgpu_mes               mes;
 	struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
 
+	struct amdgpu_userq_mgr         *userq_mgr;
+
 	/* df */
 	struct amdgpu_df                df;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6b7ac1ebd04c..66a7304fabe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2752,6 +2752,9 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_DOORBELL_RING, amdgpu_userq_doorbell_ring_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 609a7328e9a6..26fd1d4f758a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -249,3 +249,261 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops = {
 	.signaled = amdgpu_userq_fence_signaled,
 	.release = amdgpu_userq_fence_release,
 };
+
+static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
+					struct amdgpu_usermode_queue *queue,
+					u64 *wptr)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_bo_va_mapping *mapping;
+	struct amdgpu_vm *vm = &fpriv->vm;
+	struct amdgpu_bo *bo;
+	u64 *ptr;
+	int r;
+
+	mapping = amdgpu_vm_bo_lookup_mapping(vm, queue->wptr_gpu_addr >> PAGE_SHIFT);
+	if (!mapping)
+		return -EINVAL;
+
+	bo = mapping->bo_va->base.bo;
+	r = amdgpu_bo_kmap(bo, (void **)&ptr);
+	if (r) {
+		DRM_ERROR("Failed mapping the userqueue wptr bo");
+		return r;
+	}
+
+	*wptr = le64_to_cpu(*ptr);
+
+	return 0;
+}
+
+int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *filp)
+{
+	struct drm_amdgpu_userq_signal *args = data;
+	struct amdgpu_device *adev = drm_to_adev(dev);
+	struct amdgpu_userq_mgr *userq_mgr = adev->userq_mgr;
+	struct amdgpu_usermode_queue *queue;
+	struct drm_syncobj *syncobj = NULL;
+	struct drm_gem_object **gobj;
+	u64 num_bo_handles, wptr;
+	struct dma_fence *fence;
+	u32 *bo_handles;
+	bool shared;
+	int r, i;
+
+	/* Retrieve the user queue */
+	queue = idr_find(&userq_mgr->userq_idr, args->queue_id);
+	if (!queue)
+		return -ENOENT;
+
+	r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
+	if (r)
+		return r;
+
+	/* Find Syncobj if any */
+	syncobj = drm_syncobj_find(filp, args->handle);
+
+	/* Array of bo handles */
+	num_bo_handles = args->num_bo_handles;
+	bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), GFP_KERNEL);
+	if (bo_handles == NULL)
+		return -ENOMEM;
+
+	if (copy_from_user(bo_handles, u64_to_user_ptr(args->bo_handles_array),
+			   sizeof(u32) * num_bo_handles)) {
+		r = -EFAULT;
+		goto err_free_handles;
+	}
+
+	/* Array of GEM object handles */
+	gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
+	if (gobj == NULL) {
+		r = -ENOMEM;
+		goto err_free_handles;
+	}
+
+	for (i = 0; i < num_bo_handles; i++) {
+		/* Retrieve GEM object */
+		gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
+		if (!gobj[i]) {
+			r = -ENOENT;
+			goto err_put_gobj;
+		}
+
+		dma_resv_lock(gobj[i]->resv, NULL);
+		r = dma_resv_reserve_fences(gobj[i]->resv, 1);
+		if (r) {
+			dma_resv_unlock(gobj[i]->resv);
+			goto err_put_gobj;
+		}
+		dma_resv_unlock(gobj[i]->resv);
+	}
+
+	/* Create a new fence */
+	r = amdgpu_userq_fence_create(queue, wptr, &fence);
+	if (r)
+		goto err_put_gobj;
+
+	/* Add the created fence to syncobj/BO's */
+	if (syncobj)
+		drm_syncobj_replace_fence(syncobj, fence);
+
+	shared = args->bo_flags & AMDGPU_USERQ_BO_READ;
+	for (i = 0; i < num_bo_handles; i++) {
+		dma_resv_lock(gobj[i]->resv, NULL);
+		dma_resv_add_fence(gobj[i]->resv, fence, shared ?
+				   DMA_RESV_USAGE_READ :
+				   DMA_RESV_USAGE_WRITE);
+		dma_resv_unlock(gobj[i]->resv);
+	}
+
+	for (i = 0; i < num_bo_handles; i++)
+		drm_gem_object_put(gobj[i]);
+
+	dma_fence_put(fence);
+
+	/* Free all handles */
+	kfree(bo_handles);
+	kfree(gobj);
+
+	return 0;
+
+err_put_gobj:
+	while (i-- > 0)
+		drm_gem_object_put(gobj[i]);
+	kfree(gobj);
+err_free_handles:
+	kfree(bo_handles);
+
+	return r;
+}
+
+int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
+			    struct drm_file *filp)
+{
+	struct drm_amdgpu_userq_fence_info *fence_info = NULL;
+	struct drm_amdgpu_userq_wait *args = data;
+	struct amdgpu_userq_fence *userq_fence;
+	u32 *syncobj_handles, *bo_handles;
+	u64 num_syncobj, num_bo_handles;
+	struct drm_gem_object **gobj;
+	struct dma_fence *fence;
+	bool wait_flag;
+	int r, i, tmp;
+
+	/* Array of Syncobj handles */
+	num_syncobj = args->count_handles;
+	syncobj_handles = kmalloc_array(num_syncobj, sizeof(*syncobj_handles), GFP_KERNEL);
+	if (!syncobj_handles)
+		return -ENOMEM;
+
+	if (copy_from_user(syncobj_handles, u64_to_user_ptr(args->handles),
+			   sizeof(u32) * num_syncobj)) {
+		r = -EFAULT;
+		goto err_free_syncobj_handles;
+	}
+
+	/* Array of bo handles */
+	num_bo_handles = args->num_bo_handles;
+	bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), GFP_KERNEL);
+	if (!bo_handles) {
+		r = -ENOMEM;
+		goto err_free_syncobj_handles;
+	}
+
+	if (copy_from_user(bo_handles, u64_to_user_ptr(args->bo_handles_array),
+			   sizeof(u32) * num_bo_handles)) {
+		r = -EFAULT;
+		goto err_free_bo_handles;
+	}
+
+	/* Array of fence gpu address */
+	fence_info = kmalloc_array(num_syncobj + num_bo_handles, sizeof(*fence_info), GFP_KERNEL);
+	if (!fence_info) {
+		r = -ENOMEM;
+		goto err_free_bo_handles;
+	}
+
+	/* Retrieve syncobj's fence */
+	for (i = 0; i < num_syncobj; i++) {
+		r = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, &fence);
+		if (r) {
+			DRM_ERROR("syncobj %u failed to find the fence!\n", syncobj_handles[i]);
+			r = -ENOENT;
+			goto err_free_fence_info;
+		}
+
+		/* Store drm syncobj's gpu va address and value */
+		userq_fence = to_amdgpu_userq_fence(fence);
+		fence_info[i].va = userq_fence->fence_drv->gpu_addr;
+		fence_info[i].value = fence->seqno;
+		dma_fence_put(fence);
+	}
+
+	tmp = i;
+
+	/* Array of GEM object handles */
+	gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
+	if (gobj == NULL) {
+		r = -ENOMEM;
+		goto err_free_fence_info;
+	}
+
+	/* Retrieve GEM objects's fence */
+	wait_flag = args->bo_wait_flags & AMDGPU_USERQ_BO_READ;
+	for (i = 0; i < num_bo_handles; i++, tmp++) {
+		struct dma_fence *bo_fence;
+
+		gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
+		if (!gobj[i]) {
+			r = -ENOENT;
+			goto err_put_gobj;
+		}
+
+		dma_resv_lock(gobj[i]->resv, NULL);
+		r = dma_resv_get_singleton(gobj[i]->resv,
+					   wait_flag ?
+					   DMA_RESV_USAGE_READ :
+					   DMA_RESV_USAGE_WRITE,
+					   &bo_fence);
+		if (r) {
+			dma_resv_unlock(gobj[i]->resv);
+			goto err_put_gobj;
+		}
+		dma_resv_unlock(gobj[i]->resv);
+		drm_gem_object_put(gobj[i]);
+
+		/* Store GEM objects's gpu va address and value */
+		userq_fence = to_amdgpu_userq_fence(bo_fence);
+		fence_info[tmp].va = userq_fence->fence_drv->gpu_addr;
+		fence_info[tmp].value = bo_fence->seqno;
+		dma_fence_put(bo_fence);
+	}
+
+	if (copy_to_user(u64_to_user_ptr(args->userq_fence_info), fence_info,
+			 (num_syncobj + num_bo_handles) * sizeof(*fence_info))) {
+		r = -EFAULT;
+		goto err_free_gobj;
+	}
+
+	/* Free all handles */
+	kfree(syncobj_handles);
+	kfree(bo_handles);
+	kfree(fence_info);
+	kfree(gobj);
+
+	return 0;
+
+err_put_gobj:
+	while (i-- > 0)
+		drm_gem_object_put(gobj[i]);
+err_free_gobj:
+	kfree(gobj);
+err_free_fence_info:
+	kfree(fence_info);
+err_free_bo_handles:
+	kfree(bo_handles);
+err_free_syncobj_handles:
+	kfree(syncobj_handles);
+
+	return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
index 230dd54b4cd3..999d1e2baff5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
@@ -27,6 +27,8 @@
 
 #include <linux/types.h>
 
+#define AMDGPU_USERQ_BO_READ	0x1
+
 struct amdgpu_userq_fence {
 	struct dma_fence base;
 	/* userq fence lock */
@@ -57,5 +59,9 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
 			      u64 seq, struct dma_fence **f);
 bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver *fence_drv);
 void amdgpu_userq_fence_driver_destroy(struct kref *ref);
+int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *filp);
+int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
+			    struct drm_file *filp);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index d4317b0f6487..4d3d6777a178 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -457,6 +457,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
     idr_init_base(&userq_mgr->userq_idr, 1);
     INIT_LIST_HEAD(&userq_mgr->userq_list);
     userq_mgr->adev = adev;
+    adev->userq_mgr = userq_mgr;
 
     r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
     if (r) {
-- 
2.25.1



* [PATCH 6/6] drm/amdgpu: Enable userqueue fence interrupt handling support
  2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
                   ` (4 preceding siblings ...)
  2023-02-26 16:54 ` [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions Arunpravin Paneer Selvam
@ 2023-02-26 16:54 ` Arunpravin Paneer Selvam
  5 siblings, 0 replies; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-26 16:54 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, christian.koenig, Arunpravin Paneer Selvam

- Added support to handle the userqueue protected fence signal
  hardware interrupt.

- Created a hash table which maps the fence VA to the fence driver.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  1 +
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   |  3 +++
 .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 20 ++++++++++++++++++-
 5 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 255d73795493..3380bf66dd66 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -965,6 +965,7 @@ struct amdgpu_device {
 	struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
 
 	struct amdgpu_userq_mgr         *userq_mgr;
+	DECLARE_HASHTABLE(userq_fence_table, 5);
 
 	/* df */
 	struct amdgpu_df                df;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 88097d12ced3..4caed7cc848d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3595,6 +3595,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	mutex_init(&adev->mn_lock);
 	mutex_init(&adev->virt.vf_errors.lock);
 	hash_init(adev->mn_hash);
+	hash_init(adev->userq_fence_table);
 	mutex_init(&adev->psp.mutex);
 	mutex_init(&adev->notifier_lock);
 	mutex_init(&adev->pm.stable_pstate_ctx_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 26fd1d4f758a..91c0ab2c9370 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -90,6 +90,9 @@ int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev,
 	INIT_LIST_HEAD(&fence_drv->fences);
 	spin_lock_init(&fence_drv->fence_list_lock);
 
+	hash_add(adev->userq_fence_table, &fence_drv->node,
+		 fence_drv->gpu_addr);
+
 	fence_drv->adev = adev;
 	fence_drv->context = dma_fence_context_alloc(1);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
index 999d1e2baff5..ceab0ccf68a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
@@ -39,6 +39,7 @@ struct amdgpu_userq_fence {
 
 struct amdgpu_userq_fence_driver {
 	struct kref refcount;
+	struct hlist_node node;
 	u64 gpu_addr;
 	u64 *cpu_addr;
 	u64 context;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index a56c6e106d00..425ecf58b343 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -30,6 +30,7 @@
 #include "amdgpu_psp.h"
 #include "amdgpu_smu.h"
 #include "amdgpu_atomfirmware.h"
+#include "amdgpu_userq_fence.h"
 #include "imu_v11_0.h"
 #include "soc21.h"
 #include "nvd.h"
@@ -5870,10 +5871,27 @@ static int gfx_v11_0_eop_irq(struct amdgpu_device *adev,
 	u8 me_id, pipe_id, queue_id;
 	struct amdgpu_ring *ring;
 	uint32_t mes_queue_id = entry->src_data[0];
+	struct hlist_node *tmp;
+	struct amdgpu_userq_fence_driver *f;
+	u32 upper32 = entry->src_data[1];
+	u32 lower32 = entry->src_data[2];
+	u64 fence_address = ((u64)upper32 << 32) | lower32;
 
 	DRM_DEBUG("IH: CP EOP\n");
 
-	if (adev->enable_mes && (mes_queue_id & AMDGPU_FENCE_MES_QUEUE_FLAG)) {
+	if (adev->enable_mes && fence_address) {
+		hash_for_each_safe(adev->userq_fence_table, i, tmp, f, node) {
+			if (fence_address == f->gpu_addr) {
+				hash_del(&f->node);
+				break;
+			}
+		}
+
+		if (f) {
+			DRM_DEBUG("user queue fence address 0x%llx\n", fence_address);
+			amdgpu_userq_fence_process(f);
+		}
+	} else if (adev->enable_mes && (mes_queue_id & AMDGPU_FENCE_MES_QUEUE_FLAG)) {
 		struct amdgpu_mes_queue *queue;
 
 		mes_queue_id &= AMDGPU_FENCE_MES_QUEUE_ID_MASK;
-- 
2.25.1



* Re: [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver
  2023-02-26 16:54 ` [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver Arunpravin Paneer Selvam
@ 2023-02-27 12:42   ` Christian König
  2023-02-28  8:45     ` Arunpravin Paneer Selvam
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-27 12:42 UTC (permalink / raw)
  To: Arunpravin Paneer Selvam, amd-gfx; +Cc: alexander.deucher

On 26.02.23 at 17:54, Arunpravin Paneer Selvam wrote:
> Developed a userqueue fence driver for the userqueue process's shared
> BO synchronization.
>
> Create a dma fence with the write pointer as the seqno and allocate
> seq64 memory for each user queue process; this memory address is fed
> into the firmware/hardware so that the firmware writes the read pointer
> into the given address when the process completes its execution.
> Compare wptr and rptr: if rptr >= wptr, signal the fences so that the
> waiting processes can consume the buffers.
>
> v2: Worked on review comments from Christian for the following
>      modifications
>
>      - Add wptr as the sequence number into the fence
>      - Add a reference count for the fence driver
>      - Add dma_fence_put below the list_del as it might free the userq fence
>      - Trim unnecessary code in the interrupt handler
>      - Check the dma fence signaled state in the dma fence creation function
>        for a potential problem of hardware completing the job processing
>        beforehand
>      - Add necessary locks
>      - Create a list and process all the unsignaled fences
>      - Clean up fences in the destroy function
>      - Implement the .signaled callback function

A few more nitpicks below, but from the technical side this looks
mostly clean.

>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 +
>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 251 ++++++++++++++++++
>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  61 +++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  20 ++
>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
>   6 files changed, 341 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index a239533a895f..ea09273b585f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   	amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>   	amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>   	amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
> -	amdgpu_ring_mux.o amdgpu_seq64.o
> +	amdgpu_ring_mux.o amdgpu_seq64.o amdgpu_userq_fence.o
>   
>   amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index bd3462d0da5f..6b7ac1ebd04c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -53,6 +53,7 @@
>   #include "amdgpu_xgmi.h"
>   #include "amdgpu_reset.h"
>   #include "amdgpu_userqueue.h"
> +#include "amdgpu_userq_fence.h"
>   
>   /*
>    * KMS wrapper.
> @@ -2827,6 +2828,10 @@ static int __init amdgpu_init(void)
>   	if (r)
>   		goto error_fence;
>   
> +	r = amdgpu_userq_fence_slab_init();
> +	if (r)
> +		goto error_fence;
> +
>   	DRM_INFO("amdgpu kernel modesetting enabled.\n");
>   	amdgpu_register_atpx_handler();
>   	amdgpu_acpi_detect();
> @@ -2851,6 +2856,7 @@ static void __exit amdgpu_exit(void)
>   	amdgpu_unregister_atpx_handler();
>   	amdgpu_sync_fini();
>   	amdgpu_fence_slab_fini();
> +	amdgpu_userq_fence_slab_fini();
>   	mmu_notifier_synchronize();
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> new file mode 100644
> index 000000000000..609a7328e9a6
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -0,0 +1,251 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright 2023 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/kref.h>
> +#include <linux/slab.h>
> +
> +#include <drm/drm_syncobj.h>
> +
> +#include "amdgpu.h"
> +#include "amdgpu_userq_fence.h"
> +#include "amdgpu_userqueue.h"
> +
> +static struct kmem_cache *amdgpu_userq_fence_slab;
> +
> +int amdgpu_userq_fence_slab_init(void)
> +{
> +	amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
> +						    sizeof(struct amdgpu_userq_fence),
> +						    0,
> +						    SLAB_HWCACHE_ALIGN,
> +						    NULL);
> +	if (!amdgpu_userq_fence_slab)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +void amdgpu_userq_fence_slab_fini(void)
> +{
> +	rcu_barrier();
> +	kmem_cache_destroy(amdgpu_userq_fence_slab);
> +}
> +
> +static inline struct amdgpu_userq_fence *to_amdgpu_userq_fence(struct dma_fence *f)
> +{
> +	struct amdgpu_userq_fence *__f = container_of(f, struct amdgpu_userq_fence, base);
> +
> +	if (!__f)
> +		return NULL;
> +
> +	if (__f->base.ops == &amdgpu_userq_fence_ops)
> +		return __f;
> +
> +	return NULL;
> +}
> +
> +static u64 amdgpu_userq_fence_read(struct amdgpu_userq_fence_driver *fence_drv)
> +{
> +	return le64_to_cpu(*fence_drv->cpu_addr);
> +}
> +
> +int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev,
> +				  struct amdgpu_usermode_queue *userq)

Looks like you misunderstood me a bit.

The usual naming convention in Linux for reference counted objects is 
like this:

_alloc() or similar for a function creating the object (the one which 
has the kref_init).
_get() for the function grabbing a reference.
_put() for the function releasing a reference.
_free() or _destroy() or similar for the function which is called by 
_put() to cleanup.
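
As a rough sketch of that pattern (foo is of course just a placeholder
here, not a suggested name):

    struct foo {
        struct kref refcount;
        /* ... payload ... */
    };

    /* _alloc() creates the object and holds the initial reference */
    struct foo *foo_alloc(void)
    {
        struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

        if (!f)
            return NULL;

        kref_init(&f->refcount);
        return f;
    }

    /* _get() grabs an additional reference */
    void foo_get(struct foo *f)
    {
        kref_get(&f->refcount);
    }

    /* _free() is only ever called through _put() */
    static void foo_free(struct kref *ref)
    {
        kfree(container_of(ref, struct foo, refcount));
    }

    /* _put() drops a reference, freeing the object on the last one */
    void foo_put(struct foo *f)
    {
        kref_put(&f->refcount, foo_free);
    }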

> +{
> +	struct amdgpu_userq_fence_driver *fence_drv;
> +	int r;
> +
> +	fence_drv = userq->fence_drv;
> +	if (!fence_drv)
> +		return -EINVAL;
> +
> +	/* Acquire seq64 memory */
> +	r = amdgpu_seq64_get(adev, &fence_drv->gpu_addr,
> +			     &fence_drv->cpu_addr);
> +	if (r)
> +		return -ENOMEM;
> +	
> +	kref_init(&fence_drv->refcount);
> +	INIT_LIST_HEAD(&fence_drv->fences);
> +	spin_lock_init(&fence_drv->fence_list_lock);
> +
> +	fence_drv->adev = adev;
> +	fence_drv->context = dma_fence_context_alloc(1);
> +
> +	get_task_comm(fence_drv->timeline_name, current);
> +
> +	return 0;
> +}
> +
> +void amdgpu_userq_fence_driver_destroy(struct kref *ref)
> +{
> +	struct amdgpu_userq_fence_driver *fence_drv = container_of(ref,
> +					 struct amdgpu_userq_fence_driver,
> +					 refcount);
> +	struct amdgpu_device *adev = fence_drv->adev;
> +	struct amdgpu_userq_fence *fence, *tmp;
> +	struct dma_fence *f;
> +	
> +	spin_lock(&fence_drv->fence_list_lock);
> +	list_for_each_entry_safe(fence, tmp, &fence_drv->fences, link) {
> +		f = &fence->base;
> +		
> +		if (!dma_fence_is_signaled(f)) {
> +			dma_fence_set_error(f, -ECANCELED);
> +			dma_fence_signal(f);
> +		}
> +		
> +		list_del(&fence->link);
> +		dma_fence_put(f);
> +	}
> +	
> +	WARN_ON_ONCE(!list_empty(&fence_drv->fences));
> +	spin_unlock(&fence_drv->fence_list_lock);
> +	
> +	/* Free seq64 memory */
> +	amdgpu_seq64_free(adev, fence_drv->gpu_addr);
> +	kfree(fence_drv);
> +}
> +
> +void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv)
> +{
> +	kref_put(&fence_drv->refcount, amdgpu_userq_fence_driver_destroy);
> +}
> +
> +int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> +			      u64 seq, struct dma_fence **f)
> +{
> +	struct amdgpu_userq_fence_driver *fence_drv;
> +	struct amdgpu_userq_fence *userq_fence;
> +	struct dma_fence *fence;
> +
> +	fence_drv = userq->fence_drv;
> +	if (!fence_drv)
> +		return -EINVAL;
> +
> +	userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, GFP_ATOMIC);
> +	if (!userq_fence)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&userq_fence->lock);
> +	INIT_LIST_HEAD(&userq_fence->link);
> +	fence = &userq_fence->base;
> +	userq_fence->fence_drv = fence_drv;
> +
> +	dma_fence_init(fence, &amdgpu_userq_fence_ops, &userq_fence->lock,
> +		       fence_drv->context, seq);
> +
> +	kref_get(&fence_drv->refcount);
> +
> +	spin_lock(&fence_drv->fence_list_lock);
> +	/* Check if hardware has already processed the job */
> +	if (!dma_fence_is_signaled(fence)) {
> +		dma_fence_get(fence);
> +		list_add_tail(&userq_fence->link, &fence_drv->fences);
> +		dma_fence_put(fence);
> +	}
> +	spin_unlock(&fence_drv->fence_list_lock);	

> +	/* Release the fence driver reference */
> +	amdgpu_userq_fence_driver_put(fence_drv);

Huh? That doesn't make much sense. We should keep the reference as long 
as the fence exists, or at least as long as it isn't signaled (the latter 
is probably better, but tricky to get right).
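
Untested sketch, reusing the names from this patch: let the fence itself
own the fence_drv reference taken in amdgpu_userq_fence_create() and
only drop it again when the fence is released:

    /* in amdgpu_userq_fence_create(), keep the kref_get() above but
     * drop the amdgpu_userq_fence_driver_put() at the end of the
     * function, the fence now owns that reference
     */

    static void amdgpu_userq_fence_release(struct dma_fence *f)
    {
        struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);

        /* drop the reference the fence took at creation time */
        amdgpu_userq_fence_driver_put(fence->fence_drv);
        call_rcu(&f->rcu, amdgpu_userq_fence_free);
    }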

> +
> +	*f = fence;
> +
> +	return 0;
> +}
> +
> +bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver *fence_drv)

Maybe name that function amdgpu_userq_fence_driver_process() and move 
that up a bit to group together the functions dealing with the 
fence_driver and the functions dealing with the fence.

> +{
> +	struct amdgpu_userq_fence *userq_fence, *tmp;
> +	struct dma_fence *fence;
> +	u64 rptr;
> +
> +	if (!fence_drv)
> +		return false;
> +
> +	rptr = amdgpu_userq_fence_read(fence_drv);
> +
> +	spin_lock(&fence_drv->fence_list_lock);
> +	list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, link) {
> +		fence = &userq_fence->base;
> +
> +		if (rptr >= fence->seqno) {
> +			dma_fence_signal(fence);
> +			list_del(&userq_fence->link);
> +
> +			dma_fence_put(fence);
> +		} else {
> +			break;
> +		}
> +	}
> +	spin_unlock(&fence_drv->fence_list_lock);
> +
> +	return true;

BTW: What's the return value good for? Could we drop that?

Regards,
Christian.

> +}
> +
> +static const char *amdgpu_userq_fence_get_driver_name(struct dma_fence *f)
> +{
> +	return "amdgpu_userqueue_fence";
> +}
> +
> +static const char *amdgpu_userq_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);
> +
> +	return fence->fence_drv->timeline_name;
> +}
> +
> +static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
> +{
> +	struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);
> +	struct amdgpu_userq_fence_driver *fence_drv = fence->fence_drv;
> +	u64 rptr, wptr;
> +
> +	rptr = amdgpu_userq_fence_read(fence_drv);
> +	wptr = fence->base.seqno;
> +
> +	if (rptr >= wptr)
> +		return true;
> +
> +	return false;
> +}
> +
> +static void amdgpu_userq_fence_free(struct rcu_head *rcu)
> +{
> +	struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
> +
> +	kmem_cache_free(amdgpu_userq_fence_slab, to_amdgpu_userq_fence(f));
> +}
> +
> +static void amdgpu_userq_fence_release(struct dma_fence *f)
> +{
> +	call_rcu(&f->rcu, amdgpu_userq_fence_free);
> +}
> +
> +static const struct dma_fence_ops amdgpu_userq_fence_ops = {
> +	.use_64bit_seqno = true,
> +	.get_driver_name = amdgpu_userq_fence_get_driver_name,
> +	.get_timeline_name = amdgpu_userq_fence_get_timeline_name,
> +	.signaled = amdgpu_userq_fence_signaled,
> +	.release = amdgpu_userq_fence_release,
> +};
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> new file mode 100644
> index 000000000000..230dd54b4cd3
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright 2023 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __AMDGPU_USERQ_FENCE_H__
> +#define __AMDGPU_USERQ_FENCE_H__
> +
> +#include <linux/types.h>
> +
> +struct amdgpu_userq_fence {
> +	struct dma_fence base;
> +	/* userq fence lock */
> +	spinlock_t lock;
> +	struct list_head link;
> +	struct amdgpu_userq_fence_driver *fence_drv;
> +};
> +
> +struct amdgpu_userq_fence_driver {
> +	struct kref refcount;
> +	u64 gpu_addr;
> +	u64 *cpu_addr;
> +	u64 context;
> +	/* fence list lock */
> +	spinlock_t fence_list_lock;
> +	struct list_head fences;
> +	struct amdgpu_device *adev;
> +	char timeline_name[TASK_COMM_LEN];
> +};
> +
> +static const struct dma_fence_ops amdgpu_userq_fence_ops;
> +
> +int amdgpu_userq_fence_slab_init(void);
> +void amdgpu_userq_fence_slab_fini(void);
> +int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev, struct amdgpu_usermode_queue *userq);
> +void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver *fence_drv);
> +int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> +			      u64 seq, struct dma_fence **f);
> +bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver *fence_drv);
> +void amdgpu_userq_fence_driver_destroy(struct kref *ref);
> +
> +#endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index 2f1aba1e9792..d4317b0f6487 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -23,6 +23,7 @@
>   
>   #include "amdgpu.h"
>   #include "amdgpu_vm.h"
> +#include "amdgpu_userq_fence.h"
>   
>   #define AMDGPU_USERQ_DOORBELL_INDEX (AMDGPU_NAVI10_DOORBELL_GFX_USERQUEUE_START + 4)
>   
> @@ -264,6 +265,8 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>       struct amdgpu_vm *vm = &fpriv->vm;
>       struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>       struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
> +    struct amdgpu_userq_fence_driver *fence_drv;
> +    struct amdgpu_device *adev = uq_mgr->adev;
>   
>       pasid = vm->pasid;
>       if (vm->pasid < 0) {
> @@ -280,6 +283,19 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>           return -ENOMEM;
>       }
>   
> +    fence_drv = kzalloc(sizeof(struct amdgpu_userq_fence_driver), GFP_KERNEL);
> +    if (!fence_drv) {
> +	    DRM_ERROR("Failed to allocate memory for fence driver\n");
> +	    return -ENOMEM;
> +    }
> +
> +    queue->fence_drv = fence_drv;
> +    r = amdgpu_userq_fence_driver_get(adev, queue);
> +    if (r) {
> +	    DRM_ERROR("Failed to get fence driver\n");
> +	    goto free_fence_drv;
> +    }
> +
>       queue->vm = vm;
>       queue->pasid = pasid;
>       queue->wptr_gpu_addr = mqd_in->wptr_va;
> @@ -339,6 +355,9 @@ static int amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq
>   free_qid:
>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>   
> +free_fence_drv:
> +    amdgpu_userq_fence_driver_put(queue->fence_drv);
> +
>   free_queue:
>       mutex_unlock(&uq_mgr->userq_mutex);
>       kfree(queue);
> @@ -363,6 +382,7 @@ static void amdgpu_userqueue_destroy(struct drm_file *filp, int queue_id)
>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>       list_del(&queue->userq_node);
> +    amdgpu_userq_fence_driver_put(queue->fence_drv);
>       mutex_unlock(&uq_mgr->userq_mutex);
>       kfree(queue);
>   }
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index bda27583b58c..cf7264df8c46 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -73,6 +73,8 @@ struct amdgpu_usermode_queue {
>   	struct amdgpu_vm    	*vm;
>   	struct amdgpu_userq_mgr *userq_mgr;
>   	struct list_head 	userq_node;
> +
> +	struct amdgpu_userq_fence_driver *fence_drv;
>   };
>   
>   int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] drm/amdgpu: UAPI headers for userqueue Secure semaphore
  2023-02-26 16:54 ` [PATCH 4/6] drm/amdgpu: UAPI headers for userqueue Secure semaphore Arunpravin Paneer Selvam
@ 2023-02-27 12:52   ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2023-02-27 12:52 UTC (permalink / raw)
  To: Arunpravin Paneer Selvam, amd-gfx; +Cc: alexander.deucher

On 26.02.23 17:54, Arunpravin Paneer Selvam wrote:
>   - Add UAPI header support for userqueue Secure semaphore
>
>     v2: (Christian)
>       - Add bo handles,bo flags and padding fields.
>       - Include value/va in a combined array.
>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> ---
>   .../amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c |  1 +
>   include/uapi/drm/amdgpu_drm.h                 | 46 +++++++++++++++++++
>   2 files changed, 47 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> index b8943e6aea22..5cb255a39732 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_mqd_gfx_v11.c
> @@ -22,6 +22,7 @@
>    */
>   #include "amdgpu.h"
>   #include "amdgpu_userqueue.h"
> +#include "amdgpu_userq_fence.h"
>   #include "v11_structs.h"
>   #include "amdgpu_mes.h"
>   #include "mes_api_def.h"
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 2d94cca566e0..bd37c715f5a7 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -56,6 +56,8 @@ extern "C" {
>   #define DRM_AMDGPU_SCHED		0x15
>   #define DRM_AMDGPU_USERQ		0x16
>   #define DRM_AMDGPU_USERQ_DOORBELL_RING		0x17
> +#define DRM_AMDGPU_USERQ_SIGNAL		0x18
> +#define DRM_AMDGPU_USERQ_WAIT		0x19
>   
>   #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>   #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
> @@ -75,6 +77,8 @@ extern "C" {
>   #define DRM_IOCTL_AMDGPU_SCHED		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>   #define DRM_IOCTL_AMDGPU_USERQ		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ, union drm_amdgpu_userq)
>   #define DRM_IOCTL_AMDGPU_USERQ_DOORBELL_RING		DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_DOORBELL_RING, struct drm_amdgpu_db_ring)
> +#define DRM_IOCTL_AMDGPU_USERQ_SIGNAL	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_SIGNAL, struct drm_amdgpu_userq_signal)
> +#define DRM_IOCTL_AMDGPU_USERQ_WAIT	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_WAIT, struct drm_amdgpu_userq_wait)
>   
>   /**
>    * DOC: memory domains
> @@ -361,6 +365,48 @@ union drm_amdgpu_userq {
>   	struct drm_amdgpu_userq_out out;
>   };
>   
> +/* userq signal/wait ioctl */
> +struct drm_amdgpu_userq_signal {
> +	/** Queue ID */
> +	__u32	queue_id;
> +	/** Flags */
> +	__u32   flags;
> +	/** Sync obj handle */
> +	__u32   handle;

Maybe rename to obj_handle or even syncobj_handle.

> +	__u32	pad;
> +	/* Sync obj timeline */
> +	__u64	point;

Same prefix here, either obj_point or even better syncobj_point.

> +	/** number of BO handles */
> +	__u64   num_bo_handles;

This can be 32bit I think. And when you move the bo_flags after this 
field we don't even need any padding.

> +	/** array of BO handles */
> +	__u64   bo_handles_array;
> +	/** bo flags */
> +	__u32 bo_flags;
> +};
> +
> +struct drm_amdgpu_userq_fence_info {
> +	__u64	va;
> +	__u64	value;
> +};
> +
> +struct drm_amdgpu_userq_wait {
> +	/** Flags */
> +	__u32   flags;
> +	/** array of Sync obj handles */
> +	__u64   handles;
> +	__u32   pad;

That padding is misplaced as far as I can see.

> +	/** number of Sync obj handles */
> +	__u64	count_handles;

Better let's use num_syncobj_handles here as well. And u32 should be 
sufficient.

If you re-arrange the fields you could also drop the padding.
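
For illustration, one possible layout (the num_fences field here
anticipates the comment on userq_fence_info below):

    struct drm_amdgpu_userq_wait {
        /** Flags */
        __u32    flags;
        /** number of Sync obj handles */
        __u32    num_syncobj_handles;
        /** array of Sync obj handles */
        __u64    syncobj_handles_array;
        /** number of BO handles */
        __u32    num_bo_handles;
        /** bo wait flags */
        __u32    bo_wait_flags;
        /** array of BO handles */
        __u64    bo_handles_array;
        /** number of fence info entries, in/out */
        __u32    num_fences;
        __u32    pad;
        /** array of addr/values */
        __u64    userq_fence_info;
    };

With each counter as u32 directly in front of its array pointer all the
u64 members stay naturally aligned and only a single explicit pad
remains.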

> +	/** number of BO handles */
> +	__u64	num_bo_handles;

Again u32 is sufficient for the number of handles.

> +	/** array of BO handles */
> +	__u64   bo_handles_array;
> +	/** bo flags */
> +	__u32 	bo_wait_flags;
> +	/** array of addr/values */
> +	__u64	userq_fence_info;

We need a number for that as well. It can be that a BO is idle and has 
no fences and it can be that we have tons of fences on a single BO.

The usual semantics is that you call the kernel driver with the 
userq_fence_info==0 first to get how much space you need to reserve and 
then once more after you allocated enough.

See function sync_file_ioctl_fence_info() for a good example of how to do this.
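
The pattern boils down to something like this (simplified sketch,
assuming a new in/out num_fences field in the struct):

    /* first call: userspace passes userq_fence_info == 0 and only
     * gets back the number of entries it has to allocate
     */
    if (!args->userq_fence_info) {
        args->num_fences = num_fences;
        return 0;
    }

    /* second call: never copy more than userspace has reserved */
    if (args->num_fences < num_fences)
        return -EINVAL;

    if (copy_to_user(u64_to_user_ptr(args->userq_fence_info), fence_info,
                     num_fences * sizeof(*fence_info)))
        return -EFAULT;

    args->num_fences = num_fences;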

Regards,
Christian.


> +};
> +
>   /* vm ioctl */
>   #define AMDGPU_VM_OP_RESERVE_VMID	1
>   #define AMDGPU_VM_OP_UNRESERVE_VMID	2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions
  2023-02-26 16:54 ` [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions Arunpravin Paneer Selvam
@ 2023-02-27 12:59   ` Christian König
  2023-02-27 13:20     ` Arunpravin Paneer Selvam
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-27 12:59 UTC (permalink / raw)
  To: Arunpravin Paneer Selvam, amd-gfx; +Cc: alexander.deucher

On 26.02.23 17:54, Arunpravin Paneer Selvam wrote:
> This patch introduces a new IOCTL for the userqueue secure semaphore.
>
> The signal IOCTL called from the userspace application creates a drm
> syncobj and an array of bo GEM handles which are passed in as parameters
> to the driver to install the fence into them.
>
> The wait IOCTL gets an array of drm syncobjs, finds the fences
> attached to the drm syncobjs and obtains the array of
> memory_address/fence_value combinations which are returned to
> userspace.
>
> v2: Worked on review comments from Christian for the following
>      modifications
>
>      - Install fence into GEM BO object.
>      - Lock all BO's using the dma resv subsystem
>      - Reorder the sequence in signal IOCTL function.
>      - Get write pointer from the shadow wptr
>      - use userq_fence to fetch the va/value in wait IOCTL.
>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   3 +
>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 258 ++++++++++++++++++
>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |   1 +
>   5 files changed, 270 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 1c3eba2d0390..255d73795493 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -964,6 +964,8 @@ struct amdgpu_device {
>   	struct amdgpu_mes               mes;
>   	struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
>   
> +	struct amdgpu_userq_mgr         *userq_mgr;
> +
>   	/* df */
>   	struct amdgpu_df                df;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6b7ac1ebd04c..66a7304fabe3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2752,6 +2752,9 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>   	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_DOORBELL_RING, amdgpu_userq_doorbell_ring_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> +
>   };
>   
>   static const struct drm_driver amdgpu_kms_driver = {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 609a7328e9a6..26fd1d4f758a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -249,3 +249,261 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops = {
>   	.signaled = amdgpu_userq_fence_signaled,
>   	.release = amdgpu_userq_fence_release,
>   };
> +
> +static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
> +					struct amdgpu_usermode_queue *queue,
> +					u64 *wptr)
> +{
> +	struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +	struct amdgpu_bo_va_mapping *mapping;
> +	struct amdgpu_vm *vm = &fpriv->vm;
> +	struct amdgpu_bo *bo;
> +	u64 *ptr;
> +	int r;
> +
> +	mapping = amdgpu_vm_bo_lookup_mapping(vm, queue->wptr_gpu_addr >> PAGE_SHIFT);
> +	if (!mapping)
> +		return -EINVAL;
> +
> +	bo = mapping->bo_va->base.bo;
> +	r = amdgpu_bo_kmap(bo, (void **)&ptr);

Oh, that's not something you can do that easily.

The BO must be reserved (locked) first if you want to call 
amdgpu_bo_kmap() on it.
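
Minimal sketch of what that should look like here:

    r = amdgpu_bo_reserve(bo, false);
    if (r)
        return r;

    r = amdgpu_bo_kmap(bo, (void **)&ptr);
    if (r) {
        amdgpu_bo_unreserve(bo);
        return r;
    }

    *wptr = le64_to_cpu(*ptr);

    amdgpu_bo_kunmap(bo);
    amdgpu_bo_unreserve(bo);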

> +	if (r) {
> +		DRM_ERROR("Failed mapping the userqueue wptr bo");
> +		return r;
> +	}
> +
> +	*wptr = le64_to_cpu(*ptr);
> +
> +	return 0;
> +}
> +
> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *filp)
> +{
> +	struct drm_amdgpu_userq_signal *args = data;
> +	struct amdgpu_device *adev = drm_to_adev(dev);
> +	struct amdgpu_userq_mgr *userq_mgr = adev->userq_mgr;
> +	struct amdgpu_usermode_queue *queue;
> +	struct drm_syncobj *syncobj = NULL;
> +	struct drm_gem_object **gobj;
> +	u64 num_bo_handles, wptr;
> +	struct dma_fence *fence;
> +	u32 *bo_handles;
> +	bool shared;
> +	int r, i;
> +
> +	/* Retrieve the user queue */
> +	queue = idr_find(&userq_mgr->userq_idr, args->queue_id);
> +	if (!queue)
> +		return -ENOENT;
> +
> +	r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
> +	if (r)
> +		return -EINVAL;
> +
> +	/* Find Syncobj if any */
> +	syncobj = drm_syncobj_find(filp, args->handle);
> +
> +	/* Array of bo handles */
> +	num_bo_handles = args->num_bo_handles;
> +	bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), GFP_KERNEL);
> +	if (bo_handles == NULL)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(bo_handles, u64_to_user_ptr(args->bo_handles_array),
> +			   sizeof(u32) * num_bo_handles)) {
> +		r = -EFAULT;
> +		goto err_free_handles;
> +	}
> +
> +	/* Array of GEM object handles */
> +	gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
> +	if (gobj == NULL) {
> +		r = -ENOMEM;
> +		goto err_free_handles;
> +	}
> +
> +	for (i = 0; i < num_bo_handles; i++) {
> +		/* Retrieve GEM object */
> +		gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
> +		if (!gobj[i]) {
> +			r = -ENOENT;
> +			goto err_put_gobj;
> +		}
> +
> +		dma_resv_lock(gobj[i]->resv, NULL);
> +		r = dma_resv_reserve_fences(gobj[i]->resv, 1);
> +		if (r) {
> +			dma_resv_unlock(gobj[i]->resv);
> +			goto err_put_gobj;
> +		}
> +		dma_resv_unlock(gobj[i]->resv);
> +	}
> +
> +	/* Create a new fence */
> +	r = amdgpu_userq_fence_create(queue, wptr, &fence);
> +	if (!fence)
> +		goto err_put_gobj;
> +
> +	/* Add the created fence to syncobj/BO's */
> +	if (syncobj)
> +		drm_syncobj_replace_fence(syncobj, fence);
> +
> +	shared = args->bo_flags & AMDGPU_USERQ_BO_READ;
> +	for (i = 0; i < num_bo_handles; i++) {
> +		dma_resv_lock(gobj[i]->resv, NULL);
> +		dma_resv_add_fence(gobj[i]->resv, fence, shared ?
> +				   DMA_RESV_USAGE_READ :
> +				   DMA_RESV_USAGE_WRITE);
> +		dma_resv_unlock(gobj[i]->resv);
> +	}

That will still not work correctly. You've forgotten to call 
dma_resv_reserve_fences().

The tricky part is that you need to do this for all BOs at the same time.

I will work on my drm_exec() patch set once more. That one should make 
this job much easier.

Similar applies to the _wait function, but let's get _signal working first.

Regards,
Christian.

> +
> +	for (i = 0; i < num_bo_handles; i++)
> +		drm_gem_object_put(gobj[i]);
> +
> +	dma_fence_put(fence);
> +
> +	/* Free all handles */
> +	kfree(bo_handles);
> +	kfree(gobj);
> +
> +	return 0;
> +
> +err_put_gobj:
> +	while (i-- > 0)
> +		drm_gem_object_put(gobj[i]);
> +	kfree(gobj);
> +err_free_handles:
> +	kfree(bo_handles);
> +
> +	return r;
> +}
> +
> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> +			    struct drm_file *filp)
> +{
> +	struct drm_amdgpu_userq_fence_info *fence_info = NULL;
> +	struct drm_amdgpu_userq_wait *args = data;
> +	struct amdgpu_userq_fence *userq_fence;
> +	u32 *syncobj_handles, *bo_handles;
> +	u64 num_syncobj, num_bo_handles;
> +	struct drm_gem_object **gobj;
> +	struct dma_fence *fence;
> +	bool wait_flag;
> +	int r, i, tmp;
> +
> +	/* Array of Syncobj handles */
> +	num_syncobj = args->count_handles;
> +	syncobj_handles = kmalloc_array(num_syncobj, sizeof(*syncobj_handles), GFP_KERNEL);
> +	if (!syncobj_handles)
> +		return -ENOMEM;
> +
> +	if (copy_from_user(syncobj_handles, u64_to_user_ptr(args->handles),
> +			   sizeof(u32) * num_syncobj)) {
> +		r = -EFAULT;
> +		goto err_free_syncobj_handles;
> +	}
> +
> +	/* Array of bo handles */
> +	num_bo_handles = args->num_bo_handles;
> +	bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), GFP_KERNEL);
> +	if (!bo_handles)
> +		goto err_free_syncobj_handles;
> +
> +	if (copy_from_user(bo_handles, u64_to_user_ptr(args->bo_handles_array),
> +			   sizeof(u32) * num_bo_handles)) {
> +		r = -EFAULT;
> +		goto err_free_bo_handles;
> +	}
> +
> +	/* Array of fence gpu address */
> +	fence_info = kmalloc_array(num_syncobj + num_bo_handles, sizeof(*fence_info), GFP_KERNEL);
> +	if (!fence_info) {
> +		r = -ENOMEM;
> +		goto err_free_bo_handles;
> +	}
> +
> +	/* Retrieve syncobj's fence */
> +	for (i = 0; i < num_syncobj; i++) {
> +		r = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, &fence);
> +		if (r) {
> +			DRM_ERROR("syncobj %u failed to find the fence!\n", syncobj_handles[i]);
> +			r = -ENOENT;
> +			goto err_free_fence_info;
> +		}
> +
> +		/* Store drm syncobj's gpu va address and value */
> +		userq_fence = to_amdgpu_userq_fence(fence);
> +		fence_info[i].va = userq_fence->fence_drv->gpu_addr;
> +		fence_info[i].value = fence->seqno;
> +		dma_fence_put(fence);
> +	}
> +
> +	tmp = i;
> +
> +	/* Array of GEM object handles */
> +	gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
> +	if (gobj == NULL) {
> +		r = -ENOMEM;
> +		goto err_free_fence_info;
> +	}
> +
> +	/* Retrieve GEM objects's fence */
> +	wait_flag = args->bo_wait_flags & AMDGPU_USERQ_BO_READ;
> +	for (i = 0; i < num_bo_handles; i++, tmp++) {
> +		struct dma_fence *bo_fence;
> +
> +		gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
> +		if (!gobj[i]) {
> +			r = -ENOENT;
> +			goto err_put_gobj;
> +		}
> +
> +		dma_resv_lock(gobj[i]->resv, NULL);
> +		r = dma_resv_get_singleton(gobj[i]->resv,
> +					   wait_flag ?
> +					   DMA_RESV_USAGE_READ :
> +					   DMA_RESV_USAGE_WRITE,
> +					   &bo_fence);
> +		if (r) {
> +			dma_resv_unlock(gobj[i]->resv);
> +			goto err_put_gobj;
> +		}
> +		dma_resv_unlock(gobj[i]->resv);
> +		drm_gem_object_put(gobj[i]);
> +
> +		/* Store GEM objects's gpu va address and value */
> +		userq_fence = to_amdgpu_userq_fence(bo_fence);
> +		fence_info[tmp].va = userq_fence->fence_drv->gpu_addr;
> +		fence_info[tmp].value = bo_fence->seqno;
> +		dma_fence_put(bo_fence);
> +	}
> +
> +	if (copy_to_user(u64_to_user_ptr(args->userq_fence_info),
> +	    fence_info, sizeof(fence_info))) {
> +		r = -EFAULT;
> +		goto err_free_gobj;
> +	}
> +
> +	/* Free all handles */
> +	kfree(syncobj_handles);
> +	kfree(bo_handles);
> +	kfree(fence_info);
> +	kfree(gobj);
> +
> +	return 0;
> +
> +err_put_gobj:
> +	while (i-- > 0)
> +		drm_gem_object_put(gobj[i]);
> +err_free_gobj:
> +	kfree(gobj);
> +err_free_fence_info:
> +	kfree(fence_info);
> +err_free_bo_handles:
> +	kfree(bo_handles);
> +err_free_syncobj_handles:
> +	kfree(syncobj_handles);
> +
> +	return r;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> index 230dd54b4cd3..999d1e2baff5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> @@ -27,6 +27,8 @@
>   
>   #include <linux/types.h>
>   
> +#define AMDGPU_USERQ_BO_READ	0x1
> +
>   struct amdgpu_userq_fence {
>   	struct dma_fence base;
>   	/* userq fence lock */
> @@ -57,5 +59,9 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
>   			      u64 seq, struct dma_fence **f);
>   bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver *fence_drv);
>   void amdgpu_userq_fence_driver_destroy(struct kref *ref);
> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *filp);
> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> +			    struct drm_file *filp);
>   
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index d4317b0f6487..4d3d6777a178 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -457,6 +457,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>       idr_init_base(&userq_mgr->userq_idr, 1);
>       INIT_LIST_HEAD(&userq_mgr->userq_list);
>       userq_mgr->adev = adev;
> +    adev->userq_mgr = userq_mgr;
>   
>       r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>       if (r) {


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions
  2023-02-27 12:59   ` Christian König
@ 2023-02-27 13:20     ` Arunpravin Paneer Selvam
  2023-02-27 13:23       ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-27 13:20 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher

Hi Christian,


On 2/27/2023 6:29 PM, Christian König wrote:
> On 26.02.23 17:54, Arunpravin Paneer Selvam wrote:
>> This patch introduces a new IOCTL for the userqueue secure semaphore.
>>
>> The signal IOCTL called from the userspace application creates a drm
>> syncobj and an array of bo GEM handles which are passed in as parameters
>> to the driver to install the fence into them.
>>
>> The wait IOCTL gets an array of drm syncobjs, finds the fences
>> attached to the drm syncobjs and obtains the array of
>> memory_address/fence_value combinations which are returned to
>> userspace.
>>
>> v2: Worked on review comments from Christian for the following
>>      modifications
>>
>>      - Install fence into GEM BO object.
>>      - Lock all BO's using the dma resv subsystem
>>      - Reorder the sequence in signal IOCTL function.
>>      - Get write pointer from the shadow wptr
>>      - use userq_fence to fetch the va/value in wait IOCTL.
>>
>> Signed-off-by: Arunpravin Paneer Selvam 
>> <Arunpravin.PaneerSelvam@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   3 +
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 258 ++++++++++++++++++
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |   6 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |   1 +
>>   5 files changed, 270 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 1c3eba2d0390..255d73795493 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -964,6 +964,8 @@ struct amdgpu_device {
>>       struct amdgpu_mes               mes;
>>       struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
>>   +    struct amdgpu_userq_mgr         *userq_mgr;
>> +
>>       /* df */
>>       struct amdgpu_df                df;
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 6b7ac1ebd04c..66a7304fabe3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2752,6 +2752,9 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] 
>> = {
>>       DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>>       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>>       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_DOORBELL_RING, 
>> amdgpu_userq_doorbell_ring_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, 
>> amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, 
>> DRM_AUTH|DRM_RENDER_ALLOW),
>> +
>>   };
>>     static const struct drm_driver amdgpu_kms_driver = {
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> index 609a7328e9a6..26fd1d4f758a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> @@ -249,3 +249,261 @@ static const struct dma_fence_ops 
>> amdgpu_userq_fence_ops = {
>>       .signaled = amdgpu_userq_fence_signaled,
>>       .release = amdgpu_userq_fence_release,
>>   };
>> +
>> +static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
>> +                    struct amdgpu_usermode_queue *queue,
>> +                    u64 *wptr)
>> +{
>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> +    struct amdgpu_bo_va_mapping *mapping;
>> +    struct amdgpu_vm *vm = &fpriv->vm;
>> +    struct amdgpu_bo *bo;
>> +    u64 *ptr;
>> +    int r;
>> +
>> +    mapping = amdgpu_vm_bo_lookup_mapping(vm, queue->wptr_gpu_addr 
>> >> PAGE_SHIFT);
>> +    if (!mapping)
>> +        return -EINVAL;
>> +
>> +    bo = mapping->bo_va->base.bo;
>> +    r = amdgpu_bo_kmap(bo, (void **)&ptr);
>
> Oh, that's not something you can do that easily.
>
> The BO must be reserved (locked) first if you want to call 
> amdgpu_bo_kmap() on it.
sure, I will take care
>
>> +    if (r) {
>> +        DRM_ERROR("Failed mapping the userqueue wptr bo");
>> +        return r;
>> +    }
>> +
>> +    *wptr = le64_to_cpu(*ptr);
>> +
>> +    return 0;
>> +}
>> +
>> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>> +                  struct drm_file *filp)
>> +{
>> +    struct drm_amdgpu_userq_signal *args = data;
>> +    struct amdgpu_device *adev = drm_to_adev(dev);
>> +    struct amdgpu_userq_mgr *userq_mgr = adev->userq_mgr;
>> +    struct amdgpu_usermode_queue *queue;
>> +    struct drm_syncobj *syncobj = NULL;
>> +    struct drm_gem_object **gobj;
>> +    u64 num_bo_handles, wptr;
>> +    struct dma_fence *fence;
>> +    u32 *bo_handles;
>> +    bool shared;
>> +    int r, i;
>> +
>> +    /* Retrieve the user queue */
>> +    queue = idr_find(&userq_mgr->userq_idr, args->queue_id);
>> +    if (!queue)
>> +        return -ENOENT;
>> +
>> +    r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
>> +    if (r)
>> +        return -EINVAL;
>> +
>> +    /* Find Syncobj if any */
>> +    syncobj = drm_syncobj_find(filp, args->handle);
>> +
>> +    /* Array of bo handles */
>> +    num_bo_handles = args->num_bo_handles;
>> +    bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), 
>> GFP_KERNEL);
>> +    if (bo_handles == NULL)
>> +        return -ENOMEM;
>> +
>> +    if (copy_from_user(bo_handles, 
>> u64_to_user_ptr(args->bo_handles_array),
>> +               sizeof(u32) * num_bo_handles)) {
>> +        r = -EFAULT;
>> +        goto err_free_handles;
>> +    }
>> +
>> +    /* Array of GEM object handles */
>> +    gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>> +    if (gobj == NULL) {
>> +        r = -ENOMEM;
>> +        goto err_free_handles;
>> +    }
>> +
>> +    for (i = 0; i < num_bo_handles; i++) {
>> +        /* Retrieve GEM object */
>> +        gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
>> +        if (!gobj[i]) {
>> +            r = -ENOENT;
>> +            goto err_put_gobj;
>> +        }
>> +
>> +        dma_resv_lock(gobj[i]->resv, NULL);
>> +        r = dma_resv_reserve_fences(gobj[i]->resv, 1);
I am reserving space for each BO here, should we move this down?

Thanks,
Arun
>> +        if (r) {
>> +            dma_resv_unlock(gobj[i]->resv);
>> +            goto err_put_gobj;
>> +        }
>> +        dma_resv_unlock(gobj[i]->resv);
>> +    }
>> +
>> +    /* Create a new fence */
>> +    r = amdgpu_userq_fence_create(queue, wptr, &fence);
>> +    if (!fence)
>> +        goto err_put_gobj;
>> +
>> +    /* Add the created fence to syncobj/BO's */
>> +    if (syncobj)
>> +        drm_syncobj_replace_fence(syncobj, fence);
>> +
>> +    shared = args->bo_flags & AMDGPU_USERQ_BO_READ;
>> +    for (i = 0; i < num_bo_handles; i++) {
>> +        dma_resv_lock(gobj[i]->resv, NULL);
>> +        dma_resv_add_fence(gobj[i]->resv, fence, shared ?
>> +                   DMA_RESV_USAGE_READ :
>> +                   DMA_RESV_USAGE_WRITE);
>> +        dma_resv_unlock(gobj[i]->resv);
>> +    }
>
> That will still not work correctly. You've forgotten to call 
> dma_resv_reserve_fences().
>
> The tricky part is that you need to do this for all BOs at the same time.
>
> I will work on my drm_exec() patch set once more. That one should make 
> this job much easier.
>
> Similar applies to the _wait function, but let's get _signal working 
> first.
>
> Regards,
> Christian.
>
>> +
>> +    for (i = 0; i < num_bo_handles; i++)
>> +        drm_gem_object_put(gobj[i]);
>> +
>> +    dma_fence_put(fence);
>> +
>> +    /* Free all handles */
>> +    kfree(bo_handles);
>> +    kfree(gobj);
>> +
>> +    return 0;
>> +
>> +err_put_gobj:
>> +    while (i-- > 0)
>> +        drm_gem_object_put(gobj[i]);
>> +    kfree(gobj);
>> +err_free_handles:
>> +    kfree(bo_handles);
>> +
>> +    return r;
>> +}
>> +
>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>> +                struct drm_file *filp)
>> +{
>> +    struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>> +    struct drm_amdgpu_userq_wait *args = data;
>> +    struct amdgpu_userq_fence *userq_fence;
>> +    u32 *syncobj_handles, *bo_handles;
>> +    u64 num_syncobj, num_bo_handles;
>> +    struct drm_gem_object **gobj;
>> +    struct dma_fence *fence;
>> +    bool wait_flag;
>> +    int r, i, tmp;
>> +
>> +    /* Array of Syncobj handles */
>> +    num_syncobj = args->count_handles;
>> +    syncobj_handles = kmalloc_array(num_syncobj, 
>> sizeof(*syncobj_handles), GFP_KERNEL);
>> +    if (!syncobj_handles)
>> +        return -ENOMEM;
>> +
>> +    if (copy_from_user(syncobj_handles, u64_to_user_ptr(args->handles),
>> +               sizeof(u32) * num_syncobj)) {
>> +        r = -EFAULT;
>> +        goto err_free_syncobj_handles;
>> +    }
>> +
>> +    /* Array of bo handles */
>> +    num_bo_handles = args->num_bo_handles;
>> +    bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), 
>> GFP_KERNEL);
>> +    if (!bo_handles)
>> +        goto err_free_syncobj_handles;
>> +
>> +    if (copy_from_user(bo_handles, 
>> u64_to_user_ptr(args->bo_handles_array),
>> +               sizeof(u32) * num_bo_handles)) {
>> +        r = -EFAULT;
>> +        goto err_free_bo_handles;
>> +    }
>> +
>> +    /* Array of fence gpu address */
>> +    fence_info = kmalloc_array(num_syncobj + num_bo_handles, 
>> sizeof(*fence_info), GFP_KERNEL);
>> +    if (!fence_info) {
>> +        r = -ENOMEM;
>> +        goto err_free_bo_handles;
>> +    }
>> +
>> +    /* Retrieve syncobj's fence */
>> +    for (i = 0; i < num_syncobj; i++) {
>> +        r = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, 
>> &fence);
>> +        if (r) {
>> +            DRM_ERROR("syncobj %u failed to find the fence!\n", 
>> syncobj_handles[i]);
>> +            r = -ENOENT;
>> +            goto err_free_fence_info;
>> +        }
>> +
>> +        /* Store drm syncobj's gpu va address and value */
>> +        userq_fence = to_amdgpu_userq_fence(fence);
>> +        fence_info[i].va = userq_fence->fence_drv->gpu_addr;
>> +        fence_info[i].value = fence->seqno;
>> +        dma_fence_put(fence);
>> +    }
>> +
>> +    tmp = i;
>> +
>> +    /* Array of GEM object handles */
>> +    gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>> +    if (gobj == NULL) {
>> +        r = -ENOMEM;
>> +        goto err_free_fence_info;
>> +    }
>> +
>> +    /* Retrieve GEM objects's fence */
>> +    wait_flag = args->bo_wait_flags & AMDGPU_USERQ_BO_READ;
>> +    for (i = 0; i < num_bo_handles; i++, tmp++) {
>> +        struct dma_fence *bo_fence;
>> +
>> +        gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
>> +        if (!gobj[i]) {
>> +            r = -ENOENT;
>> +            goto err_put_gobj;
>> +        }
>> +
>> +        dma_resv_lock(gobj[i]->resv, NULL);
>> +        r = dma_resv_get_singleton(gobj[i]->resv,
>> +                       wait_flag ?
>> +                       DMA_RESV_USAGE_READ :
>> +                       DMA_RESV_USAGE_WRITE,
>> +                       &bo_fence);
>> +        if (r) {
>> +            dma_resv_unlock(gobj[i]->resv);
>> +            goto err_put_gobj;
>> +        }
>> +        dma_resv_unlock(gobj[i]->resv);
>> +        drm_gem_object_put(gobj[i]);
>> +
>> +        /* Store GEM objects's gpu va address and value */
>> +        userq_fence = to_amdgpu_userq_fence(bo_fence);
>> +        fence_info[tmp].va = userq_fence->fence_drv->gpu_addr;
>> +        fence_info[tmp].value = bo_fence->seqno;
>> +        dma_fence_put(bo_fence);
>> +    }
>> +
>> +    if (copy_to_user(u64_to_user_ptr(args->userq_fence_info),
>> +        fence_info, sizeof(fence_info))) {
>> +        r = -EFAULT;
>> +        goto err_free_gobj;
>> +    }
>> +
>> +    /* Free all handles */
>> +    kfree(syncobj_handles);
>> +    kfree(bo_handles);
>> +    kfree(fence_info);
>> +    kfree(gobj);
>> +
>> +    return 0;
>> +
>> +err_put_gobj:
>> +    while (i-- > 0)
>> +        drm_gem_object_put(gobj[i]);
>> +err_free_gobj:
>> +    kfree(gobj);
>> +err_free_fence_info:
>> +    kfree(fence_info);
>> +err_free_bo_handles:
>> +    kfree(bo_handles);
>> +err_free_syncobj_handles:
>> +    kfree(syncobj_handles);
>> +
>> +    return r;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>> index 230dd54b4cd3..999d1e2baff5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>> @@ -27,6 +27,8 @@
>>     #include <linux/types.h>
>>   +#define AMDGPU_USERQ_BO_READ    0x1
>> +
>>   struct amdgpu_userq_fence {
>>       struct dma_fence base;
>>       /* userq fence lock */
>> @@ -57,5 +59,9 @@ int amdgpu_userq_fence_create(struct 
>> amdgpu_usermode_queue *userq,
>>                     u64 seq, struct dma_fence **f);
>>   bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver 
>> *fence_drv);
>>   void amdgpu_userq_fence_driver_destroy(struct kref *ref);
>> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>> +                  struct drm_file *filp);
>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>> +                struct drm_file *filp);
>>     #endif
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index d4317b0f6487..4d3d6777a178 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -457,6 +457,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr 
>> *userq_mgr, struct amdgpu_devi
>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>       INIT_LIST_HEAD(&userq_mgr->userq_list);
>>       userq_mgr->adev = adev;
>> +    adev->userq_mgr = userq_mgr;
>>         r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>>       if (r) {
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions
  2023-02-27 13:20     ` Arunpravin Paneer Selvam
@ 2023-02-27 13:23       ` Christian König
  2023-02-27 13:35         ` Arunpravin Paneer Selvam
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-27 13:23 UTC (permalink / raw)
  To: Arunpravin Paneer Selvam, amd-gfx; +Cc: alexander.deucher

Hi Arun,

On 27.02.23 14:20, Arunpravin Paneer Selvam wrote:
> Hi Christian,
>
>
> On 2/27/2023 6:29 PM, Christian König wrote:
>> On 26.02.23 17:54, Arunpravin Paneer Selvam wrote:
>>> This patch introduces a new IOCTL for the userqueue secure semaphore.
>>>
>>> The signal IOCTL called from the userspace application creates a drm
>>> syncobj and an array of bo GEM handles which are passed in as parameters
>>> to the driver to install the fence into them.
>>>
>>> The wait IOCTL gets an array of drm syncobjs, finds the fences
>>> attached to the drm syncobjs and obtains the array of
>>> memory_address/fence_value combinations which are returned to
>>> userspace.
>>>
>>> v2: Worked on review comments from Christian for the following
>>>      modifications
>>>
>>>      - Install fence into GEM BO object.
>>>      - Lock all BO's using the dma resv subsystem
>>>      - Reorder the sequence in signal IOCTL function.
>>>      - Get write pointer from the shadow wptr
>>>      - use userq_fence to fetch the va/value in wait IOCTL.
>>>
>>> Signed-off-by: Arunpravin Paneer Selvam 
>>> <Arunpravin.PaneerSelvam@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   3 +
>>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 258 
>>> ++++++++++++++++++
>>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |   6 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |   1 +
>>>   5 files changed, 270 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 1c3eba2d0390..255d73795493 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -964,6 +964,8 @@ struct amdgpu_device {
>>>       struct amdgpu_mes               mes;
>>>       struct amdgpu_mqd               mqds[AMDGPU_HW_IP_NUM];
>>>   +    struct amdgpu_userq_mgr         *userq_mgr;
>>> +
>>>       /* df */
>>>       struct amdgpu_df                df;
>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index 6b7ac1ebd04c..66a7304fabe3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2752,6 +2752,9 @@ const struct drm_ioctl_desc 
>>> amdgpu_ioctls_kms[] = {
>>>       DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, 
>>> amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
>>> DRM_AUTH|DRM_RENDER_ALLOW),
>>>       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_DOORBELL_RING, 
>>> amdgpu_userq_doorbell_ring_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, 
>>> amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, 
>>> DRM_AUTH|DRM_RENDER_ALLOW),
>>> +
>>>   };
>>>     static const struct drm_driver amdgpu_kms_driver = {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> index 609a7328e9a6..26fd1d4f758a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>> @@ -249,3 +249,261 @@ static const struct dma_fence_ops 
>>> amdgpu_userq_fence_ops = {
>>>       .signaled = amdgpu_userq_fence_signaled,
>>>       .release = amdgpu_userq_fence_release,
>>>   };
>>> +
>>> +static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
>>> +                    struct amdgpu_usermode_queue *queue,
>>> +                    u64 *wptr)
>>> +{
>>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>> +    struct amdgpu_bo_va_mapping *mapping;
>>> +    struct amdgpu_vm *vm = &fpriv->vm;
>>> +    struct amdgpu_bo *bo;
>>> +    u64 *ptr;
>>> +    int r;
>>> +
>>> +    mapping = amdgpu_vm_bo_lookup_mapping(vm, queue->wptr_gpu_addr 
>>> >> PAGE_SHIFT);
>>> +    if (!mapping)
>>> +        return -EINVAL;
>>> +
>>> +    bo = mapping->bo_va->base.bo;
>>> +    r = amdgpu_bo_kmap(bo, (void **)&ptr);
>>
>> Oh, that's not something you can do that easily.
>>
>> The BO must be reserved (locked) first if you want to call 
>> amdgpu_bo_kmap() on it.
> sure, I will take care
>>
>>> +    if (r) {
>>> +        DRM_ERROR("Failed mapping the userqueue wptr bo");
>>> +        return r;
>>> +    }
>>> +
>>> +    *wptr = le64_to_cpu(*ptr);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>>> +                  struct drm_file *filp)
>>> +{
>>> +    struct drm_amdgpu_userq_signal *args = data;
>>> +    struct amdgpu_device *adev = drm_to_adev(dev);
>>> +    struct amdgpu_userq_mgr *userq_mgr = adev->userq_mgr;
>>> +    struct amdgpu_usermode_queue *queue;
>>> +    struct drm_syncobj *syncobj = NULL;
>>> +    struct drm_gem_object **gobj;
>>> +    u64 num_bo_handles, wptr;
>>> +    struct dma_fence *fence;
>>> +    u32 *bo_handles;
>>> +    bool shared;
>>> +    int r, i;
>>> +
>>> +    /* Retrieve the user queue */
>>> +    queue = idr_find(&userq_mgr->userq_idr, args->queue_id);
>>> +    if (!queue)
>>> +        return -ENOENT;
>>> +
>>> +    r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
>>> +    if (r)
>>> +        return -EINVAL;
>>> +
>>> +    /* Find Syncobj if any */
>>> +    syncobj = drm_syncobj_find(filp, args->handle);
>>> +
>>> +    /* Array of bo handles */
>>> +    num_bo_handles = args->num_bo_handles;
>>> +    bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), 
>>> GFP_KERNEL);
>>> +    if (bo_handles == NULL)
>>> +        return -ENOMEM;
>>> +
>>> +    if (copy_from_user(bo_handles, 
>>> u64_to_user_ptr(args->bo_handles_array),
>>> +               sizeof(u32) * num_bo_handles)) {
>>> +        r = -EFAULT;
>>> +        goto err_free_handles;
>>> +    }
>>> +
>>> +    /* Array of GEM object handles */
>>> +    gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>> +    if (gobj == NULL) {
>>> +        r = -ENOMEM;
>>> +        goto err_free_handles;
>>> +    }
>>> +
>>> +    for (i = 0; i < num_bo_handles; i++) {
>>> +        /* Retrieve GEM object */
>>> +        gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
>>> +        if (!gobj[i]) {
>>> +            r = -ENOENT;
>>> +            goto err_put_gobj;
>>> +        }
>>> +
>>> +        dma_resv_lock(gobj[i]->resv, NULL);
>>> +        r = dma_resv_reserve_fences(gobj[i]->resv, 1);
> I am reserving space for each BO here, should we move this down?

See below.

>
> Thanks,
> Arun
>>> +        if (r) {
>>> +            dma_resv_unlock(gobj[i]->resv);
>>> +            goto err_put_gobj;
>>> +        }
>>> +        dma_resv_unlock(gobj[i]->resv);

The problem is here. You can't unlock the BO after you reserved the 
fence slot or the reservation is dropped again. What you need to do is 
to lock all BOs first and then reserve the fence slot.

A while ago I coded together a drm_exec helper to make that simple, 
but never found the time to upstream it. Give me a day or so to rebase 
the code.
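
Until then, open coded this would look roughly like the following.
Heavily simplified: the -EDEADLK backoff handling which makes this
actually correct is omitted, and that is exactly the boilerplate
drm_exec is meant to hide:

    struct ww_acquire_ctx ticket;
    int i, r;

    ww_acquire_init(&ticket, &reservation_ww_class);

    /* lock all BOs first ... */
    for (i = 0; i < num_bo_handles; i++)
        dma_resv_lock(gobj[i]->resv, &ticket);
    ww_acquire_done(&ticket);

    /* ... then reserve the fence slots while everything stays locked ... */
    for (i = 0; i < num_bo_handles; i++) {
        r = dma_resv_reserve_fences(gobj[i]->resv, 1);
        if (r)
            break; /* unlock everything and bail out */
    }

    /* ... create the fence, add it to all BOs and only then unlock */
    for (i = 0; i < num_bo_handles; i++)
        dma_resv_unlock(gobj[i]->resv);

    ww_acquire_fini(&ticket);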

Regards,
Christian.

>>> +    }
>>> +
>>> +    /* Create a new fence */
>>> +    r = amdgpu_userq_fence_create(queue, wptr, &fence);
>>> +    if (!fence)
>>> +        goto err_put_gobj;
>>> +
>>> +    /* Add the created fence to syncobj/BO's */
>>> +    if (syncobj)
>>> +        drm_syncobj_replace_fence(syncobj, fence);
>>> +
>>> +    shared = args->bo_flags & AMDGPU_USERQ_BO_READ;
>>> +    for (i = 0; i < num_bo_handles; i++) {
>>> +        dma_resv_lock(gobj[i]->resv, NULL);
>>> +        dma_resv_add_fence(gobj[i]->resv, fence, shared ?
>>> +                   DMA_RESV_USAGE_READ :
>>> +                   DMA_RESV_USAGE_WRITE);
>>> +        dma_resv_unlock(gobj[i]->resv);
>>> +    }
>>
>> That will still not work correctly. You've forgotten to call 
>> dma_resv_reserve_fences().
>>
>> The tricky part is that you need to do this for all BOs at the same 
>> time.
>>
>> I will work on my drm_exec() patch set once more. That one should 
>> make this job much easier.
>>
>> Similar applies to the _wait function, but let's get _signal working 
>> first.
>>
>> Regards,
>> Christian.
>>
>>> +
>>> +    for (i = 0; i < num_bo_handles; i++)
>>> +        drm_gem_object_put(gobj[i]);
>>> +
>>> +    dma_fence_put(fence);
>>> +
>>> +    /* Free all handles */
>>> +    kfree(bo_handles);
>>> +    kfree(gobj);
>>> +
>>> +    return 0;
>>> +
>>> +err_put_gobj:
>>> +    while (i-- > 0)
>>> +        drm_gem_object_put(gobj[i]);
>>> +    kfree(gobj);
>>> +err_free_handles:
>>> +    kfree(bo_handles);
>>> +
>>> +    return r;
>>> +}
>>> +
>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>> +                struct drm_file *filp)
>>> +{
>>> +    struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>>> +    struct drm_amdgpu_userq_wait *args = data;
>>> +    struct amdgpu_userq_fence *userq_fence;
>>> +    u32 *syncobj_handles, *bo_handles;
>>> +    u64 num_syncobj, num_bo_handles;
>>> +    struct drm_gem_object **gobj;
>>> +    struct dma_fence *fence;
>>> +    bool wait_flag;
>>> +    int r, i, tmp;
>>> +
>>> +    /* Array of Syncobj handles */
>>> +    num_syncobj = args->count_handles;
>>> +    syncobj_handles = kmalloc_array(num_syncobj, 
>>> sizeof(*syncobj_handles), GFP_KERNEL);
>>> +    if (!syncobj_handles)
>>> +        return -ENOMEM;
>>> +
>>> +    if (copy_from_user(syncobj_handles, 
>>> u64_to_user_ptr(args->handles),
>>> +               sizeof(u32) * num_syncobj)) {
>>> +        r = -EFAULT;
>>> +        goto err_free_syncobj_handles;
>>> +    }
>>> +
>>> +    /* Array of bo handles */
>>> +    num_bo_handles = args->num_bo_handles;
>>> +    bo_handles = kmalloc_array(num_bo_handles, sizeof(*bo_handles), 
>>> GFP_KERNEL);
>>> +    if (!bo_handles) {
>>> +        r = -ENOMEM;
>>> +        goto err_free_syncobj_handles;
>>> +    }
>>> +
>>> +    if (copy_from_user(bo_handles, 
>>> u64_to_user_ptr(args->bo_handles_array),
>>> +               sizeof(u32) * num_bo_handles)) {
>>> +        r = -EFAULT;
>>> +        goto err_free_bo_handles;
>>> +    }
>>> +
>>> +    /* Array of fence gpu address */
>>> +    fence_info = kmalloc_array(num_syncobj + num_bo_handles, 
>>> sizeof(*fence_info), GFP_KERNEL);
>>> +    if (!fence_info) {
>>> +        r = -ENOMEM;
>>> +        goto err_free_bo_handles;
>>> +    }
>>> +
>>> +    /* Retrieve syncobj's fence */
>>> +    for (i = 0; i < num_syncobj; i++) {
>>> +        r = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, 
>>> &fence);
>>> +        if (r) {
>>> +            DRM_ERROR("syncobj %u failed to find the fence!\n", 
>>> syncobj_handles[i]);
>>> +            r = -ENOENT;
>>> +            goto err_free_fence_info;
>>> +        }
>>> +
>>> +        /* Store drm syncobj's gpu va address and value */
>>> +        userq_fence = to_amdgpu_userq_fence(fence);
>>> +        fence_info[i].va = userq_fence->fence_drv->gpu_addr;
>>> +        fence_info[i].value = fence->seqno;
>>> +        dma_fence_put(fence);
>>> +    }
>>> +
>>> +    tmp = i;
>>> +
>>> +    /* Array of GEM object handles */
>>> +    gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>> +    if (gobj == NULL) {
>>> +        r = -ENOMEM;
>>> +        goto err_free_fence_info;
>>> +    }
>>> +
>>> +    /* Retrieve the GEM objects' fences */
>>> +    wait_flag = args->bo_wait_flags & AMDGPU_USERQ_BO_READ;
>>> +    for (i = 0; i < num_bo_handles; i++, tmp++) {
>>> +        struct dma_fence *bo_fence;
>>> +
>>> +        gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
>>> +        if (!gobj[i]) {
>>> +            r = -ENOENT;
>>> +            goto err_put_gobj;
>>> +        }
>>> +
>>> +        dma_resv_lock(gobj[i]->resv, NULL);
>>> +        r = dma_resv_get_singleton(gobj[i]->resv,
>>> +                       wait_flag ?
>>> +                       DMA_RESV_USAGE_READ :
>>> +                       DMA_RESV_USAGE_WRITE,
>>> +                       &bo_fence);
>>> +        if (r) {
>>> +            dma_resv_unlock(gobj[i]->resv);
>>> +            goto err_put_gobj;
>>> +        }
>>> +        dma_resv_unlock(gobj[i]->resv);
>>> +        drm_gem_object_put(gobj[i]);
>>> +
>>> +        /* Store the GEM object's GPU VA and value */
>>> +        userq_fence = to_amdgpu_userq_fence(bo_fence);
>>> +        fence_info[tmp].va = userq_fence->fence_drv->gpu_addr;
>>> +        fence_info[tmp].value = bo_fence->seqno;
>>> +        dma_fence_put(bo_fence);
>>> +    }
>>> +
>>> +    if (copy_to_user(u64_to_user_ptr(args->userq_fence_info),
>>> +        fence_info, sizeof(*fence_info) * (num_syncobj + num_bo_handles))) {
>>> +        r = -EFAULT;
>>> +        goto err_free_gobj;
>>> +    }
>>> +
>>> +    /* Free all handles */
>>> +    kfree(syncobj_handles);
>>> +    kfree(bo_handles);
>>> +    kfree(fence_info);
>>> +    kfree(gobj);
>>> +
>>> +    return 0;
>>> +
>>> +err_put_gobj:
>>> +    while (i-- > 0)
>>> +        drm_gem_object_put(gobj[i]);
>>> +err_free_gobj:
>>> +    kfree(gobj);
>>> +err_free_fence_info:
>>> +    kfree(fence_info);
>>> +err_free_bo_handles:
>>> +    kfree(bo_handles);
>>> +err_free_syncobj_handles:
>>> +    kfree(syncobj_handles);
>>> +
>>> +    return r;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>> index 230dd54b4cd3..999d1e2baff5 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>> @@ -27,6 +27,8 @@
>>>     #include <linux/types.h>
>>>   +#define AMDGPU_USERQ_BO_READ    0x1
>>> +
>>>   struct amdgpu_userq_fence {
>>>       struct dma_fence base;
>>>       /* userq fence lock */
>>> @@ -57,5 +59,9 @@ int amdgpu_userq_fence_create(struct 
>>> amdgpu_usermode_queue *userq,
>>>                     u64 seq, struct dma_fence **f);
>>>   bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver 
>>> *fence_drv);
>>>   void amdgpu_userq_fence_driver_destroy(struct kref *ref);
>>> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>>> +                  struct drm_file *filp);
>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>> +                struct drm_file *filp);
>>>     #endif
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> index d4317b0f6487..4d3d6777a178 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>> @@ -457,6 +457,7 @@ int amdgpu_userq_mgr_init(struct 
>>> amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>>       INIT_LIST_HEAD(&userq_mgr->userq_list);
>>>       userq_mgr->adev = adev;
>>> +    adev->userq_mgr = userq_mgr;
>>>         r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>>>       if (r) {
>>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions
  2023-02-27 13:23       ` Christian König
@ 2023-02-27 13:35         ` Arunpravin Paneer Selvam
  0 siblings, 0 replies; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-27 13:35 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher



On 2/27/2023 6:53 PM, Christian König wrote:
> Hi Arun,
>
> Am 27.02.23 um 14:20 schrieb Arunpravin Paneer Selvam:
>> Hi Christian,
>>
>>
>> On 2/27/2023 6:29 PM, Christian König wrote:
>>> Am 26.02.23 um 17:54 schrieb Arunpravin Paneer Selvam:
>>>> This patch introduces new IOCTLs for the userqueue secure semaphore.
>>>>
>>>> The signal IOCTL, called from a userspace application, creates a drm
>>>> syncobj and an array of GEM BO handles, which are passed to the
>>>> driver so it can install the fence into them.
>>>>
>>>> The wait IOCTL gets an array of drm syncobjs, finds the fences
>>>> attached to those drm syncobjs and obtains the array of
>>>> memory_address/fence_value combinations, which are returned to
>>>> userspace.
>>>>
>>>> v2: Worked on review comments from Christian for the following
>>>>      modifications
>>>>
>>>>      - Install fence into GEM BO object.
>>>>      - Lock all BO's using the dma resv subsystem
>>>>      - Reorder the sequence in signal IOCTL function.
>>>>      - Get write pointer from the shadow wptr
>>>>      - use userq_fence to fetch the va/value in wait IOCTL.
>>>>
>>>> Signed-off-by: Arunpravin Paneer Selvam 
>>>> <Arunpravin.PaneerSelvam@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   3 +
>>>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 258 
>>>> ++++++++++++++++++
>>>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |   6 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |   1 +
>>>>   5 files changed, 270 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 1c3eba2d0390..255d73795493 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -964,6 +964,8 @@ struct amdgpu_device {
>>>>       struct amdgpu_mes               mes;
>>>>       struct amdgpu_mqd mqds[AMDGPU_HW_IP_NUM];
>>>>   +    struct amdgpu_userq_mgr         *userq_mgr;
>>>> +
>>>>       /* df */
>>>>       struct amdgpu_df                df;
>>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index 6b7ac1ebd04c..66a7304fabe3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -2752,6 +2752,9 @@ const struct drm_ioctl_desc 
>>>> amdgpu_ioctls_kms[] = {
>>>>       DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, 
>>>> amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl, 
>>>> DRM_AUTH|DRM_RENDER_ALLOW),
>>>>       DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_DOORBELL_RING, 
>>>> amdgpu_userq_doorbell_ring_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, 
>>>> amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, 
>>>> DRM_AUTH|DRM_RENDER_ALLOW),
>>>> +
>>>>   };
>>>>     static const struct drm_driver amdgpu_kms_driver = {
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> index 609a7328e9a6..26fd1d4f758a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>>> @@ -249,3 +249,261 @@ static const struct dma_fence_ops 
>>>> amdgpu_userq_fence_ops = {
>>>>       .signaled = amdgpu_userq_fence_signaled,
>>>>       .release = amdgpu_userq_fence_release,
>>>>   };
>>>> +
>>>> +static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
>>>> +                    struct amdgpu_usermode_queue *queue,
>>>> +                    u64 *wptr)
>>>> +{
>>>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>> +    struct amdgpu_bo_va_mapping *mapping;
>>>> +    struct amdgpu_vm *vm = &fpriv->vm;
>>>> +    struct amdgpu_bo *bo;
>>>> +    u64 *ptr;
>>>> +    int r;
>>>> +
>>>> +    mapping = amdgpu_vm_bo_lookup_mapping(vm, queue->wptr_gpu_addr 
>>>> >> PAGE_SHIFT);
>>>> +    if (!mapping)
>>>> +        return -EINVAL;
>>>> +
>>>> +    bo = mapping->bo_va->base.bo;
>>>> +    r = amdgpu_bo_kmap(bo, (void **)&ptr);
>>>
>>> Oh, that's not something you can do that easily.
>>>
>>> The BO must be reserved (locked) first if you want to call 
>>> amdgpu_bo_kmap() on it.
>> sure, I will take care
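
For reference, a minimal sketch of reserving the BO around the kmap,
reusing the names from amdgpu_userq_fence_read_wptr() above
(interruptible reserve, so this can also return -ERESTARTSYS):

    r = amdgpu_bo_reserve(bo, false);
    if (r)
        return r;

    r = amdgpu_bo_kmap(bo, (void **)&ptr);
    if (!r)
        *wptr = le64_to_cpu(*ptr);

    amdgpu_bo_unreserve(bo);

    return r;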
>>>
>>>> +    if (r) {
>>>> +        DRM_ERROR("Failed mapping the userqueue wptr bo");
>>>> +        return r;
>>>> +    }
>>>> +
>>>> +    *wptr = le64_to_cpu(*ptr);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>>>> +                  struct drm_file *filp)
>>>> +{
>>>> +    struct drm_amdgpu_userq_signal *args = data;
>>>> +    struct amdgpu_device *adev = drm_to_adev(dev);
>>>> +    struct amdgpu_userq_mgr *userq_mgr = adev->userq_mgr;
>>>> +    struct amdgpu_usermode_queue *queue;
>>>> +    struct drm_syncobj *syncobj = NULL;
>>>> +    struct drm_gem_object **gobj;
>>>> +    u64 num_bo_handles, wptr;
>>>> +    struct dma_fence *fence;
>>>> +    u32 *bo_handles;
>>>> +    bool shared;
>>>> +    int r, i;
>>>> +
>>>> +    /* Retrieve the user queue */
>>>> +    queue = idr_find(&userq_mgr->userq_idr, args->queue_id);
>>>> +    if (!queue)
>>>> +        return -ENOENT;
>>>> +
>>>> +    r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
>>>> +    if (r)
>>>> +        return -EINVAL;
>>>> +
>>>> +    /* Find Syncobj if any */
>>>> +    syncobj = drm_syncobj_find(filp, args->handle);
>>>> +
>>>> +    /* Array of bo handles */
>>>> +    num_bo_handles = args->num_bo_handles;
>>>> +    bo_handles = kmalloc_array(num_bo_handles, 
>>>> sizeof(*bo_handles), GFP_KERNEL);
>>>> +    if (bo_handles == NULL)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    if (copy_from_user(bo_handles, 
>>>> u64_to_user_ptr(args->bo_handles_array),
>>>> +               sizeof(u32) * num_bo_handles)) {
>>>> +        r = -EFAULT;
>>>> +        goto err_free_handles;
>>>> +    }
>>>> +
>>>> +    /* Array of GEM object handles */
>>>> +    gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>>> +    if (gobj == NULL) {
>>>> +        r = -ENOMEM;
>>>> +        goto err_free_handles;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < num_bo_handles; i++) {
>>>> +        /* Retrieve GEM object */
>>>> +        gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
>>>> +        if (!gobj[i]) {
>>>> +            r = -ENOENT;
>>>> +            goto err_put_gobj;
>>>> +        }
>>>> +
>>>> +        dma_resv_lock(gobj[i]->resv, NULL);
>>>> +        r = dma_resv_reserve_fences(gobj[i]->resv, 1);
>> I am reserving a fence slot for each BO here, should we move this down?
>
> See below.
>
>>
>> Thanks,
>> Arun
>>>> +        if (r) {
>>>> +            dma_resv_unlock(gobj[i]->resv);
>>>> +            goto err_put_gobj;
>>>> +        }
>>>> +        dma_resv_unlock(gobj[i]->resv);
>
> The problem is here. You can't unlock the BO after you reserved the 
> fence slot or the reservation is dropped again. What you need to do is 
> to lock all BOs first and then reserve the fence slot.
>
> I've coded together a drm_exec helper to make that simple a while ago, 
> but never found the time to upstream it. Give me a day or so to rebase 
> the code.
sure, I will work on other comments.
>
> Regards,
> Christian.
>
>>>> +    }
>>>> +
>>>> +    /* Create a new fence */
>>>> +    r = amdgpu_userq_fence_create(queue, wptr, &fence);
>>>> +    if (r)
>>>> +        goto err_put_gobj;
>>>> +
>>>> +    /* Add the created fence to syncobj/BO's */
>>>> +    if (syncobj)
>>>> +        drm_syncobj_replace_fence(syncobj, fence);
>>>> +
>>>> +    shared = args->bo_flags & AMDGPU_USERQ_BO_READ;
>>>> +    for (i = 0; i < num_bo_handles; i++) {
>>>> +        dma_resv_lock(gobj[i]->resv, NULL);
>>>> +        dma_resv_add_fence(gobj[i]->resv, fence, shared ?
>>>> +                   DMA_RESV_USAGE_READ :
>>>> +                   DMA_RESV_USAGE_WRITE);
>>>> +        dma_resv_unlock(gobj[i]->resv);
>>>> +    }
>>>
>>> That will still not work correctly. You've forgotten to call 
>>> dma_resv_reserve_fences().
>>>
>>> The tricky part is that you need to do this for all BOs at the same 
>>> time.
>>>
>>> I will work on my drm_exec() patch set once more. That one should 
>>> make this job much easier.
>>>
>>> Similar applies to the _wait function, but let's get _signal working 
>>> first.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +
>>>> +    for (i = 0; i < num_bo_handles; i++)
>>>> +        drm_gem_object_put(gobj[i]);
>>>> +
>>>> +    dma_fence_put(fence);
>>>> +
>>>> +    /* Free all handles */
>>>> +    kfree(bo_handles);
>>>> +    kfree(gobj);
>>>> +
>>>> +    return 0;
>>>> +
>>>> +err_put_gobj:
>>>> +    while (i-- > 0)
>>>> +        drm_gem_object_put(gobj[i]);
>>>> +    kfree(gobj);
>>>> +err_free_handles:
>>>> +    kfree(bo_handles);
>>>> +
>>>> +    return r;
>>>> +}
>>>> +
>>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>>> +                struct drm_file *filp)
>>>> +{
>>>> +    struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>>>> +    struct drm_amdgpu_userq_wait *args = data;
>>>> +    struct amdgpu_userq_fence *userq_fence;
>>>> +    u32 *syncobj_handles, *bo_handles;
>>>> +    u64 num_syncobj, num_bo_handles;
>>>> +    struct drm_gem_object **gobj;
>>>> +    struct dma_fence *fence;
>>>> +    bool wait_flag;
>>>> +    int r, i, tmp;
>>>> +
>>>> +    /* Array of Syncobj handles */
>>>> +    num_syncobj = args->count_handles;
>>>> +    syncobj_handles = kmalloc_array(num_syncobj, 
>>>> sizeof(*syncobj_handles), GFP_KERNEL);
>>>> +    if (!syncobj_handles)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    if (copy_from_user(syncobj_handles, 
>>>> u64_to_user_ptr(args->handles),
>>>> +               sizeof(u32) * num_syncobj)) {
>>>> +        r = -EFAULT;
>>>> +        goto err_free_syncobj_handles;
>>>> +    }
>>>> +
>>>> +    /* Array of bo handles */
>>>> +    num_bo_handles = args->num_bo_handles;
>>>> +    bo_handles = kmalloc_array(num_bo_handles, 
>>>> sizeof(*bo_handles), GFP_KERNEL);
>>>> +    if (!bo_handles) {
>>>> +        r = -ENOMEM;
>>>> +        goto err_free_syncobj_handles;
>>>> +    }
>>>> +
>>>> +    if (copy_from_user(bo_handles, 
>>>> u64_to_user_ptr(args->bo_handles_array),
>>>> +               sizeof(u32) * num_bo_handles)) {
>>>> +        r = -EFAULT;
>>>> +        goto err_free_bo_handles;
>>>> +    }
>>>> +
>>>> +    /* Array of fence gpu address */
>>>> +    fence_info = kmalloc_array(num_syncobj + num_bo_handles, 
>>>> sizeof(*fence_info), GFP_KERNEL);
>>>> +    if (!fence_info) {
>>>> +        r = -ENOMEM;
>>>> +        goto err_free_bo_handles;
>>>> +    }
>>>> +
>>>> +    /* Retrieve syncobj's fence */
>>>> +    for (i = 0; i < num_syncobj; i++) {
>>>> +        r = drm_syncobj_find_fence(filp, syncobj_handles[i], 0, 0, 
>>>> &fence);
>>>> +        if (r) {
>>>> +            DRM_ERROR("syncobj %u failed to find the fence!\n", 
>>>> syncobj_handles[i]);
>>>> +            r = -ENOENT;
>>>> +            goto err_free_fence_info;
>>>> +        }
>>>> +
>>>> +        /* Store drm syncobj's gpu va address and value */
>>>> +        userq_fence = to_amdgpu_userq_fence(fence);
>>>> +        fence_info[i].va = userq_fence->fence_drv->gpu_addr;
>>>> +        fence_info[i].value = fence->seqno;
>>>> +        dma_fence_put(fence);
>>>> +    }
>>>> +
>>>> +    tmp = i;
>>>> +
>>>> +    /* Array of GEM object handles */
>>>> +    gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>>> +    if (gobj == NULL) {
>>>> +        r = -ENOMEM;
>>>> +        goto err_free_fence_info;
>>>> +    }
>>>> +
>>>> +    /* Retrieve the GEM objects' fences */
>>>> +    wait_flag = args->bo_wait_flags & AMDGPU_USERQ_BO_READ;
>>>> +    for (i = 0; i < num_bo_handles; i++, tmp++) {
>>>> +        struct dma_fence *bo_fence;
>>>> +
>>>> +        gobj[i] = drm_gem_object_lookup(filp, bo_handles[i]);
>>>> +        if (!gobj[i]) {
>>>> +            r = -ENOENT;
>>>> +            goto err_put_gobj;
>>>> +        }
>>>> +
>>>> +        dma_resv_lock(gobj[i]->resv, NULL);
>>>> +        r = dma_resv_get_singleton(gobj[i]->resv,
>>>> +                       wait_flag ?
>>>> +                       DMA_RESV_USAGE_READ :
>>>> +                       DMA_RESV_USAGE_WRITE,
>>>> +                       &bo_fence);
>>>> +        if (r) {
>>>> +            dma_resv_unlock(gobj[i]->resv);
>>>> +            goto err_put_gobj;
>>>> +        }
>>>> +        dma_resv_unlock(gobj[i]->resv);
>>>> +        drm_gem_object_put(gobj[i]);
>>>> +
>>>> +        /* Store the GEM object's GPU VA and value */
>>>> +        userq_fence = to_amdgpu_userq_fence(bo_fence);
>>>> +        fence_info[tmp].va = userq_fence->fence_drv->gpu_addr;
>>>> +        fence_info[tmp].value = bo_fence->seqno;
>>>> +        dma_fence_put(bo_fence);
>>>> +    }
>>>> +
>>>> +    if (copy_to_user(u64_to_user_ptr(args->userq_fence_info),
>>>> +        fence_info, sizeof(*fence_info) * (num_syncobj + num_bo_handles))) {
>>>> +        r = -EFAULT;
>>>> +        goto err_free_gobj;
>>>> +    }
>>>> +
>>>> +    /* Free all handles */
>>>> +    kfree(syncobj_handles);
>>>> +    kfree(bo_handles);
>>>> +    kfree(fence_info);
>>>> +    kfree(gobj);
>>>> +
>>>> +    return 0;
>>>> +
>>>> +err_put_gobj:
>>>> +    while (i-- > 0)
>>>> +        drm_gem_object_put(gobj[i]);
>>>> +err_free_gobj:
>>>> +    kfree(gobj);
>>>> +err_free_fence_info:
>>>> +    kfree(fence_info);
>>>> +err_free_bo_handles:
>>>> +    kfree(bo_handles);
>>>> +err_free_syncobj_handles:
>>>> +    kfree(syncobj_handles);
>>>> +
>>>> +    return r;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>> index 230dd54b4cd3..999d1e2baff5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>>> @@ -27,6 +27,8 @@
>>>>     #include <linux/types.h>
>>>>   +#define AMDGPU_USERQ_BO_READ    0x1
>>>> +
>>>>   struct amdgpu_userq_fence {
>>>>       struct dma_fence base;
>>>>       /* userq fence lock */
>>>> @@ -57,5 +59,9 @@ int amdgpu_userq_fence_create(struct 
>>>> amdgpu_usermode_queue *userq,
>>>>                     u64 seq, struct dma_fence **f);
>>>>   bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver 
>>>> *fence_drv);
>>>>   void amdgpu_userq_fence_driver_destroy(struct kref *ref);
>>>> +int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
>>>> +                  struct drm_file *filp);
>>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>>> +                struct drm_file *filp);
>>>>     #endif
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> index d4317b0f6487..4d3d6777a178 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> @@ -457,6 +457,7 @@ int amdgpu_userq_mgr_init(struct 
>>>> amdgpu_userq_mgr *userq_mgr, struct amdgpu_devi
>>>>       idr_init_base(&userq_mgr->userq_idr, 1);
>>>>       INIT_LIST_HEAD(&userq_mgr->userq_list);
>>>>       userq_mgr->adev = adev;
>>>> +    adev->userq_mgr = userq_mgr;
>>>>         r = amdgpu_userqueue_setup_mqd_funcs(userq_mgr);
>>>>       if (r) {
>>>
>>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver
  2023-02-27 12:42   ` Christian König
@ 2023-02-28  8:45     ` Arunpravin Paneer Selvam
  0 siblings, 0 replies; 14+ messages in thread
From: Arunpravin Paneer Selvam @ 2023-02-28  8:45 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: alexander.deucher

Hi Christian,

On 2/27/2023 6:12 PM, Christian König wrote:
> Am 26.02.23 um 17:54 schrieb Arunpravin Paneer Selvam:
>> Developed a userqueue fence driver for shared BO synchronization
>> between userqueue processes.
>>
>> Create a dma fence with the write pointer as its seqno and allocate
>> seq64 memory for each user queue process, feeding this memory address
>> into the firmware/hardware so that the firmware writes the read pointer
>> into the given address when the process completes its execution.
>> Compare wptr and rptr: if rptr >= wptr, signal the fences so the
>> waiting process can consume the buffers (e.g. a fence created with
>> wptr 100 signals once the firmware has written back 100 or more).
>>
>> v2: Worked on review comments from Christian for the following
>>      modifications
>>
>>      - Add wptr as sequence number into the fence
>>      - Add a reference count for the fence driver
>>      - Add dma_fence_put below the list_del as it might frees the 
>> userq fence.
>>      - Trim unnecessary code in interrupt handler.
>>      - Check dma fence signaled state in dma fence creation function 
>> for a
>>        potential problem of hardware completing the job processing 
>> beforehand.
>>      - Add necessary locks.
>>      - Create a list and process all the unsignaled fences.
>>      - clean up fences in destroy function.
>>      - implement .enabled callback function
>
> A few more nit picks below, but from the technical side that looks 
> mostly clean.
>
>>
>> Signed-off-by: Arunpravin Paneer Selvam 
>> <Arunpravin.PaneerSelvam@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 +
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c   | 251 ++++++++++++++++++
>>   .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h   |  61 +++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c |  20 ++
>>   .../gpu/drm/amd/include/amdgpu_userqueue.h    |   2 +
>>   6 files changed, 341 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index a239533a895f..ea09273b585f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>>       amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>>       amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>>       amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
>> -    amdgpu_ring_mux.o amdgpu_seq64.o
>> +    amdgpu_ring_mux.o amdgpu_seq64.o amdgpu_userq_fence.o
>>     amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index bd3462d0da5f..6b7ac1ebd04c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -53,6 +53,7 @@
>>   #include "amdgpu_xgmi.h"
>>   #include "amdgpu_reset.h"
>>   #include "amdgpu_userqueue.h"
>> +#include "amdgpu_userq_fence.h"
>>     /*
>>    * KMS wrapper.
>> @@ -2827,6 +2828,10 @@ static int __init amdgpu_init(void)
>>       if (r)
>>           goto error_fence;
>>   +    r = amdgpu_userq_fence_slab_init();
>> +    if (r)
>> +        goto error_fence;
>> +
>>       DRM_INFO("amdgpu kernel modesetting enabled.\n");
>>       amdgpu_register_atpx_handler();
>>       amdgpu_acpi_detect();
>> @@ -2851,6 +2856,7 @@ static void __exit amdgpu_exit(void)
>>       amdgpu_unregister_atpx_handler();
>>       amdgpu_sync_fini();
>>       amdgpu_fence_slab_fini();
>> +    amdgpu_userq_fence_slab_fini();
>>       mmu_notifier_synchronize();
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> new file mode 100644
>> index 000000000000..609a7328e9a6
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> @@ -0,0 +1,251 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright 2023 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include <linux/kref.h>
>> +#include <linux/slab.h>
>> +
>> +#include <drm/drm_syncobj.h>
>> +
>> +#include "amdgpu.h"
>> +#include "amdgpu_userq_fence.h"
>> +#include "amdgpu_userqueue.h"
>> +
>> +static struct kmem_cache *amdgpu_userq_fence_slab;
>> +
>> +int amdgpu_userq_fence_slab_init(void)
>> +{
>> +    amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence",
>> +                            sizeof(struct amdgpu_userq_fence),
>> +                            0,
>> +                            SLAB_HWCACHE_ALIGN,
>> +                            NULL);
>> +    if (!amdgpu_userq_fence_slab)
>> +        return -ENOMEM;
>> +
>> +    return 0;
>> +}
>> +
>> +void amdgpu_userq_fence_slab_fini(void)
>> +{
>> +    rcu_barrier();
>> +    kmem_cache_destroy(amdgpu_userq_fence_slab);
>> +}
>> +
>> +static inline struct amdgpu_userq_fence 
>> *to_amdgpu_userq_fence(struct dma_fence *f)
>> +{
>> +    struct amdgpu_userq_fence *__f = container_of(f, struct 
>> amdgpu_userq_fence, base);
>> +
>> +    if (!__f)
>> +        return NULL;
>> +
>> +    if (__f->base.ops == &amdgpu_userq_fence_ops)
>> +        return __f;
>> +
>> +    return NULL;
>> +}
>> +
>> +static u64 amdgpu_userq_fence_read(struct amdgpu_userq_fence_driver 
>> *fence_drv)
>> +{
>> +    return le64_to_cpu(*fence_drv->cpu_addr);
>> +}
>> +
>> +int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev,
>> +                  struct amdgpu_usermode_queue *userq)
>
> Looks like you misunderstood me a bit.
>
> The usual naming convention in Linux for reference counted objects is 
> like that:
>
> _alloc() or similar for a function creating the object (the one which 
> has the kref_init).
> _get() for the function grabbing a reference.
> _put() for the function releasing a reference.
> _free() or _destroy() or similar for the function which is called by 
> _put() to cleanup.
>
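
As a quick illustration of that convention (a hypothetical stand-in
object, not code from this series):

    struct foo {
        struct kref refcount;
    };

    /* _free(): cleanup, only ever called via foo_put() */
    static void foo_free(struct kref *ref)
    {
        kfree(container_of(ref, struct foo, refcount));
    }

    /* _alloc(): creates the object and does the kref_init() */
    struct foo *foo_alloc(void)
    {
        struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

        if (f)
            kref_init(&f->refcount);
        return f;
    }

    /* _get(): grabs an additional reference */
    void foo_get(struct foo *f)
    {
        kref_get(&f->refcount);
    }

    /* _put(): releases a reference, freeing on the last drop */
    void foo_put(struct foo *f)
    {
        kref_put(&f->refcount, foo_free);
    }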
>> +{
>> +    struct amdgpu_userq_fence_driver *fence_drv;
>> +    int r;
>> +
>> +    fence_drv = userq->fence_drv;
>> +    if (!fence_drv)
>> +        return -EINVAL;
>> +
>> +    /* Acquire seq64 memory */
>> +    r = amdgpu_seq64_get(adev, &fence_drv->gpu_addr,
>> +                 &fence_drv->cpu_addr);
>> +    if (r)
>> +        return -ENOMEM;
>> +
>> +    kref_init(&fence_drv->refcount);
>> +    INIT_LIST_HEAD(&fence_drv->fences);
>> +    spin_lock_init(&fence_drv->fence_list_lock);
>> +
>> +    fence_drv->adev = adev;
>> +    fence_drv->context = dma_fence_context_alloc(1);
>> +
>> +    get_task_comm(fence_drv->timeline_name, current);
>> +
>> +    return 0;
>> +}
>> +
>> +void amdgpu_userq_fence_driver_destroy(struct kref *ref)
>> +{
>> +    struct amdgpu_userq_fence_driver *fence_drv = container_of(ref,
>> +                     struct amdgpu_userq_fence_driver,
>> +                     refcount);
>> +    struct amdgpu_device *adev = fence_drv->adev;
>> +    struct amdgpu_userq_fence *fence, *tmp;
>> +    struct dma_fence *f;
>> +
>> +    spin_lock(&fence_drv->fence_list_lock);
>> +    list_for_each_entry_safe(fence, tmp, &fence_drv->fences, link) {
>> +        f = &fence->base;
>> +
>> +        if (!dma_fence_is_signaled(f)) {
>> +            dma_fence_set_error(f, -ECANCELED);
>> +            dma_fence_signal(f);
>> +        }
>> +
>> +        list_del(&fence->link);
>> +        dma_fence_put(f);
>> +    }
>> +
>> +    WARN_ON_ONCE(!list_empty(&fence_drv->fences));
>> +    spin_unlock(&fence_drv->fence_list_lock);
>> +
>> +    /* Free seq64 memory */
>> +    amdgpu_seq64_free(adev, fence_drv->gpu_addr);
>> +    kfree(fence_drv);
>> +}
>> +
>> +void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver 
>> *fence_drv)
>> +{
>> +    kref_put(&fence_drv->refcount, amdgpu_userq_fence_driver_destroy);
>> +}
>> +
>> +int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
>> +                  u64 seq, struct dma_fence **f)
>> +{
>> +    struct amdgpu_userq_fence_driver *fence_drv;
>> +    struct amdgpu_userq_fence *userq_fence;
>> +    struct dma_fence *fence;
>> +
>> +    fence_drv = userq->fence_drv;
>> +    if (!fence_drv)
>> +        return -EINVAL;
>> +
>> +    userq_fence = kmem_cache_alloc(amdgpu_userq_fence_slab, 
>> GFP_ATOMIC);
>> +    if (!userq_fence)
>> +        return -ENOMEM;
>> +
>> +    spin_lock_init(&userq_fence->lock);
>> +    INIT_LIST_HEAD(&userq_fence->link);
>> +    fence = &userq_fence->base;
>> +    userq_fence->fence_drv = fence_drv;
>> +
>> +    dma_fence_init(fence, &amdgpu_userq_fence_ops, &userq_fence->lock,
>> +               fence_drv->context, seq);
>> +
>> +    kref_get(&fence_drv->refcount);
>> +
>> +    spin_lock(&fence_drv->fence_list_lock);
>> +    /* Check if hardware has already processed the job */
>> +    if (!dma_fence_is_signaled(fence)) {
>> +        dma_fence_get(fence);
>> +        list_add_tail(&userq_fence->link, &fence_drv->fences);
>> +        dma_fence_put(fence);
>> +    }
>> +    spin_unlock(&fence_drv->fence_list_lock);
>
>> +    /* Release the fence driver reference */
>> +    amdgpu_userq_fence_driver_put(fence_drv);
>
> Hui? That doesn't make much sense. We should keep the reference as 
> long as the fence exists or at least as long as it isn't signaled (the 
> latter is probably better, but tricky to get right).
>
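
One way to read that suggestion (a sketch, under the assumption that the
reference taken in amdgpu_userq_fence_create() is simply kept): drop the
put above and release the fence driver reference only when the fence
itself is released, e.g.:

    static void amdgpu_userq_fence_release(struct dma_fence *f)
    {
        struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);

        /* Drop the fence_drv reference held since fence creation */
        amdgpu_userq_fence_driver_put(fence->fence_drv);
        call_rcu(&f->rcu, amdgpu_userq_fence_free);
    }

That keeps the seq64 slot and the fence list alive for as long as any
fence created on them can still be accessed.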
>> +
>> +    *f = fence;
>> +
>> +    return 0;
>> +}
>> +
>> +bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver 
>> *fence_drv)
>
> Maybe name that function amdgpu_userq_fence_driver_process() and move 
> that up a bit to group together the functions dealing with the 
> fence_driver and the functions dealing with the fence.
>
>> +{
>> +    struct amdgpu_userq_fence *userq_fence, *tmp;
>> +    struct dma_fence *fence;
>> +    u64 rptr;
>> +
>> +    if (!fence_drv)
>> +        return false;
>> +
>> +    rptr = amdgpu_userq_fence_read(fence_drv);
>> +
>> +    spin_lock(&fence_drv->fence_list_lock);
>> +    list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, 
>> link) {
>> +        fence = &userq_fence->base;
>> +
>> +        if (rptr >= fence->seqno) {
>> +            dma_fence_signal(fence);
>> +            list_del(&userq_fence->link);
>> +
>> +            dma_fence_put(fence);
>> +        } else {
>> +            break;
>> +        }
>> +    }
>> +    spin_unlock(&fence_drv->fence_list_lock);
>> +
>> +    return true;
>
> BTW: What's the return value good for? Could we drop that?

Working on v3, will include all the review comments.

Thanks,
Arun
>
> Regards,
> Christian.
>
>> +}
>> +
>> +static const char *amdgpu_userq_fence_get_driver_name(struct 
>> dma_fence *f)
>> +{
>> +    return "amdgpu_userqueue_fence";
>> +}
>> +
>> +static const char *amdgpu_userq_fence_get_timeline_name(struct 
>> dma_fence *f)
>> +{
>> +    struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);
>> +
>> +    return fence->fence_drv->timeline_name;
>> +}
>> +
>> +static bool amdgpu_userq_fence_signaled(struct dma_fence *f)
>> +{
>> +    struct amdgpu_userq_fence *fence = to_amdgpu_userq_fence(f);
>> +    struct amdgpu_userq_fence_driver *fence_drv = fence->fence_drv;
>> +    u64 rptr, wptr;
>> +
>> +    rptr = amdgpu_userq_fence_read(fence_drv);
>> +    wptr = fence->base.seqno;
>> +
>> +    if (rptr >= wptr)
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>> +static void amdgpu_userq_fence_free(struct rcu_head *rcu)
>> +{
>> +    struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>> +
>> +    kmem_cache_free(amdgpu_userq_fence_slab, to_amdgpu_userq_fence(f));
>> +}
>> +
>> +static void amdgpu_userq_fence_release(struct dma_fence *f)
>> +{
>> +    call_rcu(&f->rcu, amdgpu_userq_fence_free);
>> +}
>> +
>> +static const struct dma_fence_ops amdgpu_userq_fence_ops = {
>> +    .use_64bit_seqno = true,
>> +    .get_driver_name = amdgpu_userq_fence_get_driver_name,
>> +    .get_timeline_name = amdgpu_userq_fence_get_timeline_name,
>> +    .signaled = amdgpu_userq_fence_signaled,
>> +    .release = amdgpu_userq_fence_release,
>> +};
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>> new file mode 100644
>> index 000000000000..230dd54b4cd3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>> @@ -0,0 +1,61 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright 2023 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef __AMDGPU_USERQ_FENCE_H__
>> +#define __AMDGPU_USERQ_FENCE_H__
>> +
>> +#include <linux/types.h>
>> +
>> +struct amdgpu_userq_fence {
>> +    struct dma_fence base;
>> +    /* userq fence lock */
>> +    spinlock_t lock;
>> +    struct list_head link;
>> +    struct amdgpu_userq_fence_driver *fence_drv;
>> +};
>> +
>> +struct amdgpu_userq_fence_driver {
>> +    struct kref refcount;
>> +    u64 gpu_addr;
>> +    u64 *cpu_addr;
>> +    u64 context;
>> +    /* fence list lock */
>> +    spinlock_t fence_list_lock;
>> +    struct list_head fences;
>> +    struct amdgpu_device *adev;
>> +    char timeline_name[TASK_COMM_LEN];
>> +};
>> +
>> +static const struct dma_fence_ops amdgpu_userq_fence_ops;
>> +
>> +int amdgpu_userq_fence_slab_init(void);
>> +void amdgpu_userq_fence_slab_fini(void);
>> +int amdgpu_userq_fence_driver_get(struct amdgpu_device *adev, struct 
>> amdgpu_usermode_queue *userq);
>> +void amdgpu_userq_fence_driver_put(struct amdgpu_userq_fence_driver 
>> *fence_drv);
>> +int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
>> +                  u64 seq, struct dma_fence **f);
>> +bool amdgpu_userq_fence_process(struct amdgpu_userq_fence_driver 
>> *fence_drv);
>> +void amdgpu_userq_fence_driver_destroy(struct kref *ref);
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 2f1aba1e9792..d4317b0f6487 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -23,6 +23,7 @@
>>     #include "amdgpu.h"
>>   #include "amdgpu_vm.h"
>> +#include "amdgpu_userq_fence.h"
>>     #define AMDGPU_USERQ_DOORBELL_INDEX 
>> (AMDGPU_NAVI10_DOORBELL_GFX_USERQUEUE_START + 4)
>>   @@ -264,6 +265,8 @@ static int amdgpu_userqueue_create(struct 
>> drm_file *filp, union drm_amdgpu_userq
>>       struct amdgpu_vm *vm = &fpriv->vm;
>>       struct amdgpu_userq_mgr *uq_mgr = &fpriv->userq_mgr;
>>       struct drm_amdgpu_userq_mqd *mqd_in = &args->in.mqd;
>> +    struct amdgpu_userq_fence_driver *fence_drv;
>> +    struct amdgpu_device *adev = uq_mgr->adev;
>>         pasid = vm->pasid;
>>       if (vm->pasid < 0) {
>> @@ -280,6 +283,19 @@ static int amdgpu_userqueue_create(struct 
>> drm_file *filp, union drm_amdgpu_userq
>>           return -ENOMEM;
>>       }
>>   +    fence_drv = kzalloc(sizeof(struct amdgpu_userq_fence_driver), 
>> GFP_KERNEL);
>> +    if (!fence_drv) {
>> +        DRM_ERROR("Failed to allocate memory for fence driver\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    queue->fence_drv = fence_drv;
>> +    r = amdgpu_userq_fence_driver_get(adev, queue);
>> +    if (r) {
>> +        DRM_ERROR("Failed to get fence driver\n");
>> +        goto free_fence_drv;
>> +    }
>> +
>>       queue->vm = vm;
>>       queue->pasid = pasid;
>>       queue->wptr_gpu_addr = mqd_in->wptr_va;
>> @@ -339,6 +355,9 @@ static int amdgpu_userqueue_create(struct 
>> drm_file *filp, union drm_amdgpu_userq
>>   free_qid:
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>   +free_fence_drv:
>> +    amdgpu_userq_fence_driver_put(queue->fence_drv);
>> +
>>   free_queue:
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       kfree(queue);
>> @@ -363,6 +382,7 @@ static void amdgpu_userqueue_destroy(struct 
>> drm_file *filp, int queue_id)
>>       amdgpu_userqueue_destroy_mqd(uq_mgr, queue);
>>       amdgpu_userqueue_free_index(uq_mgr, queue->queue_id);
>>       list_del(&queue->userq_node);
>> +    amdgpu_userq_fence_driver_put(queue->fence_drv);
>>       mutex_unlock(&uq_mgr->userq_mutex);
>>       kfree(queue);
>>   }
>> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h 
>> b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> index bda27583b58c..cf7264df8c46 100644
>> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>> @@ -73,6 +73,8 @@ struct amdgpu_usermode_queue {
>>       struct amdgpu_vm        *vm;
>>       struct amdgpu_userq_mgr *userq_mgr;
>>       struct list_head     userq_node;
>> +
>> +    struct amdgpu_userq_fence_driver *fence_drv;
>>   };
>>     int amdgpu_userq_ioctl(struct drm_device *dev, void *data, struct 
>> drm_file *filp);
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-02-28  8:45 UTC | newest]

Thread overview: 14+ messages
2023-02-26 16:54 [PATCH 0/6] Usermode queue fencing synchronization Arunpravin Paneer Selvam
2023-02-26 16:54 ` [PATCH 1/6] drm/amdgpu: Implement a new 64bit sequence memory driver Arunpravin Paneer Selvam
2023-02-26 16:54 ` [PATCH 2/6] drm/amdgpu: Implement a new userqueue fence driver Arunpravin Paneer Selvam
2023-02-27 12:42   ` Christian König
2023-02-28  8:45     ` Arunpravin Paneer Selvam
2023-02-26 16:54 ` [PATCH 3/6] drm/amdgpu: Add mqd support for the fence address Arunpravin Paneer Selvam
2023-02-26 16:54 ` [PATCH 4/6] drm/amdgpu: UAPI headers for userqueue Secure semaphore Arunpravin Paneer Selvam
2023-02-27 12:52   ` Christian König
2023-02-26 16:54 ` [PATCH 5/6] drm/amdgpu: Implement userqueue signal/wait IOCTL functions Arunpravin Paneer Selvam
2023-02-27 12:59   ` Christian König
2023-02-27 13:20     ` Arunpravin Paneer Selvam
2023-02-27 13:23       ` Christian König
2023-02-27 13:35         ` Arunpravin Paneer Selvam
2023-02-26 16:54 ` [PATCH 6/6] drm/amdgpu: Enable userqueue fence interrupt handling support Arunpravin Paneer Selvam
