All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/12] AMDGPU doorbell manager
@ 2023-04-12 16:25 Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 01/12] drm/amdgpu: create a new file for " Shashank Sharma
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx; +Cc: mukul.joshi, contactshashanksharma, arvind.yadav, Shashank Sharma

Doorbells in AMDGPU drivers are currently managed by different clients
in a scattered way, across may places. The existing clients are:
- AMDGPU graphics driver for kernel level doorbell writes.
- AMDGPU MES module for kernel level doorbell write (MES ring test and
  aggregated doorbells).
- AMDGPU MES module for MES process doorbells.
- AMDKFD module for KFD/KIQ kernel doorbells.
- AMDKFD module for KFD process doorbells.
- AMDGPU usermode queues for usermode doorbell writes (upcoming).

This patch series introduces doorbell-manager to keep the doorbell
handling at a central place. The fundamental changes are:
- Introduce and accommodate a new GEM domain for doorbells.
- Prepare the AMDGPU ttm backend for handling doorbell allocation.
- Create doorbell BOs for kernel-level and process level doorbell
  opertations, and place it in existing structures.
- Modify the existing graphics, KFD and MES code to use the
  doorbell-manager functions.
- Remove the existing doorbell management code in KFD/MES.

The idea is that:
- a kernel client can call doorbell manager functions to allocate/free
  doorbell pages.
- a usermode app can directly allocate a page from the doorbell bar just
  like a GEM object and use it for different usermode queues.
- There is no direct UAPI change in this series, just an additional flag
  for GEM_OBJECT_DOORBELL, here:
  https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/286

PS: This series has been sanity tested with kfd_test_suit to ensure
    it is not introducing any regressions due to kfd doorbell changes.

Alex Deucher (2):
  drm/amdgpu: add UAPI for allocating doorbell memory
  drm/amdgpu: accommodate DOMAIN/PL_DOORBELL

Shashank Sharma (10):
  drm/amdgpu: create a new file for doorbell manager
  drm/amdgpu: don't modify num_doorbells for mes
  drm/amdgpu: initialize ttm for doorbells
  drm/amdgpu: create kernel doorbell pages
  get absolute offset from doorbell index
  drm/amdgpu: use doorbell manager for kfd kernel doorbells
  drm/amdgpu: use doorbell manager for kfd process doorbells
  drm/amdgpu: remove unused functions and variables
  drm/amdgpu: use doorbell mgr for MES kernel doorbells
  drm/amdgpu: cleanup MES process level doorbells

 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 174 +------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |  17 +-
 .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 230 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       | 148 +++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |   2 +
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h    |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  30 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  13 -
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   2 -
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c     | 207 +++++-----------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  27 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  21 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  22 +-
 include/uapi/drm/amdgpu_drm.h                 |   7 +-
 19 files changed, 446 insertions(+), 494 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c

-- 
2.40.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 01/12] drm/amdgpu: create a new file for doorbell manager
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-13 10:48   ` Christian König
  2023-04-12 16:25 ` [PATCH v2 02/12] drm/amdgpu: don't modify num_doorbells for mes Shashank Sharma
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch:
- creates a new file for doorbell management.
- moves doorbell code from amdgpu_device.c to this file.

V2:
 - remove doc from function declaration (Christian)
 - remove 'device' from function names to make it consistent (Alex)
 - add SPDX license identifier (Luben)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 174 +----------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |   6 +
 .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 183 ++++++++++++++++++
 4 files changed, 195 insertions(+), 170 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 798d0e9a60b7..204665f20319 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -41,7 +41,7 @@ ccflags-y := -I$(FULL_AMD_PATH)/include/asic_reg \
 amdgpu-y := amdgpu_drv.o
 
 # add KMS driver
-amdgpu-y += amdgpu_device.o amdgpu_kms.o \
+amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
 	amdgpu_atombios.o atombios_crtc.o amdgpu_connectors.o \
 	atom.o amdgpu_fence.o amdgpu_ttm.o amdgpu_object.o amdgpu_gart.o \
 	amdgpu_encoders.o amdgpu_display.o amdgpu_i2c.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 57ee1c4a81e9..88c75f7a8d66 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -579,94 +579,6 @@ void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
 	}
 }
 
-/**
- * amdgpu_mm_rdoorbell - read a doorbell dword
- *
- * @adev: amdgpu_device pointer
- * @index: doorbell index
- *
- * Returns the value in the doorbell aperture at the
- * requested doorbell index (CIK).
- */
-u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
-{
-	if (amdgpu_device_skip_hw_access(adev))
-		return 0;
-
-	if (index < adev->doorbell.num_kernel_doorbells) {
-		return readl(adev->doorbell.ptr + index);
-	} else {
-		DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
-		return 0;
-	}
-}
-
-/**
- * amdgpu_mm_wdoorbell - write a doorbell dword
- *
- * @adev: amdgpu_device pointer
- * @index: doorbell index
- * @v: value to write
- *
- * Writes @v to the doorbell aperture at the
- * requested doorbell index (CIK).
- */
-void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
-{
-	if (amdgpu_device_skip_hw_access(adev))
-		return;
-
-	if (index < adev->doorbell.num_kernel_doorbells) {
-		writel(v, adev->doorbell.ptr + index);
-	} else {
-		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
-	}
-}
-
-/**
- * amdgpu_mm_rdoorbell64 - read a doorbell Qword
- *
- * @adev: amdgpu_device pointer
- * @index: doorbell index
- *
- * Returns the value in the doorbell aperture at the
- * requested doorbell index (VEGA10+).
- */
-u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
-{
-	if (amdgpu_device_skip_hw_access(adev))
-		return 0;
-
-	if (index < adev->doorbell.num_kernel_doorbells) {
-		return atomic64_read((atomic64_t *)(adev->doorbell.ptr + index));
-	} else {
-		DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
-		return 0;
-	}
-}
-
-/**
- * amdgpu_mm_wdoorbell64 - write a doorbell Qword
- *
- * @adev: amdgpu_device pointer
- * @index: doorbell index
- * @v: value to write
- *
- * Writes @v to the doorbell aperture at the
- * requested doorbell index (VEGA10+).
- */
-void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
-{
-	if (amdgpu_device_skip_hw_access(adev))
-		return;
-
-	if (index < adev->doorbell.num_kernel_doorbells) {
-		atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
-	} else {
-		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
-	}
-}
-
 /**
  * amdgpu_device_indirect_rreg - read an indirect register
  *
@@ -1016,82 +928,6 @@ int amdgpu_device_pci_reset(struct amdgpu_device *adev)
 	return pci_reset_function(adev->pdev);
 }
 
-/*
- * GPU doorbell aperture helpers function.
- */
-/**
- * amdgpu_device_doorbell_init - Init doorbell driver information.
- *
- * @adev: amdgpu_device pointer
- *
- * Init doorbell driver information (CIK)
- * Returns 0 on success, error on failure.
- */
-static int amdgpu_device_doorbell_init(struct amdgpu_device *adev)
-{
-
-	/* No doorbell on SI hardware generation */
-	if (adev->asic_type < CHIP_BONAIRE) {
-		adev->doorbell.base = 0;
-		adev->doorbell.size = 0;
-		adev->doorbell.num_kernel_doorbells = 0;
-		adev->doorbell.ptr = NULL;
-		return 0;
-	}
-
-	if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
-		return -EINVAL;
-
-	amdgpu_asic_init_doorbell_index(adev);
-
-	/* doorbell bar mapping */
-	adev->doorbell.base = pci_resource_start(adev->pdev, 2);
-	adev->doorbell.size = pci_resource_len(adev->pdev, 2);
-
-	if (adev->enable_mes) {
-		adev->doorbell.num_kernel_doorbells =
-			adev->doorbell.size / sizeof(u32);
-	} else {
-		adev->doorbell.num_kernel_doorbells =
-			min_t(u32, adev->doorbell.size / sizeof(u32),
-			      adev->doorbell_index.max_assignment+1);
-		if (adev->doorbell.num_kernel_doorbells == 0)
-			return -EINVAL;
-
-		/* For Vega, reserve and map two pages on doorbell BAR since SDMA
-		 * paging queue doorbell use the second page. The
-		 * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
-		 * doorbells are in the first page. So with paging queue enabled,
-		 * the max num_kernel_doorbells should + 1 page (0x400 in dword)
-		 */
-		if (adev->asic_type >= CHIP_VEGA10)
-			adev->doorbell.num_kernel_doorbells += 0x400;
-	}
-
-	adev->doorbell.ptr = ioremap(adev->doorbell.base,
-				     adev->doorbell.num_kernel_doorbells *
-				     sizeof(u32));
-	if (adev->doorbell.ptr == NULL)
-		return -ENOMEM;
-
-	return 0;
-}
-
-/**
- * amdgpu_device_doorbell_fini - Tear down doorbell driver information.
- *
- * @adev: amdgpu_device pointer
- *
- * Tear down doorbell driver information (CIK)
- */
-static void amdgpu_device_doorbell_fini(struct amdgpu_device *adev)
-{
-	iounmap(adev->doorbell.ptr);
-	adev->doorbell.ptr = NULL;
-}
-
-
-
 /*
  * amdgpu_device_wb_*()
  * Writeback is the method by which the GPU updates special pages in memory
@@ -1239,7 +1075,7 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
 			      cmd & ~PCI_COMMAND_MEMORY);
 
 	/* Free the VRAM and doorbell BAR, we most likely need to move both. */
-	amdgpu_device_doorbell_fini(adev);
+	amdgpu_doorbell_fini(adev);
 	if (adev->asic_type >= CHIP_BONAIRE)
 		pci_release_resource(adev->pdev, 2);
 
@@ -1256,7 +1092,7 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
 	/* When the doorbell or fb BAR isn't available we have no chance of
 	 * using the device.
 	 */
-	r = amdgpu_device_doorbell_init(adev);
+	r = amdgpu_doorbell_init(adev);
 	if (r || (pci_resource_flags(adev->pdev, 0) & IORESOURCE_UNSET))
 		return -ENODEV;
 
@@ -3712,7 +3548,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 		dev_info(adev->dev, "PCIE atomic ops is not supported\n");
 
 	/* doorbell bar mapping and doorbell index init*/
-	amdgpu_device_doorbell_init(adev);
+	amdgpu_doorbell_init(adev);
 
 	if (amdgpu_emu_mode == 1) {
 		/* post the asic on emulation mode */
@@ -3942,7 +3778,7 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
 	unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1);
 
 	/* Unmap all mapped bars - Doorbell, registers and VRAM */
-	amdgpu_device_doorbell_fini(adev);
+	amdgpu_doorbell_fini(adev);
 
 	iounmap(adev->rmmio);
 	adev->rmmio = NULL;
@@ -4051,7 +3887,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
 
 		iounmap(adev->rmmio);
 		adev->rmmio = NULL;
-		amdgpu_device_doorbell_fini(adev);
+		amdgpu_doorbell_fini(adev);
 		drm_dev_exit(idx);
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 908433ac6cb3..3e77a6b4bd69 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -306,6 +306,12 @@ void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v);
 u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index);
 void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v);
 
+/*
+ * GPU doorbell aperture helpers function.
+ */
+int amdgpu_doorbell_init(struct amdgpu_device *adev);
+void amdgpu_doorbell_fini(struct amdgpu_device *adev);
+
 #define RDOORBELL32(index) amdgpu_mm_rdoorbell(adev, (index))
 #define WDOORBELL32(index, v) amdgpu_mm_wdoorbell(adev, (index), (v))
 #define RDOORBELL64(index) amdgpu_mm_rdoorbell64(adev, (index))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
new file mode 100644
index 000000000000..64fec954815d
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+
+/**
+ * amdgpu_mm_rdoorbell - read a doorbell dword
+ *
+ * @adev: amdgpu_device pointer
+ * @index: doorbell index
+ *
+ * Returns the value in the doorbell aperture at the
+ * requested doorbell index (CIK).
+ */
+u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
+{
+	if (amdgpu_device_skip_hw_access(adev))
+		return 0;
+
+	if (index < adev->doorbell.num_kernel_doorbells)
+		return readl(adev->doorbell.ptr + index);
+
+	DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
+	return 0;
+}
+
+/**
+ * amdgpu_mm_wdoorbell - write a doorbell dword
+ *
+ * @adev: amdgpu_device pointer
+ * @index: doorbell index
+ * @v: value to write
+ *
+ * Writes @v to the doorbell aperture at the
+ * requested doorbell index (CIK).
+ */
+void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
+{
+	if (amdgpu_device_skip_hw_access(adev))
+		return;
+
+	if (index < adev->doorbell.num_kernel_doorbells)
+		writel(v, adev->doorbell.ptr + index);
+	else
+		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
+}
+
+/**
+ * amdgpu_mm_rdoorbell64 - read a doorbell Qword
+ *
+ * @adev: amdgpu_device pointer
+ * @index: doorbell index
+ *
+ * Returns the value in the doorbell aperture at the
+ * requested doorbell index (VEGA10+).
+ */
+u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
+{
+	if (amdgpu_device_skip_hw_access(adev))
+		return 0;
+
+	if (index < adev->doorbell.num_kernel_doorbells)
+		return atomic64_read((atomic64_t *)(adev->doorbell.ptr + index));
+
+	DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
+	return 0;
+}
+
+/**
+ * amdgpu_mm_wdoorbell64 - write a doorbell Qword
+ *
+ * @adev: amdgpu_device pointer
+ * @index: doorbell index
+ * @v: value to write
+ *
+ * Writes @v to the doorbell aperture at the
+ * requested doorbell index (VEGA10+).
+ */
+void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
+{
+	if (amdgpu_device_skip_hw_access(adev))
+		return;
+
+	if (index < adev->doorbell.num_kernel_doorbells)
+		atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
+	else
+		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
+}
+
+/*
+ * GPU doorbell aperture helpers function.
+ */
+/**
+ * amdgpu_doorbell_init - Init doorbell driver information.
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Init doorbell driver information (CIK)
+ * Returns 0 on success, error on failure.
+ */
+int amdgpu_doorbell_init(struct amdgpu_device *adev)
+{
+
+	/* No doorbell on SI hardware generation */
+	if (adev->asic_type < CHIP_BONAIRE) {
+		adev->doorbell.base = 0;
+		adev->doorbell.size = 0;
+		adev->doorbell.num_kernel_doorbells = 0;
+		adev->doorbell.ptr = NULL;
+		return 0;
+	}
+
+	if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
+		return -EINVAL;
+
+	amdgpu_asic_init_doorbell_index(adev);
+
+	/* doorbell bar mapping */
+	adev->doorbell.base = pci_resource_start(adev->pdev, 2);
+	adev->doorbell.size = pci_resource_len(adev->pdev, 2);
+
+	if (adev->enable_mes) {
+		adev->doorbell.num_kernel_doorbells =
+			adev->doorbell.size / sizeof(u32);
+	} else {
+		adev->doorbell.num_kernel_doorbells =
+			min_t(u32, adev->doorbell.size / sizeof(u32),
+			      adev->doorbell_index.max_assignment+1);
+		if (adev->doorbell.num_kernel_doorbells == 0)
+			return -EINVAL;
+
+		/* For Vega, reserve and map two pages on doorbell BAR since SDMA
+		 * paging queue doorbell use the second page. The
+		 * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
+		 * doorbells are in the first page. So with paging queue enabled,
+		 * the max num_kernel_doorbells should + 1 page (0x400 in dword)
+		 */
+		if (adev->asic_type >= CHIP_VEGA10)
+			adev->doorbell.num_kernel_doorbells += 0x400;
+	}
+
+	adev->doorbell.ptr = ioremap(adev->doorbell.base,
+				     adev->doorbell.num_kernel_doorbells *
+				     sizeof(u32));
+	if (adev->doorbell.ptr == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/**
+ * amdgpu_doorbell_fini - Tear down doorbell driver information.
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Tear down doorbell driver information (CIK)
+ */
+void amdgpu_doorbell_fini(struct amdgpu_device *adev)
+{
+	iounmap(adev->doorbell.ptr);
+	adev->doorbell.ptr = NULL;
+}
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 02/12] drm/amdgpu: don't modify num_doorbells for mes
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 01/12] drm/amdgpu: create a new file for " Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 03/12] drm/amdgpu: add UAPI for allocating doorbell memory Shashank Sharma
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch removes the check and change in num_kernel_doorbells
for MES, which is not being used anywhere by MES code.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 34 ++++++++-----------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
index 64fec954815d..4b934aadcce6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
@@ -140,25 +140,21 @@ int amdgpu_doorbell_init(struct amdgpu_device *adev)
 	adev->doorbell.base = pci_resource_start(adev->pdev, 2);
 	adev->doorbell.size = pci_resource_len(adev->pdev, 2);
 
-	if (adev->enable_mes) {
-		adev->doorbell.num_kernel_doorbells =
-			adev->doorbell.size / sizeof(u32);
-	} else {
-		adev->doorbell.num_kernel_doorbells =
-			min_t(u32, adev->doorbell.size / sizeof(u32),
-			      adev->doorbell_index.max_assignment+1);
-		if (adev->doorbell.num_kernel_doorbells == 0)
-			return -EINVAL;
-
-		/* For Vega, reserve and map two pages on doorbell BAR since SDMA
-		 * paging queue doorbell use the second page. The
-		 * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
-		 * doorbells are in the first page. So with paging queue enabled,
-		 * the max num_kernel_doorbells should + 1 page (0x400 in dword)
-		 */
-		if (adev->asic_type >= CHIP_VEGA10)
-			adev->doorbell.num_kernel_doorbells += 0x400;
-	}
+	adev->doorbell.num_kernel_doorbells =
+		min_t(u32, adev->doorbell.size / sizeof(u32),
+				adev->doorbell_index.max_assignment+1);
+	if (adev->doorbell.num_kernel_doorbells == 0)
+		return -EINVAL;
+
+	/*
+	 * For Vega, reserve and map two pages on doorbell BAR since SDMA
+	 * paging queue doorbell use the second page. The
+	 * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
+	 * doorbells are in the first page. So with paging queue enabled,
+	 * the max num_kernel_doorbells should + 1 page (0x400 in dword)
+	 */
+	if (adev->asic_type >= CHIP_VEGA10)
+		adev->doorbell.num_kernel_doorbells += 0x400;
 
 	adev->doorbell.ptr = ioremap(adev->doorbell.base,
 				     adev->doorbell.num_kernel_doorbells *
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 03/12] drm/amdgpu: add UAPI for allocating doorbell memory
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 01/12] drm/amdgpu: create a new file for " Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 02/12] drm/amdgpu: don't modify num_doorbells for mes Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 04/12] drm/amdgpu: accommodate DOMAIN/PL_DOORBELL Shashank Sharma
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

From: Alex Deucher <alexander.deucher@amd.com>

This patch adds flags for a new gem domain AMDGPU_GEM_DOMAIN_DOORBELL
in the UAPI layer.

V2: Drop 'memory' from description (Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 4038abe8505a..cc5d551abda5 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -94,6 +94,9 @@ extern "C" {
  *
  * %AMDGPU_GEM_DOMAIN_OA	Ordered append, used by 3D or Compute engines
  * for appending data.
+ *
+ * %AMDGPU_GEM_DOMAIN_DOORBELL	Doorbell. It is an MMIO region for
+ * signalling user mode queues.
  */
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
@@ -101,12 +104,14 @@ extern "C" {
 #define AMDGPU_GEM_DOMAIN_GDS		0x8
 #define AMDGPU_GEM_DOMAIN_GWS		0x10
 #define AMDGPU_GEM_DOMAIN_OA		0x20
+#define AMDGPU_GEM_DOMAIN_DOORBELL	0x40
 #define AMDGPU_GEM_DOMAIN_MASK		(AMDGPU_GEM_DOMAIN_CPU | \
 					 AMDGPU_GEM_DOMAIN_GTT | \
 					 AMDGPU_GEM_DOMAIN_VRAM | \
 					 AMDGPU_GEM_DOMAIN_GDS | \
 					 AMDGPU_GEM_DOMAIN_GWS | \
-					 AMDGPU_GEM_DOMAIN_OA)
+					 AMDGPU_GEM_DOMAIN_OA | \
+					 AMDGPU_GEM_DOMAIN_DOORBELL)
 
 /* Flag that CPU access will be required for the case of VRAM domain */
 #define AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED	(1 << 0)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 04/12] drm/amdgpu: accommodate DOMAIN/PL_DOORBELL
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (2 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 03/12] drm/amdgpu: add UAPI for allocating doorbell memory Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 05/12] drm/amdgpu: initialize ttm for doorbells Shashank Sharma
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

From: Alex Deucher <alexander.deucher@amd.com>

This patch adds changes:
- to accommodate the new GEM domain DOORBELL
- to accommodate the new TTM PL DOORBELL

in order to manage doorbell pages as GEM object.

V2: Addressed reviwe comments from Christian
    - drop the doorbell changes for pinning/unpinning
    - drop the doorbell changes for dma-buf map
    - drop the doorbell changes for sgt
    - no need to handle TTM_PL_FLAG_CONTIGUOUS for doorbell
    - add caching type for doorbell

V3: - Removed unrelated empty line (Christian)
    - Add PL_DOORBELL in mem_type_to_domain() as well (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c     | 11 ++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h     |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c        | 16 +++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h        |  1 +
 5 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4e684c2afc70..b0fb2e1706f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -147,6 +147,14 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 		c++;
 	}
 
+	if (domain & AMDGPU_GEM_DOMAIN_DOORBELL) {
+		places[c].fpfn = 0;
+		places[c].lpfn = 0;
+		places[c].mem_type = AMDGPU_PL_DOORBELL;
+		places[c].flags = 0;
+		c++;
+	}
+
 	if (domain & AMDGPU_GEM_DOMAIN_GTT) {
 		places[c].fpfn = 0;
 		places[c].lpfn = 0;
@@ -466,7 +474,7 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
 		goto fail;
 	}
 
-	/* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */
+	/* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU, _DOMAIN_DOORBELL */
 	return true;
 
 fail:
@@ -1013,6 +1021,7 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
 	} else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
 		atomic64_sub(amdgpu_bo_size(bo), &adev->gart_pin_size);
 	}
+
 }
 
 static const char *amdgpu_vram_names[] = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 93207badf83f..f546b403053f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -152,6 +152,8 @@ static inline unsigned amdgpu_mem_type_to_domain(u32 mem_type)
 		return AMDGPU_GEM_DOMAIN_GWS;
 	case AMDGPU_PL_OA:
 		return AMDGPU_GEM_DOMAIN_OA;
+	case AMDGPU_PL_DOORBELL:
+		return AMDGPU_GEM_DOMAIN_DOORBELL;
 	default:
 		break;
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 5c4f93ee0c57..3c988cc406e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -90,6 +90,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
 		cur->node = block;
 		break;
 	case TTM_PL_TT:
+	case AMDGPU_PL_DOORBELL:
 		node = to_ttm_range_mgr_node(res)->mm_nodes;
 		while (start >= node->size << PAGE_SHIFT)
 			start -= node++->size << PAGE_SHIFT;
@@ -152,6 +153,7 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size)
 		cur->size = min(amdgpu_vram_mgr_block_size(block), cur->remaining);
 		break;
 	case TTM_PL_TT:
+	case AMDGPU_PL_DOORBELL:
 		node = cur->node;
 
 		cur->node = ++node;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 55e0284b2bdd..6f61491ef3dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -128,6 +128,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 	case AMDGPU_PL_GDS:
 	case AMDGPU_PL_GWS:
 	case AMDGPU_PL_OA:
+	case AMDGPU_PL_DOORBELL:
 		placement->num_placement = 0;
 		placement->num_busy_placement = 0;
 		return;
@@ -500,9 +501,11 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 	if (old_mem->mem_type == AMDGPU_PL_GDS ||
 	    old_mem->mem_type == AMDGPU_PL_GWS ||
 	    old_mem->mem_type == AMDGPU_PL_OA ||
+	    old_mem->mem_type == AMDGPU_PL_DOORBELL ||
 	    new_mem->mem_type == AMDGPU_PL_GDS ||
 	    new_mem->mem_type == AMDGPU_PL_GWS ||
-	    new_mem->mem_type == AMDGPU_PL_OA) {
+	    new_mem->mem_type == AMDGPU_PL_OA ||
+	    new_mem->mem_type == AMDGPU_PL_DOORBELL) {
 		/* Nothing to save here */
 		ttm_bo_move_null(bo, new_mem);
 		goto out;
@@ -586,6 +589,12 @@ static int amdgpu_ttm_io_mem_reserve(struct ttm_device *bdev,
 		mem->bus.offset += adev->gmc.aper_base;
 		mem->bus.is_iomem = true;
 		break;
+	case AMDGPU_PL_DOORBELL:
+		mem->bus.offset = mem->start << PAGE_SHIFT;
+		mem->bus.offset += adev->doorbell.base;
+		mem->bus.is_iomem = true;
+		mem->bus.caching = ttm_uncached;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -600,6 +609,10 @@ static unsigned long amdgpu_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
 
 	amdgpu_res_first(bo->resource, (u64)page_offset << PAGE_SHIFT, 0,
 			 &cursor);
+
+	if (bo->resource->mem_type == AMDGPU_PL_DOORBELL)
+		return ((uint64_t)(adev->doorbell.base + cursor.start)) >> PAGE_SHIFT;
+
 	return (adev->gmc.aper_base + cursor.start) >> PAGE_SHIFT;
 }
 
@@ -1267,6 +1280,7 @@ uint64_t amdgpu_ttm_tt_pde_flags(struct ttm_tt *ttm, struct ttm_resource *mem)
 		flags |= AMDGPU_PTE_VALID;
 
 	if (mem && (mem->mem_type == TTM_PL_TT ||
+		    mem->mem_type == AMDGPU_PL_DOORBELL ||
 		    mem->mem_type == AMDGPU_PL_PREEMPT)) {
 		flags |= AMDGPU_PTE_SYSTEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index e2cd5894afc9..761cd6b2b942 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -33,6 +33,7 @@
 #define AMDGPU_PL_GWS		(TTM_PL_PRIV + 1)
 #define AMDGPU_PL_OA		(TTM_PL_PRIV + 2)
 #define AMDGPU_PL_PREEMPT	(TTM_PL_PRIV + 3)
+#define AMDGPU_PL_DOORBELL	(TTM_PL_PRIV + 4)
 
 #define AMDGPU_GTT_MAX_TRANSFER_SIZE	512
 #define AMDGPU_GTT_NUM_TRANSFER_WINDOWS	2
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 05/12] drm/amdgpu: initialize ttm for doorbells
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (3 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 04/12] drm/amdgpu: accommodate DOMAIN/PL_DOORBELL Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 06/12] drm/amdgpu: create kernel doorbell pages Shashank Sharma
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch initialzes the ttm resource manager for doorbells.

V2: Do not round up doorbell size (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 6f61491ef3dd..da65b72cba77 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1858,6 +1858,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 	DRM_INFO("amdgpu: %uM of GTT memory ready.\n",
 		 (unsigned)(gtt_size / (1024 * 1024)));
 
+	/* Initiailize doorbell pool on PCI BAR */
+	r = amdgpu_ttm_init_on_chip(adev, AMDGPU_PL_DOORBELL, adev->doorbell.size / PAGE_SIZE);
+	if (r) {
+		DRM_ERROR("Failed initializing doorbell heap.\n");
+		return r;
+	}
+
 	/* Initialize preemptible memory pool */
 	r = amdgpu_preempt_mgr_init(adev);
 	if (r) {
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 06/12] drm/amdgpu: create kernel doorbell pages
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (4 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 05/12] drm/amdgpu: initialize ttm for doorbells Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 07/12] get absolute offset from doorbell index Shashank Sharma
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch:
- creates a doorbell page for graphics driver usages.
- adds a few new varlables in adev->doorbell structure to
  keep track of kernel's doorbell-bo.
- removes the adev->doorbell.ptr variable, replaces it with
  kernel-doorbell-bo's cpu address.

V2: - Create doorbell BO directly, no wrappe functions (Alex)
    - no additional doorbell structure (Alex, Christian)
    - Use doorbell_cpu_ptr, remove ioremap (Christian, Alex)
    - Allocate one extra page of doorbells for MES (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |  8 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 56 ++++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  7 +++
 4 files changed, 60 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index 3e77a6b4bd69..fbbff12a14cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -31,10 +31,15 @@ struct amdgpu_doorbell {
 	/* doorbell mmio */
 	resource_size_t		base;
 	resource_size_t		size;
-	u32 __iomem		*ptr;
 
 	/* Number of doorbells reserved for amdgpu kernel driver */
 	u32 num_kernel_doorbells;
+
+	/* Kernel doorbells */
+	struct amdgpu_bo *kernel_doorbells;
+
+	/* For CPU access of doorbells */
+	uint32_t *cpu_addr;
 };
 
 /* Reserved doorbells for amdgpu (including multimedia).
@@ -311,6 +316,7 @@ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v);
  */
 int amdgpu_doorbell_init(struct amdgpu_device *adev);
 void amdgpu_doorbell_fini(struct amdgpu_device *adev);
+int amdgpu_doorbell_create_kernel_doorbells(struct amdgpu_device *adev);
 
 #define RDOORBELL32(index) amdgpu_mm_rdoorbell(adev, (index))
 #define WDOORBELL32(index, v) amdgpu_mm_wdoorbell(adev, (index), (v))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
index 4b934aadcce6..8be2dfa8fa74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
@@ -39,7 +39,7 @@ u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
 		return 0;
 
 	if (index < adev->doorbell.num_kernel_doorbells)
-		return readl(adev->doorbell.ptr + index);
+		return readl(adev->doorbell.cpu_addr + index);
 
 	DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
 	return 0;
@@ -61,7 +61,7 @@ void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
 		return;
 
 	if (index < adev->doorbell.num_kernel_doorbells)
-		writel(v, adev->doorbell.ptr + index);
+		writel(v, adev->doorbell.cpu_addr + index);
 	else
 		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
 }
@@ -81,7 +81,7 @@ u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
 		return 0;
 
 	if (index < adev->doorbell.num_kernel_doorbells)
-		return atomic64_read((atomic64_t *)(adev->doorbell.ptr + index));
+		return atomic64_read((atomic64_t *)(adev->doorbell.cpu_addr + index));
 
 	DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
 	return 0;
@@ -103,11 +103,47 @@ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
 		return;
 
 	if (index < adev->doorbell.num_kernel_doorbells)
-		atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
+		atomic64_set((atomic64_t *)(adev->doorbell.cpu_addr + index), v);
 	else
 		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
 }
 
+/**
+ * amdgpu_doorbell_create_kernel_doorbells - Create kernel doorbells for graphics
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Creates doorbells for graphics driver usages.
+ * returns 0 on success, error otherwise.
+ */
+int amdgpu_doorbell_create_kernel_doorbells(struct amdgpu_device *adev)
+{
+	int r;
+	int size;
+
+	/* Reserve first num_kernel_doorbells (page-aligned) for kernel ops */
+	size = ALIGN(adev->doorbell.num_kernel_doorbells * sizeof(u32), PAGE_SIZE);
+
+	/* Allocate an extra page for MES kernel usages (ring test) */
+	adev->mes.db_start_dw_offset = size / sizeof(u32);
+	size += PAGE_SIZE;
+
+	r = amdgpu_bo_create_kernel(adev,
+				    size,
+				    PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_DOORBELL,
+				    &adev->doorbell.kernel_doorbells,
+				    NULL,
+				    (void **)&adev->doorbell.cpu_addr);
+	if (r) {
+		DRM_ERROR("Failed to allocate kernel doorbells, err=%d\n", r);
+		return r;
+	}
+
+	adev->doorbell.num_kernel_doorbells = size / sizeof(u32);
+	return 0;
+}
+
 /*
  * GPU doorbell aperture helpers function.
  */
@@ -127,7 +163,6 @@ int amdgpu_doorbell_init(struct amdgpu_device *adev)
 		adev->doorbell.base = 0;
 		adev->doorbell.size = 0;
 		adev->doorbell.num_kernel_doorbells = 0;
-		adev->doorbell.ptr = NULL;
 		return 0;
 	}
 
@@ -156,12 +191,6 @@ int amdgpu_doorbell_init(struct amdgpu_device *adev)
 	if (adev->asic_type >= CHIP_VEGA10)
 		adev->doorbell.num_kernel_doorbells += 0x400;
 
-	adev->doorbell.ptr = ioremap(adev->doorbell.base,
-				     adev->doorbell.num_kernel_doorbells *
-				     sizeof(u32));
-	if (adev->doorbell.ptr == NULL)
-		return -ENOMEM;
-
 	return 0;
 }
 
@@ -174,6 +203,7 @@ int amdgpu_doorbell_init(struct amdgpu_device *adev)
  */
 void amdgpu_doorbell_fini(struct amdgpu_device *adev)
 {
-	iounmap(adev->doorbell.ptr);
-	adev->doorbell.ptr = NULL;
+	amdgpu_bo_free_kernel(&adev->doorbell.kernel_doorbells,
+			      NULL,
+			      (void **)&adev->doorbell.cpu_addr);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 97c05d08a551..8880f3e3702e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -128,6 +128,9 @@ struct amdgpu_mes {
 	int                             (*kiq_hw_init)(struct amdgpu_device *adev);
 	int                             (*kiq_hw_fini)(struct amdgpu_device *adev);
 
+	/* MES doorbells */
+	uint32_t			db_start_dw_offset;
+
 	/* ip specific functions */
 	const struct amdgpu_mes_funcs   *funcs;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index da65b72cba77..05dfa2fcb334 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1865,6 +1865,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 		return r;
 	}
 
+	/* Create a boorbell page for kernel usages */
+	r = amdgpu_doorbell_create_kernel_doorbells(adev);
+	if (r) {
+		DRM_ERROR("Failed to initialize kernel doorbells.\n");
+		return r;
+	}
+
 	/* Initialize preemptible memory pool */
 	r = amdgpu_preempt_mgr_init(adev);
 	if (r) {
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 07/12] get absolute offset from doorbell index
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (5 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 06/12] drm/amdgpu: create kernel doorbell pages Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-13 10:59   ` Christian König
  2023-04-12 16:25 ` [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells Shashank Sharma
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch adds a helper function which converts a doorbell's
relative index in a BO to an absolute doorbell offset in the
doorbell BAR.

V2: No space between the variable name doc (Luben)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |  3 +++
 .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 21 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
index fbbff12a14cd..b110e002bad2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
@@ -317,6 +317,9 @@ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v);
 int amdgpu_doorbell_init(struct amdgpu_device *adev);
 void amdgpu_doorbell_fini(struct amdgpu_device *adev);
 int amdgpu_doorbell_create_kernel_doorbells(struct amdgpu_device *adev);
+uint32_t amdgpu_doorbell_index_on_bar(struct amdgpu_device *adev,
+				       struct amdgpu_bo *db_bo,
+				       uint32_t doorbell_index);
 
 #define RDOORBELL32(index) amdgpu_mm_rdoorbell(adev, (index))
 #define WDOORBELL32(index, v) amdgpu_mm_wdoorbell(adev, (index), (v))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
index 8be2dfa8fa74..a99cd56e8bf4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
@@ -108,6 +108,27 @@ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
 		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
 }
 
+/**
+ * amdgpu_doorbell_index_on_bar - Find doorbell's absolute offset in BAR
+ *
+ * @adev: amdgpu_device pointer
+ * @db_bo: doorbell object's bo
+ * @db_index: doorbell relative index in this doorbell object
+ *
+ * returns doorbell's absolute index in BAR
+ */
+uint32_t amdgpu_doorbell_index_on_bar(struct amdgpu_device *adev,
+				       struct amdgpu_bo *db_bo,
+				       uint32_t doorbell_index)
+{
+	int db_bo_offset;
+
+	db_bo_offset = amdgpu_bo_gpu_offset_no_check(db_bo);
+
+	/* doorbell index is 32 bit but doorbell's size is 64-bit, so *2 */
+	return db_bo_offset / sizeof(u32) + doorbell_index * 2;
+}
+
 /**
  * amdgpu_doorbell_create_kernel_doorbells - Create kernel doorbells for graphics
  *
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (6 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 07/12] get absolute offset from doorbell index Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-13 11:02   ` Christian König
  2023-04-21 19:58   ` Felix Kuehling
  2023-04-12 16:25 ` [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells Shashank Sharma
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, Felix Kuehling, arvind.yadav,
	Alex Deucher, contactshashanksharma, Christian Koenig

This patch:
- adds a doorbell bo in kfd device structure.
- creates doorbell page for kfd kernel usages.
- updates the get_kernel_doorbell and free_kernel_doorbell functions
  accordingly

V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   2 -
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 110 ++++++----------------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   6 ++
 3 files changed, 36 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index b8936340742b..49e3c4e021af 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -435,8 +435,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
 	atomic_set(&kfd->compute_profile, 0);
 
 	mutex_init(&kfd->doorbell_mutex);
-	memset(&kfd->doorbell_available_index, 0,
-		sizeof(kfd->doorbell_available_index));
 
 	atomic_set(&kfd->sram_ecc_flag, 0);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index cd4e61bf0493..82b4a56b0afc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -61,81 +61,39 @@ size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
 /* Doorbell calculations for device init. */
 int kfd_doorbell_init(struct kfd_dev *kfd)
 {
-	size_t doorbell_start_offset;
-	size_t doorbell_aperture_size;
-	size_t doorbell_process_limit;
+	int r;
+	int size = PAGE_SIZE;
 
-	/*
-	 * With MES enabled, just set the doorbell base as it is needed
-	 * to calculate doorbell physical address.
-	 */
-	if (kfd->shared_resources.enable_mes) {
-		kfd->doorbell_base =
-			kfd->shared_resources.doorbell_physical_address;
-		return 0;
-	}
-
-	/*
-	 * We start with calculations in bytes because the input data might
-	 * only be byte-aligned.
-	 * Only after we have done the rounding can we assume any alignment.
-	 */
-
-	doorbell_start_offset =
-			roundup(kfd->shared_resources.doorbell_start_offset,
-					kfd_doorbell_process_slice(kfd));
-
-	doorbell_aperture_size =
-			rounddown(kfd->shared_resources.doorbell_aperture_size,
-					kfd_doorbell_process_slice(kfd));
-
-	if (doorbell_aperture_size > doorbell_start_offset)
-		doorbell_process_limit =
-			(doorbell_aperture_size - doorbell_start_offset) /
-						kfd_doorbell_process_slice(kfd);
-	else
-		return -ENOSPC;
-
-	if (!kfd->max_doorbell_slices ||
-	    doorbell_process_limit < kfd->max_doorbell_slices)
-		kfd->max_doorbell_slices = doorbell_process_limit;
-
-	kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address +
-				doorbell_start_offset;
-
-	kfd->doorbell_base_dw_offset = doorbell_start_offset / sizeof(u32);
-
-	kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
-					   kfd_doorbell_process_slice(kfd));
-
-	if (!kfd->doorbell_kernel_ptr)
+	/* Bitmap to dynamically allocate doorbells from kernel page */
+	kfd->doorbell_bitmap = bitmap_zalloc(size / sizeof(u32), GFP_KERNEL);
+	if (!kfd->doorbell_bitmap) {
+		DRM_ERROR("Failed to allocate kernel doorbell bitmap\n");
 		return -ENOMEM;
+	}
 
-	pr_debug("Doorbell initialization:\n");
-	pr_debug("doorbell base           == 0x%08lX\n",
-			(uintptr_t)kfd->doorbell_base);
-
-	pr_debug("doorbell_base_dw_offset      == 0x%08lX\n",
-			kfd->doorbell_base_dw_offset);
-
-	pr_debug("doorbell_process_limit  == 0x%08lX\n",
-			doorbell_process_limit);
-
-	pr_debug("doorbell_kernel_offset  == 0x%08lX\n",
-			(uintptr_t)kfd->doorbell_base);
-
-	pr_debug("doorbell aperture size  == 0x%08lX\n",
-			kfd->shared_resources.doorbell_aperture_size);
-
-	pr_debug("doorbell kernel address == %p\n", kfd->doorbell_kernel_ptr);
+	/* Alloc a doorbell page for KFD kernel usages */
+	r = amdgpu_bo_create_kernel(kfd->adev,
+				    size,
+				    PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_DOORBELL,
+				    &kfd->doorbells,
+				    NULL,
+				    (void **)&kfd->doorbell_kernel_ptr);
+	if (r) {
+		pr_err("failed to allocate kernel doorbells\n");
+		bitmap_free(kfd->doorbell_bitmap);
+		return r;
+	}
 
+	pr_debug("Doorbell kernel address == %p\n", kfd->doorbell_kernel_ptr);
 	return 0;
 }
 
 void kfd_doorbell_fini(struct kfd_dev *kfd)
 {
-	if (kfd->doorbell_kernel_ptr)
-		iounmap(kfd->doorbell_kernel_ptr);
+	bitmap_free(kfd->doorbell_bitmap);
+	amdgpu_bo_free_kernel(&kfd->doorbells, NULL,
+			     (void **)&kfd->doorbell_kernel_ptr);
 }
 
 int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
@@ -188,22 +146,15 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 	u32 inx;
 
 	mutex_lock(&kfd->doorbell_mutex);
-	inx = find_first_zero_bit(kfd->doorbell_available_index,
-					KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+	inx = find_first_zero_bit(kfd->doorbell_bitmap, PAGE_SIZE / sizeof(u32));
 
-	__set_bit(inx, kfd->doorbell_available_index);
+	__set_bit(inx, kfd->doorbell_bitmap);
 	mutex_unlock(&kfd->doorbell_mutex);
 
 	if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
 		return NULL;
 
-	inx *= kfd->device_info.doorbell_size / sizeof(u32);
-
-	/*
-	 * Calculating the kernel doorbell offset using the first
-	 * doorbell page.
-	 */
-	*doorbell_off = kfd->doorbell_base_dw_offset + inx;
+	*doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
 
 	pr_debug("Get kernel queue doorbell\n"
 			"     doorbell offset   == 0x%08X\n"
@@ -217,11 +168,10 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
 {
 	unsigned int inx;
 
-	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr)
-		* sizeof(u32) / kfd->device_info.doorbell_size;
+	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
 
 	mutex_lock(&kfd->doorbell_mutex);
-	__clear_bit(inx, kfd->doorbell_available_index);
+	__clear_bit(inx, kfd->doorbell_bitmap);
 	mutex_unlock(&kfd->doorbell_mutex);
 }
 
@@ -280,7 +230,7 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
 	if (!pdd->doorbell_index) {
 		int r = kfd_alloc_process_doorbells(pdd->dev,
 						    &pdd->doorbell_index);
-		if (r)
+		if (r < 0)
 			return 0;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 552c3ac85a13..0b199eb6dc88 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -346,6 +346,12 @@ struct kfd_dev {
 
 	/* HMM page migration MEMORY_DEVICE_PRIVATE mapping */
 	struct dev_pagemap pgmap;
+
+	/* Kernel doorbells for KFD device */
+	struct amdgpu_bo *doorbells;
+
+	/* bitmap for dynamic doorbell allocation from this object */
+	unsigned long *doorbell_bitmap;
 };
 
 enum kfd_mempool {
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (7 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-13 11:07   ` Christian König
  2023-04-21 20:11   ` Felix Kuehling
  2023-04-12 16:25 ` [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables Shashank Sharma
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, Felix Kuehling, arvind.yadav,
	Alex Deucher, contactshashanksharma, Christian Koenig

This patch:
- adds a doorbell object in kfd pdd structure.
- allocates doorbells for a process while creating its pdd.
- frees the doorbells with pdd destroy.
- removes previous calls to allocate process doorbells as
  its not required anymore.

PS: This patch ensures that we don't break the existing KFD
    functionality, but now KFD userspace library should also
    create doorbell pages as AMDGPU GEM objects using libdrm
    functions in userspace. The reference code for the same
    is available with AMDGPU Usermode queue libdrm MR. Once
    this is done, we will not need to create process doorbells
    in kernel.

V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel
      instead (Alex).
    - Do not use custom doorbell structure, instead use separate
      variables for bo and doorbell_bitmap (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 13 ----
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  8 +--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c     | 65 ++++++++++---------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  9 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 21 +++---
 .../amd/amdkfd/kfd_process_queue_manager.c    | 22 +++----
 6 files changed, 64 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6d291aa6386b..0e40756417e5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -327,12 +327,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 		goto err_bind_process;
 	}
 
-	if (!pdd->doorbell_index &&
-	    kfd_alloc_process_doorbells(dev, &pdd->doorbell_index) < 0) {
-		err = -ENOMEM;
-		goto err_alloc_doorbells;
-	}
-
 	/* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
 	 * on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
 	 */
@@ -410,7 +404,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 	if (wptr_bo)
 		amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo);
 err_wptr_map_gart:
-err_alloc_doorbells:
 err_bind_process:
 err_pdd:
 	mutex_unlock(&p->mutex);
@@ -2163,12 +2156,6 @@ static int criu_restore_devices(struct kfd_process *p,
 			ret = PTR_ERR(pdd);
 			goto exit;
 		}
-
-		if (!pdd->doorbell_index &&
-		    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
-			ret = -ENOMEM;
-			goto exit;
-		}
 	}
 
 	/*
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index ecb4c3abc629..855bf8ce3f16 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -371,7 +371,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
 			unsigned int found;
 
 			found = find_first_zero_bit(qpd->doorbell_bitmap,
-						KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+						    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
 			if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
 				pr_debug("No doorbells available");
 				return -EBUSY;
@@ -381,9 +381,9 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
 		}
 	}
 
-	q->properties.doorbell_off =
-		kfd_get_doorbell_dw_offset_in_bar(dev, qpd_to_pdd(qpd),
-					  q->doorbell_id);
+	q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
+								  qpd->proc_doorbells,
+								  q->doorbell_id);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 82b4a56b0afc..718cfe9cb4f5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -227,46 +227,47 @@ uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
 
 phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
 {
-	if (!pdd->doorbell_index) {
-		int r = kfd_alloc_process_doorbells(pdd->dev,
-						    &pdd->doorbell_index);
-		if (r < 0)
-			return 0;
-	}
+	struct amdgpu_device *adev = pdd->dev->adev;
+	uint32_t first_db_index;
 
-	return pdd->dev->doorbell_base +
-		pdd->doorbell_index * kfd_doorbell_process_slice(pdd->dev);
+	first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
+	return adev->doorbell.base + first_db_index * sizeof(uint32_t);
 }
 
-int kfd_alloc_process_doorbells(struct kfd_dev *kfd, unsigned int *doorbell_index)
+int kfd_alloc_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
 {
-	int r = 0;
-
-	if (!kfd->shared_resources.enable_mes)
-		r = ida_simple_get(&kfd->doorbell_ida, 1,
-				   kfd->max_doorbell_slices, GFP_KERNEL);
-	else
-		r = amdgpu_mes_alloc_process_doorbells(
-				(struct amdgpu_device *)kfd->adev,
-				doorbell_index);
+	int r;
+	struct qcm_process_device *qpd = &pdd->qpd;
 
-	if (r > 0)
-		*doorbell_index = r;
+	/* Allocate bitmap for dynamic doorbell allocation */
+	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
+					     GFP_KERNEL);
+	if (!qpd->doorbell_bitmap) {
+		DRM_ERROR("Failed to allocate process doorbell bitmap\n");
+		return -ENOMEM;
+	}
 
-	if (r < 0)
-		pr_err("Failed to allocate process doorbells\n");
+	/* Allocate doorbells for this process */
+	r = amdgpu_bo_create_kernel(kfd->adev,
+				    kfd_doorbell_process_slice(kfd),
+				    PAGE_SIZE,
+				    AMDGPU_GEM_DOMAIN_DOORBELL,
+				    &qpd->proc_doorbells,
+				    NULL,
+				    NULL);
+	if (r) {
+		DRM_ERROR("Failed to allocate process doorbells\n");
+		bitmap_free(qpd->doorbell_bitmap);
+		return r;
+	}
 
-	return r;
+	return 0;
 }
 
-void kfd_free_process_doorbells(struct kfd_dev *kfd, unsigned int doorbell_index)
+void kfd_free_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
 {
-	if (doorbell_index) {
-		if (!kfd->shared_resources.enable_mes)
-			ida_simple_remove(&kfd->doorbell_ida, doorbell_index);
-		else
-			amdgpu_mes_free_process_doorbells(
-					(struct amdgpu_device *)kfd->adev,
-					doorbell_index);
-	}
+	struct qcm_process_device *qpd = &pdd->qpd;
+
+	bitmap_free(qpd->doorbell_bitmap);
+	amdgpu_bo_free_kernel(&qpd->proc_doorbells, NULL, NULL);
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0b199eb6dc88..dfff77379acb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -661,7 +661,10 @@ struct qcm_process_device {
 	uint64_t ib_base;
 	void *ib_kaddr;
 
-	/* doorbell resources per process per device */
+	/* doorbells for kfd process */
+	struct amdgpu_bo *proc_doorbells;
+
+	/* bitmap for dynamic doorbell allocation from the bo */
 	unsigned long *doorbell_bitmap;
 };
 
@@ -1009,9 +1012,9 @@ unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
 					unsigned int doorbell_id);
 phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd);
 int kfd_alloc_process_doorbells(struct kfd_dev *kfd,
-				unsigned int *doorbell_index);
+				 struct kfd_process_device *pdd);
 void kfd_free_process_doorbells(struct kfd_dev *kfd,
-				unsigned int doorbell_index);
+				 struct kfd_process_device *pdd);
 /* GTT Sub-Allocator */
 
 int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 51b1683ac5c1..217ff72a97b0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1037,10 +1037,9 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
-		bitmap_free(pdd->qpd.doorbell_bitmap);
 		idr_destroy(&pdd->alloc_idr);
 
-		kfd_free_process_doorbells(pdd->dev, pdd->doorbell_index);
+		kfd_free_process_doorbells(pdd->dev, pdd);
 
 		if (pdd->dev->shared_resources.enable_mes)
 			amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev,
@@ -1453,11 +1452,6 @@ static int init_doorbell_bitmap(struct qcm_process_device *qpd,
 	if (!KFD_IS_SOC15(dev))
 		return 0;
 
-	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
-					     GFP_KERNEL);
-	if (!qpd->doorbell_bitmap)
-		return -ENOMEM;
-
 	/* Mask out doorbells reserved for SDMA, IH, and VCN on SOC15. */
 	pr_debug("reserved doorbell 0x%03x - 0x%03x\n", range_start, range_end);
 	pr_debug("reserved doorbell 0x%03x - 0x%03x\n",
@@ -1499,9 +1493,15 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	if (!pdd)
 		return NULL;
 
+	retval = kfd_alloc_process_doorbells(dev, pdd);
+	if (retval) {
+		pr_err("failed to allocate process doorbells\n");
+		goto err_free_pdd;
+	}
+
 	if (init_doorbell_bitmap(&pdd->qpd, dev)) {
 		pr_err("Failed to init doorbell for process\n");
-		goto err_free_pdd;
+		goto err_free_db;
 	}
 
 	pdd->dev = dev;
@@ -1529,7 +1529,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 						false);
 		if (retval) {
 			pr_err("failed to allocate process context bo\n");
-			goto err_free_pdd;
+			goto err_free_db;
 		}
 		memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE);
 	}
@@ -1541,6 +1541,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 
 	return pdd;
 
+err_free_db:
+	kfd_free_process_doorbells(pdd->dev, pdd);
+
 err_free_pdd:
 	kfree(pdd);
 	return NULL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 5137476ec18e..6c95586f08a6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -344,17 +344,19 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 		goto err_create_queue;
 	}
 
-	if (q && p_doorbell_offset_in_process)
+	if (q && p_doorbell_offset_in_process) {
 		/* Return the doorbell offset within the doorbell page
 		 * to the caller so it can be passed up to user mode
 		 * (in bytes).
-		 * There are always 1024 doorbells per process, so in case
-		 * of 8-byte doorbells, there are two doorbell pages per
-		 * process.
+		 * relative doorbell index = Absolute doorbell index -
+		 * absolute index of first doorbell in the page.
 		 */
-		*p_doorbell_offset_in_process =
-			(q->properties.doorbell_off * sizeof(uint32_t)) &
-			(kfd_doorbell_process_slice(dev) - 1);
+		uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
+									pdd->qpd.proc_doorbells, 0);
+
+		*p_doorbell_offset_in_process = (q->properties.doorbell_off
+						- first_db_index) * sizeof(uint32_t);
+	}
 
 	pr_debug("PQM After DQM create queue\n");
 
@@ -858,12 +860,6 @@ int kfd_criu_restore_queue(struct kfd_process *p,
 		goto exit;
 	}
 
-	if (!pdd->doorbell_index &&
-	    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
-		ret = -ENOMEM;
-		goto exit;
-	}
-
 	/* data stored in this order: mqd, ctl_stack */
 	mqd = q_extra_data;
 	ctl_stack = mqd + q_data->mqd_size;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (8 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-06-09 19:34   ` Felix Kuehling
  2023-04-12 16:25 ` [PATCH v2 11/12] drm/amdgpu: use doorbell mgr for MES kernel doorbells Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells Shashank Sharma
  11 siblings, 1 reply; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch removes some variables and functions from KFD
doorbell handling code, which are no more required since
doorbell manager is handling doorbell calculations.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 32 -----------------------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     | 12 ---------
 2 files changed, 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 718cfe9cb4f5..f4088cfd52db 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -193,38 +193,6 @@ void write_kernel_doorbell64(void __iomem *db, u64 value)
 	}
 }
 
-unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
-					struct kfd_process_device *pdd,
-					unsigned int doorbell_id)
-{
-	/*
-	 * doorbell_base_dw_offset accounts for doorbells taken by KGD.
-	 * index * kfd_doorbell_process_slice/sizeof(u32) adjusts to
-	 * the process's doorbells. The offset returned is in dword
-	 * units regardless of the ASIC-dependent doorbell size.
-	 */
-	if (!kfd->shared_resources.enable_mes)
-		return kfd->doorbell_base_dw_offset +
-			pdd->doorbell_index
-			* kfd_doorbell_process_slice(kfd) / sizeof(u32) +
-			doorbell_id *
-			kfd->device_info.doorbell_size / sizeof(u32);
-	else
-		return amdgpu_mes_get_doorbell_dw_offset_in_bar(
-				(struct amdgpu_device *)kfd->adev,
-				pdd->doorbell_index, doorbell_id);
-}
-
-uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
-{
-	uint64_t num_of_elems = (kfd->shared_resources.doorbell_aperture_size -
-				kfd->shared_resources.doorbell_start_offset) /
-					kfd_doorbell_process_slice(kfd) + 1;
-
-	return num_of_elems;
-
-}
-
 phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
 {
 	struct amdgpu_device *adev = pdd->dev->adev;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index dfff77379acb..1bc6a8ed8cda 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -257,15 +257,6 @@ struct kfd_dev {
 
 	unsigned int id;		/* topology stub index */
 
-	phys_addr_t doorbell_base;	/* Start of actual doorbells used by
-					 * KFD. It is aligned for mapping
-					 * into user mode
-					 */
-	size_t doorbell_base_dw_offset;	/* Offset from the start of the PCI
-					 * doorbell BAR to the first KFD
-					 * doorbell in dwords. GFX reserves
-					 * the segment before this offset.
-					 */
 	u32 __iomem *doorbell_kernel_ptr; /* This is a pointer for a doorbells
 					   * page used by kernel queue
 					   */
@@ -276,8 +267,6 @@ struct kfd_dev {
 
 	const struct kfd2kgd_calls *kfd2kgd;
 	struct mutex doorbell_mutex;
-	DECLARE_BITMAP(doorbell_available_index,
-			KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
 
 	void *gtt_mem;
 	uint64_t gtt_start_gpu_addr;
@@ -754,7 +743,6 @@ struct kfd_process_device {
 	struct attribute attr_evict;
 
 	struct kobject *kobj_stats;
-	unsigned int doorbell_index;
 
 	/*
 	 * @cu_occupancy: Reports occupancy of Compute Units (CU) of a process
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 11/12] drm/amdgpu: use doorbell mgr for MES kernel doorbells
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (9 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-12 16:25 ` [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells Shashank Sharma
  11 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

This patch:
- Removes the existing doorbell management code, and its variables
  from the doorbell_init function, it will be done in doorbell
  manager now.
- uses the doorbell page created for MES kernel level needs (doorbells
  for MES self tests)
- current MES code was allocating MES doorbells in MES process context,
  but those were getting written using kernel doorbell calls. This patch
  instead allocates a MES kernel doorbell for this (in add_hw_queue).

V2: Create an extra page of doorbells for MES during kernel doorbell
    creation (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 94 ++++++++++---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  3 +
 2 files changed, 40 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 0c546245793b..cd3ee851f0a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -65,91 +65,70 @@ unsigned int amdgpu_mes_get_doorbell_dw_offset_in_bar(
 		doorbell_id * 2);
 }
 
-static int amdgpu_mes_queue_doorbell_get(struct amdgpu_device *adev,
+static int amdgpu_mes_kernel_doorbell_get(struct amdgpu_device *adev,
 					 struct amdgpu_mes_process *process,
 					 int ip_type, uint64_t *doorbell_index)
 {
 	unsigned int offset, found;
+	struct amdgpu_mes *mes = &adev->mes;
 
-	if (ip_type == AMDGPU_RING_TYPE_SDMA) {
+	if (ip_type == AMDGPU_RING_TYPE_SDMA)
 		offset = adev->doorbell_index.sdma_engine[0];
-		found = find_next_zero_bit(process->doorbell_bitmap,
-					   AMDGPU_MES_MAX_NUM_OF_QUEUES_PER_PROCESS,
-					   offset);
-	} else {
-		found = find_first_zero_bit(process->doorbell_bitmap,
-					    AMDGPU_MES_MAX_NUM_OF_QUEUES_PER_PROCESS);
-	}
+	else
+		offset = 0;
 
-	if (found >= AMDGPU_MES_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+	found = find_next_zero_bit(mes->doorbell_bitmap, mes->num_mes_dbs, offset);
+	if (found >= mes->num_mes_dbs) {
 		DRM_WARN("No doorbell available\n");
 		return -ENOSPC;
 	}
 
-	set_bit(found, process->doorbell_bitmap);
-
-	*doorbell_index = amdgpu_mes_get_doorbell_dw_offset_in_bar(adev,
-				process->doorbell_index, found);
+	set_bit(found, mes->doorbell_bitmap);
 
+	/* Get the absolute doorbell index on BAR */
+	*doorbell_index = mes->db_start_dw_offset + found * 2;
 	return 0;
 }
 
-static void amdgpu_mes_queue_doorbell_free(struct amdgpu_device *adev,
+static void amdgpu_mes_kernel_doorbell_free(struct amdgpu_device *adev,
 					   struct amdgpu_mes_process *process,
 					   uint32_t doorbell_index)
 {
-	unsigned int old, doorbell_id;
-
-	doorbell_id = doorbell_index -
-		(process->doorbell_index *
-		 amdgpu_mes_doorbell_process_slice(adev)) / sizeof(u32);
-	doorbell_id /= 2;
+	unsigned int old, rel_index;
+	struct amdgpu_mes *mes = &adev->mes;
 
-	old = test_and_clear_bit(doorbell_id, process->doorbell_bitmap);
+	/* Find the relative index of the doorbell in this object */
+	rel_index = (doorbell_index - mes->db_start_dw_offset) / 2;
+	old = test_and_clear_bit(rel_index, mes->doorbell_bitmap);
 	WARN_ON(!old);
 }
 
 static int amdgpu_mes_doorbell_init(struct amdgpu_device *adev)
 {
-	size_t doorbell_start_offset;
-	size_t doorbell_aperture_size;
-	size_t doorbell_process_limit;
-	size_t aggregated_doorbell_start;
 	int i;
+	struct amdgpu_mes *mes = &adev->mes;
 
-	aggregated_doorbell_start = (adev->doorbell_index.max_assignment + 1) * sizeof(u32);
-	aggregated_doorbell_start =
-		roundup(aggregated_doorbell_start, PAGE_SIZE);
-
-	doorbell_start_offset = aggregated_doorbell_start + PAGE_SIZE;
-	doorbell_start_offset =
-		roundup(doorbell_start_offset,
-			amdgpu_mes_doorbell_process_slice(adev));
-
-	doorbell_aperture_size = adev->doorbell.size;
-	doorbell_aperture_size =
-			rounddown(doorbell_aperture_size,
-				  amdgpu_mes_doorbell_process_slice(adev));
-
-	if (doorbell_aperture_size > doorbell_start_offset)
-		doorbell_process_limit =
-			(doorbell_aperture_size - doorbell_start_offset) /
-			amdgpu_mes_doorbell_process_slice(adev);
-	else
-		return -ENOSPC;
-
-	adev->mes.doorbell_id_offset = doorbell_start_offset / sizeof(u32);
-	adev->mes.max_doorbell_slices = doorbell_process_limit;
+	/* Bitmap for dynamic allocation of kernel doorbells */
+	mes->doorbell_bitmap = bitmap_zalloc(PAGE_SIZE / sizeof(u32), GFP_KERNEL);
+	if (!mes->doorbell_bitmap) {
+		DRM_ERROR("Failed to allocate MES doorbell bitmap\n");
+		return -ENOMEM;
+	}
 
-	/* allocate Qword range for aggregated doorbell */
-	for (i = 0; i < AMDGPU_MES_PRIORITY_NUM_LEVELS; i++)
-		adev->mes.aggregated_doorbells[i] =
-			aggregated_doorbell_start / sizeof(u32) + i * 2;
+	mes->num_mes_dbs = PAGE_SIZE / AMDGPU_ONE_DOORBELL_SIZE;
+	for (i = 0; i < AMDGPU_MES_PRIORITY_NUM_LEVELS; i++) {
+		adev->mes.aggregated_doorbells[i] = mes->db_start_dw_offset + i * 2;
+		set_bit(i, mes->doorbell_bitmap);
+	}
 
-	DRM_INFO("max_doorbell_slices=%zu\n", doorbell_process_limit);
 	return 0;
 }
 
+static void amdgpu_mes_doorbell_free(struct amdgpu_device *adev)
+{
+	bitmap_free(adev->mes.doorbell_bitmap);
+}
+
 int amdgpu_mes_init(struct amdgpu_device *adev)
 {
 	int i, r;
@@ -248,6 +227,7 @@ void amdgpu_mes_fini(struct amdgpu_device *adev)
 	amdgpu_device_wb_free(adev, adev->mes.sch_ctx_offs);
 	amdgpu_device_wb_free(adev, adev->mes.query_status_fence_offs);
 	amdgpu_device_wb_free(adev, adev->mes.read_val_offs);
+	amdgpu_mes_doorbell_free(adev);
 
 	idr_destroy(&adev->mes.pasid_idr);
 	idr_destroy(&adev->mes.gang_id_idr);
@@ -677,7 +657,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int gang_id,
 	*queue_id = queue->queue_id = r;
 
 	/* allocate a doorbell index for the queue */
-	r = amdgpu_mes_queue_doorbell_get(adev, gang->process,
+	r = amdgpu_mes_kernel_doorbell_get(adev, gang->process,
 					  qprops->queue_type,
 					  &qprops->doorbell_off);
 	if (r)
@@ -735,7 +715,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int gang_id,
 	return 0;
 
 clean_up_doorbell:
-	amdgpu_mes_queue_doorbell_free(adev, gang->process,
+	amdgpu_mes_kernel_doorbell_free(adev, gang->process,
 				       qprops->doorbell_off);
 clean_up_queue_id:
 	spin_lock_irqsave(&adev->mes.queue_id_lock, flags);
@@ -790,7 +770,7 @@ int amdgpu_mes_remove_hw_queue(struct amdgpu_device *adev, int queue_id)
 			  queue_id);
 
 	list_del(&queue->list);
-	amdgpu_mes_queue_doorbell_free(adev, gang->process,
+	amdgpu_mes_kernel_doorbell_free(adev, gang->process,
 				       queue->doorbell_off);
 	amdgpu_mes_unlock(&adev->mes);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 8880f3e3702e..b04225585757 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -27,6 +27,7 @@
 #include "amdgpu_irq.h"
 #include "kgd_kfd_interface.h"
 #include "amdgpu_gfx.h"
+#include "amdgpu_doorbell.h"
 #include <linux/sched/mm.h>
 
 #define AMDGPU_MES_MAX_COMPUTE_PIPES        8
@@ -130,6 +131,8 @@ struct amdgpu_mes {
 
 	/* MES doorbells */
 	uint32_t			db_start_dw_offset;
+	uint32_t			num_mes_dbs;
+	unsigned long			*doorbell_bitmap;
 
 	/* ip specific functions */
 	const struct amdgpu_mes_funcs   *funcs;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells
  2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
                   ` (10 preceding siblings ...)
  2023-04-12 16:25 ` [PATCH v2 11/12] drm/amdgpu: use doorbell mgr for MES kernel doorbells Shashank Sharma
@ 2023-04-12 16:25 ` Shashank Sharma
  2023-04-13 11:19   ` Christian König
  11 siblings, 1 reply; 28+ messages in thread
From: Shashank Sharma @ 2023-04-12 16:25 UTC (permalink / raw)
  To: amd-gfx
  Cc: mukul.joshi, Shashank Sharma, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

MES allocates process level doorbells, but there is no userspace
client to consume it. It was only being used for the MES ring
tests (in kernel), and was written by kernel doorbell write.

The previous patch of this series has changed the MES ring test code to
use kernel level MES doorbells. This patch now cleans up the process level
doorbell allocation code which is not required.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 54 +------------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 10 -----
 2 files changed, 1 insertion(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index cd3ee851f0a4..2180e8e2c82e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -36,35 +36,6 @@ int amdgpu_mes_doorbell_process_slice(struct amdgpu_device *adev)
 		       PAGE_SIZE);
 }
 
-int amdgpu_mes_alloc_process_doorbells(struct amdgpu_device *adev,
-				      unsigned int *doorbell_index)
-{
-	int r = ida_simple_get(&adev->mes.doorbell_ida, 2,
-			       adev->mes.max_doorbell_slices,
-			       GFP_KERNEL);
-	if (r > 0)
-		*doorbell_index = r;
-
-	return r;
-}
-
-void amdgpu_mes_free_process_doorbells(struct amdgpu_device *adev,
-				      unsigned int doorbell_index)
-{
-	if (doorbell_index)
-		ida_simple_remove(&adev->mes.doorbell_ida, doorbell_index);
-}
-
-unsigned int amdgpu_mes_get_doorbell_dw_offset_in_bar(
-					struct amdgpu_device *adev,
-					uint32_t doorbell_index,
-					unsigned int doorbell_id)
-{
-	return ((doorbell_index *
-		amdgpu_mes_doorbell_process_slice(adev)) / sizeof(u32) +
-		doorbell_id * 2);
-}
-
 static int amdgpu_mes_kernel_doorbell_get(struct amdgpu_device *adev,
 					 struct amdgpu_mes_process *process,
 					 int ip_type, uint64_t *doorbell_index)
@@ -256,15 +227,6 @@ int amdgpu_mes_create_process(struct amdgpu_device *adev, int pasid,
 		return -ENOMEM;
 	}
 
-	process->doorbell_bitmap =
-		kzalloc(DIV_ROUND_UP(AMDGPU_MES_MAX_NUM_OF_QUEUES_PER_PROCESS,
-				     BITS_PER_BYTE), GFP_KERNEL);
-	if (!process->doorbell_bitmap) {
-		DRM_ERROR("failed to allocate doorbell bitmap\n");
-		kfree(process);
-		return -ENOMEM;
-	}
-
 	/* allocate the process context bo and map it */
 	r = amdgpu_bo_create_kernel(adev, AMDGPU_MES_PROC_CTX_SIZE, PAGE_SIZE,
 				    AMDGPU_GEM_DOMAIN_GTT,
@@ -291,15 +253,6 @@ int amdgpu_mes_create_process(struct amdgpu_device *adev, int pasid,
 		goto clean_up_ctx;
 	}
 
-	/* allocate the starting doorbell index of the process */
-	r = amdgpu_mes_alloc_process_doorbells(adev, &process->doorbell_index);
-	if (r < 0) {
-		DRM_ERROR("failed to allocate doorbell for process\n");
-		goto clean_up_pasid;
-	}
-
-	DRM_DEBUG("process doorbell index = %d\n", process->doorbell_index);
-
 	INIT_LIST_HEAD(&process->gang_list);
 	process->vm = vm;
 	process->pasid = pasid;
@@ -309,15 +262,12 @@ int amdgpu_mes_create_process(struct amdgpu_device *adev, int pasid,
 	amdgpu_mes_unlock(&adev->mes);
 	return 0;
 
-clean_up_pasid:
-	idr_remove(&adev->mes.pasid_idr, pasid);
-	amdgpu_mes_unlock(&adev->mes);
 clean_up_ctx:
+	amdgpu_mes_unlock(&adev->mes);
 	amdgpu_bo_free_kernel(&process->proc_ctx_bo,
 			      &process->proc_ctx_gpu_addr,
 			      &process->proc_ctx_cpu_ptr);
 clean_up_memory:
-	kfree(process->doorbell_bitmap);
 	kfree(process);
 	return r;
 }
@@ -363,7 +313,6 @@ void amdgpu_mes_destroy_process(struct amdgpu_device *adev, int pasid)
 		idr_remove(&adev->mes.gang_id_idr, gang->gang_id);
 	}
 
-	amdgpu_mes_free_process_doorbells(adev, process->doorbell_index);
 	idr_remove(&adev->mes.pasid_idr, pasid);
 	amdgpu_mes_unlock(&adev->mes);
 
@@ -385,7 +334,6 @@ void amdgpu_mes_destroy_process(struct amdgpu_device *adev, int pasid)
 	amdgpu_bo_free_kernel(&process->proc_ctx_bo,
 			      &process->proc_ctx_gpu_addr,
 			      &process->proc_ctx_cpu_ptr);
-	kfree(process->doorbell_bitmap);
 	kfree(process);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index b04225585757..f96010dbd12a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -77,7 +77,6 @@ struct amdgpu_mes {
 	uint32_t			kiq_version;
 
 	uint32_t                        total_max_queue;
-	uint32_t                        doorbell_id_offset;
 	uint32_t                        max_doorbell_slices;
 
 	uint64_t                        default_process_quantum;
@@ -148,7 +147,6 @@ struct amdgpu_mes_process {
 	uint64_t 		process_quantum;
 	struct 			list_head gang_list;
 	uint32_t 		doorbell_index;
-	unsigned long 		*doorbell_bitmap;
 	struct mutex		doorbell_lock;
 };
 
@@ -367,14 +365,6 @@ int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device *adev,
 
 int amdgpu_mes_self_test(struct amdgpu_device *adev);
 
-int amdgpu_mes_alloc_process_doorbells(struct amdgpu_device *adev,
-					unsigned int *doorbell_index);
-void amdgpu_mes_free_process_doorbells(struct amdgpu_device *adev,
-					unsigned int doorbell_index);
-unsigned int amdgpu_mes_get_doorbell_dw_offset_in_bar(
-					struct amdgpu_device *adev,
-					uint32_t doorbell_index,
-					unsigned int doorbell_id);
 int amdgpu_mes_doorbell_process_slice(struct amdgpu_device *adev);
 
 /*
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 01/12] drm/amdgpu: create a new file for doorbell manager
  2023-04-12 16:25 ` [PATCH v2 01/12] drm/amdgpu: create a new file for " Shashank Sharma
@ 2023-04-13 10:48   ` Christian König
  2023-04-13 10:51     ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2023-04-13 10:48 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma, arvind.yadav

Am 12.04.23 um 18:25 schrieb Shashank Sharma:
> This patch:
> - creates a new file for doorbell management.
> - moves doorbell code from amdgpu_device.c to this file.
>
> V2:
>   - remove doc from function declaration (Christian)
>   - remove 'device' from function names to make it consistent (Alex)
>   - add SPDX license identifier (Luben)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 174 +----------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |   6 +
>   .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 183 ++++++++++++++++++
>   4 files changed, 195 insertions(+), 170 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 798d0e9a60b7..204665f20319 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -41,7 +41,7 @@ ccflags-y := -I$(FULL_AMD_PATH)/include/asic_reg \
>   amdgpu-y := amdgpu_drv.o
>   
>   # add KMS driver
> -amdgpu-y += amdgpu_device.o amdgpu_kms.o \
> +amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
>   	amdgpu_atombios.o atombios_crtc.o amdgpu_connectors.o \
>   	atom.o amdgpu_fence.o amdgpu_ttm.o amdgpu_object.o amdgpu_gart.o \
>   	amdgpu_encoders.o amdgpu_display.o amdgpu_i2c.o \
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 57ee1c4a81e9..88c75f7a8d66 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -579,94 +579,6 @@ void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
>   	}
>   }
>   
> -/**
> - * amdgpu_mm_rdoorbell - read a doorbell dword
> - *
> - * @adev: amdgpu_device pointer
> - * @index: doorbell index
> - *
> - * Returns the value in the doorbell aperture at the
> - * requested doorbell index (CIK).
> - */
> -u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
> -{
> -	if (amdgpu_device_skip_hw_access(adev))
> -		return 0;
> -
> -	if (index < adev->doorbell.num_kernel_doorbells) {
> -		return readl(adev->doorbell.ptr + index);
> -	} else {
> -		DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
> -		return 0;
> -	}
> -}
> -
> -/**
> - * amdgpu_mm_wdoorbell - write a doorbell dword
> - *
> - * @adev: amdgpu_device pointer
> - * @index: doorbell index
> - * @v: value to write
> - *
> - * Writes @v to the doorbell aperture at the
> - * requested doorbell index (CIK).
> - */
> -void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
> -{
> -	if (amdgpu_device_skip_hw_access(adev))
> -		return;
> -
> -	if (index < adev->doorbell.num_kernel_doorbells) {
> -		writel(v, adev->doorbell.ptr + index);
> -	} else {
> -		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
> -	}
> -}
> -
> -/**
> - * amdgpu_mm_rdoorbell64 - read a doorbell Qword
> - *
> - * @adev: amdgpu_device pointer
> - * @index: doorbell index
> - *
> - * Returns the value in the doorbell aperture at the
> - * requested doorbell index (VEGA10+).
> - */
> -u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
> -{
> -	if (amdgpu_device_skip_hw_access(adev))
> -		return 0;
> -
> -	if (index < adev->doorbell.num_kernel_doorbells) {
> -		return atomic64_read((atomic64_t *)(adev->doorbell.ptr + index));
> -	} else {
> -		DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
> -		return 0;
> -	}
> -}
> -
> -/**
> - * amdgpu_mm_wdoorbell64 - write a doorbell Qword
> - *
> - * @adev: amdgpu_device pointer
> - * @index: doorbell index
> - * @v: value to write
> - *
> - * Writes @v to the doorbell aperture at the
> - * requested doorbell index (VEGA10+).
> - */
> -void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
> -{
> -	if (amdgpu_device_skip_hw_access(adev))
> -		return;
> -
> -	if (index < adev->doorbell.num_kernel_doorbells) {
> -		atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
> -	} else {
> -		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
> -	}
> -}
> -
>   /**
>    * amdgpu_device_indirect_rreg - read an indirect register
>    *
> @@ -1016,82 +928,6 @@ int amdgpu_device_pci_reset(struct amdgpu_device *adev)
>   	return pci_reset_function(adev->pdev);
>   }
>   
> -/*
> - * GPU doorbell aperture helpers function.
> - */
> -/**
> - * amdgpu_device_doorbell_init - Init doorbell driver information.
> - *
> - * @adev: amdgpu_device pointer
> - *
> - * Init doorbell driver information (CIK)
> - * Returns 0 on success, error on failure.
> - */
> -static int amdgpu_device_doorbell_init(struct amdgpu_device *adev)
> -{
> -
> -	/* No doorbell on SI hardware generation */
> -	if (adev->asic_type < CHIP_BONAIRE) {
> -		adev->doorbell.base = 0;
> -		adev->doorbell.size = 0;
> -		adev->doorbell.num_kernel_doorbells = 0;
> -		adev->doorbell.ptr = NULL;
> -		return 0;
> -	}
> -
> -	if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
> -		return -EINVAL;
> -
> -	amdgpu_asic_init_doorbell_index(adev);
> -
> -	/* doorbell bar mapping */
> -	adev->doorbell.base = pci_resource_start(adev->pdev, 2);
> -	adev->doorbell.size = pci_resource_len(adev->pdev, 2);
> -
> -	if (adev->enable_mes) {
> -		adev->doorbell.num_kernel_doorbells =
> -			adev->doorbell.size / sizeof(u32);
> -	} else {
> -		adev->doorbell.num_kernel_doorbells =
> -			min_t(u32, adev->doorbell.size / sizeof(u32),
> -			      adev->doorbell_index.max_assignment+1);
> -		if (adev->doorbell.num_kernel_doorbells == 0)
> -			return -EINVAL;
> -
> -		/* For Vega, reserve and map two pages on doorbell BAR since SDMA
> -		 * paging queue doorbell use the second page. The
> -		 * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
> -		 * doorbells are in the first page. So with paging queue enabled,
> -		 * the max num_kernel_doorbells should + 1 page (0x400 in dword)
> -		 */
> -		if (adev->asic_type >= CHIP_VEGA10)
> -			adev->doorbell.num_kernel_doorbells += 0x400;
> -	}
> -
> -	adev->doorbell.ptr = ioremap(adev->doorbell.base,
> -				     adev->doorbell.num_kernel_doorbells *
> -				     sizeof(u32));
> -	if (adev->doorbell.ptr == NULL)
> -		return -ENOMEM;
> -
> -	return 0;
> -}
> -
> -/**
> - * amdgpu_device_doorbell_fini - Tear down doorbell driver information.
> - *
> - * @adev: amdgpu_device pointer
> - *
> - * Tear down doorbell driver information (CIK)
> - */
> -static void amdgpu_device_doorbell_fini(struct amdgpu_device *adev)
> -{
> -	iounmap(adev->doorbell.ptr);
> -	adev->doorbell.ptr = NULL;
> -}
> -
> -
> -
>   /*
>    * amdgpu_device_wb_*()
>    * Writeback is the method by which the GPU updates special pages in memory
> @@ -1239,7 +1075,7 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>   			      cmd & ~PCI_COMMAND_MEMORY);
>   
>   	/* Free the VRAM and doorbell BAR, we most likely need to move both. */
> -	amdgpu_device_doorbell_fini(adev);
> +	amdgpu_doorbell_fini(adev);
>   	if (adev->asic_type >= CHIP_BONAIRE)
>   		pci_release_resource(adev->pdev, 2);
>   
> @@ -1256,7 +1092,7 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>   	/* When the doorbell or fb BAR isn't available we have no chance of
>   	 * using the device.
>   	 */
> -	r = amdgpu_device_doorbell_init(adev);
> +	r = amdgpu_doorbell_init(adev);
>   	if (r || (pci_resource_flags(adev->pdev, 0) & IORESOURCE_UNSET))
>   		return -ENODEV;
>   
> @@ -3712,7 +3548,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   		dev_info(adev->dev, "PCIE atomic ops is not supported\n");
>   
>   	/* doorbell bar mapping and doorbell index init*/
> -	amdgpu_device_doorbell_init(adev);
> +	amdgpu_doorbell_init(adev);
>   
>   	if (amdgpu_emu_mode == 1) {
>   		/* post the asic on emulation mode */
> @@ -3942,7 +3778,7 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
>   	unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1);
>   
>   	/* Unmap all mapped bars - Doorbell, registers and VRAM */
> -	amdgpu_device_doorbell_fini(adev);
> +	amdgpu_doorbell_fini(adev);
>   
>   	iounmap(adev->rmmio);
>   	adev->rmmio = NULL;
> @@ -4051,7 +3887,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>   
>   		iounmap(adev->rmmio);
>   		adev->rmmio = NULL;
> -		amdgpu_device_doorbell_fini(adev);
> +		amdgpu_doorbell_fini(adev);
>   		drm_dev_exit(idx);
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
> index 908433ac6cb3..3e77a6b4bd69 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
> @@ -306,6 +306,12 @@ void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v);
>   u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index);
>   void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v);
>   
> +/*
> + * GPU doorbell aperture helpers function.
> + */
> +int amdgpu_doorbell_init(struct amdgpu_device *adev);
> +void amdgpu_doorbell_fini(struct amdgpu_device *adev);
> +
>   #define RDOORBELL32(index) amdgpu_mm_rdoorbell(adev, (index))
>   #define WDOORBELL32(index, v) amdgpu_mm_wdoorbell(adev, (index), (v))
>   #define RDOORBELL64(index) amdgpu_mm_rdoorbell64(adev, (index))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> new file mode 100644
> index 000000000000..64fec954815d
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> @@ -0,0 +1,183 @@
> +// SPDX-License-Identifier: GPL-2.0+

That should probably be MIT. Except if Alex thinks otherwise.

> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu.h"
> +
> +/**
> + * amdgpu_mm_rdoorbell - read a doorbell dword
> + *
> + * @adev: amdgpu_device pointer
> + * @index: doorbell index
> + *
> + * Returns the value in the doorbell aperture at the
> + * requested doorbell index (CIK).
> + */
> +u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
> +{
> +	if (amdgpu_device_skip_hw_access(adev))
> +		return 0;
> +
> +	if (index < adev->doorbell.num_kernel_doorbells)
> +		return readl(adev->doorbell.ptr + index);
> +
> +	DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
> +	return 0;
> +}
> +
> +/**
> + * amdgpu_mm_wdoorbell - write a doorbell dword
> + *
> + * @adev: amdgpu_device pointer
> + * @index: doorbell index
> + * @v: value to write
> + *
> + * Writes @v to the doorbell aperture at the
> + * requested doorbell index (CIK).
> + */
> +void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
> +{
> +	if (amdgpu_device_skip_hw_access(adev))
> +		return;
> +
> +	if (index < adev->doorbell.num_kernel_doorbells)
> +		writel(v, adev->doorbell.ptr + index);
> +	else
> +		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
> +}
> +
> +/**
> + * amdgpu_mm_rdoorbell64 - read a doorbell Qword
> + *
> + * @adev: amdgpu_device pointer
> + * @index: doorbell index
> + *
> + * Returns the value in the doorbell aperture at the
> + * requested doorbell index (VEGA10+).
> + */
> +u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
> +{
> +	if (amdgpu_device_skip_hw_access(adev))
> +		return 0;
> +
> +	if (index < adev->doorbell.num_kernel_doorbells)
> +		return atomic64_read((atomic64_t *)(adev->doorbell.ptr + index));
> +
> +	DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
> +	return 0;
> +}
> +
> +/**
> + * amdgpu_mm_wdoorbell64 - write a doorbell Qword
> + *
> + * @adev: amdgpu_device pointer
> + * @index: doorbell index
> + * @v: value to write
> + *
> + * Writes @v to the doorbell aperture at the
> + * requested doorbell index (VEGA10+).
> + */
> +void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
> +{
> +	if (amdgpu_device_skip_hw_access(adev))
> +		return;
> +
> +	if (index < adev->doorbell.num_kernel_doorbells)
> +		atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
> +	else
> +		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
> +}
> +
> +/*
> + * GPU doorbell aperture helpers function.
> + */
> +/**
> + * amdgpu_doorbell_init - Init doorbell driver information.
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Init doorbell driver information (CIK)
> + * Returns 0 on success, error on failure.
> + */
> +int amdgpu_doorbell_init(struct amdgpu_device *adev)
> +{
> +
> +	/* No doorbell on SI hardware generation */
> +	if (adev->asic_type < CHIP_BONAIRE) {
> +		adev->doorbell.base = 0;
> +		adev->doorbell.size = 0;
> +		adev->doorbell.num_kernel_doorbells = 0;
> +		adev->doorbell.ptr = NULL;
> +		return 0;
> +	}
> +
> +	if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
> +		return -EINVAL;
> +
> +	amdgpu_asic_init_doorbell_index(adev);
> +
> +	/* doorbell bar mapping */
> +	adev->doorbell.base = pci_resource_start(adev->pdev, 2);
> +	adev->doorbell.size = pci_resource_len(adev->pdev, 2);
> +
> +	if (adev->enable_mes) {
> +		adev->doorbell.num_kernel_doorbells =
> +			adev->doorbell.size / sizeof(u32);
> +	} else {
> +		adev->doorbell.num_kernel_doorbells =
> +			min_t(u32, adev->doorbell.size / sizeof(u32),
> +			      adev->doorbell_index.max_assignment+1);
> +		if (adev->doorbell.num_kernel_doorbells == 0)
> +			return -EINVAL;
> +
> +		/* For Vega, reserve and map two pages on doorbell BAR since SDMA
> +		 * paging queue doorbell use the second page. The
> +		 * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
> +		 * doorbells are in the first page. So with paging queue enabled,
> +		 * the max num_kernel_doorbells should + 1 page (0x400 in dword)
> +		 */
> +		if (adev->asic_type >= CHIP_VEGA10)
> +			adev->doorbell.num_kernel_doorbells += 0x400;
> +	}
> +
> +	adev->doorbell.ptr = ioremap(adev->doorbell.base,
> +				     adev->doorbell.num_kernel_doorbells *
> +				     sizeof(u32));

It might be a good idea to drop this. Or do we really have any use case 
where we need to access doorbells outside of the kernel reserved ones? 
(If yes, forget what I've said this looks good then).

Christian.

> +	if (adev->doorbell.ptr == NULL)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +/**
> + * amdgpu_doorbell_fini - Tear down doorbell driver information.
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Tear down doorbell driver information (CIK)
> + */
> +void amdgpu_doorbell_fini(struct amdgpu_device *adev)
> +{
> +	iounmap(adev->doorbell.ptr);
> +	adev->doorbell.ptr = NULL;
> +}


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 01/12] drm/amdgpu: create a new file for doorbell manager
  2023-04-13 10:48   ` Christian König
@ 2023-04-13 10:51     ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2023-04-13 10:51 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma, arvind.yadav

Am 13.04.23 um 12:48 schrieb Christian König:
> Am 12.04.23 um 18:25 schrieb Shashank Sharma:
>> This patch:
>> - creates a new file for doorbell management.
>> - moves doorbell code from amdgpu_device.c to this file.
>>
>> V2:
>>   - remove doc from function declaration (Christian)
>>   - remove 'device' from function names to make it consistent (Alex)
>>   - add SPDX license identifier (Luben)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 174 +----------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |   6 +
>>   .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 183 ++++++++++++++++++
>>   4 files changed, 195 insertions(+), 170 deletions(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 798d0e9a60b7..204665f20319 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -41,7 +41,7 @@ ccflags-y := -I$(FULL_AMD_PATH)/include/asic_reg \
>>   amdgpu-y := amdgpu_drv.o
>>     # add KMS driver
>> -amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>> +amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
>>       amdgpu_atombios.o atombios_crtc.o amdgpu_connectors.o \
>>       atom.o amdgpu_fence.o amdgpu_ttm.o amdgpu_object.o amdgpu_gart.o \
>>       amdgpu_encoders.o amdgpu_display.o amdgpu_i2c.o \
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 57ee1c4a81e9..88c75f7a8d66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -579,94 +579,6 @@ void amdgpu_mm_wreg_mmio_rlc(struct 
>> amdgpu_device *adev,
>>       }
>>   }
>>   -/**
>> - * amdgpu_mm_rdoorbell - read a doorbell dword
>> - *
>> - * @adev: amdgpu_device pointer
>> - * @index: doorbell index
>> - *
>> - * Returns the value in the doorbell aperture at the
>> - * requested doorbell index (CIK).
>> - */
>> -u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
>> -{
>> -    if (amdgpu_device_skip_hw_access(adev))
>> -        return 0;
>> -
>> -    if (index < adev->doorbell.num_kernel_doorbells) {
>> -        return readl(adev->doorbell.ptr + index);
>> -    } else {
>> -        DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", 
>> index);
>> -        return 0;
>> -    }
>> -}
>> -
>> -/**
>> - * amdgpu_mm_wdoorbell - write a doorbell dword
>> - *
>> - * @adev: amdgpu_device pointer
>> - * @index: doorbell index
>> - * @v: value to write
>> - *
>> - * Writes @v to the doorbell aperture at the
>> - * requested doorbell index (CIK).
>> - */
>> -void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
>> -{
>> -    if (amdgpu_device_skip_hw_access(adev))
>> -        return;
>> -
>> -    if (index < adev->doorbell.num_kernel_doorbells) {
>> -        writel(v, adev->doorbell.ptr + index);
>> -    } else {
>> -        DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", 
>> index);
>> -    }
>> -}
>> -
>> -/**
>> - * amdgpu_mm_rdoorbell64 - read a doorbell Qword
>> - *
>> - * @adev: amdgpu_device pointer
>> - * @index: doorbell index
>> - *
>> - * Returns the value in the doorbell aperture at the
>> - * requested doorbell index (VEGA10+).
>> - */
>> -u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
>> -{
>> -    if (amdgpu_device_skip_hw_access(adev))
>> -        return 0;
>> -
>> -    if (index < adev->doorbell.num_kernel_doorbells) {
>> -        return atomic64_read((atomic64_t *)(adev->doorbell.ptr + 
>> index));
>> -    } else {
>> -        DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", 
>> index);
>> -        return 0;
>> -    }
>> -}
>> -
>> -/**
>> - * amdgpu_mm_wdoorbell64 - write a doorbell Qword
>> - *
>> - * @adev: amdgpu_device pointer
>> - * @index: doorbell index
>> - * @v: value to write
>> - *
>> - * Writes @v to the doorbell aperture at the
>> - * requested doorbell index (VEGA10+).
>> - */
>> -void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, 
>> u64 v)
>> -{
>> -    if (amdgpu_device_skip_hw_access(adev))
>> -        return;
>> -
>> -    if (index < adev->doorbell.num_kernel_doorbells) {
>> -        atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
>> -    } else {
>> -        DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", 
>> index);
>> -    }
>> -}
>> -
>>   /**
>>    * amdgpu_device_indirect_rreg - read an indirect register
>>    *
>> @@ -1016,82 +928,6 @@ int amdgpu_device_pci_reset(struct 
>> amdgpu_device *adev)
>>       return pci_reset_function(adev->pdev);
>>   }
>>   -/*
>> - * GPU doorbell aperture helpers function.
>> - */
>> -/**
>> - * amdgpu_device_doorbell_init - Init doorbell driver information.
>> - *
>> - * @adev: amdgpu_device pointer
>> - *
>> - * Init doorbell driver information (CIK)
>> - * Returns 0 on success, error on failure.
>> - */
>> -static int amdgpu_device_doorbell_init(struct amdgpu_device *adev)
>> -{
>> -
>> -    /* No doorbell on SI hardware generation */
>> -    if (adev->asic_type < CHIP_BONAIRE) {
>> -        adev->doorbell.base = 0;
>> -        adev->doorbell.size = 0;
>> -        adev->doorbell.num_kernel_doorbells = 0;
>> -        adev->doorbell.ptr = NULL;
>> -        return 0;
>> -    }
>> -
>> -    if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
>> -        return -EINVAL;
>> -
>> -    amdgpu_asic_init_doorbell_index(adev);
>> -
>> -    /* doorbell bar mapping */
>> -    adev->doorbell.base = pci_resource_start(adev->pdev, 2);
>> -    adev->doorbell.size = pci_resource_len(adev->pdev, 2);
>> -
>> -    if (adev->enable_mes) {
>> -        adev->doorbell.num_kernel_doorbells =
>> -            adev->doorbell.size / sizeof(u32);
>> -    } else {
>> -        adev->doorbell.num_kernel_doorbells =
>> -            min_t(u32, adev->doorbell.size / sizeof(u32),
>> -                  adev->doorbell_index.max_assignment+1);
>> -        if (adev->doorbell.num_kernel_doorbells == 0)
>> -            return -EINVAL;
>> -
>> -        /* For Vega, reserve and map two pages on doorbell BAR since 
>> SDMA
>> -         * paging queue doorbell use the second page. The
>> -         * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
>> -         * doorbells are in the first page. So with paging queue 
>> enabled,
>> -         * the max num_kernel_doorbells should + 1 page (0x400 in 
>> dword)
>> -         */
>> -        if (adev->asic_type >= CHIP_VEGA10)
>> -            adev->doorbell.num_kernel_doorbells += 0x400;
>> -    }
>> -
>> -    adev->doorbell.ptr = ioremap(adev->doorbell.base,
>> -                     adev->doorbell.num_kernel_doorbells *
>> -                     sizeof(u32));
>> -    if (adev->doorbell.ptr == NULL)
>> -        return -ENOMEM;
>> -
>> -    return 0;
>> -}
>> -
>> -/**
>> - * amdgpu_device_doorbell_fini - Tear down doorbell driver information.
>> - *
>> - * @adev: amdgpu_device pointer
>> - *
>> - * Tear down doorbell driver information (CIK)
>> - */
>> -static void amdgpu_device_doorbell_fini(struct amdgpu_device *adev)
>> -{
>> -    iounmap(adev->doorbell.ptr);
>> -    adev->doorbell.ptr = NULL;
>> -}
>> -
>> -
>> -
>>   /*
>>    * amdgpu_device_wb_*()
>>    * Writeback is the method by which the GPU updates special pages 
>> in memory
>> @@ -1239,7 +1075,7 @@ int amdgpu_device_resize_fb_bar(struct 
>> amdgpu_device *adev)
>>                     cmd & ~PCI_COMMAND_MEMORY);
>>         /* Free the VRAM and doorbell BAR, we most likely need to 
>> move both. */
>> -    amdgpu_device_doorbell_fini(adev);
>> +    amdgpu_doorbell_fini(adev);
>>       if (adev->asic_type >= CHIP_BONAIRE)
>>           pci_release_resource(adev->pdev, 2);
>>   @@ -1256,7 +1092,7 @@ int amdgpu_device_resize_fb_bar(struct 
>> amdgpu_device *adev)
>>       /* When the doorbell or fb BAR isn't available we have no 
>> chance of
>>        * using the device.
>>        */
>> -    r = amdgpu_device_doorbell_init(adev);
>> +    r = amdgpu_doorbell_init(adev);
>>       if (r || (pci_resource_flags(adev->pdev, 0) & IORESOURCE_UNSET))
>>           return -ENODEV;
>>   @@ -3712,7 +3548,7 @@ int amdgpu_device_init(struct amdgpu_device 
>> *adev,
>>           dev_info(adev->dev, "PCIE atomic ops is not supported\n");
>>         /* doorbell bar mapping and doorbell index init*/
>> -    amdgpu_device_doorbell_init(adev);
>> +    amdgpu_doorbell_init(adev);
>>         if (amdgpu_emu_mode == 1) {
>>           /* post the asic on emulation mode */
>> @@ -3942,7 +3778,7 @@ static void amdgpu_device_unmap_mmio(struct 
>> amdgpu_device *adev)
>> unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1);
>>         /* Unmap all mapped bars - Doorbell, registers and VRAM */
>> -    amdgpu_device_doorbell_fini(adev);
>> +    amdgpu_doorbell_fini(adev);
>>         iounmap(adev->rmmio);
>>       adev->rmmio = NULL;
>> @@ -4051,7 +3887,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device 
>> *adev)
>>             iounmap(adev->rmmio);
>>           adev->rmmio = NULL;
>> -        amdgpu_device_doorbell_fini(adev);
>> +        amdgpu_doorbell_fini(adev);
>>           drm_dev_exit(idx);
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
>> index 908433ac6cb3..3e77a6b4bd69 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
>> @@ -306,6 +306,12 @@ void amdgpu_mm_wdoorbell(struct amdgpu_device 
>> *adev, u32 index, u32 v);
>>   u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index);
>>   void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, 
>> u64 v);
>>   +/*
>> + * GPU doorbell aperture helpers function.
>> + */
>> +int amdgpu_doorbell_init(struct amdgpu_device *adev);
>> +void amdgpu_doorbell_fini(struct amdgpu_device *adev);
>> +
>>   #define RDOORBELL32(index) amdgpu_mm_rdoorbell(adev, (index))
>>   #define WDOORBELL32(index, v) amdgpu_mm_wdoorbell(adev, (index), (v))
>>   #define RDOORBELL64(index) amdgpu_mm_rdoorbell64(adev, (index))
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
>> new file mode 100644
>> index 000000000000..64fec954815d
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
>> @@ -0,0 +1,183 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>
> That should probably be MIT. Except if Alex thinks otherwise.
>
>> +/*
>> + * Copyright 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person 
>> obtaining a
>> + * copy of this software and associated documentation files (the 
>> "Software"),
>> + * to deal in the Software without restriction, including without 
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, 
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom 
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be 
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO 
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, 
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include "amdgpu.h"
>> +
>> +/**
>> + * amdgpu_mm_rdoorbell - read a doorbell dword
>> + *
>> + * @adev: amdgpu_device pointer
>> + * @index: doorbell index
>> + *
>> + * Returns the value in the doorbell aperture at the
>> + * requested doorbell index (CIK).
>> + */
>> +u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
>> +{
>> +    if (amdgpu_device_skip_hw_access(adev))
>> +        return 0;
>> +
>> +    if (index < adev->doorbell.num_kernel_doorbells)
>> +        return readl(adev->doorbell.ptr + index);
>> +
>> +    DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
>> +    return 0;
>> +}
>> +
>> +/**
>> + * amdgpu_mm_wdoorbell - write a doorbell dword
>> + *
>> + * @adev: amdgpu_device pointer
>> + * @index: doorbell index
>> + * @v: value to write
>> + *
>> + * Writes @v to the doorbell aperture at the
>> + * requested doorbell index (CIK).
>> + */
>> +void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
>> +{
>> +    if (amdgpu_device_skip_hw_access(adev))
>> +        return;
>> +
>> +    if (index < adev->doorbell.num_kernel_doorbells)
>> +        writel(v, adev->doorbell.ptr + index);
>> +    else
>> +        DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", 
>> index);
>> +}
>> +
>> +/**
>> + * amdgpu_mm_rdoorbell64 - read a doorbell Qword
>> + *
>> + * @adev: amdgpu_device pointer
>> + * @index: doorbell index
>> + *
>> + * Returns the value in the doorbell aperture at the
>> + * requested doorbell index (VEGA10+).
>> + */
>> +u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
>> +{
>> +    if (amdgpu_device_skip_hw_access(adev))
>> +        return 0;
>> +
>> +    if (index < adev->doorbell.num_kernel_doorbells)
>> +        return atomic64_read((atomic64_t *)(adev->doorbell.ptr + 
>> index));
>> +
>> +    DRM_ERROR("reading beyond doorbell aperture: 0x%08x!\n", index);
>> +    return 0;
>> +}
>> +
>> +/**
>> + * amdgpu_mm_wdoorbell64 - write a doorbell Qword
>> + *
>> + * @adev: amdgpu_device pointer
>> + * @index: doorbell index
>> + * @v: value to write
>> + *
>> + * Writes @v to the doorbell aperture at the
>> + * requested doorbell index (VEGA10+).
>> + */
>> +void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, 
>> u64 v)
>> +{
>> +    if (amdgpu_device_skip_hw_access(adev))
>> +        return;
>> +
>> +    if (index < adev->doorbell.num_kernel_doorbells)
>> +        atomic64_set((atomic64_t *)(adev->doorbell.ptr + index), v);
>> +    else
>> +        DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", 
>> index);
>> +}
>> +
>> +/*
>> + * GPU doorbell aperture helpers function.
>> + */
>> +/**
>> + * amdgpu_doorbell_init - Init doorbell driver information.
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * Init doorbell driver information (CIK)
>> + * Returns 0 on success, error on failure.
>> + */
>> +int amdgpu_doorbell_init(struct amdgpu_device *adev)
>> +{
>> +
>> +    /* No doorbell on SI hardware generation */
>> +    if (adev->asic_type < CHIP_BONAIRE) {
>> +        adev->doorbell.base = 0;
>> +        adev->doorbell.size = 0;
>> +        adev->doorbell.num_kernel_doorbells = 0;
>> +        adev->doorbell.ptr = NULL;
>> +        return 0;
>> +    }
>> +
>> +    if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
>> +        return -EINVAL;
>> +
>> +    amdgpu_asic_init_doorbell_index(adev);
>> +
>> +    /* doorbell bar mapping */
>> +    adev->doorbell.base = pci_resource_start(adev->pdev, 2);
>> +    adev->doorbell.size = pci_resource_len(adev->pdev, 2);
>> +
>> +    if (adev->enable_mes) {
>> +        adev->doorbell.num_kernel_doorbells =
>> +            adev->doorbell.size / sizeof(u32);
>> +    } else {
>> +        adev->doorbell.num_kernel_doorbells =
>> +            min_t(u32, adev->doorbell.size / sizeof(u32),
>> +                  adev->doorbell_index.max_assignment+1);
>> +        if (adev->doorbell.num_kernel_doorbells == 0)
>> +            return -EINVAL;
>> +
>> +        /* For Vega, reserve and map two pages on doorbell BAR since 
>> SDMA
>> +         * paging queue doorbell use the second page. The
>> +         * AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
>> +         * doorbells are in the first page. So with paging queue 
>> enabled,
>> +         * the max num_kernel_doorbells should + 1 page (0x400 in 
>> dword)
>> +         */
>> +        if (adev->asic_type >= CHIP_VEGA10)
>> +            adev->doorbell.num_kernel_doorbells += 0x400;
>> +    }
>> +
>> +    adev->doorbell.ptr = ioremap(adev->doorbell.base,
>> +                     adev->doorbell.num_kernel_doorbells *
>> +                     sizeof(u32));
>
> It might be a good idea to drop this. Or do we really have any use 
> case where we need to access doorbells outside of the kernel reserved 
> ones? (If yes, forget what I've said this looks good then).

Forget what I've wrote here. Just seen than you clean this up in patch #6.

With the license question solved feel free to add Reviewed-by: Christian 
König <christian.koenig@amd.com> to the patch.

Regards,
Christian.

>
> Christian.
>
>> +    if (adev->doorbell.ptr == NULL)
>> +        return -ENOMEM;
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * amdgpu_doorbell_fini - Tear down doorbell driver information.
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * Tear down doorbell driver information (CIK)
>> + */
>> +void amdgpu_doorbell_fini(struct amdgpu_device *adev)
>> +{
>> +    iounmap(adev->doorbell.ptr);
>> +    adev->doorbell.ptr = NULL;
>> +}
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 07/12] get absolute offset from doorbell index
  2023-04-12 16:25 ` [PATCH v2 07/12] get absolute offset from doorbell index Shashank Sharma
@ 2023-04-13 10:59   ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2023-04-13 10:59 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav

Am 12.04.23 um 18:25 schrieb Shashank Sharma:
> This patch adds a helper function which converts a doorbell's
> relative index in a BO to an absolute doorbell offset in the
> doorbell BAR.
>
> V2: No space between the variable name doc (Luben)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h  |  3 +++
>   .../gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c  | 21 +++++++++++++++++++
>   2 files changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
> index fbbff12a14cd..b110e002bad2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell.h
> @@ -317,6 +317,9 @@ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v);
>   int amdgpu_doorbell_init(struct amdgpu_device *adev);
>   void amdgpu_doorbell_fini(struct amdgpu_device *adev);
>   int amdgpu_doorbell_create_kernel_doorbells(struct amdgpu_device *adev);
> +uint32_t amdgpu_doorbell_index_on_bar(struct amdgpu_device *adev,
> +				       struct amdgpu_bo *db_bo,
> +				       uint32_t doorbell_index);
>   
>   #define RDOORBELL32(index) amdgpu_mm_rdoorbell(adev, (index))
>   #define WDOORBELL32(index, v) amdgpu_mm_wdoorbell(adev, (index), (v))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> index 8be2dfa8fa74..a99cd56e8bf4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_doorbell_mgr.c
> @@ -108,6 +108,27 @@ void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
>   		DRM_ERROR("writing beyond doorbell aperture: 0x%08x!\n", index);
>   }
>   
> +/**
> + * amdgpu_doorbell_index_on_bar - Find doorbell's absolute offset in BAR
> + *
> + * @adev: amdgpu_device pointer
> + * @db_bo: doorbell object's bo
> + * @db_index: doorbell relative index in this doorbell object
> + *
> + * returns doorbell's absolute index in BAR
> + */
> +uint32_t amdgpu_doorbell_index_on_bar(struct amdgpu_device *adev,
> +				       struct amdgpu_bo *db_bo,
> +				       uint32_t doorbell_index)
> +{
> +	int db_bo_offset;
> +
> +	db_bo_offset = amdgpu_bo_gpu_offset_no_check(db_bo);
> +
> +	/* doorbell index is 32 bit but doorbell's size is 64-bit, so *2 */
> +	return db_bo_offset / sizeof(u32) + doorbell_index * 2;
> +}
> +
>   /**
>    * amdgpu_doorbell_create_kernel_doorbells - Create kernel doorbells for graphics
>    *


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-12 16:25 ` [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells Shashank Sharma
@ 2023-04-13 11:02   ` Christian König
  2023-04-13 13:08     ` Shashank Sharma
  2023-04-21 19:58   ` Felix Kuehling
  1 sibling, 1 reply; 28+ messages in thread
From: Christian König @ 2023-04-13 11:02 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, Felix Kuehling, contactshashanksharma,
	arvind.yadav

Am 12.04.23 um 18:25 schrieb Shashank Sharma:
> This patch:
> - adds a doorbell bo in kfd device structure.
> - creates doorbell page for kfd kernel usages.
> - updates the get_kernel_doorbell and free_kernel_doorbell functions
>    accordingly
>
> V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   2 -
>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 110 ++++++----------------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   6 ++
>   3 files changed, 36 insertions(+), 82 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index b8936340742b..49e3c4e021af 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -435,8 +435,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
>   	atomic_set(&kfd->compute_profile, 0);
>   
>   	mutex_init(&kfd->doorbell_mutex);
> -	memset(&kfd->doorbell_available_index, 0,
> -		sizeof(kfd->doorbell_available_index));
>   
>   	atomic_set(&kfd->sram_ecc_flag, 0);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index cd4e61bf0493..82b4a56b0afc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -61,81 +61,39 @@ size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
>   /* Doorbell calculations for device init. */
>   int kfd_doorbell_init(struct kfd_dev *kfd)
>   {
> -	size_t doorbell_start_offset;
> -	size_t doorbell_aperture_size;
> -	size_t doorbell_process_limit;
> +	int r;
> +	int size = PAGE_SIZE;

It's usually good practice to declare variables like "r" and "i" last. 
Some upstream maintainers even require reverse xmas tree here.

>   
> -	/*
> -	 * With MES enabled, just set the doorbell base as it is needed
> -	 * to calculate doorbell physical address.
> -	 */
> -	if (kfd->shared_resources.enable_mes) {
> -		kfd->doorbell_base =
> -			kfd->shared_resources.doorbell_physical_address;
> -		return 0;
> -	}
> -
> -	/*
> -	 * We start with calculations in bytes because the input data might
> -	 * only be byte-aligned.
> -	 * Only after we have done the rounding can we assume any alignment.
> -	 */
> -
> -	doorbell_start_offset =
> -			roundup(kfd->shared_resources.doorbell_start_offset,
> -					kfd_doorbell_process_slice(kfd));
> -
> -	doorbell_aperture_size =
> -			rounddown(kfd->shared_resources.doorbell_aperture_size,
> -					kfd_doorbell_process_slice(kfd));
> -
> -	if (doorbell_aperture_size > doorbell_start_offset)
> -		doorbell_process_limit =
> -			(doorbell_aperture_size - doorbell_start_offset) /
> -						kfd_doorbell_process_slice(kfd);
> -	else
> -		return -ENOSPC;
> -
> -	if (!kfd->max_doorbell_slices ||
> -	    doorbell_process_limit < kfd->max_doorbell_slices)
> -		kfd->max_doorbell_slices = doorbell_process_limit;
> -
> -	kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address +
> -				doorbell_start_offset;
> -
> -	kfd->doorbell_base_dw_offset = doorbell_start_offset / sizeof(u32);
> -
> -	kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
> -					   kfd_doorbell_process_slice(kfd));
> -
> -	if (!kfd->doorbell_kernel_ptr)
> +	/* Bitmap to dynamically allocate doorbells from kernel page */
> +	kfd->doorbell_bitmap = bitmap_zalloc(size / sizeof(u32), GFP_KERNEL);
> +	if (!kfd->doorbell_bitmap) {
> +		DRM_ERROR("Failed to allocate kernel doorbell bitmap\n");
>   		return -ENOMEM;
> +	}
>   
> -	pr_debug("Doorbell initialization:\n");
> -	pr_debug("doorbell base           == 0x%08lX\n",
> -			(uintptr_t)kfd->doorbell_base);
> -
> -	pr_debug("doorbell_base_dw_offset      == 0x%08lX\n",
> -			kfd->doorbell_base_dw_offset);
> -
> -	pr_debug("doorbell_process_limit  == 0x%08lX\n",
> -			doorbell_process_limit);
> -
> -	pr_debug("doorbell_kernel_offset  == 0x%08lX\n",
> -			(uintptr_t)kfd->doorbell_base);
> -
> -	pr_debug("doorbell aperture size  == 0x%08lX\n",
> -			kfd->shared_resources.doorbell_aperture_size);
> -
> -	pr_debug("doorbell kernel address == %p\n", kfd->doorbell_kernel_ptr);
> +	/* Alloc a doorbell page for KFD kernel usages */
> +	r = amdgpu_bo_create_kernel(kfd->adev,
> +				    size,
> +				    PAGE_SIZE,
> +				    AMDGPU_GEM_DOMAIN_DOORBELL,
> +				    &kfd->doorbells,
> +				    NULL,
> +				    (void **)&kfd->doorbell_kernel_ptr);
> +	if (r) {
> +		pr_err("failed to allocate kernel doorbells\n");
> +		bitmap_free(kfd->doorbell_bitmap);
> +		return r;
> +	}
>   
> +	pr_debug("Doorbell kernel address == %p\n", kfd->doorbell_kernel_ptr);
>   	return 0;
>   }
>   
>   void kfd_doorbell_fini(struct kfd_dev *kfd)
>   {
> -	if (kfd->doorbell_kernel_ptr)
> -		iounmap(kfd->doorbell_kernel_ptr);
> +	bitmap_free(kfd->doorbell_bitmap);
> +	amdgpu_bo_free_kernel(&kfd->doorbells, NULL,
> +			     (void **)&kfd->doorbell_kernel_ptr);
>   }
>   
>   int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
> @@ -188,22 +146,15 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
>   	u32 inx;
>   
>   	mutex_lock(&kfd->doorbell_mutex);
> -	inx = find_first_zero_bit(kfd->doorbell_available_index,
> -					KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
> +	inx = find_first_zero_bit(kfd->doorbell_bitmap, PAGE_SIZE / sizeof(u32));
>   
> -	__set_bit(inx, kfd->doorbell_available_index);
> +	__set_bit(inx, kfd->doorbell_bitmap);
>   	mutex_unlock(&kfd->doorbell_mutex);
>   
>   	if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>   		return NULL;
>   
> -	inx *= kfd->device_info.doorbell_size / sizeof(u32);
> -
> -	/*
> -	 * Calculating the kernel doorbell offset using the first
> -	 * doorbell page.
> -	 */
> -	*doorbell_off = kfd->doorbell_base_dw_offset + inx;
> +	*doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
>   
>   	pr_debug("Get kernel queue doorbell\n"
>   			"     doorbell offset   == 0x%08X\n"
> @@ -217,11 +168,10 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
>   {
>   	unsigned int inx;
>   
> -	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr)
> -		* sizeof(u32) / kfd->device_info.doorbell_size;
> +	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
>   
>   	mutex_lock(&kfd->doorbell_mutex);
> -	__clear_bit(inx, kfd->doorbell_available_index);
> +	__clear_bit(inx, kfd->doorbell_bitmap);
>   	mutex_unlock(&kfd->doorbell_mutex);
>   }
>   
> @@ -280,7 +230,7 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
>   	if (!pdd->doorbell_index) {
>   		int r = kfd_alloc_process_doorbells(pdd->dev,
>   						    &pdd->doorbell_index);
> -		if (r)
> +		if (r < 0)
>   			return 0;
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 552c3ac85a13..0b199eb6dc88 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -346,6 +346,12 @@ struct kfd_dev {
>   
>   	/* HMM page migration MEMORY_DEVICE_PRIVATE mapping */
>   	struct dev_pagemap pgmap;
> +
> +	/* Kernel doorbells for KFD device */
> +	struct amdgpu_bo *doorbells;
> +
> +	/* bitmap for dynamic doorbell allocation from this object */
> +	unsigned long *doorbell_bitmap;

When doorbell_available_index is now not longer used you should probably 
remove it as well.

Christian.

>   };
>   
>   enum kfd_mempool {


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells
  2023-04-12 16:25 ` [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells Shashank Sharma
@ 2023-04-13 11:07   ` Christian König
  2023-04-21 20:11   ` Felix Kuehling
  1 sibling, 0 replies; 28+ messages in thread
From: Christian König @ 2023-04-13 11:07 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: mukul.joshi, Felix Kuehling, arvind.yadav, Alex Deucher,
	contactshashanksharma, Christian Koenig

Am 12.04.23 um 18:25 schrieb Shashank Sharma:
> This patch:
> - adds a doorbell object in kfd pdd structure.
> - allocates doorbells for a process while creating its pdd.
> - frees the doorbells with pdd destroy.
> - removes previous calls to allocate process doorbells as
>    its not required anymore.
>
> PS: This patch ensures that we don't break the existing KFD
>      functionality, but now KFD userspace library should also
>      create doorbell pages as AMDGPU GEM objects using libdrm
>      functions in userspace. The reference code for the same
>      is available with AMDGPU Usermode queue libdrm MR. Once
>      this is done, we will not need to create process doorbells
>      in kernel.
>
> V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel
>        instead (Alex).
>      - Do not use custom doorbell structure, instead use separate
>        variables for bo and doorbell_bitmap (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Felix needs to take a look at this, I only understand halve of what's 
going on here.

Feel free to add an Acked-by: Christian König <christian.koenig@amd.com> 
cause the halve I understand looks good.

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 13 ----
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |  8 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c     | 65 ++++++++++---------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  9 ++-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 21 +++---
>   .../amd/amdkfd/kfd_process_queue_manager.c    | 22 +++----
>   6 files changed, 64 insertions(+), 74 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 6d291aa6386b..0e40756417e5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -327,12 +327,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   		goto err_bind_process;
>   	}
>   
> -	if (!pdd->doorbell_index &&
> -	    kfd_alloc_process_doorbells(dev, &pdd->doorbell_index) < 0) {
> -		err = -ENOMEM;
> -		goto err_alloc_doorbells;
> -	}
> -
>   	/* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
>   	 * on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
>   	 */
> @@ -410,7 +404,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   	if (wptr_bo)
>   		amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo);
>   err_wptr_map_gart:
> -err_alloc_doorbells:
>   err_bind_process:
>   err_pdd:
>   	mutex_unlock(&p->mutex);
> @@ -2163,12 +2156,6 @@ static int criu_restore_devices(struct kfd_process *p,
>   			ret = PTR_ERR(pdd);
>   			goto exit;
>   		}
> -
> -		if (!pdd->doorbell_index &&
> -		    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
> -			ret = -ENOMEM;
> -			goto exit;
> -		}
>   	}
>   
>   	/*
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index ecb4c3abc629..855bf8ce3f16 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -371,7 +371,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
>   			unsigned int found;
>   
>   			found = find_first_zero_bit(qpd->doorbell_bitmap,
> -						KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
> +						    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>   			if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
>   				pr_debug("No doorbells available");
>   				return -EBUSY;
> @@ -381,9 +381,9 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
>   		}
>   	}
>   
> -	q->properties.doorbell_off =
> -		kfd_get_doorbell_dw_offset_in_bar(dev, qpd_to_pdd(qpd),
> -					  q->doorbell_id);
> +	q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
> +								  qpd->proc_doorbells,
> +								  q->doorbell_id);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 82b4a56b0afc..718cfe9cb4f5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -227,46 +227,47 @@ uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
>   
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
>   {
> -	if (!pdd->doorbell_index) {
> -		int r = kfd_alloc_process_doorbells(pdd->dev,
> -						    &pdd->doorbell_index);
> -		if (r < 0)
> -			return 0;
> -	}
> +	struct amdgpu_device *adev = pdd->dev->adev;
> +	uint32_t first_db_index;
>   
> -	return pdd->dev->doorbell_base +
> -		pdd->doorbell_index * kfd_doorbell_process_slice(pdd->dev);
> +	first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
> +	return adev->doorbell.base + first_db_index * sizeof(uint32_t);
>   }
>   
> -int kfd_alloc_process_doorbells(struct kfd_dev *kfd, unsigned int *doorbell_index)
> +int kfd_alloc_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
>   {
> -	int r = 0;
> -
> -	if (!kfd->shared_resources.enable_mes)
> -		r = ida_simple_get(&kfd->doorbell_ida, 1,
> -				   kfd->max_doorbell_slices, GFP_KERNEL);
> -	else
> -		r = amdgpu_mes_alloc_process_doorbells(
> -				(struct amdgpu_device *)kfd->adev,
> -				doorbell_index);
> +	int r;
> +	struct qcm_process_device *qpd = &pdd->qpd;
>   
> -	if (r > 0)
> -		*doorbell_index = r;
> +	/* Allocate bitmap for dynamic doorbell allocation */
> +	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
> +					     GFP_KERNEL);
> +	if (!qpd->doorbell_bitmap) {
> +		DRM_ERROR("Failed to allocate process doorbell bitmap\n");
> +		return -ENOMEM;
> +	}
>   
> -	if (r < 0)
> -		pr_err("Failed to allocate process doorbells\n");
> +	/* Allocate doorbells for this process */
> +	r = amdgpu_bo_create_kernel(kfd->adev,
> +				    kfd_doorbell_process_slice(kfd),
> +				    PAGE_SIZE,
> +				    AMDGPU_GEM_DOMAIN_DOORBELL,
> +				    &qpd->proc_doorbells,
> +				    NULL,
> +				    NULL);
> +	if (r) {
> +		DRM_ERROR("Failed to allocate process doorbells\n");
> +		bitmap_free(qpd->doorbell_bitmap);
> +		return r;
> +	}
>   
> -	return r;
> +	return 0;
>   }
>   
> -void kfd_free_process_doorbells(struct kfd_dev *kfd, unsigned int doorbell_index)
> +void kfd_free_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
>   {
> -	if (doorbell_index) {
> -		if (!kfd->shared_resources.enable_mes)
> -			ida_simple_remove(&kfd->doorbell_ida, doorbell_index);
> -		else
> -			amdgpu_mes_free_process_doorbells(
> -					(struct amdgpu_device *)kfd->adev,
> -					doorbell_index);
> -	}
> +	struct qcm_process_device *qpd = &pdd->qpd;
> +
> +	bitmap_free(qpd->doorbell_bitmap);
> +	amdgpu_bo_free_kernel(&qpd->proc_doorbells, NULL, NULL);
>   }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 0b199eb6dc88..dfff77379acb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -661,7 +661,10 @@ struct qcm_process_device {
>   	uint64_t ib_base;
>   	void *ib_kaddr;
>   
> -	/* doorbell resources per process per device */
> +	/* doorbells for kfd process */
> +	struct amdgpu_bo *proc_doorbells;
> +
> +	/* bitmap for dynamic doorbell allocation from the bo */
>   	unsigned long *doorbell_bitmap;
>   };
>   
> @@ -1009,9 +1012,9 @@ unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
>   					unsigned int doorbell_id);
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd);
>   int kfd_alloc_process_doorbells(struct kfd_dev *kfd,
> -				unsigned int *doorbell_index);
> +				 struct kfd_process_device *pdd);
>   void kfd_free_process_doorbells(struct kfd_dev *kfd,
> -				unsigned int doorbell_index);
> +				 struct kfd_process_device *pdd);
>   /* GTT Sub-Allocator */
>   
>   int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 51b1683ac5c1..217ff72a97b0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1037,10 +1037,9 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
>   			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>   				get_order(KFD_CWSR_TBA_TMA_SIZE));
>   
> -		bitmap_free(pdd->qpd.doorbell_bitmap);
>   		idr_destroy(&pdd->alloc_idr);
>   
> -		kfd_free_process_doorbells(pdd->dev, pdd->doorbell_index);
> +		kfd_free_process_doorbells(pdd->dev, pdd);
>   
>   		if (pdd->dev->shared_resources.enable_mes)
>   			amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev,
> @@ -1453,11 +1452,6 @@ static int init_doorbell_bitmap(struct qcm_process_device *qpd,
>   	if (!KFD_IS_SOC15(dev))
>   		return 0;
>   
> -	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
> -					     GFP_KERNEL);
> -	if (!qpd->doorbell_bitmap)
> -		return -ENOMEM;
> -
>   	/* Mask out doorbells reserved for SDMA, IH, and VCN on SOC15. */
>   	pr_debug("reserved doorbell 0x%03x - 0x%03x\n", range_start, range_end);
>   	pr_debug("reserved doorbell 0x%03x - 0x%03x\n",
> @@ -1499,9 +1493,15 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   	if (!pdd)
>   		return NULL;
>   
> +	retval = kfd_alloc_process_doorbells(dev, pdd);
> +	if (retval) {
> +		pr_err("failed to allocate process doorbells\n");
> +		goto err_free_pdd;
> +	}
> +
>   	if (init_doorbell_bitmap(&pdd->qpd, dev)) {
>   		pr_err("Failed to init doorbell for process\n");
> -		goto err_free_pdd;
> +		goto err_free_db;
>   	}
>   
>   	pdd->dev = dev;
> @@ -1529,7 +1529,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   						false);
>   		if (retval) {
>   			pr_err("failed to allocate process context bo\n");
> -			goto err_free_pdd;
> +			goto err_free_db;
>   		}
>   		memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE);
>   	}
> @@ -1541,6 +1541,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   
>   	return pdd;
>   
> +err_free_db:
> +	kfd_free_process_doorbells(pdd->dev, pdd);
> +
>   err_free_pdd:
>   	kfree(pdd);
>   	return NULL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 5137476ec18e..6c95586f08a6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -344,17 +344,19 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>   		goto err_create_queue;
>   	}
>   
> -	if (q && p_doorbell_offset_in_process)
> +	if (q && p_doorbell_offset_in_process) {
>   		/* Return the doorbell offset within the doorbell page
>   		 * to the caller so it can be passed up to user mode
>   		 * (in bytes).
> -		 * There are always 1024 doorbells per process, so in case
> -		 * of 8-byte doorbells, there are two doorbell pages per
> -		 * process.
> +		 * relative doorbell index = Absolute doorbell index -
> +		 * absolute index of first doorbell in the page.
>   		 */
> -		*p_doorbell_offset_in_process =
> -			(q->properties.doorbell_off * sizeof(uint32_t)) &
> -			(kfd_doorbell_process_slice(dev) - 1);
> +		uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
> +									pdd->qpd.proc_doorbells, 0);
> +
> +		*p_doorbell_offset_in_process = (q->properties.doorbell_off
> +						- first_db_index) * sizeof(uint32_t);
> +	}
>   
>   	pr_debug("PQM After DQM create queue\n");
>   
> @@ -858,12 +860,6 @@ int kfd_criu_restore_queue(struct kfd_process *p,
>   		goto exit;
>   	}
>   
> -	if (!pdd->doorbell_index &&
> -	    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
> -		ret = -ENOMEM;
> -		goto exit;
> -	}
> -
>   	/* data stored in this order: mqd, ctl_stack */
>   	mqd = q_extra_data;
>   	ctl_stack = mqd + q_data->mqd_size;


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells
  2023-04-12 16:25 ` [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells Shashank Sharma
@ 2023-04-13 11:19   ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2023-04-13 11:19 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav

Am 12.04.23 um 18:25 schrieb Shashank Sharma:
> MES allocates process level doorbells, but there is no userspace
> client to consume it. It was only being used for the MES ring
> tests (in kernel), and was written by kernel doorbell write.
>
> The previous patch of this series has changed the MES ring test code to
> use kernel level MES doorbells. This patch now cleans up the process level
> doorbell allocation code which is not required.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 54 +------------------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 10 -----
>   2 files changed, 1 insertion(+), 63 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> index cd3ee851f0a4..2180e8e2c82e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> @@ -36,35 +36,6 @@ int amdgpu_mes_doorbell_process_slice(struct amdgpu_device *adev)
>   		       PAGE_SIZE);
>   }
>   
> -int amdgpu_mes_alloc_process_doorbells(struct amdgpu_device *adev,
> -				      unsigned int *doorbell_index)
> -{
> -	int r = ida_simple_get(&adev->mes.doorbell_ida, 2,
> -			       adev->mes.max_doorbell_slices,
> -			       GFP_KERNEL);
> -	if (r > 0)
> -		*doorbell_index = r;
> -
> -	return r;
> -}
> -
> -void amdgpu_mes_free_process_doorbells(struct amdgpu_device *adev,
> -				      unsigned int doorbell_index)
> -{
> -	if (doorbell_index)
> -		ida_simple_remove(&adev->mes.doorbell_ida, doorbell_index);
> -}
> -
> -unsigned int amdgpu_mes_get_doorbell_dw_offset_in_bar(
> -					struct amdgpu_device *adev,
> -					uint32_t doorbell_index,
> -					unsigned int doorbell_id)
> -{
> -	return ((doorbell_index *
> -		amdgpu_mes_doorbell_process_slice(adev)) / sizeof(u32) +
> -		doorbell_id * 2);
> -}
> -
>   static int amdgpu_mes_kernel_doorbell_get(struct amdgpu_device *adev,
>   					 struct amdgpu_mes_process *process,
>   					 int ip_type, uint64_t *doorbell_index)
> @@ -256,15 +227,6 @@ int amdgpu_mes_create_process(struct amdgpu_device *adev, int pasid,
>   		return -ENOMEM;
>   	}
>   
> -	process->doorbell_bitmap =
> -		kzalloc(DIV_ROUND_UP(AMDGPU_MES_MAX_NUM_OF_QUEUES_PER_PROCESS,
> -				     BITS_PER_BYTE), GFP_KERNEL);
> -	if (!process->doorbell_bitmap) {
> -		DRM_ERROR("failed to allocate doorbell bitmap\n");
> -		kfree(process);
> -		return -ENOMEM;
> -	}
> -
>   	/* allocate the process context bo and map it */
>   	r = amdgpu_bo_create_kernel(adev, AMDGPU_MES_PROC_CTX_SIZE, PAGE_SIZE,
>   				    AMDGPU_GEM_DOMAIN_GTT,
> @@ -291,15 +253,6 @@ int amdgpu_mes_create_process(struct amdgpu_device *adev, int pasid,
>   		goto clean_up_ctx;
>   	}
>   
> -	/* allocate the starting doorbell index of the process */
> -	r = amdgpu_mes_alloc_process_doorbells(adev, &process->doorbell_index);
> -	if (r < 0) {
> -		DRM_ERROR("failed to allocate doorbell for process\n");
> -		goto clean_up_pasid;
> -	}
> -
> -	DRM_DEBUG("process doorbell index = %d\n", process->doorbell_index);
> -
>   	INIT_LIST_HEAD(&process->gang_list);
>   	process->vm = vm;
>   	process->pasid = pasid;
> @@ -309,15 +262,12 @@ int amdgpu_mes_create_process(struct amdgpu_device *adev, int pasid,
>   	amdgpu_mes_unlock(&adev->mes);
>   	return 0;
>   
> -clean_up_pasid:
> -	idr_remove(&adev->mes.pasid_idr, pasid);
> -	amdgpu_mes_unlock(&adev->mes);
>   clean_up_ctx:
> +	amdgpu_mes_unlock(&adev->mes);
>   	amdgpu_bo_free_kernel(&process->proc_ctx_bo,
>   			      &process->proc_ctx_gpu_addr,
>   			      &process->proc_ctx_cpu_ptr);
>   clean_up_memory:
> -	kfree(process->doorbell_bitmap);
>   	kfree(process);
>   	return r;
>   }
> @@ -363,7 +313,6 @@ void amdgpu_mes_destroy_process(struct amdgpu_device *adev, int pasid)
>   		idr_remove(&adev->mes.gang_id_idr, gang->gang_id);
>   	}
>   
> -	amdgpu_mes_free_process_doorbells(adev, process->doorbell_index);
>   	idr_remove(&adev->mes.pasid_idr, pasid);
>   	amdgpu_mes_unlock(&adev->mes);
>   
> @@ -385,7 +334,6 @@ void amdgpu_mes_destroy_process(struct amdgpu_device *adev, int pasid)
>   	amdgpu_bo_free_kernel(&process->proc_ctx_bo,
>   			      &process->proc_ctx_gpu_addr,
>   			      &process->proc_ctx_cpu_ptr);
> -	kfree(process->doorbell_bitmap);
>   	kfree(process);
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index b04225585757..f96010dbd12a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -77,7 +77,6 @@ struct amdgpu_mes {
>   	uint32_t			kiq_version;
>   
>   	uint32_t                        total_max_queue;
> -	uint32_t                        doorbell_id_offset;
>   	uint32_t                        max_doorbell_slices;
>   
>   	uint64_t                        default_process_quantum;
> @@ -148,7 +147,6 @@ struct amdgpu_mes_process {
>   	uint64_t 		process_quantum;
>   	struct 			list_head gang_list;
>   	uint32_t 		doorbell_index;
> -	unsigned long 		*doorbell_bitmap;
>   	struct mutex		doorbell_lock;
>   };
>   
> @@ -367,14 +365,6 @@ int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device *adev,
>   
>   int amdgpu_mes_self_test(struct amdgpu_device *adev);
>   
> -int amdgpu_mes_alloc_process_doorbells(struct amdgpu_device *adev,
> -					unsigned int *doorbell_index);
> -void amdgpu_mes_free_process_doorbells(struct amdgpu_device *adev,
> -					unsigned int doorbell_index);
> -unsigned int amdgpu_mes_get_doorbell_dw_offset_in_bar(
> -					struct amdgpu_device *adev,
> -					uint32_t doorbell_index,
> -					unsigned int doorbell_id);
>   int amdgpu_mes_doorbell_process_slice(struct amdgpu_device *adev);
>   
>   /*


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-13 11:02   ` Christian König
@ 2023-04-13 13:08     ` Shashank Sharma
  0 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-13 13:08 UTC (permalink / raw)
  To: Christian König, amd-gfx
  Cc: Alex Deucher, mukul.joshi, Felix Kuehling, contactshashanksharma,
	arvind.yadav

Hey Christian,

On 13/04/2023 13:02, Christian König wrote:
> Am 12.04.23 um 18:25 schrieb Shashank Sharma:
>> This patch:
>> - adds a doorbell bo in kfd device structure.
>> - creates doorbell page for kfd kernel usages.
>> - updates the get_kernel_doorbell and free_kernel_doorbell functions
>>    accordingly
>>
>> V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   2 -
>>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 110 ++++++----------------
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   6 ++
>>   3 files changed, 36 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index b8936340742b..49e3c4e021af 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -435,8 +435,6 @@ struct kfd_dev *kgd2kfd_probe(struct 
>> amdgpu_device *adev, bool vf)
>>       atomic_set(&kfd->compute_profile, 0);
>>         mutex_init(&kfd->doorbell_mutex);
>> -    memset(&kfd->doorbell_available_index, 0,
>> -        sizeof(kfd->doorbell_available_index));
>>         atomic_set(&kfd->sram_ecc_flag, 0);
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> index cd4e61bf0493..82b4a56b0afc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> @@ -61,81 +61,39 @@ size_t kfd_doorbell_process_slice(struct kfd_dev 
>> *kfd)
>>   /* Doorbell calculations for device init. */
>>   int kfd_doorbell_init(struct kfd_dev *kfd)
>>   {
>> -    size_t doorbell_start_offset;
>> -    size_t doorbell_aperture_size;
>> -    size_t doorbell_process_limit;
>> +    int r;
>> +    int size = PAGE_SIZE;
>
> It's usually good practice to declare variables like "r" and "i" last. 
> Some upstream maintainers even require reverse xmas tree here.
Noted, will change it.
>
>>   -    /*
>> -     * With MES enabled, just set the doorbell base as it is needed
>> -     * to calculate doorbell physical address.
>> -     */
>> -    if (kfd->shared_resources.enable_mes) {
>> -        kfd->doorbell_base =
>> -            kfd->shared_resources.doorbell_physical_address;
>> -        return 0;
>> -    }
>> -
>> -    /*
>> -     * We start with calculations in bytes because the input data might
>> -     * only be byte-aligned.
>> -     * Only after we have done the rounding can we assume any 
>> alignment.
>> -     */
>> -
>> -    doorbell_start_offset =
>> - roundup(kfd->shared_resources.doorbell_start_offset,
>> -                    kfd_doorbell_process_slice(kfd));
>> -
>> -    doorbell_aperture_size =
>> - rounddown(kfd->shared_resources.doorbell_aperture_size,
>> -                    kfd_doorbell_process_slice(kfd));
>> -
>> -    if (doorbell_aperture_size > doorbell_start_offset)
>> -        doorbell_process_limit =
>> -            (doorbell_aperture_size - doorbell_start_offset) /
>> -                        kfd_doorbell_process_slice(kfd);
>> -    else
>> -        return -ENOSPC;
>> -
>> -    if (!kfd->max_doorbell_slices ||
>> -        doorbell_process_limit < kfd->max_doorbell_slices)
>> -        kfd->max_doorbell_slices = doorbell_process_limit;
>> -
>> -    kfd->doorbell_base = 
>> kfd->shared_resources.doorbell_physical_address +
>> -                doorbell_start_offset;
>> -
>> -    kfd->doorbell_base_dw_offset = doorbell_start_offset / sizeof(u32);
>> -
>> -    kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
>> -                       kfd_doorbell_process_slice(kfd));
>> -
>> -    if (!kfd->doorbell_kernel_ptr)
>> +    /* Bitmap to dynamically allocate doorbells from kernel page */
>> +    kfd->doorbell_bitmap = bitmap_zalloc(size / sizeof(u32), 
>> GFP_KERNEL);
>> +    if (!kfd->doorbell_bitmap) {
>> +        DRM_ERROR("Failed to allocate kernel doorbell bitmap\n");
>>           return -ENOMEM;
>> +    }
>>   -    pr_debug("Doorbell initialization:\n");
>> -    pr_debug("doorbell base           == 0x%08lX\n",
>> -            (uintptr_t)kfd->doorbell_base);
>> -
>> -    pr_debug("doorbell_base_dw_offset      == 0x%08lX\n",
>> -            kfd->doorbell_base_dw_offset);
>> -
>> -    pr_debug("doorbell_process_limit  == 0x%08lX\n",
>> -            doorbell_process_limit);
>> -
>> -    pr_debug("doorbell_kernel_offset  == 0x%08lX\n",
>> -            (uintptr_t)kfd->doorbell_base);
>> -
>> -    pr_debug("doorbell aperture size  == 0x%08lX\n",
>> -            kfd->shared_resources.doorbell_aperture_size);
>> -
>> -    pr_debug("doorbell kernel address == %p\n", 
>> kfd->doorbell_kernel_ptr);
>> +    /* Alloc a doorbell page for KFD kernel usages */
>> +    r = amdgpu_bo_create_kernel(kfd->adev,
>> +                    size,
>> +                    PAGE_SIZE,
>> +                    AMDGPU_GEM_DOMAIN_DOORBELL,
>> +                    &kfd->doorbells,
>> +                    NULL,
>> +                    (void **)&kfd->doorbell_kernel_ptr);
>> +    if (r) {
>> +        pr_err("failed to allocate kernel doorbells\n");
>> +        bitmap_free(kfd->doorbell_bitmap);
>> +        return r;
>> +    }
>>   +    pr_debug("Doorbell kernel address == %p\n", 
>> kfd->doorbell_kernel_ptr);
>>       return 0;
>>   }
>>     void kfd_doorbell_fini(struct kfd_dev *kfd)
>>   {
>> -    if (kfd->doorbell_kernel_ptr)
>> -        iounmap(kfd->doorbell_kernel_ptr);
>> +    bitmap_free(kfd->doorbell_bitmap);
>> +    amdgpu_bo_free_kernel(&kfd->doorbells, NULL,
>> +                 (void **)&kfd->doorbell_kernel_ptr);
>>   }
>>     int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process 
>> *process,
>> @@ -188,22 +146,15 @@ void __iomem *kfd_get_kernel_doorbell(struct 
>> kfd_dev *kfd,
>>       u32 inx;
>>         mutex_lock(&kfd->doorbell_mutex);
>> -    inx = find_first_zero_bit(kfd->doorbell_available_index,
>> -                    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>> +    inx = find_first_zero_bit(kfd->doorbell_bitmap, PAGE_SIZE / 
>> sizeof(u32));
>>   -    __set_bit(inx, kfd->doorbell_available_index);
>> +    __set_bit(inx, kfd->doorbell_bitmap);
>>       mutex_unlock(&kfd->doorbell_mutex);
>>         if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>>           return NULL;
>>   -    inx *= kfd->device_info.doorbell_size / sizeof(u32);
>> -
>> -    /*
>> -     * Calculating the kernel doorbell offset using the first
>> -     * doorbell page.
>> -     */
>> -    *doorbell_off = kfd->doorbell_base_dw_offset + inx;
>> +    *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, 
>> kfd->doorbells, inx);
>>         pr_debug("Get kernel queue doorbell\n"
>>               "     doorbell offset   == 0x%08X\n"
>> @@ -217,11 +168,10 @@ void kfd_release_kernel_doorbell(struct kfd_dev 
>> *kfd, u32 __iomem *db_addr)
>>   {
>>       unsigned int inx;
>>   -    inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr)
>> -        * sizeof(u32) / kfd->device_info.doorbell_size;
>> +    inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
>>         mutex_lock(&kfd->doorbell_mutex);
>> -    __clear_bit(inx, kfd->doorbell_available_index);
>> +    __clear_bit(inx, kfd->doorbell_bitmap);
>>       mutex_unlock(&kfd->doorbell_mutex);
>>   }
>>   @@ -280,7 +230,7 @@ phys_addr_t kfd_get_process_doorbells(struct 
>> kfd_process_device *pdd)
>>       if (!pdd->doorbell_index) {
>>           int r = kfd_alloc_process_doorbells(pdd->dev,
>>                               &pdd->doorbell_index);
>> -        if (r)
>> +        if (r < 0)
>>               return 0;
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 552c3ac85a13..0b199eb6dc88 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -346,6 +346,12 @@ struct kfd_dev {
>>         /* HMM page migration MEMORY_DEVICE_PRIVATE mapping */
>>       struct dev_pagemap pgmap;
>> +
>> +    /* Kernel doorbells for KFD device */
>> +    struct amdgpu_bo *doorbells;
>> +
>> +    /* bitmap for dynamic doorbell allocation from this object */
>> +    unsigned long *doorbell_bitmap;
>
> When doorbell_available_index is now not longer used you should 
> probably remove it as well.

Actually, KFD used two levels of bitmaps to allocate doorbell:

- first level of bitmap to assign a pool of 1024 doorbells to a client 
from big doorbell pool,

- second level of bitmap to allocate one doorbell to the queue from the 
1024 doorbells of the clients.

In the new design, we are allocating and saving a page of doorbells for 
KFD kernel, but we still need one bitmap to dynamically allocate and 
assign single doorbell from the page, both at process and kernel level. 
So doorbell_bitmap is replacement for doorbell_available_index in kfd, 
to make code symmetrical everywhere (kfd kernel, mes kernel and kfd 
process).

- Shashank

>
>>   };
>>     enum kfd_mempool {
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-12 16:25 ` [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells Shashank Sharma
  2023-04-13 11:02   ` Christian König
@ 2023-04-21 19:58   ` Felix Kuehling
  2023-04-22  6:39     ` Shashank Sharma
  1 sibling, 1 reply; 28+ messages in thread
From: Felix Kuehling @ 2023-04-21 19:58 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav


On 2023-04-12 12:25, Shashank Sharma wrote:
> This patch:
> - adds a doorbell bo in kfd device structure.
> - creates doorbell page for kfd kernel usages.
> - updates the get_kernel_doorbell and free_kernel_doorbell functions
>    accordingly
>
> V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   2 -
>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 110 ++++++----------------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   6 ++
>   3 files changed, 36 insertions(+), 82 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index b8936340742b..49e3c4e021af 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -435,8 +435,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf)
>   	atomic_set(&kfd->compute_profile, 0);
>   
>   	mutex_init(&kfd->doorbell_mutex);
> -	memset(&kfd->doorbell_available_index, 0,
> -		sizeof(kfd->doorbell_available_index));
>   
>   	atomic_set(&kfd->sram_ecc_flag, 0);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index cd4e61bf0493..82b4a56b0afc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -61,81 +61,39 @@ size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
>   /* Doorbell calculations for device init. */
>   int kfd_doorbell_init(struct kfd_dev *kfd)
>   {
> -	size_t doorbell_start_offset;
> -	size_t doorbell_aperture_size;
> -	size_t doorbell_process_limit;
> +	int r;
> +	int size = PAGE_SIZE;

On GPUs with 64-bit doorbells, ROCm uses two pages of doorbells per 
process. In hindsight that's probably overkill and we could live with a 
single doorbell page per process in terms of the number of doorbells 
needed. But this is not easy to change due to the way that doorbells are 
routed to SDMA engines, at least on Arcturus and later MI GPUs that have 
lots of SDMA engines and queues. We need enough doorbells in each 
process's doorbell slice to reach all SDMA queues. On Arcturus that's up 
to 64 queues. The way the routing is set up, only 32 doorbells per page 
are routed to various SDMA engines, so we need two pages.

Changing the doorbell routing is not trivial due to SRIOV support, where 
the routing is programmed by the hypervisor driver. The hypervisor 
driver and all guest drivers (Windows and Linux) have to agree on the 
routing. Changing it breaks the ABI between hypervisor and guest drivers.

Regards,
   Felix


>   
> -	/*
> -	 * With MES enabled, just set the doorbell base as it is needed
> -	 * to calculate doorbell physical address.
> -	 */
> -	if (kfd->shared_resources.enable_mes) {
> -		kfd->doorbell_base =
> -			kfd->shared_resources.doorbell_physical_address;
> -		return 0;
> -	}
> -
> -	/*
> -	 * We start with calculations in bytes because the input data might
> -	 * only be byte-aligned.
> -	 * Only after we have done the rounding can we assume any alignment.
> -	 */
> -
> -	doorbell_start_offset =
> -			roundup(kfd->shared_resources.doorbell_start_offset,
> -					kfd_doorbell_process_slice(kfd));
> -
> -	doorbell_aperture_size =
> -			rounddown(kfd->shared_resources.doorbell_aperture_size,
> -					kfd_doorbell_process_slice(kfd));
> -
> -	if (doorbell_aperture_size > doorbell_start_offset)
> -		doorbell_process_limit =
> -			(doorbell_aperture_size - doorbell_start_offset) /
> -						kfd_doorbell_process_slice(kfd);
> -	else
> -		return -ENOSPC;
> -
> -	if (!kfd->max_doorbell_slices ||
> -	    doorbell_process_limit < kfd->max_doorbell_slices)
> -		kfd->max_doorbell_slices = doorbell_process_limit;
> -
> -	kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address +
> -				doorbell_start_offset;
> -
> -	kfd->doorbell_base_dw_offset = doorbell_start_offset / sizeof(u32);
> -
> -	kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
> -					   kfd_doorbell_process_slice(kfd));
> -
> -	if (!kfd->doorbell_kernel_ptr)
> +	/* Bitmap to dynamically allocate doorbells from kernel page */
> +	kfd->doorbell_bitmap = bitmap_zalloc(size / sizeof(u32), GFP_KERNEL);
> +	if (!kfd->doorbell_bitmap) {
> +		DRM_ERROR("Failed to allocate kernel doorbell bitmap\n");
>   		return -ENOMEM;
> +	}
>   
> -	pr_debug("Doorbell initialization:\n");
> -	pr_debug("doorbell base           == 0x%08lX\n",
> -			(uintptr_t)kfd->doorbell_base);
> -
> -	pr_debug("doorbell_base_dw_offset      == 0x%08lX\n",
> -			kfd->doorbell_base_dw_offset);
> -
> -	pr_debug("doorbell_process_limit  == 0x%08lX\n",
> -			doorbell_process_limit);
> -
> -	pr_debug("doorbell_kernel_offset  == 0x%08lX\n",
> -			(uintptr_t)kfd->doorbell_base);
> -
> -	pr_debug("doorbell aperture size  == 0x%08lX\n",
> -			kfd->shared_resources.doorbell_aperture_size);
> -
> -	pr_debug("doorbell kernel address == %p\n", kfd->doorbell_kernel_ptr);
> +	/* Alloc a doorbell page for KFD kernel usages */
> +	r = amdgpu_bo_create_kernel(kfd->adev,
> +				    size,
> +				    PAGE_SIZE,
> +				    AMDGPU_GEM_DOMAIN_DOORBELL,
> +				    &kfd->doorbells,
> +				    NULL,
> +				    (void **)&kfd->doorbell_kernel_ptr);
> +	if (r) {
> +		pr_err("failed to allocate kernel doorbells\n");
> +		bitmap_free(kfd->doorbell_bitmap);
> +		return r;
> +	}
>   
> +	pr_debug("Doorbell kernel address == %p\n", kfd->doorbell_kernel_ptr);
>   	return 0;
>   }
>   
>   void kfd_doorbell_fini(struct kfd_dev *kfd)
>   {
> -	if (kfd->doorbell_kernel_ptr)
> -		iounmap(kfd->doorbell_kernel_ptr);
> +	bitmap_free(kfd->doorbell_bitmap);
> +	amdgpu_bo_free_kernel(&kfd->doorbells, NULL,
> +			     (void **)&kfd->doorbell_kernel_ptr);
>   }
>   
>   int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
> @@ -188,22 +146,15 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
>   	u32 inx;
>   
>   	mutex_lock(&kfd->doorbell_mutex);
> -	inx = find_first_zero_bit(kfd->doorbell_available_index,
> -					KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
> +	inx = find_first_zero_bit(kfd->doorbell_bitmap, PAGE_SIZE / sizeof(u32));
>   
> -	__set_bit(inx, kfd->doorbell_available_index);
> +	__set_bit(inx, kfd->doorbell_bitmap);
>   	mutex_unlock(&kfd->doorbell_mutex);
>   
>   	if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>   		return NULL;
>   
> -	inx *= kfd->device_info.doorbell_size / sizeof(u32);
> -
> -	/*
> -	 * Calculating the kernel doorbell offset using the first
> -	 * doorbell page.
> -	 */
> -	*doorbell_off = kfd->doorbell_base_dw_offset + inx;
> +	*doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
>   
>   	pr_debug("Get kernel queue doorbell\n"
>   			"     doorbell offset   == 0x%08X\n"
> @@ -217,11 +168,10 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
>   {
>   	unsigned int inx;
>   
> -	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr)
> -		* sizeof(u32) / kfd->device_info.doorbell_size;
> +	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
>   
>   	mutex_lock(&kfd->doorbell_mutex);
> -	__clear_bit(inx, kfd->doorbell_available_index);
> +	__clear_bit(inx, kfd->doorbell_bitmap);
>   	mutex_unlock(&kfd->doorbell_mutex);
>   }
>   
> @@ -280,7 +230,7 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
>   	if (!pdd->doorbell_index) {
>   		int r = kfd_alloc_process_doorbells(pdd->dev,
>   						    &pdd->doorbell_index);
> -		if (r)
> +		if (r < 0)
>   			return 0;
>   	}
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 552c3ac85a13..0b199eb6dc88 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -346,6 +346,12 @@ struct kfd_dev {
>   
>   	/* HMM page migration MEMORY_DEVICE_PRIVATE mapping */
>   	struct dev_pagemap pgmap;
> +
> +	/* Kernel doorbells for KFD device */
> +	struct amdgpu_bo *doorbells;
> +
> +	/* bitmap for dynamic doorbell allocation from this object */
> +	unsigned long *doorbell_bitmap;
>   };
>   
>   enum kfd_mempool {

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells
  2023-04-12 16:25 ` [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells Shashank Sharma
  2023-04-13 11:07   ` Christian König
@ 2023-04-21 20:11   ` Felix Kuehling
  2023-04-22  6:45     ` Shashank Sharma
  1 sibling, 1 reply; 28+ messages in thread
From: Felix Kuehling @ 2023-04-21 20:11 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav

On 2023-04-12 12:25, Shashank Sharma wrote:
> This patch:
> - adds a doorbell object in kfd pdd structure.
> - allocates doorbells for a process while creating its pdd.
> - frees the doorbells with pdd destroy.
> - removes previous calls to allocate process doorbells as
>    its not required anymore.
>
> PS: This patch ensures that we don't break the existing KFD
>      functionality, but now KFD userspace library should also
>      create doorbell pages as AMDGPU GEM objects using libdrm
>      functions in userspace. The reference code for the same
>      is available with AMDGPU Usermode queue libdrm MR. Once
>      this is done, we will not need to create process doorbells
>      in kernel.

I like this approach of keeping existing functionality working, but 
having a transition path for user mode. If I understand it correctly, 
allocating doorbells as GEM objects would enable overcommitment of 
doorbells. That's a capability we haven't had in KFD up to now. Trying 
to launch too many processes that need doorbells would simlpy fail.

With overcommitment, idle processes could have their doorbells objects 
evicted, sot that active processes can work. Doorbell objects would need 
eviction fences to ensure that processes are not scheduled on the GPU 
while their doorbell allocation is in use by other processes. The 
doorbell offset in the queue properties would need to be updated when 
doorbells are validated and potentially moved to different physical pages.

These details would need to be worked out in kernel mode before user 
mode can transition to GEM for doorbell allocation.


>
> V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel
>        instead (Alex).
>      - Do not use custom doorbell structure, instead use separate
>        variables for bo and doorbell_bitmap (Alex)
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 13 ----
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |  8 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c     | 65 ++++++++++---------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  9 ++-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 21 +++---
>   .../amd/amdkfd/kfd_process_queue_manager.c    | 22 +++----
>   6 files changed, 64 insertions(+), 74 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 6d291aa6386b..0e40756417e5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -327,12 +327,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   		goto err_bind_process;
>   	}
>   
> -	if (!pdd->doorbell_index &&
> -	    kfd_alloc_process_doorbells(dev, &pdd->doorbell_index) < 0) {
> -		err = -ENOMEM;
> -		goto err_alloc_doorbells;
> -	}
> -

You're moving this to kfd_create_process_device_data, which means 
processes will allocate and pin doorbells, even if they never create 
queues. We specifically moved this to first queue creation to allow more 
ROCm processes to be started, as long as they don't create queues. This 
is important on systems with many GPUs where not all processes need to 
use all GPU. It also helps with certain tools that need to initialized 
ROCr to gather information, but don't need to create queues. See this 
patch for reference:

commit 16f0013157bf8c95d10b9360491e3c920f85641e
Author: Felix Kuehling <Felix.Kuehling@amd.com>
Date:   Wed Aug 3 14:45:45 2022 -0400

     drm/amdkfd: Allocate doorbells only when needed
     
     Only allocate doorbells when the first queue is created on a GPU or the
     doorbells need to be mapped into CPU or GPU virtual address space. This
     avoids allocating doorbells unnecessarily and can allow more processes
     to use KFD on multi-GPU systems.
     
     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
     Reviewed-by: Kent Russell <kent.Russell@amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Until we have an alternative way to allocate doorbells that supports 
overcommitment, late doorbell allocation is an important feature. This 
change will cause regressions for some ROCm users.

Regards,
   Felix



>   	/* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
>   	 * on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
>   	 */
> @@ -410,7 +404,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   	if (wptr_bo)
>   		amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo);
>   err_wptr_map_gart:
> -err_alloc_doorbells:
>   err_bind_process:
>   err_pdd:
>   	mutex_unlock(&p->mutex);
> @@ -2163,12 +2156,6 @@ static int criu_restore_devices(struct kfd_process *p,
>   			ret = PTR_ERR(pdd);
>   			goto exit;
>   		}
> -
> -		if (!pdd->doorbell_index &&
> -		    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
> -			ret = -ENOMEM;
> -			goto exit;
> -		}
>   	}
>   
>   	/*
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index ecb4c3abc629..855bf8ce3f16 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -371,7 +371,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
>   			unsigned int found;
>   
>   			found = find_first_zero_bit(qpd->doorbell_bitmap,
> -						KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
> +						    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>   			if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
>   				pr_debug("No doorbells available");
>   				return -EBUSY;
> @@ -381,9 +381,9 @@ static int allocate_doorbell(struct qcm_process_device *qpd,
>   		}
>   	}
>   
> -	q->properties.doorbell_off =
> -		kfd_get_doorbell_dw_offset_in_bar(dev, qpd_to_pdd(qpd),
> -					  q->doorbell_id);
> +	q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev,
> +								  qpd->proc_doorbells,
> +								  q->doorbell_id);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 82b4a56b0afc..718cfe9cb4f5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -227,46 +227,47 @@ uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
>   
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
>   {
> -	if (!pdd->doorbell_index) {
> -		int r = kfd_alloc_process_doorbells(pdd->dev,
> -						    &pdd->doorbell_index);
> -		if (r < 0)
> -			return 0;
> -	}
> +	struct amdgpu_device *adev = pdd->dev->adev;
> +	uint32_t first_db_index;
>   
> -	return pdd->dev->doorbell_base +
> -		pdd->doorbell_index * kfd_doorbell_process_slice(pdd->dev);
> +	first_db_index = amdgpu_doorbell_index_on_bar(adev, pdd->qpd.proc_doorbells, 0);
> +	return adev->doorbell.base + first_db_index * sizeof(uint32_t);
>   }
>   
> -int kfd_alloc_process_doorbells(struct kfd_dev *kfd, unsigned int *doorbell_index)
> +int kfd_alloc_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
>   {
> -	int r = 0;
> -
> -	if (!kfd->shared_resources.enable_mes)
> -		r = ida_simple_get(&kfd->doorbell_ida, 1,
> -				   kfd->max_doorbell_slices, GFP_KERNEL);
> -	else
> -		r = amdgpu_mes_alloc_process_doorbells(
> -				(struct amdgpu_device *)kfd->adev,
> -				doorbell_index);
> +	int r;
> +	struct qcm_process_device *qpd = &pdd->qpd;
>   
> -	if (r > 0)
> -		*doorbell_index = r;
> +	/* Allocate bitmap for dynamic doorbell allocation */
> +	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
> +					     GFP_KERNEL);
> +	if (!qpd->doorbell_bitmap) {
> +		DRM_ERROR("Failed to allocate process doorbell bitmap\n");
> +		return -ENOMEM;
> +	}
>   
> -	if (r < 0)
> -		pr_err("Failed to allocate process doorbells\n");
> +	/* Allocate doorbells for this process */
> +	r = amdgpu_bo_create_kernel(kfd->adev,
> +				    kfd_doorbell_process_slice(kfd),
> +				    PAGE_SIZE,
> +				    AMDGPU_GEM_DOMAIN_DOORBELL,
> +				    &qpd->proc_doorbells,
> +				    NULL,
> +				    NULL);
> +	if (r) {
> +		DRM_ERROR("Failed to allocate process doorbells\n");
> +		bitmap_free(qpd->doorbell_bitmap);
> +		return r;
> +	}
>   
> -	return r;
> +	return 0;
>   }
>   
> -void kfd_free_process_doorbells(struct kfd_dev *kfd, unsigned int doorbell_index)
> +void kfd_free_process_doorbells(struct kfd_dev *kfd, struct kfd_process_device *pdd)
>   {
> -	if (doorbell_index) {
> -		if (!kfd->shared_resources.enable_mes)
> -			ida_simple_remove(&kfd->doorbell_ida, doorbell_index);
> -		else
> -			amdgpu_mes_free_process_doorbells(
> -					(struct amdgpu_device *)kfd->adev,
> -					doorbell_index);
> -	}
> +	struct qcm_process_device *qpd = &pdd->qpd;
> +
> +	bitmap_free(qpd->doorbell_bitmap);
> +	amdgpu_bo_free_kernel(&qpd->proc_doorbells, NULL, NULL);
>   }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 0b199eb6dc88..dfff77379acb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -661,7 +661,10 @@ struct qcm_process_device {
>   	uint64_t ib_base;
>   	void *ib_kaddr;
>   
> -	/* doorbell resources per process per device */
> +	/* doorbells for kfd process */
> +	struct amdgpu_bo *proc_doorbells;
> +
> +	/* bitmap for dynamic doorbell allocation from the bo */
>   	unsigned long *doorbell_bitmap;
>   };
>   
> @@ -1009,9 +1012,9 @@ unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
>   					unsigned int doorbell_id);
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd);
>   int kfd_alloc_process_doorbells(struct kfd_dev *kfd,
> -				unsigned int *doorbell_index);
> +				 struct kfd_process_device *pdd);
>   void kfd_free_process_doorbells(struct kfd_dev *kfd,
> -				unsigned int doorbell_index);
> +				 struct kfd_process_device *pdd);
>   /* GTT Sub-Allocator */
>   
>   int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 51b1683ac5c1..217ff72a97b0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1037,10 +1037,9 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
>   			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>   				get_order(KFD_CWSR_TBA_TMA_SIZE));
>   
> -		bitmap_free(pdd->qpd.doorbell_bitmap);
>   		idr_destroy(&pdd->alloc_idr);
>   
> -		kfd_free_process_doorbells(pdd->dev, pdd->doorbell_index);
> +		kfd_free_process_doorbells(pdd->dev, pdd);
>   
>   		if (pdd->dev->shared_resources.enable_mes)
>   			amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev,
> @@ -1453,11 +1452,6 @@ static int init_doorbell_bitmap(struct qcm_process_device *qpd,
>   	if (!KFD_IS_SOC15(dev))
>   		return 0;
>   
> -	qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
> -					     GFP_KERNEL);
> -	if (!qpd->doorbell_bitmap)
> -		return -ENOMEM;
> -
>   	/* Mask out doorbells reserved for SDMA, IH, and VCN on SOC15. */
>   	pr_debug("reserved doorbell 0x%03x - 0x%03x\n", range_start, range_end);
>   	pr_debug("reserved doorbell 0x%03x - 0x%03x\n",
> @@ -1499,9 +1493,15 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   	if (!pdd)
>   		return NULL;
>   
> +	retval = kfd_alloc_process_doorbells(dev, pdd);
> +	if (retval) {
> +		pr_err("failed to allocate process doorbells\n");
> +		goto err_free_pdd;
> +	}
> +
>   	if (init_doorbell_bitmap(&pdd->qpd, dev)) {
>   		pr_err("Failed to init doorbell for process\n");
> -		goto err_free_pdd;
> +		goto err_free_db;
>   	}
>   
>   	pdd->dev = dev;
> @@ -1529,7 +1529,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   						false);
>   		if (retval) {
>   			pr_err("failed to allocate process context bo\n");
> -			goto err_free_pdd;
> +			goto err_free_db;
>   		}
>   		memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE);
>   	}
> @@ -1541,6 +1541,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   
>   	return pdd;
>   
> +err_free_db:
> +	kfd_free_process_doorbells(pdd->dev, pdd);
> +
>   err_free_pdd:
>   	kfree(pdd);
>   	return NULL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 5137476ec18e..6c95586f08a6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -344,17 +344,19 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>   		goto err_create_queue;
>   	}
>   
> -	if (q && p_doorbell_offset_in_process)
> +	if (q && p_doorbell_offset_in_process) {
>   		/* Return the doorbell offset within the doorbell page
>   		 * to the caller so it can be passed up to user mode
>   		 * (in bytes).
> -		 * There are always 1024 doorbells per process, so in case
> -		 * of 8-byte doorbells, there are two doorbell pages per
> -		 * process.
> +		 * relative doorbell index = Absolute doorbell index -
> +		 * absolute index of first doorbell in the page.
>   		 */
> -		*p_doorbell_offset_in_process =
> -			(q->properties.doorbell_off * sizeof(uint32_t)) &
> -			(kfd_doorbell_process_slice(dev) - 1);
> +		uint32_t first_db_index = amdgpu_doorbell_index_on_bar(pdd->dev->adev,
> +									pdd->qpd.proc_doorbells, 0);
> +
> +		*p_doorbell_offset_in_process = (q->properties.doorbell_off
> +						- first_db_index) * sizeof(uint32_t);
> +	}
>   
>   	pr_debug("PQM After DQM create queue\n");
>   
> @@ -858,12 +860,6 @@ int kfd_criu_restore_queue(struct kfd_process *p,
>   		goto exit;
>   	}
>   
> -	if (!pdd->doorbell_index &&
> -	    kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) < 0) {
> -		ret = -ENOMEM;
> -		goto exit;
> -	}
> -
>   	/* data stored in this order: mqd, ctl_stack */
>   	mqd = q_extra_data;
>   	ctl_stack = mqd + q_data->mqd_size;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-21 19:58   ` Felix Kuehling
@ 2023-04-22  6:39     ` Shashank Sharma
  2023-04-24 19:56       ` Felix Kuehling
  0 siblings, 1 reply; 28+ messages in thread
From: Shashank Sharma @ 2023-04-22  6:39 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav

Hey Felix,

Thanks for your comments, mine inline.


On 21/04/2023 21:58, Felix Kuehling wrote:
>
> On 2023-04-12 12:25, Shashank Sharma wrote:
>> This patch:
>> - adds a doorbell bo in kfd device structure.
>> - creates doorbell page for kfd kernel usages.
>> - updates the get_kernel_doorbell and free_kernel_doorbell functions
>>    accordingly
>>
>> V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   2 -
>>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 110 ++++++----------------
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   6 ++
>>   3 files changed, 36 insertions(+), 82 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index b8936340742b..49e3c4e021af 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -435,8 +435,6 @@ struct kfd_dev *kgd2kfd_probe(struct 
>> amdgpu_device *adev, bool vf)
>>       atomic_set(&kfd->compute_profile, 0);
>>         mutex_init(&kfd->doorbell_mutex);
>> -    memset(&kfd->doorbell_available_index, 0,
>> -        sizeof(kfd->doorbell_available_index));
>>         atomic_set(&kfd->sram_ecc_flag, 0);
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> index cd4e61bf0493..82b4a56b0afc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> @@ -61,81 +61,39 @@ size_t kfd_doorbell_process_slice(struct kfd_dev 
>> *kfd)
>>   /* Doorbell calculations for device init. */
>>   int kfd_doorbell_init(struct kfd_dev *kfd)
>>   {
>> -    size_t doorbell_start_offset;
>> -    size_t doorbell_aperture_size;
>> -    size_t doorbell_process_limit;
>> +    int r;
>> +    int size = PAGE_SIZE;
>
> On GPUs with 64-bit doorbells, ROCm uses two pages of doorbells per 
> process. In hindsight that's probably overkill and we could live with 
> a single doorbell page per process in terms of the number of doorbells 
> needed. But this is not easy to change due to the way that doorbells 
> are routed to SDMA engines, at least on Arcturus and later MI GPUs 
> that have lots of SDMA engines and queues. We need enough doorbells in 
> each process's doorbell slice to reach all SDMA queues. On Arcturus 
> that's up to 64 queues. The way the routing is set up, only 32 
> doorbells per page are routed to various SDMA engines, so we need two 
> pages.
>
> Changing the doorbell routing is not trivial due to SRIOV support, 
> where the routing is programmed by the hypervisor driver. The 
> hypervisor driver and all guest drivers (Windows and Linux) have to 
> agree on the routing. Changing it breaks the ABI between hypervisor 
> and guest drivers.

If I understand correctly, you are suspecting that we are allocating 
less doorbells per KFD process, which is the first impression it 
creates. But if you observe closely, we have split the total KFD 
doorbells into two parts,

- KFD kernel level doorbells: doorbells which are being used for kernel 
rings tests, written by kernel doorbell_write calls, saved in struct 
kfd->doorbells

   size = PAGE_SIZE.

- KFD process level doorbells: doorbell pages which are allocated by 
kernel but mapped and written by userspace processes, saved in struct 
pdd->qpd->doorbells

size = kfd_doorbell_process_slice.

We realized that we only need 1-2 doorbells for KFD kernel level stuff 
(so kept it one page), but need 2-page of doorbells for KFD process, so 
they are sized accordingly.

We have also run kfd_test_suit and verified the changes for any 
regression. Hope this helps in explaining the design.

- Shashank

>
> Regards,
>   Felix
>
>
>>   -    /*
>> -     * With MES enabled, just set the doorbell base as it is needed
>> -     * to calculate doorbell physical address.
>> -     */
>> -    if (kfd->shared_resources.enable_mes) {
>> -        kfd->doorbell_base =
>> -            kfd->shared_resources.doorbell_physical_address;
>> -        return 0;
>> -    }
>> -
>> -    /*
>> -     * We start with calculations in bytes because the input data might
>> -     * only be byte-aligned.
>> -     * Only after we have done the rounding can we assume any 
>> alignment.
>> -     */
>> -
>> -    doorbell_start_offset =
>> - roundup(kfd->shared_resources.doorbell_start_offset,
>> -                    kfd_doorbell_process_slice(kfd));
>> -
>> -    doorbell_aperture_size =
>> - rounddown(kfd->shared_resources.doorbell_aperture_size,
>> -                    kfd_doorbell_process_slice(kfd));
>> -
>> -    if (doorbell_aperture_size > doorbell_start_offset)
>> -        doorbell_process_limit =
>> -            (doorbell_aperture_size - doorbell_start_offset) /
>> -                        kfd_doorbell_process_slice(kfd);
>> -    else
>> -        return -ENOSPC;
>> -
>> -    if (!kfd->max_doorbell_slices ||
>> -        doorbell_process_limit < kfd->max_doorbell_slices)
>> -        kfd->max_doorbell_slices = doorbell_process_limit;
>> -
>> -    kfd->doorbell_base = 
>> kfd->shared_resources.doorbell_physical_address +
>> -                doorbell_start_offset;
>> -
>> -    kfd->doorbell_base_dw_offset = doorbell_start_offset / sizeof(u32);
>> -
>> -    kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
>> -                       kfd_doorbell_process_slice(kfd));
>> -
>> -    if (!kfd->doorbell_kernel_ptr)
>> +    /* Bitmap to dynamically allocate doorbells from kernel page */
>> +    kfd->doorbell_bitmap = bitmap_zalloc(size / sizeof(u32), 
>> GFP_KERNEL);
>> +    if (!kfd->doorbell_bitmap) {
>> +        DRM_ERROR("Failed to allocate kernel doorbell bitmap\n");
>>           return -ENOMEM;
>> +    }
>>   -    pr_debug("Doorbell initialization:\n");
>> -    pr_debug("doorbell base           == 0x%08lX\n",
>> -            (uintptr_t)kfd->doorbell_base);
>> -
>> -    pr_debug("doorbell_base_dw_offset      == 0x%08lX\n",
>> -            kfd->doorbell_base_dw_offset);
>> -
>> -    pr_debug("doorbell_process_limit  == 0x%08lX\n",
>> -            doorbell_process_limit);
>> -
>> -    pr_debug("doorbell_kernel_offset  == 0x%08lX\n",
>> -            (uintptr_t)kfd->doorbell_base);
>> -
>> -    pr_debug("doorbell aperture size  == 0x%08lX\n",
>> -            kfd->shared_resources.doorbell_aperture_size);
>> -
>> -    pr_debug("doorbell kernel address == %p\n", 
>> kfd->doorbell_kernel_ptr);
>> +    /* Alloc a doorbell page for KFD kernel usages */
>> +    r = amdgpu_bo_create_kernel(kfd->adev,
>> +                    size,
>> +                    PAGE_SIZE,
>> +                    AMDGPU_GEM_DOMAIN_DOORBELL,
>> +                    &kfd->doorbells,
>> +                    NULL,
>> +                    (void **)&kfd->doorbell_kernel_ptr);
>> +    if (r) {
>> +        pr_err("failed to allocate kernel doorbells\n");
>> +        bitmap_free(kfd->doorbell_bitmap);
>> +        return r;
>> +    }
>>   +    pr_debug("Doorbell kernel address == %p\n", 
>> kfd->doorbell_kernel_ptr);
>>       return 0;
>>   }
>>     void kfd_doorbell_fini(struct kfd_dev *kfd)
>>   {
>> -    if (kfd->doorbell_kernel_ptr)
>> -        iounmap(kfd->doorbell_kernel_ptr);
>> +    bitmap_free(kfd->doorbell_bitmap);
>> +    amdgpu_bo_free_kernel(&kfd->doorbells, NULL,
>> +                 (void **)&kfd->doorbell_kernel_ptr);
>>   }
>>     int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process 
>> *process,
>> @@ -188,22 +146,15 @@ void __iomem *kfd_get_kernel_doorbell(struct 
>> kfd_dev *kfd,
>>       u32 inx;
>>         mutex_lock(&kfd->doorbell_mutex);
>> -    inx = find_first_zero_bit(kfd->doorbell_available_index,
>> -                    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>> +    inx = find_first_zero_bit(kfd->doorbell_bitmap, PAGE_SIZE / 
>> sizeof(u32));
>>   -    __set_bit(inx, kfd->doorbell_available_index);
>> +    __set_bit(inx, kfd->doorbell_bitmap);
>>       mutex_unlock(&kfd->doorbell_mutex);
>>         if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
>>           return NULL;
>>   -    inx *= kfd->device_info.doorbell_size / sizeof(u32);
>> -
>> -    /*
>> -     * Calculating the kernel doorbell offset using the first
>> -     * doorbell page.
>> -     */
>> -    *doorbell_off = kfd->doorbell_base_dw_offset + inx;
>> +    *doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, 
>> kfd->doorbells, inx);
>>         pr_debug("Get kernel queue doorbell\n"
>>               "     doorbell offset   == 0x%08X\n"
>> @@ -217,11 +168,10 @@ void kfd_release_kernel_doorbell(struct kfd_dev 
>> *kfd, u32 __iomem *db_addr)
>>   {
>>       unsigned int inx;
>>   -    inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr)
>> -        * sizeof(u32) / kfd->device_info.doorbell_size;
>> +    inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
>>         mutex_lock(&kfd->doorbell_mutex);
>> -    __clear_bit(inx, kfd->doorbell_available_index);
>> +    __clear_bit(inx, kfd->doorbell_bitmap);
>>       mutex_unlock(&kfd->doorbell_mutex);
>>   }
>>   @@ -280,7 +230,7 @@ phys_addr_t kfd_get_process_doorbells(struct 
>> kfd_process_device *pdd)
>>       if (!pdd->doorbell_index) {
>>           int r = kfd_alloc_process_doorbells(pdd->dev,
>>                               &pdd->doorbell_index);
>> -        if (r)
>> +        if (r < 0)
>>               return 0;
>>       }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 552c3ac85a13..0b199eb6dc88 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -346,6 +346,12 @@ struct kfd_dev {
>>         /* HMM page migration MEMORY_DEVICE_PRIVATE mapping */
>>       struct dev_pagemap pgmap;
>> +
>> +    /* Kernel doorbells for KFD device */
>> +    struct amdgpu_bo *doorbells;
>> +
>> +    /* bitmap for dynamic doorbell allocation from this object */
>> +    unsigned long *doorbell_bitmap;
>>   };
>>     enum kfd_mempool {

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells
  2023-04-21 20:11   ` Felix Kuehling
@ 2023-04-22  6:45     ` Shashank Sharma
  0 siblings, 0 replies; 28+ messages in thread
From: Shashank Sharma @ 2023-04-22  6:45 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav


On 21/04/2023 22:11, Felix Kuehling wrote:
> On 2023-04-12 12:25, Shashank Sharma wrote:
>> This patch:
>> - adds a doorbell object in kfd pdd structure.
>> - allocates doorbells for a process while creating its pdd.
>> - frees the doorbells with pdd destroy.
>> - removes previous calls to allocate process doorbells as
>>    its not required anymore.
>>
>> PS: This patch ensures that we don't break the existing KFD
>>      functionality, but now KFD userspace library should also
>>      create doorbell pages as AMDGPU GEM objects using libdrm
>>      functions in userspace. The reference code for the same
>>      is available with AMDGPU Usermode queue libdrm MR. Once
>>      this is done, we will not need to create process doorbells
>>      in kernel.
>
> I like this approach of keeping existing functionality working, but 
> having a transition path for user mode. If I understand it correctly, 
> allocating doorbells as GEM objects would enable overcommitment of 
> doorbells. That's a capability we haven't had in KFD up to now. Trying 
> to launch too many processes that need doorbells would simlpy fail.
>
> With overcommitment, idle processes could have their doorbells objects 
> evicted, sot that active processes can work. Doorbell objects would 
> need eviction fences to ensure that processes are not scheduled on the 
> GPU while their doorbell allocation is in use by other processes. The 
> doorbell offset in the queue properties would need to be updated when 
> doorbells are validated and potentially moved to different physical 
> pages.
>
> These details would need to be worked out in kernel mode before user 
> mode can transition to GEM for doorbell allocation.
Noted,
>
>
>>
>> V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel
>>        instead (Alex).
>>      - Do not use custom doorbell structure, instead use separate
>>        variables for bo and doorbell_bitmap (Alex)
>>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 13 ----
>>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |  8 +--
>>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c     | 65 ++++++++++---------
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  9 ++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 21 +++---
>>   .../amd/amdkfd/kfd_process_queue_manager.c    | 22 +++----
>>   6 files changed, 64 insertions(+), 74 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index 6d291aa6386b..0e40756417e5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -327,12 +327,6 @@ static int kfd_ioctl_create_queue(struct file 
>> *filep, struct kfd_process *p,
>>           goto err_bind_process;
>>       }
>>   -    if (!pdd->doorbell_index &&
>> -        kfd_alloc_process_doorbells(dev, &pdd->doorbell_index) < 0) {
>> -        err = -ENOMEM;
>> -        goto err_alloc_doorbells;
>> -    }
>> -
>
> You're moving this to kfd_create_process_device_data, which means 
> processes will allocate and pin doorbells, even if they never create 
> queues. We specifically moved this to first queue creation to allow 
> more ROCm processes to be started, as long as they don't create 
> queues. This is important on systems with many GPUs where not all 
> processes need to use all GPU. It also helps with certain tools that 
> need to initialized ROCr to gather information, but don't need to 
> create queues. See this patch for reference:
>
> commit 16f0013157bf8c95d10b9360491e3c920f85641e
> Author: Felix Kuehling <Felix.Kuehling@amd.com>
> Date:   Wed Aug 3 14:45:45 2022 -0400
>
>     drm/amdkfd: Allocate doorbells only when needed
>         Only allocate doorbells when the first queue is created on a 
> GPU or the
>     doorbells need to be mapped into CPU or GPU virtual address space. 
> This
>     avoids allocating doorbells unnecessarily and can allow more 
> processes
>     to use KFD on multi-GPU systems.
>         Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     Reviewed-by: Kent Russell <kent.Russell@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
>
> Until we have an alternative way to allocate doorbells that supports 
> overcommitment, late doorbell allocation is an important feature. This 
> change will cause regressions for some ROCm users.

Thanks for pointing this out. I suspected that doorbell creation is 
getting delayed deliberately, but could not understand why. I will move 
the process level doorbell creation to this same point, which will allow 
late doorbell allocation as well.

- Shashank

>
> Regards,
>   Felix
>
>
>
>>       /* Starting with GFX11, wptr BOs must be mapped to GART for MES 
>> to determine work
>>        * on unmapped queues for usermode queue oversubscription (no 
>> aggregated doorbell)
>>        */
>> @@ -410,7 +404,6 @@ static int kfd_ioctl_create_queue(struct file 
>> *filep, struct kfd_process *p,
>>       if (wptr_bo)
>>           amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo);
>>   err_wptr_map_gart:
>> -err_alloc_doorbells:
>>   err_bind_process:
>>   err_pdd:
>>       mutex_unlock(&p->mutex);
>> @@ -2163,12 +2156,6 @@ static int criu_restore_devices(struct 
>> kfd_process *p,
>>               ret = PTR_ERR(pdd);
>>               goto exit;
>>           }
>> -
>> -        if (!pdd->doorbell_index &&
>> -            kfd_alloc_process_doorbells(pdd->dev, 
>> &pdd->doorbell_index) < 0) {
>> -            ret = -ENOMEM;
>> -            goto exit;
>> -        }
>>       }
>>         /*
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> index ecb4c3abc629..855bf8ce3f16 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>> @@ -371,7 +371,7 @@ static int allocate_doorbell(struct 
>> qcm_process_device *qpd,
>>               unsigned int found;
>>                 found = find_first_zero_bit(qpd->doorbell_bitmap,
>> -                        KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>> +                            KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>>               if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
>>                   pr_debug("No doorbells available");
>>                   return -EBUSY;
>> @@ -381,9 +381,9 @@ static int allocate_doorbell(struct 
>> qcm_process_device *qpd,
>>           }
>>       }
>>   -    q->properties.doorbell_off =
>> -        kfd_get_doorbell_dw_offset_in_bar(dev, qpd_to_pdd(qpd),
>> -                      q->doorbell_id);
>> +    q->properties.doorbell_off = 
>> amdgpu_doorbell_index_on_bar(dev->adev,
>> +                                  qpd->proc_doorbells,
>> +                                  q->doorbell_id);
>>       return 0;
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> index 82b4a56b0afc..718cfe9cb4f5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> @@ -227,46 +227,47 @@ uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
>>     phys_addr_t kfd_get_process_doorbells(struct kfd_process_device 
>> *pdd)
>>   {
>> -    if (!pdd->doorbell_index) {
>> -        int r = kfd_alloc_process_doorbells(pdd->dev,
>> -                            &pdd->doorbell_index);
>> -        if (r < 0)
>> -            return 0;
>> -    }
>> +    struct amdgpu_device *adev = pdd->dev->adev;
>> +    uint32_t first_db_index;
>>   -    return pdd->dev->doorbell_base +
>> -        pdd->doorbell_index * kfd_doorbell_process_slice(pdd->dev);
>> +    first_db_index = amdgpu_doorbell_index_on_bar(adev, 
>> pdd->qpd.proc_doorbells, 0);
>> +    return adev->doorbell.base + first_db_index * sizeof(uint32_t);
>>   }
>>   -int kfd_alloc_process_doorbells(struct kfd_dev *kfd, unsigned int 
>> *doorbell_index)
>> +int kfd_alloc_process_doorbells(struct kfd_dev *kfd, struct 
>> kfd_process_device *pdd)
>>   {
>> -    int r = 0;
>> -
>> -    if (!kfd->shared_resources.enable_mes)
>> -        r = ida_simple_get(&kfd->doorbell_ida, 1,
>> -                   kfd->max_doorbell_slices, GFP_KERNEL);
>> -    else
>> -        r = amdgpu_mes_alloc_process_doorbells(
>> -                (struct amdgpu_device *)kfd->adev,
>> -                doorbell_index);
>> +    int r;
>> +    struct qcm_process_device *qpd = &pdd->qpd;
>>   -    if (r > 0)
>> -        *doorbell_index = r;
>> +    /* Allocate bitmap for dynamic doorbell allocation */
>> +    qpd->doorbell_bitmap = 
>> bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
>> +                         GFP_KERNEL);
>> +    if (!qpd->doorbell_bitmap) {
>> +        DRM_ERROR("Failed to allocate process doorbell bitmap\n");
>> +        return -ENOMEM;
>> +    }
>>   -    if (r < 0)
>> -        pr_err("Failed to allocate process doorbells\n");
>> +    /* Allocate doorbells for this process */
>> +    r = amdgpu_bo_create_kernel(kfd->adev,
>> +                    kfd_doorbell_process_slice(kfd),
>> +                    PAGE_SIZE,
>> +                    AMDGPU_GEM_DOMAIN_DOORBELL,
>> +                    &qpd->proc_doorbells,
>> +                    NULL,
>> +                    NULL);
>> +    if (r) {
>> +        DRM_ERROR("Failed to allocate process doorbells\n");
>> +        bitmap_free(qpd->doorbell_bitmap);
>> +        return r;
>> +    }
>>   -    return r;
>> +    return 0;
>>   }
>>   -void kfd_free_process_doorbells(struct kfd_dev *kfd, unsigned int 
>> doorbell_index)
>> +void kfd_free_process_doorbells(struct kfd_dev *kfd, struct 
>> kfd_process_device *pdd)
>>   {
>> -    if (doorbell_index) {
>> -        if (!kfd->shared_resources.enable_mes)
>> -            ida_simple_remove(&kfd->doorbell_ida, doorbell_index);
>> -        else
>> -            amdgpu_mes_free_process_doorbells(
>> -                    (struct amdgpu_device *)kfd->adev,
>> -                    doorbell_index);
>> -    }
>> +    struct qcm_process_device *qpd = &pdd->qpd;
>> +
>> +    bitmap_free(qpd->doorbell_bitmap);
>> +    amdgpu_bo_free_kernel(&qpd->proc_doorbells, NULL, NULL);
>>   }
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 0b199eb6dc88..dfff77379acb 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -661,7 +661,10 @@ struct qcm_process_device {
>>       uint64_t ib_base;
>>       void *ib_kaddr;
>>   -    /* doorbell resources per process per device */
>> +    /* doorbells for kfd process */
>> +    struct amdgpu_bo *proc_doorbells;
>> +
>> +    /* bitmap for dynamic doorbell allocation from the bo */
>>       unsigned long *doorbell_bitmap;
>>   };
>>   @@ -1009,9 +1012,9 @@ unsigned int 
>> kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
>>                       unsigned int doorbell_id);
>>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd);
>>   int kfd_alloc_process_doorbells(struct kfd_dev *kfd,
>> -                unsigned int *doorbell_index);
>> +                 struct kfd_process_device *pdd);
>>   void kfd_free_process_doorbells(struct kfd_dev *kfd,
>> -                unsigned int doorbell_index);
>> +                 struct kfd_process_device *pdd);
>>   /* GTT Sub-Allocator */
>>     int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index 51b1683ac5c1..217ff72a97b0 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -1037,10 +1037,9 @@ static void kfd_process_destroy_pdds(struct 
>> kfd_process *p)
>>               free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>>                   get_order(KFD_CWSR_TBA_TMA_SIZE));
>>   -        bitmap_free(pdd->qpd.doorbell_bitmap);
>>           idr_destroy(&pdd->alloc_idr);
>>   -        kfd_free_process_doorbells(pdd->dev, pdd->doorbell_index);
>> +        kfd_free_process_doorbells(pdd->dev, pdd);
>>             if (pdd->dev->shared_resources.enable_mes)
>>               amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev,
>> @@ -1453,11 +1452,6 @@ static int init_doorbell_bitmap(struct 
>> qcm_process_device *qpd,
>>       if (!KFD_IS_SOC15(dev))
>>           return 0;
>>   -    qpd->doorbell_bitmap = 
>> bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
>> -                         GFP_KERNEL);
>> -    if (!qpd->doorbell_bitmap)
>> -        return -ENOMEM;
>> -
>>       /* Mask out doorbells reserved for SDMA, IH, and VCN on SOC15. */
>>       pr_debug("reserved doorbell 0x%03x - 0x%03x\n", range_start, 
>> range_end);
>>       pr_debug("reserved doorbell 0x%03x - 0x%03x\n",
>> @@ -1499,9 +1493,15 @@ struct kfd_process_device 
>> *kfd_create_process_device_data(struct kfd_dev *dev,
>>       if (!pdd)
>>           return NULL;
>>   +    retval = kfd_alloc_process_doorbells(dev, pdd);
>> +    if (retval) {
>> +        pr_err("failed to allocate process doorbells\n");
>> +        goto err_free_pdd;
>> +    }
>> +
>>       if (init_doorbell_bitmap(&pdd->qpd, dev)) {
>>           pr_err("Failed to init doorbell for process\n");
>> -        goto err_free_pdd;
>> +        goto err_free_db;
>>       }
>>         pdd->dev = dev;
>> @@ -1529,7 +1529,7 @@ struct kfd_process_device 
>> *kfd_create_process_device_data(struct kfd_dev *dev,
>>                           false);
>>           if (retval) {
>>               pr_err("failed to allocate process context bo\n");
>> -            goto err_free_pdd;
>> +            goto err_free_db;
>>           }
>>           memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE);
>>       }
>> @@ -1541,6 +1541,9 @@ struct kfd_process_device 
>> *kfd_create_process_device_data(struct kfd_dev *dev,
>>         return pdd;
>>   +err_free_db:
>> +    kfd_free_process_doorbells(pdd->dev, pdd);
>> +
>>   err_free_pdd:
>>       kfree(pdd);
>>       return NULL;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> index 5137476ec18e..6c95586f08a6 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
>> @@ -344,17 +344,19 @@ int pqm_create_queue(struct 
>> process_queue_manager *pqm,
>>           goto err_create_queue;
>>       }
>>   -    if (q && p_doorbell_offset_in_process)
>> +    if (q && p_doorbell_offset_in_process) {
>>           /* Return the doorbell offset within the doorbell page
>>            * to the caller so it can be passed up to user mode
>>            * (in bytes).
>> -         * There are always 1024 doorbells per process, so in case
>> -         * of 8-byte doorbells, there are two doorbell pages per
>> -         * process.
>> +         * relative doorbell index = Absolute doorbell index -
>> +         * absolute index of first doorbell in the page.
>>            */
>> -        *p_doorbell_offset_in_process =
>> -            (q->properties.doorbell_off * sizeof(uint32_t)) &
>> -            (kfd_doorbell_process_slice(dev) - 1);
>> +        uint32_t first_db_index = 
>> amdgpu_doorbell_index_on_bar(pdd->dev->adev,
>> +                                    pdd->qpd.proc_doorbells, 0);
>> +
>> +        *p_doorbell_offset_in_process = (q->properties.doorbell_off
>> +                        - first_db_index) * sizeof(uint32_t);
>> +    }
>>         pr_debug("PQM After DQM create queue\n");
>>   @@ -858,12 +860,6 @@ int kfd_criu_restore_queue(struct kfd_process *p,
>>           goto exit;
>>       }
>>   -    if (!pdd->doorbell_index &&
>> -        kfd_alloc_process_doorbells(pdd->dev, &pdd->doorbell_index) 
>> < 0) {
>> -        ret = -ENOMEM;
>> -        goto exit;
>> -    }
>> -
>>       /* data stored in this order: mqd, ctl_stack */
>>       mqd = q_extra_data;
>>       ctl_stack = mqd + q_data->mqd_size;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-22  6:39     ` Shashank Sharma
@ 2023-04-24 19:56       ` Felix Kuehling
  2023-04-25 19:59         ` Shashank Sharma
  0 siblings, 1 reply; 28+ messages in thread
From: Felix Kuehling @ 2023-04-24 19:56 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav

On 2023-04-22 2:39, Shashank Sharma wrote:
> - KFD process level doorbells: doorbell pages which are allocated by 
> kernel but mapped and written by userspace processes, saved in struct 
> pdd->qpd->doorbells
>
> size = kfd_doorbell_process_slice.
>
> We realized that we only need 1-2 doorbells for KFD kernel level stuff 
> (so kept it one page), but need 2-page of doorbells for KFD process, 
> so they are sized accordingly.
>
> We have also run kfd_test_suit and verified the changes for any 
> regression. Hope this helps in explaining the design.

Right, I missed that this was only for kernel doorbells. I wonder 
whether KFD really needs its own page here. I think we only need a 
doorbell for HWS. And when we use MES, I think even that isn't needed 
because MES packet submissions go through amdgpu. So maybe KFD doesn't 
need its own kernel-mode doorbell page any more on systems with user 
graphics mode queues.

Regards,
   Felix


>
> - Shashank 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-24 19:56       ` Felix Kuehling
@ 2023-04-25 19:59         ` Shashank Sharma
  2023-06-09 19:33           ` Felix Kuehling
  0 siblings, 1 reply; 28+ messages in thread
From: Shashank Sharma @ 2023-04-25 19:59 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav


On 24/04/2023 21:56, Felix Kuehling wrote:
> On 2023-04-22 2:39, Shashank Sharma wrote:
>> - KFD process level doorbells: doorbell pages which are allocated by 
>> kernel but mapped and written by userspace processes, saved in struct 
>> pdd->qpd->doorbells
>>
>> size = kfd_doorbell_process_slice.
>>
>> We realized that we only need 1-2 doorbells for KFD kernel level 
>> stuff (so kept it one page), but need 2-page of doorbells for KFD 
>> process, so they are sized accordingly.
>>
>> We have also run kfd_test_suit and verified the changes for any 
>> regression. Hope this helps in explaining the design.
>
> Right, I missed that this was only for kernel doorbells. I wonder 
> whether KFD really needs its own page here. I think we only need a 
> doorbell for HWS. And when we use MES, I think even that isn't needed 
> because MES packet submissions go through amdgpu. So maybe KFD doesn't 
> need its own kernel-mode doorbell page any more on systems with user 
> graphics mode queues.

Yeah, for any IP with MES enabled, KFD doesn't need kernel level 
doorbells. But I still allocated a page just to make sure we do not 
break any non-MES platforms or use cases where MES is deliberately 
disabled from kernel command line. Hope that works for you.

- Shashank

>
> Regards,
>   Felix
>
>
>>
>> - Shashank 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
  2023-04-25 19:59         ` Shashank Sharma
@ 2023-06-09 19:33           ` Felix Kuehling
  0 siblings, 0 replies; 28+ messages in thread
From: Felix Kuehling @ 2023-06-09 19:33 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav


On 2023-04-25 15:59, Shashank Sharma wrote:
>
> On 24/04/2023 21:56, Felix Kuehling wrote:
>> On 2023-04-22 2:39, Shashank Sharma wrote:
>>> - KFD process level doorbells: doorbell pages which are allocated by 
>>> kernel but mapped and written by userspace processes, saved in 
>>> struct pdd->qpd->doorbells
>>>
>>> size = kfd_doorbell_process_slice.
>>>
>>> We realized that we only need 1-2 doorbells for KFD kernel level 
>>> stuff (so kept it one page), but need 2-page of doorbells for KFD 
>>> process, so they are sized accordingly.
>>>
>>> We have also run kfd_test_suit and verified the changes for any 
>>> regression. Hope this helps in explaining the design.
>>
>> Right, I missed that this was only for kernel doorbells. I wonder 
>> whether KFD really needs its own page here. I think we only need a 
>> doorbell for HWS. And when we use MES, I think even that isn't needed 
>> because MES packet submissions go through amdgpu. So maybe KFD 
>> doesn't need its own kernel-mode doorbell page any more on systems 
>> with user graphics mode queues.
>
> Yeah, for any IP with MES enabled, KFD doesn't need kernel level 
> doorbells. But I still allocated a page just to make sure we do not 
> break any non-MES platforms or use cases where MES is deliberately 
> disabled from kernel command line. Hope that works for you.

Even without MES, we still only need one doorbell for HWS. Allocating a 
whole page for that is wasteful. Anyway, I'm OK with cleaning that up later.

Regards,
   Felix


>
> - Shashank
>
>>
>> Regards,
>>   Felix
>>
>>
>>>
>>> - Shashank 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables
  2023-04-12 16:25 ` [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables Shashank Sharma
@ 2023-06-09 19:34   ` Felix Kuehling
  0 siblings, 0 replies; 28+ messages in thread
From: Felix Kuehling @ 2023-06-09 19:34 UTC (permalink / raw)
  To: Shashank Sharma, amd-gfx
  Cc: Alex Deucher, mukul.joshi, contactshashanksharma,
	Christian Koenig, arvind.yadav

On 2023-04-12 12:25, Shashank Sharma wrote:
> This patch removes some variables and functions from KFD
> doorbell handling code, which are no more required since
> doorbell manager is handling doorbell calculations.
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 32 -----------------------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     | 12 ---------
>   2 files changed, 44 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 718cfe9cb4f5..f4088cfd52db 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -193,38 +193,6 @@ void write_kernel_doorbell64(void __iomem *db, u64 value)
>   	}
>   }
>   
> -unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd,
> -					struct kfd_process_device *pdd,
> -					unsigned int doorbell_id)
> -{
> -	/*
> -	 * doorbell_base_dw_offset accounts for doorbells taken by KGD.
> -	 * index * kfd_doorbell_process_slice/sizeof(u32) adjusts to
> -	 * the process's doorbells. The offset returned is in dword
> -	 * units regardless of the ASIC-dependent doorbell size.
> -	 */
> -	if (!kfd->shared_resources.enable_mes)
> -		return kfd->doorbell_base_dw_offset +
> -			pdd->doorbell_index
> -			* kfd_doorbell_process_slice(kfd) / sizeof(u32) +
> -			doorbell_id *
> -			kfd->device_info.doorbell_size / sizeof(u32);
> -	else
> -		return amdgpu_mes_get_doorbell_dw_offset_in_bar(
> -				(struct amdgpu_device *)kfd->adev,
> -				pdd->doorbell_index, doorbell_id);
> -}
> -
> -uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
> -{
> -	uint64_t num_of_elems = (kfd->shared_resources.doorbell_aperture_size -
> -				kfd->shared_resources.doorbell_start_offset) /
> -					kfd_doorbell_process_slice(kfd) + 1;
> -
> -	return num_of_elems;
> -
> -}
> -
>   phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd)
>   {
>   	struct amdgpu_device *adev = pdd->dev->adev;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index dfff77379acb..1bc6a8ed8cda 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -257,15 +257,6 @@ struct kfd_dev {
>   
>   	unsigned int id;		/* topology stub index */
>   
> -	phys_addr_t doorbell_base;	/* Start of actual doorbells used by
> -					 * KFD. It is aligned for mapping
> -					 * into user mode
> -					 */
> -	size_t doorbell_base_dw_offset;	/* Offset from the start of the PCI
> -					 * doorbell BAR to the first KFD
> -					 * doorbell in dwords. GFX reserves
> -					 * the segment before this offset.
> -					 */
>   	u32 __iomem *doorbell_kernel_ptr; /* This is a pointer for a doorbells
>   					   * page used by kernel queue
>   					   */
> @@ -276,8 +267,6 @@ struct kfd_dev {
>   
>   	const struct kfd2kgd_calls *kfd2kgd;
>   	struct mutex doorbell_mutex;
> -	DECLARE_BITMAP(doorbell_available_index,
> -			KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
>   
>   	void *gtt_mem;
>   	uint64_t gtt_start_gpu_addr;
> @@ -754,7 +743,6 @@ struct kfd_process_device {
>   	struct attribute attr_evict;
>   
>   	struct kobject *kobj_stats;
> -	unsigned int doorbell_index;
>   
>   	/*
>   	 * @cu_occupancy: Reports occupancy of Compute Units (CU) of a process

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-06-09 19:34 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-12 16:25 [PATCH v2 00/12] AMDGPU doorbell manager Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 01/12] drm/amdgpu: create a new file for " Shashank Sharma
2023-04-13 10:48   ` Christian König
2023-04-13 10:51     ` Christian König
2023-04-12 16:25 ` [PATCH v2 02/12] drm/amdgpu: don't modify num_doorbells for mes Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 03/12] drm/amdgpu: add UAPI for allocating doorbell memory Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 04/12] drm/amdgpu: accommodate DOMAIN/PL_DOORBELL Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 05/12] drm/amdgpu: initialize ttm for doorbells Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 06/12] drm/amdgpu: create kernel doorbell pages Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 07/12] get absolute offset from doorbell index Shashank Sharma
2023-04-13 10:59   ` Christian König
2023-04-12 16:25 ` [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells Shashank Sharma
2023-04-13 11:02   ` Christian König
2023-04-13 13:08     ` Shashank Sharma
2023-04-21 19:58   ` Felix Kuehling
2023-04-22  6:39     ` Shashank Sharma
2023-04-24 19:56       ` Felix Kuehling
2023-04-25 19:59         ` Shashank Sharma
2023-06-09 19:33           ` Felix Kuehling
2023-04-12 16:25 ` [PATCH v2 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells Shashank Sharma
2023-04-13 11:07   ` Christian König
2023-04-21 20:11   ` Felix Kuehling
2023-04-22  6:45     ` Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables Shashank Sharma
2023-06-09 19:34   ` Felix Kuehling
2023-04-12 16:25 ` [PATCH v2 11/12] drm/amdgpu: use doorbell mgr for MES kernel doorbells Shashank Sharma
2023-04-12 16:25 ` [PATCH v2 12/12] drm/amdgpu: cleanup MES process level doorbells Shashank Sharma
2023-04-13 11:19   ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.