* [PATCH 00/27] KFD upstreaming
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Kuehling, Felix

Assorted KFD changes that have been accumulating on amd-kfd-staging. New
features and fixes include:
* Support for VegaM
* Support for systems with multiple PCI domains
* New SDMA queue type that's optimized for XGMI links
* SDMA MQD allocation changes to support future ASICs with more SDMA queues
* Fix for compute profile switching at process termination
* Fix for a circular lock dependency in MMU notifiers
* Fix for TLB flushing bug with XGMI enabled
* Fix for artificial GTT system memory limitation
* Trap handler updates

Amber Lin (1):
  drm/amdkfd: Add domain number into gpu_id

Felix Kuehling (1):
  drm/amdkfd: Fix a circular lock dependency

Harish Kasiviswanathan (1):
  drm/amdkfd: Fix compute profile switching

Jay Cornwall (4):
  drm/amdkfd: Fix gfx8 MEM_VIOL exception handler
  drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL
  drm/amdkfd: Fix gfx9 XNACK state save/restore
  drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]

Kent Russell (2):
  drm/amdkfd: Add VegaM support
  drm/amdgpu: Fix GTT size calculation

Oak Zeng (16):
  drm/amdkfd: Use 64 bit sdma_bitmap
  drm/amdkfd: Add sdma allocation debug message
  drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id
  drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd
  drm/amdkfd: Fix a potential memory leak
  drm/amdkfd: Introduce asic-specific mqd_manager_init function
  drm/amdkfd: Introduce DIQ type mqd manager
  drm/amdkfd: Init mqd managers in device queue manager init
  drm/amdkfd: Add mqd size in mqd manager struct
  drm/amdkfd: Allocate MQD trunk for HIQ and SDMA
  drm/amdkfd: Move non-sdma mqd allocation out of init_mqd
  drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk
  drm/amdkfd: Fix sdma queue map issue
  drm/amdkfd: Introduce XGMI SDMA queue type
  drm/amdkfd: Expose sdma engine numbers to topology
  drm/amdkfd: Delete alloc_format field from map_queue struct

Yong Zhao (1):
  drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue()

shaoyunl (1):
  drm/amdgpu: Use heavy weight for tlb invalidation on xgmi
    configuration

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  53 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   9 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 483 +++++++++---------
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm |  13 -
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  63 +--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c         |   5 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  51 ++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 354 ++++++++-----
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  14 +-
 .../amd/amdkfd/kfd_device_queue_manager_cik.c |   2 +
 .../amd/amdkfd/kfd_device_queue_manager_v9.c  |   1 +
 .../amd/amdkfd/kfd_device_queue_manager_vi.c  |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |   6 +-
 .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |   4 +-
 .../gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  |  70 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   8 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  53 +-
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  85 +--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  53 +-
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |   4 +-
 .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h   |   7 +-
 .../gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h   |   7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  14 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |  13 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h     |   2 +
 drivers/gpu/drm/amd/include/cik_structs.h     |   3 +-
 drivers/gpu/drm/amd/include/v9_structs.h      |   3 +-
 drivers/gpu/drm/amd/include/vi_structs.h      |   3 +-
 include/uapi/linux/kfd_ioctl.h                |   7 +-
 33 files changed, 826 insertions(+), 587 deletions(-)

-- 
2.17.1


* [PATCH 01/27] drm/amdkfd: Use 64 bit sdma_bitmap
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <Oak.Zeng@amd.com>

Support a maximum of 64 SDMA queues.
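
For context, a minimal stand-alone sketch (user-space C, hypothetical
queue count) of the overflow the 1ULL promotion avoids:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int num_queues = 48;  /* anything >= 32 broke the old code */

          /* Old: (1 << num_queues) - 1 with a 32-bit int is undefined
           * behavior once num_queues reaches 32.  The 1ULL constant and
           * a uint64_t bitmap keep the mask computation defined for the
           * larger queue counts. */
          uint64_t sdma_bitmap = (1ULL << num_queues) - 1;

          printf("free-queue mask: 0x%016llx\n",
                 (unsigned long long)sdma_bitmap);
          return 0;
  }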

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +++++-----
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 1d6b15788ebf..0b1044dea765 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -891,7 +891,7 @@ static int initialize_nocpsch(struct device_queue_manager *dqm)
 	}
 
 	dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1;
-	dqm->sdma_bitmap = (1 << get_num_sdma_queues(dqm)) - 1;
+	dqm->sdma_bitmap = (1ULL << get_num_sdma_queues(dqm)) - 1;
 
 	return 0;
 }
@@ -929,8 +929,8 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 	if (dqm->sdma_bitmap == 0)
 		return -ENOMEM;
 
-	bit = ffs(dqm->sdma_bitmap) - 1;
-	dqm->sdma_bitmap &= ~(1 << bit);
+	bit = __ffs64(dqm->sdma_bitmap);
+	dqm->sdma_bitmap &= ~(1ULL << bit);
 	*sdma_queue_id = bit;
 
 	return 0;
@@ -941,7 +941,7 @@ static void deallocate_sdma_queue(struct device_queue_manager *dqm,
 {
 	if (sdma_queue_id >= get_num_sdma_queues(dqm))
 		return;
-	dqm->sdma_bitmap |= (1 << sdma_queue_id);
+	dqm->sdma_bitmap |= (1ULL << sdma_queue_id);
 }
 
 static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
@@ -1047,7 +1047,7 @@ static int initialize_cpsch(struct device_queue_manager *dqm)
 	dqm->queue_count = dqm->processes_count = 0;
 	dqm->sdma_queue_count = 0;
 	dqm->active_runlist = false;
-	dqm->sdma_bitmap = (1 << get_num_sdma_queues(dqm)) - 1;
+	dqm->sdma_bitmap = (1ULL << get_num_sdma_queues(dqm)) - 1;
 
 	INIT_WORK(&dqm->hw_exception_work, kfd_process_hw_exception);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 70e38a2e23b9..2770f3ece89f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -188,7 +188,7 @@ struct device_queue_manager {
 	unsigned int		total_queue_count;
 	unsigned int		next_pipe_to_allocate;
 	unsigned int		*allocated_queues;
-	unsigned int		sdma_bitmap;
+	uint64_t		sdma_bitmap;
 	unsigned int		vmid_bitmap;
 	uint64_t		pipelines_addr;
 	struct kfd_mem_obj	*pipeline_mem;
-- 
2.17.1


* [PATCH 02/27] drm/amdkfd: Add sdma allocation debug message
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <Oak.Zeng@amd.com>

Add debug messages during SDMA queue allocation.
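
For reference, pr_debug() output is compiled out by default; assuming a
kernel built with CONFIG_DYNAMIC_DEBUG, the new messages can be enabled
at runtime with something like:

  # enable all pr_debug() calls in this source file
  echo 'file kfd_device_queue_manager.c +p' \
          > /sys/kernel/debug/dynamic_debug/control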

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 0b1044dea765..937ed1a7050d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1177,6 +1177,9 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 			q->sdma_id / get_num_sdma_engines(dqm);
 		q->properties.sdma_engine_id =
 			q->sdma_id % get_num_sdma_engines(dqm);
+		pr_debug("SDMA id is:    %d\n", q->sdma_id);
+		pr_debug("SDMA queue id: %d\n", q->properties.sdma_queue_id);
+		pr_debug("SDMA engine id: %d\n", q->properties.sdma_engine_id);
 	}
 
 	retval = allocate_doorbell(qpd, q);
-- 
2.17.1


* [PATCH 03/27] drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

sdma_queue_id is the SDMA queue index within one SDMA engine, while
sdma_id is the queue index across all SDMA engines. Use the two names
consistently.
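
To make the distinction concrete, this stand-alone sketch mirrors the
mapping used in create_queue_cpsch() (the helper name is hypothetical):

  /* A global sdma_id decomposes into a per-engine queue index and an
   * engine index.  With 2 engines, sdma_id 5 maps to queue 2 on
   * engine 1. */
  static void decode_sdma_id(unsigned int sdma_id,
                             unsigned int num_sdma_engines,
                             unsigned int *sdma_queue_id,
                             unsigned int *sdma_engine_id)
  {
          *sdma_queue_id  = sdma_id / num_sdma_engines;
          *sdma_engine_id = sdma_id % num_sdma_engines;
  }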

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 937ed1a7050d..7e5ead042dc0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -922,7 +922,7 @@ static int stop_nocpsch(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-				unsigned int *sdma_queue_id)
+				unsigned int *sdma_id)
 {
 	int bit;
 
@@ -931,17 +931,17 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 
 	bit = __ffs64(dqm->sdma_bitmap);
 	dqm->sdma_bitmap &= ~(1ULL << bit);
-	*sdma_queue_id = bit;
+	*sdma_id = bit;
 
 	return 0;
 }
 
 static void deallocate_sdma_queue(struct device_queue_manager *dqm,
-				unsigned int sdma_queue_id)
+				unsigned int sdma_id)
 {
-	if (sdma_queue_id >= get_num_sdma_queues(dqm))
+	if (sdma_id >= get_num_sdma_queues(dqm))
 		return;
-	dqm->sdma_bitmap |= (1ULL << sdma_queue_id);
+	dqm->sdma_bitmap |= (1ULL << sdma_id);
 }
 
 static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
-- 
2.17.1


* [PATCH 04/27] drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

Firmware for some new ASICs requires the SDMA MQD size to be no more
than 128 dwords. Repurpose the last two reserved fields of the SDMA
MQD for driver-internal use, so the total MQD size stays within 128
dwords.
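
The invariant can be checked with a stand-alone sketch (hypothetical
struct layout, C11): overlaying the two driver-internal IDs on the last
two reserved dwords keeps the MQD at exactly 128 dwords (512 bytes).

  #include <stdint.h>

  struct sdma_mqd_sketch {
          uint32_t hw_fields[126];      /* registers consumed by SDMA FW */
          uint32_t sdma_engine_id;      /* occupies former reserved_126 */
          uint32_t sdma_queue_id;       /* occupies former reserved_127 */
  };

  _Static_assert(sizeof(struct sdma_mqd_sketch) == 128 * sizeof(uint32_t),
                 "SDMA MQD must not exceed 128 dwords");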

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/include/cik_structs.h | 3 +--
 drivers/gpu/drm/amd/include/v9_structs.h  | 3 +--
 drivers/gpu/drm/amd/include/vi_structs.h  | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/cik_structs.h b/drivers/gpu/drm/amd/include/cik_structs.h
index 749eab94e335..699e658c3cec 100644
--- a/drivers/gpu/drm/amd/include/cik_structs.h
+++ b/drivers/gpu/drm/amd/include/cik_structs.h
@@ -282,8 +282,7 @@ struct cik_sdma_rlc_registers {
 	uint32_t reserved_123;
 	uint32_t reserved_124;
 	uint32_t reserved_125;
-	uint32_t reserved_126;
-	uint32_t reserved_127;
+	/* reserved_126,127: repurposed for driver-internal use */
 	uint32_t sdma_engine_id;
 	uint32_t sdma_queue_id;
 };
diff --git a/drivers/gpu/drm/amd/include/v9_structs.h b/drivers/gpu/drm/amd/include/v9_structs.h
index ceaf4932258d..8b383dbe1cda 100644
--- a/drivers/gpu/drm/amd/include/v9_structs.h
+++ b/drivers/gpu/drm/amd/include/v9_structs.h
@@ -151,8 +151,7 @@ struct v9_sdma_mqd {
 	uint32_t reserved_123;
 	uint32_t reserved_124;
 	uint32_t reserved_125;
-	uint32_t reserved_126;
-	uint32_t reserved_127;
+	/* reserved_126,127: repurposed for driver-internal use */
 	uint32_t sdma_engine_id;
 	uint32_t sdma_queue_id;
 };
diff --git a/drivers/gpu/drm/amd/include/vi_structs.h b/drivers/gpu/drm/amd/include/vi_structs.h
index 717fbae1d362..c17613287cd0 100644
--- a/drivers/gpu/drm/amd/include/vi_structs.h
+++ b/drivers/gpu/drm/amd/include/vi_structs.h
@@ -151,8 +151,7 @@ struct vi_sdma_mqd {
 	uint32_t reserved_123;
 	uint32_t reserved_124;
 	uint32_t reserved_125;
-	uint32_t reserved_126;
-	uint32_t reserved_127;
+	/* reserved_126,127: repurposed for driver-internal use */
 	uint32_t sdma_engine_id;
 	uint32_t sdma_queue_id;
 };
-- 
2.17.1


* [PATCH 05/27] drm/amdkfd: Fix a potential memory leak
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

Free mqd_mem_obj if the GTT buffer allocation for MQD+control stack
fails.
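
The fix follows a common error-handling pattern, modeled here in a
stand-alone user-space sketch (malloc/free standing in for
kzalloc/kfree): NULL the out-parameter on entry so the shared error
path can free it unconditionally, no matter which branch allocated.

  #include <stdlib.h>

  static int init_mqd_model(void **out, int cwsr_path)
  {
          int err;

          *out = NULL;
          if (cwsr_path) {
                  *out = malloc(64);      /* the kzalloc'd kfd_mem_obj */
                  if (!*out)
                          return -1;
                  err = 1;                /* say the GTT allocation failed */
          } else {
                  err = 1;                /* sub-allocation failed, *out == NULL */
          }
          if (err) {
                  free(*out);             /* frees the obj, or free(NULL) no-op */
                  return -1;
          }
          return 0;
  }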

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 9dbba609450e..8fe74b821b32 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -76,6 +76,7 @@ static int init_mqd(struct mqd_manager *mm, void **mqd,
 	struct v9_mqd *m;
 	struct kfd_dev *kfd = mm->dev;
 
+	*mqd_mem_obj = NULL;
 	/* From V9,  for CWSR, the control stack is located on the next page
 	 * boundary after the mqd, we will use the gtt allocation function
 	 * instead of sub-allocation function.
@@ -93,8 +94,10 @@ static int init_mqd(struct mqd_manager *mm, void **mqd,
 	} else
 		retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
 				mqd_mem_obj);
-	if (retval != 0)
+	if (retval) {
+		kfree(*mqd_mem_obj);
 		return -ENOMEM;
+	}
 
 	m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
 	addr = (*mqd_mem_obj)->gpu_addr;
-- 
2.17.1


* [PATCH 06/27] drm/amdkfd: Introduce asic-specific mqd_manager_init function
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

The global function mqd_manager_init just dispatches to ASIC-specific
functions, so it is not necessary. Delete it and introduce a
mqd_manager_init interface in the dqm asic_ops for ASIC-specific MQD
manager initialization. Call that interface directly to initialize
the MQD manager.
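
Reduced to a toy model (sketch only, names are hypothetical), the
change looks like this: each ASIC's dqm init installs a constructor
pointer once, and callers dispatch through it instead of going through
a global switch on the ASIC family.

  #include <stddef.h>

  struct mqd_manager;                     /* opaque for the sketch */
  struct kfd_dev;

  struct asic_ops_sketch {
          struct mqd_manager *(*mqd_manager_init)(int type,
                                                  struct kfd_dev *dev);
  };

  /* Per-ASIC init installs the constructor once... */
  static struct mqd_manager *mqd_manager_init_v9_sketch(int type,
                                                        struct kfd_dev *dev)
  {
          (void)type;
          (void)dev;
          return NULL;                    /* stub body */
  }

  static void device_queue_manager_init_v9_sketch(struct asic_ops_sketch *ops)
  {
          ops->mqd_manager_init = mqd_manager_init_v9_sketch;
  }

  /* ...and callers dispatch directly:
   *         mqd_mgr = dqm->asic_ops.mqd_manager_init(type, dqm->dev);
   */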

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  2 ++
 .../amd/amdkfd/kfd_device_queue_manager_cik.c |  2 ++
 .../amd/amdkfd/kfd_device_queue_manager_v9.c  |  1 +
 .../amd/amdkfd/kfd_device_queue_manager_vi.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  | 29 -------------------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 --
 7 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7e5ead042dc0..a5a8643c04fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -583,7 +583,7 @@ static struct mqd_manager *get_mqd_manager(
 
 	mqd_mgr = dqm->mqd_mgrs[type];
 	if (!mqd_mgr) {
-		mqd_mgr = mqd_manager_init(type, dqm->dev);
+		mqd_mgr = dqm->asic_ops.mqd_manager_init(type, dqm->dev);
 		if (!mqd_mgr)
 			pr_err("mqd manager is NULL");
 		dqm->mqd_mgrs[type] = mqd_mgr;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 2770f3ece89f..a5d83ec1c6a8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -158,6 +158,8 @@ struct device_queue_manager_asic_ops {
 	void	(*init_sdma_vm)(struct device_queue_manager *dqm,
 				struct queue *q,
 				struct qcm_process_device *qpd);
+	struct mqd_manager *	(*mqd_manager_init)(enum KFD_MQD_TYPE type,
+				 struct kfd_dev *dev);
 };
 
 /**
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c
index aed4c21417bf..0d26506798cf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c
@@ -48,6 +48,7 @@ void device_queue_manager_init_cik(
 	asic_ops->set_cache_memory_policy = set_cache_memory_policy_cik;
 	asic_ops->update_qpd = update_qpd_cik;
 	asic_ops->init_sdma_vm = init_sdma_vm;
+	asic_ops->mqd_manager_init = mqd_manager_init_cik;
 }
 
 void device_queue_manager_init_cik_hawaii(
@@ -56,6 +57,7 @@ void device_queue_manager_init_cik_hawaii(
 	asic_ops->set_cache_memory_policy = set_cache_memory_policy_cik;
 	asic_ops->update_qpd = update_qpd_cik_hawaii;
 	asic_ops->init_sdma_vm = init_sdma_vm_hawaii;
+	asic_ops->mqd_manager_init = mqd_manager_init_cik_hawaii;
 }
 
 static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
index 417515332c35..e9fe39382371 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
@@ -37,6 +37,7 @@ void device_queue_manager_init_v9(
 {
 	asic_ops->update_qpd = update_qpd_v9;
 	asic_ops->init_sdma_vm = init_sdma_vm_v9;
+	asic_ops->mqd_manager_init = mqd_manager_init_v9;
 }
 
 static uint32_t compute_sh_mem_bases_64bit(struct kfd_process_device *pdd)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_vi.c
index c3a5dcfe877a..3a7cb2f88366 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_vi.c
@@ -54,6 +54,7 @@ void device_queue_manager_init_vi(
 	asic_ops->set_cache_memory_policy = set_cache_memory_policy_vi;
 	asic_ops->update_qpd = update_qpd_vi;
 	asic_ops->init_sdma_vm = init_sdma_vm;
+	asic_ops->mqd_manager_init = mqd_manager_init_vi;
 }
 
 void device_queue_manager_init_vi_tonga(
@@ -62,6 +63,7 @@ void device_queue_manager_init_vi_tonga(
 	asic_ops->set_cache_memory_policy = set_cache_memory_policy_vi_tonga;
 	asic_ops->update_qpd = update_qpd_vi_tonga;
 	asic_ops->init_sdma_vm = init_sdma_vm_tonga;
+	asic_ops->mqd_manager_init = mqd_manager_init_vi_tonga;
 }
 
 static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index aed9b9b82213..eeb2b60a36b5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -24,35 +24,6 @@
 #include "kfd_mqd_manager.h"
 #include "amdgpu_amdkfd.h"
 
-struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
-					struct kfd_dev *dev)
-{
-	switch (dev->device_info->asic_family) {
-	case CHIP_KAVERI:
-		return mqd_manager_init_cik(type, dev);
-	case CHIP_HAWAII:
-		return mqd_manager_init_cik_hawaii(type, dev);
-	case CHIP_CARRIZO:
-		return mqd_manager_init_vi(type, dev);
-	case CHIP_TONGA:
-	case CHIP_FIJI:
-	case CHIP_POLARIS10:
-	case CHIP_POLARIS11:
-	case CHIP_POLARIS12:
-		return mqd_manager_init_vi_tonga(type, dev);
-	case CHIP_VEGA10:
-	case CHIP_VEGA12:
-	case CHIP_VEGA20:
-	case CHIP_RAVEN:
-		return mqd_manager_init_v9(type, dev);
-	default:
-		WARN(1, "Unexpected ASIC family %u",
-		     dev->device_info->asic_family);
-	}
-
-	return NULL;
-}
-
 void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm,
 		const uint32_t *cu_mask, uint32_t cu_mask_count,
 		uint32_t *se_mask)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 9e0230965675..3dd48da0e2d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -816,8 +816,6 @@ void uninit_queue(struct queue *q);
 void print_queue_properties(struct queue_properties *q);
 void print_queue(struct queue *q);
 
-struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
-					struct kfd_dev *dev);
 struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		struct kfd_dev *dev);
 struct mqd_manager *mqd_manager_init_cik_hawaii(enum KFD_MQD_TYPE type,
-- 
2.17.1


* [PATCH 07/27] drm/amdkfd: Introduce DIQ type mqd manager
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

With the introduction of a new MQD allocation scheme for the HIQ,
the DIQ and HIQ use different allocation schemes, so the DIQ can no
longer reuse the HIQ MQD manager. Introduce a dedicated DIQ MQD
manager type instead.

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c         |  3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c      | 11 +++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c       | 11 +++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c       | 11 +++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h                 |  1 +
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c    |  1 -
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index f1596881f20a..58bb3ad233a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -58,6 +58,9 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
 	kq->nop_packet = nop.u32all;
 	switch (type) {
 	case KFD_QUEUE_TYPE_DIQ:
+		kq->mqd_mgr = dev->dqm->ops.get_mqd_manager(dev->dqm,
+						KFD_MQD_TYPE_DIQ);
+		break;
 	case KFD_QUEUE_TYPE_HIQ:
 		kq->mqd_mgr = dev->dqm->ops.get_mqd_manager(dev->dqm,
 						KFD_MQD_TYPE_HIQ);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index ae90a99909ef..e69bb4d3c3a9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -413,6 +413,17 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->is_occupied = is_occupied;
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
+#endif
+		break;
+	case KFD_MQD_TYPE_DIQ:
+		mqd->init_mqd = init_mqd_hiq;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd_hiq;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+#if defined(CONFIG_DEBUG_FS)
+		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
 		break;
 	case KFD_MQD_TYPE_SDMA:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 8fe74b821b32..273aad4f59c8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -475,6 +475,17 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->is_occupied = is_occupied;
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
+#endif
+		break;
+	case KFD_MQD_TYPE_DIQ:
+		mqd->init_mqd = init_mqd_hiq;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd_hiq;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+#if defined(CONFIG_DEBUG_FS)
+		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
 		break;
 	case KFD_MQD_TYPE_SDMA:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 6469b3456f00..67bd590a82fc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -472,6 +472,17 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->is_occupied = is_occupied;
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
+#endif
+		break;
+	case KFD_MQD_TYPE_DIQ:
+		mqd->init_mqd = init_mqd_hiq;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd_hiq;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+#if defined(CONFIG_DEBUG_FS)
+		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
 		break;
 	case KFD_MQD_TYPE_SDMA:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 3dd48da0e2d6..d1d60336172a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -472,6 +472,7 @@ enum KFD_MQD_TYPE {
 	KFD_MQD_TYPE_HIQ,		/* for hiq */
 	KFD_MQD_TYPE_CP,		/* for cp queues and diq */
 	KFD_MQD_TYPE_SDMA,		/* for sdma queues */
+	KFD_MQD_TYPE_DIQ,		/* for diq */
 	KFD_MQD_TYPE_MAX
 };
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index fcaaf93681ac..7671658ef1f1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -470,7 +470,6 @@ int pqm_debugfs_mqds(struct seq_file *m, void *data)
 			case KFD_QUEUE_TYPE_DIQ:
 				seq_printf(m, "  DIQ on device %x\n",
 					   pqn->kq->dev->id);
-				mqd_type = KFD_MQD_TYPE_HIQ;
 				break;
 			default:
 				seq_printf(m,
-- 
2.17.1


* [PATCH 08/27] drm/amdkfd: Init mqd managers in device queue manager init
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

Previously, MQD managers were initialized on demand. As there are
only a few types of MQD managers, the on-demand initialization
doesn't save much memory. Initialize them during device queue
manager initialization instead and delete the get_mqd_manager
interface. This makes the code better organized for future changes.

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 127 ++++++------------
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   6 -
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |   6 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |   3 +-
 4 files changed, 47 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index a5a8643c04fc..063625c3646b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -368,9 +368,7 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
 	struct mqd_manager *mqd_mgr;
 	int retval;
 
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm, KFD_MQD_TYPE_COMPUTE);
-	if (!mqd_mgr)
-		return -ENOMEM;
+	mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_COMPUTE];
 
 	retval = allocate_hqd(dqm, q);
 	if (retval)
@@ -425,10 +423,8 @@ static int destroy_queue_nocpsch_locked(struct device_queue_manager *dqm,
 	int retval;
 	struct mqd_manager *mqd_mgr;
 
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-		get_mqd_type_from_queue_type(q->properties.type));
-	if (!mqd_mgr)
-		return -ENOMEM;
+	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+			q->properties.type)];
 
 	if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE) {
 		deallocate_hqd(dqm, q);
@@ -501,12 +497,8 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 		retval = -ENODEV;
 		goto out_unlock;
 	}
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-			get_mqd_type_from_queue_type(q->properties.type));
-	if (!mqd_mgr) {
-		retval = -ENOMEM;
-		goto out_unlock;
-	}
+	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+			q->properties.type)];
 	/*
 	 * Eviction state logic: we only mark active queues as evicted
 	 * to avoid the overhead of restoring inactive queues later
@@ -571,27 +563,6 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 	return retval;
 }
 
-static struct mqd_manager *get_mqd_manager(
-		struct device_queue_manager *dqm, enum KFD_MQD_TYPE type)
-{
-	struct mqd_manager *mqd_mgr;
-
-	if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
-		return NULL;
-
-	pr_debug("mqd type %d\n", type);
-
-	mqd_mgr = dqm->mqd_mgrs[type];
-	if (!mqd_mgr) {
-		mqd_mgr = dqm->asic_ops.mqd_manager_init(type, dqm->dev);
-		if (!mqd_mgr)
-			pr_err("mqd manager is NULL");
-		dqm->mqd_mgrs[type] = mqd_mgr;
-	}
-
-	return mqd_mgr;
-}
-
 static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
@@ -612,13 +583,8 @@ static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
 	list_for_each_entry(q, &qpd->queues_list, list) {
 		if (!q->properties.is_active)
 			continue;
-		mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-			get_mqd_type_from_queue_type(q->properties.type));
-		if (!mqd_mgr) { /* should not be here */
-			pr_err("Cannot evict queue, mqd mgr is NULL\n");
-			retval = -ENOMEM;
-			goto out;
-		}
+		mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+				q->properties.type)];
 		q->properties.is_evicted = true;
 		q->properties.is_active = false;
 		retval = mqd_mgr->destroy_mqd(mqd_mgr, q->mqd,
@@ -717,13 +683,8 @@ static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
 	list_for_each_entry(q, &qpd->queues_list, list) {
 		if (!q->properties.is_evicted)
 			continue;
-		mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-			get_mqd_type_from_queue_type(q->properties.type));
-		if (!mqd_mgr) { /* should not be here */
-			pr_err("Cannot restore queue, mqd mgr is NULL\n");
-			retval = -ENOMEM;
-			goto out;
-		}
+		mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+				q->properties.type)];
 		q->properties.is_evicted = false;
 		q->properties.is_active = true;
 		retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe,
@@ -951,9 +912,7 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 	struct mqd_manager *mqd_mgr;
 	int retval;
 
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm, KFD_MQD_TYPE_SDMA);
-	if (!mqd_mgr)
-		return -ENOMEM;
+	mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_SDMA];
 
 	retval = allocate_sdma_queue(dqm, &q->sdma_id);
 	if (retval)
@@ -1186,17 +1145,8 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	if (retval)
 		goto out_deallocate_sdma_queue;
 
-	/* Do init_mqd before dqm_lock(dqm) to avoid circular locking order:
-	 * lock(dqm) -> bo::reserve
-	 */
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-			get_mqd_type_from_queue_type(q->properties.type));
-
-	if (!mqd_mgr) {
-		retval = -ENOMEM;
-		goto out_deallocate_doorbell;
-	}
-
+	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+			q->properties.type)];
 	/*
 	 * Eviction state logic: we only mark active queues as evicted
 	 * to avoid the overhead of restoring inactive queues later
@@ -1381,12 +1331,8 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 
 	}
 
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-			get_mqd_type_from_queue_type(q->properties.type));
-	if (!mqd_mgr) {
-		retval = -ENOMEM;
-		goto failed;
-	}
+	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+			q->properties.type)];
 
 	deallocate_doorbell(qpd, q);
 
@@ -1420,7 +1366,6 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 
 	return retval;
 
-failed:
 failed_try_destroy_debugged_queue:
 
 	dqm_unlock(dqm);
@@ -1566,11 +1511,7 @@ static int get_wave_state(struct device_queue_manager *dqm,
 		goto dqm_unlock;
 	}
 
-	mqd_mgr = dqm->ops.get_mqd_manager(dqm, KFD_MQD_TYPE_COMPUTE);
-	if (!mqd_mgr) {
-		r = -ENOMEM;
-		goto dqm_unlock;
-	}
+	mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_COMPUTE];
 
 	if (!mqd_mgr->get_wave_state) {
 		r = -EINVAL;
@@ -1645,21 +1586,40 @@ static int process_termination_cpsch(struct device_queue_manager *dqm,
 	 * Do uninit_mqd() after dqm_unlock to avoid circular locking.
 	 */
 	list_for_each_entry_safe(q, next, &qpd->queues_list, list) {
-		mqd_mgr = dqm->ops.get_mqd_manager(dqm,
-			get_mqd_type_from_queue_type(q->properties.type));
-		if (!mqd_mgr) {
-			retval = -ENOMEM;
-			goto out;
-		}
+		mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+				q->properties.type)];
 		list_del(&q->list);
 		qpd->queue_count--;
 		mqd_mgr->uninit_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
 	}
 
-out:
 	return retval;
 }
 
+static int init_mqd_managers(struct device_queue_manager *dqm)
+{
+	int i, j;
+	struct mqd_manager *mqd_mgr;
+
+	for (i = 0; i < KFD_MQD_TYPE_MAX; i++) {
+		mqd_mgr = dqm->asic_ops.mqd_manager_init(i, dqm->dev);
+		if (!mqd_mgr) {
+			pr_err("mqd manager [%d] initialization failed\n", i);
+			goto out_free;
+		}
+		dqm->mqd_mgrs[i] = mqd_mgr;
+	}
+
+	return 0;
+
+out_free:
+	for (j = 0; j < i; j++) {
+		kfree(dqm->mqd_mgrs[j]);
+		dqm->mqd_mgrs[j] = NULL;
+	}
+
+	return -ENOMEM;
+}
 struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 {
 	struct device_queue_manager *dqm;
@@ -1697,7 +1657,6 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.stop = stop_cpsch;
 		dqm->ops.destroy_queue = destroy_queue_cpsch;
 		dqm->ops.update_queue = update_queue;
-		dqm->ops.get_mqd_manager = get_mqd_manager;
 		dqm->ops.register_process = register_process;
 		dqm->ops.unregister_process = unregister_process;
 		dqm->ops.uninitialize = uninitialize;
@@ -1717,7 +1676,6 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.create_queue = create_queue_nocpsch;
 		dqm->ops.destroy_queue = destroy_queue_nocpsch;
 		dqm->ops.update_queue = update_queue;
-		dqm->ops.get_mqd_manager = get_mqd_manager;
 		dqm->ops.register_process = register_process;
 		dqm->ops.unregister_process = unregister_process;
 		dqm->ops.initialize = initialize_nocpsch;
@@ -1768,6 +1726,9 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		goto out_free;
 	}
 
+	if (init_mqd_managers(dqm))
+		goto out_free;
+
 	if (!dqm->ops.initialize(dqm))
 		return dqm;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index a5d83ec1c6a8..a5ef7a6650a5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -48,8 +48,6 @@ struct device_process_node {
  *
  * @update_queue: Queue update routine.
  *
- * @get_mqd_manager: Returns the mqd manager according to the mqd type.
- *
  * @exeute_queues: Dispatches the queues list to the H/W.
  *
  * @register_process: This routine associates a specific process with device.
@@ -97,10 +95,6 @@ struct device_queue_manager_ops {
 	int	(*update_queue)(struct device_queue_manager *dqm,
 				struct queue *q);
 
-	struct mqd_manager * (*get_mqd_manager)
-					(struct device_queue_manager *dqm,
-					enum KFD_MQD_TYPE type);
-
 	int	(*register_process)(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 58bb3ad233a1..7a737b50bed4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -58,12 +58,10 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
 	kq->nop_packet = nop.u32all;
 	switch (type) {
 	case KFD_QUEUE_TYPE_DIQ:
-		kq->mqd_mgr = dev->dqm->ops.get_mqd_manager(dev->dqm,
-						KFD_MQD_TYPE_DIQ);
+		kq->mqd_mgr = dev->dqm->mqd_mgrs[KFD_MQD_TYPE_DIQ];
 		break;
 	case KFD_QUEUE_TYPE_HIQ:
-		kq->mqd_mgr = dev->dqm->ops.get_mqd_manager(dev->dqm,
-						KFD_MQD_TYPE_HIQ);
+		kq->mqd_mgr = dev->dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ];
 		break;
 	default:
 		pr_err("Invalid queue type %d\n", type);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 7671658ef1f1..f18d9cdf9aac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -461,8 +461,7 @@ int pqm_debugfs_mqds(struct seq_file *m, void *data)
 					   q->properties.type, q->device->id);
 				continue;
 			}
-			mqd_mgr = q->device->dqm->ops.get_mqd_manager(
-				q->device->dqm, mqd_type);
+			mqd_mgr = q->device->dqm->mqd_mgrs[mqd_type];
 		} else if (pqn->kq) {
 			q = pqn->kq->queue;
 			mqd_mgr = pqn->kq->mqd_mgr;
-- 
2.17.1


* [PATCH 09/27] drm/amdkfd: Add mqd size in mqd manager struct
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

Also initialize the MQD size during MQD manager initialization.

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h     | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c  | 4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c  | 4 ++++
 4 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
index f8261313ae7b..009d232fb60b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -99,6 +99,7 @@ struct mqd_manager {
 
 	struct mutex	mqd_mutex;
 	struct kfd_dev	*dev;
+	uint32_t mqd_size;
 };
 
 void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index e69bb4d3c3a9..eec131b801b0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -400,6 +400,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct cik_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -411,6 +412,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct cik_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -422,6 +424,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct cik_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -433,6 +436,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
+		mqd->mqd_size = sizeof(struct cik_sdma_rlc_registers);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 273aad4f59c8..15274a880ea2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -462,6 +462,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
 		mqd->get_wave_state = get_wave_state;
+		mqd->mqd_size = sizeof(struct v9_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -473,6 +474,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct v9_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -484,6 +486,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct v9_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -495,6 +498,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
+		mqd->mqd_size = sizeof(struct v9_sdma_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 67bd590a82fc..ad9dc9a678f2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -459,6 +459,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
 		mqd->get_wave_state = get_wave_state;
+		mqd->mqd_size = sizeof(struct vi_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -470,6 +471,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct vi_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -481,6 +483,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->mqd_size = sizeof(struct vi_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -492,6 +495,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
+		mqd->mqd_size = sizeof(struct vi_sdma_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
 #endif
-- 
2.17.1


* [PATCH 10/27] drm/amdkfd: Allocate MQD trunk for HIQ and SDMA
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

MEC firmware for some new ASICs requires all SDMA MQDs to be in a
contiguous trunk of memory right after the HIQ MQD. Add a field in
the device queue manager to hold the HIQ/SDMA MQD memory object and
allocate the MQD trunk during device queue manager initialization.
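
For illustration, the size computed in allocate_hiq_sdma_mqd() implies
this trunk layout (the offset helper below is a sketch, not part of
the patch):

  #include <stdint.h>

  /*   [ HIQ MQD ][ SDMA MQD 0 ][ SDMA MQD 1 ] ... [ SDMA MQD N-1 ]
   * where N = num_sdma_engines * num_sdma_queues_per_engine. */
  static inline uint64_t sdma_mqd_offset(uint32_t hiq_mqd_size,
                                         uint32_t sdma_mqd_size,
                                         unsigned int sdma_id)
  {
          return hiq_mqd_size + (uint64_t)sdma_id * sdma_mqd_size;
  }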

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 32 +++++++++++++++++++
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 063625c3646b..e2de246d681b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1620,6 +1620,25 @@ static int init_mqd_managers(struct device_queue_manager *dqm)
 
 	return -ENOMEM;
 }
+
+/* Allocate one hiq mqd (HWS) and all SDMA mqd in a continuous trunk*/
+static int allocate_hiq_sdma_mqd(struct device_queue_manager *dqm)
+{
+	int retval;
+	struct kfd_dev *dev = dqm->dev;
+	struct kfd_mem_obj *mem_obj = &dqm->hiq_sdma_mqd;
+	uint32_t size = dqm->mqd_mgrs[KFD_MQD_TYPE_SDMA]->mqd_size *
+		dev->device_info->num_sdma_engines *
+		dev->device_info->num_sdma_queues_per_engine +
+		dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ]->mqd_size;
+
+	retval = amdgpu_amdkfd_alloc_gtt_mem(dev->kgd, size,
+		&(mem_obj->gtt_mem), &(mem_obj->gpu_addr),
+		(void *)&(mem_obj->cpu_ptr), true);
+
+	return retval;
+}
+
 struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 {
 	struct device_queue_manager *dqm;
@@ -1729,6 +1748,11 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 	if (init_mqd_managers(dqm))
 		goto out_free;
 
+	if (allocate_hiq_sdma_mqd(dqm)) {
+		pr_err("Failed to allocate hiq sdma mqd trunk buffer\n");
+		goto out_free;
+	}
+
 	if (!dqm->ops.initialize(dqm))
 		return dqm;
 
@@ -1737,9 +1761,17 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 	return NULL;
 }
 
+void deallocate_hiq_sdma_mqd(struct kfd_dev *dev, struct kfd_mem_obj *mqd)
+{
+	WARN(!mqd, "No hiq sdma mqd trunk to free");
+
+	amdgpu_amdkfd_free_gtt_mem(dev->kgd, mqd->gtt_mem);
+}
+
 void device_queue_manager_uninit(struct device_queue_manager *dqm)
 {
 	dqm->ops.uninitialize(dqm);
+	deallocate_hiq_sdma_mqd(dqm->dev, &dqm->hiq_sdma_mqd);
 	kfree(dqm);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index a5ef7a6650a5..3742fd340ec3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -197,6 +197,7 @@ struct device_queue_manager {
 	/* hw exception  */
 	bool			is_hws_hang;
 	struct work_struct	hw_exception_work;
+	struct kfd_mem_obj	hiq_sdma_mqd;
 };
 
 void device_queue_manager_init_cik(
-- 
2.17.1


* [PATCH 11/27] drm/amdkfd: Move non-sdma mqd allocation out of init_mqd
From: Kuehling, Felix @ 2019-04-28  7:44 UTC
  To: amd-gfx@lists.freedesktop.org; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

This is preparation work for introducing more MQD allocation
schemes.

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 20 ++++++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 51 ++++++++++++-------
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 18 +++++--
 3 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index eec131b801b0..a00402077e34 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -66,6 +66,19 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
 		m->compute_static_thread_mgmt_se3);
 }
 
+static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
+					struct queue_properties *q)
+{
+	struct kfd_mem_obj *mqd_mem_obj;
+
+	if (kfd_gtt_sa_allocate(kfd, sizeof(struct cik_mqd),
+			&mqd_mem_obj))
+		return NULL;
+
+	return mqd_mem_obj;
+}
+
+
 static int init_mqd(struct mqd_manager *mm, void **mqd,
 		struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
 		struct queue_properties *q)
@@ -73,11 +86,10 @@ static int init_mqd(struct mqd_manager *mm, void **mqd,
 	uint64_t addr;
 	struct cik_mqd *m;
 	int retval;
+	struct kfd_dev *kfd = mm->dev;
 
-	retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct cik_mqd),
-					mqd_mem_obj);
-
-	if (retval != 0)
+	*mqd_mem_obj = allocate_mqd(kfd, q);
+	if (!*mqd_mem_obj)
 		return -ENOMEM;
 
 	m = (struct cik_mqd *) (*mqd_mem_obj)->cpu_ptr;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 15274a880ea2..8f8166189fd5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -67,38 +67,53 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
 		m->compute_static_thread_mgmt_se3);
 }
 
-static int init_mqd(struct mqd_manager *mm, void **mqd,
-			struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
-			struct queue_properties *q)
+static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
+		struct queue_properties *q)
 {
 	int retval;
-	uint64_t addr;
-	struct v9_mqd *m;
-	struct kfd_dev *kfd = mm->dev;
+	struct kfd_mem_obj *mqd_mem_obj = NULL;
 
-	*mqd_mem_obj = NULL;
 	/* From V9,  for CWSR, the control stack is located on the next page
 	 * boundary after the mqd, we will use the gtt allocation function
 	 * instead of sub-allocation function.
 	 */
 	if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
-		*mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);
-		if (!*mqd_mem_obj)
-			return -ENOMEM;
+		mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
+		if (!mqd_mem_obj)
+			return NULL;
 		retval = amdgpu_amdkfd_alloc_gtt_mem(kfd->kgd,
 			ALIGN(q->ctl_stack_size, PAGE_SIZE) +
 				ALIGN(sizeof(struct v9_mqd), PAGE_SIZE),
-			&((*mqd_mem_obj)->gtt_mem),
-			&((*mqd_mem_obj)->gpu_addr),
-			(void *)&((*mqd_mem_obj)->cpu_ptr), true);
-	} else
-		retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
-				mqd_mem_obj);
+			&(mqd_mem_obj->gtt_mem),
+			&(mqd_mem_obj->gpu_addr),
+			(void *)&(mqd_mem_obj->cpu_ptr), true);
+	} else {
+		retval = kfd_gtt_sa_allocate(kfd, sizeof(struct v9_mqd),
+				&mqd_mem_obj);
+	}
+
 	if (retval) {
-		kfree(*mqd_mem_obj);
-		return -ENOMEM;
+		kfree(mqd_mem_obj);
+		return NULL;
 	}
 
+	return mqd_mem_obj;
+
+}
+
+static int init_mqd(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *q)
+{
+	int retval;
+	uint64_t addr;
+	struct v9_mqd *m;
+	struct kfd_dev *kfd = mm->dev;
+
+	*mqd_mem_obj = allocate_mqd(kfd, q);
+	if (!*mqd_mem_obj)
+		return -ENOMEM;
+
 	m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
 	addr = (*mqd_mem_obj)->gpu_addr;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index ad9dc9a678f2..3296ffbde6ac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -68,6 +68,18 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
 		m->compute_static_thread_mgmt_se3);
 }
 
+static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
+					struct queue_properties *q)
+{
+	struct kfd_mem_obj *mqd_mem_obj;
+
+	if (kfd_gtt_sa_allocate(kfd, sizeof(struct vi_mqd),
+			&mqd_mem_obj))
+		return NULL;
+
+	return mqd_mem_obj;
+}
+
 static int init_mqd(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *q)
@@ -75,10 +87,10 @@ static int init_mqd(struct mqd_manager *mm, void **mqd,
 	int retval;
 	uint64_t addr;
 	struct vi_mqd *m;
+	struct kfd_dev *kfd = mm->dev;
 
-	retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct vi_mqd),
-			mqd_mem_obj);
-	if (retval != 0)
+	*mqd_mem_obj = allocate_mqd(kfd, q);
+	if (!*mqd_mem_obj)
 		return -ENOMEM;
 
 	m = (struct vi_mqd *) (*mqd_mem_obj)->cpu_ptr;
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/27] drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 11/27] drm/amdkfd: Move non-sdma mqd allocation out of init_mqd Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 13/27] drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue() Kuehling, Felix
                     ` (15 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <ozeng@amd.com>

Instead of allocating the HIQ and SDMA MQDs from the sub-allocator,
allocate them from an MQD trunk pool. This is done for all ASICs.

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  | 49 +++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |  7 +++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 20 +++-----
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 22 +++------
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 22 +++------
 5 files changed, 80 insertions(+), 40 deletions(-)
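
For reference, a sketch of the trunk offset math behind the new
allocate_sdma_mqd() (simplified; assumes the trunk holds one HIQ MQD
followed by all SDMA MQDs, indexed by engine and per-engine queue id):

    static uint64_t sdma_mqd_offset(uint32_t engine_id, uint32_t queue_id,
                                    uint32_t queues_per_engine,
                                    uint32_t hiq_mqd_size,
                                    uint32_t sdma_mqd_size)
    {
            /* Skip the HIQ MQD at the start of the trunk, then index
             * into the flat SDMA MQD array.
             */
            return hiq_mqd_size +
                   (uint64_t)(engine_id * queues_per_engine + queue_id) *
                   sdma_mqd_size;
    }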

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index eeb2b60a36b5..9307811bc427 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -23,6 +23,55 @@
 
 #include "kfd_mqd_manager.h"
 #include "amdgpu_amdkfd.h"
+#include "kfd_device_queue_manager.h"
+
+struct kfd_mem_obj *allocate_hiq_mqd(struct kfd_dev *dev)
+{
+	struct kfd_mem_obj *mqd_mem_obj = NULL;
+
+	mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);
+	if (!mqd_mem_obj)
+		return NULL;
+
+	mqd_mem_obj->gtt_mem = dev->dqm->hiq_sdma_mqd.gtt_mem;
+	mqd_mem_obj->gpu_addr = dev->dqm->hiq_sdma_mqd.gpu_addr;
+	mqd_mem_obj->cpu_ptr = dev->dqm->hiq_sdma_mqd.cpu_ptr;
+
+	return mqd_mem_obj;
+}
+
+struct kfd_mem_obj *allocate_sdma_mqd(struct kfd_dev *dev,
+					struct queue_properties *q)
+{
+	struct kfd_mem_obj *mqd_mem_obj = NULL;
+	uint64_t offset;
+
+	mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);
+	if (!mqd_mem_obj)
+		return NULL;
+
+	offset = (q->sdma_engine_id *
+		dev->device_info->num_sdma_queues_per_engine +
+		q->sdma_queue_id) *
+		dev->dqm->mqd_mgrs[KFD_MQD_TYPE_SDMA]->mqd_size;
+
+	offset += dev->dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ]->mqd_size;
+
+	mqd_mem_obj->gtt_mem = (void *)((uint64_t)dev->dqm->hiq_sdma_mqd.gtt_mem
+				+ offset);
+	mqd_mem_obj->gpu_addr = dev->dqm->hiq_sdma_mqd.gpu_addr + offset;
+	mqd_mem_obj->cpu_ptr = (uint32_t *)((uint64_t)
+				dev->dqm->hiq_sdma_mqd.cpu_ptr + offset);
+
+	return mqd_mem_obj;
+}
+
+void uninit_mqd_hiq_sdma(struct mqd_manager *mm, void *mqd,
+			struct kfd_mem_obj *mqd_mem_obj)
+{
+	WARN_ON(!mqd_mem_obj->gtt_mem);
+	kfree(mqd_mem_obj);
+}
 
 void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm,
 		const uint32_t *cu_mask, uint32_t cu_mask_count,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
index 009d232fb60b..56af256a191b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -102,6 +102,13 @@ struct mqd_manager {
 	uint32_t mqd_size;
 };
 
+struct kfd_mem_obj *allocate_hiq_mqd(struct kfd_dev *dev);
+
+struct kfd_mem_obj *allocate_sdma_mqd(struct kfd_dev *dev,
+					struct queue_properties *q);
+void uninit_mqd_hiq_sdma(struct mqd_manager *mm, void *mqd,
+				struct kfd_mem_obj *mqd_mem_obj);
+
 void mqd_symmetrically_map_cu_mask(struct mqd_manager *mm,
 		const uint32_t *cu_mask, uint32_t cu_mask_count,
 		uint32_t *se_mask);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index a00402077e34..6e8509ec29d9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -71,6 +71,9 @@ static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
 {
 	struct kfd_mem_obj *mqd_mem_obj;
 
+	if (q->type == KFD_QUEUE_TYPE_HIQ)
+		return allocate_hiq_mqd(kfd);
+
 	if (kfd_gtt_sa_allocate(kfd, sizeof(struct cik_mqd),
 			&mqd_mem_obj))
 		return NULL;
@@ -148,12 +151,10 @@ static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
 {
 	int retval;
 	struct cik_sdma_rlc_registers *m;
+	struct kfd_dev *dev = mm->dev;
 
-	retval = kfd_gtt_sa_allocate(mm->dev,
-					sizeof(struct cik_sdma_rlc_registers),
-					mqd_mem_obj);
-
-	if (retval != 0)
+	*mqd_mem_obj = allocate_sdma_mqd(dev, q);
+	if (!*mqd_mem_obj)
 		return -ENOMEM;
 
 	m = (struct cik_sdma_rlc_registers *) (*mqd_mem_obj)->cpu_ptr;
@@ -175,11 +176,6 @@ static void uninit_mqd(struct mqd_manager *mm, void *mqd,
 	kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
 }
 
-static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
-				struct kfd_mem_obj *mqd_mem_obj)
-{
-	kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
-}
 
 static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id,
 		    uint32_t queue_id, struct queue_properties *p,
@@ -419,7 +415,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		break;
 	case KFD_MQD_TYPE_HIQ:
 		mqd->init_mqd = init_mqd_hiq;
-		mqd->uninit_mqd = uninit_mqd;
+		mqd->uninit_mqd = uninit_mqd_hiq_sdma;
 		mqd->load_mqd = load_mqd;
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
@@ -443,7 +439,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		break;
 	case KFD_MQD_TYPE_SDMA:
 		mqd->init_mqd = init_mqd_sdma;
-		mqd->uninit_mqd = uninit_mqd_sdma;
+		mqd->uninit_mqd = uninit_mqd_hiq_sdma;
 		mqd->load_mqd = load_mqd_sdma;
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 8f8166189fd5..4750338199b6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -73,6 +73,9 @@ static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
 	int retval;
 	struct kfd_mem_obj *mqd_mem_obj = NULL;
 
+	if (q->type == KFD_QUEUE_TYPE_HIQ)
+		return allocate_hiq_mqd(kfd);
+
 	/* From V9,  for CWSR, the control stack is located on the next page
 	 * boundary after the mqd, we will use the gtt allocation function
 	 * instead of sub-allocation function.
@@ -346,13 +349,10 @@ static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
 {
 	int retval;
 	struct v9_sdma_mqd *m;
+	struct kfd_dev *dev = mm->dev;
 
-
-	retval = kfd_gtt_sa_allocate(mm->dev,
-			sizeof(struct v9_sdma_mqd),
-			mqd_mem_obj);
-
-	if (retval != 0)
+	*mqd_mem_obj = allocate_sdma_mqd(dev, q);
+	if (!*mqd_mem_obj)
 		return -ENOMEM;
 
 	m = (struct v9_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
@@ -368,12 +368,6 @@ static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
 	return retval;
 }
 
-static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
-		struct kfd_mem_obj *mqd_mem_obj)
-{
-	kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
-}
-
 static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
 		uint32_t pipe_id, uint32_t queue_id,
 		struct queue_properties *p, struct mm_struct *mms)
@@ -484,7 +478,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		break;
 	case KFD_MQD_TYPE_HIQ:
 		mqd->init_mqd = init_mqd_hiq;
-		mqd->uninit_mqd = uninit_mqd;
+		mqd->uninit_mqd = uninit_mqd_hiq_sdma;
 		mqd->load_mqd = load_mqd;
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
@@ -508,7 +502,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		break;
 	case KFD_MQD_TYPE_SDMA:
 		mqd->init_mqd = init_mqd_sdma;
-		mqd->uninit_mqd = uninit_mqd_sdma;
+		mqd->uninit_mqd = uninit_mqd_hiq_sdma;
 		mqd->load_mqd = load_mqd_sdma;
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 3296ffbde6ac..18697c2152a8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -73,6 +73,9 @@ static struct kfd_mem_obj *allocate_mqd(struct kfd_dev *kfd,
 {
 	struct kfd_mem_obj *mqd_mem_obj;
 
+	if (q->type == KFD_QUEUE_TYPE_HIQ)
+		return allocate_hiq_mqd(kfd);
+
 	if (kfd_gtt_sa_allocate(kfd, sizeof(struct vi_mqd),
 			&mqd_mem_obj))
 		return NULL;
@@ -341,13 +344,10 @@ static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
 {
 	int retval;
 	struct vi_sdma_mqd *m;
+	struct kfd_dev *dev = mm->dev;
 
-
-	retval = kfd_gtt_sa_allocate(mm->dev,
-			sizeof(struct vi_sdma_mqd),
-			mqd_mem_obj);
-
-	if (retval != 0)
+	*mqd_mem_obj = allocate_sdma_mqd(dev, q);
+	if (!*mqd_mem_obj)
 		return -ENOMEM;
 
 	m = (struct vi_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
@@ -363,12 +363,6 @@ static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
 	return retval;
 }
 
-static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
-		struct kfd_mem_obj *mqd_mem_obj)
-{
-	kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
-}
-
 static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
 		uint32_t pipe_id, uint32_t queue_id,
 		struct queue_properties *p, struct mm_struct *mms)
@@ -478,7 +472,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		break;
 	case KFD_MQD_TYPE_HIQ:
 		mqd->init_mqd = init_mqd_hiq;
-		mqd->uninit_mqd = uninit_mqd;
+		mqd->uninit_mqd = uninit_mqd_hiq_sdma;
 		mqd->load_mqd = load_mqd;
 		mqd->update_mqd = update_mqd_hiq;
 		mqd->destroy_mqd = destroy_mqd;
@@ -502,7 +496,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		break;
 	case KFD_MQD_TYPE_SDMA:
 		mqd->init_mqd = init_mqd_sdma;
-		mqd->uninit_mqd = uninit_mqd_sdma;
+		mqd->uninit_mqd = uninit_mqd_hiq_sdma;
 		mqd->load_mqd = load_mqd_sdma;
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/27] drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue()
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 12/27] drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 14/27] drm/amdkfd: Fix compute profile switching Kuehling, Felix
                     ` (14 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Zhao, Yong, Kuehling, Felix

From: Yong Zhao <Yong.Zhao@amd.com>

This avoids duplicated code.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 29 +++++++------------
 1 file changed, 11 insertions(+), 18 deletions(-)
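
As a sketch of the now-centralized calculation (simplified from
allocate_sdma_queue()): one free bit in the 64-bit SDMA bitmap
identifies the queue, and the engine/queue split falls out of the bit
index.

    static void sdma_ids_from_bit(struct queue *q, uint64_t *sdma_bitmap,
                                  unsigned int num_sdma_engines)
    {
            int bit = __ffs64(*sdma_bitmap);  /* lowest set bit = free slot */

            *sdma_bitmap &= ~(1ULL << bit);
            q->sdma_id = bit;
            /* e.g. with 2 engines, bit 5 -> engine 1, queue 2 */
            q->properties.sdma_engine_id = bit % num_sdma_engines;
            q->properties.sdma_queue_id = bit / num_sdma_engines;
    }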

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e2de246d681b..38c66b8ffd31 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -883,7 +883,7 @@ static int stop_nocpsch(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-				unsigned int *sdma_id)
+				struct queue *q)
 {
 	int bit;
 
@@ -892,7 +892,14 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 
 	bit = __ffs64(dqm->sdma_bitmap);
 	dqm->sdma_bitmap &= ~(1ULL << bit);
-	*sdma_id = bit;
+	q->sdma_id = bit;
+
+	q->properties.sdma_engine_id = q->sdma_id % get_num_sdma_engines(dqm);
+	q->properties.sdma_queue_id = q->sdma_id / get_num_sdma_engines(dqm);
+
+	pr_debug("SDMA id is:    %d\n", q->sdma_id);
+	pr_debug("SDMA engine id: %d\n", q->properties.sdma_engine_id);
+	pr_debug("SDMA queue id: %d\n", q->properties.sdma_queue_id);
 
 	return 0;
 }
@@ -914,21 +921,14 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 
 	mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_SDMA];
 
-	retval = allocate_sdma_queue(dqm, &q->sdma_id);
+	retval = allocate_sdma_queue(dqm, q);
 	if (retval)
 		return retval;
 
-	q->properties.sdma_queue_id = q->sdma_id / get_num_sdma_engines(dqm);
-	q->properties.sdma_engine_id = q->sdma_id % get_num_sdma_engines(dqm);
-
 	retval = allocate_doorbell(qpd, q);
 	if (retval)
 		goto out_deallocate_sdma_queue;
 
-	pr_debug("SDMA id is:    %d\n", q->sdma_id);
-	pr_debug("SDMA queue id: %d\n", q->properties.sdma_queue_id);
-	pr_debug("SDMA engine id: %d\n", q->properties.sdma_engine_id);
-
 	dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
 	retval = mqd_mgr->init_mqd(mqd_mgr, &q->mqd, &q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
@@ -1129,16 +1129,9 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	}
 
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
-		retval = allocate_sdma_queue(dqm, &q->sdma_id);
+		retval = allocate_sdma_queue(dqm, q);
 		if (retval)
 			goto out;
-		q->properties.sdma_queue_id =
-			q->sdma_id / get_num_sdma_engines(dqm);
-		q->properties.sdma_engine_id =
-			q->sdma_id % get_num_sdma_engines(dqm);
-		pr_debug("SDMA id is:    %d\n", q->sdma_id);
-		pr_debug("SDMA queue id: %d\n", q->properties.sdma_queue_id);
-		pr_debug("SDMA engine id: %d\n", q->properties.sdma_engine_id);
 	}
 
 	retval = allocate_doorbell(qpd, q);
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 14/27] drm/amdkfd: Fix compute profile switching
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 13/27] drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue() Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 15/27] drm/amdkfd: Fix sdma queue map issue Kuehling, Felix
                     ` (13 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Huang, JinHuiEric, Kuehling, Felix, Kasiviswanathan, Harish

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Fix compute profile switching on process termination.

Add a dedicated reference counter to keep track of entry to and exit
from the compute profile. This enables switching compute profiles for
reasons other than process creation or termination.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c          | 16 ++++++++++++++++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c    | 11 ++++++-----
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h            |  7 +++++++
 3 files changed, 29 insertions(+), 5 deletions(-)
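
A sketch of what the dedicated counter enables (the caller below is
hypothetical, not part of this patch): any code path can now pin the
compute profile without touching dqm->processes_count.

    /* The first caller switches the GPU out of the idle profile, the
     * last caller switches it back; entries and exits may nest freely.
     */
    kfd_inc_compute_active(kfd);
    do_latency_sensitive_work(kfd);    /* hypothetical placeholder */
    kfd_dec_compute_active(kfd);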

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index c1e4d44d6137..8202a5db3a35 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -462,6 +462,7 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 	kfd->pdev = pdev;
 	kfd->init_complete = false;
 	kfd->kfd2kgd = f2g;
+	atomic_set(&kfd->compute_profile, 0);
 
 	mutex_init(&kfd->doorbell_mutex);
 	memset(&kfd->doorbell_available_index, 0,
@@ -1036,6 +1037,21 @@ void kgd2kfd_set_sram_ecc_flag(struct kfd_dev *kfd)
 		atomic_inc(&kfd->sram_ecc_flag);
 }
 
+void kfd_inc_compute_active(struct kfd_dev *kfd)
+{
+	if (atomic_inc_return(&kfd->compute_profile) == 1)
+		amdgpu_amdkfd_set_compute_idle(kfd->kgd, false);
+}
+
+void kfd_dec_compute_active(struct kfd_dev *kfd)
+{
+	int count = atomic_dec_return(&kfd->compute_profile);
+
+	if (count == 0)
+		amdgpu_amdkfd_set_compute_idle(kfd->kgd, true);
+	WARN_ONCE(count < 0, "Compute profile ref. count error");
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 /* This function will send a package to HIQ to hang the HWS
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 38c66b8ffd31..bac1f36d38a2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -772,8 +772,8 @@ static int register_process(struct device_queue_manager *dqm,
 
 	retval = dqm->asic_ops.update_qpd(dqm, qpd);
 
-	if (dqm->processes_count++ == 0)
-		amdgpu_amdkfd_set_compute_idle(dqm->dev->kgd, false);
+	dqm->processes_count++;
+	kfd_inc_compute_active(dqm->dev);
 
 	dqm_unlock(dqm);
 
@@ -796,9 +796,8 @@ static int unregister_process(struct device_queue_manager *dqm,
 		if (qpd == cur->qpd) {
 			list_del(&cur->list);
 			kfree(cur);
-			if (--dqm->processes_count == 0)
-				amdgpu_amdkfd_set_compute_idle(
-					dqm->dev->kgd, true);
+			dqm->processes_count--;
+			kfd_dec_compute_active(dqm->dev);
 			goto out;
 		}
 	}
@@ -1479,6 +1478,7 @@ static int process_termination_nocpsch(struct device_queue_manager *dqm,
 			list_del(&cur->list);
 			kfree(cur);
 			dqm->processes_count--;
+			kfd_dec_compute_active(dqm->dev);
 			break;
 		}
 	}
@@ -1562,6 +1562,7 @@ static int process_termination_cpsch(struct device_queue_manager *dqm,
 			list_del(&cur->list);
 			kfree(cur);
 			dqm->processes_count--;
+			kfd_dec_compute_active(dqm->dev);
 			break;
 		}
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d1d60336172a..87328c96b0f1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -279,6 +279,9 @@ struct kfd_dev {
 
 	/* SRAM ECC flag */
 	atomic_t sram_ecc_flag;
+
+	/* Compute Profile ref. count */
+	atomic_t compute_profile;
 };
 
 enum kfd_mempool {
@@ -977,6 +980,10 @@ int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, struct kfd_process *p);
 
 bool kfd_is_locked(void);
 
+/* Compute profile */
+void kfd_inc_compute_active(struct kfd_dev *dev);
+void kfd_dec_compute_active(struct kfd_dev *dev);
+
 /* Debugfs */
 #if defined(CONFIG_DEBUG_FS)
 
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 15/27] drm/amdkfd: Fix sdma queue map issue
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 14/27] drm/amdkfd: Fix compute profile switching Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 16/27] drm/amdkfd: Introduce XGMI SDMA queue type Kuehling, Felix
                     ` (12 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <Oak.Zeng@amd.com>

Previous code assumed there are always two SDMA engines. This is not
true for all ASICs, e.g. Raven only has one SDMA engine. Fix the
issue by using the SDMA engine count from device_info.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 +++++++++++--------
 1 file changed, 12 insertions(+), 9 deletions(-)
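
The shape of the fix, as a sketch (the helper name is a placeholder):
iterate the engine count reported by device_info instead of
hard-coding engines 0 and 1.

    for (i = 0; i < dqm->dev->device_info->num_sdma_engines; i++) {
            retval = unmap_one_sdma_engine(dqm, i); /* placeholder name */
            if (retval)
                    return retval;
    }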

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index bac1f36d38a2..d41045d3fc3a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1212,12 +1212,17 @@ int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
 	return 0;
 }
 
-static int unmap_sdma_queues(struct device_queue_manager *dqm,
-				unsigned int sdma_engine)
+static int unmap_sdma_queues(struct device_queue_manager *dqm)
 {
-	return pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_SDMA,
-			KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, false,
-			sdma_engine);
+	int i, retval = 0;
+
+	for (i = 0; i < dqm->dev->device_info->num_sdma_engines; i++) {
+		retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_SDMA,
+			KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, false, i);
+		if (retval)
+			return retval;
+	}
+	return retval;
 }
 
 /* dqm->lock mutex has to be locked before calling this function */
@@ -1256,10 +1261,8 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
 	pr_debug("Before destroying queues, sdma queue count is : %u\n",
 		dqm->sdma_queue_count);
 
-	if (dqm->sdma_queue_count > 0) {
-		unmap_sdma_queues(dqm, 0);
-		unmap_sdma_queues(dqm, 1);
-	}
+	if (dqm->sdma_queue_count > 0)
+		unmap_sdma_queues(dqm);
 
 	retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_COMPUTE,
 			filter, filter_param, false, 0);
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 16/27] drm/amdkfd: Introduce XGMI SDMA queue type
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 15/27] drm/amdkfd: Fix sdma queue map issue Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 17/27] drm/amdkfd: Expose sdma engine numbers to topology Kuehling, Felix
                     ` (11 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <Oak.Zeng@amd.com>

Existing QUEUE_TYPE_SDMA means PCIe-optimized SDMA queues.
Introduce a new QUEUE_TYPE_SDMA_XGMI, which is optimized
for non-PCIe transfers such as XGMI.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  15 +++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 123 +++++++++++++-----
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   3 +
 .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |   2 +
 .../gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |   2 +
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   4 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  10 +-
 include/uapi/linux/kfd_ioctl.h                |   7 +-
 10 files changed, 132 insertions(+), 39 deletions(-)
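
A sketch of the global engine numbering this patch assumes (simplified
names): the first N engines are the PCIe-optimized ones, and the
XGMI-optimized engines follow.

    static uint32_t global_sdma_engine_id(bool xgmi, uint32_t sdma_id,
                                          uint32_t num_pcie_engines,
                                          uint32_t num_xgmi_engines)
    {
            /* PCIe queues round-robin over engines [0, num_pcie_engines);
             * XGMI queues over [num_pcie_engines, num_pcie + num_xgmi).
             */
            return xgmi ? num_pcie_engines + sdma_id % num_xgmi_engines
                        : sdma_id % num_pcie_engines;
    }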

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d795e5018270..c731126ada22 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -213,6 +213,8 @@ static int set_queue_properties_from_user(struct queue_properties *q_properties,
 		q_properties->type = KFD_QUEUE_TYPE_COMPUTE;
 	else if (args->queue_type == KFD_IOC_QUEUE_TYPE_SDMA)
 		q_properties->type = KFD_QUEUE_TYPE_SDMA;
+	else if (args->queue_type == KFD_IOC_QUEUE_TYPE_SDMA_XGMI)
+		q_properties->type = KFD_QUEUE_TYPE_SDMA_XGMI;
 	else
 		return -ENOTSUPP;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 8202a5db3a35..1368b41cb92b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -54,6 +54,7 @@ static const struct kfd_device_info kaveri_device_info = {
 	.needs_iommu_device = true,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -71,6 +72,7 @@ static const struct kfd_device_info carrizo_device_info = {
 	.needs_iommu_device = true,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -87,6 +89,7 @@ static const struct kfd_device_info raven_device_info = {
 	.needs_iommu_device = true,
 	.needs_pci_atomics = true,
 	.num_sdma_engines = 1,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 #endif
@@ -105,6 +108,7 @@ static const struct kfd_device_info hawaii_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -121,6 +125,7 @@ static const struct kfd_device_info tonga_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -137,6 +142,7 @@ static const struct kfd_device_info fiji_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -153,6 +159,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -170,6 +177,7 @@ static const struct kfd_device_info polaris10_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -186,6 +194,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -202,6 +211,7 @@ static const struct kfd_device_info polaris11_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -218,6 +228,7 @@ static const struct kfd_device_info polaris12_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -234,6 +245,7 @@ static const struct kfd_device_info vega10_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -250,6 +262,7 @@ static const struct kfd_device_info vega10_vf_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -266,6 +279,7 @@ static const struct kfd_device_info vega12_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 2,
 };
 
@@ -282,6 +296,7 @@ static const struct kfd_device_info vega20_device_info = {
 	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
 	.num_sdma_queues_per_engine = 8,
 };
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index d41045d3fc3a..1562590d837e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -60,14 +60,14 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd);
 
 static void deallocate_sdma_queue(struct device_queue_manager *dqm,
-				unsigned int sdma_queue_id);
+				struct queue *q);
 
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
 enum KFD_MQD_TYPE get_mqd_type_from_queue_type(enum kfd_queue_type type)
 {
-	if (type == KFD_QUEUE_TYPE_SDMA)
+	if (type == KFD_QUEUE_TYPE_SDMA || type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		return KFD_MQD_TYPE_SDMA;
 	return KFD_MQD_TYPE_CP;
 }
@@ -107,12 +107,23 @@ static unsigned int get_num_sdma_engines(struct device_queue_manager *dqm)
 	return dqm->dev->device_info->num_sdma_engines;
 }
 
+static unsigned int get_num_xgmi_sdma_engines(struct device_queue_manager *dqm)
+{
+	return dqm->dev->device_info->num_xgmi_sdma_engines;
+}
+
 unsigned int get_num_sdma_queues(struct device_queue_manager *dqm)
 {
 	return dqm->dev->device_info->num_sdma_engines
 			* dqm->dev->device_info->num_sdma_queues_per_engine;
 }
 
+unsigned int get_num_xgmi_sdma_queues(struct device_queue_manager *dqm)
+{
+	return dqm->dev->device_info->num_xgmi_sdma_engines
+			* dqm->dev->device_info->num_sdma_queues_per_engine;
+}
+
 void program_sh_mem_settings(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
@@ -133,7 +144,8 @@ static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
 		 * preserve the user mode ABI.
 		 */
 		q->doorbell_id = q->properties.queue_id;
-	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+			q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
 		/* For SDMA queues on SOC15 with 8-byte doorbell, use static
 		 * doorbell assignments based on the engine and queue id.
 		 * The doobell index distance between RLC (2*i) and (2*i+1)
@@ -174,7 +186,8 @@ static void deallocate_doorbell(struct qcm_process_device *qpd,
 	struct kfd_dev *dev = qpd->dqm->dev;
 
 	if (!KFD_IS_SOC15(dev->device_info->asic_family) ||
-	    q->properties.type == KFD_QUEUE_TYPE_SDMA)
+	    q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+	    q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		return;
 
 	old = test_and_clear_bit(q->doorbell_id, qpd->doorbell_bitmap);
@@ -289,7 +302,8 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 
 	if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE)
 		retval = create_compute_queue_nocpsch(dqm, q, qpd);
-	else if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+	else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+			q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		retval = create_sdma_queue_nocpsch(dqm, q, qpd);
 	else
 		retval = -EINVAL;
@@ -307,6 +321,8 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
 		dqm->sdma_queue_count++;
+	else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+		dqm->xgmi_sdma_queue_count++;
 
 	/*
 	 * Unconditionally increment this counter, regardless of the queue's
@@ -430,7 +446,10 @@ static int destroy_queue_nocpsch_locked(struct device_queue_manager *dqm,
 		deallocate_hqd(dqm, q);
 	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
 		dqm->sdma_queue_count--;
-		deallocate_sdma_queue(dqm, q->sdma_id);
+		deallocate_sdma_queue(dqm, q);
+	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
+		dqm->xgmi_sdma_queue_count--;
+		deallocate_sdma_queue(dqm, q);
 	} else {
 		pr_debug("q->properties.type %d is invalid\n",
 				q->properties.type);
@@ -521,7 +540,8 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 		}
 	} else if (prev_active &&
 		   (q->properties.type == KFD_QUEUE_TYPE_COMPUTE ||
-		    q->properties.type == KFD_QUEUE_TYPE_SDMA)) {
+		    q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+		    q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)) {
 		retval = mqd_mgr->destroy_mqd(mqd_mgr, q->mqd,
 				KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,
 				KFD_UNMAP_LATENCY_MS, q->pipe, q->queue);
@@ -548,7 +568,8 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 		retval = map_queues_cpsch(dqm);
 	else if (q->properties.is_active &&
 		 (q->properties.type == KFD_QUEUE_TYPE_COMPUTE ||
-		  q->properties.type == KFD_QUEUE_TYPE_SDMA)) {
+		  q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+		  q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)) {
 		if (WARN(q->process->mm != current->mm,
 			 "should only run in user thread"))
 			retval = -EFAULT;
@@ -840,6 +861,7 @@ static int initialize_nocpsch(struct device_queue_manager *dqm)
 	INIT_LIST_HEAD(&dqm->queues);
 	dqm->queue_count = dqm->next_pipe_to_allocate = 0;
 	dqm->sdma_queue_count = 0;
+	dqm->xgmi_sdma_queue_count = 0;
 
 	for (pipe = 0; pipe < get_pipes_per_mec(dqm); pipe++) {
 		int pipe_offset = pipe * get_queues_per_pipe(dqm);
@@ -852,6 +874,7 @@ static int initialize_nocpsch(struct device_queue_manager *dqm)
 
 	dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1;
 	dqm->sdma_bitmap = (1ULL << get_num_sdma_queues(dqm)) - 1;
+	dqm->xgmi_sdma_bitmap = (1ULL << get_num_xgmi_sdma_queues(dqm)) - 1;
 
 	return 0;
 }
@@ -886,17 +909,34 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 {
 	int bit;
 
-	if (dqm->sdma_bitmap == 0)
-		return -ENOMEM;
-
-	bit = __ffs64(dqm->sdma_bitmap);
-	dqm->sdma_bitmap &= ~(1ULL << bit);
-	q->sdma_id = bit;
-
-	q->properties.sdma_engine_id = q->sdma_id % get_num_sdma_engines(dqm);
-	q->properties.sdma_queue_id = q->sdma_id / get_num_sdma_engines(dqm);
+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+		if (dqm->sdma_bitmap == 0)
+			return -ENOMEM;
+		bit = __ffs64(dqm->sdma_bitmap);
+		dqm->sdma_bitmap &= ~(1ULL << bit);
+		q->sdma_id = bit;
+		q->properties.sdma_engine_id = q->sdma_id %
+				get_num_sdma_engines(dqm);
+		q->properties.sdma_queue_id = q->sdma_id /
+				get_num_sdma_engines(dqm);
+	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
+		if (dqm->xgmi_sdma_bitmap == 0)
+			return -ENOMEM;
+		bit = __ffs64(dqm->xgmi_sdma_bitmap);
+		dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+		q->sdma_id = bit;
+		/* sdma_engine_id is sdma id including
+		 * both PCIe-optimized SDMAs and XGMI-
+		 * optimized SDMAs. The calculation below
+		 * assumes the first N engines are always
+		 * PCIe-optimized ones
+		 */
+		q->properties.sdma_engine_id = get_num_sdma_engines(dqm) +
+				q->sdma_id % get_num_xgmi_sdma_engines(dqm);
+		q->properties.sdma_queue_id = q->sdma_id /
+				get_num_xgmi_sdma_engines(dqm);
+	}
 
-	pr_debug("SDMA id is:    %d\n", q->sdma_id);
 	pr_debug("SDMA engine id: %d\n", q->properties.sdma_engine_id);
 	pr_debug("SDMA queue id: %d\n", q->properties.sdma_queue_id);
 
@@ -904,11 +944,17 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 }
 
 static void deallocate_sdma_queue(struct device_queue_manager *dqm,
-				unsigned int sdma_id)
+				struct queue *q)
 {
-	if (sdma_id >= get_num_sdma_queues(dqm))
-		return;
-	dqm->sdma_bitmap |= (1ULL << sdma_id);
+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+		if (q->sdma_id >= get_num_sdma_queues(dqm))
+			return;
+		dqm->sdma_bitmap |= (1ULL << q->sdma_id);
+	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
+		if (q->sdma_id >= get_num_xgmi_sdma_queues(dqm))
+			return;
+		dqm->xgmi_sdma_bitmap |= (1ULL << q->sdma_id);
+	}
 }
 
 static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
@@ -946,7 +992,7 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 out_deallocate_doorbell:
 	deallocate_doorbell(qpd, q);
 out_deallocate_sdma_queue:
-	deallocate_sdma_queue(dqm, q->sdma_id);
+	deallocate_sdma_queue(dqm, q);
 
 	return retval;
 }
@@ -1004,8 +1050,10 @@ static int initialize_cpsch(struct device_queue_manager *dqm)
 	INIT_LIST_HEAD(&dqm->queues);
 	dqm->queue_count = dqm->processes_count = 0;
 	dqm->sdma_queue_count = 0;
+	dqm->xgmi_sdma_queue_count = 0;
 	dqm->active_runlist = false;
 	dqm->sdma_bitmap = (1ULL << get_num_sdma_queues(dqm)) - 1;
+	dqm->xgmi_sdma_bitmap = (1ULL << get_num_xgmi_sdma_queues(dqm)) - 1;
 
 	INIT_WORK(&dqm->hw_exception_work, kfd_process_hw_exception);
 
@@ -1127,7 +1175,8 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		goto out;
 	}
 
-	if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
 		retval = allocate_sdma_queue(dqm, q);
 		if (retval)
 			goto out;
@@ -1167,6 +1216,8 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
 		dqm->sdma_queue_count++;
+	else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+		dqm->xgmi_sdma_queue_count++;
 	/*
 	 * Unconditionally increment this counter, regardless of the queue's
 	 * type or whether the queue is active.
@@ -1182,8 +1233,9 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 out_deallocate_doorbell:
 	deallocate_doorbell(qpd, q);
 out_deallocate_sdma_queue:
-	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
-		deallocate_sdma_queue(dqm, q->sdma_id);
+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
+		deallocate_sdma_queue(dqm, q);
 out:
 	return retval;
 }
@@ -1216,7 +1268,8 @@ static int unmap_sdma_queues(struct device_queue_manager *dqm)
 {
 	int i, retval = 0;
 
-	for (i = 0; i < dqm->dev->device_info->num_sdma_engines; i++) {
+	for (i = 0; i < dqm->dev->device_info->num_sdma_engines +
+		dqm->dev->device_info->num_xgmi_sdma_engines; i++) {
 		retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_SDMA,
 			KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, false, i);
 		if (retval)
@@ -1258,10 +1311,10 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
 	if (!dqm->active_runlist)
 		return retval;
 
-	pr_debug("Before destroying queues, sdma queue count is : %u\n",
-		dqm->sdma_queue_count);
+	pr_debug("Before destroying queues, sdma queue count is : %u, xgmi sdma queue count is : %u\n",
+		dqm->sdma_queue_count, dqm->xgmi_sdma_queue_count);
 
-	if (dqm->sdma_queue_count > 0)
+	if (dqm->sdma_queue_count > 0 || dqm->xgmi_sdma_queue_count)
 		unmap_sdma_queues(dqm);
 
 	retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_COMPUTE,
@@ -1333,7 +1386,10 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
 		dqm->sdma_queue_count--;
-		deallocate_sdma_queue(dqm, q->sdma_id);
+		deallocate_sdma_queue(dqm, q);
+	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
+		dqm->xgmi_sdma_queue_count--;
+		deallocate_sdma_queue(dqm, q);
 	}
 
 	list_del(&q->list);
@@ -1550,7 +1606,10 @@ static int process_termination_cpsch(struct device_queue_manager *dqm,
 	list_for_each_entry(q, &qpd->queues_list, list) {
 		if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
 			dqm->sdma_queue_count--;
-			deallocate_sdma_queue(dqm, q->sdma_id);
+			deallocate_sdma_queue(dqm, q);
+		} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
+			dqm->xgmi_sdma_queue_count--;
+			deallocate_sdma_queue(dqm, q);
 		}
 
 		if (q->properties.is_active)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 3742fd340ec3..88b4c007696e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -181,10 +181,12 @@ struct device_queue_manager {
 	unsigned int		processes_count;
 	unsigned int		queue_count;
 	unsigned int		sdma_queue_count;
+	unsigned int		xgmi_sdma_queue_count;
 	unsigned int		total_queue_count;
 	unsigned int		next_pipe_to_allocate;
 	unsigned int		*allocated_queues;
 	uint64_t		sdma_bitmap;
+	uint64_t		xgmi_sdma_bitmap;
 	unsigned int		vmid_bitmap;
 	uint64_t		pipelines_addr;
 	struct kfd_mem_obj	*pipeline_mem;
@@ -216,6 +218,7 @@ unsigned int get_queues_num(struct device_queue_manager *dqm);
 unsigned int get_queues_per_pipe(struct device_queue_manager *dqm);
 unsigned int get_pipes_per_mec(struct device_queue_manager *dqm);
 unsigned int get_num_sdma_queues(struct device_queue_manager *dqm);
+unsigned int get_num_xgmi_sdma_queues(struct device_queue_manager *dqm);
 
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
index 33830b1a5a54..604570bea6bd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
@@ -175,6 +175,7 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
 			queue_type__mes_map_queues__debug_interface_queue_vi;
 		break;
 	case KFD_QUEUE_TYPE_SDMA:
+	case KFD_QUEUE_TYPE_SDMA_XGMI:
 		packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
 				engine_sel__mes_map_queues__sdma0_vi;
 		use_static = false; /* no static queues under SDMA */
@@ -221,6 +222,7 @@ static int pm_unmap_queues_v9(struct packet_manager *pm, uint32_t *buffer,
 			engine_sel__mes_unmap_queues__compute;
 		break;
 	case KFD_QUEUE_TYPE_SDMA:
+	case KFD_QUEUE_TYPE_SDMA_XGMI:
 		packet->bitfields2.engine_sel =
 			engine_sel__mes_unmap_queues__sdma0 + sdma_engine;
 		break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index bf20c6d32ef3..3cdb19826927 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -212,6 +212,7 @@ static int pm_map_queues_vi(struct packet_manager *pm, uint32_t *buffer,
 			queue_type__mes_map_queues__debug_interface_queue_vi;
 		break;
 	case KFD_QUEUE_TYPE_SDMA:
+	case KFD_QUEUE_TYPE_SDMA_XGMI:
 		packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
 				engine_sel__mes_map_queues__sdma0_vi;
 		use_static = false; /* no static queues under SDMA */
@@ -258,6 +259,7 @@ static int pm_unmap_queues_vi(struct packet_manager *pm, uint32_t *buffer,
 			engine_sel__mes_unmap_queues__compute;
 		break;
 	case KFD_QUEUE_TYPE_SDMA:
+	case KFD_QUEUE_TYPE_SDMA_XGMI:
 		packet->bitfields2.engine_sel =
 			engine_sel__mes_unmap_queues__sdma0 + sdma_engine;
 		break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 045a229436a0..077c47fd4fee 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -48,7 +48,8 @@ static void pm_calc_rlib_size(struct packet_manager *pm,
 
 	process_count = pm->dqm->processes_count;
 	queue_count = pm->dqm->queue_count;
-	compute_queue_count = queue_count - pm->dqm->sdma_queue_count;
+	compute_queue_count = queue_count - pm->dqm->sdma_queue_count -
+				pm->dqm->xgmi_sdma_queue_count;
 
 	/* check if there is over subscription
 	 * Note: the arbitration between the number of VMIDs and
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 87328c96b0f1..c8925b7b6c46 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -188,6 +188,7 @@ struct kfd_device_info {
 	bool needs_iommu_device;
 	bool needs_pci_atomics;
 	unsigned int num_sdma_engines;
+	unsigned int num_xgmi_sdma_engines;
 	unsigned int num_sdma_queues_per_engine;
 };
 
@@ -329,7 +330,8 @@ enum kfd_queue_type  {
 	KFD_QUEUE_TYPE_COMPUTE,
 	KFD_QUEUE_TYPE_SDMA,
 	KFD_QUEUE_TYPE_HIQ,
-	KFD_QUEUE_TYPE_DIQ
+	KFD_QUEUE_TYPE_DIQ,
+	KFD_QUEUE_TYPE_SDMA_XGMI
 };
 
 enum kfd_queue_format {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index f18d9cdf9aac..e652e25ede75 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -186,8 +186,13 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 
 	switch (type) {
 	case KFD_QUEUE_TYPE_SDMA:
-		if (dev->dqm->queue_count >= get_num_sdma_queues(dev->dqm)) {
-			pr_err("Over-subscription is not allowed for SDMA.\n");
+	case KFD_QUEUE_TYPE_SDMA_XGMI:
+		if ((type == KFD_QUEUE_TYPE_SDMA && dev->dqm->sdma_queue_count
+			>= get_num_sdma_queues(dev->dqm)) ||
+			(type == KFD_QUEUE_TYPE_SDMA_XGMI &&
+			dev->dqm->xgmi_sdma_queue_count
+			>= get_num_xgmi_sdma_queues(dev->dqm))) {
+			pr_debug("Over-subscription is not allowed for SDMA.\n");
 			retval = -EPERM;
 			goto err_create_queue;
 		}
@@ -446,6 +451,7 @@ int pqm_debugfs_mqds(struct seq_file *m, void *data)
 			q = pqn->q;
 			switch (q->properties.type) {
 			case KFD_QUEUE_TYPE_SDMA:
+			case KFD_QUEUE_TYPE_SDMA_XGMI:
 				seq_printf(m, "  SDMA queue on device %x\n",
 					   q->device->id);
 				mqd_type = KFD_MQD_TYPE_SDMA;
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1e7d5f3376b0..20917c59f39c 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -35,9 +35,10 @@ struct kfd_ioctl_get_version_args {
 };
 
 /* For kfd_ioctl_create_queue_args.queue_type. */
-#define KFD_IOC_QUEUE_TYPE_COMPUTE	0
-#define KFD_IOC_QUEUE_TYPE_SDMA		1
-#define KFD_IOC_QUEUE_TYPE_COMPUTE_AQL	2
+#define KFD_IOC_QUEUE_TYPE_COMPUTE		0x0
+#define KFD_IOC_QUEUE_TYPE_SDMA			0x1
+#define KFD_IOC_QUEUE_TYPE_COMPUTE_AQL		0x2
+#define KFD_IOC_QUEUE_TYPE_SDMA_XGMI		0x3
 
 #define KFD_MAX_QUEUE_PERCENTAGE	100
 #define KFD_MAX_QUEUE_PRIORITY		15
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 17/27] drm/amdkfd: Expose sdma engine numbers to topology
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 16/27] drm/amdkfd: Introduce XGMI SDMA queue type Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 18/27] drm/amdkfd: Delete alloc_format field from map_queue struct Kuehling, Felix
                     ` (10 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <Oak.Zeng@amd.com>

Expose the available number of SDMA engines of both types (PCIe- and
XGMI-optimized) in the topology.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 7 +++++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 2 ++
 2 files changed, 9 insertions(+)
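
From user space the new counts appear as key/value pairs in the node
properties file. A small example of reading them (node 0 chosen for
illustration; real code would iterate the nodes directory):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            FILE *f = fopen("/sys/class/kfd/kfd/topology/nodes/0/properties",
                            "r");
            char key[64];
            unsigned long long val;

            if (!f)
                    return 1;
            while (fscanf(f, "%63s %llu", key, &val) == 2) {
                    if (!strcmp(key, "num_sdma_engines") ||
                        !strcmp(key, "num_sdma_xgmi_engines"))
                            printf("%s = %llu\n", key, val);
            }
            fclose(f);
            return 0;
    }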

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 2cb09e088dce..e536f4b6698f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -476,6 +476,10 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 			dev->node_props.drm_render_minor);
 	sysfs_show_64bit_prop(buffer, "hive_id",
 			dev->node_props.hive_id);
+	sysfs_show_32bit_prop(buffer, "num_sdma_engines",
+			dev->node_props.num_sdma_engines);
+	sysfs_show_32bit_prop(buffer, "num_sdma_xgmi_engines",
+			dev->node_props.num_sdma_xgmi_engines);
 
 	if (dev->gpu) {
 		log_max_watch_addr =
@@ -1282,6 +1286,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		gpu->shared_resources.drm_render_minor;
 
 	dev->node_props.hive_id = gpu->hive_id;
+	dev->node_props.num_sdma_engines = gpu->device_info->num_sdma_engines;
+	dev->node_props.num_sdma_xgmi_engines =
+				gpu->device_info->num_xgmi_sdma_engines;
 
 	kfd_fill_mem_clk_max_info(dev);
 	kfd_fill_iolink_non_crat_info(dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 84710cfd23c2..949e885dfb53 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -78,6 +78,8 @@ struct kfd_node_properties {
 	uint32_t max_engine_clk_fcompute;
 	uint32_t max_engine_clk_ccompute;
 	int32_t  drm_render_minor;
+	uint32_t num_sdma_engines;
+	uint32_t num_sdma_xgmi_engines;
 	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
 };
 
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 19/27] drm/amdkfd: Fix a circular lock dependency
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (17 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 18/27] drm/amdkfd: Delete alloc_format field from map_queue struct Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 20/27] drm/amdkfd: Fix gfx8 MEM_VIOL exception handler Kuehling, Felix
                     ` (8 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix

Fix a circular lock dependency exposed under userptr memory pressure.
The DQM lock is the only one taken inside the MMU notifier. We need
to make sure that no reclaim is done under this lock, and that
no other locks are taken under which reclaim is possible.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 33 ++++++++++++++++---
 1 file changed, 29 insertions(+), 4 deletions(-)
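
A sketch of the locking pattern applied in all affected paths (details
elided): do the bookkeeping under the DQM lock, remember the outcome,
and call the profile helper only after dropping the lock, because that
helper may indirectly trigger reclaim.

    bool found = false;

    dqm_lock(dqm);
    /* ... detach the process, update dqm->processes_count ... */
    found = true;   /* only if the process was actually removed */
    dqm_unlock(dqm);

    /* Safe here: nothing can reclaim back into the DQM lock above */
    if (found)
            kfd_dec_compute_active(dqm->dev);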

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 1562590d837e..0bfdb141b6e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -794,10 +794,14 @@ static int register_process(struct device_queue_manager *dqm,
 	retval = dqm->asic_ops.update_qpd(dqm, qpd);
 
 	dqm->processes_count++;
-	kfd_inc_compute_active(dqm->dev);
 
 	dqm_unlock(dqm);
 
+	/* Outside the DQM lock because under the DQM lock we can't do
+	 * reclaim or take other locks that others hold while reclaiming.
+	 */
+	kfd_inc_compute_active(dqm->dev);
+
 	return retval;
 }
 
@@ -818,7 +822,6 @@ static int unregister_process(struct device_queue_manager *dqm,
 			list_del(&cur->list);
 			kfree(cur);
 			dqm->processes_count--;
-			kfd_dec_compute_active(dqm->dev);
 			goto out;
 		}
 	}
@@ -826,6 +829,13 @@ static int unregister_process(struct device_queue_manager *dqm,
 	retval = 1;
 out:
 	dqm_unlock(dqm);
+
+	/* Outside the DQM lock because under the DQM lock we can't do
+	 * reclaim or take other locks that others hold while reclaiming.
+	 */
+	if (!retval)
+		kfd_dec_compute_active(dqm->dev);
+
 	return retval;
 }
 
@@ -1519,6 +1529,7 @@ static int process_termination_nocpsch(struct device_queue_manager *dqm,
 	struct queue *q, *next;
 	struct device_process_node *cur, *next_dpn;
 	int retval = 0;
+	bool found = false;
 
 	dqm_lock(dqm);
 
@@ -1537,12 +1548,19 @@ static int process_termination_nocpsch(struct device_queue_manager *dqm,
 			list_del(&cur->list);
 			kfree(cur);
 			dqm->processes_count--;
-			kfd_dec_compute_active(dqm->dev);
+			found = true;
 			break;
 		}
 	}
 
 	dqm_unlock(dqm);
+
+	/* Outside the DQM lock because under the DQM lock we can't do
+	 * reclaim or take other locks that others hold while reclaiming.
+	 */
+	if (found)
+		kfd_dec_compute_active(dqm->dev);
+
 	return retval;
 }
 
@@ -1588,6 +1606,7 @@ static int process_termination_cpsch(struct device_queue_manager *dqm,
 	struct device_process_node *cur, *next_dpn;
 	enum kfd_unmap_queues_filter filter =
 		KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES;
+	bool found = false;
 
 	retval = 0;
 
@@ -1624,7 +1643,7 @@ static int process_termination_cpsch(struct device_queue_manager *dqm,
 			list_del(&cur->list);
 			kfree(cur);
 			dqm->processes_count--;
-			kfd_dec_compute_active(dqm->dev);
+			found = true;
 			break;
 		}
 	}
@@ -1638,6 +1657,12 @@ static int process_termination_cpsch(struct device_queue_manager *dqm,
 
 	dqm_unlock(dqm);
 
+	/* Outside the DQM lock because under the DQM lock we can't do
+	 * reclaim or take other locks that others hold while reclaiming.
+	 */
+	if (found)
+		kfd_dec_compute_active(dqm->dev);
+
 	/* Lastly, free mqd resources.
 	 * Do uninit_mqd() after dqm_unlock to avoid circular locking.
 	 */
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 18/27] drm/amdkfd: Delete alloc_format field from map_queue struct
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 17/27] drm/amdkfd: Expose sdma engine numbers to topology Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 19/27] drm/amdkfd: Fix a circular lock dependency Kuehling, Felix
                     ` (9 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Zeng, Oak, Kuehling, Felix

From: Oak Zeng <Oak.Zeng@amd.com>

The alloc format was never really supported by the MEC firmware. The
firmware always does one-per-pipe allocation.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 2 --
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c | 2 --
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h  | 7 +------
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h  | 7 +------
 4 files changed, 2 insertions(+), 16 deletions(-)
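
For reference, the resulting DW2 layout as an illustrative sketch (not
the header as merged). The two freed bits stay reserved, so the 32-bit
packet layout is unchanged:

    struct map_queues_dw2_sketch {
            uint32_t reserved1:4;
            uint32_t queue_sel:2;
            uint32_t reserved2:15;
            uint32_t queue_type:3;
            uint32_t reserved3:2;   /* was alloc_format:2 */
            uint32_t engine_sel:3;
            uint32_t num_queues:3;  /* 4+2+15+3+2+3+3 = 32 bits */
    };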

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
index 604570bea6bd..3dd731c69b5d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
@@ -153,8 +153,6 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
 
 	packet->header.u32All = pm_build_pm4_header(IT_MAP_QUEUES,
 					sizeof(struct pm4_mes_map_queues));
-	packet->bitfields2.alloc_format =
-		alloc_format__mes_map_queues__one_per_pipe_vi;
 	packet->bitfields2.num_queues = 1;
 	packet->bitfields2.queue_sel =
 		queue_sel__mes_map_queues__map_to_hws_determined_queue_slots_vi;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index 3cdb19826927..2adaf40027eb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -190,8 +190,6 @@ static int pm_map_queues_vi(struct packet_manager *pm, uint32_t *buffer,
 
 	packet->header.u32All = pm_build_pm4_header(IT_MAP_QUEUES,
 					sizeof(struct pm4_mes_map_queues));
-	packet->bitfields2.alloc_format =
-		alloc_format__mes_map_queues__one_per_pipe_vi;
 	packet->bitfields2.num_queues = 1;
 	packet->bitfields2.queue_sel =
 		queue_sel__mes_map_queues__map_to_hws_determined_queue_slots_vi;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
index f2bcf5c092ea..0661339071f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
@@ -255,11 +255,6 @@ enum mes_map_queues_queue_type_enum {
 queue_type__mes_map_queues__low_latency_static_queue_vi = 3
 };
 
-enum mes_map_queues_alloc_format_enum {
-	alloc_format__mes_map_queues__one_per_pipe_vi = 0,
-alloc_format__mes_map_queues__all_on_one_pipe_vi = 1
-};
-
 enum mes_map_queues_engine_sel_enum {
 	engine_sel__mes_map_queues__compute_vi = 0,
 	engine_sel__mes_map_queues__sdma0_vi = 2,
@@ -279,7 +274,7 @@ struct pm4_mes_map_queues {
 			enum mes_map_queues_queue_sel_enum queue_sel:2;
 			uint32_t reserved2:15;
 			enum mes_map_queues_queue_type_enum queue_type:3;
-			enum mes_map_queues_alloc_format_enum alloc_format:2;
+			uint32_t reserved3:2;
 			enum mes_map_queues_engine_sel_enum engine_sel:3;
 			uint32_t num_queues:3;
 		} bitfields2;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h
index 7c8d9b357749..5466cfe1c3cc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h
@@ -216,11 +216,6 @@ enum mes_map_queues_queue_type_vi_enum {
 queue_type__mes_map_queues__low_latency_static_queue_vi = 3
 };
 
-enum mes_map_queues_alloc_format_vi_enum {
-	alloc_format__mes_map_queues__one_per_pipe_vi = 0,
-alloc_format__mes_map_queues__all_on_one_pipe_vi = 1
-};
-
 enum mes_map_queues_engine_sel_vi_enum {
 	engine_sel__mes_map_queues__compute_vi = 0,
 	engine_sel__mes_map_queues__sdma0_vi = 2,
@@ -240,7 +235,7 @@ struct pm4_mes_map_queues {
 			enum mes_map_queues_queue_sel_vi_enum queue_sel:2;
 			uint32_t reserved2:15;
 			enum mes_map_queues_queue_type_vi_enum queue_type:3;
-			enum mes_map_queues_alloc_format_vi_enum alloc_format:2;
+			uint32_t reserved3:2;
 			enum mes_map_queues_engine_sel_vi_enum engine_sel:3;
 			uint32_t num_queues:3;
 		} bitfields2;
-- 
2.17.1
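
Note that the retired 2-bit alloc_format field is replaced by a 2-bit reserved member rather than deleted outright, so the dword layout of bitfields2 and the packet's binary format toward the firmware are unchanged. A minimal sketch of that rule, with the field widths mirroring bitfields2 from the patch:

#include <stdint.h>

/* Hypothetical packet word; only the layout rule matters here. */
struct pkt_word {
	uint32_t reserved1:4;
	uint32_t queue_sel:2;
	uint32_t reserved2:15;
	uint32_t queue_type:3;
	uint32_t reserved3:2;	/* was alloc_format:2, same width */
	uint32_t engine_sel:3;
	uint32_t num_queues:3;
};

_Static_assert(sizeof(struct pkt_word) == sizeof(uint32_t),
	       "dword layout must not change when a field is retired");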


* [PATCH 20/27] drm/amdkfd: Fix gfx8 MEM_VIOL exception handler
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (18 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 19/27] drm/amdkfd: Fix a circular lock dependency Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 21/27] drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL Kuehling, Felix
                     ` (7 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Cornwall, Jay

From: Jay Cornwall <Jay.Cornwall@amd.com>

When MEM_VIOL is asserted the context save handler rewinds the
program counter. This is incorrect for any source of the exception.
MEM_VIOL may be raised in normal operation by out-of-bounds access
to LDS or GDS and does not require special handling.

Remove PC adjustment when MEM_VIOL has been raised.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h      |  9 ++-------
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm   | 13 -------------
 2 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 3621efbd5759..ec9a9a99f808 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -21,7 +21,7 @@
  */
 
 static const uint32_t cwsr_trap_gfx8_hex[] = {
-	0xbf820001, 0xbf82012b,
+	0xbf820001, 0xbf820121,
 	0xb8f4f802, 0x89748674,
 	0xb8f5f803, 0x8675ff75,
 	0x00000400, 0xbf850017,
@@ -36,12 +36,7 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 	0x8671ff71, 0x0000ffff,
 	0x8f728374, 0xb972e0c2,
 	0xbf800002, 0xb9740002,
-	0xbe801f70, 0xb8f5f803,
-	0x8675ff75, 0x00000100,
-	0xbf840006, 0xbefa0080,
-	0xb97a0203, 0x8671ff71,
-	0x0000ffff, 0x80f08870,
-	0x82f18071, 0xbefa0080,
+	0xbe801f70, 0xbefa0080,
 	0xb97a0283, 0xbef60068,
 	0xbef70069, 0xb8fa1c07,
 	0x8e7a9c7a, 0x87717a71,
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm
index abe1a5da29fb..a47f5b933120 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm
@@ -282,19 +282,6 @@ if G8SR_DEBUG_TIMESTAMP
         s_waitcnt lgkmcnt(0)         //FIXME, will cause xnack??
 end
 
-    //check whether there is mem_viol
-    s_getreg_b32    s_save_trapsts, hwreg(HW_REG_TRAPSTS)
-    s_and_b32   s_save_trapsts, s_save_trapsts, SQ_WAVE_TRAPSTS_MEM_VIOL_MASK
-    s_cbranch_scc0  L_NO_PC_REWIND
-
-    //if so, need rewind PC assuming GDS operation gets NACKed
-    s_mov_b32       s_save_tmp, 0                                                           //clear mem_viol bit
-    s_setreg_b32    hwreg(HW_REG_TRAPSTS, SQ_WAVE_TRAPSTS_MEM_VIOL_SHIFT, 1), s_save_tmp    //clear mem_viol bit
-    s_and_b32       s_save_pc_hi, s_save_pc_hi, 0x0000ffff    //pc[47:32]
-    s_sub_u32       s_save_pc_lo, s_save_pc_lo, 8             //pc[31:0]-8
-    s_subb_u32      s_save_pc_hi, s_save_pc_hi, 0x0           // -scc
-
-L_NO_PC_REWIND:
     s_mov_b32       s_save_tmp, 0                                                           //clear saveCtx bit
     s_setreg_b32    hwreg(HW_REG_TRAPSTS, SQ_WAVE_TRAPSTS_SAVECTX_SHIFT, 1), s_save_tmp     //clear saveCtx bit
 
-- 
2.17.1
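
For readers unfamiliar with the scalar ALU idiom, the removed sequence rewound the 48-bit PC, held in two 32-bit halves, back by 8 bytes using a subtract followed by a subtract-with-borrow. Roughly the following C, shown only to clarify what the deleted assembly did:

#include <stdint.h>

/* C equivalent of the removed s_sub_u32/s_subb_u32 pair: step the
 * 48-bit PC back by one 8-byte instruction, propagating the borrow
 * into the high half.
 */
static void rewind_pc(uint32_t *pc_lo, uint32_t *pc_hi)
{
	uint32_t borrow = (*pc_lo < 8) ? 1 : 0;	/* scc from s_sub_u32 */

	*pc_hi &= 0x0000ffff;	/* keep pc[47:32], drop status bits */
	*pc_lo -= 8;
	*pc_hi -= borrow;	/* s_subb_u32 ..., 0x0 */
}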


* [PATCH 22/27] drm/amdkfd: Fix gfx9 XNACK state save/restore
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (20 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 21/27] drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 23/27] drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15] Kuehling, Felix
                     ` (5 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Cornwall, Jay

From: Jay Cornwall <Jay.Cornwall@amd.com>

SQ_WAVE_IB_STS.RCNT grew from 4 bits to 5 in gfx9. Do not truncate
when saving in the high bits of TTMP1.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h       | 12 ++++++------
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm    |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 097da0dd3b04..eed845b4e9a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -310,8 +310,8 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 	0xbe801f6c, 0x866dff6d,
 	0x0000ffff, 0xbef00080,
 	0xb9700283, 0xb8f02407,
-	0x8e709c70, 0x876d706d,
-	0xb8f003c7, 0x8e709b70,
+	0x8e709b70, 0x876d706d,
+	0xb8f003c7, 0x8e709a70,
 	0x876d706d, 0xb8f0f807,
 	0x8670ff70, 0x00007fff,
 	0xb970f807, 0xbeee007e,
@@ -549,11 +549,11 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 	0x00000048, 0xc0031e77,
 	0x00000058, 0xc0071eb7,
 	0x0000005c, 0xbf8cc07f,
-	0x866fff6d, 0xf0000000,
-	0x8f6f9c6f, 0x8e6f906f,
+	0x866fff6d, 0xf8000000,
+	0x8f6f9b6f, 0x8e6f906f,
 	0xbeee0080, 0x876e6f6e,
-	0x866fff6d, 0x08000000,
-	0x8f6f9b6f, 0x8e6f8f6f,
+	0x866fff6d, 0x04000000,
+	0x8f6f9a6f, 0x8e6f8f6f,
 	0x876e6f6e, 0x866fff70,
 	0x00800000, 0x8f6f976f,
 	0xb96ef807, 0x866dff6d,
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index 6a010c9e55de..e1ac34517642 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -150,10 +150,10 @@ var S_SAVE_SPI_INIT_MTYPE_SHIFT		=   28
 var S_SAVE_SPI_INIT_FIRST_WAVE_MASK	=   0x04000000		//bit[26]: FirstWaveInTG
 var S_SAVE_SPI_INIT_FIRST_WAVE_SHIFT	=   26
 
-var S_SAVE_PC_HI_RCNT_SHIFT		=   28			//FIXME	 check with Brian to ensure all fields other than PC[47:0] can be used
-var S_SAVE_PC_HI_RCNT_MASK		=   0xF0000000		//FIXME
-var S_SAVE_PC_HI_FIRST_REPLAY_SHIFT	=   27			//FIXME
-var S_SAVE_PC_HI_FIRST_REPLAY_MASK	=   0x08000000		//FIXME
+var S_SAVE_PC_HI_RCNT_SHIFT		=   27			//FIXME	 check with Brian to ensure all fields other than PC[47:0] can be used
+var S_SAVE_PC_HI_RCNT_MASK		=   0xF8000000		//FIXME
+var S_SAVE_PC_HI_FIRST_REPLAY_SHIFT	=   26			//FIXME
+var S_SAVE_PC_HI_FIRST_REPLAY_MASK	=   0x04000000		//FIXME
 
 var s_save_spi_init_lo		    =	exec_lo
 var s_save_spi_init_hi		    =	exec_hi
-- 
2.17.1
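
The save path packs RCNT and FIRST_REPLAY into the otherwise unused high bits of the saved PC_HI, so widening RCNT to 5 bits pushes FIRST_REPLAY down from bit 27 to bit 26. A sketch of the new packing, with the constants copied from the patch and a purely illustrative helper:

#include <stdint.h>

#define S_SAVE_PC_HI_RCNT_SHIFT			27
#define S_SAVE_PC_HI_RCNT_MASK			0xF8000000u
#define S_SAVE_PC_HI_FIRST_REPLAY_SHIFT		26
#define S_SAVE_PC_HI_FIRST_REPLAY_MASK		0x04000000u

/* Illustrative only: fold the two IB_STS fields into pc_hi[31:26],
 * leaving pc[47:32] in the low half-word untouched.
 */
static uint32_t pack_pc_hi(uint32_t pc_hi, uint32_t rcnt,
			   uint32_t first_replay)
{
	pc_hi &= 0x0000ffff;
	pc_hi |= (rcnt << S_SAVE_PC_HI_RCNT_SHIFT) &
		 S_SAVE_PC_HI_RCNT_MASK;
	pc_hi |= (first_replay << S_SAVE_PC_HI_FIRST_REPLAY_SHIFT) &
		 S_SAVE_PC_HI_FIRST_REPLAY_MASK;
	return pc_hi;
}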


* [PATCH 21/27] drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (19 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 20/27] drm/amdkfd: Fix gfx8 MEM_VIOL exception handler Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 22/27] drm/amdkfd: Fix gfx9 XNACK state save/restore Kuehling, Felix
                     ` (6 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Cornwall, Jay

From: Jay Cornwall <Jay.Cornwall@amd.com>

If instruction fetch fails the wave cannot be halted and returned to
the shader without raising MEM_VIOL again. Currently the wave is
terminated if this occurs, but this loses information about the cause
of the fault. The debugger would prefer the faulting wave state to be
context-saved.

Poll inside the trap handler until TRAPSTS.SAVECTX indicates context
save is ready. Exit the poll loop and complete the remainder of the
exception handler, then return to the shader. The next instruction
fetch will be from the trap handler and not the faulting PC. Context
save will then deschedule the wave and save its state.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h        | 10 ++++++----
 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 10 ++++++++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index ec9a9a99f808..097da0dd3b04 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,15 +274,17 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-	0xbf820001, 0xbf82015d,
+	0xbf820001, 0xbf820161,
 	0xb8f8f802, 0x89788678,
 	0xb8f1f803, 0x866eff71,
-	0x00000400, 0xbf850037,
+	0x00000400, 0xbf85003b,
 	0x866eff71, 0x00000800,
 	0xbf850003, 0x866eff71,
-	0x00000100, 0xbf840008,
+	0x00000100, 0xbf84000c,
 	0x866eff78, 0x00002000,
-	0xbf840001, 0xbf810000,
+	0xbf840005, 0xbf8e0010,
+	0xb8eef803, 0x866eff6e,
+	0x00000400, 0xbf84fffb,
 	0x8778ff78, 0x00002000,
 	0x80ec886c, 0x82ed806d,
 	0xb8eef807, 0x866fff6e,
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index 0bb9c577b3a2..6a010c9e55de 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -266,10 +266,16 @@ if (!EMU_RUN_HACK)
 
 L_HALT_WAVE:
     // If STATUS.HALT is set then this fault must come from SQC instruction fetch.
-    // We cannot prevent further faults so just terminate the wavefront.
+    // We cannot prevent further faults. Spin wait until context saved.
     s_and_b32       ttmp2, s_save_status, SQ_WAVE_STATUS_HALT_MASK
     s_cbranch_scc0  L_NOT_ALREADY_HALTED
-    s_endpgm
+
+L_WAIT_CTX_SAVE:
+    s_sleep         0x10
+    s_getreg_b32    ttmp2, hwreg(HW_REG_TRAPSTS)
+    s_and_b32       ttmp2, ttmp2, SQ_WAVE_TRAPSTS_SAVECTX_MASK
+    s_cbranch_scc0  L_WAIT_CTX_SAVE
+
 L_NOT_ALREADY_HALTED:
     s_or_b32        s_save_status, s_save_status, SQ_WAVE_STATUS_HALT_MASK
 
-- 
2.17.1
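
The control flow of the new wait loop is easier to follow in C pseudocode. read_trapsts() and wave_sleep() below are hypothetical stand-ins for s_getreg_b32 and s_sleep; the mask value matches the 0x00000400 constant visible in the hex above:

#include <stdint.h>

#define SQ_WAVE_TRAPSTS_SAVECTX_MASK	0x00000400u

/* Hypothetical stand-ins for the hardware register read and the wave
 * sleep; the real implementation is the assembly above.
 */
extern uint32_t read_trapsts(void);
extern void wave_sleep(unsigned int delay);

static void wait_for_context_save(void)
{
	/* L_WAIT_CTX_SAVE: poll TRAPSTS until the SAVECTX bit signals
	 * that context save is ready, sleeping between polls so the
	 * halted wave does not hog the SQ.
	 */
	while (!(read_trapsts() & SQ_WAVE_TRAPSTS_SAVECTX_MASK))
		wave_sleep(0x10);
}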


* [PATCH 23/27] drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (21 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 22/27] drm/amdkfd: Fix gfx9 XNACK state save/restore Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 24/27] drm/amdkfd: Add VegaM support Kuehling, Felix
                     ` (4 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Cornwall, Jay

From: Jay Cornwall <Jay.Cornwall@amd.com>

ttmp[4:5] is initialized by the SPI with SPI_GDBG_TRAP_DATA* values.
These values are more useful to the debugger than ttmp[14:15], which
carries dispatch_scratch_base*. There are too few registers to
preserve both.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 466 +++++++++---------
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  45 +-
 2 files changed, 253 insertions(+), 258 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index eed845b4e9a7..e413d4a71fa3 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,12 +274,12 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-	0xbf820001, 0xbf820161,
+	0xbf820001, 0xbf82015e,
 	0xb8f8f802, 0x89788678,
-	0xb8f1f803, 0x866eff71,
+	0xb8fbf803, 0x866eff7b,
 	0x00000400, 0xbf85003b,
-	0x866eff71, 0x00000800,
-	0xbf850003, 0x866eff71,
+	0x866eff7b, 0x00000800,
+	0xbf850003, 0x866eff7b,
 	0x00000100, 0xbf84000c,
 	0x866eff78, 0x00002000,
 	0xbf840005, 0xbf8e0010,
@@ -292,13 +292,13 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 	0x8977ff77, 0xfc000000,
 	0x87776f77, 0x896eff6e,
 	0x001f8000, 0xb96ef807,
-	0xb8f0f812, 0xb8f1f813,
-	0x8ef08870, 0xc0071bb8,
+	0xb8faf812, 0xb8fbf813,
+	0x8efa887a, 0xc0071bbd,
 	0x00000000, 0xbf8cc07f,
-	0xc0071c38, 0x00000008,
+	0xc0071ebd, 0x00000008,
 	0xbf8cc07f, 0x86ee6e6e,
 	0xbf840001, 0xbe801d6e,
-	0xb8f1f803, 0x8671ff71,
+	0xb8fbf803, 0x867bff7b,
 	0x000001ff, 0xbf850002,
 	0x806c846c, 0x826d806d,
 	0x866dff6d, 0x0000ffff,
@@ -308,258 +308,256 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 	0x8f6e8378, 0xb96ee0c2,
 	0xbf800002, 0xb9780002,
 	0xbe801f6c, 0x866dff6d,
-	0x0000ffff, 0xbef00080,
-	0xb9700283, 0xb8f02407,
-	0x8e709b70, 0x876d706d,
-	0xb8f003c7, 0x8e709a70,
-	0x876d706d, 0xb8f0f807,
-	0x8670ff70, 0x00007fff,
-	0xb970f807, 0xbeee007e,
+	0x0000ffff, 0xbefa0080,
+	0xb97a0283, 0xb8fa2407,
+	0x8e7a9b7a, 0x876d7a6d,
+	0xb8fa03c7, 0x8e7a9a7a,
+	0x876d7a6d, 0xb8faf807,
+	0x867aff7a, 0x00007fff,
+	0xb97af807, 0xbeee007e,
 	0xbeef007f, 0xbefe0180,
-	0xbf900004, 0x87708478,
-	0xb970f802, 0xbf8e0002,
-	0xbf88fffe, 0xb8f02a05,
+	0xbf900004, 0x877a8478,
+	0xb97af802, 0xbf8e0002,
+	0xbf88fffe, 0xb8fa2a05,
+	0x807a817a, 0x8e7a8a7a,
+	0xb8fb1605, 0x807b817b,
+	0x8e7b867b, 0x807a7b7a,
+	0x807a7e7a, 0x827b807f,
+	0x867bff7b, 0x0000ffff,
+	0xc04b1c3d, 0x00000050,
+	0xbf8cc07f, 0xc04b1d3d,
+	0x00000060, 0xbf8cc07f,
+	0xc0431e7d, 0x00000074,
+	0xbf8cc07f, 0xbef4007e,
+	0x8675ff7f, 0x0000ffff,
+	0x8775ff75, 0x00040000,
+	0xbef60080, 0xbef700ff,
+	0x00807fac, 0x867aff7f,
+	0x08000000, 0x8f7a837a,
+	0x87777a77, 0x867aff7f,
+	0x70000000, 0x8f7a817a,
+	0x87777a77, 0xbef1007c,
+	0xbef00080, 0xb8f02a05,
 	0x80708170, 0x8e708a70,
-	0xb8f11605, 0x80718171,
-	0x8e718671, 0x80707170,
-	0x80707e70, 0x8271807f,
-	0x8671ff71, 0x0000ffff,
-	0xc0471cb8, 0x00000040,
-	0xbf8cc07f, 0xc04b1d38,
-	0x00000048, 0xbf8cc07f,
-	0xc0431e78, 0x00000058,
-	0xbf8cc07f, 0xc0471eb8,
-	0x0000005c, 0xbf8cc07f,
-	0xbef4007e, 0x8675ff7f,
-	0x0000ffff, 0x8775ff75,
-	0x00040000, 0xbef60080,
-	0xbef700ff, 0x00807fac,
-	0x8670ff7f, 0x08000000,
-	0x8f708370, 0x87777077,
-	0x8670ff7f, 0x70000000,
-	0x8f708170, 0x87777077,
-	0xbefb007c, 0xbefa0080,
-	0xb8fa2a05, 0x807a817a,
-	0x8e7a8a7a, 0xb8f01605,
-	0x80708170, 0x8e708670,
-	0x807a707a, 0xbef60084,
-	0xbef600ff, 0x01000000,
-	0xbefe007c, 0xbefc007a,
-	0xc0611efa, 0x0000007c,
-	0xbf8cc07f, 0x807a847a,
-	0xbefc007e, 0xbefe007c,
-	0xbefc007a, 0xc0611b3a,
+	0xb8fa1605, 0x807a817a,
+	0x8e7a867a, 0x80707a70,
+	0xbef60084, 0xbef600ff,
+	0x01000000, 0xbefe007c,
+	0xbefc0070, 0xc0611c7a,
 	0x0000007c, 0xbf8cc07f,
-	0x807a847a, 0xbefc007e,
-	0xbefe007c, 0xbefc007a,
-	0xc0611b7a, 0x0000007c,
-	0xbf8cc07f, 0x807a847a,
+	0x80708470, 0xbefc007e,
+	0xbefe007c, 0xbefc0070,
+	0xc0611b3a, 0x0000007c,
+	0xbf8cc07f, 0x80708470,
 	0xbefc007e, 0xbefe007c,
-	0xbefc007a, 0xc0611bba,
+	0xbefc0070, 0xc0611b7a,
 	0x0000007c, 0xbf8cc07f,
-	0x807a847a, 0xbefc007e,
-	0xbefe007c, 0xbefc007a,
-	0xc0611bfa, 0x0000007c,
-	0xbf8cc07f, 0x807a847a,
+	0x80708470, 0xbefc007e,
+	0xbefe007c, 0xbefc0070,
+	0xc0611bba, 0x0000007c,
+	0xbf8cc07f, 0x80708470,
 	0xbefc007e, 0xbefe007c,
-	0xbefc007a, 0xc0611e3a,
+	0xbefc0070, 0xc0611bfa,
 	0x0000007c, 0xbf8cc07f,
-	0x807a847a, 0xbefc007e,
-	0xb8f1f803, 0xbefe007c,
-	0xbefc007a, 0xc0611c7a,
-	0x0000007c, 0xbf8cc07f,
-	0x807a847a, 0xbefc007e,
-	0xbefe007c, 0xbefc007a,
-	0xc0611a3a, 0x0000007c,
-	0xbf8cc07f, 0x807a847a,
+	0x80708470, 0xbefc007e,
+	0xbefe007c, 0xbefc0070,
+	0xc0611e3a, 0x0000007c,
+	0xbf8cc07f, 0x80708470,
+	0xbefc007e, 0xb8fbf803,
+	0xbefe007c, 0xbefc0070,
+	0xc0611efa, 0x0000007c,
+	0xbf8cc07f, 0x80708470,
 	0xbefc007e, 0xbefe007c,
-	0xbefc007a, 0xc0611a7a,
-	0x0000007c, 0xbf8cc07f,
-	0x807a847a, 0xbefc007e,
-	0xb8fbf801, 0xbefe007c,
-	0xbefc007a, 0xc0611efa,
+	0xbefc0070, 0xc0611a3a,
 	0x0000007c, 0xbf8cc07f,
-	0x807a847a, 0xbefc007e,
-	0x8670ff7f, 0x04000000,
-	0xbeef0080, 0x876f6f70,
-	0xb8fa2a05, 0x807a817a,
-	0x8e7a8a7a, 0xb8f11605,
-	0x80718171, 0x8e718471,
-	0x8e768271, 0xbef600ff,
-	0x01000000, 0xbef20174,
-	0x80747a74, 0x82758075,
-	0xbefc0080, 0xbf800000,
-	0xbe802b00, 0xbe822b02,
-	0xbe842b04, 0xbe862b06,
-	0xbe882b08, 0xbe8a2b0a,
-	0xbe8c2b0c, 0xbe8e2b0e,
-	0xc06b003a, 0x00000000,
-	0xbf8cc07f, 0xc06b013a,
-	0x00000010, 0xbf8cc07f,
-	0xc06b023a, 0x00000020,
-	0xbf8cc07f, 0xc06b033a,
-	0x00000030, 0xbf8cc07f,
-	0x8074c074, 0x82758075,
-	0x807c907c, 0xbf0a717c,
-	0xbf85ffe7, 0xbef40172,
-	0xbefa0080, 0xbefe00c1,
-	0xbeff00c1, 0xbee80080,
-	0xbee90080, 0xbef600ff,
-	0x01000000, 0xe0724000,
-	0x7a1d0000, 0xe0724100,
-	0x7a1d0100, 0xe0724200,
-	0x7a1d0200, 0xe0724300,
-	0x7a1d0300, 0xbefe00c1,
-	0xbeff00c1, 0xb8f14306,
-	0x8671c171, 0xbf84002c,
-	0xbf8a0000, 0x8670ff6f,
-	0x04000000, 0xbf840028,
-	0x8e718671, 0x8e718271,
-	0xbef60071, 0xb8fa2a05,
-	0x807a817a, 0x8e7a8a7a,
-	0xb8f01605, 0x80708170,
-	0x8e708670, 0x807a707a,
-	0x807aff7a, 0x00000080,
+	0x80708470, 0xbefc007e,
+	0xbefe007c, 0xbefc0070,
+	0xc0611a7a, 0x0000007c,
+	0xbf8cc07f, 0x80708470,
+	0xbefc007e, 0xb8f1f801,
+	0xbefe007c, 0xbefc0070,
+	0xc0611c7a, 0x0000007c,
+	0xbf8cc07f, 0x80708470,
+	0xbefc007e, 0x867aff7f,
+	0x04000000, 0xbeef0080,
+	0x876f6f7a, 0xb8f02a05,
+	0x80708170, 0x8e708a70,
+	0xb8fb1605, 0x807b817b,
+	0x8e7b847b, 0x8e76827b,
 	0xbef600ff, 0x01000000,
-	0xbefc0080, 0xd28c0002,
-	0x000100c1, 0xd28d0003,
-	0x000204c1, 0xd1060002,
-	0x00011103, 0x7e0602ff,
-	0x00000200, 0xbefc00ff,
-	0x00010000, 0xbe800077,
-	0x8677ff77, 0xff7fffff,
-	0x8777ff77, 0x00058000,
-	0xd8ec0000, 0x00000002,
-	0xbf8cc07f, 0xe0765000,
-	0x7a1d0002, 0x68040702,
-	0xd0c9006a, 0x0000e302,
-	0xbf87fff7, 0xbef70000,
-	0xbefa00ff, 0x00000400,
+	0xbef20174, 0x80747074,
+	0x82758075, 0xbefc0080,
+	0xbf800000, 0xbe802b00,
+	0xbe822b02, 0xbe842b04,
+	0xbe862b06, 0xbe882b08,
+	0xbe8a2b0a, 0xbe8c2b0c,
+	0xbe8e2b0e, 0xc06b003a,
+	0x00000000, 0xbf8cc07f,
+	0xc06b013a, 0x00000010,
+	0xbf8cc07f, 0xc06b023a,
+	0x00000020, 0xbf8cc07f,
+	0xc06b033a, 0x00000030,
+	0xbf8cc07f, 0x8074c074,
+	0x82758075, 0x807c907c,
+	0xbf0a7b7c, 0xbf85ffe7,
+	0xbef40172, 0xbef00080,
 	0xbefe00c1, 0xbeff00c1,
-	0xb8f12a05, 0x80718171,
-	0x8e718271, 0x8e768871,
+	0xbee80080, 0xbee90080,
 	0xbef600ff, 0x01000000,
-	0xbefc0084, 0xbf0a717c,
-	0xbf840015, 0xbf11017c,
-	0x8071ff71, 0x00001000,
-	0x7e000300, 0x7e020301,
-	0x7e040302, 0x7e060303,
-	0xe0724000, 0x7a1d0000,
-	0xe0724100, 0x7a1d0100,
-	0xe0724200, 0x7a1d0200,
-	0xe0724300, 0x7a1d0300,
-	0x807c847c, 0x807aff7a,
-	0x00000400, 0xbf0a717c,
-	0xbf85ffef, 0xbf9c0000,
-	0xbf8200dc, 0xbef4007e,
-	0x8675ff7f, 0x0000ffff,
-	0x8775ff75, 0x00040000,
-	0xbef60080, 0xbef700ff,
-	0x00807fac, 0x866eff7f,
-	0x08000000, 0x8f6e836e,
-	0x87776e77, 0x866eff7f,
-	0x70000000, 0x8f6e816e,
-	0x87776e77, 0x866eff7f,
-	0x04000000, 0xbf84001e,
+	0xe0724000, 0x701d0000,
+	0xe0724100, 0x701d0100,
+	0xe0724200, 0x701d0200,
+	0xe0724300, 0x701d0300,
 	0xbefe00c1, 0xbeff00c1,
-	0xb8ef4306, 0x866fc16f,
-	0xbf840019, 0x8e6f866f,
-	0x8e6f826f, 0xbef6006f,
-	0xb8f82a05, 0x80788178,
-	0x8e788a78, 0xb8ee1605,
-	0x806e816e, 0x8e6e866e,
-	0x80786e78, 0x8078ff78,
+	0xb8fb4306, 0x867bc17b,
+	0xbf84002c, 0xbf8a0000,
+	0x867aff6f, 0x04000000,
+	0xbf840028, 0x8e7b867b,
+	0x8e7b827b, 0xbef6007b,
+	0xb8f02a05, 0x80708170,
+	0x8e708a70, 0xb8fa1605,
+	0x807a817a, 0x8e7a867a,
+	0x80707a70, 0x8070ff70,
 	0x00000080, 0xbef600ff,
 	0x01000000, 0xbefc0080,
-	0xe0510000, 0x781d0000,
-	0xe0510100, 0x781d0000,
-	0x807cff7c, 0x00000200,
-	0x8078ff78, 0x00000200,
-	0xbf0a6f7c, 0xbf85fff6,
-	0xbef80080, 0xbefe00c1,
-	0xbeff00c1, 0xb8ef2a05,
-	0x806f816f, 0x8e6f826f,
-	0x8e76886f, 0xbef600ff,
-	0x01000000, 0xbeee0078,
-	0x8078ff78, 0x00000400,
-	0xbefc0084, 0xbf11087c,
-	0x806fff6f, 0x00008000,
-	0xe0524000, 0x781d0000,
-	0xe0524100, 0x781d0100,
-	0xe0524200, 0x781d0200,
-	0xe0524300, 0x781d0300,
-	0xbf8c0f70, 0x7e000300,
+	0xd28c0002, 0x000100c1,
+	0xd28d0003, 0x000204c1,
+	0xd1060002, 0x00011103,
+	0x7e0602ff, 0x00000200,
+	0xbefc00ff, 0x00010000,
+	0xbe800077, 0x8677ff77,
+	0xff7fffff, 0x8777ff77,
+	0x00058000, 0xd8ec0000,
+	0x00000002, 0xbf8cc07f,
+	0xe0765000, 0x701d0002,
+	0x68040702, 0xd0c9006a,
+	0x0000f702, 0xbf87fff7,
+	0xbef70000, 0xbef000ff,
+	0x00000400, 0xbefe00c1,
+	0xbeff00c1, 0xb8fb2a05,
+	0x807b817b, 0x8e7b827b,
+	0x8e76887b, 0xbef600ff,
+	0x01000000, 0xbefc0084,
+	0xbf0a7b7c, 0xbf840015,
+	0xbf11017c, 0x807bff7b,
+	0x00001000, 0x7e000300,
 	0x7e020301, 0x7e040302,
-	0x7e060303, 0x807c847c,
-	0x8078ff78, 0x00000400,
-	0xbf0a6f7c, 0xbf85ffee,
-	0xbf9c0000, 0xe0524000,
-	0x6e1d0000, 0xe0524100,
-	0x6e1d0100, 0xe0524200,
-	0x6e1d0200, 0xe0524300,
-	0x6e1d0300, 0xb8f82a05,
+	0x7e060303, 0xe0724000,
+	0x701d0000, 0xe0724100,
+	0x701d0100, 0xe0724200,
+	0x701d0200, 0xe0724300,
+	0x701d0300, 0x807c847c,
+	0x8070ff70, 0x00000400,
+	0xbf0a7b7c, 0xbf85ffef,
+	0xbf9c0000, 0xbf8200da,
+	0xbef4007e, 0x8675ff7f,
+	0x0000ffff, 0x8775ff75,
+	0x00040000, 0xbef60080,
+	0xbef700ff, 0x00807fac,
+	0x866eff7f, 0x08000000,
+	0x8f6e836e, 0x87776e77,
+	0x866eff7f, 0x70000000,
+	0x8f6e816e, 0x87776e77,
+	0x866eff7f, 0x04000000,
+	0xbf84001e, 0xbefe00c1,
+	0xbeff00c1, 0xb8ef4306,
+	0x866fc16f, 0xbf840019,
+	0x8e6f866f, 0x8e6f826f,
+	0xbef6006f, 0xb8f82a05,
 	0x80788178, 0x8e788a78,
 	0xb8ee1605, 0x806e816e,
 	0x8e6e866e, 0x80786e78,
-	0x80f8c078, 0xb8ef1605,
-	0x806f816f, 0x8e6f846f,
-	0x8e76826f, 0xbef600ff,
-	0x01000000, 0xbefc006f,
-	0xc031003a, 0x00000078,
-	0x80f8c078, 0xbf8cc07f,
-	0x80fc907c, 0xbf800000,
-	0xbe802d00, 0xbe822d02,
-	0xbe842d04, 0xbe862d06,
-	0xbe882d08, 0xbe8a2d0a,
-	0xbe8c2d0c, 0xbe8e2d0e,
-	0xbf06807c, 0xbf84fff0,
+	0x8078ff78, 0x00000080,
+	0xbef600ff, 0x01000000,
+	0xbefc0080, 0xe0510000,
+	0x781d0000, 0xe0510100,
+	0x781d0000, 0x807cff7c,
+	0x00000200, 0x8078ff78,
+	0x00000200, 0xbf0a6f7c,
+	0xbf85fff6, 0xbef80080,
+	0xbefe00c1, 0xbeff00c1,
+	0xb8ef2a05, 0x806f816f,
+	0x8e6f826f, 0x8e76886f,
+	0xbef600ff, 0x01000000,
+	0xbeee0078, 0x8078ff78,
+	0x00000400, 0xbefc0084,
+	0xbf11087c, 0x806fff6f,
+	0x00008000, 0xe0524000,
+	0x781d0000, 0xe0524100,
+	0x781d0100, 0xe0524200,
+	0x781d0200, 0xe0524300,
+	0x781d0300, 0xbf8c0f70,
+	0x7e000300, 0x7e020301,
+	0x7e040302, 0x7e060303,
+	0x807c847c, 0x8078ff78,
+	0x00000400, 0xbf0a6f7c,
+	0xbf85ffee, 0xbf9c0000,
+	0xe0524000, 0x6e1d0000,
+	0xe0524100, 0x6e1d0100,
+	0xe0524200, 0x6e1d0200,
+	0xe0524300, 0x6e1d0300,
 	0xb8f82a05, 0x80788178,
 	0x8e788a78, 0xb8ee1605,
 	0x806e816e, 0x8e6e866e,
-	0x80786e78, 0xbef60084,
+	0x80786e78, 0x80f8c078,
+	0xb8ef1605, 0x806f816f,
+	0x8e6f846f, 0x8e76826f,
 	0xbef600ff, 0x01000000,
-	0xc0211bfa, 0x00000078,
-	0x80788478, 0xc0211b3a,
+	0xbefc006f, 0xc031003a,
+	0x00000078, 0x80f8c078,
+	0xbf8cc07f, 0x80fc907c,
+	0xbf800000, 0xbe802d00,
+	0xbe822d02, 0xbe842d04,
+	0xbe862d06, 0xbe882d08,
+	0xbe8a2d0a, 0xbe8c2d0c,
+	0xbe8e2d0e, 0xbf06807c,
+	0xbf84fff0, 0xb8f82a05,
+	0x80788178, 0x8e788a78,
+	0xb8ee1605, 0x806e816e,
+	0x8e6e866e, 0x80786e78,
+	0xbef60084, 0xbef600ff,
+	0x01000000, 0xc0211bfa,
 	0x00000078, 0x80788478,
-	0xc0211b7a, 0x00000078,
-	0x80788478, 0xc0211eba,
+	0xc0211b3a, 0x00000078,
+	0x80788478, 0xc0211b7a,
 	0x00000078, 0x80788478,
-	0xc0211efa, 0x00000078,
-	0x80788478, 0xc0211c3a,
+	0xc0211c3a, 0x00000078,
+	0x80788478, 0xc0211c7a,
 	0x00000078, 0x80788478,
-	0xc0211c7a, 0x00000078,
-	0x80788478, 0xc0211a3a,
+	0xc0211eba, 0x00000078,
+	0x80788478, 0xc0211efa,
 	0x00000078, 0x80788478,
-	0xc0211a7a, 0x00000078,
-	0x80788478, 0xc0211cfa,
+	0xc0211a3a, 0x00000078,
+	0x80788478, 0xc0211a7a,
 	0x00000078, 0x80788478,
-	0xbf8cc07f, 0xbefc006f,
-	0xbefe007a, 0xbeff007b,
-	0x866f71ff, 0x000003ff,
-	0xb96f4803, 0x866f71ff,
-	0xfffff800, 0x8f6f8b6f,
-	0xb96fa2c3, 0xb973f801,
-	0xb8ee2a05, 0x806e816e,
-	0x8e6e8a6e, 0xb8ef1605,
-	0x806f816f, 0x8e6f866f,
-	0x806e6f6e, 0x806e746e,
-	0x826f8075, 0x866fff6f,
-	0x0000ffff, 0xc0071cb7,
-	0x00000040, 0xc00b1d37,
-	0x00000048, 0xc0031e77,
-	0x00000058, 0xc0071eb7,
-	0x0000005c, 0xbf8cc07f,
-	0x866fff6d, 0xf8000000,
-	0x8f6f9b6f, 0x8e6f906f,
-	0xbeee0080, 0x876e6f6e,
-	0x866fff6d, 0x04000000,
-	0x8f6f9a6f, 0x8e6f8f6f,
-	0x876e6f6e, 0x866fff70,
-	0x00800000, 0x8f6f976f,
-	0xb96ef807, 0x866dff6d,
-	0x0000ffff, 0x86fe7e7e,
-	0x86ea6a6a, 0x8f6e8370,
-	0xb96ee0c2, 0xbf800002,
-	0xb9700002, 0xbf8a0000,
-	0x95806f6c, 0xbf810000,
+	0xc0211cfa, 0x00000078,
+	0x80788478, 0xbf8cc07f,
+	0xbefc006f, 0xbefe0070,
+	0xbeff0071, 0x866f7bff,
+	0x000003ff, 0xb96f4803,
+	0x866f7bff, 0xfffff800,
+	0x8f6f8b6f, 0xb96fa2c3,
+	0xb973f801, 0xb8ee2a05,
+	0x806e816e, 0x8e6e8a6e,
+	0xb8ef1605, 0x806f816f,
+	0x8e6f866f, 0x806e6f6e,
+	0x806e746e, 0x826f8075,
+	0x866fff6f, 0x0000ffff,
+	0xc00b1c37, 0x00000050,
+	0xc00b1d37, 0x00000060,
+	0xc0031e77, 0x00000074,
+	0xbf8cc07f, 0x866fff6d,
+	0xf8000000, 0x8f6f9b6f,
+	0x8e6f906f, 0xbeee0080,
+	0x876e6f6e, 0x866fff6d,
+	0x04000000, 0x8f6f9a6f,
+	0x8e6f8f6f, 0x876e6f6e,
+	0x866fff7a, 0x00800000,
+	0x8f6f976f, 0xb96ef807,
+	0x866dff6d, 0x0000ffff,
+	0x86fe7e7e, 0x86ea6a6a,
+	0x8f6e837a, 0xb96ee0c2,
+	0xbf800002, 0xb97a0002,
+	0xbf8a0000, 0x95806f6c,
+	0xbf810000, 0x00000000,
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
index e1ac34517642..6bae2e022c6e 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -162,8 +162,8 @@ var s_save_pc_lo	    =	ttmp0		//{TTMP1, TTMP0} = {3'h0,pc_rewind[3:0], HT[0],tra
 var s_save_pc_hi	    =	ttmp1
 var s_save_exec_lo	    =	ttmp2
 var s_save_exec_hi	    =	ttmp3
-var s_save_tmp		    =	ttmp4
-var s_save_trapsts	    =	ttmp5		//not really used until the end of the SAVE routine
+var s_save_tmp		    =	ttmp14
+var s_save_trapsts	    =	ttmp15		//not really used until the end of the SAVE routine
 var s_save_xnack_mask_lo    =	ttmp6
 var s_save_xnack_mask_hi    =	ttmp7
 var s_save_buf_rsrc0	    =	ttmp8
@@ -171,9 +171,9 @@ var s_save_buf_rsrc1	    =	ttmp9
 var s_save_buf_rsrc2	    =	ttmp10
 var s_save_buf_rsrc3	    =	ttmp11
 var s_save_status	    =	ttmp12
-var s_save_mem_offset	    =	ttmp14
+var s_save_mem_offset	    =	ttmp4
 var s_save_alloc_size	    =	s_save_trapsts		//conflict
-var s_save_m0		    =	ttmp15
+var s_save_m0		    =	ttmp5
 var s_save_ttmps_lo	    =	s_save_tmp		//no conflict
 var s_save_ttmps_hi	    =	s_save_trapsts		//no conflict
 
@@ -207,10 +207,10 @@ var s_restore_mode	    =	ttmp7
 
 var s_restore_pc_lo	    =	ttmp0
 var s_restore_pc_hi	    =	ttmp1
-var s_restore_exec_lo	    =	ttmp14
-var s_restore_exec_hi	    = 	ttmp15
-var s_restore_status	    =	ttmp4
-var s_restore_trapsts	    =	ttmp5
+var s_restore_exec_lo	    =	ttmp4
+var s_restore_exec_hi	    = 	ttmp5
+var s_restore_status	    =	ttmp14
+var s_restore_trapsts	    =	ttmp15
 var s_restore_xnack_mask_lo =	xnack_mask_lo
 var s_restore_xnack_mask_hi =	xnack_mask_hi
 var s_restore_buf_rsrc0	    =	ttmp8
@@ -299,12 +299,12 @@ L_FETCH_2ND_TRAP:
     // Read second-level TBA/TMA from first-level TMA and jump if available.
     // ttmp[2:5] and ttmp12 can be used (others hold SPI-initialized debug data)
     // ttmp12 holds SQ_WAVE_STATUS
-    s_getreg_b32    ttmp4, hwreg(HW_REG_SQ_SHADER_TMA_LO)
-    s_getreg_b32    ttmp5, hwreg(HW_REG_SQ_SHADER_TMA_HI)
-    s_lshl_b64      [ttmp4, ttmp5], [ttmp4, ttmp5], 0x8
-    s_load_dwordx2  [ttmp2, ttmp3], [ttmp4, ttmp5], 0x0 glc:1 // second-level TBA
+    s_getreg_b32    ttmp14, hwreg(HW_REG_SQ_SHADER_TMA_LO)
+    s_getreg_b32    ttmp15, hwreg(HW_REG_SQ_SHADER_TMA_HI)
+    s_lshl_b64      [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8
+    s_load_dwordx2  [ttmp2, ttmp3], [ttmp14, ttmp15], 0x0 glc:1 // second-level TBA
     s_waitcnt       lgkmcnt(0)
-    s_load_dwordx2  [ttmp4, ttmp5], [ttmp4, ttmp5], 0x8 glc:1 // second-level TMA
+    s_load_dwordx2  [ttmp14, ttmp15], [ttmp14, ttmp15], 0x8 glc:1 // second-level TMA
     s_waitcnt       lgkmcnt(0)
     s_and_b64       [ttmp2, ttmp3], [ttmp2, ttmp3], [ttmp2, ttmp3]
     s_cbranch_scc0  L_NO_NEXT_TRAP // second-level trap handler not been set
@@ -411,7 +411,7 @@ end
     else
     end
 
-    // Save trap temporaries 6-11, 13-15 initialized by SPI debug dispatch logic
+    // Save trap temporaries 4-11, 13 initialized by SPI debug dispatch logic
     // ttmp SR memory offset : size(VGPR)+size(SGPR)+0x40
     get_vgpr_size_bytes(s_save_ttmps_lo)
     get_sgpr_size_bytes(s_save_ttmps_hi)
@@ -419,13 +419,11 @@ end
     s_add_u32	    s_save_ttmps_lo, s_save_ttmps_lo, s_save_spi_init_lo
     s_addc_u32	    s_save_ttmps_hi, s_save_spi_init_hi, 0x0
     s_and_b32	    s_save_ttmps_hi, s_save_ttmps_hi, 0xFFFF
-    s_store_dwordx2 [ttmp6, ttmp7], [s_save_ttmps_lo, s_save_ttmps_hi], 0x40 glc:1
+    s_store_dwordx4 [ttmp4, ttmp5, ttmp6, ttmp7], [s_save_ttmps_lo, s_save_ttmps_hi], 0x50 glc:1
     ack_sqc_store_workaround()
-    s_store_dwordx4 [ttmp8, ttmp9, ttmp10, ttmp11], [s_save_ttmps_lo, s_save_ttmps_hi], 0x48 glc:1
+    s_store_dwordx4 [ttmp8, ttmp9, ttmp10, ttmp11], [s_save_ttmps_lo, s_save_ttmps_hi], 0x60 glc:1
     ack_sqc_store_workaround()
-    s_store_dword   ttmp13, [s_save_ttmps_lo, s_save_ttmps_hi], 0x58 glc:1
-    ack_sqc_store_workaround()
-    s_store_dwordx2 [ttmp14, ttmp15], [s_save_ttmps_lo, s_save_ttmps_hi], 0x5C glc:1
+    s_store_dword   ttmp13, [s_save_ttmps_lo, s_save_ttmps_hi], 0x74 glc:1
     ack_sqc_store_workaround()
 
     /*	    setup Resource Contants    */
@@ -1099,7 +1097,7 @@ end
     //s_setreg_b32  hwreg(HW_REG_TRAPSTS),  s_restore_trapsts	   //don't overwrite SAVECTX bit as it may be set through external SAVECTX during restore
     s_setreg_b32    hwreg(HW_REG_MODE),	    s_restore_mode
 
-    // Restore trap temporaries 6-11, 13-15 initialized by SPI debug dispatch logic
+    // Restore trap temporaries 4-11, 13 initialized by SPI debug dispatch logic
     // ttmp SR memory offset : size(VGPR)+size(SGPR)+0x40
     get_vgpr_size_bytes(s_restore_ttmps_lo)
     get_sgpr_size_bytes(s_restore_ttmps_hi)
@@ -1107,10 +1105,9 @@ end
     s_add_u32	    s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_buf_rsrc0
     s_addc_u32	    s_restore_ttmps_hi, s_restore_buf_rsrc1, 0x0
     s_and_b32	    s_restore_ttmps_hi, s_restore_ttmps_hi, 0xFFFF
-    s_load_dwordx2  [ttmp6, ttmp7], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x40 glc:1
-    s_load_dwordx4  [ttmp8, ttmp9, ttmp10, ttmp11], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x48 glc:1
-    s_load_dword    ttmp13, [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x58 glc:1
-    s_load_dwordx2  [ttmp14, ttmp15], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x5C glc:1
+    s_load_dwordx4  [ttmp4, ttmp5, ttmp6, ttmp7], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x50 glc:1
+    s_load_dwordx4  [ttmp8, ttmp9, ttmp10, ttmp11], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x60 glc:1
+    s_load_dword    ttmp13, [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x74 glc:1
     s_waitcnt	    lgkmcnt(0)
 
     //reuse s_restore_m0 as a temp register
-- 
2.17.1
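
For reference, the resulting ttmp save layout within the context-save area (base = size(VGPR) + size(SGPR) + 0x40, offsets taken from the stores in the patch) is:

    0x50: ttmp[4:7]   (s_store_dwordx4)
    0x60: ttmp[8:11]  (s_store_dwordx4)
    0x74: ttmp13      (s_store_dword)

The dword at 0x70 is left unused; it would correspond to ttmp12, which the handler keeps live as SQ_WAVE_STATUS rather than saving it here.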


* [PATCH 25/27] drm/amdkfd: Add domain number into gpu_id
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (23 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 24/27] drm/amdkfd: Add VegaM support Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 26/27] drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration Kuehling, Felix
                     ` (2 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Lin, Amber, Kuehling, Felix

From: Amber Lin <Amber.Lin@amd.com>

A multi-socket server can have multiple PCIe segments, so the BDF alone
is not enough to distinguish each GPU. Take the domain number into
account as well when generating the gpu_id.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 64099a8494e1..2c06d6c16eab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1082,8 +1082,9 @@ static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
 			local_mem_info.local_mem_size_public;
 
 	buf[0] = gpu->pdev->devfn;
-	buf[1] = gpu->pdev->subsystem_vendor;
-	buf[2] = gpu->pdev->subsystem_device;
+	buf[1] = gpu->pdev->subsystem_vendor |
+		(gpu->pdev->subsystem_device << 16);
+	buf[2] = pci_domain_nr(gpu->pdev->bus);
 	buf[3] = gpu->pdev->device;
 	buf[4] = gpu->pdev->bus->number;
 	buf[5] = lower_32_bits(local_mem_size);
-- 
2.17.1
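
The reshuffled hash inputs are easiest to see side by side. For example, GPUs at 0000:03:00.0 and 0001:03:00.0 differ only in the domain, so without buf[2] they could previously hash to the same gpu_id. A sketch of the buffer after the patch (paraphrased from kfd_generate_gpu_id(), not the full function):

#include <linux/kernel.h>
#include <linux/pci.h>

/* Sketch of the gpu_id hash inputs after the patch; buf feeds the
 * CRC that becomes the gpu_id.
 */
static void fill_gpu_id_buf(struct pci_dev *pdev, uint64_t local_mem_size,
			    uint32_t buf[7])
{
	buf[0] = pdev->devfn;
	buf[1] = pdev->subsystem_vendor |
		 (pdev->subsystem_device << 16);	/* packed into one word */
	buf[2] = pci_domain_nr(pdev->bus);		/* new: PCIe segment */
	buf[3] = pdev->device;
	buf[4] = pdev->bus->number;
	buf[5] = lower_32_bits(local_mem_size);
	buf[6] = upper_32_bits(local_mem_size);
}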


* [PATCH 24/27] drm/amdkfd: Add VegaM support
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (22 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 23/27] drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15] Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 25/27] drm/amdkfd: Add domain number into gpu_id Kuehling, Felix
                     ` (3 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Russell, Kent

From: Kent Russell <kent.russell@amd.com>

Add the VegaM information to KFD

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c         |  5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       | 20 +++++++++++++++++++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |  1 +
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |  1 +
 7 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1714900035d7..59f8ca4297db 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -134,6 +134,7 @@ static struct kfd_gpu_cache_info carrizo_cache_info[] = {
 #define polaris10_cache_info carrizo_cache_info
 #define polaris11_cache_info carrizo_cache_info
 #define polaris12_cache_info carrizo_cache_info
+#define vegam_cache_info carrizo_cache_info
 /* TODO - check & update Vega10 cache details */
 #define vega10_cache_info carrizo_cache_info
 #define raven_cache_info carrizo_cache_info
@@ -652,6 +653,10 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
 		pcache_info = polaris12_cache_info;
 		num_of_cache_types = ARRAY_SIZE(polaris12_cache_info);
 		break;
+	case CHIP_VEGAM:
+		pcache_info = vegam_cache_info;
+		num_of_cache_types = ARRAY_SIZE(vegam_cache_info);
+		break;
 	case CHIP_VEGA10:
 	case CHIP_VEGA12:
 	case CHIP_VEGA20:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 1368b41cb92b..a53dda9071b1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -232,6 +232,23 @@ static const struct kfd_device_info polaris12_device_info = {
 	.num_sdma_queues_per_engine = 2,
 };
 
+static const struct kfd_device_info vegam_device_info = {
+	.asic_family = CHIP_VEGAM,
+	.max_pasid_bits = 16,
+	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
+	.ih_ring_entry_size = 4 * sizeof(uint32_t),
+	.event_interrupt_class = &event_interrupt_class_cik,
+	.num_of_watch_points = 4,
+	.mqd_size_aligned = MQD_SIZE_ALIGNED,
+	.supports_cwsr = true,
+	.needs_iommu_device = false,
+	.needs_pci_atomics = true,
+	.num_sdma_engines = 2,
+	.num_xgmi_sdma_engines = 0,
+	.num_sdma_queues_per_engine = 2,
+};
+
 static const struct kfd_device_info vega10_device_info = {
 	.asic_family = CHIP_VEGA10,
 	.max_pasid_bits = 16,
@@ -387,6 +404,9 @@ static const struct kfd_deviceid supported_devices[] = {
 	{ 0x6995, &polaris12_device_info },	/* Polaris12 */
 	{ 0x6997, &polaris12_device_info },	/* Polaris12 */
 	{ 0x699F, &polaris12_device_info },	/* Polaris12 */
+	{ 0x694C, &vegam_device_info },		/* VegaM */
+	{ 0x694E, &vegam_device_info },		/* VegaM */
+	{ 0x694F, &vegam_device_info },		/* VegaM */
 	{ 0x6860, &vega10_device_info },	/* Vega10 */
 	{ 0x6861, &vega10_device_info },	/* Vega10 */
 	{ 0x6862, &vega10_device_info },	/* Vega10 */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 0bfdb141b6e7..ece35c7a77b5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1811,6 +1811,7 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 	case CHIP_POLARIS10:
 	case CHIP_POLARIS11:
 	case CHIP_POLARIS12:
+	case CHIP_VEGAM:
 		device_queue_manager_init_vi_tonga(&dqm->asic_ops);
 		break;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 213ea5454d11..dc7339825b5c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -398,6 +398,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			case CHIP_POLARIS10:
 			case CHIP_POLARIS11:
 			case CHIP_POLARIS12:
+			case CHIP_VEGAM:
 				kfd_init_apertures_vi(pdd, id);
 				break;
 			case CHIP_VEGA10:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 7a737b50bed4..1cc03b3ddbb9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -315,6 +315,7 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
 	case CHIP_POLARIS10:
 	case CHIP_POLARIS11:
 	case CHIP_POLARIS12:
+	case CHIP_VEGAM:
 		kernel_queue_init_vi(&kq->ops_asic_specific);
 		break;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 077c47fd4fee..808194663a7d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -228,6 +228,7 @@ int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
 	case CHIP_POLARIS10:
 	case CHIP_POLARIS11:
 	case CHIP_POLARIS12:
+	case CHIP_VEGAM:
 		pm->pmf = &kfd_vi_pm_funcs;
 		break;
 	case CHIP_VEGA10:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index e536f4b6698f..64099a8494e1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1306,6 +1306,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	case CHIP_POLARIS10:
 	case CHIP_POLARIS11:
 	case CHIP_POLARIS12:
+	case CHIP_VEGAM:
 		pr_debug("Adding doorbell packet type capability\n");
 		dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_1_0 <<
 			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
-- 
2.17.1
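
The new PCI IDs take effect through the usual table lookup in kfd_device.c. A simplified sketch of that lookup, assuming the { did, device_info } entry layout used by the supported_devices table:

#include <linux/kernel.h>

/* Simplified sketch; the real table and lookup live in kfd_device.c. */
static const struct kfd_device_info *
lookup_device_info(unsigned short did)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(supported_devices); i++)
		if (supported_devices[i].did == did)
			return supported_devices[i].device_info;

	return NULL;	/* not a KFD-supported device */
}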


* [PATCH 26/27] drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (24 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 25/27] drm/amdkfd: Add domain number into gpu_id Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
  2019-04-28  7:44   ` [PATCH 27/27] drm/amdgpu: Fix GTT size calculation Kuehling, Felix
  2019-04-29 23:23   ` [PATCH 00/27] KFD upstreaming Kuehling, Felix
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Liu, Shaoyun

From: shaoyunl <Shaoyun.Liu@amd.com>

There is a bug in the VML2 XGMI logic: the mtype is always sent as NC
on the VMC-to-TC interface for a page walk, regardless of whether the
request goes to the local or a remote GPU. NC means non-coherent and
causes the VMC return data to be cached in the TCC (versus UC,
uncached, which would not cache the data). Since the page table
updates are done by SDMA/HDP, the TCC is never updated; the GC VML2
keeps hitting the stale TCC entries, never sees the updated page
tables, and faults.

A heavy-weight TLB invalidation does a WB/INVAL of the L1/L2 GL data
caches, so the TCC will not be hit on the next request.

Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 53 +++++++++----------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index ef3d93b995b2..7ec97e903a1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -726,29 +726,8 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-{
-	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
-
-	/* Use legacy mode tlb invalidation.
-	 *
-	 * Currently on Raven the code below is broken for anything but
-	 * legacy mode due to a MMHUB power gating problem. A workaround
-	 * is for MMHUB to wait until the condition PER_VMID_INVALIDATE_REQ
-	 * == PER_VMID_INVALIDATE_ACK instead of simply waiting for the ack
-	 * bit.
-	 *
-	 * TODO 1: agree on the right set of invalidation registers for
-	 * KFD use. Use the last one for now. Invalidate both GC and
-	 * MMHUB.
-	 *
-	 * TODO 2: support range-based invalidation, requires kfg2kgd
-	 * interface change
-	 */
-	amdgpu_gmc_flush_gpu_tlb(adev, vmid, 0);
-}
-
-static int invalidate_tlbs_with_kiq(struct amdgpu_device *adev, uint16_t pasid)
+static int invalidate_tlbs_with_kiq(struct amdgpu_device *adev, uint16_t pasid,
+			uint32_t flush_type)
 {
 	signed long r;
 	uint32_t seq;
@@ -761,7 +740,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device *adev, uint16_t pasid)
 			PACKET3_INVALIDATE_TLBS_DST_SEL(1) |
 			PACKET3_INVALIDATE_TLBS_ALL_HUB(1) |
 			PACKET3_INVALIDATE_TLBS_PASID(pasid) |
-			PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(0)); /* legacy */
+			PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(flush_type));
 	amdgpu_fence_emit_polling(ring, &seq);
 	amdgpu_ring_commit(ring);
 	spin_unlock(&adev->gfx.kiq.ring_lock);
@@ -780,12 +759,16 @@ static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 	int vmid;
 	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
+	uint32_t flush_type = 0;
 
 	if (adev->in_gpu_reset)
 		return -EIO;
+	if (adev->gmc.xgmi.num_physical_nodes &&
+		adev->asic_type == CHIP_VEGA20)
+		flush_type = 2;
 
 	if (ring->sched.ready)
-		return invalidate_tlbs_with_kiq(adev, pasid);
+		return invalidate_tlbs_with_kiq(adev, pasid, flush_type);
 
 	for (vmid = 0; vmid < 16; vmid++) {
 		if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
@@ -793,7 +776,8 @@ static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
 		if (get_atc_vmid_pasid_mapping_valid(kgd, vmid)) {
 			if (get_atc_vmid_pasid_mapping_pasid(kgd, vmid)
 				== pasid) {
-				write_vmid_invalidate_request(kgd, vmid);
+				amdgpu_gmc_flush_gpu_tlb(adev, vmid,
+							 flush_type);
 				break;
 			}
 		}
@@ -811,7 +795,22 @@ static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
 		return 0;
 	}
 
-	write_vmid_invalidate_request(kgd, vmid);
+	/* Use legacy mode tlb invalidation.
+	 *
+	 * Currently on Raven the code below is broken for anything but
+	 * legacy mode due to a MMHUB power gating problem. A workaround
+	 * is for MMHUB to wait until the condition PER_VMID_INVALIDATE_REQ
+	 * == PER_VMID_INVALIDATE_ACK instead of simply waiting for the ack
+	 * bit.
+	 *
+	 * TODO 1: agree on the right set of invalidation registers for
+	 * KFD use. Use the last one for now. Invalidate both GC and
+	 * MMHUB.
+	 *
+	 * TODO 2: support range-based invalidation, requires kfg2kgd
+	 * interface change
+	 */
+	amdgpu_gmc_flush_gpu_tlb(adev, vmid, 0);
 	return 0;
 }
 
-- 
2.17.1
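
The flush-type selection added to invalidate_tlbs() reduces to a small predicate: heavy-weight invalidation (flush type 2) only where the VML2 XGMI bug applies, legacy (type 0) everywhere else. A sketch distilled from the diff, not a separate helper in the driver:

/* Flush type 0 is the legacy invalidation, 2 is the heavy-weight
 * WB/INVAL of the L1/L2 GL data caches, so the TCC cannot return
 * stale page table data afterwards.
 */
static uint32_t kfd_flush_type(struct amdgpu_device *adev)
{
	if (adev->gmc.xgmi.num_physical_nodes &&
	    adev->asic_type == CHIP_VEGA20)
		return 2;	/* XGMI on Vega20: avoid stale TCC PTEs */

	return 0;
}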


* [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (25 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 26/27] drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration Kuehling, Felix
@ 2019-04-28  7:44   ` Kuehling, Felix
       [not found]     ` <20190428074331.30107-28-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2019-04-29 23:23   ` [PATCH 00/27] KFD upstreaming Kuehling, Felix
  27 siblings, 1 reply; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-28  7:44 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix, Russell, Kent

From: Kent Russell <kent.russell@amd.com>

GTT size is currently limited to the minimum of VRAM size and 3/4 of
system memory. This severely limits the quantity of system memory
that can be used by ROCm applications.

Increase GTT size to the maximum of VRAM size or system memory size.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c14198737dcd..e9ecc3953673 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 		struct sysinfo si;
 
 		si_meminfo(&si);
-		gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
-			       adev->gmc.mc_vram_size),
-			       ((uint64_t)si.totalram * si.mem_unit * 3/4));
-	}
-	else
+		gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
+				adev->gmc.mc_vram_size,
+				((uint64_t)si.totalram * si.mem_unit));
+	} else
 		gtt_size = (uint64_t)amdgpu_gtt_size << 20;
 
 	/* Initialize GTT memory pool */
-- 
2.17.1
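
Concrete numbers make the change clear. On the Fiji configuration discussed below (4GB VRAM, 32GB system memory), with AMDGPU_DEFAULT_GTT_SIZE_MB at 3072, the old formula yields min(max(3GB, 4GB), 24GB) = 4GB of GTT, while the new one yields max3(3GB, 4GB, 32GB) = 32GB. A standalone sketch of the two computations:

#include <stdint.h>
#include <stdio.h>

#define GB	(1024ull * 1024 * 1024)
#define AMDGPU_DEFAULT_GTT_SIZE_MB	3072ull	/* 3GB default */

static uint64_t min2(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max2(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t max3(uint64_t a, uint64_t b, uint64_t c)
{
	return max2(max2(a, b), c);
}

int main(void)
{
	uint64_t def = AMDGPU_DEFAULT_GTT_SIZE_MB << 20;
	uint64_t vram = 4 * GB;		/* Fiji example from the thread */
	uint64_t sysram = 32 * GB;

	/* old: min(max(default, VRAM), 3/4 of system memory) */
	uint64_t old_gtt = min2(max2(def, vram), sysram * 3 / 4);
	/* new: max(default, VRAM, all of system memory) */
	uint64_t new_gtt = max3(def, vram, sysram);

	printf("old GTT limit: %llu GB\n", (unsigned long long)(old_gtt / GB));
	printf("new GTT limit: %llu GB\n", (unsigned long long)(new_gtt / GB));
	return 0;	/* prints 4 and 32 */
}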


* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]     ` <20190428074331.30107-28-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-29 12:34       ` Christian König
       [not found]         ` <86fa9fc3-7a8f-9855-ae1d-5c7ccf2b5260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Christian König @ 2019-04-29 12:34 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Russell, Kent

Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
> From: Kent Russell <kent.russell@amd.com>
>
> GTT size is currently limited to the minimum of VRAM size and 3/4 of
> system memory. This severely limits the quantity of system memory
> that can be used by ROCm applications.
>
> Increase GTT size to the maximum of VRAM size or system memory size.

Well, NAK.

This limit was done on purpose because otherwise the max-texture-size 
would crash the system: the OOM killer would cause a system panic.

Letting the GPU use more than 75% of system memory at the same time 
makes the system unstable, so we can't allow that by default.

What could maybe work is to reduce the amount of system memory by a 
fixed factor, but off hand I don't see a way of fixing this in general.

Regards,
Christian.

>
> Signed-off-by: Kent Russell <kent.russell@amd.com>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c14198737dcd..e9ecc3953673 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>   		struct sysinfo si;
>   
>   		si_meminfo(&si);
> -		gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
> -			       adev->gmc.mc_vram_size),
> -			       ((uint64_t)si.totalram * si.mem_unit * 3/4));
> -	}
> -	else
> +		gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
> +				adev->gmc.mc_vram_size,
> +				((uint64_t)si.totalram * si.mem_unit));
> +	} else
>   		gtt_size = (uint64_t)amdgpu_gtt_size << 20;
>   
>   	/* Initialize GTT memory pool */


* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]         ` <86fa9fc3-7a8f-9855-ae1d-5c7ccf2b5260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-04-29 23:16           ` Kuehling, Felix
       [not found]             ` <1b1ec993-1c4b-8661-9b3f-ac0ad8ae64c7-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-29 23:16 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Russell, Kent

On 2019-04-29 8:34 a.m., Christian König wrote:
> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>> From: Kent Russell <kent.russell@amd.com>
>>
>> GTT size is currently limited to the minimum of VRAM size and 3/4 of
>> system memory. This severely limits the quantity of system memory
>> that can be used by ROCm applications.
>>
>> Increase GTT size to the maximum of VRAM size or system memory size.
>
> Well, NAK.
>
> This limit was done on purpose because otherwise the max-texture-size
> would crash the system: the OOM killer would cause a system panic.
>
> Letting the GPU use more than 75% of system memory at the same time
> makes the system unstable, so we can't allow that by default.

Like we discussed, the current implementation is too limiting. On a Fiji 
system with 4GB VRAM and 32GB system memory, it limits system memory 
allocations to 4GB. I think this workaround was fixed once before and 
reverted because it broke a CZ system with 1GB system memory. So I 
suspect that this is an issue affecting small memory systems where maybe 
the 1/2 system memory limit in TTM isn't sufficient to protect from OOM 
panics.

The OOM killer problem is a more general issue that potentially 
affects other drivers too. Keeping this GTT limit broken in AMDGPU is 
an inadequate workaround at best. I'd like to look for a better 
solution, probably some adjustment of the TTM system memory limits on 
small-memory systems, to avoid OOM panics there.

Regards,
   Felix


>
> What could maybe work is to reduce the amount of system memory by a
> fixed factor, but offhand I don't see a way of fixing this in general.
>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index c14198737dcd..e9ecc3953673 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>>           struct sysinfo si;
>>             si_meminfo(&si);
>> -        gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>> -                   adev->gmc.mc_vram_size),
>> -                   ((uint64_t)si.totalram * si.mem_unit * 3/4));
>> -    }
>> -    else
>> +        gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>> +                adev->gmc.mc_vram_size,
>> +                ((uint64_t)si.totalram * si.mem_unit));
>> +    } else
>>           gtt_size = (uint64_t)amdgpu_gtt_size << 20;
>>         /* Initialize GTT memory pool */
>

* Re: [PATCH 00/27] KFD upstreaming
       [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (26 preceding siblings ...)
  2019-04-28  7:44   ` [PATCH 27/27] drm/amdgpu: Fix GTT size calculation Kuehling, Felix
@ 2019-04-29 23:23   ` Kuehling, Felix
  27 siblings, 0 replies; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-29 23:23 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Deucher, Alexander

I'll drop patch 27 from this series until the potential OOM issues can 
be sorted out.

The remaining patches are all KFD specific and shouldn't cause any 
trouble for graphics apps. If there are no other objections I'm planning 
to push patches 1-26 to amd-staging-drm-next tomorrow.

Thanks,
   Felix

On 2019-04-28 3:44 a.m., Kuehling, Felix wrote:
> Assorted KFD changes that have been accumulating on amd-kfd-staging. New
> features and fixes included:
> * Support for VegaM
> * Support for systems with multiple PCI domains
> * New SDMA queue type that's optimized for XGMI links
> * SDMA MQD allocation changes to support future ASICs with more SDMA queues
> * Fix for compute profile switching at process termination
> * Fix for a circular lock dependency in MMU notifiers
> * Fix for TLB flushing bug with XGMI enabled
> * Fix for artificial GTT system memory limitation
> * Trap handler updates
>
> Amber Lin (1):
>    drm/amdkfd: Add domain number into gpu_id
>
> Felix Kuehling (1):
>    drm/amdkfd: Fix a circular lock dependency
>
> Harish Kasiviswanathan (1):
>    drm/amdkfd: Fix compute profile switching
>
> Jay Cornwall (4):
>    drm/amdkfd: Fix gfx8 MEM_VIOL exception handler
>    drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL
>    drm/amdkfd: Fix gfx9 XNACK state save/restore
>    drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]
>
> Kent Russell (2):
>    drm/amdkfd: Add VegaM support
>    drm/amdgpu: Fix GTT size calculation
>
> Oak Zeng (16):
>    drm/amdkfd: Use 64 bit sdma_bitmap
>    drm/amdkfd: Add sdma allocation debug message
>    drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id
>    drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd
>    drm/amdkfd: Fix a potential memory leak
>    drm/amdkfd: Introduce asic-specific mqd_manager_init function
>    drm/amdkfd: Introduce DIQ type mqd manager
>    drm/amdkfd: Init mqd managers in device queue manager init
>    drm/amdkfd: Add mqd size in mqd manager struct
>    drm/amdkfd: Allocate MQD trunk for HIQ and SDMA
>    drm/amdkfd: Move non-sdma mqd allocation out of init_mqd
>    drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk
>    drm/amdkfd: Fix sdma queue map issue
>    drm/amdkfd: Introduce XGMI SDMA queue type
>    drm/amdkfd: Expose sdma engine numbers to topology
>    drm/amdkfd: Delete alloc_format field from map_queue struct
>
> Yong Zhao (1):
>    drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue()
>
> shaoyunl (1):
>    drm/amdgpu: Use heavy weight for tlb invalidation on xgmi
>      configuration
>
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  53 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   9 +-
>   .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 483 +++++++++---------
>   .../drm/amd/amdkfd/cwsr_trap_handler_gfx8.asm |  13 -
>   .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |  63 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   2 +
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c         |   5 +
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  51 ++
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 354 ++++++++-----
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  14 +-
>   .../amd/amdkfd/kfd_device_queue_manager_cik.c |   2 +
>   .../amd/amdkfd/kfd_device_queue_manager_v9.c  |   1 +
>   .../amd/amdkfd/kfd_device_queue_manager_vi.c  |   2 +
>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c  |   1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |   6 +-
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |   4 +-
>   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |   4 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  |  70 ++-
>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   8 +
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |  53 +-
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  85 +--
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |  53 +-
>   .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |   4 +-
>   .../gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h   |   7 +-
>   .../gpu/drm/amd/amdkfd/kfd_pm4_headers_vi.h   |   7 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  14 +-
>   .../amd/amdkfd/kfd_process_queue_manager.c    |  14 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c     |  13 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h     |   2 +
>   drivers/gpu/drm/amd/include/cik_structs.h     |   3 +-
>   drivers/gpu/drm/amd/include/v9_structs.h      |   3 +-
>   drivers/gpu/drm/amd/include/vi_structs.h      |   3 +-
>   include/uapi/linux/kfd_ioctl.h                |   7 +-
>   33 files changed, 826 insertions(+), 587 deletions(-)
>

* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]             ` <1b1ec993-1c4b-8661-9b3f-ac0ad8ae64c7-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-30  9:32               ` Christian König
       [not found]                 ` <134a4999-776f-44c6-99a2-42e8b9366a73-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Christian König @ 2019-04-30  9:32 UTC (permalink / raw)
  To: Kuehling, Felix, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Russell, Kent

Am 30.04.19 um 01:16 schrieb Kuehling, Felix:
> On 2019-04-29 8:34 a.m., Christian König wrote:
>> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>>> From: Kent Russell <kent.russell@amd.com>
>>>
>>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>>> system memory. This severely limits the quantity of system memory
>>> that can be used by ROCm applications.
>>>
>>> Increase GTT size to the maximum of VRAM size or system memory size.
>> Well, NAK.
>>
>> This limit was done on purpose because otherwise the max texture
>> size would crash the system: the OOM killer would kick in and
>> cause a system panic.
>>
>> Using more than 75% of system memory for the GPU at the same time
>> makes the system unstable, so we can't allow that by default.
> Like we discussed, the current implementation is too limiting. On a Fiji
> system with 4GB VRAM and 32GB system memory, it limits system memory
> allocations to 4GB. I think this workaround was fixed once before and
> reverted because it broke a CZ system with 1GB system memory. So I
> suspect that this is an issue affecting small memory systems where maybe
> the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
> panics.

Well, it not only broke on a 1GB CZ system; that was just where Andrey 
reproduced it. We got reports from all kinds of systems.

> The OOM killer problem is a more general problem that potentially
> affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
> inadequate workaround at best. I'd like to look for a better solution,
> probably some adjustment of the TTM system memory limits on systems with
> small memory, to avoid OOM panics on such systems.

The core problem here is that the OOM killer explicitly doesn't want 
to block for shaders to finish whatever they are doing.

So currently as soon as the hardware is using some memory it can't be 
reclaimed immediately.

The original limit in TTM was 2/3 of system memory; that worked really 
reliably, and we ran into problems only after raising it to 3/4.

To sum it up, using almost all system memory for the GPU is simply not 
possible upstream, and even in a production system it is rather 
questionable.

The only real solution I can see is to be able to reliably kill 
shaders in an OOM situation.

Regards,
Christian.

>
> Regards,
>     Felix
>
>
>> What could maybe work is to reduce the amount of system memory by a
>> fixed factor, but offhand I don't see a way of fixing this in general.
>>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
>>>    1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index c14198737dcd..e9ecc3953673 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>>>            struct sysinfo si;
>>>              si_meminfo(&si);
>>> -        gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>>> -                   adev->gmc.mc_vram_size),
>>> -                   ((uint64_t)si.totalram * si.mem_unit * 3/4));
>>> -    }
>>> -    else
>>> +        gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>>> +                adev->gmc.mc_vram_size,
>>> +                ((uint64_t)si.totalram * si.mem_unit));
>>> +    } else
>>>            gtt_size = (uint64_t)amdgpu_gtt_size << 20;
>>>          /* Initialize GTT memory pool */

* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]                 ` <134a4999-776f-44c6-99a2-42e8b9366a73-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-04-30 15:36                   ` Kuehling, Felix
       [not found]                     ` <9f882acd-c48f-3bbd-2d90-659c2edead39-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-30 15:36 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Russell, Kent


On 2019-04-30 5:32 a.m., Christian König wrote:
> Am 30.04.19 um 01:16 schrieb Kuehling, Felix:
>> On 2019-04-29 8:34 a.m., Christian König wrote:
>>> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>>>> From: Kent Russell <kent.russell@amd.com>
>>>>
>>>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>>>> system memory. This severely limits the quantity of system memory
>>>> that can be used by ROCm applications.
>>>>
>>>> Increase GTT size to the maximum of VRAM size or system memory size.
>>> Well, NAK.
>>>
>>> This limit was done on purpose because otherwise the max texture
>>> size would crash the system: the OOM killer would kick in and
>>> cause a system panic.
>>>
>>> Using more than 75% of system memory for the GPU at the same time
>>> makes the system unstable, so we can't allow that by default.
>> Like we discussed, the current implementation is too limiting. On a Fiji
>> system with 4GB VRAM and 32GB system memory, it limits system memory
>> allocations to 4GB. I think this workaround was fixed once before and
>> reverted because it broke a CZ system with 1GB system memory. So I
>> suspect that this is an issue affecting small memory systems where maybe
>> the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
>> panics.
>
> Well, it not only broke on a 1GB CZ system; that was just where
> Andrey reproduced it. We got reports from all kinds of systems.

I'd like to see those reports. This patch has been included in Linux Pro 
releases since 18.20. I'm not aware that anyone complained about it.


>
>> The OOM killer problem is a more general problem that potentially
>> affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
>> inadequate workaround at best. I'd like to look for a better solution,
>> probably some adjustment of the TTM system memory limits on systems with
>> small memory, to avoid OOM panics on such systems.
>
> The core problem here is that the OOM killer explicitly doesn't want
> to block for shaders to finish whatever they are doing.
>
> So currently as soon as the hardware is using some memory it can't be
> reclaimed immediately.
>
> The original limit in TTM was 2/3 of system memory; that worked
> really reliably, and we ran into problems only after raising it to 3/4.

The TTM system memory limit is still 3/8 soft and 1/2 hard, 3/4 for 
emergencies. See ttm_mem_init_kernel_zone. AFAICT, the emergency limit 
is only available to root.
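
Roughly, from ttm_mem_init_kernel_zone (paraphrased from memory, so 
double-check the details):

    /* mem = total kernel-addressable system memory */
    zone->max_mem    = mem >> 1;                   /* 1/2: hard limit */
    zone->emer_mem   = (mem >> 1) + (mem >> 2);    /* 3/4: emergency, root only */
    zone->swap_limit = zone->max_mem - (mem >> 3); /* 3/8: soft limit */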

This GTT limit kicks in before I get anywhere close to the TTM limit. 
That's why I think it is both broken and redundant.


>
> To sum it up, using almost all system memory for the GPU is simply
> not possible upstream, and even in a production system it is rather
> questionable.

It should be doable with userptr, which now uses unpinned pages through 
HMM. Currently the GTT limit affects the largest possible userptr 
allocation, though not the total sum of all userptr allocations. Maybe 
making userptr completely independent of GTT size would be an easier 
problem to tackle. Then I can leave the GTT limit alone.


>
> The only real solution I can see is to be able to reliably kill
> shaders in an OOM situation.

Well, we can in fact preempt our compute shaders with low latency. 
Killing a KFD process will do exactly that.

Regards,
   Felix


>
> Regards,
> Christian.
>
>>
>> Regards,
>>     Felix
>>
>>
>>> What could maybe work is to reduce the amount of system memory by
>>> a fixed factor, but offhand I don't see a way of fixing this in
>>> general.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 ++++-----
>>>>    1 file changed, 4 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index c14198737dcd..e9ecc3953673 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -1740,11 +1740,10 @@ int amdgpu_ttm_init(struct amdgpu_device 
>>>> *adev)
>>>>            struct sysinfo si;
>>>>              si_meminfo(&si);
>>>> -        gtt_size = min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>>>> -                   adev->gmc.mc_vram_size),
>>>> -                   ((uint64_t)si.totalram * si.mem_unit * 3/4));
>>>> -    }
>>>> -    else
>>>> +        gtt_size = max3((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),
>>>> +                adev->gmc.mc_vram_size,
>>>> +                ((uint64_t)si.totalram * si.mem_unit));
>>>> +    } else
>>>>            gtt_size = (uint64_t)amdgpu_gtt_size << 20;
>>>>          /* Initialize GTT memory pool */

* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]                     ` <9f882acd-c48f-3bbd-2d90-659c2edead39-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-30 17:03                       ` Koenig, Christian
       [not found]                         ` <f5c698ad-2aff-b3c5-2041-05a10983438a-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Koenig, Christian @ 2019-04-30 17:03 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Russell, Kent

Am 30.04.19 um 17:36 schrieb Kuehling, Felix:
> On 2019-04-30 5:32 a.m., Christian König wrote:
>> Am 30.04.19 um 01:16 schrieb Kuehling, Felix:
>>> On 2019-04-29 8:34 a.m., Christian König wrote:
>>>> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>>>>> From: Kent Russell <kent.russell@amd.com>
>>>>>
>>>>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>>>>> system memory. This severely limits the quantity of system memory
>>>>> that can be used by ROCm applications.
>>>>>
>>>>> Increase GTT size to the maximum of VRAM size or system memory size.
>>>> Well, NAK.
>>>>
>>>> This limit was done on purpose because otherwise the max texture
>>>> size would crash the system: the OOM killer would kick in and
>>>> cause a system panic.
>>>>
>>>> Using more than 75% of system memory for the GPU at the same time
>>>> makes the system unstable, so we can't allow that by default.
>>> Like we discussed, the current implementation is too limiting. On a Fiji
>>> system with 4GB VRAM and 32GB system memory, it limits system memory
>>> allocations to 4GB. I think this workaround was fixed once before and
>>> reverted because it broke a CZ system with 1GB system memory. So I
>>> suspect that this is an issue affecting small memory systems where maybe
>>> the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
>>> panics.
>> Well, it not only broke on a 1GB CZ system; that was just where
>> Andrey reproduced it. We got reports from all kinds of systems.
> I'd like to see those reports. This patch has been included in Linux Pro
> releases since 18.20. I'm not aware that anyone complained about it.

Well, to be honest, our Pro driver is actually not used that widely, 
and only on rather homogeneous systems.

Which is not really surprising, since we only advise using it for 
professional use cases.

>>> The OOM killer problem is a more general problem that potentially
>>> affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
>>> inadequate workaround at best. I'd like to look for a better solution,
>>> probably some adjustment of the TTM system memory limits on systems with
>>> small memory, to avoid OOM panics on such systems.
>> The core problem here is that the OOM killer explicitly doesn't
>> want to block for shaders to finish whatever they are doing.
>>
>> So currently as soon as the hardware is using some memory it can't be
>> reclaimed immediately.
>>
>> The original limit in TTM was 2/3 of system memory; that worked
>> really reliably, and we ran into problems only after raising it to 3/4.
> The TTM system memory limit is still 3/8 soft and 1/2 hard, 3/4 for
> emergencies. See ttm_mem_init_kernel_zone. AFAICT, the emergency limit
> is only available to root.

Ah! I think I know why those limits don't kick in here!

When GTT space is used by evictions from VRAM then we will use the 
emergency limit as well.

> This GTT limit kicks in before I get anywhere close to the TTM limit.
> That's why I think it is both broken and redundant.

That was also the argument when we removed it the last time, but it got 
immediately reverted.

>> To sum it up, using almost all system memory for the GPU is simply
>> not possible upstream, and even in a production system it is rather
>> questionable.
> It should be doable with userptr, which now uses unpinned pages through
> HMM. Currently the GTT limit affects the largest possible userptr
> allocation, though not the total sum of all userptr allocations. Maybe
> making userptr completely independent of GTT size would be an easier
> problem to tackle. Then I can leave the GTT limit alone.

Well, this way we would only address the symptoms, not the real problem.

>> The only real solution I can see is to be able to reliably kill
>> shaders in an OOM situation.
> Well, we can in fact preempt our compute shaders with low latency.
> Killing a KFD process will do exactly that.

I've taken a look at that thing as well and to be honest it is not even 
remotely sufficient.

We need something which stops the hardware *immediately* from 
accessing system memory, rather than waiting for the SQ to kill all 
waves, flush caches, etc.

One possibility I've been playing around with for a while is to 
replace the root PD for the VMIDs in question on the fly, e.g. just 
let it point to some dummy which redirects everything into nirvana.

But implementing this is easier said than done...
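
In pseudo-code the direction would be something like this (all names
invented, just to illustrate):

    /* park a VMID on a dummy root PD that maps nothing, so further
     * accesses fault instead of hitting system memory; would have to
     * run from a path that takes no locks and allocates no memory */
    static void park_vmid_on_dummy_pd(struct amdgpu_device *adev,
                                      unsigned int vmid, u64 dummy_pd)
    {
        write_vm_pd_base(adev, vmid, dummy_pd);  /* invented helper */
        request_vm_tlb_flush(adev, vmid);        /* invented helper */
    }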

Regards,
Christian.


* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]                         ` <f5c698ad-2aff-b3c5-2041-05a10983438a-5C7GfCeVMHo@public.gmane.org>
@ 2019-04-30 17:25                           ` Kuehling, Felix
       [not found]                             ` <8ba952ab-4836-4ca3-cd80-99f7367a7979-5C7GfCeVMHo@public.gmane.org>
  2019-07-13 20:24                           ` Felix Kuehling
  1 sibling, 1 reply; 37+ messages in thread
From: Kuehling, Felix @ 2019-04-30 17:25 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Russell, Kent

On 2019-04-30 1:03 p.m., Koenig, Christian wrote:
> Am 30.04.19 um 17:36 schrieb Kuehling, Felix:
>> On 2019-04-30 5:32 a.m., Christian König wrote:
>>> Am 30.04.19 um 01:16 schrieb Kuehling, Felix:
>>>> On 2019-04-29 8:34 a.m., Christian König wrote:
>>>>> Am 28.04.19 um 09:44 schrieb Kuehling, Felix:
>>>>>> From: Kent Russell <kent.russell@amd.com>
>>>>>>
>>>>>> GTT size is currently limited to the minimum of VRAM size or 3/4 of
>>>>>> system memory. This severely limits the quantity of system memory
>>>>>> that can be used by ROCm applications.
>>>>>>
>>>>>> Increase GTT size to the maximum of VRAM size or system memory size.
>>>>> Well, NAK.
>>>>>
>>>>> This limit was done on purpose because otherwise the max texture
>>>>> size would crash the system: the OOM killer would kick in and
>>>>> cause a system panic.
>>>>>
>>>>> Using more than 75% of system memory for the GPU at the same time
>>>>> makes the system unstable, so we can't allow that by default.
>>>> Like we discussed, the current implementation is too limiting. On a Fiji
>>>> system with 4GB VRAM and 32GB system memory, it limits system memory
>>>> allocations to 4GB. I think this workaround was fixed once before and
>>>> reverted because it broke a CZ system with 1GB system memory. So I
>>>> suspect that this is an issue affecting small memory systems where maybe
>>>> the 1/2 system memory limit in TTM isn't sufficient to protect from OOM
>>>> panics.
>>> Well, it not only broke on a 1GB CZ system; that was just where
>>> Andrey reproduced it. We got reports from all kinds of systems.
>> I'd like to see those reports. This patch has been included in Linux Pro
>> releases since 18.20. I'm not aware that anyone complained about it.
> Well, to be honest, our Pro driver is actually not used that widely,
> and only on rather homogeneous systems.
>
> Which is not really surprising, since we only advise using it for
> professional use cases.
>
>>>> The OOM killer problem is a more general problem that potentially
>>>> affects other drivers too. Keeping this GTT limit broken in AMDGPU is an
>>>> inadequate workaround at best. I'd like to look for a better solution,
>>>> probably some adjustment of the TTM system memory limits on systems with
>>>> small memory, to avoid OOM panics on such systems.
>>> The core problem here is that the OOM killer explicitly doesn't
>>> want to block for shaders to finish whatever they are doing.
>>>
>>> So currently as soon as the hardware is using some memory it can't be
>>> reclaimed immediately.
>>>
>>> The original limit in TTM was 2/3 of system memory; that worked
>>> really reliably, and we ran into problems only after raising it to
>>> 3/4.
>> The TTM system memory limit is still 3/8 soft and 1/2 hard, 3/4 for
>> emergencies. See ttm_mem_init_kernel_zone. AFAICT, the emergency limit
>> is only available to root.
> Ah! I think I know why those limits don't kick in here!
>
> When GTT space is used by evictions from VRAM then we will use the
> emergency limit as well.
>
>> This GTT limit kicks in before I get anywhere close to the TTM limit.
>> That's why I think it is both broken and redundant.
> That was also the argument when we removed it the last time, but it got
> immediately reverted.
>
>>> To sum it up, using almost all system memory for the GPU is simply
>>> not possible upstream, and even in a production system it is
>>> rather questionable.
>> It should be doable with userptr, which now uses unpinned pages through
>> HMM. Currently the GTT limit affects the largest possible userptr
>> allocation, though not the total sum of all userptr allocations. Maybe
>> making userptr completely independent of GTT size would be an easier
>> problem to tackle. Then I can leave the GTT limit alone.
> Well, this way we would only address the symptoms, not the real problem.

It allocates pages in user mode rather than kernel mode. That means, OOM 
situations take a completely different code path. Before running out of 
memory completely, triggering the OOM killer, the kernel would start 
swapping pages, which would trigger the MMU notifier to stop the user 
mode queues or invalidate GPU page table entries, and allow the pages to 
be swapped out.
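
Schematically the flow looks like this (names invented, standing in 
for the actual amdkfd entry points):

    /* called before pages in [range->start, range->end) are swapped out */
    static int example_invalidate_range_start(struct mmu_notifier *mn,
            const struct mmu_notifier_range *range)
    {
        struct example_process *p =
            container_of(mn, struct example_process, mn);

        /* preempt the user mode queues so the GPU stops touching
         * the affected pages ... */
        example_suspend_queues(p);
        /* ... and drop the matching GPU page table entries so the
         * pages are safe to swap out */
        example_invalidate_gpu_ptes(p, range->start, range->end);
        return 0;
    }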


>
>>> The only real solution I can see is to be able to reliably kill
>>> shaders in an OOM situation.
>> Well, we can in fact preempt our compute shaders with low latency.
>> Killing a KFD process will do exactly that.
> I've taken a look at that thing as well and to be honest it is not even
> remotely sufficient.
>
> We need something which stops the hardware *immediately* from
> accessing system memory, rather than waiting for the SQ to kill all
> waves, flush caches, etc.

It's apparently sufficient to use in our MMU notifier. There is also a 
way to disable the grace period that allows short waves to complete 
before being preempted, though we're not using that at the moment.


>
> One possibility I've been playing around with for a while is to
> replace the root PD for the VMIDs in question on the fly, e.g. just
> let it point to some dummy which redirects everything into nirvana.

Even that's not sufficient. You'll also need to free the pages 
immediately. For KFD processes, cleaning up of memory is done in a 
worker thread that gets kicked off by a release MMU notifier when the 
process' mm_struct is taken down.
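
Roughly like this (names invented; the real code lives in the KFD 
process teardown path):

    /* release notifier: runs when the process' mm_struct goes away */
    static void example_mn_release(struct mmu_notifier *mn,
                                   struct mm_struct *mm)
    {
        struct example_process *p =
            container_of(mn, struct example_process, mn);

        /* BO cleanup can block on fences, so defer it to a worker
         * instead of doing it in notifier context */
        schedule_work(&p->release_work);
    }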

Then there is still TTM's delayed freeing of BOs that waits for fences. 
So you'd need to signal all the BO fences to allow them to be released.

TBH, I don't understand why waiting is not an option, if the alternative 
is a kernel panic. If your OOM killer kicks in, your system is basically 
dead. Waiting for a fraction of a second to let a GPU finish its memory 
access should be a small price to pay in that situation.

Regards,
   Felix


>
> But implementing this is easier said than done...
>
> Regards,
> Christian.
>

* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]                             ` <8ba952ab-4836-4ca3-cd80-99f7367a7979-5C7GfCeVMHo@public.gmane.org>
@ 2019-05-02 13:06                               ` Koenig, Christian
  0 siblings, 0 replies; 37+ messages in thread
From: Koenig, Christian @ 2019-05-02 13:06 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Russell, Kent

Am 30.04.19 um 19:25 schrieb Kuehling, Felix:
> [SNIP]
>>>> To sum it up, using almost all system memory for the GPU is
>>>> simply not possible upstream, and even in a production system it
>>>> is rather questionable.
>>> It should be doable with userptr, which now uses unpinned pages through
>>> HMM. Currently the GTT limit affects the largest possible userptr
>>> allocation, though not the total sum of all userptr allocations. Maybe
>>> making userptr completely independent of GTT size would be an easier
>>> problem to tackle. Then I can leave the GTT limit alone.
>> Well, this way we would only address the symptoms, not the real problem.
> It allocates pages in user mode rather than kernel mode. That means, OOM
> situations take a completely different code path. Before running out of
> memory completely, triggering the OOM killer, the kernel would start
> swapping pages, which would trigger the MMU notifier to stop the user
> mode queues or invalidate GPU page table entries, and allow the pages to
> be swapped out.

Well it at least removes the extra layer in TTM we have here.

But what I meant is that it still doesn't fix our underlying problem of 
stopping the hardware immediately.

>>>> The only real solution I can see is to be able to reliably kill
>>>> shaders in an OOM situation.
>>> Well, we can in fact preempt our compute shaders with low latency.
>>> Killing a KFD process will do exactly that.
>> I've taken a look at that thing as well and to be honest it is not even
>> remotely sufficient.
>>
>> We need something which stops the hardware *immediately* from
>> accessing system memory, rather than waiting for the SQ to kill all
>> waves, flush caches, etc.
> It's apparently sufficient to use in our MMU notifier. There is also a
> way to disable the grace period that allows short waves to complete
> before being preempted, though we're not using that at the moment.
>
>
>> One possibility I've been playing around with for a while is to
>> replace the root PD for the VMIDs in question on the fly, e.g. just
>> let it point to some dummy which redirects everything into nirvana.
> Even that's not sufficient. You'll also need to free the pages
> immediately. For KFD processes, cleaning up of memory is done in a
> worker thread that gets kicked off by a release MMU notifier when the
> process' mm_struct is taken down.

Yeah, that worker is what I meant when I said this whole thing is not 
sufficient. BTW: how does that still work with HMM? I mean, HMM doesn't 
take a reference to the pages any more.

But let me say it differently: when we want the OOM killer to work 
correctly, we need a short cut path which doesn't take any locks or 
allocate memory.

What we can do is: write some registers and then maybe busy-wait for 
the TLB flush to complete.

What we can't do is: wait for the SQ to kill waves, wait for fences, 
etc.
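
Something of this shape, with invented register names just to show 
what such a short cut could look like:

    /* hypothetical short cut: no locks, no allocations, just MMIO */
    WREG32(mmEXAMPLE_VM_INVALIDATE_REQ, 1 << vmid);  /* invented reg */
    while (!(RREG32(mmEXAMPLE_VM_INVALIDATE_ACK) & (1 << vmid)))
        cpu_relax();  /* busy-wait for the TLB flush to complete */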

> Then there is still TTM's delayed freeing of BOs that waits for fences.
> So you'd need to signal all the BO fences to allow them to be released.

Actually, that doesn't apply in the critical code path. In this 
situation TTM tries to free up the things it doesn't need to wait on 
immediately.

What we are missing here is something like a kill interface for fences 
maybe...

> TBH, I don't understand why waiting is not an option, if the alternative
> is a kernel panic. If your OOM killer kicks in, your system is basically
> dead. Waiting for a fraction of a second to let a GPU finish its memory
> access should be a small price to pay in that situation.

Oh, yeah, that is a really good point as well.

I think that this restriction was created to make sure that the OOM 
killer always makes progress and doesn't wait for things like network 
congestion.

Now, the Linux MM is not really made for long-term I/O mappings 
anyway. That was also recently a topic on the lists in the context of 
HMM (there is an LWN summary about it). Probably worth bringing that 
discussion up once more.

Christian.

>
> Regards,
>     Felix
>
>
>> But implementing this is easier said than done...
>>
>> Regards,
>> Christian.
>>


* Re: [PATCH 27/27] drm/amdgpu: Fix GTT size calculation
       [not found]                         ` <f5c698ad-2aff-b3c5-2041-05a10983438a-5C7GfCeVMHo@public.gmane.org>
  2019-04-30 17:25                           ` Kuehling, Felix
@ 2019-07-13 20:24                           ` Felix Kuehling
  1 sibling, 0 replies; 37+ messages in thread
From: Felix Kuehling @ 2019-07-13 20:24 UTC (permalink / raw)
  To: Koenig, Christian, Kuehling, Felix,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Russell, Kent


Am 2019-04-30 um 1:03 p.m. schrieb Koenig, Christian:
>>> The only real solution I can see is to be able to reliably kill
>>> shaders in an OOM situation.
>> Well, we can in fact preempt our compute shaders with low latency.
>> Killing a KFD process will do exactly that.
> I've taken a look at that thing as well and to be honest it is not even 
> remotely sufficient.
>
> We need something which stops the hardware *immediately* from
> accessing system memory, rather than waiting for the SQ to kill all
> waves, flush caches, etc.
>
> One possibility I've been playing around with for a while is to
> replace the root PD for the VMIDs in question on the fly, e.g. just
> let it point to some dummy which redirects everything into nirvana.
>
> But implementing this is easier said than done...

Warming up this thread, since I just fixed another bug that was
exposed by the artificial memory pressure from the GTT limit.

I think disabling the PD for the VMIDs is a good idea. A problem is that
HWS firmware updates PD pointers in the background for its VMIDs. So
this would require a reliable and fast way to kill the HWS first.

An alternative I thought about is disabling bus access at the BIF
level, if that's possible somehow. Basically we would instantaneously
kill all GPU system memory access, signal all fences or just remove
all fences from all BO reservations
(reservation_object_add_excl_fence(resv, NULL)) to allow memory to be
freed, let the OOM killer do its thing, and when the dust settles,
reset the GPU.
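
For the fence part, per BO it could look roughly like this (whether
clearing the exclusive fence this way is actually safe is exactly the
open question; shared fences would need the same treatment):

    reservation_object_lock(bo->tbo.resv, NULL);
    /* drop the exclusive fence so TTM's delayed freeing has nothing
     * left to wait on */
    reservation_object_add_excl_fence(bo->tbo.resv, NULL);
    reservation_object_unlock(bo->tbo.resv);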

Regards,
  Felix

>
> Regards,
> Christian.


Thread overview: 37+ messages
2019-04-28  7:44 [PATCH 00/27] KFD upstreaming Kuehling, Felix
     [not found] ` <20190428074331.30107-1-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2019-04-28  7:44   ` [PATCH 01/27] drm/amdkfd: Use 64 bit sdma_bitmap Kuehling, Felix
2019-04-28  7:44   ` [PATCH 02/27] drm/amdkfd: Add sdma allocation debug message Kuehling, Felix
2019-04-28  7:44   ` [PATCH 03/27] drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id Kuehling, Felix
2019-04-28  7:44   ` [PATCH 05/27] drm/amdkfd: Fix a potential memory leak Kuehling, Felix
2019-04-28  7:44   ` [PATCH 04/27] drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd Kuehling, Felix
2019-04-28  7:44   ` [PATCH 06/27] drm/amdkfd: Introduce asic-specific mqd_manager_init function Kuehling, Felix
2019-04-28  7:44   ` [PATCH 07/27] drm/amdkfd: Introduce DIQ type mqd manager Kuehling, Felix
2019-04-28  7:44   ` [PATCH 08/27] drm/amdkfd: Init mqd managers in device queue manager init Kuehling, Felix
2019-04-28  7:44   ` [PATCH 09/27] drm/amdkfd: Add mqd size in mqd manager struct Kuehling, Felix
2019-04-28  7:44   ` [PATCH 10/27] drm/amdkfd: Allocate MQD trunk for HIQ and SDMA Kuehling, Felix
2019-04-28  7:44   ` [PATCH 11/27] drm/amdkfd: Move non-sdma mqd allocation out of init_mqd Kuehling, Felix
2019-04-28  7:44   ` [PATCH 12/27] drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk Kuehling, Felix
2019-04-28  7:44   ` [PATCH 13/27] drm/amdkfd: Move sdma_queue_id calculation into allocate_sdma_queue() Kuehling, Felix
2019-04-28  7:44   ` [PATCH 14/27] drm/amdkfd: Fix compute profile switching Kuehling, Felix
2019-04-28  7:44   ` [PATCH 15/27] drm/amdkfd: Fix sdma queue map issue Kuehling, Felix
2019-04-28  7:44   ` [PATCH 16/27] drm/amdkfd: Introduce XGMI SDMA queue type Kuehling, Felix
2019-04-28  7:44   ` [PATCH 17/27] drm/amdkfd: Expose sdma engine numbers to topology Kuehling, Felix
2019-04-28  7:44   ` [PATCH 18/27] drm/amdkfd: Delete alloc_format field from map_queue struct Kuehling, Felix
2019-04-28  7:44   ` [PATCH 19/27] drm/amdkfd: Fix a circular lock dependency Kuehling, Felix
2019-04-28  7:44   ` [PATCH 20/27] drm/amdkfd: Fix gfx8 MEM_VIOL exception handler Kuehling, Felix
2019-04-28  7:44   ` [PATCH 21/27] drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL Kuehling, Felix
2019-04-28  7:44   ` [PATCH 22/27] drm/amdkfd: Fix gfx9 XNACK state save/restore Kuehling, Felix
2019-04-28  7:44   ` [PATCH 23/27] drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15] Kuehling, Felix
2019-04-28  7:44   ` [PATCH 24/27] drm/amdkfd: Add VegaM support Kuehling, Felix
2019-04-28  7:44   ` [PATCH 25/27] drm/amdkfd: Add domain number into gpu_id Kuehling, Felix
2019-04-28  7:44   ` [PATCH 26/27] drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration Kuehling, Felix
2019-04-28  7:44   ` [PATCH 27/27] drm/amdgpu: Fix GTT size calculation Kuehling, Felix
     [not found]     ` <20190428074331.30107-28-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2019-04-29 12:34       ` Christian König
     [not found]         ` <86fa9fc3-7a8f-9855-ae1d-5c7ccf2b5260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-29 23:16           ` Kuehling, Felix
     [not found]             ` <1b1ec993-1c4b-8661-9b3f-ac0ad8ae64c7-5C7GfCeVMHo@public.gmane.org>
2019-04-30  9:32               ` Christian König
     [not found]                 ` <134a4999-776f-44c6-99a2-42e8b9366a73-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-30 15:36                   ` Kuehling, Felix
     [not found]                     ` <9f882acd-c48f-3bbd-2d90-659c2edead39-5C7GfCeVMHo@public.gmane.org>
2019-04-30 17:03                       ` Koenig, Christian
     [not found]                         ` <f5c698ad-2aff-b3c5-2041-05a10983438a-5C7GfCeVMHo@public.gmane.org>
2019-04-30 17:25                           ` Kuehling, Felix
     [not found]                             ` <8ba952ab-4836-4ca3-cd80-99f7367a7979-5C7GfCeVMHo@public.gmane.org>
2019-05-02 13:06                               ` Koenig, Christian
2019-07-13 20:24                           ` Felix Kuehling
2019-04-29 23:23   ` [PATCH 00/27] KFD upstreaming Kuehling, Felix
