amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
@ 2020-07-31  7:51 Monk Liu
  2020-07-31  7:51 ` [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) Monk Liu
  2020-07-31 13:57 ` [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Felix Kuehling
  0 siblings, 2 replies; 8+ messages in thread
From: Monk Liu @ 2020-07-31  7:51 UTC (permalink / raw)
  To: amd-gfx; +Cc: Monk Liu

KIQ will hang if we try below steps:
modprobe amdgpu
rmmod amdgpu
modprobe amdgpu sched_hw_submission=4

the cause is that due to KIQ is always living there even
after we unload KMD thus when doing the realod of KMD
KIQ will crash upon its register programed with different
values with the previous configuration (the config
like HQD addr, ring size, is easily changed if we alter
the sched_hw_submission)

the fix is we must inactive KIQ first before touching any
of its registgers

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index db9f1e8..f571e25 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
 	struct v10_compute_mqd *mqd = ring->mqd_ptr;
 	int j;
 
+	/* activate the queue */
+	WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
+
 	/* disable wptr polling */
 	WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)
  2020-07-31  7:51 [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Monk Liu
@ 2020-07-31  7:51 ` Monk Liu
  2020-07-31 13:53   ` Felix Kuehling
  2020-07-31 13:57 ` [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Felix Kuehling
  1 sibling, 1 reply; 8+ messages in thread
From: Monk Liu @ 2020-07-31  7:51 UTC (permalink / raw)
  To: amd-gfx; +Cc: Monk Liu

what:
the MQD's save and restore of KCQ (kernel compute queue)
cost lots of clocks during world switch which impacts a lot
to multi-VF performance

how:
introduce a paramter to control the number of KCQ to avoid
performance drop if there is no kernel compute queue needed

notes:
this paramter only affects gfx 8/9/10

v2:
refine namings

v3:
choose queues for each ring to that try best to cross pipes evenly.

v4:
fix indentation
some cleanupsin the gfx_compute_queue_acquire()

v5:
further fix on indentations
more cleanupsin gfx_compute_queue_acquire()

TODO:
in the future we will let hypervisor driver to set this paramter
automatically thus no need for user to configure it through
modprobe in virtual machine

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 49 ++++++++++++------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     | 30 +++++++++---------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 29 +++++++++---------
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 31 ++++++++++---------
 7 files changed, 76 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e97c088..de11136 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -201,6 +201,7 @@ extern int amdgpu_si_support;
 #ifdef CONFIG_DRM_AMDGPU_CIK
 extern int amdgpu_cik_support;
 #endif
+extern int amdgpu_num_kcq;
 
 #define AMDGPU_VM_MAX_NUM_CTX			4096
 #define AMDGPU_SG_THRESHOLD			(256*1024*1024)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 62ecac9..cf445bab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1199,6 +1199,11 @@ static int amdgpu_device_check_arguments(struct amdgpu_device *adev)
 
 	amdgpu_gmc_tmz_set(adev);
 
+	if (amdgpu_num_kcq > 8 || amdgpu_num_kcq < 0) {
+		amdgpu_num_kcq = 8;
+		dev_warn(adev->dev, "set kernel compute queue number to 8 due to invalid paramter provided by user\n");
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6291f5f..b545c40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -150,6 +150,7 @@ int amdgpu_noretry;
 int amdgpu_force_asic_type = -1;
 int amdgpu_tmz = 0;
 int amdgpu_reset_method = -1; /* auto */
+int amdgpu_num_kcq = -1;
 
 struct amdgpu_mgpu_info mgpu_info = {
 	.mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
@@ -765,6 +766,9 @@ module_param_named(tmz, amdgpu_tmz, int, 0444);
 MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco)");
 module_param_named(reset_method, amdgpu_reset_method, int, 0444);
 
+MODULE_PARM_DESC(num_kcq, "number of kernel compute queue user want to setup (8 if set to greater than 8 or less than 0, only affect gfx 8+)");
+module_param_named(num_kcq, amdgpu_num_kcq, int, 0444);
+
 static const struct pci_device_id pciidlist[] = {
 #ifdef  CONFIG_DRM_AMDGPU_SI
 	{0x1002, 0x6780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 8eff017..0cd9de6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -202,40 +202,29 @@ bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
 
 void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
 {
-	int i, queue, pipe, mec;
+	int i, queue, pipe;
 	bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
-
-	/* policy for amdgpu compute queue ownership */
-	for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
-		queue = i % adev->gfx.mec.num_queue_per_pipe;
-		pipe = (i / adev->gfx.mec.num_queue_per_pipe)
-			% adev->gfx.mec.num_pipe_per_mec;
-		mec = (i / adev->gfx.mec.num_queue_per_pipe)
-			/ adev->gfx.mec.num_pipe_per_mec;
-
-		/* we've run out of HW */
-		if (mec >= adev->gfx.mec.num_mec)
-			break;
-
-		if (multipipe_policy) {
-			/* policy: amdgpu owns the first two queues of the first MEC */
-			if (mec == 0 && queue < 2)
-				set_bit(i, adev->gfx.mec.queue_bitmap);
-		} else {
-			/* policy: amdgpu owns all queues in the first pipe */
-			if (mec == 0 && pipe == 0)
-				set_bit(i, adev->gfx.mec.queue_bitmap);
+	int max_queues_per_mec = min(adev->gfx.mec.num_pipe_per_mec *
+				     adev->gfx.mec.num_queue_per_pipe,
+				     adev->gfx.num_compute_rings);
+
+	if (multipipe_policy) {
+		/* policy: make queues evenly cross all pipes on MEC1 only */
+		for (i = 0; i < max_queues_per_mec; i++) {
+			pipe = i % adev->gfx.mec.num_pipe_per_mec;
+			queue = (i / adev->gfx.mec.num_pipe_per_mec) %
+				adev->gfx.mec.num_queue_per_pipe;
+
+			set_bit(pipe * adev->gfx.mec.num_queue_per_pipe + queue,
+					adev->gfx.mec.queue_bitmap);
 		}
+	} else {
+		/* policy: amdgpu owns all queues in the given pipe */
+		for (i = 0; i < max_queues_per_mec; ++i)
+			set_bit(i, adev->gfx.mec.queue_bitmap);
 	}
 
-	/* update the number of active compute rings */
-	adev->gfx.num_compute_rings =
-		bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
-
-	/* If you hit this case and edited the policy, you probably just
-	 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
-	if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
-		adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
+	dev_dbg(adev->dev, "mec queue bitmap weight=%d\n", bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES));
 }
 
 void amdgpu_gfx_graphics_queue_acquire(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index f571e25..4172bc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4022,21 +4022,23 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev)
 	amdgpu_gfx_compute_queue_acquire(adev);
 	mec_hpd_size = adev->gfx.num_compute_rings * GFX10_MEC_HPD_SIZE;
 
-	r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
-				      AMDGPU_GEM_DOMAIN_GTT,
-				      &adev->gfx.mec.hpd_eop_obj,
-				      &adev->gfx.mec.hpd_eop_gpu_addr,
-				      (void **)&hpd);
-	if (r) {
-		dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
-		gfx_v10_0_mec_fini(adev);
-		return r;
-	}
+	if (mec_hpd_size) {
+		r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
+					      AMDGPU_GEM_DOMAIN_GTT,
+					      &adev->gfx.mec.hpd_eop_obj,
+					      &adev->gfx.mec.hpd_eop_gpu_addr,
+					      (void **)&hpd);
+		if (r) {
+			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
+			gfx_v10_0_mec_fini(adev);
+			return r;
+		}
 
-	memset(hpd, 0, mec_hpd_size);
+		memset(hpd, 0, mec_hpd_size);
 
-	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
-	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
+		amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
+		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
+	}
 
 	if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
 		mec_hdr = (const struct gfx_firmware_header_v1_0 *)adev->gfx.mec_fw->data;
@@ -7162,7 +7164,7 @@ static int gfx_v10_0_early_init(void *handle)
 		break;
 	}
 
-	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
+	adev->gfx.num_compute_rings = amdgpu_num_kcq;
 
 	gfx_v10_0_set_kiq_pm4_funcs(adev);
 	gfx_v10_0_set_ring_funcs(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 8d72089..7df567a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1343,21 +1343,22 @@ static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 	amdgpu_gfx_compute_queue_acquire(adev);
 
 	mec_hpd_size = adev->gfx.num_compute_rings * GFX8_MEC_HPD_SIZE;
+	if (mec_hpd_size) {
+		r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
+					      AMDGPU_GEM_DOMAIN_VRAM,
+					      &adev->gfx.mec.hpd_eop_obj,
+					      &adev->gfx.mec.hpd_eop_gpu_addr,
+					      (void **)&hpd);
+		if (r) {
+			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
+			return r;
+		}
 
-	r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
-				      AMDGPU_GEM_DOMAIN_VRAM,
-				      &adev->gfx.mec.hpd_eop_obj,
-				      &adev->gfx.mec.hpd_eop_gpu_addr,
-				      (void **)&hpd);
-	if (r) {
-		dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
-		return r;
-	}
-
-	memset(hpd, 0, mec_hpd_size);
+		memset(hpd, 0, mec_hpd_size);
 
-	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
-	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
+		amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
+		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
+	}
 
 	return 0;
 }
@@ -5294,7 +5295,7 @@ static int gfx_v8_0_early_init(void *handle)
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	adev->gfx.num_gfx_rings = GFX8_NUM_GFX_RINGS;
-	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
+	adev->gfx.num_compute_rings = amdgpu_num_kcq;
 	adev->gfx.funcs = &gfx_v8_0_gfx_funcs;
 	gfx_v8_0_set_ring_funcs(adev);
 	gfx_v8_0_set_irq_funcs(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index e4e751f..ef07e59 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1938,22 +1938,23 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev)
 	/* take ownership of the relevant compute queues */
 	amdgpu_gfx_compute_queue_acquire(adev);
 	mec_hpd_size = adev->gfx.num_compute_rings * GFX9_MEC_HPD_SIZE;
+	if (mec_hpd_size) {
+		r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
+					      AMDGPU_GEM_DOMAIN_VRAM,
+					      &adev->gfx.mec.hpd_eop_obj,
+					      &adev->gfx.mec.hpd_eop_gpu_addr,
+					      (void **)&hpd);
+		if (r) {
+			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
+			gfx_v9_0_mec_fini(adev);
+			return r;
+		}
 
-	r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
-				      AMDGPU_GEM_DOMAIN_VRAM,
-				      &adev->gfx.mec.hpd_eop_obj,
-				      &adev->gfx.mec.hpd_eop_gpu_addr,
-				      (void **)&hpd);
-	if (r) {
-		dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
-		gfx_v9_0_mec_fini(adev);
-		return r;
-	}
-
-	memset(hpd, 0, mec_hpd_size);
+		memset(hpd, 0, mec_hpd_size);
 
-	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
-	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
+		amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
+		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
+	}
 
 	mec_hdr = (const struct gfx_firmware_header_v1_0 *)adev->gfx.mec_fw->data;
 
@@ -4625,7 +4626,7 @@ static int gfx_v9_0_early_init(void *handle)
 		adev->gfx.num_gfx_rings = 0;
 	else
 		adev->gfx.num_gfx_rings = GFX9_NUM_GFX_RINGS;
-	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
+	adev->gfx.num_compute_rings = amdgpu_num_kcq;
 	gfx_v9_0_set_kiq_pm4_funcs(adev);
 	gfx_v9_0_set_ring_funcs(adev);
 	gfx_v9_0_set_irq_funcs(adev);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)
  2020-07-31  7:51 ` [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) Monk Liu
@ 2020-07-31 13:53   ` Felix Kuehling
  2020-10-15 15:05     ` Marek Olšák
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2020-07-31 13:53 UTC (permalink / raw)
  To: Monk Liu, amd-gfx

Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> what:
> the MQD's save and restore of KCQ (kernel compute queue)
> cost lots of clocks during world switch which impacts a lot
> to multi-VF performance
>
> how:
> introduce a paramter to control the number of KCQ to avoid
> performance drop if there is no kernel compute queue needed
>
> notes:
> this paramter only affects gfx 8/9/10
>
> v2:
> refine namings
>
> v3:
> choose queues for each ring to that try best to cross pipes evenly.
>
> v4:
> fix indentation
> some cleanupsin the gfx_compute_queue_acquire()
>
> v5:
> further fix on indentations
> more cleanupsin gfx_compute_queue_acquire()
>
> TODO:
> in the future we will let hypervisor driver to set this paramter
> automatically thus no need for user to configure it through
> modprobe in virtual machine
>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

This patch is Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 49 ++++++++++++------------------
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     | 30 +++++++++---------
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 29 +++++++++---------
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 31 ++++++++++---------
>  7 files changed, 76 insertions(+), 73 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index e97c088..de11136 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -201,6 +201,7 @@ extern int amdgpu_si_support;
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>  extern int amdgpu_cik_support;
>  #endif
> +extern int amdgpu_num_kcq;
>  
>  #define AMDGPU_VM_MAX_NUM_CTX			4096
>  #define AMDGPU_SG_THRESHOLD			(256*1024*1024)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 62ecac9..cf445bab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1199,6 +1199,11 @@ static int amdgpu_device_check_arguments(struct amdgpu_device *adev)
>  
>  	amdgpu_gmc_tmz_set(adev);
>  
> +	if (amdgpu_num_kcq > 8 || amdgpu_num_kcq < 0) {
> +		amdgpu_num_kcq = 8;
> +		dev_warn(adev->dev, "set kernel compute queue number to 8 due to invalid paramter provided by user\n");
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6291f5f..b545c40 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -150,6 +150,7 @@ int amdgpu_noretry;
>  int amdgpu_force_asic_type = -1;
>  int amdgpu_tmz = 0;
>  int amdgpu_reset_method = -1; /* auto */
> +int amdgpu_num_kcq = -1;
>  
>  struct amdgpu_mgpu_info mgpu_info = {
>  	.mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
> @@ -765,6 +766,9 @@ module_param_named(tmz, amdgpu_tmz, int, 0444);
>  MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco)");
>  module_param_named(reset_method, amdgpu_reset_method, int, 0444);
>  
> +MODULE_PARM_DESC(num_kcq, "number of kernel compute queue user want to setup (8 if set to greater than 8 or less than 0, only affect gfx 8+)");
> +module_param_named(num_kcq, amdgpu_num_kcq, int, 0444);
> +
>  static const struct pci_device_id pciidlist[] = {
>  #ifdef  CONFIG_DRM_AMDGPU_SI
>  	{0x1002, 0x6780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 8eff017..0cd9de6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -202,40 +202,29 @@ bool amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
>  
>  void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>  {
> -	int i, queue, pipe, mec;
> +	int i, queue, pipe;
>  	bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
> -
> -	/* policy for amdgpu compute queue ownership */
> -	for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
> -		queue = i % adev->gfx.mec.num_queue_per_pipe;
> -		pipe = (i / adev->gfx.mec.num_queue_per_pipe)
> -			% adev->gfx.mec.num_pipe_per_mec;
> -		mec = (i / adev->gfx.mec.num_queue_per_pipe)
> -			/ adev->gfx.mec.num_pipe_per_mec;
> -
> -		/* we've run out of HW */
> -		if (mec >= adev->gfx.mec.num_mec)
> -			break;
> -
> -		if (multipipe_policy) {
> -			/* policy: amdgpu owns the first two queues of the first MEC */
> -			if (mec == 0 && queue < 2)
> -				set_bit(i, adev->gfx.mec.queue_bitmap);
> -		} else {
> -			/* policy: amdgpu owns all queues in the first pipe */
> -			if (mec == 0 && pipe == 0)
> -				set_bit(i, adev->gfx.mec.queue_bitmap);
> +	int max_queues_per_mec = min(adev->gfx.mec.num_pipe_per_mec *
> +				     adev->gfx.mec.num_queue_per_pipe,
> +				     adev->gfx.num_compute_rings);
> +
> +	if (multipipe_policy) {
> +		/* policy: make queues evenly cross all pipes on MEC1 only */
> +		for (i = 0; i < max_queues_per_mec; i++) {
> +			pipe = i % adev->gfx.mec.num_pipe_per_mec;
> +			queue = (i / adev->gfx.mec.num_pipe_per_mec) %
> +				adev->gfx.mec.num_queue_per_pipe;
> +
> +			set_bit(pipe * adev->gfx.mec.num_queue_per_pipe + queue,
> +					adev->gfx.mec.queue_bitmap);
>  		}
> +	} else {
> +		/* policy: amdgpu owns all queues in the given pipe */
> +		for (i = 0; i < max_queues_per_mec; ++i)
> +			set_bit(i, adev->gfx.mec.queue_bitmap);
>  	}
>  
> -	/* update the number of active compute rings */
> -	adev->gfx.num_compute_rings =
> -		bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
> -
> -	/* If you hit this case and edited the policy, you probably just
> -	 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
> -	if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
> -		adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> +	dev_dbg(adev->dev, "mec queue bitmap weight=%d\n", bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES));
>  }
>  
>  void amdgpu_gfx_graphics_queue_acquire(struct amdgpu_device *adev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index f571e25..4172bc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -4022,21 +4022,23 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev)
>  	amdgpu_gfx_compute_queue_acquire(adev);
>  	mec_hpd_size = adev->gfx.num_compute_rings * GFX10_MEC_HPD_SIZE;
>  
> -	r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> -				      AMDGPU_GEM_DOMAIN_GTT,
> -				      &adev->gfx.mec.hpd_eop_obj,
> -				      &adev->gfx.mec.hpd_eop_gpu_addr,
> -				      (void **)&hpd);
> -	if (r) {
> -		dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> -		gfx_v10_0_mec_fini(adev);
> -		return r;
> -	}
> +	if (mec_hpd_size) {
> +		r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> +					      AMDGPU_GEM_DOMAIN_GTT,
> +					      &adev->gfx.mec.hpd_eop_obj,
> +					      &adev->gfx.mec.hpd_eop_gpu_addr,
> +					      (void **)&hpd);
> +		if (r) {
> +			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> +			gfx_v10_0_mec_fini(adev);
> +			return r;
> +		}
>  
> -	memset(hpd, 0, mec_hpd_size);
> +		memset(hpd, 0, mec_hpd_size);
>  
> -	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> -	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> +		amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> +		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> +	}
>  
>  	if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
>  		mec_hdr = (const struct gfx_firmware_header_v1_0 *)adev->gfx.mec_fw->data;
> @@ -7162,7 +7164,7 @@ static int gfx_v10_0_early_init(void *handle)
>  		break;
>  	}
>  
> -	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> +	adev->gfx.num_compute_rings = amdgpu_num_kcq;
>  
>  	gfx_v10_0_set_kiq_pm4_funcs(adev);
>  	gfx_v10_0_set_ring_funcs(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 8d72089..7df567a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -1343,21 +1343,22 @@ static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
>  	amdgpu_gfx_compute_queue_acquire(adev);
>  
>  	mec_hpd_size = adev->gfx.num_compute_rings * GFX8_MEC_HPD_SIZE;
> +	if (mec_hpd_size) {
> +		r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> +					      AMDGPU_GEM_DOMAIN_VRAM,
> +					      &adev->gfx.mec.hpd_eop_obj,
> +					      &adev->gfx.mec.hpd_eop_gpu_addr,
> +					      (void **)&hpd);
> +		if (r) {
> +			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> +			return r;
> +		}
>  
> -	r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> -				      AMDGPU_GEM_DOMAIN_VRAM,
> -				      &adev->gfx.mec.hpd_eop_obj,
> -				      &adev->gfx.mec.hpd_eop_gpu_addr,
> -				      (void **)&hpd);
> -	if (r) {
> -		dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> -		return r;
> -	}
> -
> -	memset(hpd, 0, mec_hpd_size);
> +		memset(hpd, 0, mec_hpd_size);
>  
> -	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> -	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> +		amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> +		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> +	}
>  
>  	return 0;
>  }
> @@ -5294,7 +5295,7 @@ static int gfx_v8_0_early_init(void *handle)
>  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>  
>  	adev->gfx.num_gfx_rings = GFX8_NUM_GFX_RINGS;
> -	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> +	adev->gfx.num_compute_rings = amdgpu_num_kcq;
>  	adev->gfx.funcs = &gfx_v8_0_gfx_funcs;
>  	gfx_v8_0_set_ring_funcs(adev);
>  	gfx_v8_0_set_irq_funcs(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index e4e751f..ef07e59 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -1938,22 +1938,23 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev)
>  	/* take ownership of the relevant compute queues */
>  	amdgpu_gfx_compute_queue_acquire(adev);
>  	mec_hpd_size = adev->gfx.num_compute_rings * GFX9_MEC_HPD_SIZE;
> +	if (mec_hpd_size) {
> +		r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> +					      AMDGPU_GEM_DOMAIN_VRAM,
> +					      &adev->gfx.mec.hpd_eop_obj,
> +					      &adev->gfx.mec.hpd_eop_gpu_addr,
> +					      (void **)&hpd);
> +		if (r) {
> +			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> +			gfx_v9_0_mec_fini(adev);
> +			return r;
> +		}
>  
> -	r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> -				      AMDGPU_GEM_DOMAIN_VRAM,
> -				      &adev->gfx.mec.hpd_eop_obj,
> -				      &adev->gfx.mec.hpd_eop_gpu_addr,
> -				      (void **)&hpd);
> -	if (r) {
> -		dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> -		gfx_v9_0_mec_fini(adev);
> -		return r;
> -	}
> -
> -	memset(hpd, 0, mec_hpd_size);
> +		memset(hpd, 0, mec_hpd_size);
>  
> -	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> -	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> +		amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> +		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> +	}
>  
>  	mec_hdr = (const struct gfx_firmware_header_v1_0 *)adev->gfx.mec_fw->data;
>  
> @@ -4625,7 +4626,7 @@ static int gfx_v9_0_early_init(void *handle)
>  		adev->gfx.num_gfx_rings = 0;
>  	else
>  		adev->gfx.num_gfx_rings = GFX9_NUM_GFX_RINGS;
> -	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> +	adev->gfx.num_compute_rings = amdgpu_num_kcq;
>  	gfx_v9_0_set_kiq_pm4_funcs(adev);
>  	gfx_v9_0_set_ring_funcs(adev);
>  	gfx_v9_0_set_irq_funcs(adev);
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
  2020-07-31  7:51 [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Monk Liu
  2020-07-31  7:51 ` [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) Monk Liu
@ 2020-07-31 13:57 ` Felix Kuehling
  2020-08-03  1:54   ` Liu, Monk
  1 sibling, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2020-07-31 13:57 UTC (permalink / raw)
  To: Monk Liu, amd-gfx

In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right
place to stop the KIQ? Otherwise KIQ will hang as soon as someone
allocates the memory that was previously used for the KIQ ring buffer
and overwrites it with something that's not a valid PM4 packet.

Regards,
  Felix

Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> KIQ will hang if we try below steps:
> modprobe amdgpu
> rmmod amdgpu
> modprobe amdgpu sched_hw_submission=4
>
> the cause is that due to KIQ is always living there even
> after we unload KMD thus when doing the realod of KMD
> KIQ will crash upon its register programed with different
> values with the previous configuration (the config
> like HQD addr, ring size, is easily changed if we alter
> the sched_hw_submission)
>
> the fix is we must inactive KIQ first before touching any
> of its registgers
>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index db9f1e8..f571e25 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
>  	struct v10_compute_mqd *mqd = ring->mqd_ptr;
>  	int j;
>  
> +	/* activate the queue */
> +	WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
> +
>  	/* disable wptr polling */
>  	WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>  
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
  2020-07-31 13:57 ` [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Felix Kuehling
@ 2020-08-03  1:54   ` Liu, Monk
  2020-08-04  6:31     ` Liu, Monk
  0 siblings, 1 reply; 8+ messages in thread
From: Liu, Monk @ 2020-08-03  1:54 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place to stop the KIQ

KIQ (CPC) will never being stopped (the DISABLE on CPC is skipped for SRIOV ) for SRIOV in SW_FINI because SRIOV relies on KIQ to do world switch

But this is really a weird bug because even with the same approach it doesn't make KIQ (CP) hang on GFX9, only GFX10 need this patch ....

By now I do not have other good explanation or better fix than this one

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Friday, July 31, 2020 9:57 PM
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the memory that was previously used for the KIQ ring buffer and overwrites it with something that's not a valid PM4 packet.

Regards,
  Felix

Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> KIQ will hang if we try below steps:
> modprobe amdgpu
> rmmod amdgpu
> modprobe amdgpu sched_hw_submission=4
>
> the cause is that due to KIQ is always living there even after we
> unload KMD thus when doing the realod of KMD KIQ will crash upon its
> register programed with different values with the previous
> configuration (the config like HQD addr, ring size, is easily changed
> if we alter the sched_hw_submission)
>
> the fix is we must inactive KIQ first before touching any of its
> registgers
>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index db9f1e8..f571e25 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct amdgpu_ring *ring)
>  struct v10_compute_mqd *mqd = ring->mqd_ptr;
>  int j;
>
> +/* activate the queue */
> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
> +
>  /* disable wptr polling */
>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
  2020-08-03  1:54   ` Liu, Monk
@ 2020-08-04  6:31     ` Liu, Monk
  2020-08-04  8:01       ` Deng, Emily
  0 siblings, 1 reply; 8+ messages in thread
From: Liu, Monk @ 2020-08-04  6:31 UTC (permalink / raw)
  To: amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Ping ... this is a severe bug fix

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Liu, Monk
Sent: Monday, August 3, 2020 9:55 AM
To: Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

[AMD Official Use Only - Internal Distribution Only]

[AMD Official Use Only - Internal Distribution Only]

>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the
>>right place to stop the KIQ

KIQ (CPC) will never being stopped (the DISABLE on CPC is skipped for SRIOV ) for SRIOV in SW_FINI because SRIOV relies on KIQ to do world switch

But this is really a weird bug because even with the same approach it doesn't make KIQ (CP) hang on GFX9, only GFX10 need this patch ....

By now I do not have other good explanation or better fix than this one

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Friday, July 31, 2020 9:57 PM
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the memory that was previously used for the KIQ ring buffer and overwrites it with something that's not a valid PM4 packet.

Regards,
  Felix

Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> KIQ will hang if we try below steps:
> modprobe amdgpu
> rmmod amdgpu
> modprobe amdgpu sched_hw_submission=4
>
> the cause is that due to KIQ is always living there even after we
> unload KMD thus when doing the realod of KMD KIQ will crash upon its
> register programed with different values with the previous
> configuration (the config like HQD addr, ring size, is easily changed
> if we alter the sched_hw_submission)
>
> the fix is we must inactive KIQ first before touching any of its
> registgers
>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index db9f1e8..f571e25 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct
> amdgpu_ring *ring)  struct v10_compute_mqd *mqd = ring->mqd_ptr;  int
> j;
>
> +/* activate the queue */
> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
> +
>  /* disable wptr polling */
>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Cmonk.liu%40amd.com%7C4837e2d566b44af845f608d837503a3b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637320165018899834&amp;sdata=TED%2BkhlYyAIyTmLJAZBBBHHnE6PRg4fpUsZhD9ke%2BPU%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
  2020-08-04  6:31     ` Liu, Monk
@ 2020-08-04  8:01       ` Deng, Emily
  0 siblings, 0 replies; 8+ messages in thread
From: Deng, Emily @ 2020-08-04  8:01 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Liu,
>Monk
>Sent: Tuesday, August 4, 2020 2:31 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>[AMD Official Use Only - Internal Distribution Only]
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping ... this is a severe bug fix
>
>_____________________________________
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Liu,
>Monk
>Sent: Monday, August 3, 2020 9:55 AM
>To: Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>[AMD Official Use Only - Internal Distribution Only]
>
>[AMD Official Use Only - Internal Distribution Only]
>
>>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the
>>>right place to stop the KIQ
>
>KIQ (CPC) will never being stopped (the DISABLE on CPC is skipped for SRIOV )
>for SRIOV in SW_FINI because SRIOV relies on KIQ to do world switch
>
>But this is really a weird bug because even with the same approach it doesn't
>make KIQ (CP) hang on GFX9, only GFX10 need this patch ....
>
>By now I do not have other good explanation or better fix than this one
>
>_____________________________________
>Monk Liu|GPU Virtualization Team |AMD
>
>
>-----Original Message-----
>From: Kuehling, Felix <Felix.Kuehling@amd.com>
>Sent: Friday, July 31, 2020 9:57 PM
>To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
>
>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place
>to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the
>memory that was previously used for the KIQ ring buffer and overwrites it with
>something that's not a valid PM4 packet.
>
>Regards,
>  Felix
>
>Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
>> KIQ will hang if we try below steps:
>> modprobe amdgpu
>> rmmod amdgpu
>> modprobe amdgpu sched_hw_submission=4
>>
>> the cause is that due to KIQ is always living there even after we
>> unload KMD thus when doing the realod of KMD KIQ will crash upon its
>> register programed with different values with the previous
>> configuration (the config like HQD addr, ring size, is easily changed
>> if we alter the sched_hw_submission)
>>
>> the fix is we must inactive KIQ first before touching any of its
>> registgers
>>
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index db9f1e8..f571e25 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct
>> amdgpu_ring *ring)  struct v10_compute_mqd *mqd = ring->mqd_ptr;  int
>> j;
>>
>> +/* activate the queue */
>> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
>> +
Could we move follow to here?
if (RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1) {
WREG32_SOC15(GC, 0, mmCP_HQD_DEQUEUE_REQUEST, 1);
for (j = 0; j < adev->usec_timeout; j++) {
if (!(RREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE) & 1))
break;
udelay(1);
}
>>  /* disable wptr polling */
>>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>>
>_______________________________________________
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-
>gfx&amp;data=02%7C01%7CEmily.Deng%40amd.com%7C1236f42617d246b20
>bc108d8384007e4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7
>C637321194957236933&amp;sdata=0%2BzHvJ1n4TZOYss4v1pR6i8bxq46JE73
>UIi%2B49x9joU%3D&amp;reserved=0
>_______________________________________________
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre
>edesktop.org%2Fmailman%2Flistinfo%2Famd-
>gfx&amp;data=02%7C01%7CEmily.Deng%40amd.com%7C1236f42617d246b20
>bc108d8384007e4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7
>C637321194957236933&amp;sdata=0%2BzHvJ1n4TZOYss4v1pR6i8bxq46JE73
>UIi%2B49x9joU%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)
  2020-07-31 13:53   ` Felix Kuehling
@ 2020-10-15 15:05     ` Marek Olšák
  0 siblings, 0 replies; 8+ messages in thread
From: Marek Olšák @ 2020-10-15 15:05 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx mailing list, Monk Liu


[-- Attachment #1.1: Type: text/plain, Size: 14905 bytes --]

Monk or perhaps Felix,

Do you by any chance know why the CS ioctl returns -EINVAL for all compute
submissions if num_kcq <= 4 and what could cause that?

If not, is there any way to disable mid-IB preemption for compute?

Thanks,
Marek

On Fri, Jul 31, 2020 at 9:53 AM Felix Kuehling <felix.kuehling@amd.com>
wrote:

> Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> > what:
> > the MQD's save and restore of KCQ (kernel compute queue)
> > cost lots of clocks during world switch which impacts a lot
> > to multi-VF performance
> >
> > how:
> > introduce a paramter to control the number of KCQ to avoid
> > performance drop if there is no kernel compute queue needed
> >
> > notes:
> > this paramter only affects gfx 8/9/10
> >
> > v2:
> > refine namings
> >
> > v3:
> > choose queues for each ring to that try best to cross pipes evenly.
> >
> > v4:
> > fix indentation
> > some cleanupsin the gfx_compute_queue_acquire()
> >
> > v5:
> > further fix on indentations
> > more cleanupsin gfx_compute_queue_acquire()
> >
> > TODO:
> > in the future we will let hypervisor driver to set this paramter
> > automatically thus no need for user to configure it through
> > modprobe in virtual machine
> >
> > Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>
> This patch is Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 +++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 49
> ++++++++++++------------------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     | 30 +++++++++---------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 29 +++++++++---------
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 31 ++++++++++---------
> >  7 files changed, 76 insertions(+), 73 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index e97c088..de11136 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -201,6 +201,7 @@ extern int amdgpu_si_support;
> >  #ifdef CONFIG_DRM_AMDGPU_CIK
> >  extern int amdgpu_cik_support;
> >  #endif
> > +extern int amdgpu_num_kcq;
> >
> >  #define AMDGPU_VM_MAX_NUM_CTX                        4096
> >  #define AMDGPU_SG_THRESHOLD                  (256*1024*1024)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 62ecac9..cf445bab 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -1199,6 +1199,11 @@ static int amdgpu_device_check_arguments(struct
> amdgpu_device *adev)
> >
> >       amdgpu_gmc_tmz_set(adev);
> >
> > +     if (amdgpu_num_kcq > 8 || amdgpu_num_kcq < 0) {
> > +             amdgpu_num_kcq = 8;
> > +             dev_warn(adev->dev, "set kernel compute queue number to 8
> due to invalid paramter provided by user\n");
> > +     }
> > +
> >       return 0;
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 6291f5f..b545c40 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -150,6 +150,7 @@ int amdgpu_noretry;
> >  int amdgpu_force_asic_type = -1;
> >  int amdgpu_tmz = 0;
> >  int amdgpu_reset_method = -1; /* auto */
> > +int amdgpu_num_kcq = -1;
> >
> >  struct amdgpu_mgpu_info mgpu_info = {
> >       .mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
> > @@ -765,6 +766,9 @@ module_param_named(tmz, amdgpu_tmz, int, 0444);
> >  MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default),
> 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco)");
> >  module_param_named(reset_method, amdgpu_reset_method, int, 0444);
> >
> > +MODULE_PARM_DESC(num_kcq, "number of kernel compute queue user want to
> setup (8 if set to greater than 8 or less than 0, only affect gfx 8+)");
> > +module_param_named(num_kcq, amdgpu_num_kcq, int, 0444);
> > +
> >  static const struct pci_device_id pciidlist[] = {
> >  #ifdef  CONFIG_DRM_AMDGPU_SI
> >       {0x1002, 0x6780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index 8eff017..0cd9de6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -202,40 +202,29 @@ bool
> amdgpu_gfx_is_high_priority_compute_queue(struct amdgpu_device *adev,
> >
> >  void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
> >  {
> > -     int i, queue, pipe, mec;
> > +     int i, queue, pipe;
> >       bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
> > -
> > -     /* policy for amdgpu compute queue ownership */
> > -     for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
> > -             queue = i % adev->gfx.mec.num_queue_per_pipe;
> > -             pipe = (i / adev->gfx.mec.num_queue_per_pipe)
> > -                     % adev->gfx.mec.num_pipe_per_mec;
> > -             mec = (i / adev->gfx.mec.num_queue_per_pipe)
> > -                     / adev->gfx.mec.num_pipe_per_mec;
> > -
> > -             /* we've run out of HW */
> > -             if (mec >= adev->gfx.mec.num_mec)
> > -                     break;
> > -
> > -             if (multipipe_policy) {
> > -                     /* policy: amdgpu owns the first two queues of the
> first MEC */
> > -                     if (mec == 0 && queue < 2)
> > -                             set_bit(i, adev->gfx.mec.queue_bitmap);
> > -             } else {
> > -                     /* policy: amdgpu owns all queues in the first
> pipe */
> > -                     if (mec == 0 && pipe == 0)
> > -                             set_bit(i, adev->gfx.mec.queue_bitmap);
> > +     int max_queues_per_mec = min(adev->gfx.mec.num_pipe_per_mec *
> > +                                  adev->gfx.mec.num_queue_per_pipe,
> > +                                  adev->gfx.num_compute_rings);
> > +
> > +     if (multipipe_policy) {
> > +             /* policy: make queues evenly cross all pipes on MEC1 only
> */
> > +             for (i = 0; i < max_queues_per_mec; i++) {
> > +                     pipe = i % adev->gfx.mec.num_pipe_per_mec;
> > +                     queue = (i / adev->gfx.mec.num_pipe_per_mec) %
> > +                             adev->gfx.mec.num_queue_per_pipe;
> > +
> > +                     set_bit(pipe * adev->gfx.mec.num_queue_per_pipe +
> queue,
> > +                                     adev->gfx.mec.queue_bitmap);
> >               }
> > +     } else {
> > +             /* policy: amdgpu owns all queues in the given pipe */
> > +             for (i = 0; i < max_queues_per_mec; ++i)
> > +                     set_bit(i, adev->gfx.mec.queue_bitmap);
> >       }
> >
> > -     /* update the number of active compute rings */
> > -     adev->gfx.num_compute_rings =
> > -             bitmap_weight(adev->gfx.mec.queue_bitmap,
> AMDGPU_MAX_COMPUTE_QUEUES);
> > -
> > -     /* If you hit this case and edited the policy, you probably just
> > -      * need to increase AMDGPU_MAX_COMPUTE_RINGS */
> > -     if (WARN_ON(adev->gfx.num_compute_rings >
> AMDGPU_MAX_COMPUTE_RINGS))
> > -             adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> > +     dev_dbg(adev->dev, "mec queue bitmap weight=%d\n",
> bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES));
> >  }
> >
> >  void amdgpu_gfx_graphics_queue_acquire(struct amdgpu_device *adev)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > index f571e25..4172bc8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > @@ -4022,21 +4022,23 @@ static int gfx_v10_0_mec_init(struct
> amdgpu_device *adev)
> >       amdgpu_gfx_compute_queue_acquire(adev);
> >       mec_hpd_size = adev->gfx.num_compute_rings * GFX10_MEC_HPD_SIZE;
> >
> > -     r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> > -                                   AMDGPU_GEM_DOMAIN_GTT,
> > -                                   &adev->gfx.mec.hpd_eop_obj,
> > -                                   &adev->gfx.mec.hpd_eop_gpu_addr,
> > -                                   (void **)&hpd);
> > -     if (r) {
> > -             dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> > -             gfx_v10_0_mec_fini(adev);
> > -             return r;
> > -     }
> > +     if (mec_hpd_size) {
> > +             r = amdgpu_bo_create_reserved(adev, mec_hpd_size,
> PAGE_SIZE,
> > +                                           AMDGPU_GEM_DOMAIN_GTT,
> > +                                           &adev->gfx.mec.hpd_eop_obj,
> > +
>  &adev->gfx.mec.hpd_eop_gpu_addr,
> > +                                           (void **)&hpd);
> > +             if (r) {
> > +                     dev_warn(adev->dev, "(%d) create HDP EOP bo
> failed\n", r);
> > +                     gfx_v10_0_mec_fini(adev);
> > +                     return r;
> > +             }
> >
> > -     memset(hpd, 0, mec_hpd_size);
> > +             memset(hpd, 0, mec_hpd_size);
> >
> > -     amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> > -     amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> > +             amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> > +             amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> > +     }
> >
> >       if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) {
> >               mec_hdr = (const struct gfx_firmware_header_v1_0
> *)adev->gfx.mec_fw->data;
> > @@ -7162,7 +7164,7 @@ static int gfx_v10_0_early_init(void *handle)
> >               break;
> >       }
> >
> > -     adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> > +     adev->gfx.num_compute_rings = amdgpu_num_kcq;
> >
> >       gfx_v10_0_set_kiq_pm4_funcs(adev);
> >       gfx_v10_0_set_ring_funcs(adev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > index 8d72089..7df567a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > @@ -1343,21 +1343,22 @@ static int gfx_v8_0_mec_init(struct
> amdgpu_device *adev)
> >       amdgpu_gfx_compute_queue_acquire(adev);
> >
> >       mec_hpd_size = adev->gfx.num_compute_rings * GFX8_MEC_HPD_SIZE;
> > +     if (mec_hpd_size) {
> > +             r = amdgpu_bo_create_reserved(adev, mec_hpd_size,
> PAGE_SIZE,
> > +                                           AMDGPU_GEM_DOMAIN_VRAM,
> > +                                           &adev->gfx.mec.hpd_eop_obj,
> > +
>  &adev->gfx.mec.hpd_eop_gpu_addr,
> > +                                           (void **)&hpd);
> > +             if (r) {
> > +                     dev_warn(adev->dev, "(%d) create HDP EOP bo
> failed\n", r);
> > +                     return r;
> > +             }
> >
> > -     r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> > -                                   AMDGPU_GEM_DOMAIN_VRAM,
> > -                                   &adev->gfx.mec.hpd_eop_obj,
> > -                                   &adev->gfx.mec.hpd_eop_gpu_addr,
> > -                                   (void **)&hpd);
> > -     if (r) {
> > -             dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> > -             return r;
> > -     }
> > -
> > -     memset(hpd, 0, mec_hpd_size);
> > +             memset(hpd, 0, mec_hpd_size);
> >
> > -     amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> > -     amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> > +             amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> > +             amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> > +     }
> >
> >       return 0;
> >  }
> > @@ -5294,7 +5295,7 @@ static int gfx_v8_0_early_init(void *handle)
> >       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> >       adev->gfx.num_gfx_rings = GFX8_NUM_GFX_RINGS;
> > -     adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> > +     adev->gfx.num_compute_rings = amdgpu_num_kcq;
> >       adev->gfx.funcs = &gfx_v8_0_gfx_funcs;
> >       gfx_v8_0_set_ring_funcs(adev);
> >       gfx_v8_0_set_irq_funcs(adev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index e4e751f..ef07e59 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -1938,22 +1938,23 @@ static int gfx_v9_0_mec_init(struct
> amdgpu_device *adev)
> >       /* take ownership of the relevant compute queues */
> >       amdgpu_gfx_compute_queue_acquire(adev);
> >       mec_hpd_size = adev->gfx.num_compute_rings * GFX9_MEC_HPD_SIZE;
> > +     if (mec_hpd_size) {
> > +             r = amdgpu_bo_create_reserved(adev, mec_hpd_size,
> PAGE_SIZE,
> > +                                           AMDGPU_GEM_DOMAIN_VRAM,
> > +                                           &adev->gfx.mec.hpd_eop_obj,
> > +
>  &adev->gfx.mec.hpd_eop_gpu_addr,
> > +                                           (void **)&hpd);
> > +             if (r) {
> > +                     dev_warn(adev->dev, "(%d) create HDP EOP bo
> failed\n", r);
> > +                     gfx_v9_0_mec_fini(adev);
> > +                     return r;
> > +             }
> >
> > -     r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
> > -                                   AMDGPU_GEM_DOMAIN_VRAM,
> > -                                   &adev->gfx.mec.hpd_eop_obj,
> > -                                   &adev->gfx.mec.hpd_eop_gpu_addr,
> > -                                   (void **)&hpd);
> > -     if (r) {
> > -             dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
> > -             gfx_v9_0_mec_fini(adev);
> > -             return r;
> > -     }
> > -
> > -     memset(hpd, 0, mec_hpd_size);
> > +             memset(hpd, 0, mec_hpd_size);
> >
> > -     amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> > -     amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> > +             amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
> > +             amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
> > +     }
> >
> >       mec_hdr = (const struct gfx_firmware_header_v1_0
> *)adev->gfx.mec_fw->data;
> >
> > @@ -4625,7 +4626,7 @@ static int gfx_v9_0_early_init(void *handle)
> >               adev->gfx.num_gfx_rings = 0;
> >       else
> >               adev->gfx.num_gfx_rings = GFX9_NUM_GFX_RINGS;
> > -     adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
> > +     adev->gfx.num_compute_rings = amdgpu_num_kcq;
> >       gfx_v9_0_set_kiq_pm4_funcs(adev);
> >       gfx_v9_0_set_ring_funcs(adev);
> >       gfx_v9_0_set_irq_funcs(adev);
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

[-- Attachment #1.2: Type: text/html, Size: 19447 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-10-15 15:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-31  7:51 [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Monk Liu
2020-07-31  7:51 ` [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) Monk Liu
2020-07-31 13:53   ` Felix Kuehling
2020-10-15 15:05     ` Marek Olšák
2020-07-31 13:57 ` [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Felix Kuehling
2020-08-03  1:54   ` Liu, Monk
2020-08-04  6:31     ` Liu, Monk
2020-08-04  8:01       ` Deng, Emily

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).