* Add support for high priority scheduling in amdgpu
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This patch series introduces a mechanism that allows users with sufficient
privileges to categorize their work as "high priority". A userspace app can
create a high priority amdgpu context, where any work submitted to this context
will receive preferential treatment over any other work.
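
For illustration, a high priority context would be requested from userspace
roughly as follows (a minimal sketch against the libdrm wrapper this work
targets; amdgpu_cs_ctx_create2() and AMDGPU_CTX_PRIORITY_HIGH are assumed
names for the final interface):

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch: create a context whose submissions get preferential treatment.
 * Assumes the caller has sufficient privileges (e.g. CAP_SYS_NICE). */
static int create_high_prio_ctx(amdgpu_device_handle dev,
				amdgpu_context_handle *ctx)
{
	return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH, ctx);
}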

High priority contexts will be scheduled ahead of other contexts by the sw gpu
scheduler. This functionality is generic for all HW blocks.
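
Conceptually the scheduler just services its run queues in priority order.
A sketch of the selection step (assuming one run queue per priority level
in the amd scheduler; names are illustrative):

/* Pick the next entity from the highest priority non-empty run queue. */
static struct amd_sched_entity *
amd_sched_select_entity(struct amd_gpu_scheduler *sched)
{
	struct amd_sched_entity *entity;
	int i;

	for (i = AMD_SCHED_PRIORITY_MAX - 1; i >= AMD_SCHED_PRIORITY_MIN; i--) {
		entity = amd_sched_rq_select_entity(&sched->sched_rq[i]);
		if (entity)
			return entity;
	}

	return NULL;
}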

Optionally, a ring can implement a set_priority() function that allows
programming HW specific features to elevate a ring's priority.
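
A sketch of how such a hook would be invoked (the member name follows the
description above; the exact signature is an assumption):

/* Elevate the HW priority of a ring if the backend provides a hook. */
static void amdgpu_ring_set_priority(struct amdgpu_ring *ring,
				     enum amd_sched_priority priority)
{
	if (ring->funcs->set_priority)
		ring->funcs->set_priority(ring, priority);
}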

This patch series implements set_priority() for gfx8 compute rings. It takes
advantage of SPI scheduling and CU reservation to provide improved frame
latencies for high priority contexts.

For compute + compute scenarios we get near perfect scheduling latency. E.g.
one high priority ComputeParticles + one low priority ComputeParticles:
    - High priority ComputeParticles: 2.0-2.6 ms/frame
    - Regular ComputeParticles: 35.2-68.5 ms/frame

For compute + gfx scenarios the high priority compute application does
experience some latency variance. However, the variance has smaller bounds
and a smaller deviation than without high priority scheduling.

Following is a graph of the frame time experienced by a high priority compute
app in 4 different scenarios to exemplify the compute + gfx latency variance:
    - ComputeParticles: this scenario involves running the compute particles
      sample on its own.
    - +SSAO: Previous scenario with the addition of running the ssao sample
      application that clogs the GFX ring with constant work.
    - +SPI Priority: Previous scenario with the addition of SPI priority
      programming for compute rings.
    - +CU Reserve: Previous scenario with the addition of dynamic CU
      reservation for compute rings.

Graph link:
https://plot.ly/~lostgoat/9/

As seen above, high priority contexts for compute allow us to schedule work
with greater confidence in its completion latency under high GPU load. This
property will be important for VR reprojection workloads.

Note: The first part of this series is a resend of "Change queue/pipe split
between amdkfd and amdgpu" with the following changes:
    - Fixed kfdtest on Kaveri due to a shift overflow. Refer to: "drm/amdkfd:
      allow split HQD on per-queue granularity v3"
    - Used Felix's suggestions for a simplified HQD programming sequence
    - Added a workaround for a Tonga HW bug during HQD programming

This series is also available at:
https://github.com/lostgoat/linux/tree/wip-high-priority

* [PATCH 01/22] drm/amdgpu: refactor MQD/HQD initialization
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

The MQD programming sequence currently exists in 3 different places.
Refactor it to absorb all the duplicates.

The success path remains mostly identical except for a slightly
different order in the non-kiq case. This shouldn't matter if the HQD
is disabled.

The error handling paths have been updated to deal with the new code
structure.
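
In outline, every caller now runs the same three-step sequence with the
queue selected under the SRBM mutex (mirroring the gfx7 path in the diff
below):

	mutex_lock(&adev->srbm_mutex);
	cik_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);

	gfx_v7_0_mqd_init(adev, mqd, mqd_gpu_addr, ring);  /* fill the MQD image */
	gfx_v7_0_mqd_deactivate(adev);                     /* make sure the HQD is idle */
	gfx_v7_0_mqd_commit(adev, mqd);                    /* program MQD fields to the HQD */

	cik_srbm_select(adev, 0, 0, 0, 0);
	mutex_unlock(&adev->srbm_mutex);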

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 447 ++++++++++++++++++----------------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 417 +++++++++++--------------------
 2 files changed, 387 insertions(+), 477 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 1f93545..8e1e601 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -42,20 +42,22 @@
 #include "gca/gfx_7_2_sh_mask.h"
 
 #include "gmc/gmc_7_0_d.h"
 #include "gmc/gmc_7_0_sh_mask.h"
 
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 
 #define GFX7_NUM_GFX_RINGS     1
 #define GFX7_NUM_COMPUTE_RINGS 8
+#define GFX7_MEC_HPD_SIZE      2048
+
 
 static void gfx_v7_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_gds_init(struct amdgpu_device *adev);
 
 MODULE_FIRMWARE("radeon/bonaire_pfp.bin");
 MODULE_FIRMWARE("radeon/bonaire_me.bin");
 MODULE_FIRMWARE("radeon/bonaire_ce.bin");
 MODULE_FIRMWARE("radeon/bonaire_rlc.bin");
 MODULE_FIRMWARE("radeon/bonaire_mec.bin");
@@ -2792,40 +2794,38 @@ static void gfx_v7_0_mec_fini(struct amdgpu_device *adev)
 		if (unlikely(r != 0))
 			dev_warn(adev->dev, "(%d) reserve HPD EOP bo failed\n", r);
 		amdgpu_bo_unpin(adev->gfx.mec.hpd_eop_obj);
 		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
 		amdgpu_bo_unref(&adev->gfx.mec.hpd_eop_obj);
 		adev->gfx.mec.hpd_eop_obj = NULL;
 	}
 }
 
-#define MEC_HPD_SIZE 2048
-
 static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
 
 	/*
 	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
 	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
 	 * Nonetheless, we assign only 1 pipe because all other pipes will
 	 * be handled by KFD
 	 */
 	adev->gfx.mec.num_mec = 1;
 	adev->gfx.mec.num_pipe = 1;
 	adev->gfx.mec.num_queue = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * 8;
 
 	if (adev->gfx.mec.hpd_eop_obj == NULL) {
 		r = amdgpu_bo_create(adev,
-				     adev->gfx.mec.num_mec *adev->gfx.mec.num_pipe * MEC_HPD_SIZE * 2,
+				     adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * GFX7_MEC_HPD_SIZE * 2,
 				     PAGE_SIZE, true,
 				     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &adev->gfx.mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
 			return r;
 		}
 	}
 
 	r = amdgpu_bo_reserve(adev->gfx.mec.hpd_eop_obj, false);
@@ -2841,21 +2841,21 @@ static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 		return r;
 	}
 	r = amdgpu_bo_kmap(adev->gfx.mec.hpd_eop_obj, (void **)&hpd);
 	if (r) {
 		dev_warn(adev->dev, "(%d) map HDP EOP bo failed\n", r);
 		gfx_v7_0_mec_fini(adev);
 		return r;
 	}
 
 	/* clear memory.  Not sure if this is required or not */
-	memset(hpd, 0, adev->gfx.mec.num_mec *adev->gfx.mec.num_pipe * MEC_HPD_SIZE * 2);
+	memset(hpd, 0, adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * GFX7_MEC_HPD_SIZE * 2);
 
 	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
 	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
 	return 0;
 }
 
 struct hqd_registers
 {
 	u32 cp_mqd_base_addr;
@@ -2916,261 +2916,296 @@ struct bonaire_mqd
 	u32 restart[3];
 	u32 thread_trace_enable;
 	u32 reserved1;
 	u32 user_data[16];
 	u32 vgtcs_invoke_count[2];
 	struct hqd_registers queue_state;
 	u32 dequeue_cntr;
 	u32 interrupt_queue[64];
 };
 
-/**
- * gfx_v7_0_cp_compute_resume - setup the compute queue registers
- *
- * @adev: amdgpu_device pointer
- *
- * Program the compute queues and test them to make sure they
- * are working.
- * Returns 0 for success, error for failure.
- */
-static int gfx_v7_0_cp_compute_resume(struct amdgpu_device *adev)
+static void gfx_v7_0_compute_pipe_init(struct amdgpu_device *adev, int me, int pipe)
 {
-	int r, i, j;
-	u32 tmp;
-	bool use_doorbell = true;
-	u64 hqd_gpu_addr;
-	u64 mqd_gpu_addr;
 	u64 eop_gpu_addr;
-	u64 wb_gpu_addr;
-	u32 *buf;
-	struct bonaire_mqd *mqd;
-	struct amdgpu_ring *ring;
-
-	/* fix up chicken bits */
-	tmp = RREG32(mmCP_CPF_DEBUG);
-	tmp |= (1 << 23);
-	WREG32(mmCP_CPF_DEBUG, tmp);
+	u32 tmp;
+	size_t eop_offset = me * pipe * GFX7_MEC_HPD_SIZE * 2;
 
-	/* init the pipes */
 	mutex_lock(&adev->srbm_mutex);
-	for (i = 0; i < (adev->gfx.mec.num_pipe * adev->gfx.mec.num_mec); i++) {
-		int me = (i < 4) ? 1 : 2;
-		int pipe = (i < 4) ? i : (i - 4);
+	eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + eop_offset;
 
-		eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
+	cik_srbm_select(adev, me, pipe, 0, 0);
 
-		cik_srbm_select(adev, me, pipe, 0, 0);
+	/* write the EOP addr */
+	WREG32(mmCP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+	WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
 
-		/* write the EOP addr */
-		WREG32(mmCP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
-		WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
+	/* set the VMID assigned */
+	WREG32(mmCP_HPD_EOP_VMID, 0);
 
-		/* set the VMID assigned */
-		WREG32(mmCP_HPD_EOP_VMID, 0);
+	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+	tmp = RREG32(mmCP_HPD_EOP_CONTROL);
+	tmp &= ~CP_HPD_EOP_CONTROL__EOP_SIZE_MASK;
+	tmp |= order_base_2(GFX7_MEC_HPD_SIZE / 8);
+	WREG32(mmCP_HPD_EOP_CONTROL, tmp);
 
-		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-		tmp = RREG32(mmCP_HPD_EOP_CONTROL);
-		tmp &= ~CP_HPD_EOP_CONTROL__EOP_SIZE_MASK;
-		tmp |= order_base_2(MEC_HPD_SIZE / 8);
-		WREG32(mmCP_HPD_EOP_CONTROL, tmp);
-	}
 	cik_srbm_select(adev, 0, 0, 0, 0);
 	mutex_unlock(&adev->srbm_mutex);
+}
 
-	/* init the queues.  Just two for now. */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-		ring = &adev->gfx.compute_ring[i];
+static int gfx_v7_0_mqd_deactivate(struct amdgpu_device *adev)
+{
+	int i;
 
-		if (ring->mqd_obj == NULL) {
-			r = amdgpu_bo_create(adev,
-					     sizeof(struct bonaire_mqd),
-					     PAGE_SIZE, true,
-					     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
-					     &ring->mqd_obj);
-			if (r) {
-				dev_warn(adev->dev, "(%d) create MQD bo failed\n", r);
-				return r;
-			}
+	/* disable the queue if it's active */
+	if (RREG32(mmCP_HQD_ACTIVE) & 1) {
+		WREG32(mmCP_HQD_DEQUEUE_REQUEST, 1);
+		for (i = 0; i < adev->usec_timeout; i++) {
+			if (!(RREG32(mmCP_HQD_ACTIVE) & 1))
+				break;
+			udelay(1);
 		}
 
-		r = amdgpu_bo_reserve(ring->mqd_obj, false);
-		if (unlikely(r != 0)) {
-			gfx_v7_0_cp_compute_fini(adev);
-			return r;
-		}
-		r = amdgpu_bo_pin(ring->mqd_obj, AMDGPU_GEM_DOMAIN_GTT,
-				  &mqd_gpu_addr);
-		if (r) {
-			dev_warn(adev->dev, "(%d) pin MQD bo failed\n", r);
-			gfx_v7_0_cp_compute_fini(adev);
-			return r;
-		}
-		r = amdgpu_bo_kmap(ring->mqd_obj, (void **)&buf);
-		if (r) {
-			dev_warn(adev->dev, "(%d) map MQD bo failed\n", r);
-			gfx_v7_0_cp_compute_fini(adev);
-			return r;
-		}
+		if (i == adev->usec_timeout)
+			return -ETIMEDOUT;
 
-		/* init the mqd struct */
-		memset(buf, 0, sizeof(struct bonaire_mqd));
+		WREG32(mmCP_HQD_DEQUEUE_REQUEST, 0);
+		WREG32(mmCP_HQD_PQ_RPTR, 0);
+		WREG32(mmCP_HQD_PQ_WPTR, 0);
+	}
 
-		mqd = (struct bonaire_mqd *)buf;
-		mqd->header = 0xC0310800;
-		mqd->static_thread_mgmt01[0] = 0xffffffff;
-		mqd->static_thread_mgmt01[1] = 0xffffffff;
-		mqd->static_thread_mgmt23[0] = 0xffffffff;
-		mqd->static_thread_mgmt23[1] = 0xffffffff;
+	return 0;
+}
 
-		mutex_lock(&adev->srbm_mutex);
-		cik_srbm_select(adev, ring->me,
-				ring->pipe,
-				ring->queue, 0);
+static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
+			     struct bonaire_mqd *mqd,
+			     uint64_t mqd_gpu_addr,
+			     struct amdgpu_ring *ring)
+{
+	u64 hqd_gpu_addr;
+	u64 wb_gpu_addr;
 
-		/* disable wptr polling */
-		tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
-		tmp &= ~CP_PQ_WPTR_POLL_CNTL__EN_MASK;
-		WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
+	/* init the mqd struct */
+	memset(mqd, 0, sizeof(struct bonaire_mqd));
 
-		/* enable doorbell? */
-		mqd->queue_state.cp_hqd_pq_doorbell_control =
-			RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
-		if (use_doorbell)
-			mqd->queue_state.cp_hqd_pq_doorbell_control |= CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
-		else
-			mqd->queue_state.cp_hqd_pq_doorbell_control &= ~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
-		WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL,
-		       mqd->queue_state.cp_hqd_pq_doorbell_control);
-
-		/* disable the queue if it's active */
-		mqd->queue_state.cp_hqd_dequeue_request = 0;
-		mqd->queue_state.cp_hqd_pq_rptr = 0;
-		mqd->queue_state.cp_hqd_pq_wptr= 0;
-		if (RREG32(mmCP_HQD_ACTIVE) & 1) {
-			WREG32(mmCP_HQD_DEQUEUE_REQUEST, 1);
-			for (j = 0; j < adev->usec_timeout; j++) {
-				if (!(RREG32(mmCP_HQD_ACTIVE) & 1))
-					break;
-				udelay(1);
-			}
-			WREG32(mmCP_HQD_DEQUEUE_REQUEST, mqd->queue_state.cp_hqd_dequeue_request);
-			WREG32(mmCP_HQD_PQ_RPTR, mqd->queue_state.cp_hqd_pq_rptr);
-			WREG32(mmCP_HQD_PQ_WPTR, mqd->queue_state.cp_hqd_pq_wptr);
-		}
+	mqd->header = 0xC0310800;
+	mqd->static_thread_mgmt01[0] = 0xffffffff;
+	mqd->static_thread_mgmt01[1] = 0xffffffff;
+	mqd->static_thread_mgmt23[0] = 0xffffffff;
+	mqd->static_thread_mgmt23[1] = 0xffffffff;
 
-		/* set the pointer to the MQD */
-		mqd->queue_state.cp_mqd_base_addr = mqd_gpu_addr & 0xfffffffc;
-		mqd->queue_state.cp_mqd_base_addr_hi = upper_32_bits(mqd_gpu_addr);
-		WREG32(mmCP_MQD_BASE_ADDR, mqd->queue_state.cp_mqd_base_addr);
-		WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->queue_state.cp_mqd_base_addr_hi);
-		/* set MQD vmid to 0 */
-		mqd->queue_state.cp_mqd_control = RREG32(mmCP_MQD_CONTROL);
-		mqd->queue_state.cp_mqd_control &= ~CP_MQD_CONTROL__VMID_MASK;
-		WREG32(mmCP_MQD_CONTROL, mqd->queue_state.cp_mqd_control);
-
-		/* set the pointer to the HQD, this is similar CP_RB0_BASE/_HI */
-		hqd_gpu_addr = ring->gpu_addr >> 8;
-		mqd->queue_state.cp_hqd_pq_base = hqd_gpu_addr;
-		mqd->queue_state.cp_hqd_pq_base_hi = upper_32_bits(hqd_gpu_addr);
-		WREG32(mmCP_HQD_PQ_BASE, mqd->queue_state.cp_hqd_pq_base);
-		WREG32(mmCP_HQD_PQ_BASE_HI, mqd->queue_state.cp_hqd_pq_base_hi);
-
-		/* set up the HQD, this is similar to CP_RB0_CNTL */
-		mqd->queue_state.cp_hqd_pq_control = RREG32(mmCP_HQD_PQ_CONTROL);
-		mqd->queue_state.cp_hqd_pq_control &=
-			~(CP_HQD_PQ_CONTROL__QUEUE_SIZE_MASK |
-					CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE_MASK);
-
-		mqd->queue_state.cp_hqd_pq_control |=
-			order_base_2(ring->ring_size / 8);
-		mqd->queue_state.cp_hqd_pq_control |=
-			(order_base_2(AMDGPU_GPU_PAGE_SIZE/8) << 8);
+	/* enable doorbell? */
+	mqd->queue_state.cp_hqd_pq_doorbell_control =
+		RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
+	if (ring->use_doorbell)
+		mqd->queue_state.cp_hqd_pq_doorbell_control |= CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
+	else
+		mqd->queue_state.cp_hqd_pq_doorbell_control &= ~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
+
+	/* set the pointer to the MQD */
+	mqd->queue_state.cp_mqd_base_addr = mqd_gpu_addr & 0xfffffffc;
+	mqd->queue_state.cp_mqd_base_addr_hi = upper_32_bits(mqd_gpu_addr);
+
+	/* set MQD vmid to 0 */
+	mqd->queue_state.cp_mqd_control = RREG32(mmCP_MQD_CONTROL);
+	mqd->queue_state.cp_mqd_control &= ~CP_MQD_CONTROL__VMID_MASK;
+
+	/* set the pointer to the HQD, this is similar to CP_RB0_BASE/_HI */
+	hqd_gpu_addr = ring->gpu_addr >> 8;
+	mqd->queue_state.cp_hqd_pq_base = hqd_gpu_addr;
+	mqd->queue_state.cp_hqd_pq_base_hi = upper_32_bits(hqd_gpu_addr);
+
+	/* set up the HQD, this is similar to CP_RB0_CNTL */
+	mqd->queue_state.cp_hqd_pq_control = RREG32(mmCP_HQD_PQ_CONTROL);
+	mqd->queue_state.cp_hqd_pq_control &=
+		~(CP_HQD_PQ_CONTROL__QUEUE_SIZE_MASK |
+				CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE_MASK);
+
+	mqd->queue_state.cp_hqd_pq_control |=
+		order_base_2(ring->ring_size / 8);
+	mqd->queue_state.cp_hqd_pq_control |=
+		(order_base_2(AMDGPU_GPU_PAGE_SIZE/8) << 8);
 #ifdef __BIG_ENDIAN
-		mqd->queue_state.cp_hqd_pq_control |=
-			2 << CP_HQD_PQ_CONTROL__ENDIAN_SWAP__SHIFT;
+	mqd->queue_state.cp_hqd_pq_control |=
+		2 << CP_HQD_PQ_CONTROL__ENDIAN_SWAP__SHIFT;
 #endif
-		mqd->queue_state.cp_hqd_pq_control &=
-			~(CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK |
+	mqd->queue_state.cp_hqd_pq_control &=
+		~(CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK |
 				CP_HQD_PQ_CONTROL__ROQ_PQ_IB_FLIP_MASK |
 				CP_HQD_PQ_CONTROL__PQ_VOLATILE_MASK);
-		mqd->queue_state.cp_hqd_pq_control |=
-			CP_HQD_PQ_CONTROL__PRIV_STATE_MASK |
-			CP_HQD_PQ_CONTROL__KMD_QUEUE_MASK; /* assuming kernel queue control */
-		WREG32(mmCP_HQD_PQ_CONTROL, mqd->queue_state.cp_hqd_pq_control);
-
-		/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
-		wb_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
-		mqd->queue_state.cp_hqd_pq_wptr_poll_addr = wb_gpu_addr & 0xfffffffc;
-		mqd->queue_state.cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
-		WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->queue_state.cp_hqd_pq_wptr_poll_addr);
-		WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI,
-		       mqd->queue_state.cp_hqd_pq_wptr_poll_addr_hi);
-
-		/* set the wb address wether it's enabled or not */
-		wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
-		mqd->queue_state.cp_hqd_pq_rptr_report_addr = wb_gpu_addr & 0xfffffffc;
-		mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi =
-			upper_32_bits(wb_gpu_addr) & 0xffff;
-		WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR,
-		       mqd->queue_state.cp_hqd_pq_rptr_report_addr);
-		WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
-		       mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi);
-
-		/* enable the doorbell if requested */
-		if (use_doorbell) {
-			mqd->queue_state.cp_hqd_pq_doorbell_control =
-				RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
-			mqd->queue_state.cp_hqd_pq_doorbell_control &=
-				~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET_MASK;
-			mqd->queue_state.cp_hqd_pq_doorbell_control |=
-				(ring->doorbell_index <<
-				 CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT);
-			mqd->queue_state.cp_hqd_pq_doorbell_control |=
-				CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
-			mqd->queue_state.cp_hqd_pq_doorbell_control &=
-				~(CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_SOURCE_MASK |
-				CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_HIT_MASK);
+	mqd->queue_state.cp_hqd_pq_control |=
+		CP_HQD_PQ_CONTROL__PRIV_STATE_MASK |
+		CP_HQD_PQ_CONTROL__KMD_QUEUE_MASK; /* assuming kernel queue control */
 
-		} else {
-			mqd->queue_state.cp_hqd_pq_doorbell_control = 0;
+	/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
+	wb_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
+	mqd->queue_state.cp_hqd_pq_wptr_poll_addr = wb_gpu_addr & 0xfffffffc;
+	mqd->queue_state.cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
+
+	/* set the wb address whether it's enabled or not */
+	wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
+	mqd->queue_state.cp_hqd_pq_rptr_report_addr = wb_gpu_addr & 0xfffffffc;
+	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi =
+		upper_32_bits(wb_gpu_addr) & 0xffff;
+
+	/* enable the doorbell if requested */
+	if (ring->use_doorbell) {
+		mqd->queue_state.cp_hqd_pq_doorbell_control =
+			RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
+		mqd->queue_state.cp_hqd_pq_doorbell_control &=
+			~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET_MASK;
+		mqd->queue_state.cp_hqd_pq_doorbell_control |=
+			(ring->doorbell_index <<
+			 CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT);
+		mqd->queue_state.cp_hqd_pq_doorbell_control |=
+			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
+		mqd->queue_state.cp_hqd_pq_doorbell_control &=
+			~(CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_SOURCE_MASK |
+					CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_HIT_MASK);
+
+	} else {
+		mqd->queue_state.cp_hqd_pq_doorbell_control = 0;
+	}
+
+	/* read and write pointers, similar to CP_RB0_WPTR/_RPTR */
+	ring->wptr = 0;
+	mqd->queue_state.cp_hqd_pq_wptr = ring->wptr;
+	mqd->queue_state.cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
+
+	/* set the vmid for the queue */
+	mqd->queue_state.cp_hqd_vmid = 0;
+
+	/* activate the queue */
+	mqd->queue_state.cp_hqd_active = 1;
+}
+
+static int gfx_v7_0_mqd_commit(struct amdgpu_device *adev,
+			       struct bonaire_mqd *mqd)
+{
+	u32 tmp;
+
+	/* disable wptr polling */
+	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
+	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
+	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
+
+	/* program MQD fields to HW */
+	WREG32(mmCP_MQD_BASE_ADDR, mqd->queue_state.cp_mqd_base_addr);
+	WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->queue_state.cp_mqd_base_addr_hi);
+	WREG32(mmCP_MQD_CONTROL, mqd->queue_state.cp_mqd_control);
+	WREG32(mmCP_HQD_PQ_BASE, mqd->queue_state.cp_hqd_pq_base);
+	WREG32(mmCP_HQD_PQ_BASE_HI, mqd->queue_state.cp_hqd_pq_base_hi);
+	WREG32(mmCP_HQD_PQ_CONTROL, mqd->queue_state.cp_hqd_pq_control);
+	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->queue_state.cp_hqd_pq_wptr_poll_addr);
+	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->queue_state.cp_hqd_pq_wptr_poll_addr_hi);
+	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, mqd->queue_state.cp_hqd_pq_rptr_report_addr);
+	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI, mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi);
+	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->queue_state.cp_hqd_pq_doorbell_control);
+	WREG32(mmCP_HQD_PQ_WPTR, mqd->queue_state.cp_hqd_pq_wptr);
+	WREG32(mmCP_HQD_VMID, mqd->queue_state.cp_hqd_vmid);
+
+	/* activate the HQD */
+	WREG32(mmCP_HQD_ACTIVE, mqd->queue_state.cp_hqd_active);
+
+	return 0;
+}
+
+static int gfx_v7_0_compute_queue_init(struct amdgpu_device *adev, int ring_id)
+{
+	int r;
+	u64 mqd_gpu_addr;
+	struct bonaire_mqd *mqd;
+	struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
+
+	if (ring->mqd_obj == NULL) {
+		r = amdgpu_bo_create(adev,
+				sizeof(struct bonaire_mqd),
+				PAGE_SIZE, true,
+				AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
+				&ring->mqd_obj);
+		if (r) {
+			dev_warn(adev->dev, "(%d) create MQD bo failed\n", r);
+			return r;
 		}
-		WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL,
-		       mqd->queue_state.cp_hqd_pq_doorbell_control);
+	}
+
+	r = amdgpu_bo_reserve(ring->mqd_obj, false);
+	if (unlikely(r != 0))
+		goto out;
+
+	r = amdgpu_bo_pin(ring->mqd_obj, AMDGPU_GEM_DOMAIN_GTT,
+			&mqd_gpu_addr);
+	if (r) {
+		dev_warn(adev->dev, "(%d) pin MQD bo failed\n", r);
+		goto out_unreserve;
+	}
+	r = amdgpu_bo_kmap(ring->mqd_obj, (void **)&mqd);
+	if (r) {
+		dev_warn(adev->dev, "(%d) map MQD bo failed\n", r);
+		goto out_unreserve;
+	}
+
+	mutex_lock(&adev->srbm_mutex);
+	cik_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
 
-		/* read and write pointers, similar to CP_RB0_WPTR/_RPTR */
-		ring->wptr = 0;
-		mqd->queue_state.cp_hqd_pq_wptr = ring->wptr;
-		WREG32(mmCP_HQD_PQ_WPTR, mqd->queue_state.cp_hqd_pq_wptr);
-		mqd->queue_state.cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
+	gfx_v7_0_mqd_init(adev, mqd, mqd_gpu_addr, ring);
+	gfx_v7_0_mqd_deactivate(adev);
+	gfx_v7_0_mqd_commit(adev, mqd);
 
-		/* set the vmid for the queue */
-		mqd->queue_state.cp_hqd_vmid = 0;
-		WREG32(mmCP_HQD_VMID, mqd->queue_state.cp_hqd_vmid);
+	cik_srbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
 
-		/* activate the queue */
-		mqd->queue_state.cp_hqd_active = 1;
-		WREG32(mmCP_HQD_ACTIVE, mqd->queue_state.cp_hqd_active);
+	amdgpu_bo_kunmap(ring->mqd_obj);
+out_unreserve:
+	amdgpu_bo_unreserve(ring->mqd_obj);
+out:
+	return 0;
+}
+
+/**
+ * gfx_v7_0_cp_compute_resume - setup the compute queue registers
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Program the compute queues and test them to make sure they
+ * are working.
+ * Returns 0 for success, error for failure.
+ */
+static int gfx_v7_0_cp_compute_resume(struct amdgpu_device *adev)
+{
+	int r, i, j;
+	u32 tmp;
+	struct amdgpu_ring *ring;
 
-		cik_srbm_select(adev, 0, 0, 0, 0);
-		mutex_unlock(&adev->srbm_mutex);
+	/* fix up chicken bits */
+	tmp = RREG32(mmCP_CPF_DEBUG);
+	tmp |= (1 << 23);
+	WREG32(mmCP_CPF_DEBUG, tmp);
 
-		amdgpu_bo_kunmap(ring->mqd_obj);
-		amdgpu_bo_unreserve(ring->mqd_obj);
+	/* init the pipes */
+	for (i = 0; i < adev->gfx.mec.num_mec; i++)
+		for (j = 0; j < adev->gfx.mec.num_pipe; j++)
+			gfx_v7_0_compute_pipe_init(adev, i, j);
 
-		ring->ready = true;
+	/* init the queues */
+	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+		r = gfx_v7_0_compute_queue_init(adev, i);
+		if (r) {
+			gfx_v7_0_cp_compute_fini(adev);
+			return r;
+		}
 	}
 
 	gfx_v7_0_cp_compute_enable(adev, true);
 
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		ring = &adev->gfx.compute_ring[i];
-
+		ring->ready = true;
 		r = amdgpu_ring_test_ring(ring);
 		if (r)
 			ring->ready = false;
 	}
 
 	return 0;
 }
 
 static void gfx_v7_0_cp_enable(struct amdgpu_device *adev, bool enable)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 67afc90..1c8589a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -46,20 +46,22 @@
 #include "gca/gfx_8_0_sh_mask.h"
 #include "gca/gfx_8_0_enum.h"
 
 #include "dce/dce_10_0_d.h"
 #include "dce/dce_10_0_sh_mask.h"
 
 #include "smu/smu_7_1_3_d.h"
 
 #define GFX8_NUM_GFX_RINGS     1
 #define GFX8_NUM_COMPUTE_RINGS 8
+#define GFX8_MEC_HPD_SIZE 2048
+
 
 #define TOPAZ_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define CARRIZO_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define POLARIS11_GB_ADDR_CONFIG_GOLDEN 0x22011002
 #define TONGA_GB_ADDR_CONFIG_GOLDEN 0x22011003
 
 #define ARRAY_MODE(x)					((x) << GB_TILE_MODE0__ARRAY_MODE__SHIFT)
 #define PIPE_CONFIG(x)					((x) << GB_TILE_MODE0__PIPE_CONFIG__SHIFT)
 #define TILE_SPLIT(x)					((x) << GB_TILE_MODE0__TILE_SPLIT__SHIFT)
 #define MICRO_TILE_MODE_NEW(x)				((x) << GB_TILE_MODE0__MICRO_TILE_MODE_NEW__SHIFT)
@@ -1409,38 +1411,38 @@ static int gfx_v8_0_kiq_init_ring(struct amdgpu_device *adev,
 static void gfx_v8_0_kiq_free_ring(struct amdgpu_ring *ring,
 				   struct amdgpu_irq_src *irq)
 {
 	if (amdgpu_sriov_vf(ring->adev))
 		amdgpu_wb_free(ring->adev, ring->adev->virt.reg_val_offs);
 
 	amdgpu_ring_fini(ring);
 	irq->data = NULL;
 }
 
-#define MEC_HPD_SIZE 2048
+#define GFX8_MEC_HPD_SIZE 2048
 
 static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
 
 	/*
 	 * we assign only 1 pipe because all other pipes will
 	 * be handled by KFD
 	 */
 	adev->gfx.mec.num_mec = 1;
 	adev->gfx.mec.num_pipe = 1;
 	adev->gfx.mec.num_queue = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * 8;
 
 	if (adev->gfx.mec.hpd_eop_obj == NULL) {
 		r = amdgpu_bo_create(adev,
-				     adev->gfx.mec.num_queue * MEC_HPD_SIZE,
+				     adev->gfx.mec.num_queue * GFX8_MEC_HPD_SIZE,
 				     PAGE_SIZE, true,
 				     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &adev->gfx.mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
 			return r;
 		}
 	}
 
 	r = amdgpu_bo_reserve(adev->gfx.mec.hpd_eop_obj, false);
@@ -1455,21 +1457,21 @@ static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 		gfx_v8_0_mec_fini(adev);
 		return r;
 	}
 	r = amdgpu_bo_kmap(adev->gfx.mec.hpd_eop_obj, (void **)&hpd);
 	if (r) {
 		dev_warn(adev->dev, "(%d) map HDP EOP bo failed\n", r);
 		gfx_v8_0_mec_fini(adev);
 		return r;
 	}
 
-	memset(hpd, 0, adev->gfx.mec.num_queue * MEC_HPD_SIZE);
+	memset(hpd, 0, adev->gfx.mec.num_queue * GFX8_MEC_HPD_SIZE);
 
 	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
 	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
 	return 0;
 }
 
 static void gfx_v8_0_kiq_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
@@ -1477,29 +1479,29 @@ static void gfx_v8_0_kiq_fini(struct amdgpu_device *adev)
 	amdgpu_bo_free_kernel(&kiq->eop_obj, &kiq->eop_gpu_addr, NULL);
 	kiq->eop_obj = NULL;
 }
 
 static int gfx_v8_0_kiq_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
 
-	r = amdgpu_bo_create_kernel(adev, MEC_HPD_SIZE, PAGE_SIZE,
+	r = amdgpu_bo_create_kernel(adev, GFX8_MEC_HPD_SIZE, PAGE_SIZE,
 				    AMDGPU_GEM_DOMAIN_GTT, &kiq->eop_obj,
 				    &kiq->eop_gpu_addr, (void **)&hpd);
 	if (r) {
 		dev_warn(adev->dev, "failed to create KIQ bo (%d).\n", r);
 		return r;
 	}
 
-	memset(hpd, 0, MEC_HPD_SIZE);
+	memset(hpd, 0, GFX8_MEC_HPD_SIZE);
 
 	amdgpu_bo_kunmap(kiq->eop_obj);
 
 	return 0;
 }
 
 static const u32 vgpr_init_compute_shader[] =
 {
 	0x7e000209, 0x7e020208,
 	0x7e040207, 0x7e060206,
@@ -4658,56 +4660,54 @@ static void gfx_v8_0_map_queue_enable(struct amdgpu_ring *kiq_ring,
 
 static int gfx_v8_0_mqd_init(struct amdgpu_device *adev,
 			     struct vi_mqd *mqd,
 			     uint64_t mqd_gpu_addr,
 			     uint64_t eop_gpu_addr,
 			     struct amdgpu_ring *ring)
 {
 	uint64_t hqd_gpu_addr, wb_gpu_addr, eop_base_addr;
 	uint32_t tmp;
 
+	/* init the mqd struct */
+	memset(mqd, 0, sizeof(struct vi_mqd));
+
 	mqd->header = 0xC0310800;
 	mqd->compute_pipelinestat_enable = 0x00000001;
 	mqd->compute_static_thread_mgmt_se0 = 0xffffffff;
 	mqd->compute_static_thread_mgmt_se1 = 0xffffffff;
 	mqd->compute_static_thread_mgmt_se2 = 0xffffffff;
 	mqd->compute_static_thread_mgmt_se3 = 0xffffffff;
 	mqd->compute_misc_reserved = 0x00000003;
 
 	eop_base_addr = eop_gpu_addr >> 8;
 	mqd->cp_hqd_eop_base_addr_lo = eop_base_addr;
 	mqd->cp_hqd_eop_base_addr_hi = upper_32_bits(eop_base_addr);
 
 	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
 	tmp = RREG32(mmCP_HQD_EOP_CONTROL);
 	tmp = REG_SET_FIELD(tmp, CP_HQD_EOP_CONTROL, EOP_SIZE,
-			(order_base_2(MEC_HPD_SIZE / 4) - 1));
+			(order_base_2(GFX8_MEC_HPD_SIZE / 4) - 1));
 
 	mqd->cp_hqd_eop_control = tmp;
 
 	/* enable doorbell? */
 	tmp = RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
 
 	if (ring->use_doorbell)
 		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL,
 					 DOORBELL_EN, 1);
 	else
 		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL,
 					 DOORBELL_EN, 0);
 
 	mqd->cp_hqd_pq_doorbell_control = tmp;
 
-	/* disable the queue if it's active */
-	mqd->cp_hqd_dequeue_request = 0;
-	mqd->cp_hqd_pq_rptr = 0;
-	mqd->cp_hqd_pq_wptr = 0;
-
 	/* set the pointer to the MQD */
 	mqd->cp_mqd_base_addr_lo = mqd_gpu_addr & 0xfffffffc;
 	mqd->cp_mqd_base_addr_hi = upper_32_bits(mqd_gpu_addr);
 
 	/* set MQD vmid to 0 */
 	tmp = RREG32(mmCP_MQD_CONTROL);
 	tmp = REG_SET_FIELD(tmp, CP_MQD_CONTROL, VMID, 0);
 	mqd->cp_mqd_control = tmp;
 
 	/* set the pointer to the HQD, this is similar CP_RB0_BASE/_HI */
@@ -4769,53 +4769,87 @@ static int gfx_v8_0_mqd_init(struct amdgpu_device *adev,
 	tmp = RREG32(mmCP_HQD_PERSISTENT_STATE);
 	tmp = REG_SET_FIELD(tmp, CP_HQD_PERSISTENT_STATE, PRELOAD_SIZE, 0x53);
 	mqd->cp_hqd_persistent_state = tmp;
 
 	/* activate the queue */
 	mqd->cp_hqd_active = 1;
 
 	return 0;
 }
 
-static int gfx_v8_0_kiq_init_register(struct amdgpu_device *adev,
-				      struct vi_mqd *mqd,
-				      struct amdgpu_ring *ring)
+static int gfx_v8_0_mqd_deactivate(struct amdgpu_device *adev)
+{
+	int i;
+
+	/* disable the queue if it's active */
+	if (RREG32(mmCP_HQD_ACTIVE) & 1) {
+		WREG32(mmCP_HQD_DEQUEUE_REQUEST, 1);
+		for (i = 0; i < adev->usec_timeout; i++) {
+			if (!(RREG32(mmCP_HQD_ACTIVE) & 1))
+				break;
+			udelay(1);
+		}
+
+		if (i == adev->usec_timeout)
+			return -ETIMEDOUT;
+
+		WREG32(mmCP_HQD_DEQUEUE_REQUEST, 0);
+		WREG32(mmCP_HQD_PQ_RPTR, 0);
+		WREG32(mmCP_HQD_PQ_WPTR, 0);
+	}
+
+	return 0;
+}
+
+static void gfx_v8_0_enable_doorbell(struct amdgpu_device *adev, bool enable)
+{
+	uint32_t tmp;
+
+	if (!enable)
+		return;
+
+	if ((adev->asic_type == CHIP_CARRIZO) ||
+			(adev->asic_type == CHIP_FIJI) ||
+			(adev->asic_type == CHIP_STONEY) ||
+			(adev->asic_type == CHIP_POLARIS11) ||
+			(adev->asic_type == CHIP_POLARIS10)) {
+		WREG32(mmCP_MEC_DOORBELL_RANGE_LOWER, AMDGPU_DOORBELL_KIQ << 2);
+		WREG32(mmCP_MEC_DOORBELL_RANGE_UPPER, AMDGPU_DOORBELL_MEC_RING7 << 2);
+	}
+
+	tmp = RREG32(mmCP_PQ_STATUS);
+	tmp = REG_SET_FIELD(tmp, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
+	WREG32(mmCP_PQ_STATUS, tmp);
+}
+
+static int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 {
 	uint32_t tmp;
-	int j;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
 	WREG32(mmCP_HQD_EOP_BASE_ADDR, mqd->cp_hqd_eop_base_addr_lo);
 	WREG32(mmCP_HQD_EOP_BASE_ADDR_HI, mqd->cp_hqd_eop_base_addr_hi);
 
 	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
 	WREG32(mmCP_HQD_EOP_CONTROL, mqd->cp_hqd_eop_control);
 
 	/* enable doorbell? */
 	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
 
-	/* disable the queue if it's active */
-	if (RREG32(mmCP_HQD_ACTIVE) & 1) {
-		WREG32(mmCP_HQD_DEQUEUE_REQUEST, 1);
-		for (j = 0; j < adev->usec_timeout; j++) {
-			if (!(RREG32(mmCP_HQD_ACTIVE) & 1))
-				break;
-			udelay(1);
-		}
-		WREG32(mmCP_HQD_DEQUEUE_REQUEST, mqd->cp_hqd_dequeue_request);
-		WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
-		WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-	}
+	/* set pq read/write pointers */
+	WREG32(mmCP_HQD_DEQUEUE_REQUEST, mqd->cp_hqd_dequeue_request);
+	WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
+	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
 
 	/* set the pointer to the MQD */
 	WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
 	WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->cp_mqd_base_addr_hi);
 
 	/* set MQD vmid to 0 */
 	WREG32(mmCP_MQD_CONTROL, mqd->cp_mqd_control);
 
 	/* set the pointer to the HQD, this is similar CP_RB0_BASE/_HI */
 	WREG32(mmCP_HQD_PQ_BASE, mqd->cp_hqd_pq_base_lo);
@@ -4828,78 +4862,65 @@ static int gfx_v8_0_kiq_init_register(struct amdgpu_device *adev,
 	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR,
 				mqd->cp_hqd_pq_rptr_report_addr_lo);
 	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
 				mqd->cp_hqd_pq_rptr_report_addr_hi);
 
 	/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
 	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
 	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
 
 	/* enable the doorbell if requested */
-	if (ring->use_doorbell) {
-		if ((adev->asic_type == CHIP_CARRIZO) ||
-				(adev->asic_type == CHIP_FIJI) ||
-				(adev->asic_type == CHIP_STONEY)) {
-			WREG32(mmCP_MEC_DOORBELL_RANGE_LOWER,
-						AMDGPU_DOORBELL_KIQ << 2);
-			WREG32(mmCP_MEC_DOORBELL_RANGE_UPPER,
-						AMDGPU_DOORBELL_MEC_RING7 << 2);
-		}
-	}
 	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
 
 	/* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
 	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
 
 	/* set the vmid for the queue */
 	WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
 
 	WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
 
 	/* activate the queue */
 	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
 
-	if (ring->use_doorbell) {
-		tmp = RREG32(mmCP_PQ_STATUS);
-		tmp = REG_SET_FIELD(tmp, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
-		WREG32(mmCP_PQ_STATUS, tmp);
-	}
-
 	return 0;
 }
 
-static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring,
+static int gfx_v8_0_kiq_queue_init(struct amdgpu_ring *ring,
 				   struct vi_mqd *mqd,
 				   u64 mqd_gpu_addr)
 {
 	struct amdgpu_device *adev = ring->adev;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
 	uint64_t eop_gpu_addr;
 	bool is_kiq = false;
 
 	if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
 		is_kiq = true;
 
 	if (is_kiq) {
 		eop_gpu_addr = kiq->eop_gpu_addr;
 		gfx_v8_0_kiq_setting(&kiq->ring);
 	} else
 		eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr +
-					ring->queue * MEC_HPD_SIZE;
+					ring->queue * GFX8_MEC_HPD_SIZE;
 
 	mutex_lock(&adev->srbm_mutex);
 	vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
 
 	gfx_v8_0_mqd_init(adev, mqd, mqd_gpu_addr, eop_gpu_addr, ring);
 
-	if (is_kiq)
-		gfx_v8_0_kiq_init_register(adev, mqd, ring);
+	if (is_kiq) {
+		gfx_v8_0_mqd_deactivate(adev);
+		gfx_v8_0_enable_doorbell(adev, ring->use_doorbell);
+		gfx_v8_0_mqd_commit(adev, mqd);
+	}
 
 	vi_srbm_select(adev, 0, 0, 0, 0);
 	mutex_unlock(&adev->srbm_mutex);
 
 	if (is_kiq)
 		gfx_v8_0_kiq_enable(ring);
 	else
 		gfx_v8_0_map_queue_enable(&kiq->ring, ring);
 
 	return 0;
@@ -4922,33 +4943,34 @@ static void gfx_v8_0_kiq_free_queue(struct amdgpu_device *adev)
 }
 
 static int gfx_v8_0_kiq_setup_queue(struct amdgpu_device *adev,
 				    struct amdgpu_ring *ring)
 {
 	struct vi_mqd *mqd;
 	u64 mqd_gpu_addr;
 	u32 *buf;
 	int r = 0;
 
-	r = amdgpu_bo_create_kernel(adev, sizeof(struct vi_mqd), PAGE_SIZE,
-				    AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
-				    &mqd_gpu_addr, (void **)&buf);
+	r = amdgpu_bo_create_kernel(adev, sizeof(struct vi_mqd),
+			PAGE_SIZE, AMDGPU_GEM_DOMAIN_GTT,
+			&ring->mqd_obj, &mqd_gpu_addr,
+			(void **)&buf);
 	if (r) {
 		dev_warn(adev->dev, "failed to create ring mqd ob (%d)", r);
 		return r;
 	}
 
 	/* init the mqd struct */
 	memset(buf, 0, sizeof(struct vi_mqd));
 	mqd = (struct vi_mqd *)buf;
 
-	r = gfx_v8_0_kiq_init_queue(ring, mqd, mqd_gpu_addr);
+	r = gfx_v8_0_kiq_queue_init(ring, mqd, mqd_gpu_addr);
 	if (r)
 		return r;
 
 	amdgpu_bo_kunmap(ring->mqd_obj);
 
 	return 0;
 }
 
 static int gfx_v8_0_kiq_resume(struct amdgpu_device *adev)
 {
@@ -4980,260 +5002,113 @@ static int gfx_v8_0_kiq_resume(struct amdgpu_device *adev)
 
 	ring = &adev->gfx.kiq.ring;
 	ring->ready = true;
 	r = amdgpu_ring_test_ring(ring);
 	if (r)
 		ring->ready = false;
 
 	return 0;
 }
 
-static int gfx_v8_0_cp_compute_resume(struct amdgpu_device *adev)
+static int gfx_v8_0_compute_queue_init(struct amdgpu_device *adev,
+				       int ring_id)
 {
-	int r, i, j;
-	u32 tmp;
-	bool use_doorbell = true;
-	u64 hqd_gpu_addr;
-	u64 mqd_gpu_addr;
+	int r;
 	u64 eop_gpu_addr;
-	u64 wb_gpu_addr;
-	u32 *buf;
+	u64 mqd_gpu_addr;
 	struct vi_mqd *mqd;
+	struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
 
-	/* init the queues.  */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-		struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
-
-		if (ring->mqd_obj == NULL) {
-			r = amdgpu_bo_create(adev,
-					     sizeof(struct vi_mqd),
-					     PAGE_SIZE, true,
-					     AMDGPU_GEM_DOMAIN_GTT, 0, NULL,
-					     NULL, &ring->mqd_obj);
-			if (r) {
-				dev_warn(adev->dev, "(%d) create MQD bo failed\n", r);
-				return r;
-			}
-		}
-
-		r = amdgpu_bo_reserve(ring->mqd_obj, false);
-		if (unlikely(r != 0)) {
-			gfx_v8_0_cp_compute_fini(adev);
-			return r;
-		}
-		r = amdgpu_bo_pin(ring->mqd_obj, AMDGPU_GEM_DOMAIN_GTT,
-				  &mqd_gpu_addr);
-		if (r) {
-			dev_warn(adev->dev, "(%d) pin MQD bo failed\n", r);
-			gfx_v8_0_cp_compute_fini(adev);
-			return r;
-		}
-		r = amdgpu_bo_kmap(ring->mqd_obj, (void **)&buf);
+	if (ring->mqd_obj == NULL) {
+		r = amdgpu_bo_create(adev,
+				sizeof(struct vi_mqd),
+				PAGE_SIZE, true,
+				AMDGPU_GEM_DOMAIN_GTT, 0, NULL,
+				NULL, &ring->mqd_obj);
 		if (r) {
-			dev_warn(adev->dev, "(%d) map MQD bo failed\n", r);
-			gfx_v8_0_cp_compute_fini(adev);
+			dev_warn(adev->dev, "(%d) create MQD bo failed\n", r);
 			return r;
 		}
+	}
 
-		/* init the mqd struct */
-		memset(buf, 0, sizeof(struct vi_mqd));
-
-		mqd = (struct vi_mqd *)buf;
-		mqd->header = 0xC0310800;
-		mqd->compute_pipelinestat_enable = 0x00000001;
-		mqd->compute_static_thread_mgmt_se0 = 0xffffffff;
-		mqd->compute_static_thread_mgmt_se1 = 0xffffffff;
-		mqd->compute_static_thread_mgmt_se2 = 0xffffffff;
-		mqd->compute_static_thread_mgmt_se3 = 0xffffffff;
-		mqd->compute_misc_reserved = 0x00000003;
-
-		mutex_lock(&adev->srbm_mutex);
-		vi_srbm_select(adev, ring->me,
-			       ring->pipe,
-			       ring->queue, 0);
-
-		eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE);
-		eop_gpu_addr >>= 8;
-
-		/* write the EOP addr */
-		WREG32(mmCP_HQD_EOP_BASE_ADDR, eop_gpu_addr);
-		WREG32(mmCP_HQD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr));
-
-		/* set the VMID assigned */
-		WREG32(mmCP_HQD_VMID, 0);
-
-		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-		tmp = RREG32(mmCP_HQD_EOP_CONTROL);
-		tmp = REG_SET_FIELD(tmp, CP_HQD_EOP_CONTROL, EOP_SIZE,
-				    (order_base_2(MEC_HPD_SIZE / 4) - 1));
-		WREG32(mmCP_HQD_EOP_CONTROL, tmp);
-
-		/* disable wptr polling */
-		tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
-		tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
-		WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
-
-		mqd->cp_hqd_eop_base_addr_lo =
-			RREG32(mmCP_HQD_EOP_BASE_ADDR);
-		mqd->cp_hqd_eop_base_addr_hi =
-			RREG32(mmCP_HQD_EOP_BASE_ADDR_HI);
-
-		/* enable doorbell? */
-		tmp = RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
-		if (use_doorbell) {
-			tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
-		} else {
-			tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 0);
-		}
-		WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, tmp);
-		mqd->cp_hqd_pq_doorbell_control = tmp;
-
-		/* disable the queue if it's active */
-		mqd->cp_hqd_dequeue_request = 0;
-		mqd->cp_hqd_pq_rptr = 0;
-		mqd->cp_hqd_pq_wptr= 0;
-		if (RREG32(mmCP_HQD_ACTIVE) & 1) {
-			WREG32(mmCP_HQD_DEQUEUE_REQUEST, 1);
-			for (j = 0; j < adev->usec_timeout; j++) {
-				if (!(RREG32(mmCP_HQD_ACTIVE) & 1))
-					break;
-				udelay(1);
-			}
-			WREG32(mmCP_HQD_DEQUEUE_REQUEST, mqd->cp_hqd_dequeue_request);
-			WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
-			WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-		}
+	r = amdgpu_bo_reserve(ring->mqd_obj, false);
+	if (unlikely(r != 0))
+		goto out;
 
-		/* set the pointer to the MQD */
-		mqd->cp_mqd_base_addr_lo = mqd_gpu_addr & 0xfffffffc;
-		mqd->cp_mqd_base_addr_hi = upper_32_bits(mqd_gpu_addr);
-		WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
-		WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->cp_mqd_base_addr_hi);
-
-		/* set MQD vmid to 0 */
-		tmp = RREG32(mmCP_MQD_CONTROL);
-		tmp = REG_SET_FIELD(tmp, CP_MQD_CONTROL, VMID, 0);
-		WREG32(mmCP_MQD_CONTROL, tmp);
-		mqd->cp_mqd_control = tmp;
-
-		/* set the pointer to the HQD, this is similar CP_RB0_BASE/_HI */
-		hqd_gpu_addr = ring->gpu_addr >> 8;
-		mqd->cp_hqd_pq_base_lo = hqd_gpu_addr;
-		mqd->cp_hqd_pq_base_hi = upper_32_bits(hqd_gpu_addr);
-		WREG32(mmCP_HQD_PQ_BASE, mqd->cp_hqd_pq_base_lo);
-		WREG32(mmCP_HQD_PQ_BASE_HI, mqd->cp_hqd_pq_base_hi);
-
-		/* set up the HQD, this is similar to CP_RB0_CNTL */
-		tmp = RREG32(mmCP_HQD_PQ_CONTROL);
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, QUEUE_SIZE,
-				    (order_base_2(ring->ring_size / 4) - 1));
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, RPTR_BLOCK_SIZE,
-			       ((order_base_2(AMDGPU_GPU_PAGE_SIZE / 4) - 1) << 8));
-#ifdef __BIG_ENDIAN
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, ENDIAN_SWAP, 1);
-#endif
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, ROQ_PQ_IB_FLIP, 0);
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1);
-		WREG32(mmCP_HQD_PQ_CONTROL, tmp);
-		mqd->cp_hqd_pq_control = tmp;
-
-		/* set the wb address wether it's enabled or not */
-		wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
-		mqd->cp_hqd_pq_rptr_report_addr_lo = wb_gpu_addr & 0xfffffffc;
-		mqd->cp_hqd_pq_rptr_report_addr_hi =
-			upper_32_bits(wb_gpu_addr) & 0xffff;
-		WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR,
-		       mqd->cp_hqd_pq_rptr_report_addr_lo);
-		WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
-		       mqd->cp_hqd_pq_rptr_report_addr_hi);
-
-		/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
-		wb_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
-		mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
-		mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
-		WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
-		WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI,
-		       mqd->cp_hqd_pq_wptr_poll_addr_hi);
-
-		/* enable the doorbell if requested */
-		if (use_doorbell) {
-			if ((adev->asic_type == CHIP_CARRIZO) ||
-			    (adev->asic_type == CHIP_FIJI) ||
-			    (adev->asic_type == CHIP_STONEY) ||
-			    (adev->asic_type == CHIP_POLARIS11) ||
-			    (adev->asic_type == CHIP_POLARIS10) ||
-			    (adev->asic_type == CHIP_POLARIS12)) {
-				WREG32(mmCP_MEC_DOORBELL_RANGE_LOWER,
-				       AMDGPU_DOORBELL_KIQ << 2);
-				WREG32(mmCP_MEC_DOORBELL_RANGE_UPPER,
-				       AMDGPU_DOORBELL_MEC_RING7 << 2);
-			}
-			tmp = RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
-			tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL,
-					    DOORBELL_OFFSET, ring->doorbell_index);
-			tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
-			tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_SOURCE, 0);
-			tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_HIT, 0);
-			mqd->cp_hqd_pq_doorbell_control = tmp;
+	r = amdgpu_bo_pin(ring->mqd_obj, AMDGPU_GEM_DOMAIN_GTT,
+			&mqd_gpu_addr);
+	if (r) {
+		dev_warn(adev->dev, "(%d) pin MQD bo failed\n", r);
+		goto out_unreserve;
+	}
+	r = amdgpu_bo_kmap(ring->mqd_obj, (void **)&mqd);
+	if (r) {
+		dev_warn(adev->dev, "(%d) map MQD bo failed\n", r);
+		goto out_unreserve;
+	}
 
-		} else {
-			mqd->cp_hqd_pq_doorbell_control = 0;
-		}
-		WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL,
-		       mqd->cp_hqd_pq_doorbell_control);
-
-		/* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
-		ring->wptr = 0;
-		mqd->cp_hqd_pq_wptr = ring->wptr;
-		WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-		mqd->cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
-
-		/* set the vmid for the queue */
-		mqd->cp_hqd_vmid = 0;
-		WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
-
-		tmp = RREG32(mmCP_HQD_PERSISTENT_STATE);
-		tmp = REG_SET_FIELD(tmp, CP_HQD_PERSISTENT_STATE, PRELOAD_SIZE, 0x53);
-		WREG32(mmCP_HQD_PERSISTENT_STATE, tmp);
-		mqd->cp_hqd_persistent_state = tmp;
-		if (adev->asic_type == CHIP_STONEY ||
-			adev->asic_type == CHIP_POLARIS11 ||
-			adev->asic_type == CHIP_POLARIS10 ||
-			adev->asic_type == CHIP_POLARIS12) {
-			tmp = RREG32(mmCP_ME1_PIPE3_INT_CNTL);
-			tmp = REG_SET_FIELD(tmp, CP_ME1_PIPE3_INT_CNTL, GENERIC2_INT_ENABLE, 1);
-			WREG32(mmCP_ME1_PIPE3_INT_CNTL, tmp);
-		}
+	eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + (ring_id * GFX8_MEC_HPD_SIZE);
+	eop_gpu_addr >>= 8;
+
+	/* init the mqd struct */
+	memset(mqd, 0, sizeof(struct vi_mqd));
+
+	mutex_lock(&adev->srbm_mutex);
+	vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
+
+	gfx_v8_0_mqd_init(adev, mqd, mqd_gpu_addr, eop_gpu_addr, ring);
 
-		/* activate the queue */
-		mqd->cp_hqd_active = 1;
-		WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
+	gfx_v8_0_mqd_deactivate(adev);
+	gfx_v8_0_enable_doorbell(adev, ring->use_doorbell);
+	gfx_v8_0_mqd_commit(adev, mqd);
 
-		vi_srbm_select(adev, 0, 0, 0, 0);
-		mutex_unlock(&adev->srbm_mutex);
+	vi_srbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
+
+	amdgpu_bo_kunmap(ring->mqd_obj);
+out_unreserve:
+	amdgpu_bo_unreserve(ring->mqd_obj);
+out:
+	return r;
+}
 
-		amdgpu_bo_kunmap(ring->mqd_obj);
-		amdgpu_bo_unreserve(ring->mqd_obj);
+static int gfx_v8_0_cp_compute_resume(struct amdgpu_device *adev)
+{
+	int r, i;
+	u32 tmp;
+	struct amdgpu_ring *ring;
+
+	/* Starting with gfx v8, all the pipe-specific state was removed;
+	 * the fields have been moved to be per-HQD now. */
+
+	/* init the queues */
+	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+		r = gfx_v8_0_compute_queue_init(adev, i);
+		if (r) {
+			gfx_v8_0_cp_compute_fini(adev);
+			return r;
+		}
 	}
 
-	if (use_doorbell) {
-		tmp = RREG32(mmCP_PQ_STATUS);
-		tmp = REG_SET_FIELD(tmp, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
-		WREG32(mmCP_PQ_STATUS, tmp);
+	if (adev->asic_type == CHIP_STONEY ||
+	    adev->asic_type == CHIP_POLARIS11 ||
+	    adev->asic_type == CHIP_POLARIS10 ||
+	    adev->asic_type == CHIP_POLARIS12) {
+		tmp = RREG32(mmCP_ME1_PIPE3_INT_CNTL);
+		tmp = REG_SET_FIELD(tmp, CP_ME1_PIPE3_INT_CNTL, GENERIC2_INT_ENABLE, 1);
+		WREG32(mmCP_ME1_PIPE3_INT_CNTL, tmp);
 	}
 
 	gfx_v8_0_cp_compute_enable(adev, true);
 
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-		struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
+		ring = &adev->gfx.compute_ring[i];
 
 		ring->ready = true;
 		r = amdgpu_ring_test_ring(ring);
 		if (r)
 			ring->ready = false;
 	}
 
 	return 0;
 }
 
-- 
2.7.4


* [PATCH 02/22] drm/amdgpu: doorbell registers need only be set once v2
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

The CP_MEC_DOORBELL_RANGE_* and CP_PQ_STATUS.DOORBELL_ENABLE registers
are not HQD specific.

They only need to be set once if at least one pipe requests doorbell
support.

v2: move doorbell_enable to amdgpu_gfx instead of amdgpu_device

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 3 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 +++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e9af031..d699c3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -892,20 +892,23 @@ struct amdgpu_gfx {
 	/* gfx status */
 	uint32_t			gfx_current_status;
 	/* ce ram size*/
 	unsigned			ce_ram_size;
 	struct amdgpu_cu_info		cu_info;
 	const struct amdgpu_gfx_funcs	*funcs;
 
 	/* reset mask */
 	uint32_t                        grbm_soft_reset;
 	uint32_t                        srbm_soft_reset;
+
+	/* doorbell */
+	bool				doorbell_enabled;
 };
 
 int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		  unsigned size, struct amdgpu_ib *ib);
 void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib,
 		    struct dma_fence *f);
 int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		       struct amdgpu_ib *ibs, struct amdgpu_job *job,
 		       struct dma_fence **f);
 int amdgpu_ib_pool_init(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 1c8589a..044449a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4797,35 +4797,37 @@ static int gfx_v8_0_mqd_deactivate(struct amdgpu_device *adev)
 		WREG32(mmCP_HQD_PQ_WPTR, 0);
 	}
 
 	return 0;
 }
 
 static void gfx_v8_0_enable_doorbell(struct amdgpu_device *adev, bool enable)
 {
 	uint32_t tmp;
 
-	if (!enable)
+	if (!enable || adev->gfx.doorbell_enabled)
 		return;
 
 	if ((adev->asic_type == CHIP_CARRIZO) ||
 			(adev->asic_type == CHIP_FIJI) ||
 			(adev->asic_type == CHIP_STONEY) ||
 			(adev->asic_type == CHIP_POLARIS11) ||
 			(adev->asic_type == CHIP_POLARIS10)) {
 		WREG32(mmCP_MEC_DOORBELL_RANGE_LOWER, AMDGPU_DOORBELL_KIQ << 2);
 		WREG32(mmCP_MEC_DOORBELL_RANGE_UPPER, AMDGPU_DOORBELL_MEC_RING7 << 2);
 	}
 
 	tmp = RREG32(mmCP_PQ_STATUS);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
 	WREG32(mmCP_PQ_STATUS, tmp);
+
+	adev->gfx.doorbell_enabled = true;
 }
 
 static int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 {
 	uint32_t tmp;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
@@ -5109,20 +5111,22 @@ static int gfx_v8_0_cp_compute_resume(struct amdgpu_device *adev)
 			ring->ready = false;
 	}
 
 	return 0;
 }
 
 static int gfx_v8_0_cp_resume(struct amdgpu_device *adev)
 {
 	int r;
 
+	adev->gfx.doorbell_enabled = false;
+
 	if (!(adev->flags & AMD_IS_APU))
 		gfx_v8_0_enable_gui_idle_interrupt(adev, false);
 
 	if (!adev->pp_enabled) {
 		if (!adev->firmware.smu_load) {
 			/* legacy firmware loading */
 			r = gfx_v8_0_cp_gfx_load_microcode(adev);
 			if (r)
 				return r;
 
-- 
2.7.4


* [PATCH 03/22] drm/amdgpu: detect timeout error when deactivating hqd
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Handle HQD deactivation timeouts instead of ignoring them.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 044449a..af4b505 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4884,20 +4884,21 @@ static int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 	/* activate the queue */
 	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
 
 	return 0;
 }
 
 static int gfx_v8_0_kiq_queue_init(struct amdgpu_ring *ring,
 				   struct vi_mqd *mqd,
 				   u64 mqd_gpu_addr)
 {
+	int r = 0;
 	struct amdgpu_device *adev = ring->adev;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
 	uint64_t eop_gpu_addr;
 	bool is_kiq = false;
 
 	if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
 		is_kiq = true;
 
 	if (is_kiq) {
 		eop_gpu_addr = kiq->eop_gpu_addr;
@@ -4905,34 +4906,45 @@ static int gfx_v8_0_kiq_queue_init(struct amdgpu_ring *ring,
 	} else
 		eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr +
 					ring->queue * GFX8_MEC_HPD_SIZE;
 
 	mutex_lock(&adev->srbm_mutex);
 	vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
 
 	gfx_v8_0_mqd_init(adev, mqd, mqd_gpu_addr, eop_gpu_addr, ring);
 
 	if (is_kiq) {
-		gfx_v8_0_mqd_deactivate(adev);
+		r = gfx_v8_0_mqd_deactivate(adev);
+		if (r) {
+			dev_err(adev->dev, "failed to deactivate ring %s\n", ring->name);
+			goto out_unlock;
+		}
+
 		gfx_v8_0_enable_doorbell(adev, ring->use_doorbell);
 		gfx_v8_0_mqd_commit(adev, mqd);
 	}
 
 	vi_srbm_select(adev, 0, 0, 0, 0);
 	mutex_unlock(&adev->srbm_mutex);
 
 	if (is_kiq)
 		gfx_v8_0_kiq_enable(ring);
 	else
 		gfx_v8_0_map_queue_enable(&kiq->ring, ring);
 
 	return 0;
+
+out_unlock:
+	vi_srbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
+
+	return r;
 }
 
 static void gfx_v8_0_kiq_free_queue(struct amdgpu_device *adev)
 {
 	struct amdgpu_ring *ring = NULL;
 	int i;
 
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		ring = &adev->gfx.compute_ring[i];
 		amdgpu_bo_free_kernel(&ring->mqd_obj, NULL, NULL);
@@ -5052,24 +5064,30 @@ static int gfx_v8_0_compute_queue_init(struct amdgpu_device *adev,
 	eop_gpu_addr >>= 8;
 
 	/* init the mqd struct */
 	memset(mqd, 0, sizeof(struct vi_mqd));
 
 	mutex_lock(&adev->srbm_mutex);
 	vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
 
 	gfx_v8_0_mqd_init(adev, mqd, mqd_gpu_addr, eop_gpu_addr, ring);
 
-	gfx_v8_0_mqd_deactivate(adev);
+	r = gfx_v8_0_mqd_deactivate(adev);
+	if (r) {
+		dev_err(adev->dev, "failed to deactivate ring %s\n", ring->name);
+		goto out_unlock;
+	}
+
 	gfx_v8_0_enable_doorbell(adev, ring->use_doorbell);
 	gfx_v8_0_mqd_commit(adev, mqd);
 
+out_unlock:
 	vi_srbm_select(adev, 0, 0, 0, 0);
 	mutex_unlock(&adev->srbm_mutex);
 
 	amdgpu_bo_kunmap(ring->mqd_obj);
 out_unreserve:
 	amdgpu_bo_unreserve(ring->mqd_obj);
 out:
 	return r;
 }
 
-- 
2.7.4


* [PATCH 04/22] drm/amdgpu: remove duplicate definition of cik_mqd
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

gfx v7 contains a slightly different copy of cik_mqd named bonaire_mqd.
Keeping two definitions of the same hardware structure can introduce
subtle bugs if fixes are not applied in both places, so switch gfx v7
over to the shared cik_mqd from cik_structs.h.
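
The conversion is mechanical: fields that were nested under queue_state
in bonaire_mqd map to flat, suffixed names in cik_mqd. For example
(illustrative mapping taken from the hunks below):

    /* bonaire_mqd                            cik_mqd                   */
    mqd->queue_state.cp_mqd_base_addr;  /* -> cp_mqd_base_addr_lo       */
    mqd->queue_state.cp_hqd_pq_base;    /* -> cp_hqd_pq_base_lo         */
    mqd->static_thread_mgmt01[0];       /* -> compute_static_thread_mgmt_se0 */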

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 135 ++++++++++++++--------------------
 1 file changed, 54 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 8e1e601..c606e0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -20,20 +20,21 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
 #include <linux/firmware.h>
 #include "drmP.h"
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "amdgpu_gfx.h"
 #include "cikd.h"
 #include "cik.h"
+#include "cik_structs.h"
 #include "atom.h"
 #include "amdgpu_ucode.h"
 #include "clearstate_ci.h"
 
 #include "dce/dce_8_0_d.h"
 #include "dce/dce_8_0_sh_mask.h"
 
 #include "bif/bif_4_1_d.h"
 #include "bif/bif_4_1_sh_mask.h"
 
@@ -2888,48 +2889,20 @@ struct hqd_registers
 	u32 cp_hqd_msg_type;
 	u32 cp_hqd_atomic0_preop_lo;
 	u32 cp_hqd_atomic0_preop_hi;
 	u32 cp_hqd_atomic1_preop_lo;
 	u32 cp_hqd_atomic1_preop_hi;
 	u32 cp_hqd_hq_scheduler0;
 	u32 cp_hqd_hq_scheduler1;
 	u32 cp_mqd_control;
 };
 
-struct bonaire_mqd
-{
-	u32 header;
-	u32 dispatch_initiator;
-	u32 dimensions[3];
-	u32 start_idx[3];
-	u32 num_threads[3];
-	u32 pipeline_stat_enable;
-	u32 perf_counter_enable;
-	u32 pgm[2];
-	u32 tba[2];
-	u32 tma[2];
-	u32 pgm_rsrc[2];
-	u32 vmid;
-	u32 resource_limits;
-	u32 static_thread_mgmt01[2];
-	u32 tmp_ring_size;
-	u32 static_thread_mgmt23[2];
-	u32 restart[3];
-	u32 thread_trace_enable;
-	u32 reserved1;
-	u32 user_data[16];
-	u32 vgtcs_invoke_count[2];
-	struct hqd_registers queue_state;
-	u32 dequeue_cntr;
-	u32 interrupt_queue[64];
-};
-
 static void gfx_v7_0_compute_pipe_init(struct amdgpu_device *adev, int me, int pipe)
 {
 	u64 eop_gpu_addr;
 	u32 tmp;
 	size_t eop_offset = me * pipe * GFX7_MEC_HPD_SIZE * 2;
 
 	mutex_lock(&adev->srbm_mutex);
 	eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + eop_offset;
 
 	cik_srbm_select(adev, me, pipe, 0, 0);
@@ -2969,162 +2942,162 @@ static int gfx_v7_0_mqd_deactivate(struct amdgpu_device *adev)
 
 		WREG32(mmCP_HQD_DEQUEUE_REQUEST, 0);
 		WREG32(mmCP_HQD_PQ_RPTR, 0);
 		WREG32(mmCP_HQD_PQ_WPTR, 0);
 	}
 
 	return 0;
 }
 
 static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
-			     struct bonaire_mqd *mqd,
+			     struct cik_mqd *mqd,
 			     uint64_t mqd_gpu_addr,
 			     struct amdgpu_ring *ring)
 {
 	u64 hqd_gpu_addr;
 	u64 wb_gpu_addr;
 
 	/* init the mqd struct */
-	memset(mqd, 0, sizeof(struct bonaire_mqd));
+	memset(mqd, 0, sizeof(struct cik_mqd));
 
 	mqd->header = 0xC0310800;
-	mqd->static_thread_mgmt01[0] = 0xffffffff;
-	mqd->static_thread_mgmt01[1] = 0xffffffff;
-	mqd->static_thread_mgmt23[0] = 0xffffffff;
-	mqd->static_thread_mgmt23[1] = 0xffffffff;
+	mqd->compute_static_thread_mgmt_se0 = 0xffffffff;
+	mqd->compute_static_thread_mgmt_se1 = 0xffffffff;
+	mqd->compute_static_thread_mgmt_se2 = 0xffffffff;
+	mqd->compute_static_thread_mgmt_se3 = 0xffffffff;
 
 	/* enable doorbell? */
-	mqd->queue_state.cp_hqd_pq_doorbell_control =
+	mqd->cp_hqd_pq_doorbell_control =
 		RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
 	if (ring->use_doorbell)
-		mqd->queue_state.cp_hqd_pq_doorbell_control |= CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
+		mqd->cp_hqd_pq_doorbell_control |= CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
 	else
-		mqd->queue_state.cp_hqd_pq_doorbell_control &= ~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
+		mqd->cp_hqd_pq_doorbell_control &= ~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
 
 	/* set the pointer to the MQD */
-	mqd->queue_state.cp_mqd_base_addr = mqd_gpu_addr & 0xfffffffc;
-	mqd->queue_state.cp_mqd_base_addr_hi = upper_32_bits(mqd_gpu_addr);
+	mqd->cp_mqd_base_addr_lo = mqd_gpu_addr & 0xfffffffc;
+	mqd->cp_mqd_base_addr_hi = upper_32_bits(mqd_gpu_addr);
 
 	/* set MQD vmid to 0 */
-	mqd->queue_state.cp_mqd_control = RREG32(mmCP_MQD_CONTROL);
-	mqd->queue_state.cp_mqd_control &= ~CP_MQD_CONTROL__VMID_MASK;
+	mqd->cp_mqd_control = RREG32(mmCP_MQD_CONTROL);
+	mqd->cp_mqd_control &= ~CP_MQD_CONTROL__VMID_MASK;
 
 	/* set the pointer to the HQD, this is similar CP_RB0_BASE/_HI */
 	hqd_gpu_addr = ring->gpu_addr >> 8;
-	mqd->queue_state.cp_hqd_pq_base = hqd_gpu_addr;
-	mqd->queue_state.cp_hqd_pq_base_hi = upper_32_bits(hqd_gpu_addr);
+	mqd->cp_hqd_pq_base_lo = hqd_gpu_addr;
+	mqd->cp_hqd_pq_base_hi = upper_32_bits(hqd_gpu_addr);
 
 	/* set up the HQD, this is similar to CP_RB0_CNTL */
-	mqd->queue_state.cp_hqd_pq_control = RREG32(mmCP_HQD_PQ_CONTROL);
-	mqd->queue_state.cp_hqd_pq_control &=
+	mqd->cp_hqd_pq_control = RREG32(mmCP_HQD_PQ_CONTROL);
+	mqd->cp_hqd_pq_control &=
 		~(CP_HQD_PQ_CONTROL__QUEUE_SIZE_MASK |
 				CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE_MASK);
 
-	mqd->queue_state.cp_hqd_pq_control |=
+	mqd->cp_hqd_pq_control |=
 		order_base_2(ring->ring_size / 8);
-	mqd->queue_state.cp_hqd_pq_control |=
+	mqd->cp_hqd_pq_control |=
 		(order_base_2(AMDGPU_GPU_PAGE_SIZE/8) << 8);
 #ifdef __BIG_ENDIAN
-	mqd->queue_state.cp_hqd_pq_control |=
+	mqd->cp_hqd_pq_control |=
 		2 << CP_HQD_PQ_CONTROL__ENDIAN_SWAP__SHIFT;
 #endif
-	mqd->queue_state.cp_hqd_pq_control &=
+	mqd->cp_hqd_pq_control &=
 		~(CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK |
 				CP_HQD_PQ_CONTROL__ROQ_PQ_IB_FLIP_MASK |
 				CP_HQD_PQ_CONTROL__PQ_VOLATILE_MASK);
-	mqd->queue_state.cp_hqd_pq_control |=
+	mqd->cp_hqd_pq_control |=
 		CP_HQD_PQ_CONTROL__PRIV_STATE_MASK |
 		CP_HQD_PQ_CONTROL__KMD_QUEUE_MASK; /* assuming kernel queue control */
 
 	/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
 	wb_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
-	mqd->queue_state.cp_hqd_pq_wptr_poll_addr = wb_gpu_addr & 0xfffffffc;
-	mqd->queue_state.cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
+	mqd->cp_hqd_pq_wptr_poll_addr_lo = wb_gpu_addr & 0xfffffffc;
+	mqd->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits(wb_gpu_addr) & 0xffff;
 
 	/* set the wb address wether it's enabled or not */
 	wb_gpu_addr = adev->wb.gpu_addr + (ring->rptr_offs * 4);
-	mqd->queue_state.cp_hqd_pq_rptr_report_addr = wb_gpu_addr & 0xfffffffc;
-	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi =
+	mqd->cp_hqd_pq_rptr_report_addr_lo = wb_gpu_addr & 0xfffffffc;
+	mqd->cp_hqd_pq_rptr_report_addr_hi =
 		upper_32_bits(wb_gpu_addr) & 0xffff;
 
 	/* enable the doorbell if requested */
 	if (ring->use_doorbell) {
-		mqd->queue_state.cp_hqd_pq_doorbell_control =
+		mqd->cp_hqd_pq_doorbell_control =
 			RREG32(mmCP_HQD_PQ_DOORBELL_CONTROL);
-		mqd->queue_state.cp_hqd_pq_doorbell_control &=
+		mqd->cp_hqd_pq_doorbell_control &=
 			~CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET_MASK;
-		mqd->queue_state.cp_hqd_pq_doorbell_control |=
+		mqd->cp_hqd_pq_doorbell_control |=
 			(ring->doorbell_index <<
 			 CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT);
-		mqd->queue_state.cp_hqd_pq_doorbell_control |=
+		mqd->cp_hqd_pq_doorbell_control |=
 			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_EN_MASK;
-		mqd->queue_state.cp_hqd_pq_doorbell_control &=
+		mqd->cp_hqd_pq_doorbell_control &=
 			~(CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_SOURCE_MASK |
 					CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_HIT_MASK);
 
 	} else {
-		mqd->queue_state.cp_hqd_pq_doorbell_control = 0;
+		mqd->cp_hqd_pq_doorbell_control = 0;
 	}
 
 	/* read and write pointers, similar to CP_RB0_WPTR/_RPTR */
 	ring->wptr = 0;
-	mqd->queue_state.cp_hqd_pq_wptr = ring->wptr;
-	mqd->queue_state.cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
+	mqd->cp_hqd_pq_wptr = ring->wptr;
+	mqd->cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
 
 	/* set the vmid for the queue */
-	mqd->queue_state.cp_hqd_vmid = 0;
+	mqd->cp_hqd_vmid = 0;
 
 	/* activate the queue */
-	mqd->queue_state.cp_hqd_active = 1;
+	mqd->cp_hqd_active = 1;
 }
 
 static int gfx_v7_0_mqd_commit(struct amdgpu_device *adev,
-			       struct bonaire_mqd *mqd)
+			       struct cik_mqd *mqd)
 {
 	u32 tmp;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
 	/* program MQD field to HW */
-	WREG32(mmCP_MQD_BASE_ADDR, mqd->queue_state.cp_mqd_base_addr);
-	WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->queue_state.cp_mqd_base_addr_hi);
-	WREG32(mmCP_MQD_CONTROL, mqd->queue_state.cp_mqd_control);
-	WREG32(mmCP_HQD_PQ_BASE, mqd->queue_state.cp_hqd_pq_base);
-	WREG32(mmCP_HQD_PQ_BASE_HI, mqd->queue_state.cp_hqd_pq_base_hi);
-	WREG32(mmCP_HQD_PQ_CONTROL, mqd->queue_state.cp_hqd_pq_control);
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->queue_state.cp_hqd_pq_wptr_poll_addr);
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->queue_state.cp_hqd_pq_wptr_poll_addr_hi);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, mqd->queue_state.cp_hqd_pq_rptr_report_addr);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI, mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi);
-	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->queue_state.cp_hqd_pq_doorbell_control);
-	WREG32(mmCP_HQD_PQ_WPTR, mqd->queue_state.cp_hqd_pq_wptr);
-	WREG32(mmCP_HQD_VMID, mqd->queue_state.cp_hqd_vmid);
+	WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
+	WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->cp_mqd_base_addr_hi);
+	WREG32(mmCP_MQD_CONTROL, mqd->cp_mqd_control);
+	WREG32(mmCP_HQD_PQ_BASE, mqd->cp_hqd_pq_base_lo);
+	WREG32(mmCP_HQD_PQ_BASE_HI, mqd->cp_hqd_pq_base_hi);
+	WREG32(mmCP_HQD_PQ_CONTROL, mqd->cp_hqd_pq_control);
+	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
+	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
+	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, mqd->cp_hqd_pq_rptr_report_addr_lo);
+	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI, mqd->cp_hqd_pq_rptr_report_addr_hi);
+	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
+	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
+	WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
 
 	/* activate the HQD */
-	WREG32(mmCP_HQD_ACTIVE, mqd->queue_state.cp_hqd_active);
+	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
 
 	return 0;
 }
 
 static int gfx_v7_0_compute_queue_init(struct amdgpu_device *adev, int ring_id)
 {
 	int r;
 	u64 mqd_gpu_addr;
-	struct bonaire_mqd *mqd;
+	struct cik_mqd *mqd;
 	struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
 
 	if (ring->mqd_obj == NULL) {
 		r = amdgpu_bo_create(adev,
-				sizeof(struct bonaire_mqd),
+				sizeof(struct cik_mqd),
 				PAGE_SIZE, true,
 				AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				&ring->mqd_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create MQD bo failed\n", r);
 			return r;
 		}
 	}
 
 	r = amdgpu_bo_reserve(ring->mqd_obj, false);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 05/22] drm/amdgpu: unify MQD programming sequence for kfd and amdgpu
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 04/22] drm/amdgpu: remove duplicate definition of cik_mqd Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 06/22] drm/amdgpu: rename rdev to adev Andres Rodriguez
                     ` (18 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Use the same gfx_*_mqd_commit function for kfd and amdgpu codepaths.

This removes the last duplicates of this programming sequence.
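
On the kfd side the load path now reduces to updating the MQD and
calling the shared helper. Roughly (a sketch based on the gfx v8 hunk
below, error handling omitted; note that copy_from_user() returns the
number of bytes it could not copy, so a zero return means success):

    m = get_mqd(mqd);

    if (!copy_from_user(&shadow_wptr, wptr, sizeof(shadow_wptr)))
            m->cp_hqd_pq_wptr = shadow_wptr;

    acquire_queue(kgd, pipe_id, queue_id);
    gfx_v8_0_mqd_commit(adev, m);
    release_queue(kgd);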

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 51 ++---------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 49 ++--------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c             | 38 ++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.h             |  5 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c             | 44 ++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.h             |  5 +++
 6 files changed, 97 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 1a0a5f7..038b7ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -22,20 +22,21 @@
 
 #include <linux/fdtable.h>
 #include <linux/uaccess.h>
 #include <linux/firmware.h>
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "cikd.h"
 #include "cik_sdma.h"
 #include "amdgpu_ucode.h"
+#include "gfx_v7_0.h"
 #include "gca/gfx_7_2_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 #include "gmc/gmc_7_1_d.h"
 #include "gmc/gmc_7_1_sh_mask.h"
 #include "cik_structs.h"
 
 #define CIK_PIPE_PER_MEC	(4)
@@ -302,69 +303,25 @@ static inline struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd)
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 			uint32_t queue_id, uint32_t __user *wptr)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	uint32_t wptr_shadow, is_wptr_shadow_valid;
 	struct cik_mqd *m;
 
 	m = get_mqd(mqd);
 
 	is_wptr_shadow_valid = !get_user(wptr_shadow, wptr);
-
-	acquire_queue(kgd, pipe_id, queue_id);
-	WREG32(mmCP_MQD_BASE_ADDR, m->cp_mqd_base_addr_lo);
-	WREG32(mmCP_MQD_BASE_ADDR_HI, m->cp_mqd_base_addr_hi);
-	WREG32(mmCP_MQD_CONTROL, m->cp_mqd_control);
-
-	WREG32(mmCP_HQD_PQ_BASE, m->cp_hqd_pq_base_lo);
-	WREG32(mmCP_HQD_PQ_BASE_HI, m->cp_hqd_pq_base_hi);
-	WREG32(mmCP_HQD_PQ_CONTROL, m->cp_hqd_pq_control);
-
-	WREG32(mmCP_HQD_IB_CONTROL, m->cp_hqd_ib_control);
-	WREG32(mmCP_HQD_IB_BASE_ADDR, m->cp_hqd_ib_base_addr_lo);
-	WREG32(mmCP_HQD_IB_BASE_ADDR_HI, m->cp_hqd_ib_base_addr_hi);
-
-	WREG32(mmCP_HQD_IB_RPTR, m->cp_hqd_ib_rptr);
-
-	WREG32(mmCP_HQD_PERSISTENT_STATE, m->cp_hqd_persistent_state);
-	WREG32(mmCP_HQD_SEMA_CMD, m->cp_hqd_sema_cmd);
-	WREG32(mmCP_HQD_MSG_TYPE, m->cp_hqd_msg_type);
-
-	WREG32(mmCP_HQD_ATOMIC0_PREOP_LO, m->cp_hqd_atomic0_preop_lo);
-	WREG32(mmCP_HQD_ATOMIC0_PREOP_HI, m->cp_hqd_atomic0_preop_hi);
-	WREG32(mmCP_HQD_ATOMIC1_PREOP_LO, m->cp_hqd_atomic1_preop_lo);
-	WREG32(mmCP_HQD_ATOMIC1_PREOP_HI, m->cp_hqd_atomic1_preop_hi);
-
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, m->cp_hqd_pq_rptr_report_addr_lo);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
-			m->cp_hqd_pq_rptr_report_addr_hi);
-
-	WREG32(mmCP_HQD_PQ_RPTR, m->cp_hqd_pq_rptr);
-
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, m->cp_hqd_pq_wptr_poll_addr_lo);
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, m->cp_hqd_pq_wptr_poll_addr_hi);
-
-	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, m->cp_hqd_pq_doorbell_control);
-
-	WREG32(mmCP_HQD_VMID, m->cp_hqd_vmid);
-
-	WREG32(mmCP_HQD_QUANTUM, m->cp_hqd_quantum);
-
-	WREG32(mmCP_HQD_PIPE_PRIORITY, m->cp_hqd_pipe_priority);
-	WREG32(mmCP_HQD_QUEUE_PRIORITY, m->cp_hqd_queue_priority);
-
-	WREG32(mmCP_HQD_IQ_RPTR, m->cp_hqd_iq_rptr);
-
 	if (is_wptr_shadow_valid)
-		WREG32(mmCP_HQD_PQ_WPTR, wptr_shadow);
+		m->cp_hqd_pq_wptr = wptr_shadow;
 
-	WREG32(mmCP_HQD_ACTIVE, m->cp_hqd_active);
+	acquire_queue(kgd, pipe_id, queue_id);
+	gfx_v7_0_mqd_commit(adev, m);
 	release_queue(kgd);
 
 	return 0;
 }
 
 static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	struct cik_sdma_rlc_registers *m;
 	uint32_t sdma_base_addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 6697612..2ecef3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -21,20 +21,21 @@
  */
 
 #include <linux/module.h>
 #include <linux/fdtable.h>
 #include <linux/uaccess.h>
 #include <linux/firmware.h>
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_ucode.h"
+#include "gfx_v8_0.h"
 #include "gca/gfx_8_0_sh_mask.h"
 #include "gca/gfx_8_0_d.h"
 #include "gca/gfx_8_0_enum.h"
 #include "oss/oss_3_0_sh_mask.h"
 #include "oss/oss_3_0_d.h"
 #include "gmc/gmc_8_1_sh_mask.h"
 #include "gmc/gmc_8_1_d.h"
 #include "vi_structs.h"
 #include "vid.h"
 
@@ -244,67 +245,25 @@ static inline struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd)
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 			uint32_t queue_id, uint32_t __user *wptr)
 {
 	struct vi_mqd *m;
 	uint32_t shadow_wptr, valid_wptr;
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
 	m = get_mqd(mqd);
 
 	valid_wptr = copy_from_user(&shadow_wptr, wptr, sizeof(shadow_wptr));
-	acquire_queue(kgd, pipe_id, queue_id);
-
-	WREG32(mmCP_MQD_CONTROL, m->cp_mqd_control);
-	WREG32(mmCP_MQD_BASE_ADDR, m->cp_mqd_base_addr_lo);
-	WREG32(mmCP_MQD_BASE_ADDR_HI, m->cp_mqd_base_addr_hi);
-
-	WREG32(mmCP_HQD_VMID, m->cp_hqd_vmid);
-	WREG32(mmCP_HQD_PERSISTENT_STATE, m->cp_hqd_persistent_state);
-	WREG32(mmCP_HQD_PIPE_PRIORITY, m->cp_hqd_pipe_priority);
-	WREG32(mmCP_HQD_QUEUE_PRIORITY, m->cp_hqd_queue_priority);
-	WREG32(mmCP_HQD_QUANTUM, m->cp_hqd_quantum);
-	WREG32(mmCP_HQD_PQ_BASE, m->cp_hqd_pq_base_lo);
-	WREG32(mmCP_HQD_PQ_BASE_HI, m->cp_hqd_pq_base_hi);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, m->cp_hqd_pq_rptr_report_addr_lo);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
-			m->cp_hqd_pq_rptr_report_addr_hi);
-
 	if (valid_wptr > 0)
-		WREG32(mmCP_HQD_PQ_WPTR, shadow_wptr);
-
-	WREG32(mmCP_HQD_PQ_CONTROL, m->cp_hqd_pq_control);
-	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, m->cp_hqd_pq_doorbell_control);
-
-	WREG32(mmCP_HQD_EOP_BASE_ADDR, m->cp_hqd_eop_base_addr_lo);
-	WREG32(mmCP_HQD_EOP_BASE_ADDR_HI, m->cp_hqd_eop_base_addr_hi);
-	WREG32(mmCP_HQD_EOP_CONTROL, m->cp_hqd_eop_control);
-	WREG32(mmCP_HQD_EOP_RPTR, m->cp_hqd_eop_rptr);
-	WREG32(mmCP_HQD_EOP_WPTR, m->cp_hqd_eop_wptr);
-	WREG32(mmCP_HQD_EOP_EVENTS, m->cp_hqd_eop_done_events);
-
-	WREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_LO, m->cp_hqd_ctx_save_base_addr_lo);
-	WREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_HI, m->cp_hqd_ctx_save_base_addr_hi);
-	WREG32(mmCP_HQD_CTX_SAVE_CONTROL, m->cp_hqd_ctx_save_control);
-	WREG32(mmCP_HQD_CNTL_STACK_OFFSET, m->cp_hqd_cntl_stack_offset);
-	WREG32(mmCP_HQD_CNTL_STACK_SIZE, m->cp_hqd_cntl_stack_size);
-	WREG32(mmCP_HQD_WG_STATE_OFFSET, m->cp_hqd_wg_state_offset);
-	WREG32(mmCP_HQD_CTX_SAVE_SIZE, m->cp_hqd_ctx_save_size);
-
-	WREG32(mmCP_HQD_IB_CONTROL, m->cp_hqd_ib_control);
-
-	WREG32(mmCP_HQD_DEQUEUE_REQUEST, m->cp_hqd_dequeue_request);
-	WREG32(mmCP_HQD_ERROR, m->cp_hqd_error);
-	WREG32(mmCP_HQD_EOP_WPTR_MEM, m->cp_hqd_eop_wptr_mem);
-	WREG32(mmCP_HQD_EOP_DONES, m->cp_hqd_eop_dones);
-
-	WREG32(mmCP_HQD_ACTIVE, m->cp_hqd_active);
+		m->cp_hqd_pq_wptr = shadow_wptr;
 
+	acquire_queue(kgd, pipe_id, queue_id);
+	gfx_v8_0_mqd_commit(adev, m);
 	release_queue(kgd);
 
 	return 0;
 }
 
 static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd)
 {
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index c606e0b..7e5b426 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -3039,26 +3039,43 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
 	}
 
 	/* read and write pointers, similar to CP_RB0_WPTR/_RPTR */
 	ring->wptr = 0;
 	mqd->cp_hqd_pq_wptr = ring->wptr;
 	mqd->cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
 
 	/* set the vmid for the queue */
 	mqd->cp_hqd_vmid = 0;
 
+	/* defaults */
+	mqd->cp_hqd_ib_control = RREG32(mmCP_HQD_IB_CONTROL);
+	mqd->cp_hqd_ib_base_addr_lo = RREG32(mmCP_HQD_IB_BASE_ADDR);
+	mqd->cp_hqd_ib_base_addr_hi = RREG32(mmCP_HQD_IB_BASE_ADDR_HI);
+	mqd->cp_hqd_ib_rptr = RREG32(mmCP_HQD_IB_RPTR);
+	mqd->cp_hqd_persistent_state = RREG32(mmCP_HQD_PERSISTENT_STATE);
+	mqd->cp_hqd_sema_cmd = RREG32(mmCP_HQD_SEMA_CMD);
+	mqd->cp_hqd_msg_type = RREG32(mmCP_HQD_MSG_TYPE);
+	mqd->cp_hqd_atomic0_preop_lo = RREG32(mmCP_HQD_ATOMIC0_PREOP_LO);
+	mqd->cp_hqd_atomic0_preop_hi = RREG32(mmCP_HQD_ATOMIC0_PREOP_HI);
+	mqd->cp_hqd_atomic1_preop_lo = RREG32(mmCP_HQD_ATOMIC1_PREOP_LO);
+	mqd->cp_hqd_atomic1_preop_hi = RREG32(mmCP_HQD_ATOMIC1_PREOP_HI);
+	mqd->cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
+	mqd->cp_hqd_quantum = RREG32(mmCP_HQD_QUANTUM);
+	mqd->cp_hqd_pipe_priority = RREG32(mmCP_HQD_PIPE_PRIORITY);
+	mqd->cp_hqd_queue_priority = RREG32(mmCP_HQD_QUEUE_PRIORITY);
+	mqd->cp_hqd_iq_rptr = RREG32(mmCP_HQD_IQ_RPTR);
+
 	/* activate the queue */
 	mqd->cp_hqd_active = 1;
 }
 
-static int gfx_v7_0_mqd_commit(struct amdgpu_device *adev,
-			       struct cik_mqd *mqd)
+int gfx_v7_0_mqd_commit(struct amdgpu_device *adev, struct cik_mqd *mqd)
 {
 	u32 tmp;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
 	/* program MQD field to HW */
 	WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
@@ -3068,20 +3085,37 @@ static int gfx_v7_0_mqd_commit(struct amdgpu_device *adev,
 	WREG32(mmCP_HQD_PQ_BASE_HI, mqd->cp_hqd_pq_base_hi);
 	WREG32(mmCP_HQD_PQ_CONTROL, mqd->cp_hqd_pq_control);
 	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
 	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
 	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, mqd->cp_hqd_pq_rptr_report_addr_lo);
 	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI, mqd->cp_hqd_pq_rptr_report_addr_hi);
 	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
 	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
 	WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
 
+	WREG32(mmCP_HQD_IB_CONTROL, mqd->cp_hqd_ib_control);
+	WREG32(mmCP_HQD_IB_BASE_ADDR, mqd->cp_hqd_ib_base_addr_lo);
+	WREG32(mmCP_HQD_IB_BASE_ADDR_HI, mqd->cp_hqd_ib_base_addr_hi);
+	WREG32(mmCP_HQD_IB_RPTR, mqd->cp_hqd_ib_rptr);
+	WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
+	WREG32(mmCP_HQD_SEMA_CMD, mqd->cp_hqd_sema_cmd);
+	WREG32(mmCP_HQD_MSG_TYPE, mqd->cp_hqd_msg_type);
+	WREG32(mmCP_HQD_ATOMIC0_PREOP_LO, mqd->cp_hqd_atomic0_preop_lo);
+	WREG32(mmCP_HQD_ATOMIC0_PREOP_HI, mqd->cp_hqd_atomic0_preop_hi);
+	WREG32(mmCP_HQD_ATOMIC1_PREOP_LO, mqd->cp_hqd_atomic1_preop_lo);
+	WREG32(mmCP_HQD_ATOMIC1_PREOP_HI, mqd->cp_hqd_atomic1_preop_hi);
+	WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
+	WREG32(mmCP_HQD_QUANTUM, mqd->cp_hqd_quantum);
+	WREG32(mmCP_HQD_PIPE_PRIORITY, mqd->cp_hqd_pipe_priority);
+	WREG32(mmCP_HQD_QUEUE_PRIORITY, mqd->cp_hqd_queue_priority);
+	WREG32(mmCP_HQD_IQ_RPTR, mqd->cp_hqd_iq_rptr);
+
 	/* activate the HQD */
 	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
 
 	return 0;
 }
 
 static int gfx_v7_0_compute_queue_init(struct amdgpu_device *adev, int ring_id)
 {
 	int r;
 	u64 mqd_gpu_addr;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.h b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.h
index 2f5164c..6fb9c15 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.h
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.h
@@ -22,11 +22,16 @@
  */
 
 #ifndef __GFX_V7_0_H__
 #define __GFX_V7_0_H__
 
 extern const struct amdgpu_ip_block_version gfx_v7_0_ip_block;
 extern const struct amdgpu_ip_block_version gfx_v7_1_ip_block;
 extern const struct amdgpu_ip_block_version gfx_v7_2_ip_block;
 extern const struct amdgpu_ip_block_version gfx_v7_3_ip_block;
 
+struct amdgpu_device;
+struct cik_mqd;
+
+int gfx_v7_0_mqd_commit(struct amdgpu_device *adev, struct cik_mqd *mqd);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index af4b505..f862bc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4763,20 +4763,40 @@ static int gfx_v8_0_mqd_init(struct amdgpu_device *adev,
 	mqd->cp_hqd_pq_wptr = ring->wptr;
 	mqd->cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
 
 	/* set the vmid for the queue */
 	mqd->cp_hqd_vmid = 0;
 
 	tmp = RREG32(mmCP_HQD_PERSISTENT_STATE);
 	tmp = REG_SET_FIELD(tmp, CP_HQD_PERSISTENT_STATE, PRELOAD_SIZE, 0x53);
 	mqd->cp_hqd_persistent_state = tmp;
 
+	/* defaults */
+	mqd->cp_hqd_eop_rptr = RREG32(mmCP_HQD_EOP_RPTR);
+	mqd->cp_hqd_eop_wptr = RREG32(mmCP_HQD_EOP_WPTR);
+	mqd->cp_hqd_pipe_priority = RREG32(mmCP_HQD_PIPE_PRIORITY);
+	mqd->cp_hqd_queue_priority = RREG32(mmCP_HQD_QUEUE_PRIORITY);
+	mqd->cp_hqd_quantum = RREG32(mmCP_HQD_QUANTUM);
+	mqd->cp_hqd_ctx_save_base_addr_lo = RREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_LO);
+	mqd->cp_hqd_ctx_save_base_addr_hi = RREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_HI);
+	mqd->cp_hqd_ctx_save_control = RREG32(mmCP_HQD_CTX_SAVE_CONTROL);
+	mqd->cp_hqd_cntl_stack_offset = RREG32(mmCP_HQD_CNTL_STACK_OFFSET);
+	mqd->cp_hqd_cntl_stack_size = RREG32(mmCP_HQD_CNTL_STACK_SIZE);
+	mqd->cp_hqd_wg_state_offset = RREG32(mmCP_HQD_WG_STATE_OFFSET);
+	mqd->cp_hqd_ctx_save_size = RREG32(mmCP_HQD_CTX_SAVE_SIZE);
+	mqd->cp_hqd_ib_control = RREG32(mmCP_HQD_IB_CONTROL);
+	mqd->cp_hqd_eop_done_events = RREG32(mmCP_HQD_EOP_EVENTS);
+	mqd->cp_hqd_error = RREG32(mmCP_HQD_ERROR);
+	mqd->cp_hqd_eop_wptr_mem = RREG32(mmCP_HQD_EOP_WPTR_MEM);
+	mqd->cp_hqd_eop_dones = RREG32(mmCP_HQD_EOP_DONES);
+
 	/* activate the queue */
 	mqd->cp_hqd_active = 1;
 
 	return 0;
 }
 
 static int gfx_v8_0_mqd_deactivate(struct amdgpu_device *adev)
 {
 	int i;
 
@@ -4816,21 +4836,21 @@ static void gfx_v8_0_enable_doorbell(struct amdgpu_device *adev, bool enable)
 		WREG32(mmCP_MEC_DOORBELL_RANGE_UPPER, AMDGPU_DOORBELL_MEC_RING7 << 2);
 	}
 
 	tmp = RREG32(mmCP_PQ_STATUS);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
 	WREG32(mmCP_PQ_STATUS, tmp);
 
 	adev->gfx.doorbell_enabled = true;
 }
 
-static int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
+int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 {
 	uint32_t tmp;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
 	WREG32(mmCP_HQD_EOP_BASE_ADDR, mqd->cp_hqd_eop_base_addr_lo);
 	WREG32(mmCP_HQD_EOP_BASE_ADDR_HI, mqd->cp_hqd_eop_base_addr_hi);
@@ -4868,20 +4888,42 @@ static int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 
 	/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
 	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
 	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
 
 	/* enable the doorbell if requested */
 	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
 
 	/* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
 	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
+	WREG32(mmCP_HQD_EOP_RPTR, mqd->cp_hqd_eop_rptr);
+	WREG32(mmCP_HQD_EOP_WPTR, mqd->cp_hqd_eop_wptr);
+
+	/* set the HQD priority */
+	WREG32(mmCP_HQD_PIPE_PRIORITY, mqd->cp_hqd_pipe_priority);
+	WREG32(mmCP_HQD_QUEUE_PRIORITY, mqd->cp_hqd_queue_priority);
+	WREG32(mmCP_HQD_QUANTUM, mqd->cp_hqd_quantum);
+
+	/* set cwsr save area */
+	WREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_LO, mqd->cp_hqd_ctx_save_base_addr_lo);
+	WREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_HI, mqd->cp_hqd_ctx_save_base_addr_hi);
+	WREG32(mmCP_HQD_CTX_SAVE_CONTROL, mqd->cp_hqd_ctx_save_control);
+	WREG32(mmCP_HQD_CNTL_STACK_OFFSET, mqd->cp_hqd_cntl_stack_offset);
+	WREG32(mmCP_HQD_CNTL_STACK_SIZE, mqd->cp_hqd_cntl_stack_size);
+	WREG32(mmCP_HQD_WG_STATE_OFFSET, mqd->cp_hqd_wg_state_offset);
+	WREG32(mmCP_HQD_CTX_SAVE_SIZE, mqd->cp_hqd_ctx_save_size);
+
+	WREG32(mmCP_HQD_IB_CONTROL, mqd->cp_hqd_ib_control);
+	WREG32(mmCP_HQD_EOP_EVENTS, mqd->cp_hqd_eop_done_events);
+	WREG32(mmCP_HQD_ERROR, mqd->cp_hqd_error);
+	WREG32(mmCP_HQD_EOP_WPTR_MEM, mqd->cp_hqd_eop_wptr_mem);
+	WREG32(mmCP_HQD_EOP_DONES, mqd->cp_hqd_eop_dones);
 
 	/* set the vmid for the queue */
 	WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
 
 	WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
 
 	/* activate the queue */
 	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
 
 	return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.h b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.h
index 788cc3a..ec3f11f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.h
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.h
@@ -20,11 +20,16 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
 
 #ifndef __GFX_V8_0_H__
 #define __GFX_V8_0_H__
 
 extern const struct amdgpu_ip_block_version gfx_v8_0_ip_block;
 extern const struct amdgpu_ip_block_version gfx_v8_1_ip_block;
 
+struct amdgpu_device;
+struct vi_mqd;
+
+int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd);
+
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 06/22] drm/amdgpu: rename rdev to adev
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 05/22] drm/amdgpu: unify MQD programming sequence for kfd and amdgpu Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 07/22] drm/amdgpu: take ownership of per-pipe configuration Andres Rodriguez
                     ` (17 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Rename straggler instances of r(adeon)dev to a(mdgpu)dev.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 70 +++++++++++++++---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 14 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      |  2 +-
 4 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index dba8a5b..3200ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -53,23 +53,23 @@ int amdgpu_amdkfd_init(void)
 	if (ret)
 		kgd2kfd = NULL;
 
 #else
 	ret = -ENOENT;
 #endif
 
 	return ret;
 }
 
-bool amdgpu_amdkfd_load_interface(struct amdgpu_device *rdev)
+bool amdgpu_amdkfd_load_interface(struct amdgpu_device *adev)
 {
-	switch (rdev->asic_type) {
+	switch (adev->asic_type) {
 #ifdef CONFIG_DRM_AMDGPU_CIK
 	case CHIP_KAVERI:
 		kfd2kgd = amdgpu_amdkfd_gfx_7_get_functions();
 		break;
 #endif
 	case CHIP_CARRIZO:
 		kfd2kgd = amdgpu_amdkfd_gfx_8_0_get_functions();
 		break;
 	default:
 		return false;
@@ -79,119 +79,119 @@ bool amdgpu_amdkfd_load_interface(struct amdgpu_device *rdev)
 }
 
 void amdgpu_amdkfd_fini(void)
 {
 	if (kgd2kfd) {
 		kgd2kfd->exit();
 		symbol_put(kgd2kfd_init);
 	}
 }
 
-void amdgpu_amdkfd_device_probe(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 {
 	if (kgd2kfd)
-		rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev,
-					rdev->pdev, kfd2kgd);
+		adev->kfd = kgd2kfd->probe((struct kgd_dev *)adev,
+					adev->pdev, kfd2kgd);
 }
 
-void amdgpu_amdkfd_device_init(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
-	if (rdev->kfd) {
+	if (adev->kfd) {
 		struct kgd2kfd_shared_resources gpu_resources = {
 			.compute_vmid_bitmap = 0xFF00,
 
 			.first_compute_pipe = 1,
 			.compute_pipe_count = 4 - 1,
 		};
 
-		amdgpu_doorbell_get_kfd_info(rdev,
+		amdgpu_doorbell_get_kfd_info(adev,
 				&gpu_resources.doorbell_physical_address,
 				&gpu_resources.doorbell_aperture_size,
 				&gpu_resources.doorbell_start_offset);
 
-		kgd2kfd->device_init(rdev->kfd, &gpu_resources);
+		kgd2kfd->device_init(adev->kfd, &gpu_resources);
 	}
 }
 
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
 {
-	if (rdev->kfd) {
-		kgd2kfd->device_exit(rdev->kfd);
-		rdev->kfd = NULL;
+	if (adev->kfd) {
+		kgd2kfd->device_exit(adev->kfd);
+		adev->kfd = NULL;
 	}
 }
 
-void amdgpu_amdkfd_interrupt(struct amdgpu_device *rdev,
+void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
 		const void *ih_ring_entry)
 {
-	if (rdev->kfd)
-		kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
+	if (adev->kfd)
+		kgd2kfd->interrupt(adev->kfd, ih_ring_entry);
 }
 
-void amdgpu_amdkfd_suspend(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_suspend(struct amdgpu_device *adev)
 {
-	if (rdev->kfd)
-		kgd2kfd->suspend(rdev->kfd);
+	if (adev->kfd)
+		kgd2kfd->suspend(adev->kfd);
 }
 
-int amdgpu_amdkfd_resume(struct amdgpu_device *rdev)
+int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
 {
 	int r = 0;
 
-	if (rdev->kfd)
-		r = kgd2kfd->resume(rdev->kfd);
+	if (adev->kfd)
+		r = kgd2kfd->resume(adev->kfd);
 
 	return r;
 }
 
 int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **mem_obj, uint64_t *gpu_addr,
 			void **cpu_ptr)
 {
-	struct amdgpu_device *rdev = (struct amdgpu_device *)kgd;
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
 	struct kgd_mem **mem = (struct kgd_mem **) mem_obj;
 	int r;
 
 	BUG_ON(kgd == NULL);
 	BUG_ON(gpu_addr == NULL);
 	BUG_ON(cpu_ptr == NULL);
 
 	*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
 	if ((*mem) == NULL)
 		return -ENOMEM;
 
-	r = amdgpu_bo_create(rdev, size, PAGE_SIZE, true, AMDGPU_GEM_DOMAIN_GTT,
+	r = amdgpu_bo_create(adev, size, PAGE_SIZE, true, AMDGPU_GEM_DOMAIN_GTT,
 			     AMDGPU_GEM_CREATE_CPU_GTT_USWC, NULL, NULL, &(*mem)->bo);
 	if (r) {
-		dev_err(rdev->dev,
+		dev_err(adev->dev,
 			"failed to allocate BO for amdkfd (%d)\n", r);
 		return r;
 	}
 
 	/* map the buffer */
 	r = amdgpu_bo_reserve((*mem)->bo, true);
 	if (r) {
-		dev_err(rdev->dev, "(%d) failed to reserve bo for amdkfd\n", r);
+		dev_err(adev->dev, "(%d) failed to reserve bo for amdkfd\n", r);
 		goto allocate_mem_reserve_bo_failed;
 	}
 
 	r = amdgpu_bo_pin((*mem)->bo, AMDGPU_GEM_DOMAIN_GTT,
 				&(*mem)->gpu_addr);
 	if (r) {
-		dev_err(rdev->dev, "(%d) failed to pin bo for amdkfd\n", r);
+		dev_err(adev->dev, "(%d) failed to pin bo for amdkfd\n", r);
 		goto allocate_mem_pin_bo_failed;
 	}
 	*gpu_addr = (*mem)->gpu_addr;
 
 	r = amdgpu_bo_kmap((*mem)->bo, &(*mem)->cpu_ptr);
 	if (r) {
-		dev_err(rdev->dev,
+		dev_err(adev->dev,
 			"(%d) failed to map bo to kernel for amdkfd\n", r);
 		goto allocate_mem_kmap_bo_failed;
 	}
 	*cpu_ptr = (*mem)->cpu_ptr;
 
 	amdgpu_bo_unreserve((*mem)->bo);
 
 	return 0;
 
 allocate_mem_kmap_bo_failed:
@@ -213,34 +213,34 @@ void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj)
 	amdgpu_bo_reserve(mem->bo, true);
 	amdgpu_bo_kunmap(mem->bo);
 	amdgpu_bo_unpin(mem->bo);
 	amdgpu_bo_unreserve(mem->bo);
 	amdgpu_bo_unref(&(mem->bo));
 	kfree(mem);
 }
 
 uint64_t get_vmem_size(struct kgd_dev *kgd)
 {
-	struct amdgpu_device *rdev =
+	struct amdgpu_device *adev =
 		(struct amdgpu_device *)kgd;
 
 	BUG_ON(kgd == NULL);
 
-	return rdev->mc.real_vram_size;
+	return adev->mc.real_vram_size;
 }
 
 uint64_t get_gpu_clock_counter(struct kgd_dev *kgd)
 {
-	struct amdgpu_device *rdev = (struct amdgpu_device *)kgd;
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
 
-	if (rdev->gfx.funcs->get_gpu_clock_counter)
-		return rdev->gfx.funcs->get_gpu_clock_counter(rdev);
+	if (adev->gfx.funcs->get_gpu_clock_counter)
+		return adev->gfx.funcs->get_gpu_clock_counter(adev);
 	return 0;
 }
 
 uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd)
 {
-	struct amdgpu_device *rdev = (struct amdgpu_device *)kgd;
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
 
 	/* The sclk is in quantas of 10kHz */
-	return rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac.sclk / 100;
+	return adev->pm.dpm.dyn_state.max_clock_voltage_on_ac.sclk / 100;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index de530f68d..73f83a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -32,29 +32,29 @@ struct amdgpu_device;
 
 struct kgd_mem {
 	struct amdgpu_bo *bo;
 	uint64_t gpu_addr;
 	void *cpu_ptr;
 };
 
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
-bool amdgpu_amdkfd_load_interface(struct amdgpu_device *rdev);
+bool amdgpu_amdkfd_load_interface(struct amdgpu_device *adev);
 
-void amdgpu_amdkfd_suspend(struct amdgpu_device *rdev);
-int amdgpu_amdkfd_resume(struct amdgpu_device *rdev);
-void amdgpu_amdkfd_interrupt(struct amdgpu_device *rdev,
+void amdgpu_amdkfd_suspend(struct amdgpu_device *adev);
+int amdgpu_amdkfd_resume(struct amdgpu_device *adev);
+void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
 			const void *ih_ring_entry);
-void amdgpu_amdkfd_device_probe(struct amdgpu_device *rdev);
-void amdgpu_amdkfd_device_init(struct amdgpu_device *rdev);
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *rdev);
+void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
+void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
+void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
 
 /* Shared API */
 int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **mem_obj, uint64_t *gpu_addr,
 			void **cpu_ptr);
 void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj);
 uint64_t get_vmem_size(struct kgd_dev *kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 7e5b426..03a4cee 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -1835,21 +1835,21 @@ static void gfx_v7_0_setup_rb(struct amdgpu_device *adev)
 		gfx_v7_0_write_harvested_raster_configs(adev, raster_config, raster_config_1,
 							adev->gfx.config.backend_enable_mask,
 							num_rb_pipes);
 	}
 	mutex_unlock(&adev->grbm_idx_mutex);
 }
 
 /**
  * gmc_v7_0_init_compute_vmid - gart enable
  *
- * @rdev: amdgpu_device pointer
+ * @adev: amdgpu_device pointer
  *
  * Initialize compute vmid sh_mem registers
  *
  */
 #define DEFAULT_SH_MEM_BASES	(0x6000)
 #define FIRST_COMPUTE_VMID	(8)
 #define LAST_COMPUTE_VMID	(16)
 static void gmc_v7_0_init_compute_vmid(struct amdgpu_device *adev)
 {
 	int i;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index f862bc0..3cb39d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -3792,21 +3792,21 @@ static void gfx_v8_0_setup_rb(struct amdgpu_device *adev)
 				RREG32(mmPA_SC_RASTER_CONFIG_1);
 		}
 	}
 	gfx_v8_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);
 	mutex_unlock(&adev->grbm_idx_mutex);
 }
 
 /**
  * gfx_v8_0_init_compute_vmid - gart enable
  *
- * @rdev: amdgpu_device pointer
+ * @adev: amdgpu_device pointer
  *
  * Initialize compute vmid sh_mem registers
  *
  */
 #define DEFAULT_SH_MEM_BASES	(0x6000)
 #define FIRST_COMPUTE_VMID	(8)
 #define LAST_COMPUTE_VMID	(16)
 static void gfx_v8_0_init_compute_vmid(struct amdgpu_device *adev)
 {
 	int i;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 07/22] drm/amdgpu: take ownership of per-pipe configuration
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 06/22] drm/amdgpu: rename rdev to adev Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 08/22] drm/radeon: take ownership of pipe initialization Andres Rodriguez
                     ` (16 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Make amdgpu the owner of all per-pipe state of the HQDs.

This change will allow us to split the queues between kfd and amdgpu
at queue granularity instead of pipe granularity.

This patch also fixes kfd allocating an HPD EOP region for its 3 pipes
that went unused.
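
With amdgpu owning the per-pipe state, the HPD EOP backing store is
sized from the real topology instead of a hardcoded single pipe. For
gfx v7 the allocation math becomes (taken from the hunk below; on
Kaveri num_mec = 2 and num_pipe_per_mec = 4, so with
GFX7_MEC_HPD_SIZE = 2048 this is 32 KiB total):

    mec_hpd_size = adev->gfx.mec.num_mec *
                   adev->gfx.mec.num_pipe_per_mec *
                   GFX7_MEC_HPD_SIZE * 2;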

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h                |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  | 13 +------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c              | 28 ++++++++++----
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c              | 33 +++++++++++-----
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 45 ----------------------
 6 files changed, 49 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d699c3b..97c3f6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -769,23 +769,23 @@ struct amdgpu_rlc {
 	u32 reg_list_format_size_bytes;
 	u32 reg_list_size_bytes;
 
 	u32 *register_list_format;
 	u32 *register_restore;
 };
 
 struct amdgpu_mec {
 	struct amdgpu_bo	*hpd_eop_obj;
 	u64			hpd_eop_gpu_addr;
-	u32 num_pipe;
 	u32 num_mec;
-	u32 num_queue;
+	u32 num_pipe_per_mec;
+	u32 num_queue_per_pipe;
 };
 
 struct amdgpu_kiq {
 	u64			eop_gpu_addr;
 	struct amdgpu_bo	*eop_obj;
 	struct amdgpu_ring	ring;
 	struct amdgpu_irq_src	irq;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 038b7ea..910f9d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -237,32 +237,21 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 
 	/* Mapping vmid to pasid also for IH block */
 	WREG32(mmIH_VMID_0_LUT + vmid, pasid_mapping);
 
 	return 0;
 }
 
 static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
 				uint32_t hpd_size, uint64_t hpd_gpu_addr)
 {
-	struct amdgpu_device *adev = get_amdgpu_device(kgd);
-
-	uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
-	uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
-
-	lock_srbm(kgd, mec, pipe, 0, 0);
-	WREG32(mmCP_HPD_EOP_BASE_ADDR, lower_32_bits(hpd_gpu_addr >> 8));
-	WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(hpd_gpu_addr >> 8));
-	WREG32(mmCP_HPD_EOP_VMID, 0);
-	WREG32(mmCP_HPD_EOP_CONTROL, hpd_size);
-	unlock_srbm(kgd);
-
+	/* amdgpu owns the per-pipe state */
 	return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	uint32_t mec;
 	uint32_t pipe;
 
 	mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 2ecef3d..5843368 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -199,20 +199,21 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 
 	/* Mapping vmid to pasid also for IH block */
 	WREG32(mmIH_VMID_0_LUT + vmid, pasid_mapping);
 
 	return 0;
 }
 
 static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
 				uint32_t hpd_size, uint64_t hpd_gpu_addr)
 {
+	/* amdgpu owns the per-pipe state */
 	return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	uint32_t mec;
 	uint32_t pipe;
 
 	mec = (++pipe_id / VI_PIPE_PER_MEC) + 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 03a4cee..2f1faa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2799,34 +2799,48 @@ static void gfx_v7_0_mec_fini(struct amdgpu_device *adev)
 
 		amdgpu_bo_unref(&adev->gfx.mec.hpd_eop_obj);
 		adev->gfx.mec.hpd_eop_obj = NULL;
 	}
 }
 
 static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
+	size_t mec_hpd_size;
 
 	/*
 	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
 	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
 	 * Nonetheless, we assign only 1 pipe because all other pipes will
 	 * be handled by KFD
 	 */
-	adev->gfx.mec.num_mec = 1;
-	adev->gfx.mec.num_pipe = 1;
-	adev->gfx.mec.num_queue = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * 8;
+	switch (adev->asic_type) {
+	case CHIP_KAVERI:
+		adev->gfx.mec.num_mec = 2;
+		break;
+	case CHIP_BONAIRE:
+	case CHIP_HAWAII:
+	case CHIP_KABINI:
+	case CHIP_MULLINS:
+	default:
+		adev->gfx.mec.num_mec = 1;
+		break;
+	}
+	adev->gfx.mec.num_pipe_per_mec = 4;
+	adev->gfx.mec.num_queue_per_pipe = 8;
 
+	mec_hpd_size = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe_per_mec
+		* GFX7_MEC_HPD_SIZE * 2;
 	if (adev->gfx.mec.hpd_eop_obj == NULL) {
 		r = amdgpu_bo_create(adev,
-				     adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * GFX7_MEC_HPD_SIZE * 2,
+				     mec_hpd_size,
 				     PAGE_SIZE, true,
 				     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &adev->gfx.mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
 			return r;
 		}
 	}
 
 	r = amdgpu_bo_reserve(adev->gfx.mec.hpd_eop_obj, false);
@@ -2842,21 +2856,21 @@ static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 		return r;
 	}
 	r = amdgpu_bo_kmap(adev->gfx.mec.hpd_eop_obj, (void **)&hpd);
 	if (r) {
 		dev_warn(adev->dev, "(%d) map HDP EOP bo failed\n", r);
 		gfx_v7_0_mec_fini(adev);
 		return r;
 	}
 
 	/* clear memory.  Not sure if this is required or not */
-	memset(hpd, 0, adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * GFX7_MEC_HPD_SIZE * 2);
+	memset(hpd, 0, mec_hpd_size);
 
 	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
 	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
 	return 0;
 }
 
 struct hqd_registers
 {
 	u32 cp_mqd_base_addr;
@@ -3180,23 +3194,23 @@ static int gfx_v7_0_cp_compute_resume(struct amdgpu_device *adev)
 {
 	int r, i, j;
 	u32 tmp;
 	struct amdgpu_ring *ring;
 
 	/* fix up chicken bits */
 	tmp = RREG32(mmCP_CPF_DEBUG);
 	tmp |= (1 << 23);
 	WREG32(mmCP_CPF_DEBUG, tmp);
 
-	/* init the pipes */
+	/* init all pipes (even the ones we don't own) */
 	for (i = 0; i < adev->gfx.mec.num_mec; i++)
-		for (j = 0; j < adev->gfx.mec.num_pipe; j++)
+		for (j = 0; j < adev->gfx.mec.num_pipe_per_mec; j++)
 			gfx_v7_0_compute_pipe_init(adev, i, j);
 
 	/* init the queues */
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		r = gfx_v7_0_compute_queue_init(adev, i);
 		if (r) {
 			gfx_v7_0_cp_compute_fini(adev);
 			return r;
 		}
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 3cb39d4..1bd4759 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1417,32 +1417,47 @@ static void gfx_v8_0_kiq_free_ring(struct amdgpu_ring *ring,
 	amdgpu_ring_fini(ring);
 	irq->data = NULL;
 }
 
 #define GFX8_MEC_HPD_SIZE 2048
 
 static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
+	size_t mec_hpd_size;
 
-	/*
-	 * we assign only 1 pipe because all other pipes will
-	 * be handled by KFD
-	 */
-	adev->gfx.mec.num_mec = 1;
-	adev->gfx.mec.num_pipe = 1;
-	adev->gfx.mec.num_queue = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * 8;
+	switch (adev->asic_type) {
+	case CHIP_FIJI:
+	case CHIP_TONGA:
+	case CHIP_POLARIS11:
+	case CHIP_POLARIS12:
+	case CHIP_POLARIS10:
+	case CHIP_CARRIZO:
+		adev->gfx.mec.num_mec = 2;
+		break;
+	case CHIP_TOPAZ:
+	case CHIP_STONEY:
+	default:
+		adev->gfx.mec.num_mec = 1;
+		break;
+	}
+
+	adev->gfx.mec.num_pipe_per_mec = 4;
+	adev->gfx.mec.num_queue_per_pipe = 8;
+
+	/* only 1 pipe of the first MEC is owned by amdgpu */
+	mec_hpd_size = 1 * 1 * adev->gfx.mec.num_queue_per_pipe * GFX8_MEC_HPD_SIZE;
 
 	if (adev->gfx.mec.hpd_eop_obj == NULL) {
 		r = amdgpu_bo_create(adev,
-				     adev->gfx.mec.num_queue * GFX8_MEC_HPD_SIZE,
+				     mec_hpd_size,
 				     PAGE_SIZE, true,
 				     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &adev->gfx.mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
 			return r;
 		}
 	}
 
 	r = amdgpu_bo_reserve(adev->gfx.mec.hpd_eop_obj, false);
@@ -1457,21 +1472,21 @@ static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 		gfx_v8_0_mec_fini(adev);
 		return r;
 	}
 	r = amdgpu_bo_kmap(adev->gfx.mec.hpd_eop_obj, (void **)&hpd);
 	if (r) {
 		dev_warn(adev->dev, "(%d) map HDP EOP bo failed\n", r);
 		gfx_v8_0_mec_fini(adev);
 		return r;
 	}
 
-	memset(hpd, 0, adev->gfx.mec.num_queue * GFX8_MEC_HPD_SIZE);
+	memset(hpd, 0, mec_hpd_size);
 
 	amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
 	amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
 	return 0;
 }
 
 static void gfx_v8_0_kiq_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f49c551..c064dea 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -465,69 +465,24 @@ set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid,
 		ATC_VMID_PASID_MAPPING_VALID;
 
 	return dqm->dev->kfd2kgd->set_pasid_vmid_mapping(
 						dqm->dev->kgd, pasid_mapping,
 						vmid);
 }
 
 int init_pipelines(struct device_queue_manager *dqm,
 			unsigned int pipes_num, unsigned int first_pipe)
 {
-	void *hpdptr;
-	struct mqd_manager *mqd;
-	unsigned int i, err, inx;
-	uint64_t pipe_hpd_addr;
-
 	BUG_ON(!dqm || !dqm->dev);
 
 	pr_debug("kfd: In func %s\n", __func__);
 
-	/*
-	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
-	 * The driver never accesses this memory after zeroing it.
-	 * It doesn't even have to be saved/restored on suspend/resume
-	 * because it contains no data when there are no active queues.
-	 */
-
-	err = kfd_gtt_sa_allocate(dqm->dev, CIK_HPD_EOP_BYTES * pipes_num,
-					&dqm->pipeline_mem);
-
-	if (err) {
-		pr_err("kfd: error allocate vidmem num pipes: %d\n",
-			pipes_num);
-		return -ENOMEM;
-	}
-
-	hpdptr = dqm->pipeline_mem->cpu_ptr;
-	dqm->pipelines_addr = dqm->pipeline_mem->gpu_addr;
-
-	memset(hpdptr, 0, CIK_HPD_EOP_BYTES * pipes_num);
-
-	mqd = dqm->ops.get_mqd_manager(dqm, KFD_MQD_TYPE_COMPUTE);
-	if (mqd == NULL) {
-		kfd_gtt_sa_free(dqm->dev, dqm->pipeline_mem);
-		return -ENOMEM;
-	}
-
-	for (i = 0; i < pipes_num; i++) {
-		inx = i + first_pipe;
-		/*
-		 * HPD buffer on GTT is allocated by amdkfd, no need to waste
-		 * space in GTT for pipelines we don't initialize
-		 */
-		pipe_hpd_addr = dqm->pipelines_addr + i * CIK_HPD_EOP_BYTES;
-		pr_debug("kfd: pipeline address %llX\n", pipe_hpd_addr);
-		/* = log2(bytes/4)-1 */
-		dqm->dev->kfd2kgd->init_pipeline(dqm->dev->kgd, inx,
-				CIK_HPD_EOP_BYTES_LOG2 - 3, pipe_hpd_addr);
-	}
-
 	return 0;
 }
 
 static void init_interrupts(struct device_queue_manager *dqm)
 {
 	unsigned int i;
 
 	BUG_ON(dqm == NULL);
 
 	for (i = 0 ; i < get_pipes_num(dqm) ; i++)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 08/22] drm/radeon: take ownership of pipe initialization
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 07/22] drm/amdgpu: take ownership of per-pipe configuration Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 09/22] drm/amdgpu: allow split of queues with kfd at queue granularity Andres Rodriguez
                     ` (15 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Take ownership of pipe initialization away from KFD.

Note that hpd_eop_gpu_addr was already large enough to accommodate all
pipes.
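
Each pipe now programs its own slice of that existing allocation, with
a stride of MEC_HPD_SIZE * 2 bytes per pipe. A sketch of the addressing
(condensed from the hunk below; the register writes are unchanged):

    for (i = 0; i < rdev->mec.num_pipe; ++i) {
            cik_srbm_select(rdev, 0, i, 0, 0);
            eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr +
                           (u64)i * MEC_HPD_SIZE * 2;
            /* program CP_HPD_EOP_BASE_ADDR/_HI, VMID and CONTROL */
    }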

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/radeon/cik.c        | 27 ++++++++++++++-------------
 drivers/gpu/drm/radeon/radeon_kfd.c | 13 +------------
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index f6ff41a..82b57ef 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -4588,37 +4588,38 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
 		return r;
 
 	/* fix up chicken bits */
 	tmp = RREG32(CP_CPF_DEBUG);
 	tmp |= (1 << 23);
 	WREG32(CP_CPF_DEBUG, tmp);
 
 	/* init the pipes */
 	mutex_lock(&rdev->srbm_mutex);
 
-	eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
+	for (i = 0; i < rdev->mec.num_pipe; ++i) {
+		cik_srbm_select(rdev, 0, i, 0, 0);
 
-	cik_srbm_select(rdev, 0, 0, 0, 0);
-
-	/* write the EOP addr */
-	WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
-	WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
+		eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
+		/* write the EOP addr */
+		WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+		WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
 
-	/* set the VMID assigned */
-	WREG32(CP_HPD_EOP_VMID, 0);
+		/* set the VMID assigned */
+		WREG32(CP_HPD_EOP_VMID, 0);
 
-	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-	tmp = RREG32(CP_HPD_EOP_CONTROL);
-	tmp &= ~EOP_SIZE_MASK;
-	tmp |= order_base_2(MEC_HPD_SIZE / 8);
-	WREG32(CP_HPD_EOP_CONTROL, tmp);
+		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+		tmp = RREG32(CP_HPD_EOP_CONTROL);
+		tmp &= ~EOP_SIZE_MASK;
+		tmp |= order_base_2(MEC_HPD_SIZE / 8);
+		WREG32(CP_HPD_EOP_CONTROL, tmp);
 
+	}
 	mutex_unlock(&rdev->srbm_mutex);
 
 	/* init the queues.  Just two for now. */
 	for (i = 0; i < 2; i++) {
 		if (i == 0)
 			idx = CAYMAN_RING_TYPE_CP1_INDEX;
 		else
 			idx = CAYMAN_RING_TYPE_CP2_INDEX;
 
 		if (rdev->ring[idx].mqd_obj == NULL) {
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 87a9ebb..a06e3b1 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -416,32 +416,21 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 	/* Mapping vmid to pasid also for IH block */
 	write_register(kgd, IH_VMID_0_LUT + vmid * sizeof(uint32_t),
 			pasid_mapping);
 
 	return 0;
 }
 
 static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
 				uint32_t hpd_size, uint64_t hpd_gpu_addr)
 {
-	uint32_t mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
-	uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
-
-	lock_srbm(kgd, mec, pipe, 0, 0);
-	write_register(kgd, CP_HPD_EOP_BASE_ADDR,
-			lower_32_bits(hpd_gpu_addr >> 8));
-	write_register(kgd, CP_HPD_EOP_BASE_ADDR_HI,
-			upper_32_bits(hpd_gpu_addr >> 8));
-	write_register(kgd, CP_HPD_EOP_VMID, 0);
-	write_register(kgd, CP_HPD_EOP_CONTROL, hpd_size);
-	unlock_srbm(kgd);
-
+	/* nothing to do here */
 	return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	uint32_t mec;
 	uint32_t pipe;
 
 	mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
 	pipe = (pipe_id % CIK_PIPE_PER_MEC);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 09/22] drm/amdgpu: allow split of queues with kfd at queue granularity
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 08/22] drm/radeon: take ownership of pipe initialization Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 10/22] drm/amdgpu: teach amdgpu how to enable interrupts for any pipe Andres Rodriguez
                     ` (14 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Previously the queue/pipe split with kfd operated at pipe granularity.
This patch allows amdgpu to take ownership of an arbitrary set of
queues.

It also consolidates the last few magic numbers in the compute
initialization process into mec_init.
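
An editor's sketch (not part of the patch; the helper name is
hypothetical) of how a flat bit index in queue_bitmap decomposes into
the (mec, pipe, queue) triple used by the new ownership policy,
mirroring the arithmetic in gfx_v*_0_compute_queue_acquire():

static void decompose_queue_bit(unsigned int bit,
				unsigned int num_queue_per_pipe,
				unsigned int num_pipe_per_mec,
				unsigned int *mec, unsigned int *pipe,
				unsigned int *queue)
{
	*queue = bit % num_queue_per_pipe;
	*pipe = (bit / num_queue_per_pipe) % num_pipe_per_mec;
	*mec = (bit / num_queue_per_pipe) / num_pipe_per_mec;
}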

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h             |  7 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c           | 83 ++++++++++++++++++-------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c           | 79 ++++++++++++++++++-----
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  1 +
 4 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 97c3f6c..6f7e4f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -39,20 +39,22 @@
 #include <ttm/ttm_bo_api.h>
 #include <ttm/ttm_bo_driver.h>
 #include <ttm/ttm_placement.h>
 #include <ttm/ttm_module.h>
 #include <ttm/ttm_execbuf_util.h>
 
 #include <drm/drmP.h>
 #include <drm/drm_gem.h>
 #include <drm/amdgpu_drm.h>
 
+#include <kgd_kfd_interface.h>
+
 #include "amd_shared.h"
 #include "amdgpu_mode.h"
 #include "amdgpu_ih.h"
 #include "amdgpu_irq.h"
 #include "amdgpu_ucode.h"
 #include "amdgpu_ttm.h"
 #include "amdgpu_gds.h"
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
 #include "amdgpu_vm.h"
@@ -766,26 +768,31 @@ struct amdgpu_rlc {
 	u32 reg_list_format_start;
 	u32 reg_list_format_separate_start;
 	u32 starting_offsets_start;
 	u32 reg_list_format_size_bytes;
 	u32 reg_list_size_bytes;
 
 	u32 *register_list_format;
 	u32 *register_restore;
 };
 
+#define AMDGPU_MAX_QUEUES KGD_MAX_QUEUES
+
 struct amdgpu_mec {
 	struct amdgpu_bo	*hpd_eop_obj;
 	u64			hpd_eop_gpu_addr;
 	u32 num_mec;
 	u32 num_pipe_per_mec;
 	u32 num_queue_per_pipe;
+
+	/* These are the resources for which amdgpu takes ownership */
+	DECLARE_BITMAP(queue_bitmap, AMDGPU_MAX_QUEUES);
 };
 
 struct amdgpu_kiq {
 	u64			eop_gpu_addr;
 	struct amdgpu_bo	*eop_obj;
 	struct amdgpu_ring	ring;
 	struct amdgpu_irq_src	irq;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 2f1faa4..fe46765 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -42,21 +42,20 @@
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 
 #include "gmc/gmc_7_0_d.h"
 #include "gmc/gmc_7_0_sh_mask.h"
 
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 
 #define GFX7_NUM_GFX_RINGS     1
-#define GFX7_NUM_COMPUTE_RINGS 8
 #define GFX7_MEC_HPD_SIZE      2048
 
 
 static void gfx_v7_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_gds_init(struct amdgpu_device *adev);
 
 MODULE_FIRMWARE("radeon/bonaire_pfp.bin");
 MODULE_FIRMWARE("radeon/bonaire_me.bin");
 MODULE_FIRMWARE("radeon/bonaire_ce.bin");
@@ -2795,47 +2794,79 @@ static void gfx_v7_0_mec_fini(struct amdgpu_device *adev)
 		if (unlikely(r != 0))
 			dev_warn(adev->dev, "(%d) reserve HPD EOP bo failed\n", r);
 		amdgpu_bo_unpin(adev->gfx.mec.hpd_eop_obj);
 		amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
 		amdgpu_bo_unref(&adev->gfx.mec.hpd_eop_obj);
 		adev->gfx.mec.hpd_eop_obj = NULL;
 	}
 }
 
+static void gfx_v7_0_compute_queue_acquire(struct amdgpu_device *adev)
+{
+	int i, queue, pipe, mec;
+
+	/* policy for amdgpu compute queue ownership */
+	for (i = 0; i < AMDGPU_MAX_QUEUES; ++i) {
+		queue = i % adev->gfx.mec.num_queue_per_pipe;
+		pipe = (i / adev->gfx.mec.num_queue_per_pipe)
+			% adev->gfx.mec.num_pipe_per_mec;
+		mec = (i / adev->gfx.mec.num_queue_per_pipe)
+			/ adev->gfx.mec.num_pipe_per_mec;
+
+		/* we've run out of HW */
+		if (mec >= adev->gfx.mec.num_mec)
+			break;
+
+		/* policy: amdgpu owns all queues in the first pipe */
+		if (mec == 0 && pipe == 0)
+			set_bit(i, adev->gfx.mec.queue_bitmap);
+	}
+
+	/* update the number of active compute rings */
+	adev->gfx.num_compute_rings =
+		bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_QUEUES);
+
+	/* If you hit this case and edited the policy, you probably just
+	 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
+	if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
+		adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
+}
+
 static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
 	size_t mec_hpd_size;
 
-	/*
-	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
-	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
-	 * Nonetheless, we assign only 1 pipe because all other pipes will
-	 * be handled by KFD
-	 */
+	bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_QUEUES);
+
 	switch (adev->asic_type) {
 	case CHIP_KAVERI:
 		adev->gfx.mec.num_mec = 2;
 		break;
 	case CHIP_BONAIRE:
 	case CHIP_HAWAII:
 	case CHIP_KABINI:
 	case CHIP_MULLINS:
 	default:
 		adev->gfx.mec.num_mec = 1;
 		break;
 	}
 	adev->gfx.mec.num_pipe_per_mec = 4;
 	adev->gfx.mec.num_queue_per_pipe = 8;
 
+	/* take ownership of the relevant compute queues */
+	gfx_v7_0_compute_queue_acquire(adev);
+
+	/* allocate space for ALL pipes (even the ones we don't own) */
 	mec_hpd_size = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe_per_mec
 		* GFX7_MEC_HPD_SIZE * 2;
 	if (adev->gfx.mec.hpd_eop_obj == NULL) {
 		r = amdgpu_bo_create(adev,
 				     mec_hpd_size,
 				     PAGE_SIZE, true,
 				     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &adev->gfx.mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
@@ -4497,21 +4528,21 @@ static const struct amdgpu_gfx_funcs gfx_v7_0_gfx_funcs = {
 static const struct amdgpu_rlc_funcs gfx_v7_0_rlc_funcs = {
 	.enter_safe_mode = gfx_v7_0_enter_rlc_safe_mode,
 	.exit_safe_mode = gfx_v7_0_exit_rlc_safe_mode
 };
 
 static int gfx_v7_0_early_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	adev->gfx.num_gfx_rings = GFX7_NUM_GFX_RINGS;
-	adev->gfx.num_compute_rings = GFX7_NUM_COMPUTE_RINGS;
+	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
 	adev->gfx.funcs = &gfx_v7_0_gfx_funcs;
 	adev->gfx.rlc.funcs = &gfx_v7_0_rlc_funcs;
 	gfx_v7_0_set_ring_funcs(adev);
 	gfx_v7_0_set_irq_funcs(adev);
 	gfx_v7_0_set_gds_init(adev);
 
 	return 0;
 }
 
 static int gfx_v7_0_late_init(void *handle)
@@ -4693,21 +4724,21 @@ static void gfx_v7_0_gpu_early_init(struct amdgpu_device *adev)
 		gb_addr_config |= (2 << GB_ADDR_CONFIG__ROW_SIZE__SHIFT);
 		break;
 	}
 	adev->gfx.config.gb_addr_config = gb_addr_config;
 }
 
 static int gfx_v7_0_sw_init(void *handle)
 {
 	struct amdgpu_ring *ring;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-	int i, r;
+	int i, r, ring_id;
 
 	/* EOP Event */
 	r = amdgpu_irq_add_id(adev, 181, &adev->gfx.eop_irq);
 	if (r)
 		return r;
 
 	/* Privileged reg */
 	r = amdgpu_irq_add_id(adev, 184, &adev->gfx.priv_reg_irq);
 	if (r)
 		return r;
@@ -4742,42 +4773,52 @@ static int gfx_v7_0_sw_init(void *handle)
 		ring = &adev->gfx.gfx_ring[i];
 		ring->ring_obj = NULL;
 		sprintf(ring->name, "gfx");
 		r = amdgpu_ring_init(adev, ring, 1024,
 				     &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
 	/* set up the compute queues */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+	for (i = 0, ring_id = 0; i < AMDGPU_MAX_QUEUES; i++) {
 		unsigned irq_type;
 
-		/* max 32 queues per MEC */
-		if ((i >= 32) || (i >= AMDGPU_MAX_COMPUTE_RINGS)) {
-			DRM_ERROR("Too many (%d) compute rings!\n", i);
-			break;
-		}
-		ring = &adev->gfx.compute_ring[i];
+		if (!test_bit(i, adev->gfx.mec.queue_bitmap))
+			continue;
+
+		ring = &adev->gfx.compute_ring[ring_id];
+
+		/* mec0 is me1 */
+		ring->me = ((i / adev->gfx.mec.num_queue_per_pipe)
+				/ adev->gfx.mec.num_pipe_per_mec)
+				+ 1;
+		ring->pipe = (i / adev->gfx.mec.num_queue_per_pipe)
+				% adev->gfx.mec.num_pipe_per_mec;
+		ring->queue = i % adev->gfx.mec.num_queue_per_pipe;
+
 		ring->ring_obj = NULL;
 		ring->use_doorbell = true;
-		ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + i;
-		ring->me = 1; /* first MEC */
-		ring->pipe = i / 8;
-		ring->queue = i % 8;
+		ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
 		sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
-		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP + ring->pipe;
+
+		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
+			+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
+			+ ring->pipe;
+
 		/* type-2 packets are deprecated on MEC, use type-3 instead */
 		r = amdgpu_ring_init(adev, ring, 1024,
 				     &adev->gfx.eop_irq, irq_type);
 		if (r)
 			return r;
+
+		ring_id++;
 	}
 
 	/* reserve GDS, GWS and OA resource for gfx */
 	r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
 				    PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
 				    &adev->gds.gds_gfx_bo, NULL, NULL);
 	if (r)
 		return r;
 
 	r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 1bd4759..1238b3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -45,21 +45,20 @@
 #include "gca/gfx_8_0_enum.h"
 #include "gca/gfx_8_0_sh_mask.h"
 #include "gca/gfx_8_0_enum.h"
 
 #include "dce/dce_10_0_d.h"
 #include "dce/dce_10_0_sh_mask.h"
 
 #include "smu/smu_7_1_3_d.h"
 
 #define GFX8_NUM_GFX_RINGS     1
-#define GFX8_NUM_COMPUTE_RINGS 8
 #define GFX8_MEC_HPD_SIZE 2048
 
 
 #define TOPAZ_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define CARRIZO_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define POLARIS11_GB_ADDR_CONFIG_GOLDEN 0x22011002
 #define TONGA_GB_ADDR_CONFIG_GOLDEN 0x22011003
 
 #define ARRAY_MODE(x)					((x) << GB_TILE_MODE0__ARRAY_MODE__SHIFT)
 #define PIPE_CONFIG(x)					((x) << GB_TILE_MODE0__PIPE_CONFIG__SHIFT)
@@ -1413,47 +1412,82 @@ static void gfx_v8_0_kiq_free_ring(struct amdgpu_ring *ring,
 {
 	if (amdgpu_sriov_vf(ring->adev))
 		amdgpu_wb_free(ring->adev, ring->adev->virt.reg_val_offs);
 
 	amdgpu_ring_fini(ring);
 	irq->data = NULL;
 }
 
 #define GFX8_MEC_HPD_SIZE 2048
 
+static void gfx_v8_0_compute_queue_acquire(struct amdgpu_device *adev)
+{
+	int i, queue, pipe, mec;
+
+	/* policy for amdgpu compute queue ownership */
+	for (i = 0; i < AMDGPU_MAX_QUEUES; ++i) {
+		queue = i % adev->gfx.mec.num_queue_per_pipe;
+		pipe = (i / adev->gfx.mec.num_queue_per_pipe)
+			% adev->gfx.mec.num_pipe_per_mec;
+		mec = (i / adev->gfx.mec.num_queue_per_pipe)
+			/ adev->gfx.mec.num_pipe_per_mec;
+
+		/* we've run out of HW */
+		if (mec >= adev->gfx.mec.num_mec)
+			break;
+
+		/* policy: amdgpu owns all queues in the first pipe */
+		if (mec == 0 && pipe == 0)
+			set_bit(i, adev->gfx.mec.queue_bitmap);
+	}
+
+	/* update the number of active compute rings */
+	adev->gfx.num_compute_rings =
+		bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_QUEUES);
+
+	/* If you hit this case and edited the policy, you probably just
+	 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
+	if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
+		adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
+}
+
 static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 {
 	int r;
 	u32 *hpd;
 	size_t mec_hpd_size;
 
+	bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_QUEUES);
+
 	switch (adev->asic_type) {
 	case CHIP_FIJI:
 	case CHIP_TONGA:
 	case CHIP_POLARIS11:
 	case CHIP_POLARIS12:
 	case CHIP_POLARIS10:
 	case CHIP_CARRIZO:
 		adev->gfx.mec.num_mec = 2;
 		break;
 	case CHIP_TOPAZ:
 	case CHIP_STONEY:
 	default:
 		adev->gfx.mec.num_mec = 1;
 		break;
 	}
 
 	adev->gfx.mec.num_pipe_per_mec = 4;
 	adev->gfx.mec.num_queue_per_pipe = 8;
 
-	/* only 1 pipe of the first MEC is owned by amdgpu */
-	mec_hpd_size = 1 * 1 * adev->gfx.mec.num_queue_per_pipe * GFX8_MEC_HPD_SIZE;
+	/* take ownership of the relevant compute queues */
+	gfx_v8_0_compute_queue_acquire(adev);
+
+	mec_hpd_size = adev->gfx.num_compute_rings * GFX8_MEC_HPD_SIZE;
 
 	if (adev->gfx.mec.hpd_eop_obj == NULL) {
 		r = amdgpu_bo_create(adev,
 				     mec_hpd_size,
 				     PAGE_SIZE, true,
 				     AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &adev->gfx.mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
 			return r;
@@ -2083,21 +2117,21 @@ static int gfx_v8_0_gpu_early_init(struct amdgpu_device *adev)
 		gb_addr_config = REG_SET_FIELD(gb_addr_config, GB_ADDR_CONFIG, ROW_SIZE, 2);
 		break;
 	}
 	adev->gfx.config.gb_addr_config = gb_addr_config;
 
 	return 0;
 }
 
 static int gfx_v8_0_sw_init(void *handle)
 {
-	int i, r;
+	int i, r, ring_id;
 	struct amdgpu_ring *ring;
 	struct amdgpu_kiq *kiq;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	/* KIQ event */
 	r = amdgpu_irq_add_id(adev, 178, &adev->gfx.kiq.irq);
 	if (r)
 		return r;
 
 	/* EOP Event */
@@ -2159,42 +2193,55 @@ static int gfx_v8_0_sw_init(void *handle)
 			ring->doorbell_index = AMDGPU_DOORBELL_GFX_RING0;
 		}
 
 		r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
 				     AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
 	/* set up the compute queues */
-	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+	for (i = 0, ring_id = 0; i < AMDGPU_MAX_QUEUES; i++) {
 		unsigned irq_type;
 
-		/* max 32 queues per MEC */
-		if ((i >= 32) || (i >= AMDGPU_MAX_COMPUTE_RINGS)) {
-			DRM_ERROR("Too many (%d) compute rings!\n", i);
+		if (!test_bit(i, adev->gfx.mec.queue_bitmap))
+			continue;
+
+		if (WARN_ON(ring_id >= AMDGPU_MAX_COMPUTE_RINGS))
 			break;
-		}
-		ring = &adev->gfx.compute_ring[i];
+
+		ring = &adev->gfx.compute_ring[ring_id];
+
+		/* mec0 is me1 */
+		ring->me = ((i / adev->gfx.mec.num_queue_per_pipe)
+				/ adev->gfx.mec.num_pipe_per_mec)
+				+ 1;
+		ring->pipe = (i / adev->gfx.mec.num_queue_per_pipe)
+				% adev->gfx.mec.num_pipe_per_mec;
+		ring->queue = i % adev->gfx.mec.num_queue_per_pipe;
+
 		ring->ring_obj = NULL;
 		ring->use_doorbell = true;
-		ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + i;
-		ring->me = 1; /* first MEC */
-		ring->pipe = i / 8;
-		ring->queue = i % 8;
+		ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
 		sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
-		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP + ring->pipe;
+
+		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
+			+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
+			+ ring->pipe;
+
 		/* type-2 packets are deprecated on MEC, use type-3 instead */
 		r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
 				     irq_type);
 		if (r)
 			return r;
+
+		ring_id++;
 	}
 
 	/* reserve GDS, GWS and OA resource for gfx */
 	r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
 				    PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
 				    &adev->gds.gds_gfx_bo, NULL, NULL);
 	if (r)
 		return r;
 
 	r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
@@ -5693,21 +5740,21 @@ static const struct amdgpu_gfx_funcs gfx_v8_0_gfx_funcs = {
 	.select_se_sh = &gfx_v8_0_select_se_sh,
 	.read_wave_data = &gfx_v8_0_read_wave_data,
 	.read_wave_sgprs = &gfx_v8_0_read_wave_sgprs,
 };
 
 static int gfx_v8_0_early_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	adev->gfx.num_gfx_rings = GFX8_NUM_GFX_RINGS;
-	adev->gfx.num_compute_rings = GFX8_NUM_COMPUTE_RINGS;
+	adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
 	adev->gfx.funcs = &gfx_v8_0_gfx_funcs;
 	gfx_v8_0_set_ring_funcs(adev);
 	gfx_v8_0_set_irq_funcs(adev);
 	gfx_v8_0_set_gds_init(adev);
 	gfx_v8_0_set_rlc_funcs(adev);
 
 	return 0;
 }
 
 static int gfx_v8_0_late_init(void *handle)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index a09d9f3..67f6d19 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -26,20 +26,21 @@
  */
 
 #ifndef KGD_KFD_INTERFACE_H_INCLUDED
 #define KGD_KFD_INTERFACE_H_INCLUDED
 
 #include <linux/types.h>
 
 struct pci_dev;
 
 #define KFD_INTERFACE_VERSION 1
+#define KGD_MAX_QUEUES 128
 
 struct kfd_dev;
 struct kgd_dev;
 
 struct kgd_mem;
 
 enum kgd_memory_pool {
 	KGD_POOL_SYSTEM_CACHEABLE = 1,
 	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
 	KGD_POOL_FRAMEBUFFER = 3,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 10/22] drm/amdgpu: teach amdgpu how to enable interrupts for any pipe
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 09/22] drm/amdgpu: allow split of queues with kfd at queue granularity Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 11/22] drm/amdkfd: allow split HQD on per-queue granularity v3 Andres Rodriguez
                     ` (13 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

The current implementation is hardcoded to enable ME1/PIPE0 interrupts
only.

This patch allows amdgpu to enable interrupts for any pipe of ME1.
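
Editor's note: the change below boils down to the fact that
CPC_INT_CNTL is banked per (me, pipe), so the target pipe is selected
through SRBM before the single shared register offset is written. A
minimal sketch, assuming the cik_srbm_select() and WREG32_FIELD()
helpers visible in the diff:

static void sketch_enable_pipe_irq(struct amdgpu_device *adev,
				   int me, int pipe, bool enable)
{
	mutex_lock(&adev->srbm_mutex);
	cik_srbm_select(adev, me, pipe, 0, 0);	/* bank in the target pipe */
	WREG32_FIELD(CPC_INT_CNTL, TIME_STAMP_INT_ENABLE, enable ? 1 : 0);
	cik_srbm_select(adev, 0, 0, 0, 0);	/* restore the default bank */
	mutex_unlock(&adev->srbm_mutex);
}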

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 48 +++++++++++++----------------------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 33 ++++++++++++------------
 2 files changed, 34 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index fe46765..68265b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -5032,56 +5032,42 @@ static void gfx_v7_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
 		break;
 	default:
 		break;
 	}
 }
 
 static void gfx_v7_0_set_compute_eop_interrupt_state(struct amdgpu_device *adev,
 						     int me, int pipe,
 						     enum amdgpu_interrupt_state state)
 {
-	u32 mec_int_cntl, mec_int_cntl_reg;
-
-	/*
-	 * amdgpu controls only pipe 0 of MEC1. That's why this function only
-	 * handles the setting of interrupts for this specific pipe. All other
-	 * pipes' interrupts are set by amdkfd.
+	/* ME 0 is for graphics and ME 2 is reserved for HW scheduling,
+	 * so we should only be configuring ME 1, i.e. MEC0
 	 */
-
-	if (me == 1) {
-		switch (pipe) {
-		case 0:
-			mec_int_cntl_reg = mmCP_ME1_PIPE0_INT_CNTL;
-			break;
-		default:
-			DRM_DEBUG("invalid pipe %d\n", pipe);
-			return;
-		}
-	} else {
-		DRM_DEBUG("invalid me %d\n", me);
+	if (me != 1) {
+		DRM_ERROR("Ignoring request to enable interrupts for invalid me:%d\n", me);
 		return;
 	}
 
-	switch (state) {
-	case AMDGPU_IRQ_STATE_DISABLE:
-		mec_int_cntl = RREG32(mec_int_cntl_reg);
-		mec_int_cntl &= ~CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK;
-		WREG32(mec_int_cntl_reg, mec_int_cntl);
-		break;
-	case AMDGPU_IRQ_STATE_ENABLE:
-		mec_int_cntl = RREG32(mec_int_cntl_reg);
-		mec_int_cntl |= CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK;
-		WREG32(mec_int_cntl_reg, mec_int_cntl);
-		break;
-	default:
-		break;
+	if (pipe >= adev->gfx.mec.num_pipe_per_mec) {
+		DRM_ERROR("Ignoring request to enable interrupts for invalid "
+				"me:%d pipe:%d\n", pipe, me);
+		return;
 	}
+
+	mutex_lock(&adev->srbm_mutex);
+	cik_srbm_select(adev, me, pipe, 0, 0);
+
+	WREG32_FIELD(CPC_INT_CNTL, TIME_STAMP_INT_ENABLE,
+			state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
+
+	cik_srbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
 }
 
 static int gfx_v7_0_set_priv_reg_fault_state(struct amdgpu_device *adev,
 					     struct amdgpu_irq_src *src,
 					     unsigned type,
 					     enum amdgpu_interrupt_state state)
 {
 	u32 cp_int_cntl;
 
 	switch (state) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 1238b3d..861334b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6779,41 +6779,42 @@ static void gfx_v8_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
 						 enum amdgpu_interrupt_state state)
 {
 	WREG32_FIELD(CP_INT_CNTL_RING0, TIME_STAMP_INT_ENABLE,
 		     state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
 }
 
 static void gfx_v8_0_set_compute_eop_interrupt_state(struct amdgpu_device *adev,
 						     int me, int pipe,
 						     enum amdgpu_interrupt_state state)
 {
-	/*
-	 * amdgpu controls only pipe 0 of MEC1. That's why this function only
-	 * handles the setting of interrupts for this specific pipe. All other
-	 * pipes' interrupts are set by amdkfd.
+	/* ME 0 is for graphics and ME 2 is reserved for HW scheduling,
+	 * so we should only be configuring ME 1, i.e. MEC0
 	 */
+	if (me != 1) {
+		DRM_ERROR("Ignoring request to enable interrupts for invalid me:%d\n", me);
+		return;
+	}
 
-	if (me == 1) {
-		switch (pipe) {
-		case 0:
-			break;
-		default:
-			DRM_DEBUG("invalid pipe %d\n", pipe);
-			return;
-		}
-	} else {
-		DRM_DEBUG("invalid me %d\n", me);
+	if (pipe >= adev->gfx.mec.num_pipe_per_mec) {
+		DRM_ERROR("Ignoring request to enable interrupts for invalid "
+				"me:%d pipe:%d\n", pipe, me);
 		return;
 	}
 
-	WREG32_FIELD(CP_ME1_PIPE0_INT_CNTL, TIME_STAMP_INT_ENABLE,
-		     state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
+	mutex_lock(&adev->srbm_mutex);
+	vi_srbm_select(adev, me, pipe, 0, 0);
+
+	WREG32_FIELD(CPC_INT_CNTL, TIME_STAMP_INT_ENABLE,
+			state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
+
+	vi_srbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
 }
 
 static int gfx_v8_0_set_priv_reg_fault_state(struct amdgpu_device *adev,
 					     struct amdgpu_irq_src *source,
 					     unsigned type,
 					     enum amdgpu_interrupt_state state)
 {
 	WREG32_FIELD(CP_INT_CNTL_RING0, PRIV_REG_INT_ENABLE,
 		     state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 11/22] drm/amdkfd: allow split HQD on per-queue granularity v3
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 10/22] drm/amdgpu: teach amdgpu how to enable interrupts for any pipe Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 12/22] drm/amdgpu: remove duplicate magic constants from amdgpu_amdkfd_gfx*.c Andres Rodriguez
                     ` (12 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Oded Gabbay, Andres Rodriguez

Update the KGD to KFD interface to allow sharing pipes at queue
granularity instead of pipe granularity.

This allows for more interesting pipe/queue splits.

v2: fix overflow check for res.queue_mask
v3: fix shift overflow when setting res.queue_mask
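
Editor's illustration of the overflow the v2/v3 notes refer to:
"1 << i" shifts a 32-bit int, which is undefined behaviour once i
reaches 32, so the 64-bit res.queue_mask must be built with a 64-bit
literal:

	res.queue_mask |= 1 << i;	/* wrong: 32-bit shift, UB for i >= 32 */
	res.queue_mask |= 1ull << i;	/* right: what the hunk below uses */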

CC: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |  22 ++++-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   4 +
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 100 ++++++++++++++-------
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  10 +--
 .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |   3 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   2 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  17 ++--
 drivers/gpu/drm/radeon/radeon_kfd.c                |  21 ++++-
 9 files changed, 126 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 3200ff9..8fc5aa3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -88,28 +88,44 @@ void amdgpu_amdkfd_fini(void)
 
 void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 {
 	if (kgd2kfd)
 		adev->kfd = kgd2kfd->probe((struct kgd_dev *)adev,
 					adev->pdev, kfd2kgd);
 }
 
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
+	int i;
+	int last_valid_bit;
 	if (adev->kfd) {
 		struct kgd2kfd_shared_resources gpu_resources = {
 			.compute_vmid_bitmap = 0xFF00,
-
-			.first_compute_pipe = 1,
-			.compute_pipe_count = 4 - 1,
+			.num_mec = adev->gfx.mec.num_mec,
+			.num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
+			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
 		};
 
+		/* this is going to have a few of the MSBs set that we need to
+		 * clear */
+		bitmap_complement(gpu_resources.queue_bitmap,
+				  adev->gfx.mec.queue_bitmap,
+				  KGD_MAX_QUEUES);
+
+		/* According to linux/bitmap.h we shouldn't use bitmap_clear()
+		 * if nbits is not a compile-time constant */
+		last_valid_bit = adev->gfx.mec.num_mec
+				* adev->gfx.mec.num_pipe_per_mec
+				* adev->gfx.mec.num_queue_per_pipe;
+		for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
+			clear_bit(i, gpu_resources.queue_bitmap);
+
 		amdgpu_doorbell_get_kfd_info(adev,
 				&gpu_resources.doorbell_physical_address,
 				&gpu_resources.doorbell_aperture_size,
 				&gpu_resources.doorbell_start_offset);
 
 		kgd2kfd->device_init(adev->kfd, &gpu_resources);
 	}
 }
 
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 3f95f7c..88187bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -219,20 +219,24 @@ static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
 	return AMD_IOMMU_INV_PRI_RSP_INVALID;
 }
 
 bool kgd2kfd_device_init(struct kfd_dev *kfd,
 			 const struct kgd2kfd_shared_resources *gpu_resources)
 {
 	unsigned int size;
 
 	kfd->shared_resources = *gpu_resources;
 
+	/* We only use the first MEC */
+	if (kfd->shared_resources.num_mec > 1)
+		kfd->shared_resources.num_mec = 1;
+
 	/* calculate max size of mqds needed for queues */
 	size = max_num_of_queues_per_device *
 			kfd->device_info->mqd_size_aligned;
 
 	/*
 	 * calculate max size of runlist packet.
 	 * There can be only 2 packets at once
 	 */
 	size += (KFD_MAX_NUM_OF_PROCESSES * sizeof(struct pm4_map_process) +
 		max_num_of_queues_per_device *
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c064dea..ab52606 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -56,35 +56,58 @@ static void deallocate_sdma_queue(struct device_queue_manager *dqm,
 				unsigned int sdma_queue_id);
 
 static inline
 enum KFD_MQD_TYPE get_mqd_type_from_queue_type(enum kfd_queue_type type)
 {
 	if (type == KFD_QUEUE_TYPE_SDMA)
 		return KFD_MQD_TYPE_SDMA;
 	return KFD_MQD_TYPE_CP;
 }
 
-unsigned int get_first_pipe(struct device_queue_manager *dqm)
+static bool is_pipe_enabled(struct device_queue_manager *dqm, int mec, int pipe)
+{
+	int i;
+	int pipe_offset = (mec * dqm->dev->shared_resources.num_pipe_per_mec
+		+ pipe) * dqm->dev->shared_resources.num_queue_per_pipe;
+
+	/* queue is available for KFD usage if bit is 0 */
+	for (i = 0; i < dqm->dev->shared_resources.num_queue_per_pipe; ++i)
+		if (test_bit(pipe_offset + i,
+			      dqm->dev->shared_resources.queue_bitmap))
+			return true;
+	return false;
+}
+
+unsigned int get_mec_num(struct device_queue_manager *dqm)
+{
+	BUG_ON(!dqm || !dqm->dev);
+
+	return dqm->dev->shared_resources.num_mec;
+}
+
+unsigned int get_queues_num(struct device_queue_manager *dqm)
 {
 	BUG_ON(!dqm || !dqm->dev);
-	return dqm->dev->shared_resources.first_compute_pipe;
+	return bitmap_weight(dqm->dev->shared_resources.queue_bitmap,
+				KGD_MAX_QUEUES);
 }
 
-unsigned int get_pipes_num(struct device_queue_manager *dqm)
+unsigned int get_queues_per_pipe(struct device_queue_manager *dqm)
 {
 	BUG_ON(!dqm || !dqm->dev);
-	return dqm->dev->shared_resources.compute_pipe_count;
+	return dqm->dev->shared_resources.num_queue_per_pipe;
 }
 
-static inline unsigned int get_pipes_num_cpsch(void)
+unsigned int get_pipes_per_mec(struct device_queue_manager *dqm)
 {
-	return PIPE_PER_ME_CP_SCHEDULING;
+	BUG_ON(!dqm || !dqm->dev);
+	return dqm->dev->shared_resources.num_pipe_per_mec;
 }
 
 void program_sh_mem_settings(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
 	return dqm->dev->kfd2kgd->program_sh_mem_settings(
 						dqm->dev->kgd, qpd->vmid,
 						qpd->sh_mem_config,
 						qpd->sh_mem_ape1_base,
 						qpd->sh_mem_ape1_limit,
@@ -193,43 +216,47 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 	return 0;
 }
 
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
 {
 	bool set;
 	int pipe, bit, i;
 
 	set = false;
 
-	for (pipe = dqm->next_pipe_to_allocate, i = 0; i < get_pipes_num(dqm);
-			pipe = ((pipe + 1) % get_pipes_num(dqm)), ++i) {
+	for (pipe = dqm->next_pipe_to_allocate, i = 0; i < get_pipes_per_mec(dqm);
+			pipe = ((pipe + 1) % get_pipes_per_mec(dqm)), ++i) {
+
+		if (!is_pipe_enabled(dqm, 0, pipe))
+			continue;
+
 		if (dqm->allocated_queues[pipe] != 0) {
 			bit = find_first_bit(
 				(unsigned long *)&dqm->allocated_queues[pipe],
-				QUEUES_PER_PIPE);
+				get_queues_per_pipe(dqm));
 
 			clear_bit(bit,
 				(unsigned long *)&dqm->allocated_queues[pipe]);
 			q->pipe = pipe;
 			q->queue = bit;
 			set = true;
 			break;
 		}
 	}
 
 	if (!set)
 		return -EBUSY;
 
 	pr_debug("kfd: DQM %s hqd slot - pipe (%d) queue(%d)\n",
 				__func__, q->pipe, q->queue);
 	/* horizontal hqd allocation */
-	dqm->next_pipe_to_allocate = (pipe + 1) % get_pipes_num(dqm);
+	dqm->next_pipe_to_allocate = (pipe + 1) % get_pipes_per_mec(dqm);
 
 	return 0;
 }
 
 static inline void deallocate_hqd(struct device_queue_manager *dqm,
 				struct queue *q)
 {
 	set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
 }
 
@@ -462,75 +489,64 @@ set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid,
 
 	pasid_mapping = (pasid == 0) ? 0 :
 		(uint32_t)pasid |
 		ATC_VMID_PASID_MAPPING_VALID;
 
 	return dqm->dev->kfd2kgd->set_pasid_vmid_mapping(
 						dqm->dev->kgd, pasid_mapping,
 						vmid);
 }
 
-int init_pipelines(struct device_queue_manager *dqm,
-			unsigned int pipes_num, unsigned int first_pipe)
-{
-	BUG_ON(!dqm || !dqm->dev);
-
-	pr_debug("kfd: In func %s\n", __func__);
-
-	return 0;
-}
-
 static void init_interrupts(struct device_queue_manager *dqm)
 {
 	unsigned int i;
 
 	BUG_ON(dqm == NULL);
 
-	for (i = 0 ; i < get_pipes_num(dqm) ; i++)
-		dqm->dev->kfd2kgd->init_interrupts(dqm->dev->kgd,
-				i + get_first_pipe(dqm));
+	for (i = 0 ; i < get_pipes_per_mec(dqm) ; i++)
+		if (is_pipe_enabled(dqm, 0, i))
+			dqm->dev->kfd2kgd->init_interrupts(dqm->dev->kgd, i);
 }
 
 static int init_scheduler(struct device_queue_manager *dqm)
 {
-	int retval;
+	int retval = 0;
 
 	BUG_ON(!dqm);
 
 	pr_debug("kfd: In %s\n", __func__);
 
-	retval = init_pipelines(dqm, get_pipes_num(dqm), get_first_pipe(dqm));
 	return retval;
 }
 
 static int initialize_nocpsch(struct device_queue_manager *dqm)
 {
 	int i;
 
 	BUG_ON(!dqm);
 
 	pr_debug("kfd: In func %s num of pipes: %d\n",
-			__func__, get_pipes_num(dqm));
+			__func__, get_pipes_per_mec(dqm));
 
 	mutex_init(&dqm->lock);
 	INIT_LIST_HEAD(&dqm->queues);
 	dqm->queue_count = dqm->next_pipe_to_allocate = 0;
 	dqm->sdma_queue_count = 0;
-	dqm->allocated_queues = kcalloc(get_pipes_num(dqm),
+	dqm->allocated_queues = kcalloc(get_pipes_per_mec(dqm),
 					sizeof(unsigned int), GFP_KERNEL);
 	if (!dqm->allocated_queues) {
 		mutex_destroy(&dqm->lock);
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < get_pipes_num(dqm); i++)
-		dqm->allocated_queues[i] = (1 << QUEUES_PER_PIPE) - 1;
+	for (i = 0; i < get_pipes_per_mec(dqm); i++)
+		dqm->allocated_queues[i] = (1 << get_queues_per_pipe(dqm)) - 1;
 
 	dqm->vmid_bitmap = (1 << VMID_PER_DEVICE) - 1;
 	dqm->sdma_bitmap = (1 << CIK_SDMA_QUEUES) - 1;
 
 	init_scheduler(dqm);
 	return 0;
 }
 
 static void uninitialize_nocpsch(struct device_queue_manager *dqm)
 {
@@ -623,51 +639,67 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 
 	return 0;
 }
 
 /*
  * Device Queue Manager implementation for cp scheduler
  */
 
 static int set_sched_resources(struct device_queue_manager *dqm)
 {
+	int i;
 	struct scheduling_resources res;
-	unsigned int queue_num, queue_mask;
 
 	BUG_ON(!dqm);
 
 	pr_debug("kfd: In func %s\n", __func__);
 
-	queue_num = get_pipes_num_cpsch() * QUEUES_PER_PIPE;
-	queue_mask = (1 << queue_num) - 1;
 	res.vmid_mask = (1 << VMID_PER_DEVICE) - 1;
 	res.vmid_mask <<= KFD_VMID_START_OFFSET;
-	res.queue_mask = queue_mask << (get_first_pipe(dqm) * QUEUES_PER_PIPE);
+
+	/* Avoid touching the internal representation queue_bitmap directly.
+	 * Even though doing a simple memcpy might sound tempting, it would
+	 * silently break if the implementation of bitmaps is changed */
+	res.queue_mask = 0;
+	for (i = 0; i < KGD_MAX_QUEUES; ++i) {
+		if (!test_bit(i, dqm->dev->shared_resources.queue_bitmap))
+			continue;
+
+		/* This situation may be hit in the future if a new HW
+		 * generation exposes more than 64 queues. If so, the
+		 * definition of res.queue_mask needs updating */
+		if (WARN_ON(i >= (sizeof(res.queue_mask)*8))) {
+			pr_err("Invalid queue enabled by amdgpu: %d\n", i);
+			break;
+		}
+
+		res.queue_mask |= (1ull << i);
+	}
 	res.gws_mask = res.oac_mask = res.gds_heap_base =
 						res.gds_heap_size = 0;
 
 	pr_debug("kfd: scheduling resources:\n"
 			"      vmid mask: 0x%8X\n"
 			"      queue mask: 0x%8llX\n",
 			res.vmid_mask, res.queue_mask);
 
 	return pm_send_set_resources(&dqm->packets, &res);
 }
 
 static int initialize_cpsch(struct device_queue_manager *dqm)
 {
 	int retval;
 
 	BUG_ON(!dqm);
 
 	pr_debug("kfd: In func %s num of pipes: %d\n",
-			__func__, get_pipes_num_cpsch());
+			__func__, get_pipes_per_mec(dqm));
 
 	mutex_init(&dqm->lock);
 	INIT_LIST_HEAD(&dqm->queues);
 	dqm->queue_count = dqm->processes_count = 0;
 	dqm->sdma_queue_count = 0;
 	dqm->active_runlist = false;
 	retval = dqm->ops_asic_specific.initialize(dqm);
 	if (retval != 0)
 		goto fail_init_pipelines;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index a625b91..66b9615 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -23,22 +23,20 @@
 
 #ifndef KFD_DEVICE_QUEUE_MANAGER_H_
 #define KFD_DEVICE_QUEUE_MANAGER_H_
 
 #include <linux/rwsem.h>
 #include <linux/list.h>
 #include "kfd_priv.h"
 #include "kfd_mqd_manager.h"
 
 #define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS	(500)
-#define QUEUES_PER_PIPE				(8)
-#define PIPE_PER_ME_CP_SCHEDULING		(3)
 #define CIK_VMID_NUM				(8)
 #define KFD_VMID_START_OFFSET			(8)
 #define VMID_PER_DEVICE				CIK_VMID_NUM
 #define KFD_DQM_FIRST_PIPE			(0)
 #define CIK_SDMA_QUEUES				(4)
 #define CIK_SDMA_QUEUES_PER_ENGINE		(2)
 #define CIK_SDMA_ENGINE_NUM			(2)
 
 struct device_process_node {
 	struct qcm_process_device *qpd;
@@ -175,24 +173,24 @@ struct device_queue_manager {
 	uint64_t		fence_gpu_addr;
 	unsigned int		*fence_addr;
 	struct kfd_mem_obj	*fence_mem;
 	bool			active_runlist;
 };
 
 void device_queue_manager_init_cik(struct device_queue_manager_asic_ops *ops);
 void device_queue_manager_init_vi(struct device_queue_manager_asic_ops *ops);
 void program_sh_mem_settings(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd);
-int init_pipelines(struct device_queue_manager *dqm,
-		unsigned int pipes_num, unsigned int first_pipe);
-unsigned int get_first_pipe(struct device_queue_manager *dqm);
-unsigned int get_pipes_num(struct device_queue_manager *dqm);
+unsigned int get_mec_num(struct device_queue_manager *dqm);
+unsigned int get_queues_num(struct device_queue_manager *dqm);
+unsigned int get_queues_per_pipe(struct device_queue_manager *dqm);
+unsigned int get_pipes_per_mec(struct device_queue_manager *dqm);
 
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
 	return (pdd->lds_base >> 16) & 0xFF;
 }
 
 static inline unsigned int
 get_sh_mem_bases_nybble_64(struct kfd_process_device *pdd)
 {
 	return (pdd->lds_base >> 60) & 0x0E;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c
index c6f435a..48dc056 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_cik.c
@@ -144,12 +144,12 @@ static void init_sdma_vm(struct device_queue_manager *dqm, struct queue *q,
 	else
 		value |= ((get_sh_mem_bases_nybble_64(qpd_to_pdd(qpd))) <<
 				SDMA0_RLC0_VIRTUAL_ADDR__SHARED_BASE__SHIFT) &
 				SDMA0_RLC0_VIRTUAL_ADDR__SHARED_BASE_MASK;
 
 	q->properties.sdma_vm_addr = value;
 }
 
 static int initialize_cpsch_cik(struct device_queue_manager *dqm)
 {
-	return init_pipelines(dqm, get_pipes_num(dqm), get_first_pipe(dqm));
+	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index ca8c093..7131998 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -58,22 +58,21 @@ static void pm_calc_rlib_size(struct packet_manager *pm,
 	unsigned int process_count, queue_count;
 	unsigned int map_queue_size;
 
 	BUG_ON(!pm || !rlib_size || !over_subscription);
 
 	process_count = pm->dqm->processes_count;
 	queue_count = pm->dqm->queue_count;
 
 	/* check if there is over subscription*/
 	*over_subscription = false;
-	if ((process_count > 1) ||
-		queue_count > PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE) {
+	if ((process_count > 1) || queue_count > get_queues_num(pm->dqm)) {
 		*over_subscription = true;
 		pr_debug("kfd: over subscribed runlist\n");
 	}
 
 	map_queue_size =
 		(pm->dqm->dev->device_info->asic_family == CHIP_CARRIZO) ?
 		sizeof(struct pm4_mes_map_queues) :
 		sizeof(struct pm4_map_queues);
 	/* calculate run list ib allocation size */
 	*rlib_size = process_count * sizeof(struct pm4_map_process) +
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index e1fb40b..32cdf2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -202,21 +202,21 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 		retval = -ENOMEM;
 		goto err_allocate_pqn;
 	}
 
 	switch (type) {
 	case KFD_QUEUE_TYPE_SDMA:
 	case KFD_QUEUE_TYPE_COMPUTE:
 		/* check if there is over subscription */
 		if ((sched_policy == KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION) &&
 		((dev->dqm->processes_count >= VMID_PER_DEVICE) ||
-		(dev->dqm->queue_count >= PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE))) {
+		(dev->dqm->queue_count >= get_queues_num(dev->dqm)))) {
 			pr_err("kfd: over-subscription is not allowed in radeon_kfd.sched_policy == 1\n");
 			retval = -EPERM;
 			goto err_create_queue;
 		}
 
 		retval = create_cp_queue(pqm, dev, &q, &q_properties, f, *qid);
 		if (retval != 0)
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 67f6d19..91ef148 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -22,24 +22,25 @@
 
 /*
  * This file defines the private interface between the
  * AMD kernel graphics drivers and the AMD KFD.
  */
 
 #ifndef KGD_KFD_INTERFACE_H_INCLUDED
 #define KGD_KFD_INTERFACE_H_INCLUDED
 
 #include <linux/types.h>
+#include <linux/bitmap.h>
 
 struct pci_dev;
 
-#define KFD_INTERFACE_VERSION 1
+#define KFD_INTERFACE_VERSION 2
 #define KGD_MAX_QUEUES 128
 
 struct kfd_dev;
 struct kgd_dev;
 
 struct kgd_mem;
 
 enum kgd_memory_pool {
 	KGD_POOL_SYSTEM_CACHEABLE = 1,
 	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
@@ -55,25 +56,31 @@ enum kgd_engine_type {
 	KGD_ENGINE_RLC,
 	KGD_ENGINE_SDMA1,
 	KGD_ENGINE_SDMA2,
 	KGD_ENGINE_MAX
 };
 
 struct kgd2kfd_shared_resources {
 	/* Bit n == 1 means VMID n is available for KFD. */
 	unsigned int compute_vmid_bitmap;
 
-	/* Compute pipes are counted starting from MEC0/pipe0 as 0. */
-	unsigned int first_compute_pipe;
+	/* number of mec available from the hardware */
+	uint32_t num_mec;
 
-	/* Number of MEC pipes available for KFD. */
-	unsigned int compute_pipe_count;
+	/* number of pipes per mec */
+	uint32_t num_pipe_per_mec;
+
+	/* number of queues per pipe */
+	uint32_t num_queue_per_pipe;
+
+	/* Bit n == 1 means Queue n is available for KFD */
+	DECLARE_BITMAP(queue_bitmap, KGD_MAX_QUEUES);
 
 	/* Base address of doorbell aperture. */
 	phys_addr_t doorbell_physical_address;
 
 	/* Size in bytes of doorbell aperture. */
 	size_t doorbell_aperture_size;
 
 	/* Number of bytes at start of aperture reserved for KGD. */
 	size_t doorbell_start_offset;
 };
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index a06e3b1..699fe7f 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -172,28 +172,43 @@ void radeon_kfd_fini(void)
 
 void radeon_kfd_device_probe(struct radeon_device *rdev)
 {
 	if (kgd2kfd)
 		rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev,
 			rdev->pdev, &kfd2kgd);
 }
 
 void radeon_kfd_device_init(struct radeon_device *rdev)
 {
+	int i, queue, pipe, mec;
+
 	if (rdev->kfd) {
 		struct kgd2kfd_shared_resources gpu_resources = {
 			.compute_vmid_bitmap = 0xFF00,
-
-			.first_compute_pipe = 1,
-			.compute_pipe_count = 4 - 1,
+			.num_mec = 1,
+			.num_pipe_per_mec = 4,
+			.num_queue_per_pipe = 8
 		};
 
+		bitmap_zero(gpu_resources.queue_bitmap, KGD_MAX_QUEUES);
+
+		for (i = 0; i < KGD_MAX_QUEUES; ++i) {
+			queue = i % gpu_resources.num_queue_per_pipe;
+			pipe = (i / gpu_resources.num_queue_per_pipe)
+				% gpu_resources.num_pipe_per_mec;
+			mec = (i / gpu_resources.num_queue_per_pipe)
+				/ gpu_resources.num_pipe_per_mec;
+
+			if (mec == 0 && pipe > 0)
+				set_bit(i, gpu_resources.queue_bitmap);
+		}
+
 		radeon_doorbell_get_kfd_info(rdev,
 				&gpu_resources.doorbell_physical_address,
 				&gpu_resources.doorbell_aperture_size,
 				&gpu_resources.doorbell_start_offset);
 
 		kgd2kfd->device_init(rdev->kfd, &gpu_resources);
 	}
 }
 
 void radeon_kfd_device_fini(struct radeon_device *rdev)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 12/22] drm/amdgpu: remove duplicate magic constants from amdgpu_amdkfd_gfx*.c
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 11/22] drm/amdkfd: allow split HQD on per-queue granularity v3 Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 13/22] drm/amdgpu: allocate queues horizontally across pipes Andres Rodriguez
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

This information is already available in adev.
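
Editor's note: the conversion is mechanical. For example, in
acquire_queue():

	/* before */
	uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
	/* after */
	uint32_t mec = (++pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;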

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 12 ++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 910f9d3..5254562 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -32,22 +32,20 @@
 #include "gfx_v7_0.h"
 #include "gca/gfx_7_2_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 #include "gmc/gmc_7_1_d.h"
 #include "gmc/gmc_7_1_sh_mask.h"
 #include "cik_structs.h"
 
-#define CIK_PIPE_PER_MEC	(4)
-
 enum {
 	MAX_TRAPID = 8,		/* 3 bits in the bitfield. */
 	MAX_WATCH_ADDRESSES = 4
 };
 
 enum {
 	ADDRESS_WATCH_REG_ADDR_HI = 0,
 	ADDRESS_WATCH_REG_ADDR_LO,
 	ADDRESS_WATCH_REG_CNTL,
 	ADDRESS_WATCH_REG_MAX
@@ -179,22 +177,24 @@ static void unlock_srbm(struct kgd_dev *kgd)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
 	WREG32(mmSRBM_GFX_CNTL, 0);
 	mutex_unlock(&adev->srbm_mutex);
 }
 
 static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id,
 				uint32_t queue_id)
 {
-	uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
-	uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	uint32_t mec = (++pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+	uint32_t pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
 	lock_srbm(kgd, mec, pipe, queue_id, 0);
 }
 
 static void release_queue(struct kgd_dev *kgd)
 {
 	unlock_srbm(kgd);
 }
 
 static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
@@ -247,22 +247,22 @@ static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
 	/* amdgpu owns the per-pipe state */
 	return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	uint32_t mec;
 	uint32_t pipe;
 
-	mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
-	pipe = (pipe_id % CIK_PIPE_PER_MEC);
+	mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+	pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
 	lock_srbm(kgd, mec, pipe, 0, 0);
 
 	WREG32(mmCPC_INT_CNTL, CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
 			CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
 
 	unlock_srbm(kgd);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 5843368..db7410a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -32,22 +32,20 @@
 #include "gca/gfx_8_0_sh_mask.h"
 #include "gca/gfx_8_0_d.h"
 #include "gca/gfx_8_0_enum.h"
 #include "oss/oss_3_0_sh_mask.h"
 #include "oss/oss_3_0_d.h"
 #include "gmc/gmc_8_1_sh_mask.h"
 #include "gmc/gmc_8_1_d.h"
 #include "vi_structs.h"
 #include "vid.h"
 
-#define VI_PIPE_PER_MEC	(4)
-
 struct cik_sdma_rlc_registers;
 
 /*
  * Register access functions
  */
 
 static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 		uint32_t sh_mem_config,
 		uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit,
 		uint32_t sh_mem_bases);
@@ -140,22 +138,24 @@ static void unlock_srbm(struct kgd_dev *kgd)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
 	WREG32(mmSRBM_GFX_CNTL, 0);
 	mutex_unlock(&adev->srbm_mutex);
 }
 
 static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id,
 				uint32_t queue_id)
 {
-	uint32_t mec = (++pipe_id / VI_PIPE_PER_MEC) + 1;
-	uint32_t pipe = (pipe_id % VI_PIPE_PER_MEC);
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	uint32_t mec = (++pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+	uint32_t pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
 	lock_srbm(kgd, mec, pipe, queue_id, 0);
 }
 
 static void release_queue(struct kgd_dev *kgd)
 {
 	unlock_srbm(kgd);
 }
 
 static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
@@ -209,22 +209,22 @@ static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
 	/* amdgpu owns the per-pipe state */
 	return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	uint32_t mec;
 	uint32_t pipe;
 
-	mec = (++pipe_id / VI_PIPE_PER_MEC) + 1;
-	pipe = (pipe_id % VI_PIPE_PER_MEC);
+	mec = (++pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+	pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
 	lock_srbm(kgd, mec, pipe, 0, 0);
 
 	WREG32(mmCPC_INT_CNTL, CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK);
 
 	unlock_srbm(kgd);
 
 	return 0;
 }
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 13/22] drm/amdgpu: allocate queues horizontally across pipes
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 12/22] drm/amdgpu: remove duplicate magic constants from amdgpu_amdkfd_gfx*.c Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 14/22] drm/amdgpu: new queue policy, take first 2 queues of each pipe Andres Rodriguez
                     ` (10 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Pipes provide better concurrency than queues, so we want to make sure
that apps use queues from different pipes whenever possible.

Optimize for the trivial case where an app consumes rings in order: we
don't want adjacent rings to belong to the same pipe.
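
To see the resulting order, here is a small standalone sketch by the
editor (ignoring the ownership bitmap filter) of the ring-to-pipe
mapping the new loop nest produces with 4 pipes and 8 queues per pipe:

#include <stdio.h>

int main(void)
{
	int num_pipe = 4, num_queue = 8;
	int queue, pipe, ring = 0;

	/* pipe varies fastest, so consecutive rings land on different
	 * pipes: ring 0 -> pipe 0, ring 1 -> pipe 1, ring 2 -> pipe 2, ... */
	for (queue = 0; queue < num_queue; queue++)
		for (pipe = 0; pipe < num_pipe; pipe++)
			printf("ring %2d -> pipe %d queue %d\n",
			       ring++, pipe, queue);
	return 0;
}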

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 13 ++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 78 +++++++++++++++++++-------------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 83 +++++++++++++++++++++--------------
 3 files changed, 109 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6f7e4f8..88dad81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1636,20 +1636,33 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
 	for (i = 0; i < adev->sdma.num_instances; i++)
 		if (&adev->sdma.instance[i].ring == ring)
 			break;
 
 	if (i < AMDGPU_MAX_SDMA_INSTANCES)
 		return &adev->sdma.instance[i];
 	else
 		return NULL;
 }
 
+static inline bool amdgpu_is_mec_queue_enabled(struct amdgpu_device *adev,
+						int mec, int pipe, int queue)
+{
+	int bit = 0;
+
+	bit += mec * adev->gfx.mec.num_pipe_per_mec
+		* adev->gfx.mec.num_queue_per_pipe;
+	bit += pipe * adev->gfx.mec.num_queue_per_pipe;
+	bit += queue;
+
+	return test_bit(bit, adev->gfx.mec.queue_bitmap);
+}
+
 /*
  * ASICs macro.
  */
 #define amdgpu_asic_set_vga_state(adev, state) (adev)->asic_funcs->set_vga_state((adev), (state))
 #define amdgpu_asic_reset(adev) (adev)->asic_funcs->reset((adev))
 #define amdgpu_asic_get_xclk(adev) (adev)->asic_funcs->get_xclk((adev))
 #define amdgpu_asic_set_uvd_clocks(adev, v, d) (adev)->asic_funcs->set_uvd_clocks((adev), (v), (d))
 #define amdgpu_asic_set_vce_clocks(adev, ev, ec) (adev)->asic_funcs->set_vce_clocks((adev), (ev), (ec))
 #define amdgpu_get_pcie_lanes(adev) (adev)->asic_funcs->get_pcie_lanes((adev))
 #define amdgpu_set_pcie_lanes(adev, l) (adev)->asic_funcs->set_pcie_lanes((adev), (l))
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 68265b7..3ca5519 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -4720,25 +4720,56 @@ static void gfx_v7_0_gpu_early_init(struct amdgpu_device *adev)
 	case 2:
 		gb_addr_config |= (1 << GB_ADDR_CONFIG__ROW_SIZE__SHIFT);
 		break;
 	case 4:
 		gb_addr_config |= (2 << GB_ADDR_CONFIG__ROW_SIZE__SHIFT);
 		break;
 	}
 	adev->gfx.config.gb_addr_config = gb_addr_config;
 }
 
+static int gfx_v7_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
+					int mec, int pipe, int queue)
+{
+	int r;
+	unsigned irq_type;
+	struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
+
+	/* mec0 is me1 */
+	ring->me = mec + 1;
+	ring->pipe = pipe;
+	ring->queue = queue;
+
+	ring->ring_obj = NULL;
+	ring->use_doorbell = true;
+	ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
+	sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
+
+	irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
+		+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
+		+ ring->pipe;
+
+	/* type-2 packets are deprecated on MEC, use type-3 instead */
+	r = amdgpu_ring_init(adev, ring, 1024,
+			&adev->gfx.eop_irq, irq_type);
+	if (r)
+		return r;
+
+	return 0;
+}
+
 static int gfx_v7_0_sw_init(void *handle)
 {
 	struct amdgpu_ring *ring;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-	int i, r, ring_id;
+	int i, j, k, r, ring_id;
 
 	/* EOP Event */
 	r = amdgpu_irq_add_id(adev, 181, &adev->gfx.eop_irq);
 	if (r)
 		return r;
 
 	/* Privileged reg */
 	r = amdgpu_irq_add_id(adev, 184, &adev->gfx.priv_reg_irq);
 	if (r)
 		return r;
@@ -4772,53 +4803,38 @@ static int gfx_v7_0_sw_init(void *handle)
 	for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
 		ring = &adev->gfx.gfx_ring[i];
 		ring->ring_obj = NULL;
 		sprintf(ring->name, "gfx");
 		r = amdgpu_ring_init(adev, ring, 1024,
 				     &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
-	/* set up the compute queues */
-	for (i = 0, ring_id = 0; i < AMDGPU_MAX_QUEUES; i++) {
-		unsigned irq_type;
-
-		if (!test_bit(i, adev->gfx.mec.queue_bitmap))
-			continue;
-
-		ring = &adev->gfx.compute_ring[ring_id];
-
-		/* mec0 is me1 */
-		ring->me = ((i / adev->gfx.mec.num_queue_per_pipe)
-				/ adev->gfx.mec.num_pipe_per_mec)
-				+ 1;
-		ring->pipe = (i / adev->gfx.mec.num_queue_per_pipe)
-				% adev->gfx.mec.num_pipe_per_mec;
-		ring->queue = i % adev->gfx.mec.num_queue_per_pipe;
-
-		ring->ring_obj = NULL;
-		ring->use_doorbell = true;
-		ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
-		sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
+	/* set up the compute queues - allocate horizontally across pipes */
+	ring_id = 0;
+	for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
+		for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
+			for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
 
-		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
-			+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
-			+ ring->pipe;
+				if (!amdgpu_is_mec_queue_enabled(adev, i, k, j))
+					continue;
 
-		/* type-2 packets are deprecated on MEC, use type-3 instead */
-		r = amdgpu_ring_init(adev, ring, 1024,
-				     &adev->gfx.eop_irq, irq_type);
-		if (r)
-			return r;
+				r = gfx_v7_0_compute_ring_init(adev,
+								ring_id,
+								i, k, j);
+				if (r)
+					return r;
 
-		ring_id++;
+				ring_id++;
+			}
+		}
 	}
 
 	/* reserve GDS, GWS and OA resource for gfx */
 	r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
 				    PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
 				    &adev->gds.gds_gfx_bo, NULL, NULL);
 	if (r)
 		return r;
 
 	r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 861334b..edddd86 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2115,23 +2115,56 @@ static int gfx_v8_0_gpu_early_init(struct amdgpu_device *adev)
 		break;
 	case 4:
 		gb_addr_config = REG_SET_FIELD(gb_addr_config, GB_ADDR_CONFIG, ROW_SIZE, 2);
 		break;
 	}
 	adev->gfx.config.gb_addr_config = gb_addr_config;
 
 	return 0;
 }
 
+static int gfx_v8_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
+					int mec, int pipe, int queue)
+{
+	int r;
+	unsigned irq_type;
+	struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
+
+	/* mec0 is me1 */
+	ring->me = mec + 1;
+	ring->pipe = pipe;
+	ring->queue = queue;
+
+	ring->ring_obj = NULL;
+	ring->use_doorbell = true;
+	ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
+	sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
+
+	irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
+		+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
+		+ ring->pipe;
+
+	/* type-2 packets are deprecated on MEC, use type-3 instead */
+	r = amdgpu_ring_init(adev, ring, 1024,
+			&adev->gfx.eop_irq, irq_type);
+	if (r)
+		return r;
+
+	return 0;
+}
+
 static int gfx_v8_0_sw_init(void *handle)
 {
-	int i, r, ring_id;
+	int i, j, k, r, ring_id;
 	struct amdgpu_ring *ring;
 	struct amdgpu_kiq *kiq;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	/* KIQ event */
 	r = amdgpu_irq_add_id(adev, 178, &adev->gfx.kiq.irq);
 	if (r)
 		return r;
 
 	/* EOP Event */
@@ -2192,56 +2225,38 @@ static int gfx_v8_0_sw_init(void *handle)
 			ring->use_doorbell = true;
 			ring->doorbell_index = AMDGPU_DOORBELL_GFX_RING0;
 		}
 
 		r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
 				     AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
-	/* set up the compute queues */
-	for (i = 0, ring_id = 0; i < AMDGPU_MAX_QUEUES; i++) {
-		unsigned irq_type;
-
-		if (!test_bit(i, adev->gfx.mec.queue_bitmap))
-			continue;
-
-		if (WARN_ON(ring_id >= AMDGPU_MAX_COMPUTE_RINGS))
-			break;
-
-		ring = &adev->gfx.compute_ring[ring_id];
-
-		/* mec0 is me1 */
-		ring->me = ((i / adev->gfx.mec.num_queue_per_pipe)
-				/ adev->gfx.mec.num_pipe_per_mec)
-				+ 1;
-		ring->pipe = (i / adev->gfx.mec.num_queue_per_pipe)
-				% adev->gfx.mec.num_pipe_per_mec;
-		ring->queue = i % adev->gfx.mec.num_queue_per_pipe;
-
-		ring->ring_obj = NULL;
-		ring->use_doorbell = true;
-		ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
-		sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
+	/* set up the compute queues - allocate horizontally across pipes */
+	ring_id = 0;
+	for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
+		for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
+			for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
 
-		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
-			+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
-			+ ring->pipe;
+				if (!amdgpu_is_mec_queue_enabled(adev, i, k, j))
+					continue;
 
-		/* type-2 packets are deprecated on MEC, use type-3 instead */
-		r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
-				     irq_type);
-		if (r)
-			return r;
+				r = gfx_v8_0_compute_ring_init(adev,
+								ring_id,
+								i, k, j);
+				if (r)
+					return r;
 
-		ring_id++;
+				ring_id++;
+			}
+		}
 	}
 
 	/* reserve GDS, GWS and OA resource for gfx */
 	r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
 				    PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
 				    &adev->gds.gds_gfx_bo, NULL, NULL);
 	if (r)
 		return r;
 
 	r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 14/22] drm/amdgpu: new queue policy, take first 2 queues of each pipe
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 13/22] drm/amdgpu: allocate queues horizontally across pipes Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 15/22] drm/amdgpu: add hw_ip member to amdgpu_ring Andres Rodriguez
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Instead of taking the first pipe and giving the rest to kfd, take the
first 2 queues of each pipe.

Effectively, amdgpu and amdkfd own the same number of queues. But
because the queues are now spread across multiple pipes, the hardware
can handle concurrent compute workloads better.

amdgpu goes from 1 pipe to 4 pipes, i.e. from 1 compute thread to 4.
amdkfd goes from 3 pipes to 4 pipes, i.e. from 3 compute threads to 4.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
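
To make the resulting split concrete, here is a minimal standalone
sketch of the same index math (the topology constants are assumptions
for illustration, and the decomposition mirrors the
gfx_v7_0/gfx_v8_0_compute_queue_acquire() hunks below):

#include <stdio.h>

#define NUM_MEC            2
#define NUM_PIPE_PER_MEC   4
#define NUM_QUEUE_PER_PIPE 8

int main(void)
{
	int i, mec, pipe, queue, owned = 0;
	int total = NUM_MEC * NUM_PIPE_PER_MEC * NUM_QUEUE_PER_PIPE;

	for (i = 0; i < total; i++) {
		/* same decomposition as the driver: queue varies fastest */
		queue = i % NUM_QUEUE_PER_PIPE;
		pipe = (i / NUM_QUEUE_PER_PIPE) % NUM_PIPE_PER_MEC;
		mec = (i / NUM_QUEUE_PER_PIPE) / NUM_PIPE_PER_MEC;

		/* new policy: first two queues of each pipe in MEC0 */
		if (mec == 0 && queue < 2) {
			printf("amdgpu owns mec %d pipe %d queue %d\n",
			       mec, pipe, queue);
			owned++;
		}
	}

	/* 4 pipes x 2 queues = 8 rings for amdgpu; the rest go to amdkfd */
	printf("amdgpu: %d queues, amdkfd: %d queues\n", owned, total - owned);
	return 0;
}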

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 3ca5519..b0b0c89 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2811,21 +2811,21 @@ static void gfx_v7_0_compute_queue_acquire(struct amdgpu_device *adev)
 		pipe = (i / adev->gfx.mec.num_queue_per_pipe)
 			% adev->gfx.mec.num_pipe_per_mec;
 		mec = (i / adev->gfx.mec.num_queue_per_pipe)
 			/ adev->gfx.mec.num_pipe_per_mec;
 
 		/* we've run out of HW */
 		if (mec > adev->gfx.mec.num_mec)
 			break;
 
-		/* policy: amdgpu owns all queues in the first pipe */
-		if (mec == 0 && pipe == 0)
+		/* policy: amdgpu owns the first two queues of the first MEC */
+		if (mec == 0 && queue < 2)
 			set_bit(i, adev->gfx.mec.queue_bitmap);
 	}
 
 	/* update the number of active compute rings */
 	adev->gfx.num_compute_rings =
 		bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_QUEUES);
 
 	/* If you hit this case and edited the policy, you probably just
 	 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
 	WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index edddd86..5db5bac 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1429,21 +1429,21 @@ static void gfx_v8_0_compute_queue_acquire(struct amdgpu_device *adev)
 		pipe = (i / adev->gfx.mec.num_queue_per_pipe)
 			% adev->gfx.mec.num_pipe_per_mec;
 		mec = (i / adev->gfx.mec.num_queue_per_pipe)
 			/ adev->gfx.mec.num_pipe_per_mec;
 
 		/* we've run out of HW */
 		if (mec > adev->gfx.mec.num_mec)
 			break;
 
-		/* policy: amdgpu owns all queues in the first pipe */
-		if (mec == 0 && pipe == 0)
+		/* policy: amdgpu owns the first two queues of the first MEC */
+		if (mec == 0 && queue < 2)
 			set_bit(i, adev->gfx.mec.queue_bitmap);
 	}
 
 	/* update the number of active compute rings */
 	adev->gfx.num_compute_rings =
 		bitmap_weight(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_QUEUES);
 
 	/* If you hit this case and edited the policy, you probably just
 	 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
 	if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 15/22] drm/amdgpu: add hw_ip member to amdgpu_ring
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 14/22] drm/amdgpu: new queue policy, take first 2 queues of each pipe Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
       [not found]     ` <1488320089-22035-16-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-02-28 22:14   ` [PATCH 16/22] drm/amdgpu: add a mechanism to untie user ring ids from kernel ring ids Andres Rodriguez
                     ` (8 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Keep track of a ring's HW IP block so we can identify it later.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +++--
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c    | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c    | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c    | 8 ++++----
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c      | 2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c    | 3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c    | 3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    | 3 ++-
 drivers/gpu/drm/amd/amdgpu/vce_v2_0.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c    | 3 ++-
 include/uapi/drm/amdgpu_drm.h            | 3 ++-
 15 files changed, 28 insertions(+), 21 deletions(-)
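
As a rough illustration of what the new member enables, the sketch
below filters rings by their recorded IP block. The struct is a
trimmed stand-in for amdgpu_ring (not the real layout) and the
AMDGPU_HW_IP_* values match the uapi header:

#include <stdint.h>
#include <stdio.h>

#define AMDGPU_HW_IP_GFX     0
#define AMDGPU_HW_IP_COMPUTE 1

/* trimmed stand-in: only the fields this sketch needs */
struct ring {
	uint32_t hw_ip;
	char name[16];
};

int main(void)
{
	struct ring rings[] = {
		{ AMDGPU_HW_IP_GFX,     "gfx"        },
		{ AMDGPU_HW_IP_COMPUTE, "comp_1.0.0" },
		{ AMDGPU_HW_IP_COMPUTE, "comp_1.1.0" },
	};
	int i;

	/* with hw_ip recorded at init time, a consumer can select rings
	 * by IP block without knowing the per-IP array layout */
	for (i = 0; i < 3; i++)
		if (rings[i].hw_ip == AMDGPU_HW_IP_COMPUTE)
			printf("compute ring: %s\n", rings[i].name);

	return 0;
}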

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 7c842b7..4ff762c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -157,21 +157,21 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
  *
  * @adev: amdgpu_device pointer
  * @ring: amdgpu_ring structure holding ring information
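+ * @hw_ip: HW IP block this ring belongs to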
  * @max_dw: maximum number of dw for ring alloc
  * @nop: nop packet for this ring
  *
  * Initialize the driver information for the selected ring (all asics).
  * Returns 0 on success, error on failure.
  */
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
-		     unsigned max_dw, struct amdgpu_irq_src *irq_src,
+		     int hw_ip, unsigned max_dw, struct amdgpu_irq_src *irq_src,
 		     unsigned irq_type)
 {
 	int r;
 
 	if (ring->adev == NULL) {
 		if (adev->num_rings >= AMDGPU_MAX_RINGS)
 			return -EINVAL;
 
 		ring->adev = adev;
 		ring->idx = adev->num_rings++;
@@ -227,20 +227,21 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 					    &ring->gpu_addr,
 					    (void **)&ring->ring);
 		if (r) {
 			dev_err(adev->dev, "(%d) ring create failed\n", r);
 			return r;
 		}
 		memset((void *)ring->ring, 0, ring->ring_size);
 	}
 	ring->ptr_mask = (ring->ring_size / 4) - 1;
 	ring->max_dw = max_dw;
+	ring->hw_ip = hw_ip;
 
 	if (amdgpu_debugfs_ring_init(adev, ring)) {
 		DRM_ERROR("Failed to register debugfs file for rings !\n");
 	}
 	return 0;
 }
 
 /**
  * amdgpu_ring_fini - tear down the driver ring struct.
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 2345b398..3ff021f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -150,20 +150,21 @@ struct amdgpu_ring {
 	unsigned		rptr_offs;
 	unsigned		wptr;
 	unsigned		wptr_old;
 	unsigned		ring_size;
 	unsigned		max_dw;
 	int			count_dw;
 	uint64_t		gpu_addr;
 	uint32_t		ptr_mask;
 	bool			ready;
 	u32			idx;
+	u32			hw_ip;
 	u32			me;
 	u32			pipe;
 	u32			queue;
 	struct amdgpu_bo	*mqd_obj;
 	u32			doorbell_index;
 	bool			use_doorbell;
 	unsigned		wptr_offs;
 	unsigned		fence_offs;
 	uint64_t		current_ctx;
 	char			name[16];
@@ -174,15 +175,15 @@ struct amdgpu_ring {
 	struct dentry *ent;
 #endif
 };
 
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
 void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
 void amdgpu_ring_commit(struct amdgpu_ring *ring);
 void amdgpu_ring_undo(struct amdgpu_ring *ring);
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
-		     unsigned ring_size, struct amdgpu_irq_src *irq_src,
-		     unsigned irq_type);
+		     int hw_ip, unsigned ring_size,
+		     struct amdgpu_irq_src *irq_src, unsigned irq_type);
 void amdgpu_ring_fini(struct amdgpu_ring *ring);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 810bba5..64b6cb7 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -935,21 +935,21 @@ static int cik_sdma_sw_init(void *handle)
 
 	/* SDMA Privileged inst */
 	r = amdgpu_irq_add_id(adev, 247, &adev->sdma.illegal_inst_irq);
 	if (r)
 		return r;
 
 	for (i = 0; i < adev->sdma.num_instances; i++) {
 		ring = &adev->sdma.instance[i].ring;
 		ring->ring_obj = NULL;
 		sprintf(ring->name, "sdma%d", i);
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
 				     &adev->sdma.trap_irq,
 				     (i == 0) ?
 				     AMDGPU_SDMA_IRQ_TRAP0 :
 				     AMDGPU_SDMA_IRQ_TRAP1);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 2086e7e..09ed842 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -3261,21 +3261,21 @@ static int gfx_v6_0_sw_init(void *handle)
 	r = gfx_v6_0_rlc_init(adev);
 	if (r) {
 		DRM_ERROR("Failed to init rlc BOs!\n");
 		return r;
 	}
 
 	for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
 		ring = &adev->gfx.gfx_ring[i];
 		ring->ring_obj = NULL;
 		sprintf(ring->name, "gfx");
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_GFX, 1024,
 				     &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		unsigned irq_type;
 
 		if ((i >= 32) || (i >= AMDGPU_MAX_COMPUTE_RINGS)) {
 			DRM_ERROR("Too many (%d) compute rings!\n", i);
@@ -3283,21 +3283,21 @@ static int gfx_v6_0_sw_init(void *handle)
 		}
 		ring = &adev->gfx.compute_ring[i];
 		ring->ring_obj = NULL;
 		ring->use_doorbell = false;
 		ring->doorbell_index = 0;
 		ring->me = 1;
 		ring->pipe = i;
 		ring->queue = i;
 		sprintf(ring->name, "comp %d.%d.%d", ring->me, ring->pipe, ring->queue);
 		irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP + ring->pipe;
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_COMPUTE, 1024,
 				     &adev->gfx.eop_irq, irq_type);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
 
 static int gfx_v6_0_sw_fini(void *handle)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index b0b0c89..c76dcc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -4742,21 +4742,21 @@ static int gfx_v7_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
 	ring->ring_obj = NULL;
 	ring->use_doorbell = true;
 	ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
 	sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
 
 	irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
 		+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
 		+ ring->pipe;
 
 	/* type-2 packets are deprecated on MEC, use type-3 instead */
-	r = amdgpu_ring_init(adev, ring, 1024,
+	r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_COMPUTE, 1024,
 			&adev->gfx.eop_irq, irq_type);
 	if (r)
 		return r;
 
 	return 0;
 }
 
 static int gfx_v7_0_sw_init(void *handle)
 {
@@ -4797,21 +4797,21 @@ static int gfx_v7_0_sw_init(void *handle)
 	r = gfx_v7_0_mec_init(adev);
 	if (r) {
 		DRM_ERROR("Failed to init MEC BOs!\n");
 		return r;
 	}
 
 	for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
 		ring = &adev->gfx.gfx_ring[i];
 		ring->ring_obj = NULL;
 		sprintf(ring->name, "gfx");
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_GFX, 1024,
 				     &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
 	/* set up the compute queues - allocate horizontally across pipes */
 	ring_id = 0;
 	for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
 		for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
 			for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 5db5bac..a778d58 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1392,21 +1392,21 @@ static int gfx_v8_0_kiq_init_ring(struct amdgpu_device *adev,
 		ring->me = 2;
 		ring->pipe = 0;
 	} else {
 		ring->me = 1;
 		ring->pipe = 1;
 	}
 
 	irq->data = ring;
 	ring->queue = 0;
 	sprintf(ring->name, "kiq %d.%d.%d", ring->me, ring->pipe, ring->queue);
-	r = amdgpu_ring_init(adev, ring, 1024,
+	r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_KIQ, 1024,
 			     irq, AMDGPU_CP_KIQ_IRQ_DRIVER0);
 	if (r)
 		dev_warn(adev->dev, "(%d) failed to init kiq ring\n", r);
 
 	return r;
 }
 
 static void gfx_v8_0_kiq_free_ring(struct amdgpu_ring *ring,
 				   struct amdgpu_irq_src *irq)
 {
@@ -2139,21 +2139,21 @@ static int gfx_v8_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
 	ring->ring_obj = NULL;
 	ring->use_doorbell = true;
 	ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
 	sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
 
 	irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
 		+ ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
 		+ ring->pipe;
 
 	/* type-2 packets are deprecated on MEC, use type-3 instead */
-	r = amdgpu_ring_init(adev, ring, 1024,
+	r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_COMPUTE, 1024,
 			&adev->gfx.eop_irq, irq_type);
 	if (r)
 		return r;
 
 	return 0;
 }
 
 static int gfx_v8_0_sw_init(void *handle)
 {
@@ -2219,22 +2219,22 @@ static int gfx_v8_0_sw_init(void *handle)
 	for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
 		ring = &adev->gfx.gfx_ring[i];
 		ring->ring_obj = NULL;
 		sprintf(ring->name, "gfx");
 		/* no gfx doorbells on iceland */
 		if (adev->asic_type != CHIP_TOPAZ) {
 			ring->use_doorbell = true;
 			ring->doorbell_index = AMDGPU_DOORBELL_GFX_RING0;
 		}
 
-		r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
-				     AMDGPU_CP_IRQ_GFX_EOP);
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_GFX, 1024,
+				     &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
 		if (r)
 			return r;
 	}
 
 	/* set up the compute queues - allocate horizontally across pipes */
 	ring_id = 0;
 	for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
 		for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
 			for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
 
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 896be64..62c3461 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -941,21 +941,21 @@ static int sdma_v2_4_sw_init(void *handle)
 	if (r) {
 		DRM_ERROR("Failed to load sdma firmware!\n");
 		return r;
 	}
 
 	for (i = 0; i < adev->sdma.num_instances; i++) {
 		ring = &adev->sdma.instance[i].ring;
 		ring->ring_obj = NULL;
 		ring->use_doorbell = false;
 		sprintf(ring->name, "sdma%d", i);
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
 				     &adev->sdma.trap_irq,
 				     (i == 0) ?
 				     AMDGPU_SDMA_IRQ_TRAP0 :
 				     AMDGPU_SDMA_IRQ_TRAP1);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 31375bd..7467a1e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -1159,21 +1159,21 @@ static int sdma_v3_0_sw_init(void *handle)
 	}
 
 	for (i = 0; i < adev->sdma.num_instances; i++) {
 		ring = &adev->sdma.instance[i].ring;
 		ring->ring_obj = NULL;
 		ring->use_doorbell = true;
 		ring->doorbell_index = (i == 0) ?
 			AMDGPU_DOORBELL_sDMA_ENGINE0 : AMDGPU_DOORBELL_sDMA_ENGINE1;
 
 		sprintf(ring->name, "sdma%d", i);
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
 				     &adev->sdma.trap_irq,
 				     (i == 0) ?
 				     AMDGPU_SDMA_IRQ_TRAP0 :
 				     AMDGPU_SDMA_IRQ_TRAP1);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
index 3372a07..64d22d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
+++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
@@ -523,21 +523,21 @@ static int si_dma_sw_init(void *handle)
 	/* DMA1 trap event */
 	r = amdgpu_irq_add_id(adev, 244, &adev->sdma.trap_irq_1);
 	if (r)
 		return r;
 
 	for (i = 0; i < adev->sdma.num_instances; i++) {
 		ring = &adev->sdma.instance[i].ring;
 		ring->ring_obj = NULL;
 		ring->use_doorbell = false;
 		sprintf(ring->name, "sdma%d", i);
-		r = amdgpu_ring_init(adev, ring, 1024,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
 				     &adev->sdma.trap_irq,
 				     (i == 0) ?
 				     AMDGPU_SDMA_IRQ_TRAP0 :
 				     AMDGPU_SDMA_IRQ_TRAP1);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
index b34cefc..9df30ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
@@ -114,21 +114,22 @@ static int uvd_v4_2_sw_init(void *handle)
 	r = amdgpu_uvd_sw_init(adev);
 	if (r)
 		return r;
 
 	r = amdgpu_uvd_resume(adev);
 	if (r)
 		return r;
 
 	ring = &adev->uvd.ring;
 	sprintf(ring->name, "uvd");
-	r = amdgpu_ring_init(adev, ring, 512, &adev->uvd.irq, 0);
+	r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_UVD, 512,
+			     &adev->uvd.irq, 0);
 
 	return r;
 }
 
 static int uvd_v4_2_sw_fini(void *handle)
 {
 	int r;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	r = amdgpu_uvd_suspend(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
index ad8c02e..9b4017f 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
@@ -110,21 +110,22 @@ static int uvd_v5_0_sw_init(void *handle)
 	r = amdgpu_uvd_sw_init(adev);
 	if (r)
 		return r;
 
 	r = amdgpu_uvd_resume(adev);
 	if (r)
 		return r;
 
 	ring = &adev->uvd.ring;
 	sprintf(ring->name, "uvd");
-	r = amdgpu_ring_init(adev, ring, 512, &adev->uvd.irq, 0);
+	r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_UVD, 512,
+			     &adev->uvd.irq, 0);
 
 	return r;
 }
 
 static int uvd_v5_0_sw_fini(void *handle)
 {
 	int r;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	r = amdgpu_uvd_suspend(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 18a6de4..de9cce1 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -113,21 +113,22 @@ static int uvd_v6_0_sw_init(void *handle)
 	r = amdgpu_uvd_sw_init(adev);
 	if (r)
 		return r;
 
 	r = amdgpu_uvd_resume(adev);
 	if (r)
 		return r;
 
 	ring = &adev->uvd.ring;
 	sprintf(ring->name, "uvd");
-	r = amdgpu_ring_init(adev, ring, 512, &adev->uvd.irq, 0);
+	r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_UVD, 512,
+			     &adev->uvd.irq, 0);
 
 	return r;
 }
 
 static int uvd_v6_0_sw_fini(void *handle)
 {
 	int r;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
 	r = amdgpu_uvd_suspend(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
index 9ea9934..38cd52b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
@@ -438,21 +438,21 @@ static int vce_v2_0_sw_init(void *handle)
 	if (r)
 		return r;
 
 	r = amdgpu_vce_resume(adev);
 	if (r)
 		return r;
 
 	for (i = 0; i < adev->vce.num_rings; i++) {
 		ring = &adev->vce.ring[i];
 		sprintf(ring->name, "vce%d", i);
-		r = amdgpu_ring_init(adev, ring, 512,
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_VCE, 512,
 				     &adev->vce.irq, 0);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
 
 static int vce_v2_0_sw_fini(void *handle)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 93ec881..09d04e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -396,21 +396,22 @@ static int vce_v3_0_sw_init(void *handle)
 	if (adev->vce.fw_version < FW_52_8_3)
 		adev->vce.num_rings = 2;
 
 	r = amdgpu_vce_resume(adev);
 	if (r)
 		return r;
 
 	for (i = 0; i < adev->vce.num_rings; i++) {
 		ring = &adev->vce.ring[i];
 		sprintf(ring->name, "vce%d", i);
-		r = amdgpu_ring_init(adev, ring, 512, &adev->vce.irq, 0);
+		r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_VCE, 512,
+				     &adev->vce.irq, 0);
 		if (r)
 			return r;
 	}
 
 	return r;
 }
 
 static int vce_v3_0_sw_fini(void *handle)
 {
 	int r;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 5797283..b5ae774 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -376,21 +376,22 @@ struct drm_amdgpu_gem_va {
 	__u64 offset_in_bo;
 	/** Specify mapping size. Must be correctly aligned. */
 	__u64 map_size;
 };
 
 #define AMDGPU_HW_IP_GFX          0
 #define AMDGPU_HW_IP_COMPUTE      1
 #define AMDGPU_HW_IP_DMA          2
 #define AMDGPU_HW_IP_UVD          3
 #define AMDGPU_HW_IP_VCE          4
-#define AMDGPU_HW_IP_NUM          5
+#define AMDGPU_HW_IP_KIQ          5
+#define AMDGPU_HW_IP_NUM          6
 
 #define AMDGPU_HW_IP_INSTANCE_MAX_COUNT 1
 
 #define AMDGPU_CHUNK_ID_IB		0x01
 #define AMDGPU_CHUNK_ID_FENCE		0x02
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
 	__u32		length_dw;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 16/22] drm/amdgpu: add a mechanism to untie user ring ids from kernel ring ids
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 15/22] drm/amdgpu: add hw_ip member to amdgpu_ring Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 17/22] drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute Andres Rodriguez
                     ` (7 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Add amdgpu_queue_mgr, a mechanism that decouples usermode ring ids
from the kernel's ring ids.

The queue manager maintains a per-file-descriptor map of user ring ids
to amdgpu_ring pointers. Once a mapping is created it is permanent (this
is required to maintain FIFO execution guarantees for a ring).

Different queue map policies can be configured for each HW IP.
Currently all HW IPs use the identity mapper, i.e. kernel ring id is
equal to the user ring id.

The purpose of this mechanism is to distribute the load across multiple
queues more effectively for HW IPs that support multiple rings.
Userspace clients are unable to check whether a specific resource is in
use by a different client. Therefore, it is up to the kernel driver to
make the optimal choice.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile           |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  31 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  70 ++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 157 ++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.h |  75 ++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c      |  45 ++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   2 +
 8 files changed, 333 insertions(+), 52 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.h
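
The semantics being added can be summarized with a small userspace
model -- a toy stand-in for amdgpu_queue_mgr_map(), not the driver
code itself -- showing that the first lookup establishes a permanent
user -> kernel mapping and later lookups hit the cache:

#include <stdio.h>

#define MAX_RINGS 8

struct toy_mapper {
	int map[MAX_RINGS];	/* cached user ring -> kernel ring */
	int mapped[MAX_RINGS];	/* has this slot been assigned yet? */
};

static int toy_map(struct toy_mapper *m, int user_ring)
{
	if (user_ring < 0 || user_ring >= MAX_RINGS)
		return -1;

	if (!m->mapped[user_ring]) {
		/* identity policy: kernel ring id == user ring id */
		m->map[user_ring] = user_ring;
		m->mapped[user_ring] = 1;
	}

	/* the mapping is permanent, preserving per-ring FIFO order */
	return m->map[user_ring];
}

int main(void)
{
	struct toy_mapper m = { {0}, {0} };

	printf("first submit on user ring 2 -> kernel ring %d\n",
	       toy_map(&m, 2));
	printf("later submit on user ring 2 -> kernel ring %d (cached)\n",
	       toy_map(&m, 2));
	return 0;
}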

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2814aad..0081d0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -17,21 +17,21 @@ amdgpu-y := amdgpu_drv.o
 amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	amdgpu_atombios.o atombios_crtc.o amdgpu_connectors.o \
 	atom.o amdgpu_fence.o amdgpu_ttm.o amdgpu_object.o amdgpu_gart.o \
 	amdgpu_encoders.o amdgpu_display.o amdgpu_i2c.o \
 	amdgpu_fb.o amdgpu_gem.o amdgpu_ring.o \
 	amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o amdgpu_test.o \
 	amdgpu_pm.o atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
 	atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
 	amdgpu_prime.o amdgpu_vm.o amdgpu_ib.o amdgpu_pll.o \
 	amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
-	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o
+	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_queue_mgr.o
 
 # add asic specific block
 amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o kv_smc.o kv_dpm.o \
 	ci_smc.o ci_dpm.o dce_v8_0.o gfx_v7_0.o cik_sdma.o uvd_v4_2.o vce_v2_0.o \
 	amdgpu_amdkfd_gfx_v7.o
 
 amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o si_ih.o si_dma.o dce_v6_0.o si_dpm.o si_smc.o
 
 amdgpu-y += \
 	vi.o mxgpu_vi.o
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 88dad81..67b33aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -689,28 +689,54 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
 				   struct amdgpu_ring *ring, uint64_t seq);
 
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *filp);
 
 void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
 void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
 
 /*
+ * Queue manager related structures
+ */
+struct amdgpu_queue_mapper;
+
+struct amdgpu_queue_mapper_funcs {
+	/* map a userspace ring id to a kernel ring id */
+	int (*map)(struct amdgpu_device *adev,
+		   struct amdgpu_queue_mapper *mapper,
+		   int ring,
+		   struct amdgpu_ring **out_ring);
+};
+
+struct amdgpu_queue_mapper {
+	struct amdgpu_queue_mapper_funcs *funcs;
+	int		hw_ip;
+	struct mutex	lock;
+	/* protected by lock */
+	struct amdgpu_ring *queue_map[AMDGPU_MAX_RINGS];
+};
+
+struct amdgpu_queue_mgr {
+	struct amdgpu_queue_mapper mapper[AMDGPU_MAX_IP_NUM];
+};
+
+/*
  * file private structure
  */
 
 struct amdgpu_fpriv {
 	struct amdgpu_vm	vm;
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	struct amdgpu_queue_mgr queue_mgr;
 };
 
 /*
  * residency list
  */
 
 struct amdgpu_bo_list {
 	struct mutex lock;
 	struct amdgpu_bo *gds_obj;
 	struct amdgpu_bo *gws_obj;
@@ -1720,22 +1746,23 @@ static inline bool amdgpu_is_mec_queue_enabled(struct amdgpu_device *adev,
 #define amdgpu_gds_switch(adev, r, v, d, w, a) (adev)->gds.funcs->patch_gds_switch((r), (v), (d), (w), (a))
 
 /* Common functions */
 int amdgpu_gpu_reset(struct amdgpu_device *adev);
 bool amdgpu_need_backup(struct amdgpu_device *adev);
 void amdgpu_pci_config_reset(struct amdgpu_device *adev);
 bool amdgpu_card_posted(struct amdgpu_device *adev);
 void amdgpu_update_display_priority(struct amdgpu_device *adev);
 
 int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data);
-int amdgpu_cs_get_ring(struct amdgpu_device *adev, u32 ip_type,
-		       u32 ip_instance, u32 ring,
+int amdgpu_cs_get_ring(struct amdgpu_device *adev,
+		       struct amdgpu_queue_mgr *mgr,
+		       u32 ip_type, u32 ip_instance, u32 user_ring,
 		       struct amdgpu_ring **out_ring);
 void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes);
 void amdgpu_ttm_placement_from_domain(struct amdgpu_bo *abo, u32 domain);
 bool amdgpu_ttm_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
 int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages);
 int amdgpu_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
 				     uint32_t flags);
 bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
 struct mm_struct *amdgpu_ttm_tt_get_usermm(struct ttm_tt *ttm);
 bool amdgpu_ttm_tt_affect_userptr(struct ttm_tt *ttm, unsigned long start,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 57301f5..605d40e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -22,74 +22,42 @@
  * DEALINGS IN THE SOFTWARE.
  *
  * Authors:
  *    Jerome Glisse <glisse@freedesktop.org>
  */
 #include <linux/pagemap.h>
 #include <drm/drmP.h>
 #include <drm/amdgpu_drm.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_queue_mgr.h"
 
-int amdgpu_cs_get_ring(struct amdgpu_device *adev, u32 ip_type,
-		       u32 ip_instance, u32 ring,
+int amdgpu_cs_get_ring(struct amdgpu_device *adev,
+		       struct amdgpu_queue_mgr *mgr,
+		       u32 ip_type, u32 ip_instance, u32 user_ring,
 		       struct amdgpu_ring **out_ring)
 {
+	int r;
+
 	/* Right now all IPs have only one instance - multiple rings. */
 	if (ip_instance != 0) {
 		DRM_ERROR("invalid ip instance: %d\n", ip_instance);
 		return -EINVAL;
 	}
 
-	switch (ip_type) {
-	default:
-		DRM_ERROR("unknown ip type: %d\n", ip_type);
-		return -EINVAL;
-	case AMDGPU_HW_IP_GFX:
-		if (ring < adev->gfx.num_gfx_rings) {
-			*out_ring = &adev->gfx.gfx_ring[ring];
-		} else {
-			DRM_ERROR("only %d gfx rings are supported now\n",
-				  adev->gfx.num_gfx_rings);
-			return -EINVAL;
-		}
-		break;
-	case AMDGPU_HW_IP_COMPUTE:
-		if (ring < adev->gfx.num_compute_rings) {
-			*out_ring = &adev->gfx.compute_ring[ring];
-		} else {
-			DRM_ERROR("only %d compute rings are supported now\n",
-				  adev->gfx.num_compute_rings);
-			return -EINVAL;
-		}
-		break;
-	case AMDGPU_HW_IP_DMA:
-		if (ring < adev->sdma.num_instances) {
-			*out_ring = &adev->sdma.instance[ring].ring;
-		} else {
-			DRM_ERROR("only %d SDMA rings are supported\n",
-				  adev->sdma.num_instances);
-			return -EINVAL;
-		}
-		break;
-	case AMDGPU_HW_IP_UVD:
-		*out_ring = &adev->uvd.ring;
-		break;
-	case AMDGPU_HW_IP_VCE:
-		if (ring < adev->vce.num_rings){
-			*out_ring = &adev->vce.ring[ring];
-		} else {
-			DRM_ERROR("only %d VCE rings are supported\n", adev->vce.num_rings);
-			return -EINVAL;
-		}
-		break;
+	r = amdgpu_queue_mgr_map(adev, mgr, ip_type, user_ring, out_ring);
+	if (r) {
+		DRM_ERROR("unable to map userspace ip:%d ring:%d to kernel ring\n",
+				ip_type, user_ring);
+		return r;
 	}
+
 	return 0;
 }
 
 static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,
 				      struct drm_amdgpu_cs_chunk_fence *data,
 				      uint32_t *offset)
 {
 	struct drm_gem_object *gobj;
 	unsigned long size;
 
@@ -868,21 +836,21 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
 		struct drm_amdgpu_cs_chunk_ib *chunk_ib;
 		struct amdgpu_ring *ring;
 
 		chunk = &parser->chunks[i];
 		ib = &parser->job->ibs[j];
 		chunk_ib = (struct drm_amdgpu_cs_chunk_ib *)chunk->kdata;
 
 		if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB)
 			continue;
 
-		r = amdgpu_cs_get_ring(adev, chunk_ib->ip_type,
+		r = amdgpu_cs_get_ring(adev, &fpriv->queue_mgr, chunk_ib->ip_type,
 				       chunk_ib->ip_instance, chunk_ib->ring,
 				       &ring);
 		if (r)
 			return r;
 
 		if (ib->flags & AMDGPU_IB_FLAG_PREAMBLE) {
 			parser->job->preamble_status |= AMDGPU_PREAMBLE_IB_PRESENT;
 			if (!parser->ctx->preamble_presented) {
 				parser->job->preamble_status |= AMDGPU_PREAMBLE_IB_PRESENT_FIRST;
 				parser->ctx->preamble_presented = true;
@@ -972,21 +940,22 @@ static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
 
 		deps = (struct drm_amdgpu_cs_chunk_dep *)chunk->kdata;
 		num_deps = chunk->length_dw * 4 /
 			sizeof(struct drm_amdgpu_cs_chunk_dep);
 
 		for (j = 0; j < num_deps; ++j) {
 			struct amdgpu_ring *ring;
 			struct amdgpu_ctx *ctx;
 			struct dma_fence *fence;
 
-			r = amdgpu_cs_get_ring(adev, deps[j].ip_type,
+			r = amdgpu_cs_get_ring(adev, &fpriv->queue_mgr,
+					       deps[j].ip_type,
 					       deps[j].ip_instance,
 					       deps[j].ring, &ring);
 			if (r)
 				return r;
 
 			ctx = amdgpu_ctx_get(fpriv, deps[j].ctx_id);
 			if (ctx == NULL)
 				return -EINVAL;
 
 			fence = amdgpu_ctx_get_fence(ctx, ring,
@@ -1099,29 +1068,31 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
  *
  * @dev: drm device
  * @data: data from userspace
  * @filp: file private
  *
  * Wait for the command submission identified by handle to finish.
  */
 int amdgpu_cs_wait_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *filp)
 {
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	union drm_amdgpu_wait_cs *wait = data;
 	struct amdgpu_device *adev = dev->dev_private;
 	unsigned long timeout = amdgpu_gem_timeout(wait->in.timeout);
 	struct amdgpu_ring *ring = NULL;
 	struct amdgpu_ctx *ctx;
 	struct dma_fence *fence;
 	long r;
 
-	r = amdgpu_cs_get_ring(adev, wait->in.ip_type, wait->in.ip_instance,
+	r = amdgpu_cs_get_ring(adev, &fpriv->queue_mgr,
+			       wait->in.ip_type, wait->in.ip_instance,
 			       wait->in.ring, &ring);
 	if (r)
 		return r;
 
 	ctx = amdgpu_ctx_get(filp->driver_priv, wait->in.ctx_id);
 	if (ctx == NULL)
 		return -EINVAL;
 
 	fence = amdgpu_ctx_get_fence(ctx, ring, wait->in.handle);
 	if (IS_ERR(fence))
@@ -1149,24 +1120,25 @@ int amdgpu_cs_wait_ioctl(struct drm_device *dev, void *data,
  * @filp: file private
  * @user: drm_amdgpu_fence copied from user space
  */
 static struct dma_fence *amdgpu_cs_get_fence(struct amdgpu_device *adev,
 					     struct drm_file *filp,
 					     struct drm_amdgpu_fence *user)
 {
 	struct amdgpu_ring *ring;
 	struct amdgpu_ctx *ctx;
 	struct dma_fence *fence;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 	int r;
 
-	r = amdgpu_cs_get_ring(adev, user->ip_type, user->ip_instance,
-			       user->ring, &ring);
+	r = amdgpu_cs_get_ring(adev, &fpriv->queue_mgr, user->ip_type,
+			       user->ip_instance, user->ring, &ring);
 	if (r)
 		return ERR_PTR(r);
 
 	ctx = amdgpu_ctx_get(filp->driver_priv, user->ctx_id);
 	if (ctx == NULL)
 		return ERR_PTR(-EINVAL);
 
 	fence = amdgpu_ctx_get_fence(ctx, ring, user->seq_no);
 	amdgpu_ctx_put(ctx);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 61d94c7..0932ade 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -23,20 +23,21 @@
  *
  * Authors: Dave Airlie
  *          Alex Deucher
  *          Jerome Glisse
  */
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include <drm/amdgpu_drm.h>
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
+#include "amdgpu_queue_mgr.h"
 
 #include <linux/vga_switcheroo.h>
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
 #include "amdgpu_amdkfd.h"
 
 #if defined(CONFIG_VGA_SWITCHEROO)
 bool amdgpu_has_atpx(void);
 #else
 static inline bool amdgpu_has_atpx(void) { return false; }
@@ -658,20 +659,21 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 	if (amdgpu_sriov_vf(adev)) {
 		r = amdgpu_map_static_csa(adev, &fpriv->vm);
 		if (r)
 			goto out_suspend;
 	}
 
 	mutex_init(&fpriv->bo_list_lock);
 	idr_init(&fpriv->bo_list_handles);
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
+	amdgpu_queue_mgr_init(adev, &fpriv->queue_mgr);
 
 	file_priv->driver_priv = fpriv;
 
 out_suspend:
 	pm_runtime_mark_last_busy(dev->dev);
 	pm_runtime_put_autosuspend(dev->dev);
 
 	return r;
 }
 
@@ -687,20 +689,21 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv)
 {
 	struct amdgpu_device *adev = dev->dev_private;
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_bo_list *list;
 	int handle;
 
 	if (!fpriv)
 		return;
 
+	amdgpu_queue_mgr_fini(adev, &fpriv->queue_mgr);
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 
 	amdgpu_uvd_free_handles(adev, file_priv);
 	amdgpu_vce_free_handles(adev, file_priv);
 
 	if (amdgpu_sriov_vf(adev)) {
 		/* TODO: how to handle reserve failure */
 		BUG_ON(amdgpu_bo_reserve(adev->virt.csa_obj, false));
 		amdgpu_vm_bo_rmv(adev, fpriv->vm.csa_bo_va);
 		fpriv->vm.csa_bo_va = NULL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
new file mode 100644
index 0000000..3918bdb
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
@@ -0,0 +1,157 @@
+/*
+ * Copyright 2017 Valve Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Andres Rodriguez
+ */
+
+#include "amdgpu_ring.h"
+#include "amdgpu_queue_mgr.h"
+
+static int amdgpu_queue_mapper_init(struct amdgpu_queue_mapper *mapper,
+				    int hw_ip,
+				    struct amdgpu_queue_mapper_funcs *funcs)
+{
+	if (!mapper || !funcs)
+		return -EINVAL;
+
+	if (hw_ip >= AMDGPU_MAX_IP_NUM)
+		return -EINVAL;
+
+	mapper->hw_ip = hw_ip;
+	mapper->funcs = funcs;
+	mutex_init(&mapper->lock);
+
+	memset(mapper->queue_map, 0, sizeof(mapper->queue_map));
+
+	return 0;
+}
+
+static struct amdgpu_ring *get_cached_map(struct amdgpu_queue_mapper *mapper,
+					  int ring)
+{
+	return mapper->queue_map[ring];
+}
+
+static int update_cached_map(struct amdgpu_queue_mapper *mapper,
+			     int ring, struct amdgpu_ring *pring)
+{
+	if (WARN_ON(mapper->queue_map[ring])) {
+		DRM_ERROR("Un-expected ring re-map\n");
+		return -EINVAL;
+	}
+
+	mapper->queue_map[ring] = pring;
+
+	return 0;
+}
+
+static int amdgpu_identity_map(struct amdgpu_device *adev,
+			       struct amdgpu_queue_mapper *mapper,
+			       int ring,
+			       struct amdgpu_ring **out_ring)
+{
+	switch (mapper->hw_ip) {
+	case AMDGPU_HW_IP_GFX:
+		*out_ring = &adev->gfx.gfx_ring[ring];
+		break;
+	case AMDGPU_HW_IP_COMPUTE:
+		*out_ring = &adev->gfx.compute_ring[ring];
+		break;
+	case AMDGPU_HW_IP_DMA:
+		*out_ring = &adev->sdma.instance[ring].ring;
+		break;
+	case AMDGPU_HW_IP_UVD:
+		*out_ring = &adev->uvd.ring;
+		break;
+	case AMDGPU_HW_IP_VCE:
+		*out_ring = &adev->vce.ring[ring];
+		break;
+	default:
+		*out_ring = NULL;
+		DRM_ERROR("unknown HW IP type: %d\n", mapper->hw_ip);
+		return -EINVAL;
+	}
+
+	return update_cached_map(mapper, ring, *out_ring);
+}
+
+static struct amdgpu_queue_mapper_funcs identity_mapper = {
+	.map = amdgpu_identity_map
+};
+
+int amdgpu_queue_mgr_init(struct amdgpu_device *adev,
+			  struct amdgpu_queue_mgr *mgr)
+{
+	int i;
+
+	if (!adev || !mgr)
+		return -EINVAL;
+
+	memset(mgr, 0, sizeof(*mgr));
+
+	for (i = 0; i < AMDGPU_MAX_IP_NUM; ++i)
+		amdgpu_queue_mapper_init(&mgr->mapper[i], i, &identity_mapper);
+
+	return 0;
+}
+
+int amdgpu_queue_mgr_fini(struct amdgpu_device *adev,
+			  struct amdgpu_queue_mgr *mgr)
+{
+	return 0;
+}
+
+int amdgpu_queue_mgr_map(struct amdgpu_device *adev,
+			 struct amdgpu_queue_mgr *mgr,
+			 int hw_ip, int ring,
+			 struct amdgpu_ring **out_ring)
+{
+	int r;
+	struct amdgpu_queue_mapper *mapper = &mgr->mapper[hw_ip];
+
+	if (!adev || !mgr || !out_ring)
+		return -EINVAL;
+
+	if (hw_ip >= AMDGPU_MAX_IP_NUM)
+		return -EINVAL;
+
+	if (ring >= AMDGPU_MAX_RINGS)
+		return -EINVAL;
+
+	r = amdgpu_ring_is_valid_index(adev, hw_ip, ring);
+	if (r)
+		return r;
+
+	mutex_lock(&mapper->lock);
+
+	*out_ring = get_cached_map(mapper, ring);
+	if (*out_ring) {
+		/* cache hit */
+		r = 0;
+		goto out_unlock;
+	}
+
+	r = mapper->funcs->map(adev, mapper, ring, out_ring);
+
+out_unlock:
+	mutex_unlock(&mapper->lock);
+	return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.h
new file mode 100644
index 0000000..a85bb32
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright 2017 Valve Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ * 	Andres Rodriguez <andresx7@gmail.com>
+ */
+
+#ifndef __AMDGPU_QUEUE_MGR_H__
+#define __AMDGPU_QUEUE_MGR_H__
+
+#include "amdgpu.h"
+
+/**
+ * amdgpu_queue_mgr_init - init an amdgpu_queue_mgr struct
+ *
+ * @adev: amdgpu_device pointer
+ * @mgr: amdgpu_queue_mgr structure holding queue information
+ *
+ * Initialize the selected @mgr (all asics).
+ *
+ * Returns 0 on success, error on failure.
+ */
+int amdgpu_queue_mgr_init(struct amdgpu_device *adev,
+			  struct amdgpu_queue_mgr *mgr);
+
+/**
+ * amdgpu_queue_mgr_fini - de-initialize an amdgpu_queue_mgr struct
+ *
+ * @adev: amdgpu_device pointer
+ * @mgr: amdgpu_queue_mgr structure holding queue information
+ *
+ * De-initialize the selected @mgr (all asics).
+ *
+ * Returns 0 on success, error on failure.
+ */
+int amdgpu_queue_mgr_fini(struct amdgpu_device *adev,
+			  struct amdgpu_queue_mgr *mgr);
+
+/**
+ * amdgpu_queue_mgr_map - Map a userspace ring id to an amdgpu_ring
+ *
+ * @adev: amdgpu_device pointer
+ * @mgr: amdgpu_queue_mgr structure holding queue information
+ * @hw_ip: HW IP enum
+ * @ring: user ring id
+ * @out_ring: pointer to mapped amdgpu_ring
+ *
+ * Map a userspace ring id to an appropriate kernel ring. Different
+ * policies are configurable at a HW IP level.
+ *
+ * Returns 0 on success, error on failure.
+ */
+int amdgpu_queue_mgr_map(struct amdgpu_device *adev,
+			 struct amdgpu_queue_mgr *mgr,
+			 int hw_ip, int ring,
+			 struct amdgpu_ring **out_ring);
+#endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 4ff762c..a04f07d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -45,20 +45,65 @@
  * pointers are equal, the ring is idle.  When the host
  * writes commands to the ring buffer, it increments the
  * wptr.  The GPU then starts fetching commands and executes
  * them until the pointers are equal again.
  */
 static int amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
 				    struct amdgpu_ring *ring);
 static void amdgpu_debugfs_ring_fini(struct amdgpu_ring *ring);
 
 /**
+ * amdgpu_ring_is_valid_index - check if a ring index is valid for a HW IP
+ *
+ * @adev: amdgpu_device pointer
+ * @ip_type: The HW IP to check against
+ * @ring: the ring index
+ *
+ * Check if @ring is a valid index for @ip_type (all asics).
+ * Returns 0 on success, error on failure.
+ */
+int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
+			       int ip_type, int ring)
+{
+	int ip_num_rings;
+
+	switch (ip_type) {
+	case AMDGPU_HW_IP_GFX:
+		ip_num_rings = adev->gfx.num_gfx_rings;
+		break;
+	case AMDGPU_HW_IP_COMPUTE:
+		ip_num_rings = adev->gfx.num_compute_rings;
+		break;
+	case AMDGPU_HW_IP_DMA:
+		ip_num_rings = adev->sdma.num_instances;
+		break;
+	case AMDGPU_HW_IP_UVD:
+		ip_num_rings = 1;
+		break;
+	case AMDGPU_HW_IP_VCE:
+		ip_num_rings = adev->vce.num_rings;
+		break;
+	default:
+		DRM_ERROR("unknown ip type: %d\n", ip_type);
+		return -EINVAL;
+	}
+
+	if (ring >= ip_num_rings) {
+		DRM_ERROR("Ring index:%d exceeds maximum:%d for ip:%d\n",
+				ring, ip_num_rings, ip_type);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
  * amdgpu_ring_alloc - allocate space on the ring buffer
  *
  * @adev: amdgpu_device pointer
  * @ring: amdgpu_ring structure holding ring information
  * @ndw: number of dwords to allocate in the ring buffer
  *
  * Allocate @ndw dwords in the ring buffer (all asics).
  * Returns 0 on success, error on failure.
  */
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 3ff021f..731c422 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -169,20 +169,22 @@ struct amdgpu_ring {
 	uint64_t		current_ctx;
 	char			name[16];
 	unsigned		cond_exe_offs;
 	u64			cond_exe_gpu_addr;
 	volatile u32		*cond_exe_cpu_addr;
 #if defined(CONFIG_DEBUG_FS)
 	struct dentry *ent;
 #endif
 };
 
+int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
+			       int hw_ip, int ring);
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
 void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
 void amdgpu_ring_commit(struct amdgpu_ring *ring);
 void amdgpu_ring_undo(struct amdgpu_ring *ring);
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 		     int hw_ip, unsigned ring_size,
 		     struct amdgpu_irq_src *irq_src, unsigned irq_type);
 void amdgpu_ring_fini(struct amdgpu_ring *ring);
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 17/22] drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (15 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 16/22] drm/amdgpu: add a mechanism to untie user ring ids from kernel ring ids Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4 Andres Rodriguez
                     ` (6 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Use an LRU policy to map usermode rings to HW compute queues.

Most compute clients use one queue, and usually the first queue
available. This results in poor pipe/queue work distribution when
multiple compute apps are running. In most cases pipe 0 queue 0 is
the only queue that gets used.

In order to better distribute work across multiple HW queues, we adopt
a policy to map the usermode ring ids to the LRU HW queue.

This fixes the problem where the large majority of multi-app compute
workloads were sharing the same HW queue, even though 7 other queues
were available.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 32 ++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c      | 57 +++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |  4 ++
 5 files changed, 97 insertions(+), 2 deletions(-)
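
A minimal model of the policy (an array stands in for the kernel's
list_head-based ring_lru_list; the queue count of 8 matches the "7
other queues" above): three clients that each open "user ring 0" now
land on three different HW queues instead of all hitting pipe 0
queue 0.

#include <stdio.h>

#define NUM_QUEUES 8

int main(void)
{
	int lru[NUM_QUEUES], i, j, picked;

	/* lru[0] is least recently used, lru[NUM_QUEUES-1] most recent */
	for (i = 0; i < NUM_QUEUES; i++)
		lru[i] = i;

	for (i = 0; i < 3; i++) {
		picked = lru[0];		/* grab the LRU queue */
		for (j = 0; j < NUM_QUEUES - 1; j++)
			lru[j] = lru[j + 1];	/* shift the rest left */
		lru[NUM_QUEUES - 1] = picked;	/* touch: move to MRU end */
		printf("client %d mapped to HW queue %d\n", i, picked);
	}

	return 0;
}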

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 67b33aa..e30c47e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1510,20 +1510,23 @@ struct amdgpu_device {
 	struct kfd_dev          *kfd;
 
 	struct amdgpu_virt	virt;
 
 	/* link all shadow bo */
 	struct list_head                shadow_list;
 	struct mutex                    shadow_list_lock;
 	/* link all gtt */
 	spinlock_t			gtt_list_lock;
 	struct list_head                gtt_list;
+	/* keep an lru list of rings by HW IP */
+	struct list_head                ring_lru_list;
+	struct mutex                    ring_lru_list_lock;
 
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
 {
 	return container_of(bdev, struct amdgpu_device, mman.bdev);
 }
 
 bool amdgpu_device_is_px(struct drm_device *dev);
 int amdgpu_device_init(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 944ba0d..1fb1303 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1708,20 +1708,23 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	spin_lock_init(&adev->gc_cac_idx_lock);
 	spin_lock_init(&adev->audio_endpt_idx_lock);
 	spin_lock_init(&adev->mm_stats.lock);
 
 	INIT_LIST_HEAD(&adev->shadow_list);
 	mutex_init(&adev->shadow_list_lock);
 
 	INIT_LIST_HEAD(&adev->gtt_list);
 	spin_lock_init(&adev->gtt_list_lock);
 
+	INIT_LIST_HEAD(&adev->ring_lru_list);
+	mutex_init(&adev->ring_lru_list_lock);
+
 	if (adev->asic_type >= CHIP_BONAIRE) {
 		adev->rmmio_base = pci_resource_start(adev->pdev, 5);
 		adev->rmmio_size = pci_resource_len(adev->pdev, 5);
 	} else {
 		adev->rmmio_base = pci_resource_start(adev->pdev, 2);
 		adev->rmmio_size = pci_resource_len(adev->pdev, 2);
 	}
 
 	adev->rmmio = ioremap(adev->rmmio_base, adev->rmmio_size);
 	if (adev->rmmio == NULL) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
index 3918bdb..e4c6ac3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
@@ -90,32 +90,60 @@ static int amdgpu_identity_map(struct amdgpu_device *adev,
 		return -EINVAL;
 	}
 
 	return update_cached_map(mapper, ring, *out_ring);
 }
 
 static struct amdgpu_queue_mapper_funcs identity_mapper = {
 	.map = amdgpu_identity_map
 };
 
+static int amdgpu_lru_map(struct amdgpu_device *adev,
+			  struct amdgpu_queue_mapper *mapper,
+			  int user_ring,
+			  struct amdgpu_ring **out_ring)
+{
+	int r;
+
+	r = amdgpu_ring_lru_get(adev, mapper->hw_ip, out_ring);
+	if (r)
+		return r;
+
+	return update_cached_map(mapper, user_ring, *out_ring);
+}
+
+static struct amdgpu_queue_mapper_funcs lru_mapper = {
+	.map = amdgpu_lru_map
+};
+
 int amdgpu_queue_mgr_init(struct amdgpu_device *adev,
 			  struct amdgpu_queue_mgr *mgr)
 {
 	int i;
 
 	if (!adev || !mgr)
 		return -EINVAL;
 
 	memset(mgr, 0, sizeof(*mgr));
 
-	for (i = 0; i < AMDGPU_MAX_IP_NUM; ++i)
-		amdgpu_queue_mapper_init(&mgr->mapper[i], i, &identity_mapper);
+	for (i = 0; i < AMDGPU_MAX_IP_NUM; ++i) {
+		switch (i) {
+		case AMDGPU_HW_IP_COMPUTE:
+			amdgpu_queue_mapper_init(&mgr->mapper[i], i,
+						 &lru_mapper);
+			break;
+		default:
+			amdgpu_queue_mapper_init(&mgr->mapper[i], i,
+						 &identity_mapper);
+			break;
+		}
+	}
 
 	return 0;
 }
 
 int amdgpu_queue_mgr_fini(struct amdgpu_device *adev,
 			  struct amdgpu_queue_mgr *mgr)
 {
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index a04f07d..80cb051 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -173,20 +173,22 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
 	count = ring->funcs->align_mask + 1 -
 		(ring->wptr & ring->funcs->align_mask);
 	count %= ring->funcs->align_mask + 1;
 	ring->funcs->insert_nop(ring, count);
 
 	mb();
 	amdgpu_ring_set_wptr(ring);
 
 	if (ring->funcs->end_use)
 		ring->funcs->end_use(ring);
+
+	amdgpu_ring_lru_touch(ring->adev, ring);
 }
 
 /**
  * amdgpu_ring_undo - reset the wptr
  *
  * @ring: amdgpu_ring structure holding ring information
  *
  * Reset the driver's copy of the wptr (all asics).
  */
 void amdgpu_ring_undo(struct amdgpu_ring *ring)
@@ -273,20 +275,22 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 					    (void **)&ring->ring);
 		if (r) {
 			dev_err(adev->dev, "(%d) ring create failed\n", r);
 			return r;
 		}
 		memset((void *)ring->ring, 0, ring->ring_size);
 	}
 	ring->ptr_mask = (ring->ring_size / 4) - 1;
 	ring->max_dw = max_dw;
 	ring->hw_ip = hw_ip;
+	INIT_LIST_HEAD(&ring->lru_list);
+	amdgpu_ring_lru_touch(adev, ring);
 
 	if (amdgpu_debugfs_ring_init(adev, ring)) {
 		DRM_ERROR("Failed to register debugfs file for rings !\n");
 	}
 	return 0;
 }
 
 /**
  * amdgpu_ring_fini - tear down the driver ring struct.
  *
@@ -306,20 +310,73 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
 
 	amdgpu_bo_free_kernel(&ring->ring_obj,
 			      &ring->gpu_addr,
 			      (void **)&ring->ring);
 
 	amdgpu_debugfs_ring_fini(ring);
 
 	ring->adev->rings[ring->idx] = NULL;
 }
 
+/**
+ * amdgpu_ring_lru_get - get the least recently used ring for a HW IP block
+ *
+ * @adev: amdgpu_device pointer
+ * @hw_ip: HW IP enum
+ * @ring: output ring
+ *
+ * Retrieve the amdgpu_ring structure for the least recently used ring of
+ * a specific IP block (all asics).
+ * Returns 0 on success, error on failure.
+ */
+int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
+			struct amdgpu_ring **ring)
+{
+	struct amdgpu_ring *entry;
+
+	/* List is sorted in LRU order, find first entry corresponding
+	 * to the desired HW IP */
+	*ring = NULL;
+	mutex_lock(&adev->ring_lru_list_lock);
+	list_for_each_entry(entry, &adev->ring_lru_list, lru_list) {
+		if (entry->hw_ip == hw_ip) {
+			*ring = entry;
+			break;
+		}
+	}
+	mutex_unlock(&adev->ring_lru_list_lock);
+
+	if (!*ring) {
+		DRM_ERROR("Ring LRU contains no entries for hw ip:%d\n", hw_ip);
+		return -EINVAL;
+	}
+
+	amdgpu_ring_lru_touch(adev, entry);
+	return 0;
+}
+
+/**
+ * amdgpu_ring_lru_touch - mark a ring as recently being used
+ *
+ * @adev: amdgpu_device pointer
+ * @ring: ring to touch
+ *
+ * Move @ring to the tail of the LRU list
+ */
+void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct amdgpu_ring *ring)
+{
+	/* list_move_tail handles the case where ring isn't part of the list */
+	mutex_lock(&adev->ring_lru_list_lock);
+	list_move_tail(&ring->lru_list, &adev->ring_lru_list);
+	mutex_unlock(&adev->ring_lru_list_lock);
+}
+
 /*
  * Debugfs info
  */
 #if defined(CONFIG_DEBUG_FS)
 
 /* Layout of file is 12 bytes consisting of
  * - rptr
  * - wptr
  * - driver's copy of wptr
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 731c422..ecdd87c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -137,20 +137,21 @@ struct amdgpu_ring_funcs {
 	void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
 	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
 	void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
 };
 
 struct amdgpu_ring {
 	struct amdgpu_device		*adev;
 	const struct amdgpu_ring_funcs	*funcs;
 	struct amdgpu_fence_driver	fence_drv;
 	struct amd_gpu_scheduler	sched;
+	struct list_head		lru_list;
 
 	struct amdgpu_bo	*ring_obj;
 	volatile uint32_t	*ring;
 	unsigned		rptr_offs;
 	unsigned		wptr;
 	unsigned		wptr_old;
 	unsigned		ring_size;
 	unsigned		max_dw;
 	int			count_dw;
 	uint64_t		gpu_addr;
@@ -180,12 +181,15 @@ int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
 			       int hw_ip, int ring);
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
 void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
 void amdgpu_ring_commit(struct amdgpu_ring *ring);
 void amdgpu_ring_undo(struct amdgpu_ring *ring);
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 		     int hw_ip, unsigned ring_size,
 		     struct amdgpu_irq_src *irq_src, unsigned irq_type);
 void amdgpu_ring_fini(struct amdgpu_ring *ring);
+int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
+			struct amdgpu_ring **ring);
+void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct amdgpu_ring *ring);
 
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (16 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 17/22] drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
       [not found]     ` <1488320089-22035-19-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-02-28 22:14   ` [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings Andres Rodriguez
                     ` (5 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Add a new context creation parameter to express a global context priority.

Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
(default) contexts.

v2: Instead of using flags, repurpose __pad
v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
v4: Validate usermode priority and store it
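
For illustration only, here is roughly how a userspace client could
request an elevated context with this change (error handling omitted;
the priority field is the one introduced by this patch):

    union drm_amdgpu_ctx args;

    memset(&args, 0, sizeof(args));
    args.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
    args.in.priority = AMDGPU_CTX_PRIORITY_HIGH; /* needs CAP_SYS_ADMIN */

    if (drmIoctl(fd, DRM_IOCTL_AMDGPU_CTX, &args) == 0)
        ctx_id = args.out.alloc.ctx_id; /* handle for the new context */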

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41 +++++++++++++++++++++++----
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
 include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
 4 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e30c47e..366f6d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
 	struct amd_sched_entity	entity;
 };
 
 struct amdgpu_ctx {
 	struct kref		refcount;
 	struct amdgpu_device    *adev;
 	unsigned		reset_counter;
 	spinlock_t		ring_lock;
 	struct dma_fence	**fences;
 	struct amdgpu_ctx_ring	rings[AMDGPU_MAX_RINGS];
+	int			priority;
 	bool preamble_presented;
 };
 
 struct amdgpu_ctx_mgr {
 	struct amdgpu_device	*adev;
 	struct mutex		lock;
 	/* protected by lock */
 	struct idr		ctx_handles;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 400c66b..22a15d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -18,47 +18,75 @@
  * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors: monk liu <monk.liu@amd.com>
  */
 
 #include <drm/drmP.h>
 #include "amdgpu.h"
 
-static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
+static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
+{
+	switch (amdgpu_priority) {
+	case AMDGPU_CTX_PRIORITY_HIGH:
+		return AMD_SCHED_PRIORITY_HIGH;
+	case AMDGPU_CTX_PRIORITY_NORMAL:
+		return AMD_SCHED_PRIORITY_NORMAL;
+	default:
+		WARN(1, "Invalid context priority %d\n", amdgpu_priority);
+		return AMD_SCHED_PRIORITY_NORMAL;
+	}
+}
+
+static int amdgpu_ctx_init(struct amdgpu_device *adev,
+				uint32_t priority,
+				struct amdgpu_ctx *ctx)
 {
 	unsigned i, j;
 	int r;
+	enum amd_sched_priority sched_priority;
+
+	sched_priority = amdgpu_to_sched_priority(priority);
+
+	if (priority >= AMDGPU_CTX_PRIORITY_NUM)
+		return -EINVAL;
+
+	if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
+		return -EINVAL;
+
+	if (sched_priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
 
 	memset(ctx, 0, sizeof(*ctx));
 	ctx->adev = adev;
+	ctx->priority = priority;
 	kref_init(&ctx->refcount);
 	spin_lock_init(&ctx->ring_lock);
 	ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
 			      sizeof(struct dma_fence*), GFP_KERNEL);
 	if (!ctx->fences)
 		return -ENOMEM;
 
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		ctx->rings[i].sequence = 1;
 		ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
 	}
 
 	ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
 
 	/* create context entity for each ring */
 	for (i = 0; i < adev->num_rings; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 		struct amd_sched_rq *rq;
 
-		rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
+		rq = &ring->sched.sched_rq[sched_priority];
 		r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
 					  rq, amdgpu_sched_jobs);
 		if (r)
 			goto failed;
 	}
 
 	return 0;
 
 failed:
 	for (j = 0; j < i; j++)
@@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
 	kfree(ctx->fences);
 	ctx->fences = NULL;
 
 	for (i = 0; i < adev->num_rings; i++)
 		amd_sched_entity_fini(&adev->rings[i]->sched,
 				      &ctx->rings[i].entity);
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
 			    struct amdgpu_fpriv *fpriv,
+			    uint32_t priority,
 			    uint32_t *id)
 {
 	struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
 	struct amdgpu_ctx *ctx;
 	int r;
 
 	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 	if (!ctx)
 		return -ENOMEM;
 
 	mutex_lock(&mgr->lock);
 	r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
 	if (r < 0) {
 		mutex_unlock(&mgr->lock);
 		kfree(ctx);
 		return r;
 	}
+
 	*id = (uint32_t)r;
-	r = amdgpu_ctx_init(adev, ctx);
+	r = amdgpu_ctx_init(adev, priority, ctx);
 	if (r) {
 		idr_remove(&mgr->ctx_handles, *id);
 		*id = 0;
 		kfree(ctx);
 	}
 	mutex_unlock(&mgr->lock);
 	return r;
 }
 
 static void amdgpu_ctx_do_release(struct kref *ref)
@@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
 	ctx->reset_counter = reset_counter;
 
 	mutex_unlock(&mgr->lock);
 	return 0;
 }
 
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *filp)
 {
 	int r;
-	uint32_t id;
+	uint32_t id, priority;
 
 	union drm_amdgpu_ctx *args = data;
 	struct amdgpu_device *adev = dev->dev_private;
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 
 	r = 0;
 	id = args->in.ctx_id;
+	priority = args->in.priority;
 
 	switch (args->in.op) {
 	case AMDGPU_CTX_OP_ALLOC_CTX:
-		r = amdgpu_ctx_alloc(adev, fpriv, &id);
+		r = amdgpu_ctx_alloc(adev, fpriv, priority, &id);
 		args->out.alloc.ctx_id = id;
 		break;
 	case AMDGPU_CTX_OP_FREE_CTX:
 		r = amdgpu_ctx_free(fpriv, id);
 		break;
 	case AMDGPU_CTX_OP_QUERY_STATE:
 		r = amdgpu_ctx_query(adev, fpriv, id, &args->out);
 		break;
 	default:
 		return -EINVAL;
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
index d8dc681..2e458de 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
@@ -101,20 +101,21 @@ static inline struct amd_sched_fence *to_amd_sched_fence(struct dma_fence *f)
 */
 struct amd_sched_backend_ops {
 	struct dma_fence *(*dependency)(struct amd_sched_job *sched_job);
 	struct dma_fence *(*run_job)(struct amd_sched_job *sched_job);
 	void (*timedout_job)(struct amd_sched_job *sched_job);
 	void (*free_job)(struct amd_sched_job *sched_job);
 };
 
 enum amd_sched_priority {
 	AMD_SCHED_PRIORITY_KERNEL = 0,
+	AMD_SCHED_PRIORITY_HIGH,
 	AMD_SCHED_PRIORITY_NORMAL,
 	AMD_SCHED_MAX_PRIORITY
 };
 
 /**
  * One scheduler is implemented for each hardware ring
 */
 struct amd_gpu_scheduler {
 	const struct amd_sched_backend_ops	*ops;
 	uint32_t			hw_submission_limit;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index b5ae774..b756599 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -153,27 +153,32 @@ union drm_amdgpu_bo_list {
 
 /* GPU reset status */
 #define AMDGPU_CTX_NO_RESET		0
 /* this the context caused it */
 #define AMDGPU_CTX_GUILTY_RESET		1
 /* some other context caused it */
 #define AMDGPU_CTX_INNOCENT_RESET	2
 /* unknown cause */
 #define AMDGPU_CTX_UNKNOWN_RESET	3
 
+/* Context priority level */
+#define AMDGPU_CTX_PRIORITY_NORMAL	0
+#define AMDGPU_CTX_PRIORITY_HIGH	1
+#define AMDGPU_CTX_PRIORITY_NUM		2
+
 struct drm_amdgpu_ctx_in {
 	/** AMDGPU_CTX_OP_* */
 	__u32	op;
 	/** For future use, no flags defined so far */
 	__u32	flags;
 	__u32	ctx_id;
-	__u32	_pad;
+	__u32	priority;
 };
 
 union drm_amdgpu_ctx_out {
 		struct {
 			__u32	ctx_id;
 			__u32	_pad;
 		} alloc;
 
 		struct {
 			/** For future use, no flags defined so far */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (17 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4 Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
       [not found]     ` <1488320089-22035-20-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-02-28 22:14   ` [PATCH 20/22] drm/amdgpu: implement ring set_priority for gfx_v8 compute Andres Rodriguez
                     ` (4 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Add an initial framework for changing the HW priorities of rings. The
framework allows requesting priority changes for the lifetime of an
amdgpu_job. After the job completes, the priority will decay to the next
lowest priority for which a request is still valid.

A new ring function set_priority() can now be populated to take care of
the HW specific programming sequence for priority changes.
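
A backend opts in by populating the new hook in its ring funcs; a
minimal sketch, with placeholder foo_* names that are not part of this
series:

    static void foo_ring_set_priority(struct amdgpu_ring *ring,
                                      int priority)
    {
        /* program IP specific priority registers for @priority here */
    }

    static const struct amdgpu_ring_funcs foo_ring_funcs = {
        /* ... other hooks as before ... */
        .set_priority = foo_ring_set_priority,
    };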

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 10 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 71 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 11 +++++
 5 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 366f6d3..0676495 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -636,21 +636,21 @@ struct amdgpu_flip_work {
 struct amdgpu_ib {
 	struct amdgpu_sa_bo		*sa_bo;
 	uint32_t			length_dw;
 	uint64_t			gpu_addr;
 	uint32_t			*ptr;
 	uint32_t			flags;
 };
 
 extern const struct amd_sched_backend_ops amdgpu_sched_ops;
 
-int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
+int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int priority,
 		     struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 			     struct amdgpu_job **job);
 
 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
 int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring *ring,
 		      struct amd_sched_entity *entity, void *owner,
 		      struct dma_fence **f);
 
@@ -990,20 +990,22 @@ struct amdgpu_cs_parser {
 #define AMDGPU_VM_DOMAIN                    (1 << 3) /* bit set means in virtual memory context */
 
 struct amdgpu_job {
 	struct amd_sched_job    base;
 	struct amdgpu_device	*adev;
 	struct amdgpu_vm	*vm;
 	struct amdgpu_ring	*ring;
 	struct amdgpu_sync	sync;
 	struct amdgpu_ib	*ibs;
 	struct dma_fence	*fence; /* the hw fence */
+	struct dma_fence_cb	cb;
+	int			priority;
 	uint32_t		preamble_status;
 	uint32_t		num_ibs;
 	void			*owner;
 	uint64_t		fence_ctx; /* the fence_context this job uses */
 	bool                    vm_needs_flush;
 	unsigned		vm_id;
 	uint64_t		vm_pd_addr;
 	uint32_t		gds_base, gds_size;
 	uint32_t		gws_base, gws_size;
 	uint32_t		oa_base, oa_size;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 605d40e..19ce202 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -179,21 +179,21 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)
 
 		case AMDGPU_CHUNK_ID_DEPENDENCIES:
 			break;
 
 		default:
 			ret = -EINVAL;
 			goto free_partial_kdata;
 		}
 	}
 
-	ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
+	ret = amdgpu_job_alloc(p->adev, num_ibs, p->ctx->priority, &p->job, vm);
 	if (ret)
 		goto free_all_kdata;
 
 	if (p->uf_entry.robj)
 		p->job->uf_addr = uf_offset;
 	kfree(chunk_array);
 	return 0;
 
 free_all_kdata:
 	i = p->nchunks - 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 86a1242..45b3c90 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -32,50 +32,51 @@ static void amdgpu_job_timedout(struct amd_sched_job *s_job)
 {
 	struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);
 
 	DRM_ERROR("ring %s timeout, last signaled seq=%u, last emitted seq=%u\n",
 		  job->base.sched->name,
 		  atomic_read(&job->ring->fence_drv.last_seq),
 		  job->ring->fence_drv.sync_seq);
 	amdgpu_gpu_reset(job->adev);
 }
 
-int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
+int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int priority,
 		     struct amdgpu_job **job, struct amdgpu_vm *vm)
 {
 	size_t size = sizeof(struct amdgpu_job);
 
 	if (num_ibs == 0)
 		return -EINVAL;
 
 	size += sizeof(struct amdgpu_ib) * num_ibs;
 
 	*job = kzalloc(size, GFP_KERNEL);
 	if (!*job)
 		return -ENOMEM;
 
 	(*job)->adev = adev;
 	(*job)->vm = vm;
+	(*job)->priority = priority;
 	(*job)->ibs = (void *)&(*job)[1];
 	(*job)->num_ibs = num_ibs;
 
 	amdgpu_sync_create(&(*job)->sync);
 
 	return 0;
 }
 
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 			     struct amdgpu_job **job)
 {
 	int r;
 
-	r = amdgpu_job_alloc(adev, 1, job, NULL);
+	r = amdgpu_job_alloc(adev, 1, AMDGPU_CTX_PRIORITY_NORMAL, job, NULL);
 	if (r)
 		return r;
 
 	r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
 	if (r)
 		kfree(*job);
 
 	return r;
 }
 
@@ -170,20 +171,25 @@ static struct dma_fence *amdgpu_job_run(struct amd_sched_job *sched_job)
 	BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
 
 	trace_amdgpu_sched_run_job(job);
 	r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job, &fence);
 	if (r)
 		DRM_ERROR("Error scheduling IBs (%d)\n", r);
 
 	/* if gpu reset, hw fence will be replaced here */
 	dma_fence_put(job->fence);
 	job->fence = dma_fence_get(fence);
+
+	r = amdgpu_ring_elevate_priority(job->ring, job->priority, job);
+	if (r)
+		DRM_ERROR("Failed to set job priority (%d)\n", r);
+
 	amdgpu_job_free_resources(job);
 	return fence;
 }
 
 const struct amd_sched_backend_ops amdgpu_sched_ops = {
 	.dependency = amdgpu_job_dependency,
 	.run_job = amdgpu_job_run,
 	.timedout_job = amdgpu_job_timedout,
 	.free_job = amdgpu_job_free_cb
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80cb051..12bc7a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -192,20 +192,89 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
  * Reset the driver's copy of the wptr (all asics).
  */
 void amdgpu_ring_undo(struct amdgpu_ring *ring)
 {
 	ring->wptr = ring->wptr_old;
 
 	if (ring->funcs->end_use)
 		ring->funcs->end_use(ring);
 }
 
+static void amdgpu_ring_restore_priority_cb(struct dma_fence *f,
+					    struct dma_fence_cb *cb)
+{
+	int i;
+	struct amdgpu_job *cb_job =
+		container_of(cb, struct amdgpu_job, cb);
+	struct amdgpu_ring *ring = cb_job->ring;
+
+	spin_lock(&ring->priority_lock);
+
+	/* remove ourselves from the list if necessary */
+	if (cb_job == ring->last_job[cb_job->priority])
+		ring->last_job[cb_job->priority] = NULL;
+
+	/* something higher prio is executing, no need to decay */
+	if (ring->priority > cb_job->priority)
+		goto out_unlock;
+
+	/* decay priority to the next level with a job available */
+	for (i = cb_job->priority; i >= 0; i--) {
+		if (i == AMDGPU_CTX_PRIORITY_NORMAL || ring->last_job[i]) {
+			ring->priority = i;
+			if (ring->funcs->set_priority)
+				ring->funcs->set_priority(ring, i);
+
+			break;
+		}
+	}
+
+out_unlock:
+	spin_unlock(&ring->priority_lock);
+}
+
+/**
+ * amdgpu_ring_elevate_priority - change the ring's priority
+ *
+ * @ring: amdgpu_ring structure holding the information
+ * @priority: target priority
+ * @job: priority should remain elevated for the duration of this job
+ *
+ * Use HW specific mechanisms to elevate the ring's priority while @job
+ * is executing. Once @job finishes executing, the ring's priority will
+ * decay to the next-lowest level for which a job is still pending.
+ * Returns 0 on success, error otherwise
+ */
+int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
+				 struct amdgpu_job *job)
+{
+	if (priority < 0 || priority >= AMDGPU_CTX_PRIORITY_NUM)
+		return -EINVAL;
+
+	spin_lock(&ring->priority_lock);
+	ring->last_job[priority] = job;
+
+	if (priority <= ring->priority)
+		goto out_unlock;
+
+	ring->priority = priority;
+	if (ring->funcs->set_priority)
+		ring->funcs->set_priority(ring, priority);
+
+	dma_fence_add_callback(job->fence, &job->cb,
+			       amdgpu_ring_restore_priority_cb);
+
+out_unlock:
+	spin_unlock(&ring->priority_lock);
+	return 0;
+}
+
 /**
  * amdgpu_ring_init - init driver ring struct.
  *
  * @adev: amdgpu_device pointer
  * @ring: amdgpu_ring structure holding ring information
  * @max_ndw: maximum number of dw for ring alloc
  * @nop: nop packet for this ring
  *
  * Initialize the driver information for the selected ring (all asics).
  * Returns 0 on success, error on failure.
@@ -275,20 +344,22 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 					    (void **)&ring->ring);
 		if (r) {
 			dev_err(adev->dev, "(%d) ring create failed\n", r);
 			return r;
 		}
 		memset((void *)ring->ring, 0, ring->ring_size);
 	}
 	ring->ptr_mask = (ring->ring_size / 4) - 1;
 	ring->max_dw = max_dw;
 	ring->hw_ip = hw_ip;
+	ring->priority = AMDGPU_CTX_PRIORITY_NORMAL;
+	spin_lock_init(&ring->priority_lock);
 	INIT_LIST_HEAD(&ring->lru_list);
 	amdgpu_ring_lru_touch(adev, ring);
 
 	if (amdgpu_debugfs_ring_init(adev, ring)) {
 		DRM_ERROR("Failed to register debugfs file for rings !\n");
 	}
 	return 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index ecdd87c..befc29f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -17,20 +17,21 @@
  * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
  * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors: Christian König
  */
 #ifndef __AMDGPU_RING_H__
 #define __AMDGPU_RING_H__
 
+#include <drm/amdgpu_drm.h>
 #include "gpu_scheduler.h"
 
 /* max number of rings */
 #define AMDGPU_MAX_RINGS		16
 #define AMDGPU_MAX_GFX_RINGS		1
 #define AMDGPU_MAX_COMPUTE_RINGS	8
 #define AMDGPU_MAX_VCE_RINGS		3
 
 /* some special values for the owner field */
 #define AMDGPU_FENCE_OWNER_UNDEFINED	((void*)0ul)
@@ -130,20 +131,22 @@ struct amdgpu_ring_funcs {
 	void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
 	unsigned (*init_cond_exec)(struct amdgpu_ring *ring);
 	void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned offset);
 	/* note usage for clock and power gating */
 	void (*begin_use)(struct amdgpu_ring *ring);
 	void (*end_use)(struct amdgpu_ring *ring);
 	void (*emit_switch_buffer) (struct amdgpu_ring *ring);
 	void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
 	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
 	void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
+	/* priority functions */
+	void (*set_priority) (struct amdgpu_ring *ring, int priority);
 };
 
 struct amdgpu_ring {
 	struct amdgpu_device		*adev;
 	const struct amdgpu_ring_funcs	*funcs;
 	struct amdgpu_fence_driver	fence_drv;
 	struct amd_gpu_scheduler	sched;
 	struct list_head		lru_list;
 
 	struct amdgpu_bo	*ring_obj;
@@ -165,31 +168,39 @@ struct amdgpu_ring {
 	struct amdgpu_bo	*mqd_obj;
 	u32			doorbell_index;
 	bool			use_doorbell;
 	unsigned		wptr_offs;
 	unsigned		fence_offs;
 	uint64_t		current_ctx;
 	char			name[16];
 	unsigned		cond_exe_offs;
 	u64			cond_exe_gpu_addr;
 	volatile u32		*cond_exe_cpu_addr;
+
+	spinlock_t		priority_lock;
+	/* protected by priority_lock */
+	struct amdgpu_job 	*last_job[AMDGPU_CTX_PRIORITY_NUM];
+	int			priority;
+
 #if defined(CONFIG_DEBUG_FS)
 	struct dentry *ent;
 #endif
 };
 
 int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
 			       int hw_ip, int ring);
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
 void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
 void amdgpu_ring_commit(struct amdgpu_ring *ring);
 void amdgpu_ring_undo(struct amdgpu_ring *ring);
+int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
+				 struct amdgpu_job *job);
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 		     int hw_ip, unsigned ring_size,
 		     struct amdgpu_irq_src *irq_src, unsigned irq_type);
 void amdgpu_ring_fini(struct amdgpu_ring *ring);
 int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
 			struct amdgpu_ring **ring);
 void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct amdgpu_ring *ring);
 
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 20/22] drm/amdgpu: implement ring set_priority for gfx_v8 compute
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (18 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 21/22] drm/amdgpu: condense mqd programming sequence Andres Rodriguez
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

Programming CP_HQD_QUEUE_PRIORITY enables a queue to take priority over
other queues on the same pipe. Multiple queues on a pipe are timesliced,
so this gives us full precedence over other queues.

Programming CP_HQD_PIPE_PRIORITY changes the SPI_ARB_PRIORITY of the
wave as follows:
        0x2: CS_H
        0x1: CS_M
        0x0: CS_L

The SPI block will then dispatch work according to the policy set by
SPI_ARB_PRIORITY. In the current policy CS_H is higher priority than
gfx.

In order to prevent getting stuck in loops of CUs bouncing between GFX
and high priority compute and introducing further latency, we reserve
CUs 2+ for high priority compute on-demand.
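
As a worked example of the mask computation below: a high priority
queue 1 on compute pipe 2 produces

    type_mask  = (1 << 2) << GFX8_CU_RESERVE_PIPE_SHIFT; /* 1 << 9 = 0x200 */
    queue_mask = 1 << 1;                                 /* 0x2 */

and both masks are then committed to the SPI_RESOURCE_RESERVE_EN_CU_n
registers for CUs 2 through 7, leaving CUs 0-1 unreserved.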

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 85 +++++++++++++++++++++++++++++-
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0676495..e37f20d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -913,20 +913,23 @@ struct amdgpu_gfx {
 	uint32_t			me_feature_version;
 	uint32_t			ce_feature_version;
 	uint32_t			pfp_feature_version;
 	uint32_t			rlc_feature_version;
 	uint32_t			mec_feature_version;
 	uint32_t			mec2_feature_version;
 	struct amdgpu_ring		gfx_ring[AMDGPU_MAX_GFX_RINGS];
 	unsigned			num_gfx_rings;
 	struct amdgpu_ring		compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
 	unsigned			num_compute_rings;
+	spinlock_t			cu_reserve_lock;
+	uint32_t			cu_reserve_pipe_mask;
+	uint32_t			cu_reserve_queue_mask[AMDGPU_MAX_COMPUTE_RINGS];
 	struct amdgpu_irq_src		eop_irq;
 	struct amdgpu_irq_src		priv_reg_irq;
 	struct amdgpu_irq_src		priv_inst_irq;
 	/* gfx status */
 	uint32_t			gfx_current_status;
 	/* ce ram size*/
 	unsigned			ce_ram_size;
 	struct amdgpu_cu_info		cu_info;
 	const struct amdgpu_gfx_funcs	*funcs;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1fb1303..86d76e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1701,20 +1701,21 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	/* Registers mapping */
 	/* TODO: block userspace mapping of io register */
 	spin_lock_init(&adev->mmio_idx_lock);
 	spin_lock_init(&adev->smc_idx_lock);
 	spin_lock_init(&adev->pcie_idx_lock);
 	spin_lock_init(&adev->uvd_ctx_idx_lock);
 	spin_lock_init(&adev->didt_idx_lock);
 	spin_lock_init(&adev->gc_cac_idx_lock);
 	spin_lock_init(&adev->audio_endpt_idx_lock);
 	spin_lock_init(&adev->mm_stats.lock);
+	spin_lock_init(&adev->gfx.cu_reserve_lock);
 
 	INIT_LIST_HEAD(&adev->shadow_list);
 	mutex_init(&adev->shadow_list_lock);
 
 	INIT_LIST_HEAD(&adev->gtt_list);
 	spin_lock_init(&adev->gtt_list_lock);
 
 	INIT_LIST_HEAD(&adev->ring_lru_list);
 	mutex_init(&adev->ring_lru_list_lock);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a778d58..4fdec23 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -46,21 +46,23 @@
 #include "gca/gfx_8_0_sh_mask.h"
 #include "gca/gfx_8_0_enum.h"
 
 #include "dce/dce_10_0_d.h"
 #include "dce/dce_10_0_sh_mask.h"
 
 #include "smu/smu_7_1_3_d.h"
 
 #define GFX8_NUM_GFX_RINGS     1
 #define GFX8_MEC_HPD_SIZE 2048
-
+#define GFX8_CU_RESERVE_RESOURCES 0x45888
+#define GFX8_CU_NUM 8
+#define GFX8_CU_RESERVE_PIPE_SHIFT 7
 
 #define TOPAZ_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define CARRIZO_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define POLARIS11_GB_ADDR_CONFIG_GOLDEN 0x22011002
 #define TONGA_GB_ADDR_CONFIG_GOLDEN 0x22011003
 
 #define ARRAY_MODE(x)					((x) << GB_TILE_MODE0__ARRAY_MODE__SHIFT)
 #define PIPE_CONFIG(x)					((x) << GB_TILE_MODE0__PIPE_CONFIG__SHIFT)
 #define TILE_SPLIT(x)					((x) << GB_TILE_MODE0__TILE_SPLIT__SHIFT)
 #define MICRO_TILE_MODE_NEW(x)				((x) << GB_TILE_MODE0__MICRO_TILE_MODE_NEW__SHIFT)
@@ -6667,20 +6669,100 @@ static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring)
 
 static void gfx_v8_0_ring_set_wptr_compute(struct amdgpu_ring *ring)
 {
 	struct amdgpu_device *adev = ring->adev;
 
 	/* XXX check if swapping is necessary on BE */
 	adev->wb.wb[ring->wptr_offs] = ring->wptr;
 	WDOORBELL32(ring->doorbell_index, ring->wptr);
 }
 
+static void gfx_v8_0_cu_reserve(struct amdgpu_device *adev,
+				struct amdgpu_ring *ring, bool acquire)
+{
+	int i, resources;
+	int tmp = 0, queue_mask = 0, type_mask = 0;
+	int reserve_res_reg, reserve_en_reg;
+
+	/* gfx_v8_0_cu_reserve only supports compute path */
+	if (ring->hw_ip != AMDGPU_HW_IP_COMPUTE)
+		return;
+
+	spin_lock(&adev->gfx.cu_reserve_lock);
+	if (acquire) {
+		adev->gfx.cu_reserve_pipe_mask |= (1 << ring->pipe);
+		adev->gfx.cu_reserve_queue_mask[ring->pipe] |= (1 << ring->queue);
+	} else {
+		adev->gfx.cu_reserve_pipe_mask &= ~(1 << ring->pipe);
+		adev->gfx.cu_reserve_queue_mask[ring->pipe] &= ~(1 << ring->queue);
+	}
+
+	/* compute pipe 0 starts at GFX8_CU_RESERVE_PIPE_SHIFT */
+	type_mask = (adev->gfx.cu_reserve_pipe_mask << GFX8_CU_RESERVE_PIPE_SHIFT);
+
+	/* HW only has one register for queue mask, so we collapse them */
+	for (i = 0; i < AMDGPU_MAX_COMPUTE_RINGS; i++)
+		queue_mask |= adev->gfx.cu_reserve_queue_mask[i];
+
+	/* leave the first 2 CUs for general processing */
+	for (i = 2; i < GFX8_CU_NUM; i++) {
+		reserve_res_reg = mmSPI_RESOURCE_RESERVE_CU_0 + i;
+		reserve_en_reg = mmSPI_RESOURCE_RESERVE_EN_CU_0 + i;
+
+		tmp = REG_SET_FIELD(tmp, SPI_RESOURCE_RESERVE_EN_CU_0,
+				    TYPE_MASK, type_mask);
+		tmp = REG_SET_FIELD(tmp, SPI_RESOURCE_RESERVE_EN_CU_0,
+				    QUEUE_MASK, queue_mask);
+		if (queue_mask) {
+			resources = GFX8_CU_RESERVE_RESOURCES;
+			tmp = REG_SET_FIELD(tmp, SPI_RESOURCE_RESERVE_EN_CU_0,
+					    EN, 1);
+		} else {
+			resources = 0;
+			tmp = REG_SET_FIELD(tmp, SPI_RESOURCE_RESERVE_EN_CU_0,
+					    EN, 0);
+		}
+		/* Commit */
+		WREG32(reserve_res_reg, resources);
+		WREG32(reserve_en_reg, tmp);
+	}
+
+	spin_unlock(&adev->gfx.cu_reserve_lock);
+}
+
+static void gfx_v8_0_ring_set_priority_compute(struct amdgpu_ring *ring,
+					       int priority)
+{
+	struct amdgpu_device *adev = ring->adev;
+
+	if (ring->hw_ip != AMDGPU_HW_IP_COMPUTE)
+		return;
+
+	mutex_lock(&adev->srbm_mutex);
+	vi_srbm_select(adev, ring->me, ring->pipe, 0, 0);
+
+	switch (priority) {
+	case AMDGPU_CTX_PRIORITY_NORMAL:
+		WREG32(mmCP_HQD_PIPE_PRIORITY, 0x0);
+		WREG32(mmCP_HQD_QUEUE_PRIORITY, 0x0);
+		gfx_v8_0_cu_reserve(adev, ring, false);
+		break;
+	case AMDGPU_CTX_PRIORITY_HIGH:
+		WREG32(mmCP_HQD_PIPE_PRIORITY, 0x2);
+		WREG32(mmCP_HQD_QUEUE_PRIORITY, 0xf);
+		gfx_v8_0_cu_reserve(adev, ring, true);
+		break;
+	}
+
+	vi_srbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
+}
+
 static void gfx_v8_0_ring_emit_fence_compute(struct amdgpu_ring *ring,
 					     u64 addr, u64 seq,
 					     unsigned flags)
 {
 	bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
 	bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
 
 	/* RELEASE_MEM - flush caches, send int */
 	amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 5));
 	amdgpu_ring_write(ring, (EOP_TCL1_ACTION_EN |
@@ -7074,20 +7156,21 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_compute = {
 	.emit_fence = gfx_v8_0_ring_emit_fence_compute,
 	.emit_pipeline_sync = gfx_v8_0_ring_emit_pipeline_sync,
 	.emit_vm_flush = gfx_v8_0_ring_emit_vm_flush,
 	.emit_gds_switch = gfx_v8_0_ring_emit_gds_switch,
 	.emit_hdp_flush = gfx_v8_0_ring_emit_hdp_flush,
 	.emit_hdp_invalidate = gfx_v8_0_ring_emit_hdp_invalidate,
 	.test_ring = gfx_v8_0_ring_test_ring,
 	.test_ib = gfx_v8_0_ring_test_ib,
 	.insert_nop = amdgpu_ring_insert_nop,
 	.pad_ib = amdgpu_ring_generic_pad_ib,
+	.set_priority = gfx_v8_0_ring_set_priority_compute,
 };
 
 static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
 	.type = AMDGPU_RING_TYPE_KIQ,
 	.align_mask = 0xff,
 	.nop = PACKET3(PACKET3_NOP, 0x3FFF),
 	.get_rptr = gfx_v8_0_ring_get_rptr,
 	.get_wptr = gfx_v8_0_ring_get_wptr_compute,
 	.set_wptr = gfx_v8_0_ring_set_wptr_compute,
 	.emit_frame_size =
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 21/22] drm/amdgpu: condense mqd programming sequence
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (19 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 20/22] drm/amdgpu: implement ring set_priority for gfx_v8 compute Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-02-28 22:14   ` [PATCH 22/22] drm/amdgpu: workaround tonga HW bug in HQD " Andres Rodriguez
                     ` (2 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Andres Rodriguez

The MQD structure matches the reg layout. Take advantage of this to
simplify HQD programming.

Note that the ACTIVE field still needs to be programmed last.
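
Spelled out, the lookup the new loops rely on: the vi_mqd/cik_mqd
fields are laid out in register order starting at cp_mqd_base_addr_lo,
which corresponds to mmCP_MQD_BASE_ADDR, so the value for any HQD
register can be indexed by its offset from that base:

    uint32_t *mqd_data = &mqd->cp_mqd_base_addr_lo;

    /* the value for register 'reg' lives at (reg - mmCP_MQD_BASE_ADDR) */
    WREG32(reg, mqd_data[reg - mmCP_MQD_BASE_ADDR]);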

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 44 +++++--------------
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 83 +++++------------------------------
 2 files changed, 22 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index c76dcc8..a9893fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -3108,61 +3108,39 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
 	mqd->cp_hqd_pipe_priority = RREG32(mmCP_HQD_PIPE_PRIORITY);
 	mqd->cp_hqd_queue_priority = RREG32(mmCP_HQD_QUEUE_PRIORITY);
 	mqd->cp_hqd_iq_rptr = RREG32(mmCP_HQD_IQ_RPTR);
 
 	/* activate the queue */
 	mqd->cp_hqd_active = 1;
 }
 
 int gfx_v7_0_mqd_commit(struct amdgpu_device *adev, struct cik_mqd *mqd)
 {
-	u32 tmp;
+	uint32_t tmp;
+	uint32_t mqd_reg;
+	uint32_t *mqd_data;
+
+	/* HQD registers extend from mmCP_MQD_BASE_ADDR to mmCP_MQD_CONTROL */
+	mqd_data = &mqd->cp_mqd_base_addr_lo;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
-	/* program MQD field to HW */
-	WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
-	WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->cp_mqd_base_addr_hi);
-	WREG32(mmCP_MQD_CONTROL, mqd->cp_mqd_control);
-	WREG32(mmCP_HQD_PQ_BASE, mqd->cp_hqd_pq_base_lo);
-	WREG32(mmCP_HQD_PQ_BASE_HI, mqd->cp_hqd_pq_base_hi);
-	WREG32(mmCP_HQD_PQ_CONTROL, mqd->cp_hqd_pq_control);
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, mqd->cp_hqd_pq_rptr_report_addr_lo);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI, mqd->cp_hqd_pq_rptr_report_addr_hi);
-	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
-	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-	WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
-
-	WREG32(mmCP_HQD_IB_CONTROL, mqd->cp_hqd_ib_control);
-	WREG32(mmCP_HQD_IB_BASE_ADDR, mqd->cp_hqd_ib_base_addr_lo);
-	WREG32(mmCP_HQD_IB_BASE_ADDR_HI, mqd->cp_hqd_ib_base_addr_hi);
-	WREG32(mmCP_HQD_IB_RPTR, mqd->cp_hqd_ib_rptr);
-	WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
-	WREG32(mmCP_HQD_SEMA_CMD, mqd->cp_hqd_sema_cmd);
-	WREG32(mmCP_HQD_MSG_TYPE, mqd->cp_hqd_msg_type);
-	WREG32(mmCP_HQD_ATOMIC0_PREOP_LO, mqd->cp_hqd_atomic0_preop_lo);
-	WREG32(mmCP_HQD_ATOMIC0_PREOP_HI, mqd->cp_hqd_atomic0_preop_hi);
-	WREG32(mmCP_HQD_ATOMIC1_PREOP_LO, mqd->cp_hqd_atomic1_preop_lo);
-	WREG32(mmCP_HQD_ATOMIC1_PREOP_HI, mqd->cp_hqd_atomic1_preop_hi);
-	WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
-	WREG32(mmCP_HQD_QUANTUM, mqd->cp_hqd_quantum);
-	WREG32(mmCP_HQD_PIPE_PRIORITY, mqd->cp_hqd_pipe_priority);
-	WREG32(mmCP_HQD_QUEUE_PRIORITY, mqd->cp_hqd_queue_priority);
-	WREG32(mmCP_HQD_IQ_RPTR, mqd->cp_hqd_iq_rptr);
+	/* program all HQD registers */
+	for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_MQD_CONTROL; mqd_reg++)
+		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
 	/* activate the HQD */
-	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
+	for (mqd_reg = mmCP_MQD_BASE_ADDR; mqd_reg <= mmCP_HQD_ACTIVE; mqd_reg++)
+		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
 	return 0;
 }
 
 static int gfx_v7_0_compute_queue_init(struct amdgpu_device *adev, int ring_id)
 {
 	int r;
 	u64 mqd_gpu_addr;
 	struct cik_mqd *mqd;
 	struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 4fdec23..3593f36 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4918,99 +4918,38 @@ static void gfx_v8_0_enable_doorbell(struct amdgpu_device *adev, bool enable)
 	tmp = RREG32(mmCP_PQ_STATUS);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_STATUS, DOORBELL_ENABLE, 1);
 	WREG32(mmCP_PQ_STATUS, tmp);
 
 	adev->gfx.doorbell_enabled = true;
 }
 
 int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 {
 	uint32_t tmp;
+	uint32_t mqd_reg;
+	uint32_t *mqd_data;
+
+	/* HQD registers extend from mmCP_MQD_BASE_ADDR to mmCP_HQD_ERROR */
+	mqd_data = &mqd->cp_mqd_base_addr_lo;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
-	WREG32(mmCP_HQD_EOP_BASE_ADDR, mqd->cp_hqd_eop_base_addr_lo);
-	WREG32(mmCP_HQD_EOP_BASE_ADDR_HI, mqd->cp_hqd_eop_base_addr_hi);
-
-	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-	WREG32(mmCP_HQD_EOP_CONTROL, mqd->cp_hqd_eop_control);
-
-	/* enable doorbell? */
-	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
-
-	/* set pq read/write pointers */
-	WREG32(mmCP_HQD_DEQUEUE_REQUEST, mqd->cp_hqd_dequeue_request);
-	WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
-	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-
-	/* set the pointer to the MQD */
-	WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
-	WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->cp_mqd_base_addr_hi);
-
-	/* set MQD vmid to 0 */
-	WREG32(mmCP_MQD_CONTROL, mqd->cp_mqd_control);
-
-	/* set the pointer to the HQD, this is similar CP_RB0_BASE/_HI */
-	WREG32(mmCP_HQD_PQ_BASE, mqd->cp_hqd_pq_base_lo);
-	WREG32(mmCP_HQD_PQ_BASE_HI, mqd->cp_hqd_pq_base_hi);
-
-	/* set up the HQD, this is similar to CP_RB0_CNTL */
-	WREG32(mmCP_HQD_PQ_CONTROL, mqd->cp_hqd_pq_control);
-
-	/* set the wb address whether it's enabled or not */
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR,
-				mqd->cp_hqd_pq_rptr_report_addr_lo);
-	WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
-				mqd->cp_hqd_pq_rptr_report_addr_hi);
-
-	/* only used if CP_PQ_WPTR_POLL_CNTL.CP_PQ_WPTR_POLL_CNTL__EN_MASK=1 */
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
-	WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
-
-	/* enable the doorbell if requested */
-	WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
+	/* program all HQD registers */
+	for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_HQD_ERROR; mqd_reg++)
+		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
-	/* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
-	WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-	WREG32(mmCP_HQD_EOP_RPTR, mqd->cp_hqd_eop_rptr);
-	WREG32(mmCP_HQD_EOP_WPTR, mqd->cp_hqd_eop_wptr);
-
-	/* set the HQD priority */
-	WREG32(mmCP_HQD_PIPE_PRIORITY, mqd->cp_hqd_pipe_priority);
-	WREG32(mmCP_HQD_QUEUE_PRIORITY, mqd->cp_hqd_queue_priority);
-	WREG32(mmCP_HQD_QUANTUM, mqd->cp_hqd_quantum);
-
-	/* set cwsr save area */
-	WREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_LO, mqd->cp_hqd_ctx_save_base_addr_lo);
-	WREG32(mmCP_HQD_CTX_SAVE_BASE_ADDR_HI, mqd->cp_hqd_ctx_save_base_addr_hi);
-	WREG32(mmCP_HQD_CTX_SAVE_CONTROL, mqd->cp_hqd_ctx_save_control);
-	WREG32(mmCP_HQD_CNTL_STACK_OFFSET, mqd->cp_hqd_cntl_stack_offset);
-	WREG32(mmCP_HQD_CNTL_STACK_SIZE, mqd->cp_hqd_cntl_stack_size);
-	WREG32(mmCP_HQD_WG_STATE_OFFSET, mqd->cp_hqd_wg_state_offset);
-	WREG32(mmCP_HQD_CTX_SAVE_SIZE, mqd->cp_hqd_ctx_save_size);
-
-	WREG32(mmCP_HQD_IB_CONTROL, mqd->cp_hqd_ib_control);
-	WREG32(mmCP_HQD_EOP_EVENTS, mqd->cp_hqd_eop_done_events);
-	WREG32(mmCP_HQD_ERROR, mqd->cp_hqd_error);
-	WREG32(mmCP_HQD_EOP_WPTR_MEM, mqd->cp_hqd_eop_wptr_mem);
-	WREG32(mmCP_HQD_EOP_DONES, mqd->cp_hqd_eop_dones);
-
-	/* set the vmid for the queue */
-	WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
-
-	WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
-
-	/* activate the queue */
-	WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
+	/* activate the HQD */
+	for (mqd_reg = mmCP_MQD_BASE_ADDR; mqd_reg <= mmCP_HQD_ACTIVE; mqd_reg++)
+		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
 	return 0;
 }
 
 static int gfx_v8_0_kiq_queue_init(struct amdgpu_ring *ring,
 				   struct vi_mqd *mqd,
 				   u64 mqd_gpu_addr)
 {
 	int r = 0;
 	struct amdgpu_device *adev = ring->adev;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 22/22] drm/amdgpu: workaround tonga HW bug in HQD programming sequence
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (20 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 21/22] drm/amdgpu: condense mqd programming sequence Andres Rodriguez
@ 2017-02-28 22:14   ` Andres Rodriguez
  2017-03-01 11:42   ` Add support for high priority scheduling in amdgpu Christian König
  2017-03-01 16:14   ` Bridgman, John
  23 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-02-28 22:14 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Jay Cornwall, Andres Rodriguez

Tonga based asics may experience hangs when an HQD's EOP parameters
are modified.

Workaround this HW issue by avoiding writes to these registers for
tonga asics.

Based on the following ROCm commit:
2a0fb8 - drm/amdgpu: Synchronize KFD HQD load protocol with CP scheduler

From the ROCm git repository:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git

CC: Jay Cornwall <Jay.Cornwall@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 3593f36..9181655 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4930,21 +4930,35 @@ int gfx_v8_0_mqd_commit(struct amdgpu_device *adev, struct vi_mqd *mqd)
 
 	/* HQD registers extend from mmCP_MQD_BASE_ADDR to mmCP_HQD_ERROR */
 	mqd_data = &mqd->cp_mqd_base_addr_lo;
 
 	/* disable wptr polling */
 	tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
 	tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
 	WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
 	/* program all HQD registers */
-	for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_HQD_ERROR; mqd_reg++)
+	for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_HQD_EOP_CONTROL; mqd_reg++)
+		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
+
+	/* Tonga errata: EOP RPTR/WPTR should be left unmodified.
+	 * This is safe since EOP RPTR==WPTR for any inactive HQD
+	 * on ASICs that do not support context-save.
+	 * EOP writes/reads can start anywhere in the ring.
+	 */
+	if (adev->asic_type != CHIP_TONGA) {
+		WREG32(mmCP_HQD_EOP_RPTR, mqd->cp_hqd_eop_rptr);
+		WREG32(mmCP_HQD_EOP_WPTR, mqd->cp_hqd_eop_wptr);
+		WREG32(mmCP_HQD_EOP_WPTR_MEM, mqd->cp_hqd_eop_wptr_mem);
+	}
+
+	for (mqd_reg = mmCP_HQD_EOP_EVENTS; mqd_reg <= mmCP_HQD_ERROR; mqd_reg++)
 		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
 	/* activate the HQD */
 	for (mqd_reg = mmCP_MQD_BASE_ADDR; mqd_reg <= mmCP_HQD_ACTIVE; mqd_reg++)
 		WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
 	return 0;
 }
 
 static int gfx_v8_0_kiq_queue_init(struct amdgpu_ring *ring,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found]     ` <1488320089-22035-19-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-01  1:13       ` Emil Velikov
       [not found]         ` <CACvgo51=1-8dHmC8MOmbCijDv3vpD4dTC6hibQMe5bYB9zsB4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-03-01  6:52       ` zhoucm1
  1 sibling, 1 reply; 40+ messages in thread
From: Emil Velikov @ 2017-03-01  1:13 UTC (permalink / raw)
  To: Andres Rodriguez; +Cc: amd-gfx mailing list

Hi Andres,

There are a couple of nitpicks below, but feel free to address those as a
follow-up. Assuming they're correct, of course ;-)

On 28 February 2017 at 22:14, Andres Rodriguez <andresx7@gmail.com> wrote:
> Add a new context creation parameter to express a global context priority.
>
> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
> (default) contexts.
>
> v2: Instead of using flags, repurpose __pad
> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
> v4: Validate usermode priority and store it
>
> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41 +++++++++++++++++++++++----
>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
>  include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
>  4 files changed, 44 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index e30c47e..366f6d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>         struct amd_sched_entity entity;
>  };
>
>  struct amdgpu_ctx {
>         struct kref             refcount;
>         struct amdgpu_device    *adev;
>         unsigned                reset_counter;
>         spinlock_t              ring_lock;
>         struct dma_fence        **fences;
>         struct amdgpu_ctx_ring  rings[AMDGPU_MAX_RINGS];
> +       int                     priority;
>         bool preamble_presented;
>  };
>
>  struct amdgpu_ctx_mgr {
>         struct amdgpu_device    *adev;
>         struct mutex            lock;
>         /* protected by lock */
>         struct idr              ctx_handles;
>  };
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 400c66b..22a15d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -18,47 +18,75 @@
>   * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>   * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>   * OTHER DEALINGS IN THE SOFTWARE.
>   *
>   * Authors: monk liu <monk.liu@amd.com>
>   */
>
>  #include <drm/drmP.h>
>  #include "amdgpu.h"
>
> -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
> +static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
> +{
> +       switch (amdgpu_priority) {
> +       case AMDGPU_CTX_PRIORITY_HIGH:
> +               return AMD_SCHED_PRIORITY_HIGH;
> +       case AMDGPU_CTX_PRIORITY_NORMAL:
> +               return AMD_SCHED_PRIORITY_NORMAL;
> +       default:
> +               WARN(1, "Invalid context priority %d\n", amdgpu_priority);
> +               return AMD_SCHED_PRIORITY_NORMAL;
> +       }
> +}
> +
> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
> +                               uint32_t priority,
> +                               struct amdgpu_ctx *ctx)
>  {
>         unsigned i, j;
>         int r;
> +       enum amd_sched_priority sched_priority;
> +
> +       sched_priority = amdgpu_to_sched_priority(priority);
> +
This will trigger dmesg spam on normal user input. I'd keep the WARN
in amdgpu_to_sched_priority, but move the function call after the
validation of priority.
Thinking about it, the input validation really belongs in the ioctl -
amdgpu_ctx_ioctl().
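
A rough sketch of that shape, using the names from this patch:

    /* in amdgpu_ctx_ioctl(), before amdgpu_ctx_alloc() */
    if (args->in.priority >= AMDGPU_CTX_PRIORITY_NUM)
        return -EINVAL;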

> +       if (priority >= AMDGPU_CTX_PRIORITY_NUM)
> +               return -EINVAL;
> +
> +       if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
> +               return -EINVAL;
> +
> +       if (sched_priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
This is not obvious in either the commit message or the UAPI. I'd
suggest adding a comment in the latter.
If memory serves, high prio will _not_ work with render nodes, so you
really want to cover and/or explain why.

> +               return -EACCES;
>
>         memset(ctx, 0, sizeof(*ctx));
>         ctx->adev = adev;
> +       ctx->priority = priority;
>         kref_init(&ctx->refcount);
>         spin_lock_init(&ctx->ring_lock);
>         ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>                               sizeof(struct dma_fence*), GFP_KERNEL);
>         if (!ctx->fences)
>                 return -ENOMEM;
>
>         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>                 ctx->rings[i].sequence = 1;
>                 ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
>         }
>
>         ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>
>         /* create context entity for each ring */
>         for (i = 0; i < adev->num_rings; i++) {
>                 struct amdgpu_ring *ring = adev->rings[i];
>                 struct amd_sched_rq *rq;
>
> -               rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
> +               rq = &ring->sched.sched_rq[sched_priority];
>                 r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
>                                           rq, amdgpu_sched_jobs);
>                 if (r)
>                         goto failed;
>         }
>
>         return 0;
>
>  failed:
>         for (j = 0; j < i; j++)
> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>         kfree(ctx->fences);
>         ctx->fences = NULL;
>
>         for (i = 0; i < adev->num_rings; i++)
>                 amd_sched_entity_fini(&adev->rings[i]->sched,
>                                       &ctx->rings[i].entity);
>  }
>
>  static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>                             struct amdgpu_fpriv *fpriv,
> +                           uint32_t priority,
>                             uint32_t *id)
>  {
>         struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>         struct amdgpu_ctx *ctx;
>         int r;
>
>         ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>         if (!ctx)
>                 return -ENOMEM;
>
>         mutex_lock(&mgr->lock);
>         r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>         if (r < 0) {
>                 mutex_unlock(&mgr->lock);
>                 kfree(ctx);
>                 return r;
>         }
> +
>         *id = (uint32_t)r;
> -       r = amdgpu_ctx_init(adev, ctx);
> +       r = amdgpu_ctx_init(adev, priority, ctx);
>         if (r) {
>                 idr_remove(&mgr->ctx_handles, *id);
>                 *id = 0;
>                 kfree(ctx);
>         }
>         mutex_unlock(&mgr->lock);
>         return r;
>  }
>
>  static void amdgpu_ctx_do_release(struct kref *ref)
> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
>         ctx->reset_counter = reset_counter;
>
>         mutex_unlock(&mgr->lock);
>         return 0;
>  }
>
>  int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>                      struct drm_file *filp)
>  {
>         int r;
> -       uint32_t id;
> +       uint32_t id, priority;
>
>         union drm_amdgpu_ctx *args = data;
>         struct amdgpu_device *adev = dev->dev_private;
>         struct amdgpu_fpriv *fpriv = filp->driver_priv;
>
>         r = 0;
>         id = args->in.ctx_id;
> +       priority = args->in.priority;
>
Hmm, we don't seem to be doing any in.flags validation - not cool.
Someone should seriously add that and check the remaining ioctls.
At the same time - I think you want to add a flag bit "HAS_PRIORITY"
[or similar] and honour in.priority only when that is set.

Even if the USM drivers are safe, this will break for a poor soul who
is learning how to program their GPU. "My program was running before -
I updated the kernel and it no longer does :-("
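
Something along these lines (flag name and bit are placeholders):

    #define AMDGPU_CTX_ALLOC_HAS_PRIORITY (1 << 0) /* hypothetical */

    priority = (args->in.flags & AMDGPU_CTX_ALLOC_HAS_PRIORITY) ?
               args->in.priority : AMDGPU_CTX_PRIORITY_NORMAL;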

Either way, the patch is:
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>

-Emil
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found]     ` <1488320089-22035-19-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-03-01  1:13       ` Emil Velikov
@ 2017-03-01  6:52       ` zhoucm1
       [not found]         ` <58B66FB8.8050300-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 40+ messages in thread
From: zhoucm1 @ 2017-03-01  6:52 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2017-03-01 06:14, Andres Rodriguez wrote:
> Add a new context creation parameter to express a global context priority.
>
> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
> (default) contexts.
>
> v2: Instead of using flags, repurpose __pad
> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
> v4: Validate usermode priority and store it
>
> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41 +++++++++++++++++++++++----
>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
>   include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
>   4 files changed, 44 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index e30c47e..366f6d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>   	struct amd_sched_entity	entity;
>   };
>   
>   struct amdgpu_ctx {
>   	struct kref		refcount;
>   	struct amdgpu_device    *adev;
>   	unsigned		reset_counter;
>   	spinlock_t		ring_lock;
>   	struct dma_fence	**fences;
>   	struct amdgpu_ctx_ring	rings[AMDGPU_MAX_RINGS];
> +	int			priority;
>   	bool preamble_presented;
>   };
>   
>   struct amdgpu_ctx_mgr {
>   	struct amdgpu_device	*adev;
>   	struct mutex		lock;
>   	/* protected by lock */
>   	struct idr		ctx_handles;
>   };
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 400c66b..22a15d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -18,47 +18,75 @@
>    * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>    * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>    * OTHER DEALINGS IN THE SOFTWARE.
>    *
>    * Authors: monk liu <monk.liu@amd.com>
>    */
>   
>   #include <drm/drmP.h>
>   #include "amdgpu.h"
>   
> -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
> +static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
> +{
> +	switch (amdgpu_priority) {
> +	case AMDGPU_CTX_PRIORITY_HIGH:
> +		return AMD_SCHED_PRIORITY_HIGH;
> +	case AMDGPU_CTX_PRIORITY_NORMAL:
> +		return AMD_SCHED_PRIORITY_NORMAL;
> +	default:
> +		WARN(1, "Invalid context priority %d\n", amdgpu_priority);
> +		return AMD_SCHED_PRIORITY_NORMAL;
> +	}
> +}
> +
> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
> +				uint32_t priority,
> +				struct amdgpu_ctx *ctx)
>   {
>   	unsigned i, j;
>   	int r;
> +	enum amd_sched_priority sched_priority;
> +
> +	sched_priority = amdgpu_to_sched_priority(priority);
> +
> +	if (priority >= AMDGPU_CTX_PRIORITY_NUM)
> +		return -EINVAL;
> +
> +	if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
> +		return -EINVAL;
> +
> +	if (sched_priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
> +		return -EACCES;
>   
>   	memset(ctx, 0, sizeof(*ctx));
>   	ctx->adev = adev;
> +	ctx->priority = priority;
This seems to be unused in this patch.

Regards,
David Zhou
>   	kref_init(&ctx->refcount);
>   	spin_lock_init(&ctx->ring_lock);
>   	ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>   			      sizeof(struct dma_fence*), GFP_KERNEL);
>   	if (!ctx->fences)
>   		return -ENOMEM;
>   
>   	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>   		ctx->rings[i].sequence = 1;
>   		ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
>   	}
>   
>   	ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>   
>   	/* create context entity for each ring */
>   	for (i = 0; i < adev->num_rings; i++) {
>   		struct amdgpu_ring *ring = adev->rings[i];
>   		struct amd_sched_rq *rq;
>   
> -		rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
> +		rq = &ring->sched.sched_rq[sched_priority];
>   		r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
>   					  rq, amdgpu_sched_jobs);
>   		if (r)
>   			goto failed;
>   	}
>   
>   	return 0;
>   
>   failed:
>   	for (j = 0; j < i; j++)
> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>   	kfree(ctx->fences);
>   	ctx->fences = NULL;
>   
>   	for (i = 0; i < adev->num_rings; i++)
>   		amd_sched_entity_fini(&adev->rings[i]->sched,
>   				      &ctx->rings[i].entity);
>   }
>   
>   static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>   			    struct amdgpu_fpriv *fpriv,
> +			    uint32_t priority,
>   			    uint32_t *id)
>   {
>   	struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>   	struct amdgpu_ctx *ctx;
>   	int r;
>   
>   	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>   	if (!ctx)
>   		return -ENOMEM;
>   
>   	mutex_lock(&mgr->lock);
>   	r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>   	if (r < 0) {
>   		mutex_unlock(&mgr->lock);
>   		kfree(ctx);
>   		return r;
>   	}
> +
>   	*id = (uint32_t)r;
> -	r = amdgpu_ctx_init(adev, ctx);
> +	r = amdgpu_ctx_init(adev, priority, ctx);
>   	if (r) {
>   		idr_remove(&mgr->ctx_handles, *id);
>   		*id = 0;
>   		kfree(ctx);
>   	}
>   	mutex_unlock(&mgr->lock);
>   	return r;
>   }
>   
>   static void amdgpu_ctx_do_release(struct kref *ref)
> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
>   	ctx->reset_counter = reset_counter;
>   
>   	mutex_unlock(&mgr->lock);
>   	return 0;
>   }
>   
>   int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>   		     struct drm_file *filp)
>   {
>   	int r;
> -	uint32_t id;
> +	uint32_t id, priority;
>   
>   	union drm_amdgpu_ctx *args = data;
>   	struct amdgpu_device *adev = dev->dev_private;
>   	struct amdgpu_fpriv *fpriv = filp->driver_priv;
>   
>   	r = 0;
>   	id = args->in.ctx_id;
> +	priority = args->in.priority;
>   
>   	switch (args->in.op) {
>   	case AMDGPU_CTX_OP_ALLOC_CTX:
> -		r = amdgpu_ctx_alloc(adev, fpriv, &id);
> +		r = amdgpu_ctx_alloc(adev, fpriv, priority, &id);
>   		args->out.alloc.ctx_id = id;
>   		break;
>   	case AMDGPU_CTX_OP_FREE_CTX:
>   		r = amdgpu_ctx_free(fpriv, id);
>   		break;
>   	case AMDGPU_CTX_OP_QUERY_STATE:
>   		r = amdgpu_ctx_query(adev, fpriv, id, &args->out);
>   		break;
>   	default:
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> index d8dc681..2e458de 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
> @@ -101,20 +101,21 @@ static inline struct amd_sched_fence *to_amd_sched_fence(struct dma_fence *f)
>   */
>   struct amd_sched_backend_ops {
>   	struct dma_fence *(*dependency)(struct amd_sched_job *sched_job);
>   	struct dma_fence *(*run_job)(struct amd_sched_job *sched_job);
>   	void (*timedout_job)(struct amd_sched_job *sched_job);
>   	void (*free_job)(struct amd_sched_job *sched_job);
>   };
>   
>   enum amd_sched_priority {
>   	AMD_SCHED_PRIORITY_KERNEL = 0,
> +	AMD_SCHED_PRIORITY_HIGH,
>   	AMD_SCHED_PRIORITY_NORMAL,
>   	AMD_SCHED_MAX_PRIORITY
>   };
>   
>   /**
>    * One scheduler is implemented for each hardware ring
>   */
>   struct amd_gpu_scheduler {
>   	const struct amd_sched_backend_ops	*ops;
>   	uint32_t			hw_submission_limit;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index b5ae774..b756599 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -153,27 +153,32 @@ union drm_amdgpu_bo_list {
>   
>   /* GPU reset status */
>   #define AMDGPU_CTX_NO_RESET		0
>   /* this the context caused it */
>   #define AMDGPU_CTX_GUILTY_RESET		1
>   /* some other context caused it */
>   #define AMDGPU_CTX_INNOCENT_RESET	2
>   /* unknown cause */
>   #define AMDGPU_CTX_UNKNOWN_RESET	3
>   
> +/* Context priority level */
> +#define AMDGPU_CTX_PRIORITY_NORMAL	0
> +#define AMDGPU_CTX_PRIORITY_HIGH	1
> +#define AMDGPU_CTX_PRIORITY_NUM		2
> +
>   struct drm_amdgpu_ctx_in {
>   	/** AMDGPU_CTX_OP_* */
>   	__u32	op;
>   	/** For future use, no flags defined so far */
>   	__u32	flags;
>   	__u32	ctx_id;
> -	__u32	_pad;
> +	__u32	priority;
>   };
>   
>   union drm_amdgpu_ctx_out {
>   		struct {
>   			__u32	ctx_id;
>   			__u32	_pad;
>   		} alloc;
>   
>   		struct {
>   			/** For future use, no flags defined so far */

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found]         ` <58B66FB8.8050300-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-01  7:09           ` zhoucm1
       [not found]             ` <58B673C0.4070006-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: zhoucm1 @ 2017-03-01  7:09 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2017-03-01 14:52, zhoucm1 wrote:
>
>
> On 2017-03-01 06:14, Andres Rodriguez wrote:
>> Add a new context creation parameter to express a global context 
>> priority.
>>
>> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
>> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
>> (default) contexts.
>>
>> v2: Instead of using flags, repurpose __pad
>> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
>> v4: Validate usermode priority and store it
>>
>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41 
>> +++++++++++++++++++++++----
>>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
>>   include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
>>   4 files changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index e30c47e..366f6d3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>>       struct amd_sched_entity    entity;
>>   };
>>     struct amdgpu_ctx {
>>       struct kref        refcount;
>>       struct amdgpu_device    *adev;
>>       unsigned        reset_counter;
>>       spinlock_t        ring_lock;
>>       struct dma_fence    **fences;
>>       struct amdgpu_ctx_ring    rings[AMDGPU_MAX_RINGS];
>> +    int            priority;
>>       bool preamble_presented;
>>   };
>>     struct amdgpu_ctx_mgr {
>>       struct amdgpu_device    *adev;
>>       struct mutex        lock;
>>       /* protected by lock */
>>       struct idr        ctx_handles;
>>   };
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 400c66b..22a15d6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -18,47 +18,75 @@
>>    * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
>> OTHERWISE,
>>    * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 
>> USE OR
>>    * OTHER DEALINGS IN THE SOFTWARE.
>>    *
>>    * Authors: monk liu <monk.liu@amd.com>
>>    */
>>     #include <drm/drmP.h>
>>   #include "amdgpu.h"
>>   -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct 
>> amdgpu_ctx *ctx)
>> +static enum amd_sched_priority amdgpu_to_sched_priority(int 
>> amdgpu_priority)
>> +{
>> +    switch (amdgpu_priority) {
>> +    case AMDGPU_CTX_PRIORITY_HIGH:
>> +        return AMD_SCHED_PRIORITY_HIGH;
>> +    case AMDGPU_CTX_PRIORITY_NORMAL:
>> +        return AMD_SCHED_PRIORITY_NORMAL;
>> +    default:
>> +        WARN(1, "Invalid context priority %d\n", amdgpu_priority);
>> +        return AMD_SCHED_PRIORITY_NORMAL;
>> +    }
>> +}
>> +
>> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
>> +                uint32_t priority,
>> +                struct amdgpu_ctx *ctx)
>>   {
>>       unsigned i, j;
>>       int r;
>> +    enum amd_sched_priority sched_priority;
>> +
>> +    sched_priority = amdgpu_to_sched_priority(priority);
>> +
>> +    if (priority >= AMDGPU_CTX_PRIORITY_NUM)
>> +        return -EINVAL;
>> +
>> +    if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
>> +        return -EINVAL;
>> +
>> +    if (sched_priority == AMD_SCHED_PRIORITY_HIGH && 
>> !capable(CAP_SYS_ADMIN))
>> +        return -EACCES;
>>         memset(ctx, 0, sizeof(*ctx));
>>       ctx->adev = adev;
>> +    ctx->priority = priority;
> This seems to be unused in this patch.
I see ctx->priority is used in the following patches, so please move it there instead.

Regards,
David Zhou
>
> Regards,
> David Zhou
>>       kref_init(&ctx->refcount);
>>       spin_lock_init(&ctx->ring_lock);
>>       ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>>                     sizeof(struct dma_fence*), GFP_KERNEL);
>>       if (!ctx->fences)
>>           return -ENOMEM;
>>         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>           ctx->rings[i].sequence = 1;
>>           ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
>>       }
>>         ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>>         /* create context entity for each ring */
>>       for (i = 0; i < adev->num_rings; i++) {
>>           struct amdgpu_ring *ring = adev->rings[i];
>>           struct amd_sched_rq *rq;
>>   -        rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
>> +        rq = &ring->sched.sched_rq[sched_priority];
>>           r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
>>                         rq, amdgpu_sched_jobs);
>>           if (r)
>>               goto failed;
>>       }
>>         return 0;
>>     failed:
>>       for (j = 0; j < i; j++)
>> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>>       kfree(ctx->fences);
>>       ctx->fences = NULL;
>>         for (i = 0; i < adev->num_rings; i++)
>> amd_sched_entity_fini(&adev->rings[i]->sched,
>>                         &ctx->rings[i].entity);
>>   }
>>     static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>>                   struct amdgpu_fpriv *fpriv,
>> +                uint32_t priority,
>>                   uint32_t *id)
>>   {
>>       struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>>       struct amdgpu_ctx *ctx;
>>       int r;
>>         ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>>       if (!ctx)
>>           return -ENOMEM;
>>         mutex_lock(&mgr->lock);
>>       r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>>       if (r < 0) {
>>           mutex_unlock(&mgr->lock);
>>           kfree(ctx);
>>           return r;
>>       }
>> +
>>       *id = (uint32_t)r;
>> -    r = amdgpu_ctx_init(adev, ctx);
>> +    r = amdgpu_ctx_init(adev, priority, ctx);
>>       if (r) {
>>           idr_remove(&mgr->ctx_handles, *id);
>>           *id = 0;
>>           kfree(ctx);
>>       }
>>       mutex_unlock(&mgr->lock);
>>       return r;
>>   }
>>     static void amdgpu_ctx_do_release(struct kref *ref)
>> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct 
>> amdgpu_device *adev,
>>       ctx->reset_counter = reset_counter;
>>         mutex_unlock(&mgr->lock);
>>       return 0;
>>   }
>>     int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>>                struct drm_file *filp)
>>   {
>>       int r;
>> -    uint32_t id;
>> +    uint32_t id, priority;
>>         union drm_amdgpu_ctx *args = data;
>>       struct amdgpu_device *adev = dev->dev_private;
>>       struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>         r = 0;
>>       id = args->in.ctx_id;
>> +    priority = args->in.priority;
>>         switch (args->in.op) {
>>       case AMDGPU_CTX_OP_ALLOC_CTX:
>> -        r = amdgpu_ctx_alloc(adev, fpriv, &id);
>> +        r = amdgpu_ctx_alloc(adev, fpriv, priority, &id);
>>           args->out.alloc.ctx_id = id;
>>           break;
>>       case AMDGPU_CTX_OP_FREE_CTX:
>>           r = amdgpu_ctx_free(fpriv, id);
>>           break;
>>       case AMDGPU_CTX_OP_QUERY_STATE:
>>           r = amdgpu_ctx_query(adev, fpriv, id, &args->out);
>>           break;
>>       default:
>>           return -EINVAL;
>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h 
>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
>> index d8dc681..2e458de 100644
>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
>> @@ -101,20 +101,21 @@ static inline struct amd_sched_fence 
>> *to_amd_sched_fence(struct dma_fence *f)
>>   */
>>   struct amd_sched_backend_ops {
>>       struct dma_fence *(*dependency)(struct amd_sched_job *sched_job);
>>       struct dma_fence *(*run_job)(struct amd_sched_job *sched_job);
>>       void (*timedout_job)(struct amd_sched_job *sched_job);
>>       void (*free_job)(struct amd_sched_job *sched_job);
>>   };
>>     enum amd_sched_priority {
>>       AMD_SCHED_PRIORITY_KERNEL = 0,
>> +    AMD_SCHED_PRIORITY_HIGH,
>>       AMD_SCHED_PRIORITY_NORMAL,
>>       AMD_SCHED_MAX_PRIORITY
>>   };
>>     /**
>>    * One scheduler is implemented for each hardware ring
>>   */
>>   struct amd_gpu_scheduler {
>>       const struct amd_sched_backend_ops    *ops;
>>       uint32_t            hw_submission_limit;
>> diff --git a/include/uapi/drm/amdgpu_drm.h 
>> b/include/uapi/drm/amdgpu_drm.h
>> index b5ae774..b756599 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -153,27 +153,32 @@ union drm_amdgpu_bo_list {
>>     /* GPU reset status */
>>   #define AMDGPU_CTX_NO_RESET        0
>>   /* this the context caused it */
>>   #define AMDGPU_CTX_GUILTY_RESET        1
>>   /* some other context caused it */
>>   #define AMDGPU_CTX_INNOCENT_RESET    2
>>   /* unknown cause */
>>   #define AMDGPU_CTX_UNKNOWN_RESET    3
>>   +/* Context priority level */
>> +#define AMDGPU_CTX_PRIORITY_NORMAL    0
>> +#define AMDGPU_CTX_PRIORITY_HIGH    1
>> +#define AMDGPU_CTX_PRIORITY_NUM        2
>> +
>>   struct drm_amdgpu_ctx_in {
>>       /** AMDGPU_CTX_OP_* */
>>       __u32    op;
>>       /** For future use, no flags defined so far */
>>       __u32    flags;
>>       __u32    ctx_id;
>> -    __u32    _pad;
>> +    __u32    priority;
>>   };
>>     union drm_amdgpu_ctx_out {
>>           struct {
>>               __u32    ctx_id;
>>               __u32    _pad;
>>           } alloc;
>>             struct {
>>               /** For future use, no flags defined so far */
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings
       [not found]     ` <1488320089-22035-20-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-01  7:27       ` zhoucm1
       [not found]         ` <58B677DD.4070408-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: zhoucm1 @ 2017-03-01  7:27 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2017-03-01 06:14, Andres Rodriguez wrote:
> Add an initial framework for changing the HW priorities of rings. The
> framework allows requesting priority changes for the lifetime of an
> amdgpu_job. After the job completes the priority will decay to the next
> lowest priority for which a request is still valid.
>
> A new ring function set_priority() can now be populated to take care of
> the HW specific programming sequence for priority changes.
>
> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  4 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 10 ++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 71 ++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 11 +++++
>   5 files changed, 94 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 366f6d3..0676495 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -636,21 +636,21 @@ struct amdgpu_flip_work {
>   struct amdgpu_ib {
>   	struct amdgpu_sa_bo		*sa_bo;
>   	uint32_t			length_dw;
>   	uint64_t			gpu_addr;
>   	uint32_t			*ptr;
>   	uint32_t			flags;
>   };
>   
>   extern const struct amd_sched_backend_ops amdgpu_sched_ops;
>   
> -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int priority,
>   		     struct amdgpu_job **job, struct amdgpu_vm *vm);
>   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>   			     struct amdgpu_job **job);
>   
>   void amdgpu_job_free_resources(struct amdgpu_job *job);
>   void amdgpu_job_free(struct amdgpu_job *job);
>   int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring *ring,
>   		      struct amd_sched_entity *entity, void *owner,
>   		      struct dma_fence **f);
>   
> @@ -990,20 +990,22 @@ struct amdgpu_cs_parser {
>   #define AMDGPU_VM_DOMAIN                    (1 << 3) /* bit set means in virtual memory context */
>   
>   struct amdgpu_job {
>   	struct amd_sched_job    base;
>   	struct amdgpu_device	*adev;
>   	struct amdgpu_vm	*vm;
>   	struct amdgpu_ring	*ring;
>   	struct amdgpu_sync	sync;
>   	struct amdgpu_ib	*ibs;
>   	struct dma_fence	*fence; /* the hw fence */
> +	struct dma_fence_cb	cb;
> +	int			priority;
>   	uint32_t		preamble_status;
>   	uint32_t		num_ibs;
>   	void			*owner;
>   	uint64_t		fence_ctx; /* the fence_context this job uses */
>   	bool                    vm_needs_flush;
>   	unsigned		vm_id;
>   	uint64_t		vm_pd_addr;
>   	uint32_t		gds_base, gds_size;
>   	uint32_t		gws_base, gws_size;
>   	uint32_t		oa_base, oa_size;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 605d40e..19ce202 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -179,21 +179,21 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)
>   
>   		case AMDGPU_CHUNK_ID_DEPENDENCIES:
>   			break;
>   
>   		default:
>   			ret = -EINVAL;
>   			goto free_partial_kdata;
>   		}
>   	}
>   
> -	ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
> +	ret = amdgpu_job_alloc(p->adev, num_ibs, p->ctx->priority, &p->job, vm);
>   	if (ret)
>   		goto free_all_kdata;
>   
>   	if (p->uf_entry.robj)
>   		p->job->uf_addr = uf_offset;
>   	kfree(chunk_array);
>   	return 0;
>   
>   free_all_kdata:
>   	i = p->nchunks - 1;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 86a1242..45b3c90 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -32,50 +32,51 @@ static void amdgpu_job_timedout(struct amd_sched_job *s_job)
>   {
>   	struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);
>   
>   	DRM_ERROR("ring %s timeout, last signaled seq=%u, last emitted seq=%u\n",
>   		  job->base.sched->name,
>   		  atomic_read(&job->ring->fence_drv.last_seq),
>   		  job->ring->fence_drv.sync_seq);
>   	amdgpu_gpu_reset(job->adev);
>   }
>   
> -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int priority,
>   		     struct amdgpu_job **job, struct amdgpu_vm *vm)
>   {
>   	size_t size = sizeof(struct amdgpu_job);
>   
>   	if (num_ibs == 0)
>   		return -EINVAL;
>   
>   	size += sizeof(struct amdgpu_ib) * num_ibs;
>   
>   	*job = kzalloc(size, GFP_KERNEL);
>   	if (!*job)
>   		return -ENOMEM;
>   
>   	(*job)->adev = adev;
>   	(*job)->vm = vm;
> +	(*job)->priority = priority;
>   	(*job)->ibs = (void *)&(*job)[1];
>   	(*job)->num_ibs = num_ibs;
>   
>   	amdgpu_sync_create(&(*job)->sync);
>   
>   	return 0;
>   }
>   
>   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>   			     struct amdgpu_job **job)
>   {
>   	int r;
>   
> -	r = amdgpu_job_alloc(adev, 1, job, NULL);
> +	r = amdgpu_job_alloc(adev, 1, AMDGPU_CTX_PRIORITY_NORMAL, job, NULL);
>   	if (r)
>   		return r;
>   
>   	r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
>   	if (r)
>   		kfree(*job);
>   
>   	return r;
>   }
>   
> @@ -170,20 +171,25 @@ static struct dma_fence *amdgpu_job_run(struct amd_sched_job *sched_job)
>   	BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
>   
>   	trace_amdgpu_sched_run_job(job);
>   	r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job, &fence);
>   	if (r)
>   		DRM_ERROR("Error scheduling IBs (%d)\n", r);
>   
>   	/* if gpu reset, hw fence will be replaced here */
>   	dma_fence_put(job->fence);
>   	job->fence = dma_fence_get(fence);
> +
> +	r = amdgpu_ring_elevate_priority(job->ring, job->priority, job);
> +	if (r)
> +		DRM_ERROR("Failed to set job priority (%d)\n", r);
Elevating the ring priority here is meant for this job, right?
You're setting the priority via MMIO and restoring it from a fence
callback, so the method isn't precise. A few comments:
1. It seems we should call elevate_priority before amdgpu_ib_schedule,
so that the ring is at the proper priority for this submission.
2. Can we put the SPI register writes into the ring buffer as part of
the command frame? The elevate_priority setting would go at the front
of the frame and the restore at the end (rough sketch below).
3. If the priority doesn't change, we can skip the priority programming
for that submission.
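
Something along these lines, using the emit_wreg hook - just a sketch,
assuming emit_wreg is wired up for compute rings; mmSPI_PRIO_CTRL and
to_spi_priority() are placeholders for whatever SPI register(s) and
encoding are actually needed:

    /* emitted at the start of the frame, before the IBs */
    static void gfx_v8_0_ring_emit_priority(struct amdgpu_ring *ring,
                                            int priority)
    {
            /* executed by the CP when it reaches this point in the
             * ring, so it takes effect exactly for the packets that
             * follow */
            ring->funcs->emit_wreg(ring, mmSPI_PRIO_CTRL,
                                   to_spi_priority(priority));
    }

A matching emit at the end of the frame would restore the previous
priority, and both can be skipped when the requested priority already
matches ring->priority.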

Regards,
David Zhou
> +
>   	amdgpu_job_free_resources(job);
>   	return fence;
>   }
>   
>   const struct amd_sched_backend_ops amdgpu_sched_ops = {
>   	.dependency = amdgpu_job_dependency,
>   	.run_job = amdgpu_job_run,
>   	.timedout_job = amdgpu_job_timedout,
>   	.free_job = amdgpu_job_free_cb
>   };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 80cb051..12bc7a9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -192,20 +192,89 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
>    * Reset the driver's copy of the wptr (all asics).
>    */
>   void amdgpu_ring_undo(struct amdgpu_ring *ring)
>   {
>   	ring->wptr = ring->wptr_old;
>   
>   	if (ring->funcs->end_use)
>   		ring->funcs->end_use(ring);
>   }
>   
> +static void amdgpu_ring_restore_priority_cb(struct dma_fence *f,
> +					    struct dma_fence_cb *cb)
> +{
> +	int i;
> +	struct amdgpu_job *cb_job =
> +		container_of(cb, struct amdgpu_job, cb);
> +	struct amdgpu_ring *ring = cb_job->ring;
> +
> +	spin_lock(&ring->priority_lock);
> +
> +	/* remove ourselves from the list if necessary */
> +	if (cb_job == ring->last_job[cb_job->priority])
> +		ring->last_job[cb_job->priority] = NULL;
> +
> +	/* something higher prio is executing, no need to decay */
> +	if (ring->priority > cb_job->priority)
> +		goto out_unlock;
> +
> +	/* decay priority to the next level with a job available */
> +	for (i = cb_job->priority; i >= 0; i--) {
> +		if (i == AMDGPU_CTX_PRIORITY_NORMAL || ring->last_job[i]) {
> +			ring->priority = i;
> +			if (ring->funcs->set_priority)
> +				ring->funcs->set_priority(ring, i);
> +
> +			break;
> +		}
> +	}
> +
> +out_unlock:
> +	spin_unlock(&ring->priority_lock);
> +}
> +
> +/**
> + * amdgpu_ring_elevate_priority - change the ring's priority
> + *
> + * @ring: amdgpu_ring structure holding the information
> + * @priority: target priority
> + * @job: priority should remain elevated for the duration of this job
> + *
> + * Use HW specific mechanism's to elevate the ring's priority while @job
> + * is executing. Once @job finishes executing, the ring will reset back
> + * to normal priority.
> + * Returns 0 on success, error otherwise
> + */
> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
> +				 struct amdgpu_job *job)
> +{
> +	if (priority < 0 || priority >= AMDGPU_CTX_PRIORITY_NUM)
> +		return -EINVAL;
> +
> +	spin_lock(&ring->priority_lock);
> +	ring->last_job[priority] = job;
> +
> +	if (priority <= ring->priority)
> +		goto out_unlock;
> +
> +	ring->priority = priority;
> +	if (ring->funcs->set_priority)
> +		ring->funcs->set_priority(ring, priority);
> +
> +	dma_fence_add_callback(job->fence, &job->cb,
> +			       amdgpu_ring_restore_priority_cb);
> +
> +out_unlock:
> +	spin_unlock(&ring->priority_lock);
> +	return 0;
> +}
> +
>   /**
>    * amdgpu_ring_init - init driver ring struct.
>    *
>    * @adev: amdgpu_device pointer
>    * @ring: amdgpu_ring structure holding ring information
>    * @max_ndw: maximum number of dw for ring alloc
>    * @nop: nop packet for this ring
>    *
>    * Initialize the driver information for the selected ring (all asics).
>    * Returns 0 on success, error on failure.
> @@ -275,20 +344,22 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>   					    (void **)&ring->ring);
>   		if (r) {
>   			dev_err(adev->dev, "(%d) ring create failed\n", r);
>   			return r;
>   		}
>   		memset((void *)ring->ring, 0, ring->ring_size);
>   	}
>   	ring->ptr_mask = (ring->ring_size / 4) - 1;
>   	ring->max_dw = max_dw;
>   	ring->hw_ip = hw_ip;
> +	ring->priority = AMDGPU_CTX_PRIORITY_NORMAL;
> +	spin_lock_init(&ring->priority_lock);
>   	INIT_LIST_HEAD(&ring->lru_list);
>   	amdgpu_ring_lru_touch(adev, ring);
>   
>   	if (amdgpu_debugfs_ring_init(adev, ring)) {
>   		DRM_ERROR("Failed to register debugfs file for rings !\n");
>   	}
>   	return 0;
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index ecdd87c..befc29f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -17,20 +17,21 @@
>    * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>    * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>    * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>    * OTHER DEALINGS IN THE SOFTWARE.
>    *
>    * Authors: Christian König
>    */
>   #ifndef __AMDGPU_RING_H__
>   #define __AMDGPU_RING_H__
>   
> +#include <drm/amdgpu_drm.h>
>   #include "gpu_scheduler.h"
>   
>   /* max number of rings */
>   #define AMDGPU_MAX_RINGS		16
>   #define AMDGPU_MAX_GFX_RINGS		1
>   #define AMDGPU_MAX_COMPUTE_RINGS	8
>   #define AMDGPU_MAX_VCE_RINGS		3
>   
>   /* some special values for the owner field */
>   #define AMDGPU_FENCE_OWNER_UNDEFINED	((void*)0ul)
> @@ -130,20 +131,22 @@ struct amdgpu_ring_funcs {
>   	void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>   	unsigned (*init_cond_exec)(struct amdgpu_ring *ring);
>   	void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned offset);
>   	/* note usage for clock and power gating */
>   	void (*begin_use)(struct amdgpu_ring *ring);
>   	void (*end_use)(struct amdgpu_ring *ring);
>   	void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>   	void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>   	void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>   	void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
> +	/* priority functions */
> +	void (*set_priority) (struct amdgpu_ring *ring, int priority);
>   };
>   
>   struct amdgpu_ring {
>   	struct amdgpu_device		*adev;
>   	const struct amdgpu_ring_funcs	*funcs;
>   	struct amdgpu_fence_driver	fence_drv;
>   	struct amd_gpu_scheduler	sched;
>   	struct list_head		lru_list;
>   
>   	struct amdgpu_bo	*ring_obj;
> @@ -165,31 +168,39 @@ struct amdgpu_ring {
>   	struct amdgpu_bo	*mqd_obj;
>   	u32			doorbell_index;
>   	bool			use_doorbell;
>   	unsigned		wptr_offs;
>   	unsigned		fence_offs;
>   	uint64_t		current_ctx;
>   	char			name[16];
>   	unsigned		cond_exe_offs;
>   	u64			cond_exe_gpu_addr;
>   	volatile u32		*cond_exe_cpu_addr;
> +
> +	spinlock_t		priority_lock;
> +	/* protected by priority_lock */
> +	struct amdgpu_job 	*last_job[AMDGPU_CTX_PRIORITY_NUM];
> +	int			priority;
> +
>   #if defined(CONFIG_DEBUG_FS)
>   	struct dentry *ent;
>   #endif
>   };
>   
>   int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
>   			       int hw_ip, int ring);
>   int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
>   void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
>   void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>   void amdgpu_ring_commit(struct amdgpu_ring *ring);
>   void amdgpu_ring_undo(struct amdgpu_ring *ring);
> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
> +				 struct amdgpu_job *job);
>   int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>   		     int hw_ip, unsigned ring_size,
>   		     struct amdgpu_irq_src *irq_src, unsigned irq_type);
>   void amdgpu_ring_fini(struct amdgpu_ring *ring);
>   int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
>   			struct amdgpu_ring **ring);
>   void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct amdgpu_ring *ring);
>   
>   #endif

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Add support for high priority scheduling in amdgpu
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (21 preceding siblings ...)
  2017-02-28 22:14   ` [PATCH 22/22] drm/amdgpu: workaround tonga HW bug in HQD " Andres Rodriguez
@ 2017-03-01 11:42   ` Christian König
       [not found]     ` <25194b1a-4756-e1ad-f597-17063a14eb4c-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  2017-03-01 16:14   ` Bridgman, John
  23 siblings, 1 reply; 40+ messages in thread
From: Christian König @ 2017-03-01 11:42 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Patches #1-#14 are Acked-by: Christian König <christian.koenig@amd.com>.

Patch #15:

Not sure if that is a good idea or not; I need to take a closer look
after digging through the rest.

In general the HW IP is just for the IOCTL API and not for internal use 
inside the driver.

Patch #16:

Really nice :) I don't have time to look into it in detail, but you have
one misconception I'd like to point out:
> The queue manager maintains a per-file descriptor map of user ring ids
> to amdgpu_ring pointers. Once a map is created it is permanent (this is
> required to maintain FIFO execution guarantees for a ring).
Actually we don't have a FIFO execution guarantee per ring. We only have 
that per context.

E.g. commands from different contexts can execute at the same time and
out of order.

Making this per file is ok for now, but you should keep in mind that we 
might want to change that sooner or later.

Patches #17 & #18: I need to take a closer look when I have more time,
but the comments from others sounded valid to me as well.

Patch #19: Raising and lowering the priority of a ring during command 
submission doesn't sound like a good idea to me.

The way you currently have it implemented would also raise the priority 
of already running jobs on the same ring. Keep in mind that everything 
is pipelined here.

Additionally, you can't have a fence callback in the job structure,
because the job structure is freed by the same fence as well. So it can
happen that you access freed memory (if only for a very short period
of time).
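
One way out - just a sketch - is to allocate the callback data
separately so the callback never touches the job; amdgpu_prio_cb,
restore_cb and the field names here are made up:

    struct amdgpu_prio_cb {
            struct dma_fence_cb     cb;
            struct amdgpu_ring      *ring;
            int                     priority;
    };

    /* at elevate time, allocated outside ring->priority_lock */
    struct amdgpu_prio_cb *pcb = kzalloc(sizeof(*pcb), GFP_KERNEL);
    if (!pcb)
            return -ENOMEM;
    pcb->ring = ring;
    pcb->priority = priority;
    if (dma_fence_add_callback(job->fence, &pcb->cb, restore_cb))
            kfree(pcb); /* fence already signaled, restore right away */

restore_cb() then works only on its own amdgpu_prio_cb and frees it, so
the job structure can be released whenever its fence pleases.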

Patches #20-#22 are Acked-by: Christian König <christian.koenig@amd.com>.

Regards,
Christian.

On 28.02.2017 23:14, Andres Rodriguez wrote:
> This patch series introduces a mechanism that allows users with sufficient
> privileges to categorize their work as "high priority". A userspace app can
> create a high priority amdgpu context, where any work submitted to this context
> will receive preferential treatment over any other work.
>
> High priority contexts will be scheduled ahead of other contexts by the sw gpu
> scheduler. This functionality is generic for all HW blocks.
>
> Optionally, a ring can implement a set_priority() function that allows
> programming HW specific features to elevate a ring's priority.
>
> This patch series implements set_priority() for gfx8 compute rings. It takes
> advantage of SPI scheduling and CU reservation to provide improved frame
> latencies for high priority contexts.
>
> For compute + compute scenarios we get near perfect scheduling latency. E.g.
> one high priority ComputeParticles + one low priority ComputeParticles:
>      - High priority ComputeParticles: 2.0-2.6 ms/frame
>      - Regular ComputeParticles: 35.2-68.5 ms/frame
>
> For compute + gfx scenarios the high priority compute application does
> experience some latency variance. However, the variance has smaller bounds and
> a smaller deviation than without high priority scheduling.
>
> Following is a graph of the frame time experienced by a high priority compute
> app in 4 different scenarios to exemplify the compute + gfx latency variance:
>      - ComputeParticles: this scenario involves running the compute particles
>        sample on its own.
>      - +SSAO: Previous scenario with the addition of running the ssao sample
>        application that clogs the GFX ring with constant work.
>      - +SPI Priority: Previous scenario with the addition of SPI priority
>        programming for compute rings.
>      - +CU Reserve: Previous scenario with the addition of dynamic CU
>        reservation for compute rings.
>
> Graph link:
> https://plot.ly/~lostgoat/9/
>
> As seen above, high priority contexts for compute allow us to schedule work
> with enhanced confidence of completion latency under high GPU loads. This
> property will be important for VR reprojection workloads.
>
> Note: The first part of this series is a resend of "Change queue/pipe split
> between amdkfd and amdgpu" with the following changes:
>      - Fixed kfdtest on Kaveri due to shift overflow. Refer to: "drm/amdkfd: allow
>        split HQD on per-queue granularity v3"
>      - Used Felix's suggestions for a simplified HQD programming sequence
>      - Added a workaround for a Tonga HW bug during HQD programming
>
> This series is also available at:
> https://github.com/lostgoat/linux/tree/wip-high-priority
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found]             ` <58B673C0.4070006-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-01 11:51               ` Emil Velikov
  0 siblings, 0 replies; 40+ messages in thread
From: Emil Velikov @ 2017-03-01 11:51 UTC (permalink / raw)
  To: zhoucm1; +Cc: Andres Rodriguez, amd-gfx mailing list

Hi David,

On 1 March 2017 at 07:09, zhoucm1 <david1.zhou@amd.com> wrote:

>>> +    ctx->priority = priority;
>>
>> seems not used.
>
> I see ctx->priority is used in following patches, so pls remove it there.
>
Fwiw, I don't think that's a good idea.

Most places in the kernel are OK if you add plumbing with patch X and
use it with X+1. Moving things there will make for a big patch that is
harder/more annoying to review properly.
IMHO it's perfectly fine to add a note in the commit message -
"currently prio is propagated through the driver and the next
commit(s) will make use of it", right?

-Emil
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 15/22] drm/amdgpu: add hw_ip member to amdgpu_ring
       [not found]     ` <1488320089-22035-16-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-01 15:33       ` Alex Deucher
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Deucher @ 2017-03-01 15:33 UTC (permalink / raw)
  To: Andres Rodriguez; +Cc: amd-gfx list

On Tue, Feb 28, 2017 at 5:14 PM, Andres Rodriguez <andresx7@gmail.com> wrote:
> Keep track of a ring's HW IP block so we can identify it later.
>

I think this patch can be dropped.  We already store the ring type in
ring->funcs->type.  We also shouldn't expose KIQ as a type to
userspace.
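
E.g. wherever you need the IOCTL IP type you can derive it from the
ring type instead - rough, untested mapping:

    switch (ring->funcs->type) {
    case AMDGPU_RING_TYPE_GFX:
            hw_ip = AMDGPU_HW_IP_GFX;
            break;
    case AMDGPU_RING_TYPE_COMPUTE:
            hw_ip = AMDGPU_HW_IP_COMPUTE;
            break;
    case AMDGPU_RING_TYPE_SDMA:
            hw_ip = AMDGPU_HW_IP_DMA;
            break;
    /* UVD/VCE likewise; KIQ deliberately gets no userspace IP type */
    default:
            return -EINVAL;
    }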

Alex


> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +++--
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c    | 2 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c    | 4 ++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c    | 4 ++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c    | 8 ++++----
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   | 2 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 2 +-
>  drivers/gpu/drm/amd/amdgpu/si_dma.c      | 2 +-
>  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c    | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c    | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c    | 3 ++-
>  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c    | 2 +-
>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c    | 3 ++-
>  include/uapi/drm/amdgpu_drm.h            | 3 ++-
>  15 files changed, 28 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 7c842b7..4ff762c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -157,21 +157,21 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
>   *
>   * @adev: amdgpu_device pointer
>   * @ring: amdgpu_ring structure holding ring information
>   * @max_ndw: maximum number of dw for ring alloc
>   * @nop: nop packet for this ring
>   *
>   * Initialize the driver information for the selected ring (all asics).
>   * Returns 0 on success, error on failure.
>   */
>  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
> -                    unsigned max_dw, struct amdgpu_irq_src *irq_src,
> +                    int hw_ip, unsigned max_dw, struct amdgpu_irq_src *irq_src,
>                      unsigned irq_type)
>  {
>         int r;
>
>         if (ring->adev == NULL) {
>                 if (adev->num_rings >= AMDGPU_MAX_RINGS)
>                         return -EINVAL;
>
>                 ring->adev = adev;
>                 ring->idx = adev->num_rings++;
> @@ -227,20 +227,21 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
>                                             &ring->gpu_addr,
>                                             (void **)&ring->ring);
>                 if (r) {
>                         dev_err(adev->dev, "(%d) ring create failed\n", r);
>                         return r;
>                 }
>                 memset((void *)ring->ring, 0, ring->ring_size);
>         }
>         ring->ptr_mask = (ring->ring_size / 4) - 1;
>         ring->max_dw = max_dw;
> +       ring->hw_ip = hw_ip;
>
>         if (amdgpu_debugfs_ring_init(adev, ring)) {
>                 DRM_ERROR("Failed to register debugfs file for rings !\n");
>         }
>         return 0;
>  }
>
>  /**
>   * amdgpu_ring_fini - tear down the driver ring struct.
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 2345b398..3ff021f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -150,20 +150,21 @@ struct amdgpu_ring {
>         unsigned                rptr_offs;
>         unsigned                wptr;
>         unsigned                wptr_old;
>         unsigned                ring_size;
>         unsigned                max_dw;
>         int                     count_dw;
>         uint64_t                gpu_addr;
>         uint32_t                ptr_mask;
>         bool                    ready;
>         u32                     idx;
> +       u32                     hw_ip;
>         u32                     me;
>         u32                     pipe;
>         u32                     queue;
>         struct amdgpu_bo        *mqd_obj;
>         u32                     doorbell_index;
>         bool                    use_doorbell;
>         unsigned                wptr_offs;
>         unsigned                fence_offs;
>         uint64_t                current_ctx;
>         char                    name[16];
> @@ -174,15 +175,15 @@ struct amdgpu_ring {
>         struct dentry *ent;
>  #endif
>  };
>
>  int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
>  void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
>  void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>  void amdgpu_ring_commit(struct amdgpu_ring *ring);
>  void amdgpu_ring_undo(struct amdgpu_ring *ring);
>  int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
> -                    unsigned ring_size, struct amdgpu_irq_src *irq_src,
> -                    unsigned irq_type);
> +                    int hw_ip, unsigned ring_size,
> +                    struct amdgpu_irq_src *irq_src, unsigned irq_type);
>  void amdgpu_ring_fini(struct amdgpu_ring *ring);
>
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index 810bba5..64b6cb7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -935,21 +935,21 @@ static int cik_sdma_sw_init(void *handle)
>
>         /* SDMA Privileged inst */
>         r = amdgpu_irq_add_id(adev, 247, &adev->sdma.illegal_inst_irq);
>         if (r)
>                 return r;
>
>         for (i = 0; i < adev->sdma.num_instances; i++) {
>                 ring = &adev->sdma.instance[i].ring;
>                 ring->ring_obj = NULL;
>                 sprintf(ring->name, "sdma%d", i);
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
>                                      &adev->sdma.trap_irq,
>                                      (i == 0) ?
>                                      AMDGPU_SDMA_IRQ_TRAP0 :
>                                      AMDGPU_SDMA_IRQ_TRAP1);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> index 2086e7e..09ed842 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
> @@ -3261,21 +3261,21 @@ static int gfx_v6_0_sw_init(void *handle)
>         r = gfx_v6_0_rlc_init(adev);
>         if (r) {
>                 DRM_ERROR("Failed to init rlc BOs!\n");
>                 return r;
>         }
>
>         for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
>                 ring = &adev->gfx.gfx_ring[i];
>                 ring->ring_obj = NULL;
>                 sprintf(ring->name, "gfx");
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_GFX, 1024,
>                                      &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
>                 if (r)
>                         return r;
>         }
>
>         for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>                 unsigned irq_type;
>
>                 if ((i >= 32) || (i >= AMDGPU_MAX_COMPUTE_RINGS)) {
>                         DRM_ERROR("Too many (%d) compute rings!\n", i);
> @@ -3283,21 +3283,21 @@ static int gfx_v6_0_sw_init(void *handle)
>                 }
>                 ring = &adev->gfx.compute_ring[i];
>                 ring->ring_obj = NULL;
>                 ring->use_doorbell = false;
>                 ring->doorbell_index = 0;
>                 ring->me = 1;
>                 ring->pipe = i;
>                 ring->queue = i;
>                 sprintf(ring->name, "comp %d.%d.%d", ring->me, ring->pipe, ring->queue);
>                 irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP + ring->pipe;
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_COMPUTE, 1024,
>                                      &adev->gfx.eop_irq, irq_type);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
>
>  static int gfx_v6_0_sw_fini(void *handle)
>  {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index b0b0c89..c76dcc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> @@ -4742,21 +4742,21 @@ static int gfx_v7_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
>         ring->ring_obj = NULL;
>         ring->use_doorbell = true;
>         ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
>         sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
>
>         irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
>                 + ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
>                 + ring->pipe;
>
>         /* type-2 packets are deprecated on MEC, use type-3 instead */
> -       r = amdgpu_ring_init(adev, ring, 1024,
> +       r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_COMPUTE, 1024,
>                         &adev->gfx.eop_irq, irq_type);
>         if (r)
>                 return r;
>
>
>         return 0;
>  }
>
>  static int gfx_v7_0_sw_init(void *handle)
>  {
> @@ -4797,21 +4797,21 @@ static int gfx_v7_0_sw_init(void *handle)
>         r = gfx_v7_0_mec_init(adev);
>         if (r) {
>                 DRM_ERROR("Failed to init MEC BOs!\n");
>                 return r;
>         }
>
>         for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
>                 ring = &adev->gfx.gfx_ring[i];
>                 ring->ring_obj = NULL;
>                 sprintf(ring->name, "gfx");
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_GFX, 1024,
>                                      &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
>                 if (r)
>                         return r;
>         }
>
>         /* set up the compute queues - allocate horizontally across pipes */
>         ring_id = 0;
>         for (i = 0; i < adev->gfx.mec.num_pipe_per_mec; ++i) {
>                 for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
>                         for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 5db5bac..a778d58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -1392,21 +1392,21 @@ static int gfx_v8_0_kiq_init_ring(struct amdgpu_device *adev,
>                 ring->me = 2;
>                 ring->pipe = 0;
>         } else {
>                 ring->me = 1;
>                 ring->pipe = 1;
>         }
>
>         irq->data = ring;
>         ring->queue = 0;
>         sprintf(ring->name, "kiq %d.%d.%d", ring->me, ring->pipe, ring->queue);
> -       r = amdgpu_ring_init(adev, ring, 1024,
> +       r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_KIQ, 1024,
>                              irq, AMDGPU_CP_KIQ_IRQ_DRIVER0);
>         if (r)
>                 dev_warn(adev->dev, "(%d) failed to init kiq ring\n", r);
>
>         return r;
>  }
>
>  static void gfx_v8_0_kiq_free_ring(struct amdgpu_ring *ring,
>                                    struct amdgpu_irq_src *irq)
>  {
> @@ -2139,21 +2139,21 @@ static int gfx_v8_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
>         ring->ring_obj = NULL;
>         ring->use_doorbell = true;
>         ring->doorbell_index = AMDGPU_DOORBELL_MEC_RING0 + ring_id;
>         sprintf(ring->name, "comp_%d.%d.%d", ring->me, ring->pipe, ring->queue);
>
>         irq_type = AMDGPU_CP_IRQ_COMPUTE_MEC1_PIPE0_EOP
>                 + ((ring->me - 1) * adev->gfx.mec.num_pipe_per_mec)
>                 + ring->pipe;
>
>         /* type-2 packets are deprecated on MEC, use type-3 instead */
> -       r = amdgpu_ring_init(adev, ring, 1024,
> +       r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_COMPUTE, 1024,
>                         &adev->gfx.eop_irq, irq_type);
>         if (r)
>                 return r;
>
>
>         return 0;
>  }
>
>  static int gfx_v8_0_sw_init(void *handle)
>  {
> @@ -2219,22 +2219,22 @@ static int gfx_v8_0_sw_init(void *handle)
>         for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
>                 ring = &adev->gfx.gfx_ring[i];
>                 ring->ring_obj = NULL;
>                 sprintf(ring->name, "gfx");
>                 /* no gfx doorbells on iceland */
>                 if (adev->asic_type != CHIP_TOPAZ) {
>                         ring->use_doorbell = true;
>                         ring->doorbell_index = AMDGPU_DOORBELL_GFX_RING0;
>                 }
>
> -               r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
> -                                    AMDGPU_CP_IRQ_GFX_EOP);
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_GFX, 1024,
> +                                    &adev->gfx.eop_irq, AMDGPU_CP_IRQ_GFX_EOP);
>                 if (r)
>                         return r;
>         }
>
>         /* set up the compute queues - allocate horizontally across pipes */
>         ring_id = 0;
>         for (i = 0; i < adev->gfx.mec.num_pipe_per_mec; ++i) {
>                 for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
>                         for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index 896be64..62c3461 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -941,21 +941,21 @@ static int sdma_v2_4_sw_init(void *handle)
>         if (r) {
>                 DRM_ERROR("Failed to load sdma firmware!\n");
>                 return r;
>         }
>
>         for (i = 0; i < adev->sdma.num_instances; i++) {
>                 ring = &adev->sdma.instance[i].ring;
>                 ring->ring_obj = NULL;
>                 ring->use_doorbell = false;
>                 sprintf(ring->name, "sdma%d", i);
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
>                                      &adev->sdma.trap_irq,
>                                      (i == 0) ?
>                                      AMDGPU_SDMA_IRQ_TRAP0 :
>                                      AMDGPU_SDMA_IRQ_TRAP1);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 31375bd..7467a1e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -1159,21 +1159,21 @@ static int sdma_v3_0_sw_init(void *handle)
>         }
>
>         for (i = 0; i < adev->sdma.num_instances; i++) {
>                 ring = &adev->sdma.instance[i].ring;
>                 ring->ring_obj = NULL;
>                 ring->use_doorbell = true;
>                 ring->doorbell_index = (i == 0) ?
>                         AMDGPU_DOORBELL_sDMA_ENGINE0 : AMDGPU_DOORBELL_sDMA_ENGINE1;
>
>                 sprintf(ring->name, "sdma%d", i);
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
>                                      &adev->sdma.trap_irq,
>                                      (i == 0) ?
>                                      AMDGPU_SDMA_IRQ_TRAP0 :
>                                      AMDGPU_SDMA_IRQ_TRAP1);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> index 3372a07..64d22d2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/si_dma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c
> @@ -523,21 +523,21 @@ static int si_dma_sw_init(void *handle)
>         /* DMA1 trap event */
>         r = amdgpu_irq_add_id(adev, 244, &adev->sdma.trap_irq_1);
>         if (r)
>                 return r;
>
>         for (i = 0; i < adev->sdma.num_instances; i++) {
>                 ring = &adev->sdma.instance[i].ring;
>                 ring->ring_obj = NULL;
>                 ring->use_doorbell = false;
>                 sprintf(ring->name, "sdma%d", i);
> -               r = amdgpu_ring_init(adev, ring, 1024,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_DMA, 1024,
>                                      &adev->sdma.trap_irq,
>                                      (i == 0) ?
>                                      AMDGPU_SDMA_IRQ_TRAP0 :
>                                      AMDGPU_SDMA_IRQ_TRAP1);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> index b34cefc..9df30ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
> @@ -114,21 +114,22 @@ static int uvd_v4_2_sw_init(void *handle)
>         r = amdgpu_uvd_sw_init(adev);
>         if (r)
>                 return r;
>
>         r = amdgpu_uvd_resume(adev);
>         if (r)
>                 return r;
>
>         ring = &adev->uvd.ring;
>         sprintf(ring->name, "uvd");
> -       r = amdgpu_ring_init(adev, ring, 512, &adev->uvd.irq, 0);
> +       r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_UVD, 512,
> +                            &adev->uvd.irq, 0);
>
>         return r;
>  }
>
>  static int uvd_v4_2_sw_fini(void *handle)
>  {
>         int r;
>         struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>         r = amdgpu_uvd_suspend(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> index ad8c02e..9b4017f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
> @@ -110,21 +110,22 @@ static int uvd_v5_0_sw_init(void *handle)
>         r = amdgpu_uvd_sw_init(adev);
>         if (r)
>                 return r;
>
>         r = amdgpu_uvd_resume(adev);
>         if (r)
>                 return r;
>
>         ring = &adev->uvd.ring;
>         sprintf(ring->name, "uvd");
> -       r = amdgpu_ring_init(adev, ring, 512, &adev->uvd.irq, 0);
> +       r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_UVD, 512,
> +                            &adev->uvd.irq, 0);
>
>         return r;
>  }
>
>  static int uvd_v5_0_sw_fini(void *handle)
>  {
>         int r;
>         struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>         r = amdgpu_uvd_suspend(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index 18a6de4..de9cce1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -113,21 +113,22 @@ static int uvd_v6_0_sw_init(void *handle)
>         r = amdgpu_uvd_sw_init(adev);
>         if (r)
>                 return r;
>
>         r = amdgpu_uvd_resume(adev);
>         if (r)
>                 return r;
>
>         ring = &adev->uvd.ring;
>         sprintf(ring->name, "uvd");
> -       r = amdgpu_ring_init(adev, ring, 512, &adev->uvd.irq, 0);
> +       r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_UVD, 512,
> +                            &adev->uvd.irq, 0);
>
>         return r;
>  }
>
>  static int uvd_v6_0_sw_fini(void *handle)
>  {
>         int r;
>         struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>         r = amdgpu_uvd_suspend(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> index 9ea9934..38cd52b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
> @@ -438,21 +438,21 @@ static int vce_v2_0_sw_init(void *handle)
>         if (r)
>                 return r;
>
>         r = amdgpu_vce_resume(adev);
>         if (r)
>                 return r;
>
>         for (i = 0; i < adev->vce.num_rings; i++) {
>                 ring = &adev->vce.ring[i];
>                 sprintf(ring->name, "vce%d", i);
> -               r = amdgpu_ring_init(adev, ring, 512,
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_VCE, 512,
>                                      &adev->vce.irq, 0);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
>
>  static int vce_v2_0_sw_fini(void *handle)
>  {
> diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> index 93ec881..09d04e1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
> @@ -396,21 +396,22 @@ static int vce_v3_0_sw_init(void *handle)
>         if (adev->vce.fw_version < FW_52_8_3)
>                 adev->vce.num_rings = 2;
>
>         r = amdgpu_vce_resume(adev);
>         if (r)
>                 return r;
>
>         for (i = 0; i < adev->vce.num_rings; i++) {
>                 ring = &adev->vce.ring[i];
>                 sprintf(ring->name, "vce%d", i);
> -               r = amdgpu_ring_init(adev, ring, 512, &adev->vce.irq, 0);
> +               r = amdgpu_ring_init(adev, ring, AMDGPU_HW_IP_VCE, 512,
> +                                    &adev->vce.irq, 0);
>                 if (r)
>                         return r;
>         }
>
>         return r;
>  }
>
>  static int vce_v3_0_sw_fini(void *handle)
>  {
>         int r;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 5797283..b5ae774 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -376,21 +376,22 @@ struct drm_amdgpu_gem_va {
>         __u64 offset_in_bo;
>         /** Specify mapping size. Must be correctly aligned. */
>         __u64 map_size;
>  };
>
>  #define AMDGPU_HW_IP_GFX          0
>  #define AMDGPU_HW_IP_COMPUTE      1
>  #define AMDGPU_HW_IP_DMA          2
>  #define AMDGPU_HW_IP_UVD          3
>  #define AMDGPU_HW_IP_VCE          4
> -#define AMDGPU_HW_IP_NUM          5
> +#define AMDGPU_HW_IP_KIQ          5
> +#define AMDGPU_HW_IP_NUM          6

There's no reason to expose KIQ to userspace.
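
Something along these lines would keep KIQ kernel-internal instead (a
sketch only; the _KERNEL define is an illustrative name, not from the
series):

#define AMDGPU_HW_IP_NUM          5   /* UAPI count stays unchanged */

/* driver-internal header, e.g. amdgpu_ring.h */
#define AMDGPU_HW_IP_KIQ          AMDGPU_HW_IP_NUM        /* kernel only */
#define AMDGPU_HW_IP_NUM_KERNEL   (AMDGPU_HW_IP_KIQ + 1)  /* internal tables */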

>
>  #define AMDGPU_HW_IP_INSTANCE_MAX_COUNT 1
>
>  #define AMDGPU_CHUNK_ID_IB             0x01
>  #define AMDGPU_CHUNK_ID_FENCE          0x02
>  #define AMDGPU_CHUNK_ID_DEPENDENCIES   0x03
>
>  struct drm_amdgpu_cs_chunk {
>         __u32           chunk_id;
>         __u32           length_dw;
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings
       [not found]         ` <58B677DD.4070408-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-01 15:49           ` Alex Deucher
       [not found]             ` <CADnq5_NhLAOsR7tHhRZRzA12j_-5MWFEXfWeGqKmSifHp_5jKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Deucher @ 2017-03-01 15:49 UTC (permalink / raw)
  To: zhoucm1; +Cc: Andres Rodriguez, amd-gfx list

On Wed, Mar 1, 2017 at 2:27 AM, zhoucm1 <david1.zhou@amd.com> wrote:
>
>
> On 2017-03-01 06:14, Andres Rodriguez wrote:
>>
>> Add an initial framework for changing the HW priorities of rings. The
>> framework allows requesting priority changes for the lifetime of an
>> amdgpu_job. After the job completes the priority will decay to the next
>> lowest priority for which a request is still valid.
>>
>> A new ring function set_priority() can now be populated to take care of
>> the HW specific programming sequence for priority changes.
>>
>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 10 ++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 71
>> ++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 11 +++++
>>   5 files changed, 94 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 366f6d3..0676495 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -636,21 +636,21 @@ struct amdgpu_flip_work {
>>   struct amdgpu_ib {
>>         struct amdgpu_sa_bo             *sa_bo;
>>         uint32_t                        length_dw;
>>         uint64_t                        gpu_addr;
>>         uint32_t                        *ptr;
>>         uint32_t                        flags;
>>   };
>>     extern const struct amd_sched_backend_ops amdgpu_sched_ops;
>>   -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int
>> priority,
>>                      struct amdgpu_job **job, struct amdgpu_vm *vm);
>>   int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>>                              struct amdgpu_job **job);
>>     void amdgpu_job_free_resources(struct amdgpu_job *job);
>>   void amdgpu_job_free(struct amdgpu_job *job);
>>   int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring *ring,
>>                       struct amd_sched_entity *entity, void *owner,
>>                       struct dma_fence **f);
>>   @@ -990,20 +990,22 @@ struct amdgpu_cs_parser {
>>   #define AMDGPU_VM_DOMAIN                    (1 << 3) /* bit set means in
>> virtual memory context */
>>     struct amdgpu_job {
>>         struct amd_sched_job    base;
>>         struct amdgpu_device    *adev;
>>         struct amdgpu_vm        *vm;
>>         struct amdgpu_ring      *ring;
>>         struct amdgpu_sync      sync;
>>         struct amdgpu_ib        *ibs;
>>         struct dma_fence        *fence; /* the hw fence */
>> +       struct dma_fence_cb     cb;
>> +       int                     priority;
>>         uint32_t                preamble_status;
>>         uint32_t                num_ibs;
>>         void                    *owner;
>>         uint64_t                fence_ctx; /* the fence_context this job
>> uses */
>>         bool                    vm_needs_flush;
>>         unsigned                vm_id;
>>         uint64_t                vm_pd_addr;
>>         uint32_t                gds_base, gds_size;
>>         uint32_t                gws_base, gws_size;
>>         uint32_t                oa_base, oa_size;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 605d40e..19ce202 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -179,21 +179,21 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser
>> *p, void *data)
>>                 case AMDGPU_CHUNK_ID_DEPENDENCIES:
>>                         break;
>>                 default:
>>                         ret = -EINVAL;
>>                         goto free_partial_kdata;
>>                 }
>>         }
>>   -     ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
>> +       ret = amdgpu_job_alloc(p->adev, num_ibs, p->ctx->priority,
>> &p->job, vm);
>>         if (ret)
>>                 goto free_all_kdata;
>>         if (p->uf_entry.robj)
>>                 p->job->uf_addr = uf_offset;
>>         kfree(chunk_array);
>>         return 0;
>>     free_all_kdata:
>>         i = p->nchunks - 1;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 86a1242..45b3c90 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -32,50 +32,51 @@ static void amdgpu_job_timedout(struct amd_sched_job
>> *s_job)
>>   {
>>         struct amdgpu_job *job = container_of(s_job, struct amdgpu_job,
>> base);
>>         DRM_ERROR("ring %s timeout, last signaled seq=%u, last emitted
>> seq=%u\n",
>>                   job->base.sched->name,
>>                   atomic_read(&job->ring->fence_drv.last_seq),
>>                   job->ring->fence_drv.sync_seq);
>>         amdgpu_gpu_reset(job->adev);
>>   }
>>   -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int
>> priority,
>>                      struct amdgpu_job **job, struct amdgpu_vm *vm)
>>   {
>>         size_t size = sizeof(struct amdgpu_job);
>>         if (num_ibs == 0)
>>                 return -EINVAL;
>>         size += sizeof(struct amdgpu_ib) * num_ibs;
>>         *job = kzalloc(size, GFP_KERNEL);
>>         if (!*job)
>>                 return -ENOMEM;
>>         (*job)->adev = adev;
>>         (*job)->vm = vm;
>> +       (*job)->priority = priority;
>>         (*job)->ibs = (void *)&(*job)[1];
>>         (*job)->num_ibs = num_ibs;
>>         amdgpu_sync_create(&(*job)->sync);
>>         return 0;
>>   }
>>     int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned
>> size,
>>                              struct amdgpu_job **job)
>>   {
>>         int r;
>>   -     r = amdgpu_job_alloc(adev, 1, job, NULL);
>> +       r = amdgpu_job_alloc(adev, 1, AMDGPU_CTX_PRIORITY_NORMAL, job,
>> NULL);
>>         if (r)
>>                 return r;
>>         r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
>>         if (r)
>>                 kfree(*job);
>>         return r;
>>   }
>>   @@ -170,20 +171,25 @@ static struct dma_fence *amdgpu_job_run(struct
>> amd_sched_job *sched_job)
>>         BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
>>         trace_amdgpu_sched_run_job(job);
>>         r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job,
>> &fence);
>>         if (r)
>>                 DRM_ERROR("Error scheduling IBs (%d)\n", r);
>>         /* if gpu reset, hw fence will be replaced here */
>>         dma_fence_put(job->fence);
>>         job->fence = dma_fence_get(fence);
>> +
>> +       r = amdgpu_ring_elevate_priority(job->ring, job->priority, job);
>> +       if (r)
>> +               DRM_ERROR("Failed to set job priority (%d)\n", r);
>
> The elevated ring priority is for this job, right?
> You're setting the priority via MMIO and restoring it via a fence callback,
> so your method isn't precise. I have some comments:
> 1. It seems we should call elevate_priority before amdgpu_ib_schedule, so
> that the ring is at the proper priority for this submission.
> 2. Can we put the SPI register writes into the ring buffer as part of a
> command frame? The elevate_priority setting would go at the front of a
> command frame and restore_priority at the end of the frame.

We should probably check if changing the queue priority requires that
the engine be idle as well.  I vaguely recall there being a packet for
some of this that was added for HSA.

Alex

> 3. If the priority doesn't change, we can skip the priority programming for
> the command submission.
>
> Regards,
> David Zhou
>
>> +
>>         amdgpu_job_free_resources(job);
>>         return fence;
>>   }
>>     const struct amd_sched_backend_ops amdgpu_sched_ops = {
>>         .dependency = amdgpu_job_dependency,
>>         .run_job = amdgpu_job_run,
>>         .timedout_job = amdgpu_job_timedout,
>>         .free_job = amdgpu_job_free_cb
>>   };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> index 80cb051..12bc7a9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> @@ -192,20 +192,89 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
>>    * Reset the driver's copy of the wptr (all asics).
>>    */
>>   void amdgpu_ring_undo(struct amdgpu_ring *ring)
>>   {
>>         ring->wptr = ring->wptr_old;
>>         if (ring->funcs->end_use)
>>                 ring->funcs->end_use(ring);
>>   }
>>   +static void amdgpu_ring_restore_priority_cb(struct dma_fence *f,
>> +                                           struct dma_fence_cb *cb)
>> +{
>> +       int i;
>> +       struct amdgpu_job *cb_job =
>> +               container_of(cb, struct amdgpu_job, cb);
>> +       struct amdgpu_ring *ring = cb_job->ring;
>> +
>> +       spin_lock(&ring->priority_lock);
>> +
>> +       /* remove ourselves from the list if necessary */
>> +       if (cb_job == ring->last_job[cb_job->priority])
>> +               ring->last_job[cb_job->priority] = NULL;
>> +
>> +       /* something higher prio is executing, no need to decay */
>> +       if (ring->priority > cb_job->priority)
>> +               goto out_unlock;
>> +
>> +       /* decay priority to the next level with a job available */
>> +       for (i = cb_job->priority; i >= 0; i--) {
>> +               if (i == AMDGPU_CTX_PRIORITY_NORMAL || ring->last_job[i])
>> {
>> +                       ring->priority = i;
>> +                       if (ring->funcs->set_priority)
>> +                               ring->funcs->set_priority(ring, i);
>> +
>> +                       break;
>> +               }
>> +       }
>> +
>> +out_unlock:
>> +       spin_unlock(&ring->priority_lock);
>> +}
>> +
>> +/**
>> + * amdgpu_ring_elevate_priority - change the ring's priority
>> + *
>> + * @ring: amdgpu_ring structure holding the information
>> + * @priority: target priority
>> + * @job: priority should remain elevated for the duration of this job
>> + *
>> + * Use HW specific mechanism's to elevate the ring's priority while @job
>> + * is executing. Once @job finishes executing, the ring will reset back
>> + * to normal priority.
>> + * Returns 0 on success, error otherwise
>> + */
>> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
>> +                                struct amdgpu_job *job)
>> +{
>> +       if (priority < 0 || priority >= AMDGPU_CTX_PRIORITY_NUM)
>> +               return -EINVAL;
>> +
>> +       spin_lock(&ring->priority_lock);
>> +       ring->last_job[priority] = job;
>> +
>> +       if (priority <= ring->priority)
>> +               goto out_unlock;
>> +
>> +       ring->priority = priority;
>> +       if (ring->funcs->set_priority)
>> +               ring->funcs->set_priority(ring, priority);
>> +
>> +       dma_fence_add_callback(job->fence, &job->cb,
>> +                              amdgpu_ring_restore_priority_cb);
>> +
>> +out_unlock:
>> +       spin_unlock(&ring->priority_lock);
>> +       return 0;
>> +}
>> +
>>   /**
>>    * amdgpu_ring_init - init driver ring struct.
>>    *
>>    * @adev: amdgpu_device pointer
>>    * @ring: amdgpu_ring structure holding ring information
>>    * @max_ndw: maximum number of dw for ring alloc
>>    * @nop: nop packet for this ring
>>    *
>>    * Initialize the driver information for the selected ring (all asics).
>>    * Returns 0 on success, error on failure.
>> @@ -275,20 +344,22 @@ int amdgpu_ring_init(struct amdgpu_device *adev,
>> struct amdgpu_ring *ring,
>>                                             (void **)&ring->ring);
>>                 if (r) {
>>                         dev_err(adev->dev, "(%d) ring create failed\n",
>> r);
>>                         return r;
>>                 }
>>                 memset((void *)ring->ring, 0, ring->ring_size);
>>         }
>>         ring->ptr_mask = (ring->ring_size / 4) - 1;
>>         ring->max_dw = max_dw;
>>         ring->hw_ip = hw_ip;
>> +       ring->priority = AMDGPU_CTX_PRIORITY_NORMAL;
>> +       spin_lock_init(&ring->priority_lock);
>>         INIT_LIST_HEAD(&ring->lru_list);
>>         amdgpu_ring_lru_touch(adev, ring);
>>         if (amdgpu_debugfs_ring_init(adev, ring)) {
>>                 DRM_ERROR("Failed to register debugfs file for rings
>> !\n");
>>         }
>>         return 0;
>>   }
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index ecdd87c..befc29f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -17,20 +17,21 @@
>>    * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR
>>    * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>    * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>    * OTHER DEALINGS IN THE SOFTWARE.
>>    *
>>    * Authors: Christian König
>>    */
>>   #ifndef __AMDGPU_RING_H__
>>   #define __AMDGPU_RING_H__
>>   +#include <drm/amdgpu_drm.h>
>>   #include "gpu_scheduler.h"
>>     /* max number of rings */
>>   #define AMDGPU_MAX_RINGS              16
>>   #define AMDGPU_MAX_GFX_RINGS          1
>>   #define AMDGPU_MAX_COMPUTE_RINGS      8
>>   #define AMDGPU_MAX_VCE_RINGS          3
>>     /* some special values for the owner field */
>>   #define AMDGPU_FENCE_OWNER_UNDEFINED  ((void*)0ul)
>> @@ -130,20 +131,22 @@ struct amdgpu_ring_funcs {
>>         void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>>         unsigned (*init_cond_exec)(struct amdgpu_ring *ring);
>>         void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned
>> offset);
>>         /* note usage for clock and power gating */
>>         void (*begin_use)(struct amdgpu_ring *ring);
>>         void (*end_use)(struct amdgpu_ring *ring);
>>         void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>>         void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>>         void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>>         void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t
>> val);
>> +       /* priority functions */
>> +       void (*set_priority) (struct amdgpu_ring *ring, int priority);
>>   };
>>     struct amdgpu_ring {
>>         struct amdgpu_device            *adev;
>>         const struct amdgpu_ring_funcs  *funcs;
>>         struct amdgpu_fence_driver      fence_drv;
>>         struct amd_gpu_scheduler        sched;
>>         struct list_head                lru_list;
>>         struct amdgpu_bo        *ring_obj;
>> @@ -165,31 +168,39 @@ struct amdgpu_ring {
>>         struct amdgpu_bo        *mqd_obj;
>>         u32                     doorbell_index;
>>         bool                    use_doorbell;
>>         unsigned                wptr_offs;
>>         unsigned                fence_offs;
>>         uint64_t                current_ctx;
>>         char                    name[16];
>>         unsigned                cond_exe_offs;
>>         u64                     cond_exe_gpu_addr;
>>         volatile u32            *cond_exe_cpu_addr;
>> +
>> +       spinlock_t              priority_lock;
>> +       /* protected by priority_lock */
>> +       struct amdgpu_job       *last_job[AMDGPU_CTX_PRIORITY_NUM];
>> +       int                     priority;
>> +
>>   #if defined(CONFIG_DEBUG_FS)
>>         struct dentry *ent;
>>   #endif
>>   };
>>     int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
>>                                int hw_ip, int ring);
>>   int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
>>   void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
>>   void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct
>> amdgpu_ib *ib);
>>   void amdgpu_ring_commit(struct amdgpu_ring *ring);
>>   void amdgpu_ring_undo(struct amdgpu_ring *ring);
>> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
>> +                                struct amdgpu_job *job);
>>   int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring
>> *ring,
>>                      int hw_ip, unsigned ring_size,
>>                      struct amdgpu_irq_src *irq_src, unsigned irq_type);
>>   void amdgpu_ring_fini(struct amdgpu_ring *ring);
>>   int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
>>                         struct amdgpu_ring **ring);
>>   void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct
>> amdgpu_ring *ring);
>>     #endif
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: Add support for high priority scheduling in amdgpu
       [not found] ` <1488320089-22035-1-git-send-email-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
                     ` (22 preceding siblings ...)
  2017-03-01 11:42   ` Add support for high priority scheduling in amdgpu Christian König
@ 2017-03-01 16:14   ` Bridgman, John
       [not found]     ` <BN6PR12MB1348B8F1F537321557D522AFE8290-/b2+HYfkarQX0pEhCR5T8QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  23 siblings, 1 reply; 40+ messages in thread
From: Bridgman, John @ 2017-03-01 16:14 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

In patch "drm/amdgpu: implement ring set_priority for gfx_v8 compute" can you remind me why you are only passing pipe and not queue to vi_srbm_select()?

+static void gfx_v8_0_ring_set_priority_compute(struct amdgpu_ring *ring,  
+					       int priority)  
+{  
+	struct amdgpu_device *adev = ring->adev;  
+  
+	if (ring->hw_ip != AMDGPU_HW_IP_COMPUTE)  
+		return;  
+  
+	mutex_lock(&adev->srbm_mutex);  
+	vi_srbm_select(adev, ring->me, ring->pipe, 0, 0);  


>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of
>Andres Rodriguez
>Sent: Tuesday, February 28, 2017 5:14 PM
>To: amd-gfx@lists.freedesktop.org
>Subject: Add support for high priority scheduling in amdgpu
>
>This patch series introduces a mechanism that allows users with sufficient
>privileges to categorize their work as "high priority". A userspace app can
>create a high priority amdgpu context, where any work submitted to this
>context will receive preferential treatment over any other work.
>
>High priority contexts will be scheduled ahead of other contexts by the sw gpu
>scheduler. This functionality is generic for all HW blocks.
>
>Optionally, a ring can implement a set_priority() function that allows
>programming HW specific features to elevate a ring's priority.
>
>This patch series implements set_priority() for gfx8 compute rings. It takes
>advantage of SPI scheduling and CU reservation to provide improved frame
>latencies for high priority contexts.
>
>For compute + compute scenarios we get near perfect scheduling latency. E.g.
>one high priority ComputeParticles + one low priority ComputeParticles:
>    - High priority ComputeParticles: 2.0-2.6 ms/frame
>    - Regular ComputeParticles: 35.2-68.5 ms/frame
>
>For compute + gfx scenarios the high priority compute application does
>experience some latency variance. However, the variance has smaller bounds
>and a smaller deviation than without high priority scheduling.
>
>Following is a graph of the frame time experienced by a high priority compute
>app in 4 different scenarios to exemplify the compute + gfx latency variance:
>    - ComputeParticles: this scenario involves running the compute particles
>      sample on its own.
>    - +SSAO: Previous scenario with the addition of running the ssao sample
>      application that clogs the GFX ring with constant work.
>    - +SPI Priority: Previous scenario with the addition of SPI priority
>      programming for compute rings.
>    - +CU Reserve: Previous scenario with the addition of dynamic CU
>      reservation for compute rings.
>
>Graph link:
>https://plot.ly/~lostgoat/9/
>
>As seen above, high priority contexts for compute allow us to schedule work
>with enhanced confidence of completion latency under high GPU loads. This
>property will be important for VR reprojection workloads.
>
>Note: The first part of this series is a resend of "Change queue/pipe split
>between amdkfd and amdgpu" with the following changes:
>    - Fixed kfdtest on Kaveri due to shift overflow. Refer to: "drm/amdkfdallow
>      split HQD on per-queue granularity v3"
>    - Used Felix's suggestions for a simplified HQD programming sequence
>    - Added a workaround for a Tonga HW bug during HQD programming
>
>This series is also available at:
>https://github.com/lostgoat/linux/tree/wip-high-priority
>
>_______________________________________________
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Add support for high priority scheduling in amdgpu
       [not found]     ` <BN6PR12MB1348B8F1F537321557D522AFE8290-/b2+HYfkarQX0pEhCR5T8QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-03-01 16:37       ` Andres Rodriguez
  0 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-03-01 16:37 UTC (permalink / raw)
  To: Bridgman, John, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 3/1/2017 11:14 AM, Bridgman, John wrote:
> In patch "drm/amdgpu: implement ring set_priority for gfx_v8 compute" can you remind me why you are only passing pipe and not queue to vi_srbm_select()?
>
> +static void gfx_v8_0_ring_set_priority_compute(struct amdgpu_ring *ring,
> +					       int priority)
> +{
> +	struct amdgpu_device *adev = ring->adev;
> +
> +	if (ring->hw_ip != AMDGPU_HW_IP_COMPUTE)
> +		return;
> +
> +	mutex_lock(&adev->srbm_mutex);
> +	vi_srbm_select(adev, ring->me, ring->pipe, 0, 0);
That's a dumb mistake on my part. Probably got lucky that I was hitting 
queue 0 and also rebooting between tests.
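
Presumably the fix is just to select the specific queue as well, along
the lines of (sketch):

	mutex_lock(&adev->srbm_mutex);
	vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
	/* ... program the per-queue priority registers ... */
	vi_srbm_select(adev, 0, 0, 0, 0);
	mutex_unlock(&adev->srbm_mutex);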

Regards,
Andres

>
>
>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of
>> Andres Rodriguez
>> Sent: Tuesday, February 28, 2017 5:14 PM
>> To: amd-gfx@lists.freedesktop.org
>> Subject: Add support for high priority scheduling in amdgpu
>>
>> This patch series introduces a mechanism that allows users with sufficient
>> privileges to categorize their work as "high priority". A userspace app can
>> create a high priority amdgpu context, where any work submitted to this
>> context will receive preferential treatment over any other work.
>>
>> High priority contexts will be scheduled ahead of other contexts by the sw gpu
>> scheduler. This functionality is generic for all HW blocks.
>>
>> Optionally, a ring can implement a set_priority() function that allows
>> programming HW specific features to elevate a ring's priority.
>>
>> This patch series implements set_priority() for gfx8 compute rings. It takes
>> advantage of SPI scheduling and CU reservation to provide improved frame
>> latencies for high priority contexts.
>>
>> For compute + compute scenarios we get near perfect scheduling latency. E.g.
>> one high priority ComputeParticles + one low priority ComputeParticles:
>>     - High priority ComputeParticles: 2.0-2.6 ms/frame
>>     - Regular ComputeParticles: 35.2-68.5 ms/frame
>>
>> For compute + gfx scenarios the high priority compute application does
>> experience some latency variance. However, the variance has smaller bounds
>> and a smaller deviation than without high priority scheduling.
>>
>> Following is a graph of the frame time experienced by a high priority compute
>> app in 4 different scenarios to exemplify the compute + gfx latency variance:
>>     - ComputeParticles: this scenario involves running the compute particles
>>       sample on its own.
>>     - +SSAO: Previous scenario with the addition of running the ssao sample
>>       application that clogs the GFX ring with constant work.
>>     - +SPI Priority: Previous scenario with the addition of SPI priority
>>       programming for compute rings.
>>     - +CU Reserve: Previous scenario with the addition of dynamic CU
>>       reservation for compute rings.
>>
>> Graph link:
>> https://plot.ly/~lostgoat/9/
>>
>> As seen above, high priority contexts for compute allow us to schedule work
>> with enhanced confidence of completion latency under high GPU loads. This
>> property will be important for VR reprojection workloads.
>>
>> Note: The first part of this series is a resend of "Change queue/pipe split
>> between amdkfd and amdgpu" with the following changes:
>>     - Fixed kfdtest on Kaveri due to shift overflow. Refer to: "drm/amdkfdallow
>>       split HQD on per-queue granularity v3"
>>     - Used Felix's suggestions for a simplified HQD programming sequence
>>     - Added a workaround for a Tonga HW bug during HQD programming
>>
>> This series is also available at:
>> https://github.com/lostgoat/linux/tree/wip-high-priority
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Add support for high priority scheduling in amdgpu
       [not found]     ` <25194b1a-4756-e1ad-f597-17063a14eb4c-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-03-01 17:13       ` Andres Rodriguez
       [not found]         ` <ddeb4a53-ec4f-9a87-9323-897c571b1634-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-03-01 17:13 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW





On 3/1/2017 6:42 AM, Christian König wrote:
> Patches #1-#14 are Acked-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>.
>
> Patch #15:
>
> Not sure if that is a good idea or not, need to take a closer look 
> after digging through the rest.
>
> In general the HW IP is just for the IOCTL API and not for internal 
> use inside the driver.
I'll drop this patch and use ring->funcs->type instead.
>
> Patch #16:
>
> Really nice :) I don't have time to look into it in detail, but you 
>> have one misconception I'd like to point out:
>> The queue manager maintains a per-file descriptor map of user ring ids
>> to amdgpu_ring pointers. Once a map is created it is permanent (this is
>> required to maintain FIFO execution guarantees for a ring).
> Actually we don't have a FIFO execution guarantee per ring. We only 
> have that per context.

Agreed. I'm using pretty imprecise terminology here which can be 
confusing. I wanted to be more precise than "context", because two 
amdgpu_cs_request submissions to the same context but with a different 
ring field can execute out of order.

I think s/ring/context's ring/ should be enough to clarify here if you 
think so as well.
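
For example (libdrm-style sketch, error handling and IB setup omitted):

	/* same context, different rings: each (ctx, ring) pair is FIFO
	 * with respect to itself, but A and B may complete in either
	 * order relative to each other */
	struct amdgpu_cs_request req_a = { .ip_type = AMDGPU_HW_IP_COMPUTE, .ring = 0 };
	struct amdgpu_cs_request req_b = { .ip_type = AMDGPU_HW_IP_COMPUTE, .ring = 1 };

	amdgpu_cs_submit(ctx, 0, &req_a, 1);	/* job A */
	amdgpu_cs_submit(ctx, 0, &req_b, 1);	/* job B */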

>
>> E.g. commands from different contexts can execute at the same time and 
> out of order.
>
> Making this per file is ok for now, but you should keep in mind that 
> we might want to change that sooner or later.
>
> Patch #17 & #18 need to take a closer look when I have more time, but 
> the comments from others sounded valid to me as well.
>
> Patch #19: Raising and lowering the priority of a ring during command 
> submission doesn't sound like a good idea to me.
I'm not really sure what would be a better time than at command submission.

If it was just SPI priorities we could have static partitioning of 
rings, some high priority and some regular, etc. But that approach 
reduces the number of rings
>
> The way you currently have it implemented would also raise the 
> priority of already running jobs on the same ring. Keep in mind that 
> everything is pipelined here.
That is actually intentional. If there is work already on the ring with 
lower priority we don't want the high priority work to have to wait for 
it to finish executing at regular priority. Therefore the work that has 
already been committed to the ring inherits the higher priority level.

I agree this isn't ideal, which is why the LRU ring mapping policy is 
there to make sure this doesn't happen often.
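
I.e. new user ring ids get mapped through the LRU helpers, so concurrent
high and regular priority clients tend to land on different rings
(sketch; the map bookkeeping is illustrative):

	struct amdgpu_ring *ring;

	if (!amdgpu_ring_lru_get(adev, AMDGPU_HW_IP_COMPUTE, &ring))
		mapper->queue_map[user_ring_id] = ring;	/* permanent mapping */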
>
>> Additionally, you can't have a fence callback in the job 
>> structure, because the job structure is freed by the same fence as well. 
> So it can happen that you access freed up memory (but only for a very 
> short period of time).
Any strong preference for either 1) refcounting the job structure, or 2) 
allocating a new piece of memory to store the callback parameters?
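
For option 2) the shape would be roughly (sketch; struct and helper
names are illustrative):

	struct amdgpu_ring_priority_cb {
		struct dma_fence_cb	cb;
		struct amdgpu_ring	*ring;
		int			priority;
	};

	/* at elevate time: the callback owns this allocation, so it never
	 * touches the job after the job's fence frees it */
	pcb = kmalloc(sizeof(*pcb), GFP_KERNEL);
	if (pcb) {
		pcb->ring = ring;
		pcb->priority = priority;
		dma_fence_add_callback(job->fence, &pcb->cb,
				       amdgpu_ring_restore_priority_cb);
	}

The callback would then container_of() back to the small struct, run the
decay logic on pcb->ring/pcb->priority, and kfree() it.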

> Patches #20-#22 are Acked-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>.
>
> Regards,
> Christian.
>
> On 28.02.2017 at 23:14, Andres Rodriguez wrote:
>> This patch series introduces a mechanism that allows users with 
>> sufficient
>> privileges to categorize their work as "high priority". A userspace 
>> app can
>> create a high priority amdgpu context, where any work submitted to 
>> this context
>> will receive preferential treatment over any other work.
>>
>> High priority contexts will be scheduled ahead of other contexts by 
>> the sw gpu
>> scheduler. This functionality is generic for all HW blocks.
>>
>> Optionally, a ring can implement a set_priority() function that allows
>> programming HW specific features to elevate a ring's priority.
>>
>> This patch series implements set_priority() for gfx8 compute rings. 
>> It takes
>> advantage of SPI scheduling and CU reservation to provide improved frame
>> latencies for high priority contexts.
>>
>> For compute + compute scenarios we get near perfect scheduling 
>> latency. E.g.
>> one high priority ComputeParticles + one low priority ComputeParticles:
>>      - High priority ComputeParticles: 2.0-2.6 ms/frame
>>      - Regular ComputeParticles: 35.2-68.5 ms/frame
>>
>> For compute + gfx scenarios the high priority compute application does
>> experience some latency variance. However, the variance has smaller 
>> bounds and
>> a smaller deviation than without high priority scheduling.
>>
>> Following is a graph of the frame time experienced by a high priority 
>> compute
>> app in 4 different scenarios to exemplify the compute + gfx latency 
>> variance:
>>      - ComputeParticles: this scenario involves running the compute 
>> particles
>>        sample on its own.
>>      - +SSAO: Previous scenario with the addition of running the ssao 
>> sample
>>        application that clogs the GFX ring with constant work.
>>      - +SPI Priority: Previous scenario with the addition of SPI 
>> priority
>>        programming for compute rings.
>>      - +CU Reserve: Previous scenario with the addition of dynamic CU
>>        reservation for compute rings.
>>
>> Graph link:
>> https://plot.ly/~lostgoat/9/
>>
>> As seen above, high priority contexts for compute allow us to 
>> schedule work
>> with enhanced confidence of completion latency under high GPU loads. 
>> This
>> property will be important for VR reprojection workloads.
>>
>> Note: The first part of this series is a resend of "Change queue/pipe 
>> split
>> between amdkfd and amdgpu" with the following changes:
>>      - Fixed kfdtest on Kaveri due to shift overflow. Refer to: 
>> "drm/amdkfdallow
>>        split HQD on per-queue granularity v3"
>>      - Used Felix's suggestions for a simplified HQD programming 
>> sequence
>>      - Added a workaround for a Tonga HW bug during HQD programming
>>
>> This series is also available at:
>> https://github.com/lostgoat/linux/tree/wip-high-priority
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Add support for high priority scheduling in amdgpu
       [not found]         ` <ddeb4a53-ec4f-9a87-9323-897c571b1634-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-01 17:24           ` Andres Rodriguez
       [not found]             ` <4c908b1f-fcb2-7d89-026a-76fd3f4f1f22-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-03-01 17:24 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2017-03-01 12:13 PM, Andres Rodriguez wrote:
>
>
> On 3/1/2017 6:42 AM, Christian König wrote:
>> Patches #1-#14 are Acked-by: Christian König <christian.koenig@amd.com>.
>>
>> Patch #15:
>>
>> Not sure if that is a good idea or not, need to take a closer look 
>> after digging through the rest.
>>
>> In general the HW IP is just for the IOCTL API and not for internal 
>> use inside the driver.
> I'll drop this patch and use ring->funcs->type instead.
>>
>> Patch #16:
>>
>> Really nice :) I don't have time to look into it in detail, but you 
>> have one misconception I'd like to point out:
>>> The queue manager maintains a per-file descriptor map of user ring ids
>>> to amdgpu_ring pointers. Once a map is created it is permanent (this is
>>> required to maintain FIFO execution guarantees for a ring).
>> Actually we don't have a FIFO execution guarantee per ring. We only 
>> have that per context.
>
> Agreed. I'm using pretty imprecise terminology here which can be 
> confusing. I wanted to be more precise than "context", because two 
> amdgpu_cs_request submissions to the same context but with a different 
> ring field can execute out of order.
>
> I think s/ring/context's ring/ should be enough to clarify here if you 
> think so as well.
>
>>
>> E.g. commands from different contexts can execute at the same time and 
>> out of order.
>>
>> Making this per file is ok for now, but you should keep in mind that 
>> we might want to change that sooner or later.
>>
>> Patch #17 & #18 need to take a closer look when I have more time, but 
>> the comments from others sounded valid to me as well.
>>
>> Patch #19: Raising and lowering the priority of a ring during command 
>> submission doesn't sound like a good idea to me.
> I'm not really sure what would be a better time than at command 
> submission.
>
> If it was just SPI priorities we could have static partitioning of 
> rings, some high priority and some regular, etc. But that approach 
> reduces the number of rings
Sorry, I finished typing something else and forgot this section was 
incomplete. Full reply:

I'm not really sure what would be a better time than at command submission.

If it was just SPI priorities we could have static partitioning of 
rings, some high priority and some regular, etc. But that approach 
reduces the number of rings available. It would also require a callback 
at command submission time for CU reservation.
>>
>> The way you currently have it implemented would also raise the 
>> priority of already running jobs on the same ring. Keep in mind that 
>> everything is pipelined here.
> That is actually intentional. If there is work already on the ring 
> with lower priority we don't want the high priority work to have to 
> wait for it to finish executing at regular priority. Therefore the 
>> work that has already been committed to the ring inherits the higher 
> priority level.
>
> I agree this isn't ideal, which is why the LRU ring mapping policy is 
> there to make sure this doesn't happen often.
>>
>> Additionally, you can't have a fence callback in the job 
>> structure, because the job structure is freed by the same fence as 
>> well. So it can happen that you access freed up memory (but only for 
>> a very short period of time).
> Any strong preference for either 1) refcounting the job structure, or 
> 2) allocating a new piece of memory to store the callback parameters?
>
>> Patches #20-#22 are Acked-by: Christian König 
>> <christian.koenig@amd.com>.
>>
>> Regards,
>> Christian.
>>
>> On 28.02.2017 at 23:14, Andres Rodriguez wrote:
>>> This patch series introduces a mechanism that allows users with 
>>> sufficient
>>> privileges to categorize their work as "high priority". A userspace 
>>> app can
>>> create a high priority amdgpu context, where any work submitted to 
>>> this context
>>> will receive preferential treatment over any other work.
>>>
>>> High priority contexts will be scheduled ahead of other contexts by 
>>> the sw gpu
>>> scheduler. This functionality is generic for all HW blocks.
>>>
>>> Optionally, a ring can implement a set_priority() function that allows
>>> programming HW specific features to elevate a ring's priority.
>>>
>>> This patch series implements set_priority() for gfx8 compute rings. 
>>> It takes
>>> advantage of SPI scheduling and CU reservation to provide improved 
>>> frame
>>> latencies for high priority contexts.
>>>
>>> For compute + compute scenarios we get near perfect scheduling 
>>> latency. E.g.
>>> one high priority ComputeParticles + one low priority ComputeParticles:
>>>      - High priority ComputeParticles: 2.0-2.6 ms/frame
>>>      - Regular ComputeParticles: 35.2-68.5 ms/frame
>>>
>>> For compute + gfx scenarios the high priority compute application does
>>> experience some latency variance. However, the variance has smaller 
>>> bounds and
>>> a smaller deviation than without high priority scheduling.
>>>
>>> Following is a graph of the frame time experienced by a high 
>>> priority compute
>>> app in 4 different scenarios to exemplify the compute + gfx latency 
>>> variance:
>>>      - ComputeParticles: this scenario involves running the compute 
>>> particles
>>>        sample on its own.
>>>      - +SSAO: Previous scenario with the addition of running the 
>>> ssao sample
>>>        application that clogs the GFX ring with constant work.
>>>      - +SPI Priority: Previous scenario with the addition of SPI 
>>> priority
>>>        programming for compute rings.
>>>      - +CU Reserve: Previous scenario with the addition of dynamic CU
>>>        reservation for compute rings.
>>>
>>> Graph link:
>>> https://plot.ly/~lostgoat/9/
>>>
>>> As seen above, high priority contexts for compute allow us to 
>>> schedule work
>>> with enhanced confidence of completion latency under high GPU loads. 
>>> This
>>> property will be important for VR reprojection workloads.
>>>
>>> Note: The first part of this series is a resend of "Change 
>>> queue/pipe split
>>> between amdkfd and amdgpu" with the following changes:
>>>      - Fixed kfdtest on Kaveri due to shift overflow. Refer to: 
>>> "drm/amdkfdallow
>>>        split HQD on per-queue granularity v3"
>>>      - Used Felix's suggestions for a simplified HQD programming 
>>> sequence
>>>      - Added a workaround for a Tonga HW bug during HQD programming
>>>
>>> This series is also available at:
>>> https://github.com/lostgoat/linux/tree/wip-high-priority
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings
       [not found]             ` <CADnq5_NhLAOsR7tHhRZRzA12j_-5MWFEXfWeGqKmSifHp_5jKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-01 17:44               ` Andres Rodriguez
       [not found]                 ` <f0de5e4f-bf94-9222-cc9e-1d535c228b0a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-03-01 17:44 UTC (permalink / raw)
  To: Alex Deucher, zhoucm1, Jay Cornwall; +Cc: amd-gfx list



On 2017-03-01 10:49 AM, Alex Deucher wrote:
> On Wed, Mar 1, 2017 at 2:27 AM, zhoucm1 <david1.zhou@amd.com> wrote:
>>
>> On 2017-03-01 06:14, Andres Rodriguez wrote:
>>> Add an initial framework for changing the HW priorities of rings. The
>>> framework allows requesting priority changes for the lifetime of an
>>> amdgpu_job. After the job completes the priority will decay to the next
>>> lowest priority for which a request is still valid.
>>>
>>> A new ring function set_priority() can now be populated to take care of
>>> the HW specific programming sequence for priority changes.
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  4 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 10 ++++-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 71
>>> ++++++++++++++++++++++++++++++++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 11 +++++
>>>    5 files changed, 94 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 366f6d3..0676495 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -636,21 +636,21 @@ struct amdgpu_flip_work {
>>>    struct amdgpu_ib {
>>>          struct amdgpu_sa_bo             *sa_bo;
>>>          uint32_t                        length_dw;
>>>          uint64_t                        gpu_addr;
>>>          uint32_t                        *ptr;
>>>          uint32_t                        flags;
>>>    };
>>>      extern const struct amd_sched_backend_ops amdgpu_sched_ops;
>>>    -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int
>>> priority,
>>>                       struct amdgpu_job **job, struct amdgpu_vm *vm);
>>>    int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>>>                               struct amdgpu_job **job);
>>>      void amdgpu_job_free_resources(struct amdgpu_job *job);
>>>    void amdgpu_job_free(struct amdgpu_job *job);
>>>    int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring *ring,
>>>                        struct amd_sched_entity *entity, void *owner,
>>>                        struct dma_fence **f);
>>>    @@ -990,20 +990,22 @@ struct amdgpu_cs_parser {
>>>    #define AMDGPU_VM_DOMAIN                    (1 << 3) /* bit set means in
>>> virtual memory context */
>>>      struct amdgpu_job {
>>>          struct amd_sched_job    base;
>>>          struct amdgpu_device    *adev;
>>>          struct amdgpu_vm        *vm;
>>>          struct amdgpu_ring      *ring;
>>>          struct amdgpu_sync      sync;
>>>          struct amdgpu_ib        *ibs;
>>>          struct dma_fence        *fence; /* the hw fence */
>>> +       struct dma_fence_cb     cb;
>>> +       int                     priority;
>>>          uint32_t                preamble_status;
>>>          uint32_t                num_ibs;
>>>          void                    *owner;
>>>          uint64_t                fence_ctx; /* the fence_context this job
>>> uses */
>>>          bool                    vm_needs_flush;
>>>          unsigned                vm_id;
>>>          uint64_t                vm_pd_addr;
>>>          uint32_t                gds_base, gds_size;
>>>          uint32_t                gws_base, gws_size;
>>>          uint32_t                oa_base, oa_size;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 605d40e..19ce202 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -179,21 +179,21 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser
>>> *p, void *data)
>>>                  case AMDGPU_CHUNK_ID_DEPENDENCIES:
>>>                          break;
>>>                  default:
>>>                          ret = -EINVAL;
>>>                          goto free_partial_kdata;
>>>                  }
>>>          }
>>>    -     ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
>>> +       ret = amdgpu_job_alloc(p->adev, num_ibs, p->ctx->priority,
>>> &p->job, vm);
>>>          if (ret)
>>>                  goto free_all_kdata;
>>>          if (p->uf_entry.robj)
>>>                  p->job->uf_addr = uf_offset;
>>>          kfree(chunk_array);
>>>          return 0;
>>>      free_all_kdata:
>>>          i = p->nchunks - 1;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 86a1242..45b3c90 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -32,50 +32,51 @@ static void amdgpu_job_timedout(struct amd_sched_job
>>> *s_job)
>>>    {
>>>          struct amdgpu_job *job = container_of(s_job, struct amdgpu_job,
>>> base);
>>>          DRM_ERROR("ring %s timeout, last signaled seq=%u, last emitted
>>> seq=%u\n",
>>>                    job->base.sched->name,
>>>                    atomic_read(&job->ring->fence_drv.last_seq),
>>>                    job->ring->fence_drv.sync_seq);
>>>          amdgpu_gpu_reset(job->adev);
>>>    }
>>>    -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int
>>> priority,
>>>                       struct amdgpu_job **job, struct amdgpu_vm *vm)
>>>    {
>>>          size_t size = sizeof(struct amdgpu_job);
>>>          if (num_ibs == 0)
>>>                  return -EINVAL;
>>>          size += sizeof(struct amdgpu_ib) * num_ibs;
>>>          *job = kzalloc(size, GFP_KERNEL);
>>>          if (!*job)
>>>                  return -ENOMEM;
>>>          (*job)->adev = adev;
>>>          (*job)->vm = vm;
>>> +       (*job)->priority = priority;
>>>          (*job)->ibs = (void *)&(*job)[1];
>>>          (*job)->num_ibs = num_ibs;
>>>          amdgpu_sync_create(&(*job)->sync);
>>>          return 0;
>>>    }
>>>      int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned
>>> size,
>>>                               struct amdgpu_job **job)
>>>    {
>>>          int r;
>>>    -     r = amdgpu_job_alloc(adev, 1, job, NULL);
>>> +       r = amdgpu_job_alloc(adev, 1, AMDGPU_CTX_PRIORITY_NORMAL, job,
>>> NULL);
>>>          if (r)
>>>                  return r;
>>>          r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
>>>          if (r)
>>>                  kfree(*job);
>>>          return r;
>>>    }
>>>    @@ -170,20 +171,25 @@ static struct dma_fence *amdgpu_job_run(struct
>>> amd_sched_job *sched_job)
>>>          BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
>>>          trace_amdgpu_sched_run_job(job);
>>>          r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job,
>>> &fence);
>>>          if (r)
>>>                  DRM_ERROR("Error scheduling IBs (%d)\n", r);
>>>          /* if gpu reset, hw fence will be replaced here */
>>>          dma_fence_put(job->fence);
>>>          job->fence = dma_fence_get(fence);
>>> +
>>> +       r = amdgpu_ring_elevate_priority(job->ring, job->priority, job);
>>> +       if (r)
>>> +               DRM_ERROR("Failed to set job priority (%d)\n", r);
>> The elevated ring priority is for this job, right?
>> You're setting the priority via MMIO and restoring it via a fence callback,
>> so your method isn't precise. I have some comments:
>> 1. It seems we should call elevate_priority before amdgpu_ib_schedule, so
>> that the ring is at the proper priority for this submission.
Yeah, thanks for the catch on that. This might result in the first 
packets being emitted with a new_wave_priority lower than desired, 
introducing extra delay.
>> 2. Can we put the SPI register writes into the ring buffer as part of a
>> command frame? The elevate_priority setting would go at the front of a
>> command frame and restore_priority at the end of the frame.
We can't do it on the ring because we need any work that is already 
committed to the ring to inherit the highest priority. Otherwise the 
high priority work would have to wait for the regular priority work (to 
complete at regular priority latency) before it can begin.

For more details see my reply to Christian's email regarding this topic.
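
For reference, the command-frame variant would look roughly like this
(sketch; the register is a placeholder, emit_wreg is the existing ring
callback):

	ring->funcs->emit_wreg(ring, mmPRIORITY_REG /* placeholder */, high);
	/* ... emit the job's IBs ... */
	ring->funcs->emit_wreg(ring, mmPRIORITY_REG /* placeholder */, normal);

Since those writes execute in ring order they can't retroactively raise
work queued ahead of the frame, which is exactly the case we care about.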
> We should probably check if changing the queue priority requires that
> the engine be idle as well.  I vaguely recall there being a packet for
> some of this that was added for HSA.
+Jay

Hey Jay,

Do you have more details on this?

I wrote these patches under the assumption that changing the priority 
while the engine is busy was allowed, but that it would not have any 
effect for waves that have already been emitted.

Would changing the priorities while the engines are active cause any 
hangs or bad side effects?
>
> Alex
>
>> 3. if no priority exchanges, we can skip setting for priority for command
>> submission.
I originally had a check that skipped set_priority() if the priorities 
matched, but I decided to remove it.

The reason is that while no changes to the HW state may need to happen, 
the implementation may need to track extra metadata about who is 
executing high priority work on the ring. Therefore whether the 
operation is a noop or not should be decided by the specific implementation.

Having said that, I think gfx_v8_0_ring_set_priority_compute() can be 
improved to skip more work. So I'll apply your suggestion there.
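
E.g. (untested sketch, the hw_priority field is made up):

    static void gfx_v8_0_ring_set_priority_compute(struct amdgpu_ring *ring,
                                                   int priority)
    {
            /* skip the SPI/CU reprogramming if the HW is already at
             * the requested level */
            if (ring->hw_priority == priority)
                    return;

            /* ... existing SPI priority + CU reservation programming ... */

            ring->hw_priority = priority;
    }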

>>
>> Regards,
>> David Zhou
>>
>>> +
>>>          amdgpu_job_free_resources(job);
>>>          return fence;
>>>    }
>>>      const struct amd_sched_backend_ops amdgpu_sched_ops = {
>>>          .dependency = amdgpu_job_dependency,
>>>          .run_job = amdgpu_job_run,
>>>          .timedout_job = amdgpu_job_timedout,
>>>          .free_job = amdgpu_job_free_cb
>>>    };
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> index 80cb051..12bc7a9 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>> @@ -192,20 +192,89 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
>>>     * Reset the driver's copy of the wptr (all asics).
>>>     */
>>>    void amdgpu_ring_undo(struct amdgpu_ring *ring)
>>>    {
>>>          ring->wptr = ring->wptr_old;
>>>          if (ring->funcs->end_use)
>>>                  ring->funcs->end_use(ring);
>>>    }
>>>    +static void amdgpu_ring_restore_priority_cb(struct dma_fence *f,
>>> +                                           struct dma_fence_cb *cb)
>>> +{
>>> +       int i;
>>> +       struct amdgpu_job *cb_job =
>>> +               container_of(cb, struct amdgpu_job, cb);
>>> +       struct amdgpu_ring *ring = cb_job->ring;
>>> +
>>> +       spin_lock(&ring->priority_lock);
>>> +
>>> +       /* remove ourselves from the list if necessary */
>>> +       if (cb_job == ring->last_job[cb_job->priority])
>>> +               ring->last_job[cb_job->priority] = NULL;
>>> +
>>> +       /* something higher prio is executing, no need to decay */
>>> +       if (ring->priority > cb_job->priority)
>>> +               goto out_unlock;
>>> +
>>> +       /* decay priority to the next level with a job available */
>>> +       for (i = cb_job->priority; i >= 0; i--) {
>>> +               if (i == AMDGPU_CTX_PRIORITY_NORMAL || ring->last_job[i])
>>> {
>>> +                       ring->priority = i;
>>> +                       if (ring->funcs->set_priority)
>>> +                               ring->funcs->set_priority(ring, i);
>>> +
>>> +                       break;
>>> +               }
>>> +       }
>>> +
>>> +out_unlock:
>>> +       spin_unlock(&ring->priority_lock);
>>> +}
>>> +
>>> +/**
>>> + * amdgpu_ring_elevate_priority - change the ring's priority
>>> + *
>>> + * @ring: amdgpu_ring structure holding the information
>>> + * @priority: target priority
>>> + * @job: priority should remain elevated for the duration of this job
>>> + *
>>> + * Use HW specific mechanisms to elevate the ring's priority while @job
>>> + * is executing. Once @job finishes executing, the ring will reset back
>>> + * to normal priority.
>>> + * Returns 0 on success, error otherwise
>>> + */
>>> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
>>> +                                struct amdgpu_job *job)
>>> +{
>>> +       if (priority < 0 || priority >= AMDGPU_CTX_PRIORITY_NUM)
>>> +               return -EINVAL;
>>> +
>>> +       spin_lock(&ring->priority_lock);
>>> +       ring->last_job[priority] = job;
>>> +
>>> +       if (priority <= ring->priority)
>>> +               goto out_unlock;
>>> +
>>> +       ring->priority = priority;
>>> +       if (ring->funcs->set_priority)
>>> +               ring->funcs->set_priority(ring, priority);
>>> +
>>> +       dma_fence_add_callback(job->fence, &job->cb,
>>> +                              amdgpu_ring_restore_priority_cb);
>>> +
>>> +out_unlock:
>>> +       spin_unlock(&ring->priority_lock);
>>> +       return 0;
>>> +}
>>> +
>>>    /**
>>>     * amdgpu_ring_init - init driver ring struct.
>>>     *
>>>     * @adev: amdgpu_device pointer
>>>     * @ring: amdgpu_ring structure holding ring information
>>>     * @max_ndw: maximum number of dw for ring alloc
>>>     * @nop: nop packet for this ring
>>>     *
>>>     * Initialize the driver information for the selected ring (all asics).
>>>     * Returns 0 on success, error on failure.
>>> @@ -275,20 +344,22 @@ int amdgpu_ring_init(struct amdgpu_device *adev,
>>> struct amdgpu_ring *ring,
>>>                                              (void **)&ring->ring);
>>>                  if (r) {
>>>                          dev_err(adev->dev, "(%d) ring create failed\n",
>>> r);
>>>                          return r;
>>>                  }
>>>                  memset((void *)ring->ring, 0, ring->ring_size);
>>>          }
>>>          ring->ptr_mask = (ring->ring_size / 4) - 1;
>>>          ring->max_dw = max_dw;
>>>          ring->hw_ip = hw_ip;
>>> +       ring->priority = AMDGPU_CTX_PRIORITY_NORMAL;
>>> +       spin_lock_init(&ring->priority_lock);
>>>          INIT_LIST_HEAD(&ring->lru_list);
>>>          amdgpu_ring_lru_touch(adev, ring);
>>>          if (amdgpu_debugfs_ring_init(adev, ring)) {
>>>                  DRM_ERROR("Failed to register debugfs file for rings
>>> !\n");
>>>          }
>>>          return 0;
>>>    }
>>>      /**
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index ecdd87c..befc29f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -17,20 +17,21 @@
>>>     * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES
>>> OR
>>>     * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>     * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>     * OTHER DEALINGS IN THE SOFTWARE.
>>>     *
>>>     * Authors: Christian König
>>>     */
>>>    #ifndef __AMDGPU_RING_H__
>>>    #define __AMDGPU_RING_H__
>>>    +#include <drm/amdgpu_drm.h>
>>>    #include "gpu_scheduler.h"
>>>      /* max number of rings */
>>>    #define AMDGPU_MAX_RINGS              16
>>>    #define AMDGPU_MAX_GFX_RINGS          1
>>>    #define AMDGPU_MAX_COMPUTE_RINGS      8
>>>    #define AMDGPU_MAX_VCE_RINGS          3
>>>      /* some special values for the owner field */
>>>    #define AMDGPU_FENCE_OWNER_UNDEFINED  ((void*)0ul)
>>> @@ -130,20 +131,22 @@ struct amdgpu_ring_funcs {
>>>          void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
>>>          unsigned (*init_cond_exec)(struct amdgpu_ring *ring);
>>>          void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned
>>> offset);
>>>          /* note usage for clock and power gating */
>>>          void (*begin_use)(struct amdgpu_ring *ring);
>>>          void (*end_use)(struct amdgpu_ring *ring);
>>>          void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>>>          void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
>>>          void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>>>          void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t
>>> val);
>>> +       /* priority functions */
>>> +       void (*set_priority) (struct amdgpu_ring *ring, int priority);
>>>    };
>>>      struct amdgpu_ring {
>>>          struct amdgpu_device            *adev;
>>>          const struct amdgpu_ring_funcs  *funcs;
>>>          struct amdgpu_fence_driver      fence_drv;
>>>          struct amd_gpu_scheduler        sched;
>>>          struct list_head                lru_list;
>>>          struct amdgpu_bo        *ring_obj;
>>> @@ -165,31 +168,39 @@ struct amdgpu_ring {
>>>          struct amdgpu_bo        *mqd_obj;
>>>          u32                     doorbell_index;
>>>          bool                    use_doorbell;
>>>          unsigned                wptr_offs;
>>>          unsigned                fence_offs;
>>>          uint64_t                current_ctx;
>>>          char                    name[16];
>>>          unsigned                cond_exe_offs;
>>>          u64                     cond_exe_gpu_addr;
>>>          volatile u32            *cond_exe_cpu_addr;
>>> +
>>> +       spinlock_t              priority_lock;
>>> +       /* protected by priority_lock */
>>> +       struct amdgpu_job       *last_job[AMDGPU_CTX_PRIORITY_NUM];
>>> +       int                     priority;
>>> +
>>>    #if defined(CONFIG_DEBUG_FS)
>>>          struct dentry *ent;
>>>    #endif
>>>    };
>>>      int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
>>>                                 int hw_ip, int ring);
>>>    int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
>>>    void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
>>>    void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct
>>> amdgpu_ib *ib);
>>>    void amdgpu_ring_commit(struct amdgpu_ring *ring);
>>>    void amdgpu_ring_undo(struct amdgpu_ring *ring);
>>> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int priority,
>>> +                                struct amdgpu_job *job);
>>>    int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring
>>> *ring,
>>>                       int hw_ip, unsigned ring_size,
>>>                       struct amdgpu_irq_src *irq_src, unsigned irq_type);
>>>    void amdgpu_ring_fini(struct amdgpu_ring *ring);
>>>    int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
>>>                          struct amdgpu_ring **ring);
>>>    void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct
>>> amdgpu_ring *ring);
>>>      #endif
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found]         ` <CACvgo51=1-8dHmC8MOmbCijDv3vpD4dTC6hibQMe5bYB9zsB4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-02  3:52           ` Andres Rodriguez
       [not found]             ` <782283a5-3871-0827-ed2c-9069a6dc6734-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Andres Rodriguez @ 2017-03-02  3:52 UTC (permalink / raw)
  To: Emil Velikov; +Cc: amd-gfx mailing list



On 2017-02-28 08:13 PM, Emil Velikov wrote:
> Hi Andres,
>
> There's a couple of nitpicks below, but feel free to address those as
> follow-up. Considering they're correct of course ;-)

As much as I'd like to let future me deal with those issues, the 
UAPI behavior is something I'd like to get nailed down early and avoid 
changing.

So any nitpicks here are more than welcome now (better than later :) )
>
> On 28 February 2017 at 22:14, Andres Rodriguez <andresx7@gmail.com> wrote:
>> Add a new context creation parameter to express a global context priority.
>>
>> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
>> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
>> (default) contexts.
>>
>> v2: Instead of using flags, repurpose __pad
>> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
>> v4: Validate usermode priority and store it
>>
>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41 +++++++++++++++++++++++----
>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
>>  include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
>>  4 files changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index e30c47e..366f6d3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>>         struct amd_sched_entity entity;
>>  };
>>
>>  struct amdgpu_ctx {
>>         struct kref             refcount;
>>         struct amdgpu_device    *adev;
>>         unsigned                reset_counter;
>>         spinlock_t              ring_lock;
>>         struct dma_fence        **fences;
>>         struct amdgpu_ctx_ring  rings[AMDGPU_MAX_RINGS];
>> +       int                     priority;
>>         bool preamble_presented;
>>  };
>>
>>  struct amdgpu_ctx_mgr {
>>         struct amdgpu_device    *adev;
>>         struct mutex            lock;
>>         /* protected by lock */
>>         struct idr              ctx_handles;
>>  };
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> index 400c66b..22a15d6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> @@ -18,47 +18,75 @@
>>   * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>   * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>   * OTHER DEALINGS IN THE SOFTWARE.
>>   *
>>   * Authors: monk liu <monk.liu@amd.com>
>>   */
>>
>>  #include <drm/drmP.h>
>>  #include "amdgpu.h"
>>
>> -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
>> +static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
>> +{
>> +       switch (amdgpu_priority) {
>> +       case AMDGPU_CTX_PRIORITY_HIGH:
>> +               return AMD_SCHED_PRIORITY_HIGH;
>> +       case AMDGPU_CTX_PRIORITY_NORMAL:
>> +               return AMD_SCHED_PRIORITY_NORMAL;
>> +       default:
>> +               WARN(1, "Invalid context priority %d\n", amdgpu_priority);
>> +               return AMD_SCHED_PRIORITY_NORMAL;
>> +       }
>> +}
>> +
>> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
>> +                               uint32_t priority,
>> +                               struct amdgpu_ctx *ctx)
>>  {
>>         unsigned i, j;
>>         int r;
>> +       enum amd_sched_priority sched_priority;
>> +
>> +       sched_priority = amdgpu_to_sched_priority(priority);
>> +
> This will trigger dmesg spam on normal user input. I'd keep the WARN
> in amdgpu_to_sched_priority, but move the function call after the
> validation of priority.
> Thinking about it the input validation really belongs in the ioctl -
> amdgpu_ctx_ioctl().
>

Agreed.
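
I.e. roughly (sketch only):

    int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
                         struct drm_file *filp)
    {
            union drm_amdgpu_ctx *args = data;
            uint32_t priority = args->in.priority;

            /* reject garbage before it reaches amdgpu_ctx_init(),
             * where amdgpu_to_sched_priority() would WARN on it */
            if (args->in.op == AMDGPU_CTX_OP_ALLOC_CTX &&
                priority >= AMDGPU_CTX_PRIORITY_NUM)
                    return -EINVAL;

            /* ... */
    }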

>> +       if (priority >= AMDGPU_CTX_PRIORITY_NUM)
>> +               return -EINVAL;
>> +
>> +       if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
>> +               return -EINVAL;
>> +
>> +       if (sched_priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
> This is not obvious neither in the commit message nor the UAPI. I'd
> suggest adding a comment in the latter.

Will do.

> If memory is not failing - high prio will _not_ work with render nodes
> so you really want to cover and/or explain why.

High priority will work fine with render nodes. I'm testing using radv 
which actually uses render nodes.

But I've had my fair share of two bugs canceling each other out. So if 
you do have some insight on why this is the case, let me know.
>
>> +               return -EACCES;
>>
>>         memset(ctx, 0, sizeof(*ctx));
>>         ctx->adev = adev;
>> +       ctx->priority = priority;
>>         kref_init(&ctx->refcount);
>>         spin_lock_init(&ctx->ring_lock);
>>         ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>>                               sizeof(struct dma_fence*), GFP_KERNEL);
>>         if (!ctx->fences)
>>                 return -ENOMEM;
>>
>>         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>                 ctx->rings[i].sequence = 1;
>>                 ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
>>         }
>>
>>         ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>>
>>         /* create context entity for each ring */
>>         for (i = 0; i < adev->num_rings; i++) {
>>                 struct amdgpu_ring *ring = adev->rings[i];
>>                 struct amd_sched_rq *rq;
>>
>> -               rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
>> +               rq = &ring->sched.sched_rq[sched_priority];
>>                 r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
>>                                           rq, amdgpu_sched_jobs);
>>                 if (r)
>>                         goto failed;
>>         }
>>
>>         return 0;
>>
>>  failed:
>>         for (j = 0; j < i; j++)
>> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>>         kfree(ctx->fences);
>>         ctx->fences = NULL;
>>
>>         for (i = 0; i < adev->num_rings; i++)
>>                 amd_sched_entity_fini(&adev->rings[i]->sched,
>>                                       &ctx->rings[i].entity);
>>  }
>>
>>  static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>>                             struct amdgpu_fpriv *fpriv,
>> +                           uint32_t priority,
>>                             uint32_t *id)
>>  {
>>         struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>>         struct amdgpu_ctx *ctx;
>>         int r;
>>
>>         ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>>         if (!ctx)
>>                 return -ENOMEM;
>>
>>         mutex_lock(&mgr->lock);
>>         r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>>         if (r < 0) {
>>                 mutex_unlock(&mgr->lock);
>>                 kfree(ctx);
>>                 return r;
>>         }
>> +
>>         *id = (uint32_t)r;
>> -       r = amdgpu_ctx_init(adev, ctx);
>> +       r = amdgpu_ctx_init(adev, priority, ctx);
>>         if (r) {
>>                 idr_remove(&mgr->ctx_handles, *id);
>>                 *id = 0;
>>                 kfree(ctx);
>>         }
>>         mutex_unlock(&mgr->lock);
>>         return r;
>>  }
>>
>>  static void amdgpu_ctx_do_release(struct kref *ref)
>> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
>>         ctx->reset_counter = reset_counter;
>>
>>         mutex_unlock(&mgr->lock);
>>         return 0;
>>  }
>>
>>  int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>>                      struct drm_file *filp)
>>  {
>>         int r;
>> -       uint32_t id;
>> +       uint32_t id, priority;
>>
>>         union drm_amdgpu_ctx *args = data;
>>         struct amdgpu_device *adev = dev->dev_private;
>>         struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>
>>         r = 0;
>>         id = args->in.ctx_id;
>> +       priority = args->in.priority;
>>
> Hmm we don't seem to be doing any in.flags validation - not cool.
> Someone seriously wants to add that and check the remaining ioctls.
> At the same time - I think you want to add a flag bit "HAS_PRIORITY"
> [or similar] and honour in.priority only when that is set.
>
> Even if the USM drivers are safe, this will break on a poor soul that
> is learning how to program their GPU. "My program was running before -
> I updated the kernel and it no longer does :-("

Improving validation of already existing parameters is probably 
something better served in a separate series (so in this case I will 
take you up on the earlier offer)

As far as a HAS_PRIORITY flag goes, I'm not sure it would really be 
necessary. libdrm_amdgpu has zeroed its data structures since its first 
commit (0936139), so the change should be backwards compatible with all 
libdrm_amdgpu versions.

If someone is bypassing libdrm_amdgpu and hitting the IOCTLs directly, 
then I hope they know what they are doing. The only apps I've really 
seen do this are fuzzers, and they probably wouldn't care.

I'm assuming we do have the flexibility of relying on the usermode 
library as the point of backwards compatibility.

>
> Either way, the patch is:
> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>
> -Emil
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 19/22] drm/amdgpu: add framework for HW specific priority settings
       [not found]                 ` <f0de5e4f-bf94-9222-cc9e-1d535c228b0a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-02  6:45                   ` Andres Rodriguez
  0 siblings, 0 replies; 40+ messages in thread
From: Andres Rodriguez @ 2017-03-02  6:45 UTC (permalink / raw)
  To: Alex Deucher, zhoucm1, Jay Cornwall; +Cc: amd-gfx list



On 2017-03-01 12:44 PM, Andres Rodriguez wrote:
>
>
> On 2017-03-01 10:49 AM, Alex Deucher wrote:
>> On Wed, Mar 1, 2017 at 2:27 AM, zhoucm1 <david1.zhou@amd.com> wrote:
>>>
>>> On 2017-03-01 06:14, Andres Rodriguez wrote:
>>>> Add an initial framework for changing the HW priorities of rings. The
>>>> framework allows requesting priority changes for the lifetime of an
>>>> amdgpu_job. After the job completes the priority will decay to the next
>>>> lowest priority for which a request is still valid.
>>>>
>>>> A new ring function set_priority() can now be populated to take care of
>>>> the HW specific programming sequence for priority changes.
>>>>
>>>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  4 +-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 +-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 10 ++++-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 71
>>>> ++++++++++++++++++++++++++++++++
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 11 +++++
>>>>    5 files changed, 94 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 366f6d3..0676495 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -636,21 +636,21 @@ struct amdgpu_flip_work {
>>>>    struct amdgpu_ib {
>>>>          struct amdgpu_sa_bo             *sa_bo;
>>>>          uint32_t                        length_dw;
>>>>          uint64_t                        gpu_addr;
>>>>          uint32_t                        *ptr;
>>>>          uint32_t                        flags;
>>>>    };
>>>>      extern const struct amd_sched_backend_ops amdgpu_sched_ops;
>>>>    -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int
>>>> priority,
>>>>                       struct amdgpu_job **job, struct amdgpu_vm *vm);
>>>>    int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned
>>>> size,
>>>>                               struct amdgpu_job **job);
>>>>      void amdgpu_job_free_resources(struct amdgpu_job *job);
>>>>    void amdgpu_job_free(struct amdgpu_job *job);
>>>>    int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring
>>>> *ring,
>>>>                        struct amd_sched_entity *entity, void *owner,
>>>>                        struct dma_fence **f);
>>>>    @@ -990,20 +990,22 @@ struct amdgpu_cs_parser {
>>>>    #define AMDGPU_VM_DOMAIN                    (1 << 3) /* bit set
>>>> means in
>>>> virtual memory context */
>>>>      struct amdgpu_job {
>>>>          struct amd_sched_job    base;
>>>>          struct amdgpu_device    *adev;
>>>>          struct amdgpu_vm        *vm;
>>>>          struct amdgpu_ring      *ring;
>>>>          struct amdgpu_sync      sync;
>>>>          struct amdgpu_ib        *ibs;
>>>>          struct dma_fence        *fence; /* the hw fence */
>>>> +       struct dma_fence_cb     cb;
>>>> +       int                     priority;
>>>>          uint32_t                preamble_status;
>>>>          uint32_t                num_ibs;
>>>>          void                    *owner;
>>>>          uint64_t                fence_ctx; /* the fence_context
>>>> this job
>>>> uses */
>>>>          bool                    vm_needs_flush;
>>>>          unsigned                vm_id;
>>>>          uint64_t                vm_pd_addr;
>>>>          uint32_t                gds_base, gds_size;
>>>>          uint32_t                gws_base, gws_size;
>>>>          uint32_t                oa_base, oa_size;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 605d40e..19ce202 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -179,21 +179,21 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser
>>>> *p, void *data)
>>>>                  case AMDGPU_CHUNK_ID_DEPENDENCIES:
>>>>                          break;
>>>>                  default:
>>>>                          ret = -EINVAL;
>>>>                          goto free_partial_kdata;
>>>>                  }
>>>>          }
>>>>    -     ret = amdgpu_job_alloc(p->adev, num_ibs, &p->job, vm);
>>>> +       ret = amdgpu_job_alloc(p->adev, num_ibs, p->ctx->priority,
>>>> &p->job, vm);
>>>>          if (ret)
>>>>                  goto free_all_kdata;
>>>>          if (p->uf_entry.robj)
>>>>                  p->job->uf_addr = uf_offset;
>>>>          kfree(chunk_array);
>>>>          return 0;
>>>>      free_all_kdata:
>>>>          i = p->nchunks - 1;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index 86a1242..45b3c90 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -32,50 +32,51 @@ static void amdgpu_job_timedout(struct
>>>> amd_sched_job
>>>> *s_job)
>>>>    {
>>>>          struct amdgpu_job *job = container_of(s_job, struct
>>>> amdgpu_job,
>>>> base);
>>>>          DRM_ERROR("ring %s timeout, last signaled seq=%u, last emitted
>>>> seq=%u\n",
>>>>                    job->base.sched->name,
>>>>                    atomic_read(&job->ring->fence_drv.last_seq),
>>>>                    job->ring->fence_drv.sync_seq);
>>>>          amdgpu_gpu_reset(job->adev);
>>>>    }
>>>>    -int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>> +int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs, int
>>>> priority,
>>>>                       struct amdgpu_job **job, struct amdgpu_vm *vm)
>>>>    {
>>>>          size_t size = sizeof(struct amdgpu_job);
>>>>          if (num_ibs == 0)
>>>>                  return -EINVAL;
>>>>          size += sizeof(struct amdgpu_ib) * num_ibs;
>>>>          *job = kzalloc(size, GFP_KERNEL);
>>>>          if (!*job)
>>>>                  return -ENOMEM;
>>>>          (*job)->adev = adev;
>>>>          (*job)->vm = vm;
>>>> +       (*job)->priority = priority;
>>>>          (*job)->ibs = (void *)&(*job)[1];
>>>>          (*job)->num_ibs = num_ibs;
>>>>          amdgpu_sync_create(&(*job)->sync);
>>>>          return 0;
>>>>    }
>>>>      int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned
>>>> size,
>>>>                               struct amdgpu_job **job)
>>>>    {
>>>>          int r;
>>>>    -     r = amdgpu_job_alloc(adev, 1, job, NULL);
>>>> +       r = amdgpu_job_alloc(adev, 1, AMDGPU_CTX_PRIORITY_NORMAL, job,
>>>> NULL);
>>>>          if (r)
>>>>                  return r;
>>>>          r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
>>>>          if (r)
>>>>                  kfree(*job);
>>>>          return r;
>>>>    }
>>>>    @@ -170,20 +171,25 @@ static struct dma_fence *amdgpu_job_run(struct
>>>> amd_sched_job *sched_job)
>>>>          BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
>>>>          trace_amdgpu_sched_run_job(job);
>>>>          r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job,
>>>> &fence);
>>>>          if (r)
>>>>                  DRM_ERROR("Error scheduling IBs (%d)\n", r);
>>>>          /* if gpu reset, hw fence will be replaced here */
>>>>          dma_fence_put(job->fence);
>>>>          job->fence = dma_fence_get(fence);
>>>> +
>>>> +       r = amdgpu_ring_elevate_priority(job->ring, job->priority,
>>>> job);
>>>> +       if (r)
>>>> +               DRM_ERROR("Failed to set job priority (%d)\n", r);
>>> elevate priority of ring is for this job, right?
>>> you're setting priority by mmio method and restore priority by fence cb,
>>> your method isn't precise. I have some comments:
>>> 1. seems we should elevate_priority before amdgpu_ib_schedule, so
>>> that the
>>> ring is proper priority for this submission.
> Yeah, thanks for the catch on that. As it stands, the first packets
> might be emitted with a new_wave_priority lower than desired,
> introducing extra delay.
>>> 2. can we put setting of registers of SPI into ring buffer as a command
>>> frame? elevate_priority setting is front of a command frame and
>>> restore_priority is end of frame.
> We can't do it on the ring because we need any work that is already
> committed to the ring to inherit the highest priority. Otherwise the
> high priority work would have to wait for the regular priority work to
> complete (at regular priority latency) before it can begin.
>
> For more details see my reply to Christian's email regarding this topic.
>> We should probably check if changing the queue priority requires that
>> the engine be idle as well.  I vaguely recall there being a packet for
>> some of this that was added for HSA.
> +Jay
>
> Hey Jay,
>
> Do you have more details on this?
>
> I wrote these patches under the assumption that changing the priority
> while the engine is busy was allowed, but that it would not have any
> effect for waves that have already been emitted.
>
> Would changing the priorities while the engines are active cause any
> hangs or bad side effects?
>>
>> Alex
>>
>>> 3. if no priority exchanges, we can skip setting for priority for
>>> command
>>> submission.
> I originally had a check that skipped set_priority() if the priorities
> matched, but I decided to remove it.
>
> The reason is that while no changes to the HW state may need to happen,
> the implementation may need to track extra metadata about who is
> executing high priority work on the ring. Therefore whether the
> operation is a noop or not should be decided by the specific
> implementation.
>
> Having said that, I think gfx_v8_0_ring_set_priority_compute() can be
> improved to skip more work. So I'll apply your suggestion there.

Apparently I was misremembering an old implementation of cu_reserve().

The current patch already does what you requested. Refer to the 
following code in amdgpu_ring_elevate_priority():

    if (priority <= ring->priority)
            goto out_unlock;

>
>>>
>>> Regards,
>>> David Zhou
>>>
>>>> +
>>>>          amdgpu_job_free_resources(job);
>>>>          return fence;
>>>>    }
>>>>      const struct amd_sched_backend_ops amdgpu_sched_ops = {
>>>>          .dependency = amdgpu_job_dependency,
>>>>          .run_job = amdgpu_job_run,
>>>>          .timedout_job = amdgpu_job_timedout,
>>>>          .free_job = amdgpu_job_free_cb
>>>>    };
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>>> index 80cb051..12bc7a9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>>>> @@ -192,20 +192,89 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
>>>>     * Reset the driver's copy of the wptr (all asics).
>>>>     */
>>>>    void amdgpu_ring_undo(struct amdgpu_ring *ring)
>>>>    {
>>>>          ring->wptr = ring->wptr_old;
>>>>          if (ring->funcs->end_use)
>>>>                  ring->funcs->end_use(ring);
>>>>    }
>>>>    +static void amdgpu_ring_restore_priority_cb(struct dma_fence *f,
>>>> +                                           struct dma_fence_cb *cb)
>>>> +{
>>>> +       int i;
>>>> +       struct amdgpu_job *cb_job =
>>>> +               container_of(cb, struct amdgpu_job, cb);
>>>> +       struct amdgpu_ring *ring = cb_job->ring;
>>>> +
>>>> +       spin_lock(&ring->priority_lock);
>>>> +
>>>> +       /* remove ourselves from the list if necessary */
>>>> +       if (cb_job == ring->last_job[cb_job->priority])
>>>> +               ring->last_job[cb_job->priority] = NULL;
>>>> +
>>>> +       /* something higher prio is executing, no need to decay */
>>>> +       if (ring->priority > cb_job->priority)
>>>> +               goto out_unlock;
>>>> +
>>>> +       /* decay priority to the next level with a job available */
>>>> +       for (i = cb_job->priority; i >= 0; i--) {
>>>> +               if (i == AMDGPU_CTX_PRIORITY_NORMAL ||
>>>> ring->last_job[i])
>>>> {
>>>> +                       ring->priority = i;
>>>> +                       if (ring->funcs->set_priority)
>>>> +                               ring->funcs->set_priority(ring, i);
>>>> +
>>>> +                       break;
>>>> +               }
>>>> +       }
>>>> +
>>>> +out_unlock:
>>>> +       spin_unlock(&ring->priority_lock);
>>>> +}
>>>> +
>>>> +/**
>>>> + * amdgpu_ring_elevate_priority - change the ring's priority
>>>> + *
>>>> + * @ring: amdgpu_ring structure holding the information
>>>> + * @priority: target priority
>>>> + * @job: priority should remain elevated for the duration of this job
>>>> + *
>>>> + * Use HW specific mechanisms to elevate the ring's priority while
>>>> @job
>>>> + * is executing. Once @job finishes executing, the ring will reset
>>>> back
>>>> + * to normal priority.
>>>> + * Returns 0 on success, error otherwise
>>>> + */
>>>> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int
>>>> priority,
>>>> +                                struct amdgpu_job *job)
>>>> +{
>>>> +       if (priority < 0 || priority >= AMDGPU_CTX_PRIORITY_NUM)
>>>> +               return -EINVAL;
>>>> +
>>>> +       spin_lock(&ring->priority_lock);
>>>> +       ring->last_job[priority] = job;
>>>> +
>>>> +       if (priority <= ring->priority)
>>>> +               goto out_unlock;
>>>> +
>>>> +       ring->priority = priority;
>>>> +       if (ring->funcs->set_priority)
>>>> +               ring->funcs->set_priority(ring, priority);
>>>> +
>>>> +       dma_fence_add_callback(job->fence, &job->cb,
>>>> +                              amdgpu_ring_restore_priority_cb);
>>>> +
>>>> +out_unlock:
>>>> +       spin_unlock(&ring->priority_lock);
>>>> +       return 0;
>>>> +}
>>>> +
>>>>    /**
>>>>     * amdgpu_ring_init - init driver ring struct.
>>>>     *
>>>>     * @adev: amdgpu_device pointer
>>>>     * @ring: amdgpu_ring structure holding ring information
>>>>     * @max_ndw: maximum number of dw for ring alloc
>>>>     * @nop: nop packet for this ring
>>>>     *
>>>>     * Initialize the driver information for the selected ring (all
>>>> asics).
>>>>     * Returns 0 on success, error on failure.
>>>> @@ -275,20 +344,22 @@ int amdgpu_ring_init(struct amdgpu_device *adev,
>>>> struct amdgpu_ring *ring,
>>>>                                              (void **)&ring->ring);
>>>>                  if (r) {
>>>>                          dev_err(adev->dev, "(%d) ring create
>>>> failed\n",
>>>> r);
>>>>                          return r;
>>>>                  }
>>>>                  memset((void *)ring->ring, 0, ring->ring_size);
>>>>          }
>>>>          ring->ptr_mask = (ring->ring_size / 4) - 1;
>>>>          ring->max_dw = max_dw;
>>>>          ring->hw_ip = hw_ip;
>>>> +       ring->priority = AMDGPU_CTX_PRIORITY_NORMAL;
>>>> +       spin_lock_init(&ring->priority_lock);
>>>>          INIT_LIST_HEAD(&ring->lru_list);
>>>>          amdgpu_ring_lru_touch(adev, ring);
>>>>          if (amdgpu_debugfs_ring_init(adev, ring)) {
>>>>                  DRM_ERROR("Failed to register debugfs file for rings
>>>> !\n");
>>>>          }
>>>>          return 0;
>>>>    }
>>>>      /**
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> index ecdd87c..befc29f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> @@ -17,20 +17,21 @@
>>>>     * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>>>> DAMAGES
>>>> OR
>>>>     * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>>> OTHERWISE,
>>>>     * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>>>> USE OR
>>>>     * OTHER DEALINGS IN THE SOFTWARE.
>>>>     *
>>>>     * Authors: Christian König
>>>>     */
>>>>    #ifndef __AMDGPU_RING_H__
>>>>    #define __AMDGPU_RING_H__
>>>>    +#include <drm/amdgpu_drm.h>
>>>>    #include "gpu_scheduler.h"
>>>>      /* max number of rings */
>>>>    #define AMDGPU_MAX_RINGS              16
>>>>    #define AMDGPU_MAX_GFX_RINGS          1
>>>>    #define AMDGPU_MAX_COMPUTE_RINGS      8
>>>>    #define AMDGPU_MAX_VCE_RINGS          3
>>>>      /* some special values for the owner field */
>>>>    #define AMDGPU_FENCE_OWNER_UNDEFINED  ((void*)0ul)
>>>> @@ -130,20 +131,22 @@ struct amdgpu_ring_funcs {
>>>>          void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib
>>>> *ib);
>>>>          unsigned (*init_cond_exec)(struct amdgpu_ring *ring);
>>>>          void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned
>>>> offset);
>>>>          /* note usage for clock and power gating */
>>>>          void (*begin_use)(struct amdgpu_ring *ring);
>>>>          void (*end_use)(struct amdgpu_ring *ring);
>>>>          void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>>>>          void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t
>>>> flags);
>>>>          void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>>>>          void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg,
>>>> uint32_t
>>>> val);
>>>> +       /* priority functions */
>>>> +       void (*set_priority) (struct amdgpu_ring *ring, int priority);
>>>>    };
>>>>      struct amdgpu_ring {
>>>>          struct amdgpu_device            *adev;
>>>>          const struct amdgpu_ring_funcs  *funcs;
>>>>          struct amdgpu_fence_driver      fence_drv;
>>>>          struct amd_gpu_scheduler        sched;
>>>>          struct list_head                lru_list;
>>>>          struct amdgpu_bo        *ring_obj;
>>>> @@ -165,31 +168,39 @@ struct amdgpu_ring {
>>>>          struct amdgpu_bo        *mqd_obj;
>>>>          u32                     doorbell_index;
>>>>          bool                    use_doorbell;
>>>>          unsigned                wptr_offs;
>>>>          unsigned                fence_offs;
>>>>          uint64_t                current_ctx;
>>>>          char                    name[16];
>>>>          unsigned                cond_exe_offs;
>>>>          u64                     cond_exe_gpu_addr;
>>>>          volatile u32            *cond_exe_cpu_addr;
>>>> +
>>>> +       spinlock_t              priority_lock;
>>>> +       /* protected by priority_lock */
>>>> +       struct amdgpu_job       *last_job[AMDGPU_CTX_PRIORITY_NUM];
>>>> +       int                     priority;
>>>> +
>>>>    #if defined(CONFIG_DEBUG_FS)
>>>>          struct dentry *ent;
>>>>    #endif
>>>>    };
>>>>      int amdgpu_ring_is_valid_index(struct amdgpu_device *adev,
>>>>                                 int hw_ip, int ring);
>>>>    int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
>>>>    void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t
>>>> count);
>>>>    void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct
>>>> amdgpu_ib *ib);
>>>>    void amdgpu_ring_commit(struct amdgpu_ring *ring);
>>>>    void amdgpu_ring_undo(struct amdgpu_ring *ring);
>>>> +int amdgpu_ring_elevate_priority(struct amdgpu_ring *ring, int
>>>> priority,
>>>> +                                struct amdgpu_job *job);
>>>>    int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring
>>>> *ring,
>>>>                       int hw_ip, unsigned ring_size,
>>>>                       struct amdgpu_irq_src *irq_src, unsigned
>>>> irq_type);
>>>>    void amdgpu_ring_fini(struct amdgpu_ring *ring);
>>>>    int amdgpu_ring_lru_get(struct amdgpu_device *adev, int hw_ip,
>>>>                          struct amdgpu_ring **ring);
>>>>    void amdgpu_ring_lru_touch(struct amdgpu_device *adev, struct
>>>> amdgpu_ring *ring);
>>>>      #endif
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Add support for high priority scheduling in amdgpu
       [not found]             ` <4c908b1f-fcb2-7d89-026a-76fd3f4f1f22-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-02 11:00               ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2017-03-02 11:00 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 01.03.2017 at 18:24, Andres Rodriguez wrote:
>
>
> On 2017-03-01 12:13 PM, Andres Rodriguez wrote:
>>
>>
>> On 3/1/2017 6:42 AM, Christian König wrote:
>>> Patches #1-#14 are Acked-by: Christian König 
>>> <christian.koenig@amd.com>.
>>>
>>> Patch #15:
>>>
>>> Not sure if that is a good idea or not, need to take a closer look 
>>> after digging through the rest.
>>>
>>> In general the HW IP is just for the IOCTL API and not for internal 
>>> use inside the driver.
>> I'll drop this patch and use ring->funcs->type instead.
>>>
>>> Patch #16:
>>>
>>> Really nice :) I don't have time to look into it in detail, but you 
>>> have one misconception I like to point out:
>>>> The queue manager maintains a per-file descriptor map of user ring ids
>>>> to amdgpu_ring pointers. Once a map is created it is permanent 
>>>> (this is
>>>> required to maintain FIFO execution guarantees for a ring).
>>> Actually we don't have a FIFO execution guarantee per ring. We only 
>>> have that per context.
>>
>> Agreed. I'm using pretty imprecise terminology here which can be 
>> confusing. I wanted to be more precise than "context", because two 
>> amdgpu_cs_request submissions to the same context but with a 
>> different ring field can execute out of order.
>>
>> I think s/ring/context's ring/ should be enough to clarify here if 
>> you think so as well.

Yeah, just fix the description a bit and we are good to go.

>>
>>>
>>> E.g. commands from different context can execute at the same time 
>>> and out of order.
>>>
>>> Making this per file is ok for now, but you should keep in mind that 
>>> we might want to change that sooner or later.
>>>
>>> Patch #17 & #18 need to take a closer look when I have more time, 
>>> but the comments from others sounded valid to me as well.
>>>
>>> Patch #19: Raising and lowering the priority of a ring during 
>>> command submission doesn't sound like a good idea to me.
>> I'm not really sure what would be a better time than at command 
>> submission.
>>
>> If it was just SPI priorities we could have static partitioning of 
>> rings, some high priority and some regular, etc. But that approach 
>> reduces the number of rings
> Sorry, I finished typing something else and forgot this section was 
> incomplete. Full reply:
>
> I'm not really sure what would be a better time than at command 
> submission.
>
> If it was just SPI priorities we could have static partitioning of 
> rings, some high priority and some regular, etc. But that approach 
> reduces the number of rings available. It would also require a 
> callback at command submission time for CU reservation.

Ok, as Alex wrote as well, I'm not 100% sure if that really works on all 
hardware. But we could give it a try, because I don't see much of a better 
alternative either.

>>>
>>> The way you currently have it implemented would also raise the 
>>> priority of already running jobs on the same ring. Keep in mind that 
>>> everything is pipelined here.
>> That is actually intentional. If there is work already on the ring 
>> with lower priority we don't want the high priority work to have to 
>> wait for it to finish executing at regular priority. Therefore the 
>> work that has already been committed to the ring inherits the higher 
>> priority level.
>>
>> I agree this isn't ideal, which is why the LRU ring mapping policy is 
>> there to make sure this doesn't happen often.
>>>
>>> In addition to that, you can't have a fence callback in the job 
>>> structure, because the job structure is freed by the same fence as 
>>> well. So it can happen that you access freed up memory (but only for 
>>> a very short period of time).
>> Any strong preference for either 1) refcounting the job structure, or 
>> 2) allocating a new piece of memory to store the callback parameters?

How about option #3, just add that to the job lifetime.

See drivers/gpu/drm/amd/amdgpu/amdgpu_job.c. amdgpu_job_run() is called 
when the job is ready to run.

amdgpu_job_free_cb() is called from a work item after a scheduler job has 
finished executing. If that is too late we could also add another 
callback to the scheduler for this.
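
E.g. something like this (just a sketch, the restore helper is made up):

    static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
    {
            struct amdgpu_job *job = container_of(s_job, struct amdgpu_job,
                                                  base);

            /* the job is guaranteed to still be alive here, unlike in a
             * fence callback embedded in the job itself */
            amdgpu_ring_restore_priority(job->ring, job);

            /* ... existing job teardown ... */
    }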

amdgpu_job_free() is called when we directly submitted the job without 
going through the scheduler. Don't touch it; that is just for GPU reset 
handling.

Regards,
Christian.

>>
>>> Patches #20-#22 are Acked-by: Christian König 
>>> <christian.koenig@amd.com>.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 28.02.2017 at 23:14, Andres Rodriguez wrote:
>>>> This patch series introduces a mechanism that allows users with 
>>>> sufficient
>>>> privileges to categorize their work as "high priority". A userspace 
>>>> app can
>>>> create a high priority amdgpu context, where any work submitted to 
>>>> this context
>>>> will receive preferential treatment over any other work.
>>>>
>>>> High priority contexts will be scheduled ahead of other contexts by 
>>>> the sw gpu
>>>> scheduler. This functionality is generic for all HW blocks.
>>>>
>>>> Optionally, a ring can implement a set_priority() function that allows
>>>> programming HW specific features to elevate a ring's priority.
>>>>
>>>> This patch series implements set_priority() for gfx8 compute rings. 
>>>> It takes
>>>> advantage of SPI scheduling and CU reservation to provide improved 
>>>> frame
>>>> latencies for high priority contexts.
>>>>
>>>> For compute + compute scenarios we get near perfect scheduling 
>>>> latency. E.g.
>>>> one high priority ComputeParticles + one low priority 
>>>> ComputeParticles:
>>>>      - High priority ComputeParticles: 2.0-2.6 ms/frame
>>>>      - Regular ComputeParticles: 35.2-68.5 ms/frame
>>>>
>>>> For compute + gfx scenarios the high priority compute application does
>>>> experience some latency variance. However, the variance has smaller 
>>>> bounds and
>>>> a smalled deviation then without high priority scheduling.
>>>>
>>>> Following is a graph of the frame time experienced by a high 
>>>> priority compute
>>>> app in 4 different scenarios to exemplify the compute + gfx latency 
>>>> variance:
>>>>      - ComputeParticles: this scenario invloves running the compute 
>>>> particles
>>>>        sample on its own.
>>>>      - +SSAO: Previous scenario with the addition of running the 
>>>> ssao sample
>>>>        application that clogs the GFX ring with constant work.
>>>>      - +SPI Priority: Previous scenario with the addition of SPI 
>>>> priority
>>>>        programming for compute rings.
>>>>      - +CU Reserve: Previous scenario with the addition of dynamic CU
>>>>        reservation for compute rings.
>>>>
>>>> Graph link:
>>>> https://plot.ly/~lostgoat/9/
>>>>
>>>> As seen above, high priority contexts for compute allow us to 
>>>> schedule work
>>>> with enhanced confidence of completion latency under high GPU 
>>>> loads. This
>>>> property will be important for VR reprojection workloads.
>>>>
>>>> Note: The first part of this series is a resend of "Change 
>>>> queue/pipe split
>>>> between amdkfd and amdgpu" with the following changes:
>>>>      - Fixed kfdtest on Kaveri due to shift overflow. Refer to: 
>>>> "drm/amdkfdallow
>>>>        split HQD on per-queue granularity v3"
>>>>      - Used Felix's suggestions for a simplified HQD programming 
>>>> sequence
>>>>      - Added a workaround for a Tonga HW bug during HQD programming
>>>>
>>>> This series is also available at:
>>>> https://github.com/lostgoat/linux/tree/wip-high-priority
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>>
>>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
       [not found]             ` <782283a5-3871-0827-ed2c-9069a6dc6734-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-03-03 14:48               ` Emil Velikov
  0 siblings, 0 replies; 40+ messages in thread
From: Emil Velikov @ 2017-03-03 14:48 UTC (permalink / raw)
  To: Andres Rodriguez; +Cc: amd-gfx mailing list

On 2 March 2017 at 03:52, Andres Rodriguez <andresx7@gmail.com> wrote:
>
>
> On 2017-02-28 08:13 PM, Emil Velikov wrote:
>>
>> Hi Andres,
>>
>> There's a couple of nitpicks below, but feel free to address those as
>> follow-up. Considering they're correct of course ;-)
>
>
> As much as I'd like to let future me deal with those issues, the UAPI
> behavior is something I'd like to get nailed down early and avoid changing.
>
> So any nitpicks here are more than welcome now (better than later :) )
>
>>
>> On 28 February 2017 at 22:14, Andres Rodriguez <andresx7@gmail.com> wrote:
>>>
>>> Add a new context creation parameter to express a global context
>>> priority.
>>>
>>> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
>>> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
>>> (default) contexts.
>>>
>>> v2: Instead of using flags, repurpose __pad
>>> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
>>> v4: Validate usermode priority and store it
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41
>>> +++++++++++++++++++++++----
>>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
>>>  include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
>>>  4 files changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index e30c47e..366f6d3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>>>         struct amd_sched_entity entity;
>>>  };
>>>
>>>  struct amdgpu_ctx {
>>>         struct kref             refcount;
>>>         struct amdgpu_device    *adev;
>>>         unsigned                reset_counter;
>>>         spinlock_t              ring_lock;
>>>         struct dma_fence        **fences;
>>>         struct amdgpu_ctx_ring  rings[AMDGPU_MAX_RINGS];
>>> +       int                     priority;
>>>         bool preamble_presented;
>>>  };
>>>
>>>  struct amdgpu_ctx_mgr {
>>>         struct amdgpu_device    *adev;
>>>         struct mutex            lock;
>>>         /* protected by lock */
>>>         struct idr              ctx_handles;
>>>  };
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 400c66b..22a15d6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -18,47 +18,75 @@
>>>   * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>   * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>   * OTHER DEALINGS IN THE SOFTWARE.
>>>   *
>>>   * Authors: monk liu <monk.liu@amd.com>
>>>   */
>>>
>>>  #include <drm/drmP.h>
>>>  #include "amdgpu.h"
>>>
>>> -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx
>>> *ctx)
>>> +static enum amd_sched_priority amdgpu_to_sched_priority(int
>>> amdgpu_priority)
>>> +{
>>> +       switch (amdgpu_priority) {
>>> +       case AMDGPU_CTX_PRIORITY_HIGH:
>>> +               return AMD_SCHED_PRIORITY_HIGH;
>>> +       case AMDGPU_CTX_PRIORITY_NORMAL:
>>> +               return AMD_SCHED_PRIORITY_NORMAL;
>>> +       default:
>>> +               WARN(1, "Invalid context priority %d\n",
>>> amdgpu_priority);
>>> +               return AMD_SCHED_PRIORITY_NORMAL;
>>> +       }
>>> +}
>>> +
>>> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
>>> +                               uint32_t priority,
>>> +                               struct amdgpu_ctx *ctx)
>>>  {
>>>         unsigned i, j;
>>>         int r;
>>> +       enum amd_sched_priority sched_priority;
>>> +
>>> +       sched_priority = amdgpu_to_sched_priority(priority);
>>> +
>>
>> This will trigger dmesg spam on normal user input. I'd keep the WARN
>> in amdgpu_to_sched_priority, but move the function call after the
>> validation of priority.
>> Thinking about it the input validation really belongs in the ioctl -
>> amdgpu_ctx_ioctl().
>>
>
> Agreed.
>
>>> +       if (priority >= AMDGPU_CTX_PRIORITY_NUM)
>>> +               return -EINVAL;
>>> +
>>> +       if (sched_priority < 0 || sched_priority >=
>>> AMD_SCHED_MAX_PRIORITY)
>>> +               return -EINVAL;
>>> +
>>> +       if (sched_priority == AMD_SCHED_PRIORITY_HIGH &&
>>> !capable(CAP_SYS_ADMIN))
>>
>> This is not obvious neither in the commit message nor the UAPI. I'd
>> suggest adding a comment in the latter.
>
>
> Will do.
>
>> If memory is not failing - high prio will _not_ work with render nodes
>> so you really want to cover and/or explain why.
>
>
> High priority will work fine with render nodes. I'm testing using radv which
> actually uses render nodes.
>
> But I've had my fair share of two bugs canceling each other out. So if you
> do have some insight on why this is the case, let me know.
>
Right - got confused by the CAP_SYS_ADMIN cap. What I meant above is:
why do we need it? Does it impose a specific workflow or permissions for
userspace to use high prio?
This is the type of thing that, I think, must be documented in the UAPI header.
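
Wording aside, even something this small next to the AMDGPU_CTX_PRIORITY_*
defines would help (just a suggestion):

    /*
     * AMDGPU_CTX_PRIORITY_HIGH requires CAP_SYS_ADMIN; context creation
     * fails with -EACCES for unprivileged callers.
     */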

>>
>>> +               return -EACCES;
>>>
>>>         memset(ctx, 0, sizeof(*ctx));
>>>         ctx->adev = adev;
>>> +       ctx->priority = priority;
>>>         kref_init(&ctx->refcount);
>>>         spin_lock_init(&ctx->ring_lock);
>>>         ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>>>                               sizeof(struct dma_fence*), GFP_KERNEL);
>>>         if (!ctx->fences)
>>>                 return -ENOMEM;
>>>
>>>         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>                 ctx->rings[i].sequence = 1;
>>>                 ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs *
>>> i];
>>>         }
>>>
>>>         ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>>>
>>>         /* create context entity for each ring */
>>>         for (i = 0; i < adev->num_rings; i++) {
>>>                 struct amdgpu_ring *ring = adev->rings[i];
>>>                 struct amd_sched_rq *rq;
>>>
>>> -               rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
>>> +               rq = &ring->sched.sched_rq[sched_priority];
>>>                 r = amd_sched_entity_init(&ring->sched,
>>> &ctx->rings[i].entity,
>>>                                           rq, amdgpu_sched_jobs);
>>>                 if (r)
>>>                         goto failed;
>>>         }
>>>
>>>         return 0;
>>>
>>>  failed:
>>>         for (j = 0; j < i; j++)
>>> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>>>         kfree(ctx->fences);
>>>         ctx->fences = NULL;
>>>
>>>         for (i = 0; i < adev->num_rings; i++)
>>>                 amd_sched_entity_fini(&adev->rings[i]->sched,
>>>                                       &ctx->rings[i].entity);
>>>  }
>>>
>>>  static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>>>                             struct amdgpu_fpriv *fpriv,
>>> +                           uint32_t priority,
>>>                             uint32_t *id)
>>>  {
>>>         struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>>>         struct amdgpu_ctx *ctx;
>>>         int r;
>>>
>>>         ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>>>         if (!ctx)
>>>                 return -ENOMEM;
>>>
>>>         mutex_lock(&mgr->lock);
>>>         r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>>>         if (r < 0) {
>>>                 mutex_unlock(&mgr->lock);
>>>                 kfree(ctx);
>>>                 return r;
>>>         }
>>> +
>>>         *id = (uint32_t)r;
>>> -       r = amdgpu_ctx_init(adev, ctx);
>>> +       r = amdgpu_ctx_init(adev, priority, ctx);
>>>         if (r) {
>>>                 idr_remove(&mgr->ctx_handles, *id);
>>>                 *id = 0;
>>>                 kfree(ctx);
>>>         }
>>>         mutex_unlock(&mgr->lock);
>>>         return r;
>>>  }
>>>
>>>  static void amdgpu_ctx_do_release(struct kref *ref)
>>> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
>>>         ctx->reset_counter = reset_counter;
>>>
>>>         mutex_unlock(&mgr->lock);
>>>         return 0;
>>>  }
>>>
>>>  int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>>>                      struct drm_file *filp)
>>>  {
>>>         int r;
>>> -       uint32_t id;
>>> +       uint32_t id, priority;
>>>
>>>         union drm_amdgpu_ctx *args = data;
>>>         struct amdgpu_device *adev = dev->dev_private;
>>>         struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>
>>>         r = 0;
>>>         id = args->in.ctx_id;
>>> +       priority = args->in.priority;
>>>
>> Hmm, we don't seem to be doing any in.flags validation - not cool.
>> Someone seriously needs to add that and audit the remaining ioctls.
>> At the same time - I think you want to add a flag bit "HAS_PRIORITY"
>> [or similar] and honour in.priority only when that is set.
>>
>> Even if the USM drivers are safe, this will break for some poor soul
>> who is learning how to program their GPU. "My program was running
>> before - I updated the kernel and it no longer does :-("
>
>
> Improving validation of already existing parameters is probably something
> better served in a separate series (so in this case I will take you up on
> the earlier offer)
>
Agreed.
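
For whichever series ends up doing that, the usual pattern is a
minimal reject-unknown-input check at the top of the ioctl, roughly:

        /* No flag bits are defined for this ioctl yet. Rejecting
         * everything now means each bit can be given a meaning later
         * without silently changing behaviour for old binaries.
         */
        if (args->in.flags)
                return -EINVAL;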

> As far as a HAS_PRIORITY flag goes, I'm not sure it would really be
> necessary. libdrm_amdgpu has zeroed its data structures since its first
> commit (0936139), so the change should be backwards compatible with all
> libdrm_amdgpu versions.
>
> If someone is bypassing libdrm_amdgpu and hitting the IOCTLs directly, then
> I hope they know what they are doing. The only apps I've really seen do this
> are fuzzers, and they probably wouldn't care.
>
> I'm assuming we do have the flexibility of relying on the usermode library
> as the point of the backwards compatibility.
>
Since there is no validation in the first place, anyone can feed
garbage into the ioctl without knowing that they're doing it wrong.
Afaict the rule tends to be: kernel updates should never break
userspace.
And by the time you realise there is a breakage here, it'll be too late.

And even if we ignore all that for a moment, flags is there exactly for
things like HAS_PRIORITY, and adding/managing it is trivial.
Then again, if you feel so strongly against ~5 lines of code, so be it :-(
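
For the record, those ~5 lines would look roughly like the below - the
flag name and bit are made up, and I'm assuming the NORMAL define from
your series:

#define AMDGPU_CTX_ALLOC_FLAG_HAS_PRIORITY      (1 << 0)

        /* In amdgpu_ctx_ioctl(): honour in.priority only when
         * userspace explicitly opts in; old binaries keep today's
         * behaviour.
         */
        if (args->in.flags & AMDGPU_CTX_ALLOC_FLAG_HAS_PRIORITY)
                priority = args->in.priority;
        else
                priority = AMDGPU_CTX_PRIORITY_NORMAL;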

Thanks
Emil