amd-gfx.lists.freedesktop.org archive mirror
* [RFC PATCH 0/2] add direct IB pool
@ 2020-03-26  2:01 xinhui pan
  2020-03-26  2:01 ` [RFC PATCH 1/2] drm/amdgpu: add direct ib pool xinhui pan
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: xinhui pan @ 2020-03-26  2:01 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

During GPU recovery we allocate IBs for the ring tests that check
whether the recovery succeeded.

Since GPU recovery parks the GPU scheduler thread, any pending job that
holds an IB has no chance to free it. So the ring tests above get stuck
if there is no IB left to allocate.

If we schedule IBs directly in job_submit_direct, we can allocate those
IBs from the new IB pool. It should see less contention.

If an IB can be freed in time, i.e. without depending on any scheduler
or other blocking code, it is better to allocate it from the direct
pool.
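
As an illustration (a sketch only, based on the two patches below, with
the IB size and error handling abbreviated), a direct submission path
would pick the new pool like this, while scheduler-based submission
keeps the existing wrapper:

	/* Direct submission (e.g. during recovery): the IB comes from the
	 * separate direct pool, so it does not compete with IBs still held
	 * by jobs of the parked scheduler.
	 */
	r = amdgpu_job_alloc_with_ib_direct(adev, 64, &job, 1 /* direct */);
	if (r)
		return r;

	/* Submission through the GPU scheduler keeps using the default
	 * pool via the existing wrapper.
	 */
	r = amdgpu_job_alloc_with_ib(adev, 64, &job);
	if (r)
		return r;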

xinhui pan (2):
  drm/amdgpu: add direct ib pool
  drm/amdgpu: use new job alloc variation if possible

 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       |  4 ++--
 13 files changed, 35 insertions(+), 18 deletions(-)

-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/2] drm/amdgpu: add direct ib pool
  2020-03-26  2:01 [RFC PATCH 0/2] add direct IB pool xinhui pan
@ 2020-03-26  2:01 ` xinhui pan
  2020-03-26  5:38   ` Koenig, Christian
  2020-03-26  2:01 ` [RFC PATCH 2/2] drm/amdgpu: use new job alloc variation if possible xinhui pan
  2020-03-26  5:31 ` [RFC PATCH 0/2] add direct IB pool Liu, Monk
  2 siblings, 1 reply; 9+ messages in thread
From: xinhui pan @ 2020-03-26  2:01 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

Add another IB pool for direct submission.
Any job that schedules IBs without depending on the GPU scheduler
should use this pool first.
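
The vm argument of amdgpu_ib_get() doubles as a pool selector in this
RFC; in short (this only summarizes the hunks below, it is not extra
code):

	/*
	 * vm == NULL                -> direct pool,  gpu_addr taken from the SA bo
	 * vm == (void *)0x1         -> default pool, gpu_addr taken from the SA bo
	 * vm == real pointer | 0x1  -> default pool, gpu_addr mapped through the VM
	 */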

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7dd74253e7b6..c01423ffb8ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -849,6 +849,7 @@ struct amdgpu_device {
 	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
 	bool				ib_pool_ready;
 	struct amdgpu_sa_manager	ring_tmp_bo;
+	struct amdgpu_sa_manager	ring_tmp_bo_direct;
 
 	/* interrupts */
 	struct amdgpu_irq		irq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8304d0c87899..28be4efb3d5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
 		parser->entity = entity;
 
 		ring = to_amdgpu_ring(entity->rq->sched);
-		r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
+		r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, ring->funcs->parse_cs ?
 				   chunk_ib->ib_bytes : 0, ib);
 		if (r) {
 			DRM_ERROR("Failed to get ib !\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index bece01f1cf09..f2e08c372d57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	int r;
 
 	if (size) {
-		r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
+		r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct,
 				      &ib->sa_bo, size, 256);
 		if (r) {
 			dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
@@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 		ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);
 
-		if (!vm)
+		if (!((unsigned long)vm & ~0x1))
 			ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
 	}
 
@@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
 		return r;
 	}
 
+	r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
+				      AMDGPU_IB_POOL_SIZE*64*1024,
+				      AMDGPU_GPU_PAGE_SIZE,
+				      AMDGPU_GEM_DOMAIN_GTT);
+	if (r) {
+		return r;
+	}
 	adev->ib_pool_ready = true;
 
 	return 0;
@@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
 {
 	if (adev->ib_pool_ready) {
 		amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
+		amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
 		adev->ib_pool_ready = false;
 	}
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4981e443a884..6a63826c6760 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 			     struct amdgpu_job **job)
+{
+	return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
+}
+
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+			     struct amdgpu_job **job, int direct)
 {
 	int r;
 
@@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 	if (r)
 		return r;
 
-	r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
+	r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
 	if (r)
 		kfree(*job);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2e2110dddb76..be9dd72b9912 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -67,7 +67,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 		     struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 			     struct amdgpu_job **job);
-
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+			     struct amdgpu_job **job, int direct);
 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/2] drm/amdgpu: use new job alloc variation if possible
  2020-03-26  2:01 [RFC PATCH 0/2] add direct IB pool xinhui pan
  2020-03-26  2:01 ` [RFC PATCH 1/2] drm/amdgpu: add direct ib pool xinhui pan
@ 2020-03-26  2:01 ` xinhui pan
  2020-03-26  5:31 ` [RFC PATCH 0/2] add direct IB pool Liu, Monk
  2 siblings, 0 replies; 9+ messages in thread
From: xinhui pan @ 2020-03-26  2:01 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

If we schedule an IB directly, then allocate the job's IBs from the
direct IB pool.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       | 4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       | 4 ++--
 8 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 5727f00afc8e..75458f15f032 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -144,7 +144,7 @@ static int amdgpu_jpeg_dec_set_reg(struct amdgpu_ring *ring, uint32_t handle,
 	const unsigned ib_size_dw = 16;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 54cfa3a12135..3360f5eaf19f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2123,7 +2123,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	num_loops = DIV_ROUND_UP(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
 
-	r = amdgpu_job_alloc_with_ib(adev, num_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(adev, num_dw * 4, &job, direct_submit);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 5fd32ad1c575..a301f8f49997 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1056,7 +1056,7 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, struct amdgpu_bo *bo,
 			goto err;
 	}
 
-	r = amdgpu_job_alloc_with_ib(adev, 64, &job);
+	r = amdgpu_job_alloc_with_ib_direct(adev, 64, &job, direct);
 	if (r)
 		goto err;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59ddba137946..80b750e15c67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -446,7 +446,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -524,7 +524,7 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 	struct dma_fence *f = NULL;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, direct);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index a41272fbcba2..7a9d917f9704 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -390,7 +390,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(adev, 64, &job);
+	r = amdgpu_job_alloc_with_ib_direct(adev, 64, &job, 1);
 	if (r)
 		goto err;
 
@@ -557,7 +557,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t hand
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -610,7 +610,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t han
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index d30d103e48a2..65e920892884 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -64,7 +64,8 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
 	unsigned int ndw = AMDGPU_VM_SDMA_MIN_NUM_DW;
 	int r;
 
-	r = amdgpu_job_alloc_with_ib(p->adev, ndw * 4, &p->job);
+	r = amdgpu_job_alloc_with_ib_direct(p->adev, ndw * 4, &p->job,
+					p->direct /* direct pool has less contention*/);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index e0aadcaf6c8b..a06aead049c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -216,7 +216,7 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -279,7 +279,7 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring *ring,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
index 0995378d8263..d82b4dfa6ca4 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
@@ -224,7 +224,7 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -286,7 +286,7 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handl
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [RFC PATCH 0/2] add direct IB pool
  2020-03-26  2:01 [RFC PATCH 0/2] add direct IB pool xinhui pan
  2020-03-26  2:01 ` [RFC PATCH 1/2] drm/amdgpu: add direct ib pool xinhui pan
  2020-03-26  2:01 ` [RFC PATCH 2/2] drm/amdgpu: use new job alloc variation if possible xinhui pan
@ 2020-03-26  5:31 ` Liu, Monk
  2020-03-26  6:01   ` Pan, Xinhui
  2 siblings, 1 reply; 9+ messages in thread
From: Liu, Monk @ 2020-03-26  5:31 UTC (permalink / raw)
  To: Pan, Xinhui, amd-gfx
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, Koenig,  Christian

That sounds like a roughly doable plan to me; we didn't hit this issue in our virtualization stress tests, but it does look like a possible issue.

>>> So the ring tests above get stuck if there is no IB left to allocate.
Why does an IB allocation happen in the ring test? I remember no IB is allocated for the ring test; are you referring to the IB test?



_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of xinhui pan
Sent: Thursday, March 26, 2020 10:02 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: [RFC PATCH 0/2] add direct IB pool

During GPU recovery we allocate IBs for the ring tests that check whether the recovery succeeded.

Since GPU recovery parks the GPU scheduler thread, any pending job that holds an IB has no chance to free it. So the ring tests above get stuck if there is no IB left to allocate.

If we schedule IBs directly in job_submit_direct, we can allocate those IBs from the new IB pool. It should see less contention.

If an IB can be freed in time, i.e. without depending on any scheduler or other blocking code, it is better to allocate it from the direct pool.

xinhui pan (2):
  drm/amdgpu: add direct ib pool
  drm/amdgpu: use new job alloc variation if possible

 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       |  4 ++--
 13 files changed, 35 insertions(+), 18 deletions(-)

--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool
  2020-03-26  2:01 ` [RFC PATCH 1/2] drm/amdgpu: add direct ib pool xinhui pan
@ 2020-03-26  5:38   ` Koenig, Christian
  2020-03-26  6:15     ` Pan, Xinhui
  0 siblings, 1 reply; 9+ messages in thread
From: Koenig, Christian @ 2020-03-26  5:38 UTC (permalink / raw)
  To: Pan, Xinhui; +Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx


[-- Attachment #1.1: Type: text/plain, Size: 5868 bytes --]

Yeah, that has been on my TODO list for quite a while as well.

But we even need three IB pools. One very small for the IB tests, one for direct VM updates and one for the rest.

So please make the pool a parameter to ib_get() and not the hack you have below.
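
Something along these lines, just as a rough sketch (the enum values
and names below are only a suggestion, not a final interface):

	enum amdgpu_ib_pool_type {
		AMDGPU_IB_POOL_IB_TEST,		/* very small pool for the IB tests */
		AMDGPU_IB_POOL_VM_UPDATE,	/* direct VM updates */
		AMDGPU_IB_POOL_NORMAL,		/* everything else */
		AMDGPU_IB_POOL_MAX
	};

	int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
			  unsigned size, enum amdgpu_ib_pool_type pool_type,
			  struct amdgpu_ib *ib);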

Thanks,
Christian.

On 26.03.2020 03:02, "Pan, Xinhui" <Xinhui.Pan@amd.com> wrote:
Add another IB pool for direct submission.
Any job that schedules IBs without depending on the GPU scheduler
should use this pool first.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7dd74253e7b6..c01423ffb8ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -849,6 +849,7 @@ struct amdgpu_device {
         struct amdgpu_ring              *rings[AMDGPU_MAX_RINGS];
         bool                            ib_pool_ready;
         struct amdgpu_sa_manager        ring_tmp_bo;
+       struct amdgpu_sa_manager        ring_tmp_bo_direct;

         /* interrupts */
         struct amdgpu_irq               irq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8304d0c87899..28be4efb3d5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
                 parser->entity = entity;

                 ring = to_amdgpu_ring(entity->rq->sched);
-               r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
+               r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, ring->funcs->parse_cs ?
                                    chunk_ib->ib_bytes : 0, ib);
                 if (r) {
                         DRM_ERROR("Failed to get ib !\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index bece01f1cf09..f2e08c372d57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
         int r;

         if (size) {
-               r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
+               r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct,
                                       &ib->sa_bo, size, 256);
                 if (r) {
                         dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
@@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,

                 ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);

-               if (!vm)
+               if (!((unsigned long)vm & ~0x1))
                         ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
         }

@@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
                 return r;
         }

+       r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
+                                     AMDGPU_IB_POOL_SIZE*64*1024,
+                                     AMDGPU_GPU_PAGE_SIZE,
+                                     AMDGPU_GEM_DOMAIN_GTT);
+       if (r) {
+               return r;
+       }
         adev->ib_pool_ready = true;

         return 0;
@@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
 {
         if (adev->ib_pool_ready) {
                 amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
+               amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
                 adev->ib_pool_ready = false;
         }
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4981e443a884..6a63826c6760 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,

 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
                              struct amdgpu_job **job)
+{
+       return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
+}
+
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+                            struct amdgpu_job **job, int direct)
 {
         int r;

@@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
         if (r)
                 return r;

-       r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
+       r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
         if (r)
                 kfree(*job);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2e2110dddb76..be9dd72b9912 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -67,7 +67,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
                      struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
                              struct amdgpu_job **job);
-
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+                            struct amdgpu_job **job, int direct);
 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
--
2.17.1


[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/2] add direct IB pool
  2020-03-26  5:31 ` [RFC PATCH 0/2] add direct IB pool Liu, Monk
@ 2020-03-26  6:01   ` Pan, Xinhui
  2020-03-27 13:17     ` Liu, Monk
  2020-03-27 13:18     ` Liu, Monk
  0 siblings, 2 replies; 9+ messages in thread
From: Pan, Xinhui @ 2020-03-26  6:01 UTC (permalink / raw)
  To: amd-gfx, Liu, Monk; +Cc: Deucher, Alexander, Kuehling, Felix, Koenig, Christian


[-- Attachment #1.1: Type: text/plain, Size: 3611 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Yes, the IB test and VRAM restore will allocate IBs.

I hit this issue quite a long time ago. We run benchmarks on an ARM server that is running Android.
Hundreds of processes hit too many issues: panics and memory corruption everywhere.

Now I have a little time to fix this deadlock.

If you want to reproduce it, set the GPU timeout to 50 ms, then run vulkan, ocl, amdgputest, etc. together.
I believe you will see more weird issues.

________________________________
From: Liu, Monk <Monk.Liu@amd.com>
Sent: Thursday, March 26, 2020 1:31:04 PM
To: Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: RE: [RFC PATCH 0/2] add direct IB pool

That sounds like a roughly doable plan to me; we didn't hit this issue in our virtualization stress tests, but it does look like a possible issue.

>>> So the ring tests above get stuck if there is no IB left to allocate.
Why does an IB allocation happen in the ring test? I remember no IB is allocated for the ring test; are you referring to the IB test?



_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of xinhui pan
Sent: Thursday, March 26, 2020 10:02 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: [RFC PATCH 0/2] add direct IB pool

During GPU recovery we allocate IBs for the ring tests that check whether the recovery succeeded.

Since GPU recovery parks the GPU scheduler thread, any pending job that holds an IB has no chance to free it. So the ring tests above get stuck if there is no IB left to allocate.

If we schedule IBs directly in job_submit_direct, we can allocate those IBs from the new IB pool. It should see less contention.

If an IB can be freed in time, i.e. without depending on any scheduler or other blocking code, it is better to allocate it from the direct pool.

xinhui pan (2):
  drm/amdgpu: add direct ib pool
  drm/amdgpu: use new job alloc variation if possible

 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       |  4 ++--
 13 files changed, 35 insertions(+), 18 deletions(-)

--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool
  2020-03-26  5:38   ` Koenig, Christian
@ 2020-03-26  6:15     ` Pan, Xinhui
  0 siblings, 0 replies; 9+ messages in thread
From: Pan, Xinhui @ 2020-03-26  6:15 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, amd-gfx



> On Mar 26, 2020, at 13:38, Koenig, Christian <Christian.Koenig@amd.com> wrote:
> 
> Yeah, that has been on my TODO list for quite a while as well.
> 
> But we even need three IB pools. One very small for the IB tests, one for direct VM updates and one for the rest.
> 
> So please make the pool a parameter to ib_get() and not the hack you have below.

Yep, I will make the IB pool a parameter.

The IB tests for gfx need many IBs, so PAGE_SIZE would still not be enough for that pool.
But the default IB pool size is 2MB now, just one hugepage, and today we have memory in the TB range,
so there is no need to use a different size for the IB test pool.

> 
> Thanks,
> Christian.
> 
> Am 26.03.2020 03:02 schrieb "Pan, Xinhui" <Xinhui.Pan@amd.com>:
> Add another IB pool for direct submission.
> Any job that schedules IBs without depending on the GPU scheduler
> should use this pool first.
> 
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++++++++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
>  5 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 7dd74253e7b6..c01423ffb8ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -849,6 +849,7 @@ struct amdgpu_device {
>          struct amdgpu_ring              *rings[AMDGPU_MAX_RINGS];
>          bool                            ib_pool_ready;
>          struct amdgpu_sa_manager        ring_tmp_bo;
> +       struct amdgpu_sa_manager        ring_tmp_bo_direct;
>  
>          /* interrupts */
>          struct amdgpu_irq               irq;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 8304d0c87899..28be4efb3d5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
>                  parser->entity = entity;
>  
>                  ring = to_amdgpu_ring(entity->rq->sched);
> -               r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
> +               r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, ring->funcs->parse_cs ?
>                                     chunk_ib->ib_bytes : 0, ib);
>                  if (r) {
>                          DRM_ERROR("Failed to get ib !\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index bece01f1cf09..f2e08c372d57 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>          int r;
>  
>          if (size) {
> -               r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
> +               r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct,
>                                        &ib->sa_bo, size, 256);
>                  if (r) {
>                          dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
> @@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  
>                  ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);
>  
> -               if (!vm)
> +               if (!((unsigned long)vm & ~0x1))
>                          ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
>          }
>  
> @@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
>                  return r;
>          }
>  
> +       r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
> +                                     AMDGPU_IB_POOL_SIZE*64*1024,
> +                                     AMDGPU_GPU_PAGE_SIZE,
> +                                     AMDGPU_GEM_DOMAIN_GTT);
> +       if (r) {
> +               return r;
> +       }
>          adev->ib_pool_ready = true;
>  
>          return 0;
> @@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
>  {
>          if (adev->ib_pool_ready) {
>                  amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
> +               amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
>                  adev->ib_pool_ready = false;
>          }
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 4981e443a884..6a63826c6760 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>  
>  int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>                               struct amdgpu_job **job)
> +{
> +       return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
> +}
> +
> +int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
> +                            struct amdgpu_job **job, int direct)
>  {
>          int r;
>  
> @@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>          if (r)
>                  return r;
>  
> -       r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
> +       r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
>          if (r)
>                  kfree(*job);
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index 2e2110dddb76..be9dd72b9912 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -67,7 +67,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>                       struct amdgpu_job **job, struct amdgpu_vm *vm);
>  int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
>                               struct amdgpu_job **job);
> -
> +int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
> +                            struct amdgpu_job **job, int direct);
>  void amdgpu_job_free_resources(struct amdgpu_job *job);
>  void amdgpu_job_free(struct amdgpu_job *job);
>  int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
> -- 
> 2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC PATCH 0/2] add direct IB pool
  2020-03-26  6:01   ` Pan, Xinhui
@ 2020-03-27 13:17     ` Liu, Monk
  2020-03-27 13:18     ` Liu, Monk
  1 sibling, 0 replies; 9+ messages in thread
From: Liu, Monk @ 2020-03-27 13:17 UTC (permalink / raw)
  To: Pan, Xinhui, amd-gfx
  Cc: Deucher, Alexander, Kuehling, Felix, Koenig, Christian


[-- Attachment #1.1.1: Type: text/plain, Size: 4624 bytes --]

Oh, 50 ms … umm, I can advise our IQE team to introduce this stress-test option.

Thanks

_____________________________________
Monk Liu|GPU Virtualization Team |AMD

From: Pan, Xinhui <Xinhui.Pan@amd.com>
Sent: Thursday, March 26, 2020 2:02 PM
To: amd-gfx@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: Re: [RFC PATCH 0/2] add direct IB pool


[AMD Official Use Only - Internal Distribution Only]

Yes, the IB test and VRAM restore will allocate IBs.
I hit this issue quite a long time ago. We run benchmarks on an ARM server that is running Android.
Hundreds of processes hit too many issues: panics and memory corruption everywhere.
Now I have a little time to fix this deadlock.
If you want to reproduce it, set the GPU timeout to 50 ms, then run vulkan, ocl, amdgputest, etc. together.
I believe you will see more weird issues.
________________________________
From: Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>
Sent: Thursday, March 26, 2020 1:31:04 PM
To: Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Subject: RE: [RFC PATCH 0/2] add direct IB pool

That sounds like a roughly doable plan to me; we didn't hit this issue in our virtualization stress tests, but it does look like a possible issue.

>>> So the ring tests above get stuck if there is no IB left to allocate.
Why does an IB allocation happen in the ring test? I remember no IB is allocated for the ring test; are you referring to the IB test?



_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of xinhui pan
Sent: Thursday, March 26, 2020 10:02 AM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Subject: [RFC PATCH 0/2] add direct IB pool

During GPU recovery we allocate IBs for the ring tests that check whether the recovery succeeded.

Since GPU recovery parks the GPU scheduler thread, any pending job that holds an IB has no chance to free it. So the ring tests above get stuck if there is no IB left to allocate.

If we schedule IBs directly in job_submit_direct, we can allocate those IBs from the new IB pool. It should see less contention.

If an IB can be freed in time, i.e. without depending on any scheduler or other blocking code, it is better to allocate it from the direct pool.

xinhui pan (2):
  drm/amdgpu: add direct ib pool
  drm/amdgpu: use new job alloc variation if possible

 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       |  4 ++--
 13 files changed, 35 insertions(+), 18 deletions(-)

--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC PATCH 0/2] add direct IB pool
  2020-03-26  6:01   ` Pan, Xinhui
  2020-03-27 13:17     ` Liu, Monk
@ 2020-03-27 13:18     ` Liu, Monk
  1 sibling, 0 replies; 9+ messages in thread
From: Liu, Monk @ 2020-03-27 13:18 UTC (permalink / raw)
  To: Pan, Xinhui, amd-gfx
  Cc: Deucher, Alexander, Kuehling, Felix, Koenig, Christian


[-- Attachment #1.1.1: Type: text/plain, Size: 4715 bytes --]

By the way: if you want to avoid the IB test being blocked by GPU recovery, you can move the IB test out of the TDR routine, so the IB test executes after the GPU scheduler has resumed and TDR has completed.
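
Roughly this ordering (pseudocode only, not the actual
amdgpu_device_gpu_recover() flow):

	/* pseudocode of the suggested ordering, not real code */
	amdgpu_device_pre_asic_reset(...);	/* park schedulers, stop the rings */
	amdgpu_do_asic_reset(...);		/* do the reset, resume the hardware */
	drm_sched_start(...);			/* schedulers run (and free IBs) again */
	amdgpu_ib_ring_tests(adev);		/* IB tests no longer contend with
						 * jobs stuck on a parked scheduler */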

_____________________________________
Monk Liu|GPU Virtualization Team |AMD

From: Pan, Xinhui <Xinhui.Pan@amd.com>
Sent: Thursday, March 26, 2020 2:02 PM
To: amd-gfx@lists.freedesktop.org; Liu, Monk <Monk.Liu@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: Re: [RFC PATCH 0/2] add direct IB pool


[AMD Official Use Only - Internal Distribution Only]

Yes, the IB test and VRAM restore will allocate IBs.
I hit this issue quite a long time ago. We run benchmarks on an ARM server that is running Android.
Hundreds of processes hit too many issues: panics and memory corruption everywhere.
Now I have a little time to fix this deadlock.
If you want to reproduce it, set the GPU timeout to 50 ms, then run vulkan, ocl, amdgputest, etc. together.
I believe you will see more weird issues.
________________________________
From: Liu, Monk <Monk.Liu@amd.com<mailto:Monk.Liu@amd.com>>
Sent: Thursday, March 26, 2020 1:31:04 PM
To: Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Subject: RE: [RFC PATCH 0/2] add direct IB pool

That sounds like a roughly doable plan to me; we didn't hit this issue in our virtualization stress tests, but it does look like a possible issue.

>>> So the ring tests above get stuck if there is no IB left to allocate.
Why does an IB allocation happen in the ring test? I remember no IB is allocated for the ring test; are you referring to the IB test?



_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of xinhui pan
Sent: Thursday, March 26, 2020 10:02 AM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Subject: [RFC PATCH 0/2] add direct IB pool

During GPU recovery we allocate IBs for the ring tests that check whether the recovery succeeded.

Since GPU recovery parks the GPU scheduler thread, any pending job that holds an IB has no chance to free it. So the ring tests above get stuck if there is no IB left to allocate.

If we schedule IBs directly in job_submit_direct, we can allocate those IBs from the new IB pool. It should see less contention.

If an IB can be freed in time, i.e. without depending on any scheduler or other blocking code, it is better to allocate it from the direct pool.

xinhui pan (2):
  drm/amdgpu: add direct ib pool
  drm/amdgpu: use new job alloc variation if possible

 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       |  4 ++--
 13 files changed, 35 insertions(+), 18 deletions(-)

--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-27 13:18 UTC | newest]

Thread overview: 9+ messages
2020-03-26  2:01 [RFC PATCH 0/2] add direct IB pool xinhui pan
2020-03-26  2:01 ` [RFC PATCH 1/2] drm/amdgpu: add direct ib pool xinhui pan
2020-03-26  5:38   ` Koenig, Christian
2020-03-26  6:15     ` Pan, Xinhui
2020-03-26  2:01 ` [RFC PATCH 2/2] drm/amdgpu: use new job alloc variation if possible xinhui pan
2020-03-26  5:31 ` [RFC PATCH 0/2] add direct IB pool Liu, Monk
2020-03-26  6:01   ` Pan, Xinhui
2020-03-27 13:17     ` Liu, Monk
2020-03-27 13:18     ` Liu, Monk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).