* [RFC PATCH 0/2] add direct IB pool
@ 2020-03-26  2:01 xinhui pan
  From: xinhui pan
  To: amd-gfx
  Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

During GPU recovery we allocate IBs for the ring tests that check whether
the recovery succeeded. Because GPU recovery parks the GPU scheduler
threads, any pending job that holds IB resources has no chance to free
them, so the ring tests get stuck once there are no IBs left to allocate.

If we schedule IBs directly in job_submit_direct, we can allocate them
from the new IB pool, which should see less contention. In general, if an
IB can be freed in time, i.e. without depending on the scheduler or any
other blocking code, it is better to allocate it from the direct pool.

xinhui pan (2):
  drm/amdgpu: add direct ib pool
  drm/amdgpu: use new job alloc variation if possible

 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       |  4 ++--
 13 files changed, 35 insertions(+), 18 deletions(-)

--
2.17.1
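As a quick illustration of how the two pools are meant to be used after
this series (a minimal sketch built on the helpers patch 1 adds; the call
sites themselves are hypothetical, not from the series):

	struct amdgpu_job *job;
	int r;

	/* Direct submission, e.g. on the recovery path: allocate from the
	 * new direct pool, so the allocation cannot wait on IBs held by
	 * jobs stuck behind the parked scheduler. */
	r = amdgpu_job_alloc_with_ib_direct(adev, 64, &job, 1);

	/* Normal scheduler-driven submission: the unchanged wrapper keeps
	 * using the existing pool. */
	r = amdgpu_job_alloc_with_ib(adev, 64, &job);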
* [RFC PATCH 1/2] drm/amdgpu: add direct ib pool
@ 2020-03-26  2:01 xinhui pan
  From: xinhui pan
  To: amd-gfx
  Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

Add another IB pool for direct submission. Any job that schedules IBs
without depending on the GPU scheduler should use this pool first.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7dd74253e7b6..c01423ffb8ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -849,6 +849,7 @@ struct amdgpu_device {
 	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
 	bool				ib_pool_ready;
 	struct amdgpu_sa_manager	ring_tmp_bo;
+	struct amdgpu_sa_manager	ring_tmp_bo_direct;
 
 	/* interrupts */
 	struct amdgpu_irq		irq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8304d0c87899..28be4efb3d5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
 	parser->entity = entity;
 
 	ring = to_amdgpu_ring(entity->rq->sched);
-	r = amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
+	r = amdgpu_ib_get(adev, (unsigned long)vm | 0x1, ring->funcs->parse_cs ?
 			  chunk_ib->ib_bytes : 0, ib);
 	if (r) {
 		DRM_ERROR("Failed to get ib !\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index bece01f1cf09..f2e08c372d57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	int r;
 
 	if (size) {
-		r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
+		r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct,
 				     &ib->sa_bo, size, 256);
 		if (r) {
 			dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
@@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 		ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);
 
-		if (!vm)
+		if (!((unsigned long)vm & ~0x1))
 			ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
 	}
 
@@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
 		return r;
 	}
 
+	r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
+				      AMDGPU_IB_POOL_SIZE*64*1024,
+				      AMDGPU_GPU_PAGE_SIZE,
+				      AMDGPU_GEM_DOMAIN_GTT);
+	if (r) {
+		return r;
+	}
 	adev->ib_pool_ready = true;
 
 	return 0;
@@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
 {
 	if (adev->ib_pool_ready) {
 		amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
+		amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
 		adev->ib_pool_ready = false;
 	}
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4981e443a884..6a63826c6760 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 			     struct amdgpu_job **job)
+{
+	return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
+}
+
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+				    struct amdgpu_job **job, int direct)
 {
 	int r;
 
@@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 	if (r)
 		return r;
 
-	r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
+	r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
 	if (r)
 		kfree(*job);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2e2110dddb76..be9dd72b9912 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -67,7 +67,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 		     struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 			     struct amdgpu_job **job);
-
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+				    struct amdgpu_job **job, int direct);
 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
--
2.17.1
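To spell out the sentinel convention the hunks above pack into the vm
argument (a reading aid only, not part of the patch; the helper names are
invented):

	/*
	 * Decoding of the vm argument as used by this patch:
	 *
	 *   vm == NULL                 -> direct pool, kernel IB
	 *                                 (gpu_addr taken from the suballocation)
	 *   vm == (void *)0x1          -> normal pool, kernel IB
	 *   real pointer, bit 0 tagged -> normal pool, user IB
	 */
	static inline struct amdgpu_sa_manager *
	ib_pool_from_vm(struct amdgpu_device *adev, struct amdgpu_vm *vm)
	{
		return vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct;
	}

	static inline bool ib_is_kernel_ib(struct amdgpu_vm *vm)
	{
		/* true when no real pointer bits remain beyond the tag bit */
		return !((unsigned long)vm & ~0x1ul);
	}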
* Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool
@ 2020-03-26  5:38 Koenig, Christian
  From: Koenig, Christian
  To: Pan, Xinhui
  Cc: Deucher, Alexander; Kuehling, Felix; amd-gfx

Yeah, that has been on my TODO list for quite a while as well.

But we actually need three IB pools: one very small one for the IB tests,
one for direct VM updates, and one for the rest.

So please make the pool a parameter to ib_get() instead of the hack you
have below.

Thanks,
Christian.

On 26.03.2020 03:02, "Pan, Xinhui" <Xinhui.Pan@amd.com> wrote:
> Add another IB pool for direct submission. Any job that schedules IBs
> without depending on the GPU scheduler should use this pool first.
>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>
> [full patch quoted; trimmed, see above]
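A minimal sketch of what the suggested interface could look like, with the
pool made explicit instead of being encoded in a tagged pointer (the enum
names and the signature are assumptions, not a settled API):

	/* Sketch only: placeholder names for the three pools suggested
	 * above, not an agreed-upon interface. */
	enum amdgpu_ib_pool_type {
		AMDGPU_IB_POOL_NORMAL,	/* jobs going through the GPU scheduler */
		AMDGPU_IB_POOL_VM,	/* direct VM updates */
		AMDGPU_IB_POOL_DIRECT,	/* very small pool for the IB tests */
		AMDGPU_IB_POOL_MAX
	};

	int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
			  unsigned size, enum amdgpu_ib_pool_type pool_type,
			  struct amdgpu_ib *ib);

With the pool explicit, the tagged-pointer tricks in amdgpu_cs_ib_fill()
and amdgpu_job_alloc_with_ib_direct() above would disappear.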
* Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool
@ 2020-03-26  6:15 Pan, Xinhui
  From: Pan, Xinhui
  To: Koenig, Christian
  Cc: Deucher, Alexander; Kuehling, Felix; Pan, Xinhui; amd-gfx

On 2020-03-26 13:38, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>
> Yeah, that has been on my TODO list for quite a while as well.
>
> But we actually need three IB pools: one very small one for the IB
> tests, one for direct VM updates, and one for the rest.
>
> So please make the pool a parameter to ib_get() instead of the hack you
> have below.

Yep, I will make the IB pool a parameter.

The gfx IB tests need many IBs, so a PAGE_SIZE pool for the IB tests would
still not be enough. But the default IB pool size is 2 MB now, just one
hugepage, and machines today have memory in the TB range, so there is no
need for a different size for the IB test pool.

> [rest of quoted message and patch trimmed, see above]
* [RFC PATCH 2/2] drm/amdgpu: use new job alloc variation if possible
@ 2020-03-26  2:01 xinhui pan
  From: xinhui pan
  To: amd-gfx
  Cc: alexander.deucher, Felix.Kuehling, xinhui pan, christian.koenig

If we schedule IBs directly, allocate the job's IBs from the direct IB
pool.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c    | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c     | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       | 4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c       | 4 ++--
 8 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 5727f00afc8e..75458f15f032 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -144,7 +144,7 @@ static int amdgpu_jpeg_dec_set_reg(struct amdgpu_ring *ring, uint32_t handle,
 	const unsigned ib_size_dw = 16;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 54cfa3a12135..3360f5eaf19f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2123,7 +2123,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	num_loops = DIV_ROUND_UP(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
 
-	r = amdgpu_job_alloc_with_ib(adev, num_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(adev, num_dw * 4, &job, direct_submit);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 5fd32ad1c575..a301f8f49997 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1056,7 +1056,7 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, struct amdgpu_bo *bo,
 		goto err;
 	}
 
-	r = amdgpu_job_alloc_with_ib(adev, 64, &job);
+	r = amdgpu_job_alloc_with_ib_direct(adev, 64, &job, direct);
 	if (r)
 		goto err;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59ddba137946..80b750e15c67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -446,7 +446,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -524,7 +524,7 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 	struct dma_fence *f = NULL;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, direct);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index a41272fbcba2..7a9d917f9704 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -390,7 +390,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(adev, 64, &job);
+	r = amdgpu_job_alloc_with_ib_direct(adev, 64, &job, 1);
 	if (r)
 		goto err;
 
@@ -557,7 +557,7 @@ static int amdgpu_vcn_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -610,7 +610,7 @@ static int amdgpu_vcn_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index d30d103e48a2..65e920892884 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -64,7 +64,8 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
 	unsigned int ndw = AMDGPU_VM_SDMA_MIN_NUM_DW;
 	int r;
 
-	r = amdgpu_job_alloc_with_ib(p->adev, ndw * 4, &p->job);
+	r = amdgpu_job_alloc_with_ib_direct(p->adev, ndw * 4, &p->job,
+			p->direct /* direct pool has less contention */);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index e0aadcaf6c8b..a06aead049c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -216,7 +216,7 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -279,7 +279,7 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring *ring,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
index 0995378d8263..d82b4dfa6ca4 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
@@ -224,7 +224,7 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
 
@@ -286,7 +286,7 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handle,
 	uint64_t addr;
 	int i, r;
 
-	r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, &job);
+	r = amdgpu_job_alloc_with_ib_direct(ring->adev, ib_size_dw * 4, &job, 1);
 	if (r)
 		return r;
--
2.17.1
* RE: [RFC PATCH 0/2] add direct IB pool
@ 2020-03-26  5:31 Liu, Monk
  From: Liu, Monk
  To: Pan, Xinhui; amd-gfx
  Cc: Deucher, Alexander; Kuehling, Felix; Koenig, Christian

That sounds like a roughly doable plan to me. Although we didn't hit this
issue in our virtualization stress tests, it does look like a possible
issue.

>>> So the ring test above got stuck if no ib to alloc.

Why would an IB allocation happen during a ring test? I remember no IB is
allocated for the ring tests; are you referring to the IB tests?

_____________________________________
Monk Liu | GPU Virtualization Team | AMD

-----Original Message-----
From: xinhui pan
Subject: [RFC PATCH 0/2] add direct IB pool

[cover letter quoted in full; trimmed, see above]
* Re: [RFC PATCH 0/2] add direct IB pool
@ 2020-03-26  6:01 Pan, Xinhui
  From: Pan, Xinhui
  To: amd-gfx; Liu, Monk
  Cc: Deucher, Alexander; Kuehling, Felix; Koenig, Christian

Yes, both the IB tests and the VRAM restore allocate IBs.

I hit this issue quite a long time ago. We run benchmarks on an ARM server
running Android; with hundreds of processes we hit too many issues, panics
and memory corruption everywhere. Now I have a little time to fix this
deadlock.

If you want to reproduce it, set the GPU timeout to 50 ms, then run
Vulkan, OpenCL, amdgputest, etc. together. I believe you will see more
weird issues.

> [Monk's message quoted in full; trimmed, see above]
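For reference, the 50 ms timeout above is presumably set through the
amdgpu lockup-timeout module parameter. Something like the kernel
command-line option below should do it, though the exact parameter
semantics vary across kernel versions, so treat this as an assumption
rather than a verified recipe:

	amdgpu.lockup_timeout=50	# timeout in milliseconds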
* RE: [RFC PATCH 0/2] add direct IB pool
@ 2020-03-27 13:17 Liu, Monk
  From: Liu, Monk
  To: Pan, Xinhui; amd-gfx
  Cc: Deucher, Alexander; Kuehling, Felix; Koenig, Christian

Oh, 50 ms... I can advise our IQE team to introduce this stress-test
option.

Thanks
_____________________________________
Monk Liu | GPU Virtualization Team | AMD

> [previous message quoted in full; trimmed, see above]
* RE: [RFC PATCH 0/2] add direct IB pool
@ 2020-03-27 13:18 Liu, Monk
  From: Liu, Monk
  To: Pan, Xinhui; amd-gfx
  Cc: Deucher, Alexander; Kuehling, Felix; Koenig, Christian

By the way: if you want to avoid the IB tests being blocked by GPU
recovery, you can move the IB tests out of the TDR routine, so that they
execute after the GPU scheduler has resumed and TDR has completed.

_____________________________________
Monk Liu | GPU Virtualization Team | AMD

> [previous message quoted in full; trimmed, see above]
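A rough sketch of that suggested ordering follows. The real
amdgpu_device_gpu_recover() is considerably more involved, and the two
reset/resume helpers below are simplified assumptions; only
amdgpu_ib_ring_tests() is an actual driver function:

	static int gpu_recover_sketch(struct amdgpu_device *adev)
	{
		int r;

		/* Scheduler threads are parked for the reset itself; only
		 * register/ring-level tests, which need no IB allocations,
		 * run inside the TDR path. */
		r = do_asic_reset_and_ring_tests(adev);	/* assumed helper */
		if (r)
			return r;

		/* Resume the schedulers first, so pending jobs can complete
		 * and release the IBs they hold... */
		resume_gpu_schedulers(adev);		/* assumed helper */

		/* ...then run the IB tests: their allocations can no longer
		 * deadlock against parked scheduler threads. */
		return amdgpu_ib_ring_tests(adev);
	}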