* [PATCH v2] amd/amdgpu: Fix resv shared fence overflow
From: xinhui pan @ 2020-09-29  5:57 UTC
  To: amd-gfx; +Cc: alexander.deucher, xinhui pan, christian.koenig

[  179.556745] kernel BUG at drivers/dma-buf/dma-resv.c:282!
[snip]
[  179.702910] Call Trace:
[  179.705696]  amdgpu_bo_fence+0x21/0x50 [amdgpu]
[  179.710707]  amdgpu_vm_sdma_commit+0x299/0x430 [amdgpu]
[  179.716497]  amdgpu_vm_bo_update_mapping.constprop.0+0x29f/0x390 [amdgpu]
[  179.723927]  ? find_held_lock+0x38/0x90
[  179.728183]  amdgpu_vm_handle_fault+0x1af/0x420 [amdgpu]
[  179.734063]  gmc_v9_0_process_interrupt+0x245/0x2e0 [amdgpu]
[  179.740347]  ? kgd2kfd_interrupt+0xb8/0x1e0 [amdgpu]
[  179.745808]  amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
[  179.751380]  ? amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
[  179.757159]  amdgpu_ih_process+0xbb/0x1a0 [amdgpu]
[  179.762466]  amdgpu_irq_handle_ih1+0x27/0x40 [amdgpu]
[  179.767997]  process_one_work+0x23c/0x580
[  179.772371]  worker_thread+0x50/0x3b0
[  179.776356]  ? process_one_work+0x580/0x580
[  179.780939]  kthread+0x128/0x160
[  179.784462]  ? kthread_park+0x90/0x90
[  179.788466]  ret_from_fork+0x1f/0x30

We have two scheduler entities, immediate and delayed, so there are two
kinds of scheduler finished fences. Both fences might be added to the
root BO's resv at the same time, while we only reserve one slot.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 37221b99ca96..9e0116c7f8d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2869,7 +2869,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r)
 		goto error_free_root;
 
-	r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
+	r = dma_resv_reserve_shared(root->tbo.base.resv, 2);
 	if (r)
 		goto error_unreserve;
 
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH v2] amd/amdgpu: Fix resv shared fence overflow
From: Christian König @ 2020-09-29  7:00 UTC
  To: xinhui pan, amd-gfx, Philip Yang; +Cc: alexander.deucher

Philip already stumbled over this issue as well, but this is the wrong
place to fix it.

dma_resv_reserve_shared() needs to be called after we have reserved the
page tables and before we do the update in amdgpu_vm_handle_fault().

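Reserved slots are freed (in a debug build) as soon as we release the
reservation, so a slot reserved in amdgpu_vm_init() is no longer
guaranteed by the time the fault handler adds its fence.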
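
As a rough illustration of the ordering point above, here is a toy
userspace model of the slot accounting. It is only a sketch: every
toy_* name is hypothetical, and it assumes debug-build behaviour where
unused reserved slots are dropped on unlock. It is not the kernel's
dma_resv implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of shared-fence slot accounting.
 * All toy_* names are hypothetical; this is not the kernel's dma_resv code. */
struct toy_resv {
	unsigned int shared_count; /* fences currently stored */
	unsigned int shared_max;   /* slots guaranteed while the lock is held */
	bool locked;
};

static void toy_resv_lock(struct toy_resv *r)
{
	r->locked = true;
}

/* Assumption: models a debug build, where unused reserved slots are
 * dropped as soon as the reservation is released. */
static void toy_resv_unlock(struct toy_resv *r)
{
	r->shared_max = r->shared_count;
	r->locked = false;
}

/* Guarantee room for num more shared fences; the lock must be held. */
static void toy_resv_reserve_shared(struct toy_resv *r, unsigned int num)
{
	assert(r->locked);
	if (r->shared_count + num > r->shared_max)
		r->shared_max = r->shared_count + num;
}

/* Returns false when no reserved slot is left (the overflow case). */
static bool toy_resv_add_shared_fence(struct toy_resv *r)
{
	assert(r->locked);
	if (r->shared_count >= r->shared_max)
		return false;
	r->shared_count++;
	return true;
}

/* Slot reserved at init time, fence added under a later reservation. */
static bool toy_reserve_at_init(void)
{
	struct toy_resv r = {0};
	bool ok;

	toy_resv_lock(&r);
	toy_resv_reserve_shared(&r, 1);
	toy_resv_unlock(&r);            /* the reserved slot is dropped here */

	toy_resv_lock(&r);              /* fault path takes the lock again */
	ok = toy_resv_add_shared_fence(&r);
	toy_resv_unlock(&r);
	return ok;
}

/* Slot reserved in the fault path, after locking and before the update. */
static bool toy_reserve_in_handler(void)
{
	struct toy_resv r = {0};
	bool ok;

	toy_resv_lock(&r);
	toy_resv_reserve_shared(&r, 1); /* reserve under the same lock */
	ok = toy_resv_add_shared_fence(&r);
	toy_resv_unlock(&r);
	return ok;
}
```

In this model toy_reserve_at_init() returns false (no slot is left when
the fence is added) and toy_reserve_in_handler() returns true, which is
why raising the count at init time would not help in a debug build.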

Christian.

On 29.09.20 at 07:57, xinhui pan wrote:
> [  179.556745] kernel BUG at drivers/dma-buf/dma-resv.c:282!
> [snip]
> [  179.702910] Call Trace:
> [  179.705696]  amdgpu_bo_fence+0x21/0x50 [amdgpu]
> [  179.710707]  amdgpu_vm_sdma_commit+0x299/0x430 [amdgpu]
> [  179.716497]  amdgpu_vm_bo_update_mapping.constprop.0+0x29f/0x390 [amdgpu]
> [  179.723927]  ? find_held_lock+0x38/0x90
> [  179.728183]  amdgpu_vm_handle_fault+0x1af/0x420 [amdgpu]
> [  179.734063]  gmc_v9_0_process_interrupt+0x245/0x2e0 [amdgpu]
> [  179.740347]  ? kgd2kfd_interrupt+0xb8/0x1e0 [amdgpu]
> [  179.745808]  amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
> [  179.751380]  ? amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
> [  179.757159]  amdgpu_ih_process+0xbb/0x1a0 [amdgpu]
> [  179.762466]  amdgpu_irq_handle_ih1+0x27/0x40 [amdgpu]
> [  179.767997]  process_one_work+0x23c/0x580
> [  179.772371]  worker_thread+0x50/0x3b0
> [  179.776356]  ? process_one_work+0x580/0x580
> [  179.780939]  kthread+0x128/0x160
> [  179.784462]  ? kthread_park+0x90/0x90
> [  179.788466]  ret_from_fork+0x1f/0x30
>
> We have two scheduler entities, immediate and delayed, so there are two
> kinds of scheduler finished fences. Both fences might be added to the
> root BO's resv at the same time, while we only reserve one slot.
>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 37221b99ca96..9e0116c7f8d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2869,7 +2869,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   	if (r)
>   		goto error_free_root;
>   
> -	r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
> +	r = dma_resv_reserve_shared(root->tbo.base.resv, 2);
>   	if (r)
>   		goto error_unreserve;
>   
