* [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
@ 2017-12-01 0:23 Lyude Paul
2017-12-01 0:23 ` [PATCH 2/4] dma-buf: make reservation_object_copy_fences rcu save Lyude Paul
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Lyude Paul @ 2017-12-01 0:23 UTC
To: stable
Cc: Alex Deucher, Sinclair Yeh, Christian König, David Airlie,
linux-kernel, Nicolai Hähnle, dri-devel, Peter Zijlstra,
Chunming Zhou, Michel Dänzer, Sumit Semwal, linux-media,
linaro-mm-sig, Harish Kasiviswanathan, Alex Xie, Zhang, Jerry,
Felix Kuehling, amd-gfx
I haven't looked into where this started, but lately a good number of
pretty nasty deadlocks have appeared in the kernel. An easy way to
reproduce them on a laptop with i915/amdgpu PRIME and lockdep enabled:
DRI_PRIME=1 glxinfo
Additionally, some more race conditions exist that I've managed to
trigger with piglit and lockdep enabled after applying these patches:
=============================
WARNING: suspicious RCU usage
4.14.3Lyude-Test+ #2 Not tainted
-----------------------------
./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ext_image_dma_b/27451:
#0: (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]
stack backtrace:
CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
Call Trace:
dump_stack+0x8e/0xce
lockdep_rcu_suspicious+0xc5/0x100
reservation_object_copy_fences+0x292/0x2b0
? ttm_bo_unref+0x9f/0x3c0 [ttm]
ttm_bo_unref+0xbd/0x3c0 [ttm]
amdgpu_bo_unref+0x2a/0x50 [amdgpu]
amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
drm_gem_object_free+0x1f/0x40 [drm]
drm_gem_object_put_unlocked+0x40/0xb0 [drm]
drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
drm_gem_object_release_handle+0x51/0x90 [drm]
drm_gem_handle_delete+0x5e/0x90 [drm]
? drm_gem_handle_create+0x40/0x40 [drm]
drm_gem_close_ioctl+0x20/0x30 [drm]
drm_ioctl_kernel+0x5d/0xb0 [drm]
drm_ioctl+0x2f7/0x3b0 [drm]
? drm_gem_handle_create+0x40/0x40 [drm]
? trace_hardirqs_on_caller+0xf4/0x190
? trace_hardirqs_on+0xd/0x10
amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
do_vfs_ioctl+0x93/0x670
? __fget+0x108/0x1f0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
This series also includes the relevant fix for the RCU warning shown above.
Christian König (3):
drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
dma-buf: make reservation_object_copy_fences rcu save
drm/amdgpu: reserve root PD while releasing it
Michel Dänzer (1):
drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list
drivers/dma-buf/reservation.c | 56 +++++++++++++++++++++++++---------
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++--
drivers/gpu/drm/ttm/ttm_bo.c | 43 +++++++++++++-------------
3 files changed, 74 insertions(+), 38 deletions(-)
--
2.14.3
* [PATCH 2/4] dma-buf: make reservation_object_copy_fences rcu save
From: Lyude Paul @ 2017-12-01 0:23 UTC
To: stable
Cc: Christian König, Sumit Semwal, linux-media, dri-devel,
linaro-mm-sig, linux-kernel
From: Christian König <christian.koenig@amd.com>
Stop requiring that the src reservation object is locked for this operation.
commit 39e16ba16c147e662bf9fbcee9a99d70d420382f upstream
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1504551766-5093-1-git-send-email-deathsimple@vodafone.de
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
drivers/dma-buf/reservation.c | 56 ++++++++++++++++++++++++++++++++-----------
1 file changed, 42 insertions(+), 14 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..b44d9d7db347 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -266,8 +266,7 @@ EXPORT_SYMBOL(reservation_object_add_excl_fence);
 * @dst: the destination reservation object
 * @src: the source reservation object
 *
-* Copy all fences from src to dst. Both src->lock as well as dst-lock must be
-* held.
+* Copy all fences from src to dst. dst-lock must be held.
 */
 int reservation_object_copy_fences(struct reservation_object *dst,
 				   struct reservation_object *src)
@@ -277,33 +276,62 @@ int reservation_object_copy_fences(struct reservation_object *dst,
 	size_t size;
 	unsigned i;
 
-	src_list = reservation_object_get_list(src);
+	rcu_read_lock();
+	src_list = rcu_dereference(src->fence);
 
+retry:
 	if (src_list) {
-		size = offsetof(typeof(*src_list),
-				shared[src_list->shared_count]);
+		unsigned shared_count = src_list->shared_count;
+
+		size = offsetof(typeof(*src_list), shared[shared_count]);
+		rcu_read_unlock();
+
 		dst_list = kmalloc(size, GFP_KERNEL);
 		if (!dst_list)
 			return -ENOMEM;
 
-		dst_list->shared_count = src_list->shared_count;
-		dst_list->shared_max = src_list->shared_count;
-		for (i = 0; i < src_list->shared_count; ++i)
-			dst_list->shared[i] =
-				dma_fence_get(src_list->shared[i]);
+		rcu_read_lock();
+		src_list = rcu_dereference(src->fence);
+		if (!src_list || src_list->shared_count > shared_count) {
+			kfree(dst_list);
+			goto retry;
+		}
+
+		dst_list->shared_count = 0;
+		dst_list->shared_max = shared_count;
+		for (i = 0; i < src_list->shared_count; ++i) {
+			struct dma_fence *fence;
+
+			fence = rcu_dereference(src_list->shared[i]);
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &fence->flags))
+				continue;
+
+			if (!dma_fence_get_rcu(fence)) {
+				kfree(dst_list);
+				src_list = rcu_dereference(src->fence);
+				goto retry;
+			}
+
+			if (dma_fence_is_signaled(fence)) {
+				dma_fence_put(fence);
+				continue;
+			}
+
+			dst_list->shared[dst_list->shared_count++] = fence;
+		}
 	} else {
 		dst_list = NULL;
 	}
 
+	new = dma_fence_get_rcu_safe(&src->fence_excl);
+	rcu_read_unlock();
+
 	kfree(dst->staged);
 	dst->staged = NULL;
 
 	src_list = reservation_object_get_list(dst);
-
 	old = reservation_object_get_excl(dst);
-	new = reservation_object_get_excl(src);
-
-	dma_fence_get(new);
 
 	preempt_disable();
 	write_seqcount_begin(&dst->seq);
--
2.14.3
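
For readers following the diff, the core idea of the patch (snapshot the fence count, allocate outside the read-side critical section, re-read the list, and retry if it grew or a fence is already going away) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: every `toy_*` name is hypothetical, C11 atomics merely mark where `rcu_dereference()` and `dma_fence_get_rcu()` appear in the patch, and there are no RCU grace periods here, so the sketch is not safe against a writer freeing the source list concurrently. Unlike the hunk above, it also drops references it has already taken before retrying; that is a choice made for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for dma_fence and the reservation list. */
struct toy_fence {
	atomic_int refcount;	/* 0 means the fence is already being freed */
	atomic_bool signaled;
};

struct toy_list {
	unsigned count;
	struct toy_fence *shared[];
};

struct toy_resv {
	_Atomic(struct toy_list *) fence;	/* read via rcu_dereference() in the patch */
};

static void toy_fence_init(struct toy_fence *f, bool signaled)
{
	atomic_init(&f->refcount, 1);
	atomic_init(&f->signaled, signaled);
}

static struct toy_list *toy_list_alloc(unsigned cap)
{
	struct toy_list *l = malloc(offsetof(struct toy_list, shared) +
				    cap * sizeof(struct toy_fence *));
	if (l)
		l->count = 0;
	return l;
}

/* Analogue of dma_fence_get_rcu(): succeeds only while refcount is non-zero. */
static bool toy_fence_get_unless_zero(struct toy_fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_fence_put(struct toy_fence *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/* Copy the unsignaled fences of src into *out without locking src. */
static int toy_copy_fences(struct toy_resv *src, struct toy_list **out)
{
	struct toy_list *src_list, *dst_list;
	unsigned i, cap;

retry:
	src_list = atomic_load(&src->fence);
	if (!src_list) {
		*out = NULL;
		return 0;
	}

	/* Size the copy from a snapshot; the allocation may block. */
	cap = src_list->count;
	dst_list = toy_list_alloc(cap);
	if (!dst_list)
		return -1;

	/* Re-read the pointer: the list may have been replaced meanwhile. */
	src_list = atomic_load(&src->fence);
	if (!src_list || src_list->count > cap) {
		free(dst_list);
		goto retry;
	}

	for (i = 0; i < src_list->count; ++i) {
		struct toy_fence *f = src_list->shared[i];

		if (atomic_load(&f->signaled))
			continue;	/* finished work need not be copied */

		if (!toy_fence_get_unless_zero(f)) {
			/* Sketch-only choice: drop references already taken. */
			while (dst_list->count)
				toy_fence_put(dst_list->shared[--dst_list->count]);
			free(dst_list);
			goto retry;
		}
		dst_list->shared[dst_list->count++] = f;
	}

	assert(dst_list->count <= cap);
	*out = dst_list;
	return 0;
}
```

A caller would initialise a few fences with `toy_fence_init()`, publish a list through `toy_resv.fence`, and call `toy_copy_fences()`; the returned list holds one extra reference on each unsignaled fence and skips the signaled ones, which mirrors why `shared_max` and `shared_count` can differ in the patched kernel function.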
* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
From: Christian König @ 2017-12-01 8:27 UTC
To: Lyude Paul, stable
Cc: Chunming Zhou, Nicolai Hähnle, Sinclair Yeh, David Airlie,
Harish Kasiviswanathan, Felix Kuehling, Zhang, Jerry,
Michel Dänzer, linux-kernel, dri-devel, Sumit Semwal,
linaro-mm-sig, Peter Zijlstra, amd-gfx, Alex Deucher, Alex Xie,
Christian König, linux-media
Am 01.12.2017 um 01:23 schrieb Lyude Paul:
> I haven't gone to see where it started, but as of late a good number of
> pretty nasty deadlock issues have appeared with the kernel. Easy
> reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:
>
> DRI_PRIME=1 glxinfo
Acked-by: Christian König <christian.koenig@amd.com>
Thanks for taking care of this,
Christian.
* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
From: Greg KH @ 2017-12-04 11:45 UTC
To: Lyude Paul
Cc: stable, Alex Deucher, Sinclair Yeh, Christian König,
David Airlie, linux-kernel, Nicolai Hähnle, dri-devel,
Peter Zijlstra, Chunming Zhou, Michel Dänzer, Sumit Semwal,
linux-media, linaro-mm-sig, Harish Kasiviswanathan, Alex Xie,
Zhang, Jerry, Felix Kuehling, amd-gfx
On Thu, Nov 30, 2017 at 07:23:02PM -0500, Lyude Paul wrote:
> I haven't gone to see where it started, but as of late a good number of
> pretty nasty deadlock issues have appeared with the kernel. Easy
> reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:
>
> DRI_PRIME=1 glxinfo
> [...]
> Christian König (3):
> drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
> dma-buf: make reservation_object_copy_fences rcu save
> drm/amdgpu: reserve root PD while releasing it
>
> Michel Dänzer (1):
> drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list
All now queued up, thanks.
greg k-h