linux-media.vger.kernel.org archive mirror
* [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
From: Lyude Paul @ 2017-12-01  0:23 UTC
  To: stable
  Cc: Alex Deucher, Sinclair Yeh, Christian König, David Airlie,
	linux-kernel, Nicolai Hähnle, dri-devel, Peter Zijlstra,
	Chunming Zhou, Michel Dänzer, Sumit Semwal, linux-media,
	linaro-mm-sig, Harish Kasiviswanathan, Alex Xie, Zhang, Jerry,
	Felix Kuehling, amd-gfx

I haven't tracked down where it started, but recently a good number of
pretty nasty deadlock issues have appeared in the kernel. An easy way to
reproduce them on a laptop with i915/amdgpu PRIME and lockdep enabled:

DRI_PRIME=1 glxinfo

Additionally, with the deadlock fixes applied, I've managed to trigger
some further race conditions using piglit with lockdep enabled:

    =============================
    WARNING: suspicious RCU usage
    4.14.3Lyude-Test+ #2 Not tainted
    -----------------------------
    ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ext_image_dma_b/27451:
     #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]

    stack backtrace:
    CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
    Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
    Call Trace:
     dump_stack+0x8e/0xce
     lockdep_rcu_suspicious+0xc5/0x100
     reservation_object_copy_fences+0x292/0x2b0
     ? ttm_bo_unref+0x9f/0x3c0 [ttm]
     ttm_bo_unref+0xbd/0x3c0 [ttm]
     amdgpu_bo_unref+0x2a/0x50 [amdgpu]
     amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
     drm_gem_object_free+0x1f/0x40 [drm]
     drm_gem_object_put_unlocked+0x40/0xb0 [drm]
     drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
     drm_gem_object_release_handle+0x51/0x90 [drm]
     drm_gem_handle_delete+0x5e/0x90 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     drm_gem_close_ioctl+0x20/0x30 [drm]
     drm_ioctl_kernel+0x5d/0xb0 [drm]
     drm_ioctl+0x2f7/0x3b0 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     ? trace_hardirqs_on_caller+0xf4/0x190
     ? trace_hardirqs_on+0xd/0x10
     amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
     do_vfs_ioctl+0x93/0x670
     ? __fget+0x108/0x1f0
     SyS_ioctl+0x79/0x90
     entry_SYSCALL_64_fastpath+0x23/0xc2

This series also includes the relevant fixes for the warning shown above.

Christian König (3):
  drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
  dma-buf: make reservation_object_copy_fences rcu save
  drm/amdgpu: reserve root PD while releasing it

Michel Dänzer (1):
  drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

 drivers/dma-buf/reservation.c          | 56 +++++++++++++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++--
 drivers/gpu/drm/ttm/ttm_bo.c           | 43 +++++++++++++-------------
 3 files changed, 74 insertions(+), 38 deletions(-)

--
2.14.3


* [PATCH 2/4] dma-buf: make reservation_object_copy_fences rcu save
From: Lyude Paul @ 2017-12-01  0:23 UTC
  To: stable
  Cc: Christian König, Sumit Semwal, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel

From: Christian König <christian.koenig@amd.com>

Stop requiring that the src reservation object is locked for this operation.

commit 39e16ba16c147e662bf9fbcee9a99d70d420382f upstream

Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1504551766-5093-1-git-send-email-deathsimple@vodafone.de
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/dma-buf/reservation.c | 56 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..b44d9d7db347 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -266,8 +266,7 @@ EXPORT_SYMBOL(reservation_object_add_excl_fence);
 * @dst: the destination reservation object
 * @src: the source reservation object
 *
-* Copy all fences from src to dst. Both src->lock as well as dst-lock must be
-* held.
+* Copy all fences from src to dst. dst-lock must be held.
 */
 int reservation_object_copy_fences(struct reservation_object *dst,
 				   struct reservation_object *src)
@@ -277,33 +276,62 @@ int reservation_object_copy_fences(struct reservation_object *dst,
 	size_t size;
 	unsigned i;
 
-	src_list = reservation_object_get_list(src);
+	rcu_read_lock();
+	src_list = rcu_dereference(src->fence);
 
+retry:
 	if (src_list) {
-		size = offsetof(typeof(*src_list),
-				shared[src_list->shared_count]);
+		unsigned shared_count = src_list->shared_count;
+
+		size = offsetof(typeof(*src_list), shared[shared_count]);
+		rcu_read_unlock();
+
 		dst_list = kmalloc(size, GFP_KERNEL);
 		if (!dst_list)
 			return -ENOMEM;
 
-		dst_list->shared_count = src_list->shared_count;
-		dst_list->shared_max = src_list->shared_count;
-		for (i = 0; i < src_list->shared_count; ++i)
-			dst_list->shared[i] =
-				dma_fence_get(src_list->shared[i]);
+		rcu_read_lock();
+		src_list = rcu_dereference(src->fence);
+		if (!src_list || src_list->shared_count > shared_count) {
+			kfree(dst_list);
+			goto retry;
+		}
+
+		dst_list->shared_count = 0;
+		dst_list->shared_max = shared_count;
+		for (i = 0; i < src_list->shared_count; ++i) {
+			struct dma_fence *fence;
+
+			fence = rcu_dereference(src_list->shared[i]);
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &fence->flags))
+				continue;
+
+			if (!dma_fence_get_rcu(fence)) {
+				kfree(dst_list);
+				src_list = rcu_dereference(src->fence);
+				goto retry;
+			}
+
+			if (dma_fence_is_signaled(fence)) {
+				dma_fence_put(fence);
+				continue;
+			}
+
+			dst_list->shared[dst_list->shared_count++] = fence;
+		}
 	} else {
 		dst_list = NULL;
 	}
 
+	new = dma_fence_get_rcu_safe(&src->fence_excl);
+	rcu_read_unlock();
+
 	kfree(dst->staged);
 	dst->staged = NULL;
 
 	src_list = reservation_object_get_list(dst);
-
 	old = reservation_object_get_excl(dst);
-	new = reservation_object_get_excl(src);
-
-	dma_fence_get(new);
 
 	preempt_disable();
 	write_seqcount_begin(&dst->seq);
-- 
2.14.3
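
For readers who want the shape of the change without the kernel context, here is a minimal userspace sketch of the pattern the patch above uses: snapshot the list size, drop the read-side protection to allocate, then re-read the list and retry if it was replaced or grew in the meantime. Everything in it is an assumption made for illustration. `struct fence_list`, `struct source`, `source_init()`, `make_list()` and `copy_unsignaled()` are hypothetical names, an `int` stands in for a `struct dma_fence` pointer (negative meaning already signaled), and a pthread mutex merely marks where the patch holds `rcu_read_lock()`; real RCU readers do not block writers. Reference counting and the exclusive fence are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared fence list. An int models a fence;
 * a negative value models an already-signaled fence. */
struct fence_list {
	unsigned count;
	int fences[];
};

/* Hypothetical source object. The mutex only marks where the patch holds
 * rcu_read_lock(); it is not how RCU actually works. */
struct source {
	pthread_mutex_t lock;
	struct fence_list *list;
};

void source_init(struct source *src, struct fence_list *list)
{
	pthread_mutex_init(&src->lock, NULL);
	src->list = list;
}

/* Helper for building test input; returns NULL if allocation fails. */
struct fence_list *make_list(const int *vals, unsigned n)
{
	struct fence_list *l;
	unsigned i;

	l = malloc(offsetof(struct fence_list, fences) + n * sizeof(int));
	if (!l)
		return NULL;
	l->count = n;
	for (i = 0; i < n; ++i)
		l->fences[i] = vals[i];
	return l;
}

/* Copy the unsignaled entries of src->list into a new list stored in *out.
 * Returns 0 on success, -1 if allocation fails. */
int copy_unsignaled(struct source *src, struct fence_list **out)
{
	struct fence_list *cur, *dst;
	unsigned snap, i;

	pthread_mutex_lock(&src->lock);
	cur = src->list;
retry:
	if (!cur) {
		pthread_mutex_unlock(&src->lock);
		*out = NULL;
		return 0;
	}

	/* Snapshot the size, then drop the lock because malloc() may block. */
	snap = cur->count;
	pthread_mutex_unlock(&src->lock);

	dst = malloc(offsetof(struct fence_list, fences) + snap * sizeof(int));
	if (!dst)
		return -1;

	/* Re-read the list: it may have been replaced or grown meanwhile. */
	pthread_mutex_lock(&src->lock);
	cur = src->list;
	if (!cur || cur->count > snap) {
		free(dst);
		goto retry;	/* the lock is still held at the label */
	}

	dst->count = 0;
	for (i = 0; i < cur->count; ++i) {
		if (cur->fences[i] < 0)
			continue;	/* skip signaled entries */
		dst->fences[dst->count++] = cur->fences[i];
	}
	pthread_mutex_unlock(&src->lock);

	assert(dst->count <= snap);	/* never wrote past the allocation */
	*out = dst;
	return 0;
}
```

A caller would set up a `struct source` with `source_init()`, call `copy_unsignaled(&src, &out)`, and free `out` when done. As in the patch, the copy can come out shorter than the snapshot because signaled entries are skipped, which is why the destination count starts at zero and is incremented per copied entry.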


* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
From: Christian König @ 2017-12-01  8:27 UTC
  To: Lyude Paul, stable
  Cc: Chunming Zhou, Nicolai Hähnle, Sinclair Yeh, David Airlie,
	Harish Kasiviswanathan, Felix Kuehling, Zhang, Jerry,
	Michel Dänzer, linux-kernel, dri-devel, Sumit Semwal,
	linaro-mm-sig, Peter Zijlstra, amd-gfx, Alex Deucher, Alex Xie,
	Christian König, linux-media

Am 01.12.2017 um 01:23 schrieb Lyude Paul:
> I haven't tracked down where it started, but recently a good number of
> pretty nasty deadlock issues have appeared in the kernel. An easy way to
> reproduce them on a laptop with i915/amdgpu PRIME and lockdep enabled:
>
> DRI_PRIME=1 glxinfo

Acked-by: Christian König <christian.koenig@amd.com>

Thanks for taking care of this,
Christian.



* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
From: Greg KH @ 2017-12-04 11:45 UTC
  To: Lyude Paul
  Cc: stable, Alex Deucher, Sinclair Yeh, Christian König,
	David Airlie, linux-kernel, Nicolai Hähnle, dri-devel,
	Peter Zijlstra, Chunming Zhou, Michel Dänzer, Sumit Semwal,
	linux-media, linaro-mm-sig, Harish Kasiviswanathan, Alex Xie,
	Zhang, Jerry, Felix Kuehling, amd-gfx

On Thu, Nov 30, 2017 at 07:23:02PM -0500, Lyude Paul wrote:
> I haven't tracked down where it started, but recently a good number of
> pretty nasty deadlock issues have appeared in the kernel. An easy way to
> reproduce them on a laptop with i915/amdgpu PRIME and lockdep enabled:
> 
> DRI_PRIME=1 glxinfo

All now queued up, thanks.

greg k-h
