All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
@ 2017-12-01  0:23 ` Lyude Paul
  0 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Alex Deucher, Sinclair Yeh, Christian König, David Airlie,
	linux-kernel, Nicolai Hähnle, dri-devel, Peter Zijlstra,
	Chunming Zhou, Michel Dänzer, Sumit Semwal, linux-media,
	linaro-mm-sig, Harish Kasiviswanathan, Alex Xie, Zhang, Jerry,
	Felix Kuehling, amd-gfx

I haven't gone to see where it started, but as of late a good number of
pretty nasty deadlock issues have appeared with the kernel. Easy
reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:

DRI_PRIME=1 glxinfo

Additionally, some more race conditions exist that I've managed to
trigger with piglit and lockdep enabled after applying these patches:

    =============================
    WARNING: suspicious RCU usage
    4.14.3Lyude-Test+ #2 Not tainted
    -----------------------------
    ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ext_image_dma_b/27451:
     #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]

    stack backtrace:
    CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
    Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
    Call Trace:
     dump_stack+0x8e/0xce
     lockdep_rcu_suspicious+0xc5/0x100
     reservation_object_copy_fences+0x292/0x2b0
     ? ttm_bo_unref+0x9f/0x3c0 [ttm]
     ttm_bo_unref+0xbd/0x3c0 [ttm]
     amdgpu_bo_unref+0x2a/0x50 [amdgpu]
     amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
     drm_gem_object_free+0x1f/0x40 [drm]
     drm_gem_object_put_unlocked+0x40/0xb0 [drm]
     drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
     drm_gem_object_release_handle+0x51/0x90 [drm]
     drm_gem_handle_delete+0x5e/0x90 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     drm_gem_close_ioctl+0x20/0x30 [drm]
     drm_ioctl_kernel+0x5d/0xb0 [drm]
     drm_ioctl+0x2f7/0x3b0 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     ? trace_hardirqs_on_caller+0xf4/0x190
     ? trace_hardirqs_on+0xd/0x10
     amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
     do_vfs_ioctl+0x93/0x670
     ? __fget+0x108/0x1f0
     SyS_ioctl+0x79/0x90
     entry_SYSCALL_64_fastpath+0x23/0xc2

I've also added the relevant fixes for the issue mentioned above.

Christian König (3):
  drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
  dma-buf: make reservation_object_copy_fences rcu save
  drm/amdgpu: reserve root PD while releasing it

Michel Dänzer (1):
  drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

 drivers/dma-buf/reservation.c          | 56 +++++++++++++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++--
 drivers/gpu/drm/ttm/ttm_bo.c           | 43 +++++++++++++-------------
 3 files changed, 74 insertions(+), 38 deletions(-)

--
2.14.3

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
@ 2017-12-01  0:23 ` Lyude Paul
  0 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable-u79uwXL29TY76Z2rM5mHXA
  Cc: Chunming Zhou, Nicolai Hähnle, Sinclair Yeh, David Airlie,
	Harish Kasiviswanathan, Felix Kuehling, Zhang, Jerry,
	Michel Dänzer, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Sumit Semwal,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw, Peter Zijlstra,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Alex Deucher, Alex Xie,
	Christian König, linux-media-u79uwXL29TY76Z2rM5mHXA

I haven't gone to see where it started, but as of late a good number of
pretty nasty deadlock issues have appeared with the kernel. Easy
reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:

DRI_PRIME=1 glxinfo

Additionally, some more race conditions exist that I've managed to
trigger with piglit and lockdep enabled after applying these patches:

    =============================
    WARNING: suspicious RCU usage
    4.14.3Lyude-Test+ #2 Not tainted
    -----------------------------
    ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ext_image_dma_b/27451:
     #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]

    stack backtrace:
    CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
    Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
    Call Trace:
     dump_stack+0x8e/0xce
     lockdep_rcu_suspicious+0xc5/0x100
     reservation_object_copy_fences+0x292/0x2b0
     ? ttm_bo_unref+0x9f/0x3c0 [ttm]
     ttm_bo_unref+0xbd/0x3c0 [ttm]
     amdgpu_bo_unref+0x2a/0x50 [amdgpu]
     amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
     drm_gem_object_free+0x1f/0x40 [drm]
     drm_gem_object_put_unlocked+0x40/0xb0 [drm]
     drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
     drm_gem_object_release_handle+0x51/0x90 [drm]
     drm_gem_handle_delete+0x5e/0x90 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     drm_gem_close_ioctl+0x20/0x30 [drm]
     drm_ioctl_kernel+0x5d/0xb0 [drm]
     drm_ioctl+0x2f7/0x3b0 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     ? trace_hardirqs_on_caller+0xf4/0x190
     ? trace_hardirqs_on+0xd/0x10
     amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
     do_vfs_ioctl+0x93/0x670
     ? __fget+0x108/0x1f0
     SyS_ioctl+0x79/0x90
     entry_SYSCALL_64_fastpath+0x23/0xc2

I've also added the relevant fixes for the issue mentioned above.

Christian König (3):
  drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
  dma-buf: make reservation_object_copy_fences rcu save
  drm/amdgpu: reserve root PD while releasing it

Michel Dänzer (1):
  drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

 drivers/dma-buf/reservation.c          | 56 +++++++++++++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++--
 drivers/gpu/drm/ttm/ttm_bo.c           | 43 +++++++++++++-------------
 3 files changed, 74 insertions(+), 38 deletions(-)

--
2.14.3

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
  2017-12-01  0:23 ` Lyude Paul
@ 2017-12-01  0:23   ` Lyude Paul
  -1 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Christian König, David Airlie, Alex Deucher, Sinclair Yeh,
	Nicolai Hähnle, Chunming Zhou, Michel Dänzer,
	Peter Zijlstra, dri-devel, linux-kernel

From: Christian König <christian.koenig@amd.com>

With shared reservation objects __ttm_bo_reserve() can easily fail even on
destroyed BOs. This prevents correct handling when we need to individualize
the reservation object.

Fix this by individualizing the object before even trying to reserve it.

commit 378e2d5b504fe0231c557751e58b80fcf717cc20 upstream

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 180ce6296416..bee77d31895b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -440,28 +440,29 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 	struct ttm_bo_global *glob = bo->glob;
 	int ret;
 
+	ret = ttm_bo_individualize_resv(bo);
+	if (ret) {
+		/* Last resort, if we fail to allocate memory for the
+		 * fences block for the BO to become idle
+		 */
+		reservation_object_wait_timeout_rcu(bo->resv, true, false,
+						    30 * HZ);
+		spin_lock(&glob->lru_lock);
+		goto error;
+	}
+
 	spin_lock(&glob->lru_lock);
 	ret = __ttm_bo_reserve(bo, false, true, NULL);
-
 	if (!ret) {
-		if (!ttm_bo_wait(bo, false, true)) {
+		if (reservation_object_test_signaled_rcu(&bo->ttm_resv, true)) {
 			ttm_bo_del_from_lru(bo);
 			spin_unlock(&glob->lru_lock);
+			if (bo->resv != &bo->ttm_resv)
+				reservation_object_unlock(&bo->ttm_resv);
 			ttm_bo_cleanup_memtype_use(bo);
-
 			return;
 		}
 
-		ret = ttm_bo_individualize_resv(bo);
-		if (ret) {
-			/* Last resort, if we fail to allocate memory for the
-			 * fences block for the BO to become idle and free it.
-			 */
-			spin_unlock(&glob->lru_lock);
-			ttm_bo_wait(bo, true, true);
-			ttm_bo_cleanup_memtype_use(bo);
-			return;
-		}
 		ttm_bo_flush_all_fences(bo);
 
 		/*
@@ -474,11 +475,12 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 			ttm_bo_add_to_lru(bo);
 		}
 
-		if (bo->resv != &bo->ttm_resv)
-			reservation_object_unlock(&bo->ttm_resv);
 		__ttm_bo_unreserve(bo);
 	}
+	if (bo->resv != &bo->ttm_resv)
+		reservation_object_unlock(&bo->ttm_resv);
 
+error:
 	kref_get(&bo->list_kref);
 	list_add_tail(&bo->ddestroy, &bdev->ddestroy);
 	spin_unlock(&glob->lru_lock);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 1/4] drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
@ 2017-12-01  0:23   ` Lyude Paul
  0 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Nicolai Hähnle, David Airlie, Michel Dänzer,
	linux-kernel, dri-devel, Peter Zijlstra, Alex Deucher,
	Christian König

From: Christian König <christian.koenig@amd.com>

With shared reservation objects __ttm_bo_reserve() can easily fail even on
destroyed BOs. This prevents correct handling when we need to individualize
the reservation object.

Fix this by individualizing the object before even trying to reserve it.

commit 378e2d5b504fe0231c557751e58b80fcf717cc20 upstream

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 180ce6296416..bee77d31895b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -440,28 +440,29 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 	struct ttm_bo_global *glob = bo->glob;
 	int ret;
 
+	ret = ttm_bo_individualize_resv(bo);
+	if (ret) {
+		/* Last resort, if we fail to allocate memory for the
+		 * fences block for the BO to become idle
+		 */
+		reservation_object_wait_timeout_rcu(bo->resv, true, false,
+						    30 * HZ);
+		spin_lock(&glob->lru_lock);
+		goto error;
+	}
+
 	spin_lock(&glob->lru_lock);
 	ret = __ttm_bo_reserve(bo, false, true, NULL);
-
 	if (!ret) {
-		if (!ttm_bo_wait(bo, false, true)) {
+		if (reservation_object_test_signaled_rcu(&bo->ttm_resv, true)) {
 			ttm_bo_del_from_lru(bo);
 			spin_unlock(&glob->lru_lock);
+			if (bo->resv != &bo->ttm_resv)
+				reservation_object_unlock(&bo->ttm_resv);
 			ttm_bo_cleanup_memtype_use(bo);
-
 			return;
 		}
 
-		ret = ttm_bo_individualize_resv(bo);
-		if (ret) {
-			/* Last resort, if we fail to allocate memory for the
-			 * fences block for the BO to become idle and free it.
-			 */
-			spin_unlock(&glob->lru_lock);
-			ttm_bo_wait(bo, true, true);
-			ttm_bo_cleanup_memtype_use(bo);
-			return;
-		}
 		ttm_bo_flush_all_fences(bo);
 
 		/*
@@ -474,11 +475,12 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 			ttm_bo_add_to_lru(bo);
 		}
 
-		if (bo->resv != &bo->ttm_resv)
-			reservation_object_unlock(&bo->ttm_resv);
 		__ttm_bo_unreserve(bo);
 	}
+	if (bo->resv != &bo->ttm_resv)
+		reservation_object_unlock(&bo->ttm_resv);
 
+error:
 	kref_get(&bo->list_kref);
 	list_add_tail(&bo->ddestroy, &bdev->ddestroy);
 	spin_unlock(&glob->lru_lock);
-- 
2.14.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] dma-buf: make reservation_object_copy_fences rcu save
  2017-12-01  0:23 ` Lyude Paul
@ 2017-12-01  0:23   ` Lyude Paul
  -1 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Christian König, Sumit Semwal, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel

From: Christian König <christian.koenig@amd.com>

Stop requiring that the src reservation object is locked for this operation.

commit 39e16ba16c147e662bf9fbcee9a99d70d420382f upstream

Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1504551766-5093-1-git-send-email-deathsimple@vodafone.de
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/dma-buf/reservation.c | 56 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..b44d9d7db347 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -266,8 +266,7 @@ EXPORT_SYMBOL(reservation_object_add_excl_fence);
 * @dst: the destination reservation object
 * @src: the source reservation object
 *
-* Copy all fences from src to dst. Both src->lock as well as dst-lock must be
-* held.
+* Copy all fences from src to dst. dst-lock must be held.
 */
 int reservation_object_copy_fences(struct reservation_object *dst,
 				   struct reservation_object *src)
@@ -277,33 +276,62 @@ int reservation_object_copy_fences(struct reservation_object *dst,
 	size_t size;
 	unsigned i;
 
-	src_list = reservation_object_get_list(src);
+	rcu_read_lock();
+	src_list = rcu_dereference(src->fence);
 
+retry:
 	if (src_list) {
-		size = offsetof(typeof(*src_list),
-				shared[src_list->shared_count]);
+		unsigned shared_count = src_list->shared_count;
+
+		size = offsetof(typeof(*src_list), shared[shared_count]);
+		rcu_read_unlock();
+
 		dst_list = kmalloc(size, GFP_KERNEL);
 		if (!dst_list)
 			return -ENOMEM;
 
-		dst_list->shared_count = src_list->shared_count;
-		dst_list->shared_max = src_list->shared_count;
-		for (i = 0; i < src_list->shared_count; ++i)
-			dst_list->shared[i] =
-				dma_fence_get(src_list->shared[i]);
+		rcu_read_lock();
+		src_list = rcu_dereference(src->fence);
+		if (!src_list || src_list->shared_count > shared_count) {
+			kfree(dst_list);
+			goto retry;
+		}
+
+		dst_list->shared_count = 0;
+		dst_list->shared_max = shared_count;
+		for (i = 0; i < src_list->shared_count; ++i) {
+			struct dma_fence *fence;
+
+			fence = rcu_dereference(src_list->shared[i]);
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &fence->flags))
+				continue;
+
+			if (!dma_fence_get_rcu(fence)) {
+				kfree(dst_list);
+				src_list = rcu_dereference(src->fence);
+				goto retry;
+			}
+
+			if (dma_fence_is_signaled(fence)) {
+				dma_fence_put(fence);
+				continue;
+			}
+
+			dst_list->shared[dst_list->shared_count++] = fence;
+		}
 	} else {
 		dst_list = NULL;
 	}
 
+	new = dma_fence_get_rcu_safe(&src->fence_excl);
+	rcu_read_unlock();
+
 	kfree(dst->staged);
 	dst->staged = NULL;
 
 	src_list = reservation_object_get_list(dst);
-
 	old = reservation_object_get_excl(dst);
-	new = reservation_object_get_excl(src);
-
-	dma_fence_get(new);
 
 	preempt_disable();
 	write_seqcount_begin(&dst->seq);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] dma-buf: make reservation_object_copy_fences rcu save
@ 2017-12-01  0:23   ` Lyude Paul
  0 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: linux-kernel, dri-devel, linaro-mm-sig, Christian König,
	linux-media

From: Christian König <christian.koenig@amd.com>

Stop requiring that the src reservation object is locked for this operation.

commit 39e16ba16c147e662bf9fbcee9a99d70d420382f upstream

Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1504551766-5093-1-git-send-email-deathsimple@vodafone.de
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/dma-buf/reservation.c | 56 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..b44d9d7db347 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -266,8 +266,7 @@ EXPORT_SYMBOL(reservation_object_add_excl_fence);
 * @dst: the destination reservation object
 * @src: the source reservation object
 *
-* Copy all fences from src to dst. Both src->lock as well as dst-lock must be
-* held.
+* Copy all fences from src to dst. dst-lock must be held.
 */
 int reservation_object_copy_fences(struct reservation_object *dst,
 				   struct reservation_object *src)
@@ -277,33 +276,62 @@ int reservation_object_copy_fences(struct reservation_object *dst,
 	size_t size;
 	unsigned i;
 
-	src_list = reservation_object_get_list(src);
+	rcu_read_lock();
+	src_list = rcu_dereference(src->fence);
 
+retry:
 	if (src_list) {
-		size = offsetof(typeof(*src_list),
-				shared[src_list->shared_count]);
+		unsigned shared_count = src_list->shared_count;
+
+		size = offsetof(typeof(*src_list), shared[shared_count]);
+		rcu_read_unlock();
+
 		dst_list = kmalloc(size, GFP_KERNEL);
 		if (!dst_list)
 			return -ENOMEM;
 
-		dst_list->shared_count = src_list->shared_count;
-		dst_list->shared_max = src_list->shared_count;
-		for (i = 0; i < src_list->shared_count; ++i)
-			dst_list->shared[i] =
-				dma_fence_get(src_list->shared[i]);
+		rcu_read_lock();
+		src_list = rcu_dereference(src->fence);
+		if (!src_list || src_list->shared_count > shared_count) {
+			kfree(dst_list);
+			goto retry;
+		}
+
+		dst_list->shared_count = 0;
+		dst_list->shared_max = shared_count;
+		for (i = 0; i < src_list->shared_count; ++i) {
+			struct dma_fence *fence;
+
+			fence = rcu_dereference(src_list->shared[i]);
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &fence->flags))
+				continue;
+
+			if (!dma_fence_get_rcu(fence)) {
+				kfree(dst_list);
+				src_list = rcu_dereference(src->fence);
+				goto retry;
+			}
+
+			if (dma_fence_is_signaled(fence)) {
+				dma_fence_put(fence);
+				continue;
+			}
+
+			dst_list->shared[dst_list->shared_count++] = fence;
+		}
 	} else {
 		dst_list = NULL;
 	}
 
+	new = dma_fence_get_rcu_safe(&src->fence_excl);
+	rcu_read_unlock();
+
 	kfree(dst->staged);
 	dst->staged = NULL;
 
 	src_list = reservation_object_get_list(dst);
-
 	old = reservation_object_get_excl(dst);
-	new = reservation_object_get_excl(src);
-
-	dma_fence_get(new);
 
 	preempt_disable();
 	write_seqcount_begin(&dst->seq);
-- 
2.14.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] drm/amdgpu: reserve root PD while releasing it
  2017-12-01  0:23 ` Lyude Paul
@ 2017-12-01  0:23   ` Lyude Paul
  -1 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Christian König, Alex Deucher, David Airlie, Chunming Zhou,
	Zhang, Jerry, Felix Kuehling, Harish Kasiviswanathan, Alex Xie,
	amd-gfx, dri-devel, linux-kernel

From: Christian König <christian.koenig@amd.com>

Otherwise somebody could try to evict it at the same time and try to use
half torn down structures.

commit 2642cf110d08a403f585a051e4cbf45a90b3adea upstream

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bd20ff018512..591c6dc2c1f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2586,7 +2586,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
 	struct amdgpu_bo_va_mapping *mapping, *tmp;
 	bool prt_fini_needed = !!adev->gart.gart_funcs->set_prt;
-	int i;
+	struct amdgpu_bo *root;
+	int i, r;
 
 	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
 
@@ -2609,7 +2610,15 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		amdgpu_vm_free_mapping(adev, vm, mapping, NULL);
 	}
 
-	amdgpu_vm_free_levels(&vm->root);
+	root = amdgpu_bo_ref(vm->root.bo);
+	r = amdgpu_bo_reserve(root, true);
+	if (r) {
+		dev_err(adev->dev, "Leaking page tables because BO reservation failed\n");
+	} else {
+		amdgpu_vm_free_levels(&vm->root);
+		amdgpu_bo_unreserve(root);
+	}
+	amdgpu_bo_unref(&root);
 	dma_fence_put(vm->last_dir_update);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		amdgpu_vm_free_reserved_vmid(adev, vm, i);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] drm/amdgpu: reserve root PD while releasing it
@ 2017-12-01  0:23   ` Lyude Paul
  0 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: David Airlie, Felix Kuehling, Harish Kasiviswanathan,
	linux-kernel, amd-gfx, Zhang, Jerry, dri-devel, Alex Deucher,
	Christian König, Alex Xie

From: Christian König <christian.koenig@amd.com>

Otherwise somebody could try to evict it at the same time and try to use
half torn down structures.

commit 2642cf110d08a403f585a051e4cbf45a90b3adea upstream

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bd20ff018512..591c6dc2c1f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2586,7 +2586,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
 	struct amdgpu_bo_va_mapping *mapping, *tmp;
 	bool prt_fini_needed = !!adev->gart.gart_funcs->set_prt;
-	int i;
+	struct amdgpu_bo *root;
+	int i, r;
 
 	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
 
@@ -2609,7 +2610,15 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		amdgpu_vm_free_mapping(adev, vm, mapping, NULL);
 	}
 
-	amdgpu_vm_free_levels(&vm->root);
+	root = amdgpu_bo_ref(vm->root.bo);
+	r = amdgpu_bo_reserve(root, true);
+	if (r) {
+		dev_err(adev->dev, "Leaking page tables because BO reservation failed\n");
+	} else {
+		amdgpu_vm_free_levels(&vm->root);
+		amdgpu_bo_unreserve(root);
+	}
+	amdgpu_bo_unref(&root);
 	dma_fence_put(vm->last_dir_update);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		amdgpu_vm_free_reserved_vmid(adev, vm, i);
-- 
2.14.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list
  2017-12-01  0:23 ` Lyude Paul
@ 2017-12-01  0:23   ` Lyude Paul
  -1 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Michel Dänzer, David Airlie, Alex Deucher,
	Christian König, Sinclair Yeh, Nicolai Hähnle,
	Chunming Zhou, Peter Zijlstra, dri-devel, linux-kernel

From: Michel Dänzer <michel.daenzer@amd.com>

Fixes a use-after-free due to a race condition in
ttm_bo_cleanup_refs_and_unlock, which allows one task to reserve a BO
and destroy its ttm_resv while another task is waiting for it to signal
in reservation_object_wait_timeout_rcu.

commit e1fc12c5d9ad06a2a74e97a91f1b0c5f4c723b50 upstream

v2:
* Always initialize bo->ttm_resv in ttm_bo_init_reserved
 (Christian König)

Fixes: 0d2bd2ae045d "drm/ttm: fix memory leak while individualizing BOs"
Reviewed-by: Chunming Zhou <david1.zhou@amd.com> # v1
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bee77d31895b..c088703777e2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -150,8 +150,7 @@ static void ttm_bo_release_list(struct kref *list_kref)
 	ttm_tt_destroy(bo->ttm);
 	atomic_dec(&bo->glob->bo_count);
 	dma_fence_put(bo->moving);
-	if (bo->resv == &bo->ttm_resv)
-		reservation_object_fini(&bo->ttm_resv);
+	reservation_object_fini(&bo->ttm_resv);
 	mutex_destroy(&bo->wu_mutex);
 	if (bo->destroy)
 		bo->destroy(bo);
@@ -402,14 +401,11 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
 	if (bo->resv == &bo->ttm_resv)
 		return 0;
 
-	reservation_object_init(&bo->ttm_resv);
 	BUG_ON(!reservation_object_trylock(&bo->ttm_resv));
 
 	r = reservation_object_copy_fences(&bo->ttm_resv, bo->resv);
-	if (r) {
+	if (r)
 		reservation_object_unlock(&bo->ttm_resv);
-		reservation_object_fini(&bo->ttm_resv);
-	}
 
 	return r;
 }
@@ -459,6 +455,7 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 			spin_unlock(&glob->lru_lock);
 			if (bo->resv != &bo->ttm_resv)
 				reservation_object_unlock(&bo->ttm_resv);
+
 			ttm_bo_cleanup_memtype_use(bo);
 			return;
 		}
@@ -1205,8 +1202,8 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
 		lockdep_assert_held(&bo->resv->lock.base);
 	} else {
 		bo->resv = &bo->ttm_resv;
-		reservation_object_init(&bo->ttm_resv);
 	}
+	reservation_object_init(&bo->ttm_resv);
 	atomic_inc(&bo->glob->bo_count);
 	drm_vma_node_reset(&bo->vma_node);
 	bo->priority = 0;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list
@ 2017-12-01  0:23   ` Lyude Paul
  0 siblings, 0 replies; 13+ messages in thread
From: Lyude Paul @ 2017-12-01  0:23 UTC (permalink / raw)
  To: stable
  Cc: Nicolai Hähnle, David Airlie, Michel Dänzer,
	linux-kernel, dri-devel, Peter Zijlstra, Alex Deucher,
	Christian König

From: Michel Dänzer <michel.daenzer@amd.com>

Fixes a use-after-free due to a race condition in
ttm_bo_cleanup_refs_and_unlock, which allows one task to reserve a BO
and destroy its ttm_resv while another task is waiting for it to signal
in reservation_object_wait_timeout_rcu.

commit e1fc12c5d9ad06a2a74e97a91f1b0c5f4c723b50 upstream

v2:
* Always initialize bo->ttm_resv in ttm_bo_init_reserved
 (Christian König)

Fixes: 0d2bd2ae045d "drm/ttm: fix memory leak while individualizing BOs"
Reviewed-by: Chunming Zhou <david1.zhou@amd.com> # v1
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bee77d31895b..c088703777e2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -150,8 +150,7 @@ static void ttm_bo_release_list(struct kref *list_kref)
 	ttm_tt_destroy(bo->ttm);
 	atomic_dec(&bo->glob->bo_count);
 	dma_fence_put(bo->moving);
-	if (bo->resv == &bo->ttm_resv)
-		reservation_object_fini(&bo->ttm_resv);
+	reservation_object_fini(&bo->ttm_resv);
 	mutex_destroy(&bo->wu_mutex);
 	if (bo->destroy)
 		bo->destroy(bo);
@@ -402,14 +401,11 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
 	if (bo->resv == &bo->ttm_resv)
 		return 0;
 
-	reservation_object_init(&bo->ttm_resv);
 	BUG_ON(!reservation_object_trylock(&bo->ttm_resv));
 
 	r = reservation_object_copy_fences(&bo->ttm_resv, bo->resv);
-	if (r) {
+	if (r)
 		reservation_object_unlock(&bo->ttm_resv);
-		reservation_object_fini(&bo->ttm_resv);
-	}
 
 	return r;
 }
@@ -459,6 +455,7 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 			spin_unlock(&glob->lru_lock);
 			if (bo->resv != &bo->ttm_resv)
 				reservation_object_unlock(&bo->ttm_resv);
+
 			ttm_bo_cleanup_memtype_use(bo);
 			return;
 		}
@@ -1205,8 +1202,8 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
 		lockdep_assert_held(&bo->resv->lock.base);
 	} else {
 		bo->resv = &bo->ttm_resv;
-		reservation_object_init(&bo->ttm_resv);
 	}
+	reservation_object_init(&bo->ttm_resv);
 	atomic_inc(&bo->glob->bo_count);
 	drm_vma_node_reset(&bo->vma_node);
 	bo->priority = 0;
-- 
2.14.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
  2017-12-01  0:23 ` Lyude Paul
                   ` (4 preceding siblings ...)
  (?)
@ 2017-12-01  8:27 ` Christian König
  -1 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2017-12-01  8:27 UTC (permalink / raw)
  To: Lyude Paul, stable
  Cc: Chunming Zhou, Nicolai Hähnle, Sinclair Yeh, David Airlie,
	Harish Kasiviswanathan, Felix Kuehling, Zhang, Jerry,
	Michel Dänzer, linux-kernel, dri-devel, Sumit Semwal,
	linaro-mm-sig, Peter Zijlstra, amd-gfx, Alex Deucher, Alex Xie,
	Christian König, linux-media

Am 01.12.2017 um 01:23 schrieb Lyude Paul:
> I haven't gone to see where it started, but as of late a good number of
> pretty nasty deadlock issues have appeared with the kernel. Easy
> reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:
>
> DRI_PRIME=1 glxinfo

Acked-by: Christian König <christian.koenig@amd.com>

Thanks for taking care of this,
Christian.

>
> Additionally, some more race conditions exist that I've managed to
> trigger with piglit and lockdep enabled after applying these patches:
>
>      =============================
>      WARNING: suspicious RCU usage
>      4.14.3Lyude-Test+ #2 Not tainted
>      -----------------------------
>      ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!
>
>      other info that might help us debug this:
>
>      rcu_scheduler_active = 2, debug_locks = 1
>      1 lock held by ext_image_dma_b/27451:
>       #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]
>
>      stack backtrace:
>      CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
>      Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
>      Call Trace:
>       dump_stack+0x8e/0xce
>       lockdep_rcu_suspicious+0xc5/0x100
>       reservation_object_copy_fences+0x292/0x2b0
>       ? ttm_bo_unref+0x9f/0x3c0 [ttm]
>       ttm_bo_unref+0xbd/0x3c0 [ttm]
>       amdgpu_bo_unref+0x2a/0x50 [amdgpu]
>       amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
>       drm_gem_object_free+0x1f/0x40 [drm]
>       drm_gem_object_put_unlocked+0x40/0xb0 [drm]
>       drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
>       drm_gem_object_release_handle+0x51/0x90 [drm]
>       drm_gem_handle_delete+0x5e/0x90 [drm]
>       ? drm_gem_handle_create+0x40/0x40 [drm]
>       drm_gem_close_ioctl+0x20/0x30 [drm]
>       drm_ioctl_kernel+0x5d/0xb0 [drm]
>       drm_ioctl+0x2f7/0x3b0 [drm]
>       ? drm_gem_handle_create+0x40/0x40 [drm]
>       ? trace_hardirqs_on_caller+0xf4/0x190
>       ? trace_hardirqs_on+0xd/0x10
>       amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>       do_vfs_ioctl+0x93/0x670
>       ? __fget+0x108/0x1f0
>       SyS_ioctl+0x79/0x90
>       entry_SYSCALL_64_fastpath+0x23/0xc2
>
> I've also added the relevant fixes for the issue mentioned above.
>
> Christian König (3):
>    drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
>    dma-buf: make reservation_object_copy_fences rcu save
>    drm/amdgpu: reserve root PD while releasing it
>
> Michel Dänzer (1):
>    drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list
>
>   drivers/dma-buf/reservation.c          | 56 +++++++++++++++++++++++++---------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++--
>   drivers/gpu/drm/ttm/ttm_bo.c           | 43 +++++++++++++-------------
>   3 files changed, 74 insertions(+), 38 deletions(-)
>
> --
> 2.14.3
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
  2017-12-01  0:23 ` Lyude Paul
@ 2017-12-04 11:45   ` Greg KH
  -1 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2017-12-04 11:45 UTC (permalink / raw)
  To: Lyude Paul
  Cc: stable, Alex Deucher, Sinclair Yeh, Christian König,
	David Airlie, linux-kernel, Nicolai Hähnle, dri-devel,
	Peter Zijlstra, Chunming Zhou, Michel Dänzer, Sumit Semwal,
	linux-media, linaro-mm-sig, Harish Kasiviswanathan, Alex Xie,
	Zhang, Jerry, Felix Kuehling, amd-gfx

On Thu, Nov 30, 2017 at 07:23:02PM -0500, Lyude Paul wrote:
> I haven't gone to see where it started, but as of late a good number of
> pretty nasty deadlock issues have appeared with the kernel. Easy
> reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:
> 
> DRI_PRIME=1 glxinfo
> 
> Additionally, some more race conditions exist that I've managed to
> trigger with piglit and lockdep enabled after applying these patches:
> 
>     =============================
>     WARNING: suspicious RCU usage
>     4.14.3Lyude-Test+ #2 Not tainted
>     -----------------------------
>     ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!
> 
>     other info that might help us debug this:
> 
>     rcu_scheduler_active = 2, debug_locks = 1
>     1 lock held by ext_image_dma_b/27451:
>      #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]
> 
>     stack backtrace:
>     CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
>     Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
>     Call Trace:
>      dump_stack+0x8e/0xce
>      lockdep_rcu_suspicious+0xc5/0x100
>      reservation_object_copy_fences+0x292/0x2b0
>      ? ttm_bo_unref+0x9f/0x3c0 [ttm]
>      ttm_bo_unref+0xbd/0x3c0 [ttm]
>      amdgpu_bo_unref+0x2a/0x50 [amdgpu]
>      amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
>      drm_gem_object_free+0x1f/0x40 [drm]
>      drm_gem_object_put_unlocked+0x40/0xb0 [drm]
>      drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
>      drm_gem_object_release_handle+0x51/0x90 [drm]
>      drm_gem_handle_delete+0x5e/0x90 [drm]
>      ? drm_gem_handle_create+0x40/0x40 [drm]
>      drm_gem_close_ioctl+0x20/0x30 [drm]
>      drm_ioctl_kernel+0x5d/0xb0 [drm]
>      drm_ioctl+0x2f7/0x3b0 [drm]
>      ? drm_gem_handle_create+0x40/0x40 [drm]
>      ? trace_hardirqs_on_caller+0xf4/0x190
>      ? trace_hardirqs_on+0xd/0x10
>      amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>      do_vfs_ioctl+0x93/0x670
>      ? __fget+0x108/0x1f0
>      SyS_ioctl+0x79/0x90
>      entry_SYSCALL_64_fastpath+0x23/0xc2
> 
> I've also added the relevant fixes for the issue mentioned above.
> 
> Christian König (3):
>   drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
>   dma-buf: make reservation_object_copy_fences rcu save
>   drm/amdgpu: reserve root PD while releasing it
> 
> Michel Dänzer (1):
>   drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

All now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
@ 2017-12-04 11:45   ` Greg KH
  0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2017-12-04 11:45 UTC (permalink / raw)
  To: Lyude Paul
  Cc: stable, Alex Deucher, Sinclair Yeh, Christian König,
	David Airlie, linux-kernel, Nicolai Hähnle, dri-devel,
	Peter Zijlstra, Chunming Zhou, Michel Dänzer, Sumit Semwal,
	linux-media, linaro-mm-sig, Harish Kasiviswanathan, Alex Xie,
	Zhang, Jerry, Felix Kuehling, amd-gfx

On Thu, Nov 30, 2017 at 07:23:02PM -0500, Lyude Paul wrote:
> I haven't gone to see where it started, but as of late a good number of
> pretty nasty deadlock issues have appeared with the kernel. Easy
> reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:
> 
> DRI_PRIME=1 glxinfo
> 
> Additionally, some more race conditions exist that I've managed to
> trigger with piglit and lockdep enabled after applying these patches:
> 
>     =============================
>     WARNING: suspicious RCU usage
>     4.14.3Lyude-Test+ #2 Not tainted
>     -----------------------------
>     ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!
> 
>     other info that might help us debug this:
> 
>     rcu_scheduler_active = 2, debug_locks = 1
>     1 lock held by ext_image_dma_b/27451:
>      #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]
> 
>     stack backtrace:
>     CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
>     Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
>     Call Trace:
>      dump_stack+0x8e/0xce
>      lockdep_rcu_suspicious+0xc5/0x100
>      reservation_object_copy_fences+0x292/0x2b0
>      ? ttm_bo_unref+0x9f/0x3c0 [ttm]
>      ttm_bo_unref+0xbd/0x3c0 [ttm]
>      amdgpu_bo_unref+0x2a/0x50 [amdgpu]
>      amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
>      drm_gem_object_free+0x1f/0x40 [drm]
>      drm_gem_object_put_unlocked+0x40/0xb0 [drm]
>      drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
>      drm_gem_object_release_handle+0x51/0x90 [drm]
>      drm_gem_handle_delete+0x5e/0x90 [drm]
>      ? drm_gem_handle_create+0x40/0x40 [drm]
>      drm_gem_close_ioctl+0x20/0x30 [drm]
>      drm_ioctl_kernel+0x5d/0xb0 [drm]
>      drm_ioctl+0x2f7/0x3b0 [drm]
>      ? drm_gem_handle_create+0x40/0x40 [drm]
>      ? trace_hardirqs_on_caller+0xf4/0x190
>      ? trace_hardirqs_on+0xd/0x10
>      amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>      do_vfs_ioctl+0x93/0x670
>      ? __fget+0x108/0x1f0
>      SyS_ioctl+0x79/0x90
>      entry_SYSCALL_64_fastpath+0x23/0xc2
> 
> I've also added the relevant fixes for the issue mentioned above.
> 
> Christian K�nig (3):
>   drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
>   dma-buf: make reservation_object_copy_fences rcu save
>   drm/amdgpu: reserve root PD while releasing it
> 
> Michel D�nzer (1):
>   drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

All now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-12-04 11:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-01  0:23 [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14 Lyude Paul
2017-12-01  0:23 ` Lyude Paul
2017-12-01  0:23 ` [PATCH 1/4] drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more Lyude Paul
2017-12-01  0:23   ` Lyude Paul
2017-12-01  0:23 ` [PATCH 2/4] dma-buf: make reservation_object_copy_fences rcu save Lyude Paul
2017-12-01  0:23   ` Lyude Paul
2017-12-01  0:23 ` [PATCH 3/4] drm/amdgpu: reserve root PD while releasing it Lyude Paul
2017-12-01  0:23   ` Lyude Paul
2017-12-01  0:23 ` [PATCH 4/4] drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list Lyude Paul
2017-12-01  0:23   ` Lyude Paul
2017-12-01  8:27 ` [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14 Christian König
2017-12-04 11:45 ` Greg KH
2017-12-04 11:45   ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.