All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: fix a potential deadlock in amdgpu_bo_create_restricted()
@ 2017-02-08 23:35 Samuel Pitoiset
       [not found] ` <20170208233518.1334-1-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Samuel Pitoiset @ 2017-02-08 23:35 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Samuel Pitoiset

When ttm_bo_init() fails, the reservation mutex should be unlocked.

In debug build, the kernel reported "possible recursive locking
detected" in this codepath. For debugging purposes, I also added
a "WARN_ON(ww_mutex_is_locked())" when ttm_bo_init() fails and the
mutex was locked as expected.

This should fix (random) GPU hangs. The easy way to reproduce the
issue is to change the "Super Sampling" option from 1.0 to 2.0 in
Hitman. It will create a huge buffer, evict a bunch of buffers
(around ~5k) and deadlock.

This regression has been introduced pretty recently.

Fixes: "drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
---

Here's the report: https://hastebin.com/durodivoma.xml

 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d1ef1d064de4..531e16ce256e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -403,8 +403,10 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
 			&bo->placement, page_align, !kernel, NULL,
 			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
 			&amdgpu_ttm_bo_destroy);
-	if (unlikely(r != 0))
+	if (unlikely(r != 0)) {
+		ww_mutex_unlock(&bo->tbo.resv->lock);
 		return r;
+	}
 
 	bo->tbo.priority = ilog2(bo->tbo.num_pages);
 	if (kernel)
-- 
2.11.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] drm/amdgpu: report the number of bytes moved at buffer creation
       [not found] ` <20170208233518.1334-1-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-02-08 23:35   ` Samuel Pitoiset
       [not found]     ` <20170208233518.1334-2-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-02-09  0:53   ` [PATCH 1/2] drm/amdgpu: fix a potential deadlock in amdgpu_bo_create_restricted() Michel Dänzer
  1 sibling, 1 reply; 5+ messages in thread
From: Samuel Pitoiset @ 2017-02-08 23:35 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Samuel Pitoiset

Like ttm_bo_validate(), ttm_bo_init() might need to move BO and
the number of bytes moved by TTM should be reported. This can help
the throttle buffer migration mechanism to make a better decision.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++++++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 402a8954c6d8..5227e4d1d5db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1720,6 +1720,7 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data);
 int amdgpu_cs_get_ring(struct amdgpu_device *adev, u32 ip_type,
 		       u32 ip_instance, u32 ring,
 		       struct amdgpu_ring **out_ring);
+void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes);
 void amdgpu_ttm_placement_from_domain(struct amdgpu_bo *abo, u32 domain);
 bool amdgpu_ttm_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
 int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 6e948e4986ec..dade2fa9593a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -351,8 +351,7 @@ static u64 amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev)
  * submission. This can result in a debt that can stop buffer migrations
  * temporarily.
  */
-static void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev,
-					 u64 num_bytes)
+void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes)
 {
 	spin_lock(&adev->mm_stats.lock);
 	adev->mm_stats.accum_us -= bytes_to_us(adev, num_bytes);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 531e16ce256e..28c4615a6128 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -323,6 +323,7 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
 	struct amdgpu_bo *bo;
 	enum ttm_bo_type type;
 	unsigned long page_align;
+	u64 initial_bytes_moved;
 	size_t acc_size;
 	int r;
 
@@ -399,10 +400,15 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
 		locked = ww_mutex_trylock(&bo->tbo.ttm_resv.lock);
 		WARN_ON(!locked);
 	}
+
+	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
 	r = ttm_bo_init(&adev->mman.bdev, &bo->tbo, size, type,
 			&bo->placement, page_align, !kernel, NULL,
 			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
 			&amdgpu_ttm_bo_destroy);
+	amdgpu_cs_report_moved_bytes(adev,
+		initial_bytes_moved - atomic64_read(&adev->num_bytes_moved));
+
 	if (unlikely(r != 0)) {
 		ww_mutex_unlock(&bo->tbo.resv->lock);
 		return r;
-- 
2.11.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: fix a potential deadlock in amdgpu_bo_create_restricted()
       [not found] ` <20170208233518.1334-1-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-02-08 23:35   ` [PATCH 2/2] drm/amdgpu: report the number of bytes moved at buffer creation Samuel Pitoiset
@ 2017-02-09  0:53   ` Michel Dänzer
  1 sibling, 0 replies; 5+ messages in thread
From: Michel Dänzer @ 2017-02-09  0:53 UTC (permalink / raw)
  To: Samuel Pitoiset; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 09/02/17 08:35 AM, Samuel Pitoiset wrote:
> When ttm_bo_init() fails, the reservation mutex should be unlocked.
> 
> In debug build, the kernel reported "possible recursive locking
> detected" in this codepath. For debugging purposes, I also added
> a "WARN_ON(ww_mutex_is_locked())" when ttm_bo_init() fails and the
> mutex was locked as expected.
> 
> This should fix (random) GPU hangs. The easy way to reproduce the
> issue is to change the "Super Sampling" option from 1.0 to 2.0 in
> Hitman. It will create a huge buffer, evict a bunch of buffers
> (around ~5k) and deadlock.
> 
> This regression has been introduced pretty recently.
> 
> Fixes: "drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)"

Please add at least the first ~12 characters of the commit hash
(f45dc74c93241ad0125fbc08c48b2ebe20f2f472) to the Fixes tag.


> @@ -403,8 +403,10 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>  			&bo->placement, page_align, !kernel, NULL,
>  			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
>  			&amdgpu_ttm_bo_destroy);
> -	if (unlikely(r != 0))
> +	if (unlikely(r != 0)) {
> +		ww_mutex_unlock(&bo->tbo.resv->lock);
>  		return r;
> +	}

This must only be done if (!resv).


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: report the number of bytes moved at buffer creation
       [not found]     ` <20170208233518.1334-2-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-02-09  0:58       ` Michel Dänzer
       [not found]         ` <46a014c2-9fff-ba34-3508-b0444aa49354-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Michel Dänzer @ 2017-02-09  0:58 UTC (permalink / raw)
  To: Samuel Pitoiset; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 09/02/17 08:35 AM, Samuel Pitoiset wrote:
> Like ttm_bo_validate(), ttm_bo_init() might need to move BO and
> the number of bytes moved by TTM should be reported. This can help
> the throttle buffer migration mechanism to make a better decision.
> 
> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

[...]

> @@ -399,10 +400,15 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>  		locked = ww_mutex_trylock(&bo->tbo.ttm_resv.lock);
>  		WARN_ON(!locked);
>  	}
> +
> +	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
>  	r = ttm_bo_init(&adev->mman.bdev, &bo->tbo, size, type,
>  			&bo->placement, page_align, !kernel, NULL,
>  			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
>  			&amdgpu_ttm_bo_destroy);
> +	amdgpu_cs_report_moved_bytes(adev,
> +		initial_bytes_moved - atomic64_read(&adev->num_bytes_moved));

This looks backwards, should be 

		atomic64_read(&adev->num_bytes_moved) - initial_bytes_moved);

?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: report the number of bytes moved at buffer creation
       [not found]         ` <46a014c2-9fff-ba34-3508-b0444aa49354-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-02-09 10:26           ` Samuel Pitoiset
  0 siblings, 0 replies; 5+ messages in thread
From: Samuel Pitoiset @ 2017-02-09 10:26 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 02/09/2017 01:58 AM, Michel Dänzer wrote:
> On 09/02/17 08:35 AM, Samuel Pitoiset wrote:
>> Like ttm_bo_validate(), ttm_bo_init() might need to move BO and
>> the number of bytes moved by TTM should be reported. This can help
>> the throttle buffer migration mechanism to make a better decision.
>>
>> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
>
> [...]
>
>> @@ -399,10 +400,15 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>>  		locked = ww_mutex_trylock(&bo->tbo.ttm_resv.lock);
>>  		WARN_ON(!locked);
>>  	}
>> +
>> +	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
>>  	r = ttm_bo_init(&adev->mman.bdev, &bo->tbo, size, type,
>>  			&bo->placement, page_align, !kernel, NULL,
>>  			acc_size, sg, resv ? resv : &bo->tbo.ttm_resv,
>>  			&amdgpu_ttm_bo_destroy);
>> +	amdgpu_cs_report_moved_bytes(adev,
>> +		initial_bytes_moved - atomic64_read(&adev->num_bytes_moved));
>
> This looks backwards, should be
>
> 		atomic64_read(&adev->num_bytes_moved) - initial_bytes_moved);
>
> ?

My mistake, thanks for noticing.

>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-09 10:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-08 23:35 [PATCH 1/2] drm/amdgpu: fix a potential deadlock in amdgpu_bo_create_restricted() Samuel Pitoiset
     [not found] ` <20170208233518.1334-1-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-02-08 23:35   ` [PATCH 2/2] drm/amdgpu: report the number of bytes moved at buffer creation Samuel Pitoiset
     [not found]     ` <20170208233518.1334-2-samuel.pitoiset-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-02-09  0:58       ` Michel Dänzer
     [not found]         ` <46a014c2-9fff-ba34-3508-b0444aa49354-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-02-09 10:26           ` Samuel Pitoiset
2017-02-09  0:53   ` [PATCH 1/2] drm/amdgpu: fix a potential deadlock in amdgpu_bo_create_restricted() Michel Dänzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.