linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves
@ 2023-06-26  9:14 Thomas Hellström
  2023-06-26  9:14 ` [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26  9:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, dri-devel, intel-gfx, linux-kernel,
	Christian König, Christian König, Andi Shyti

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A couple of ttm fixes for issues that either were hit while developing the
xe driver or, for the resource leak patches, discovered during code
inspection.

v2:
- Avoid a goto in patch 3 (Andi Shyti)
- Add some RB's

Thomas Hellström (4):
  drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
  drm/ttm: Don't shadow the operation context
  drm/ttm: Don't leak a resource on eviction error
  drm/ttm: Don't leak a resource on swapout move error

 drivers/gpu/drm/ttm/ttm_bo.c       | 26 +++++++++++++-------------
 drivers/gpu/drm/ttm/ttm_resource.c |  2 ++
 2 files changed, 15 insertions(+), 13 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
  2023-06-26  9:14 [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
@ 2023-06-26  9:14 ` Thomas Hellström
  2023-06-26 10:21   ` Christian König
  2023-06-26  9:14 ` [PATCH v2 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26  9:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Christian König, Daniel Vetter, dri-devel, stable,
	intel-gfx, linux-kernel, Andi Shyti

The value of pos->first was not updated when the first resource of the
range was moved. This could lead to errors like the one below.
Fix this by updating pos->first when needed.

<3> [218.963342] BUG: KASAN: null-ptr-deref in ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.963456] Read of size 8 at addr 0000000000000038 by task xe_evict/1529
<3> [218.963546]
<3> [218.963566] CPU: 0 PID: 1529 Comm: xe_evict Not tainted 6.3.0-xe #1
<3> [218.963664] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake H DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.4064.A00.2102041619 02/04/2021
<3> [218.963841] Call Trace:
<3> [218.963881]  <TASK>
<3> [218.963915]  dump_stack_lvl+0x64/0xb0
<3> [218.963976]  print_report+0x3e5/0x600
<3> [218.964036]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964127]  kasan_report+0x96/0xc0
<3> [218.964183]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964276]  ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
<3> [218.964365]  ttm_bo_set_bulk_move+0x92/0x140 [ttm]
<3> [218.964454]  xe_gem_object_close+0xc8/0x120 [xe]
<3> [218.964675]  ? __pfx_xe_gem_object_close+0x10/0x10 [xe]
<3> [218.964908]  ? drm_gem_object_handle_put_unlocked+0xc7/0x170 [drm]
<3> [218.965071]  drm_gem_object_release_handle+0x45/0x80 [drm]
<3> [218.965220]  ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
<3> [218.965381]  idr_for_each+0xc9/0x180
<3> [218.965437]  ? __pfx_idr_for_each+0x10/0x10
<3> [218.965504]  drm_gem_release+0x20/0x30 [drm]
<3> [218.965637]  drm_file_free.part.0+0x4cb/0x4f0 [drm]
<3> [218.965778]  ? drm_close_helper.isra.0+0xb7/0xe0 [drm]
<3> [218.965921]  drm_release_noglobal+0x49/0x90 [drm]
<3> [218.966061]  __fput+0x122/0x450
<3> [218.966115]  task_work_run+0xfe/0x190
<3> [218.966175]  ? __pfx_task_work_run+0x10/0x10
<3> [218.966239]  ? do_raw_spin_unlock+0xa7/0x140
<3> [218.966308]  do_exit+0x55f/0x1430
<3> [218.966364]  ? __pfx_lock_release+0x10/0x10
<3> [218.966431]  ? do_raw_spin_lock+0x11d/0x1e0
<3> [218.966498]  ? __pfx_do_exit+0x10/0x10
<3> [218.966554]  ? __pfx_do_raw_spin_lock+0x10/0x10
<3> [218.966625]  ? mark_held_locks+0x24/0x90
<3> [218.966688]  ? lockdep_hardirqs_on_prepare+0x136/0x210
<3> [218.966768]  do_group_exit+0x68/0x110
<3> [218.966828]  __x64_sys_exit_group+0x2c/0x30
<3> [218.966896]  do_syscall_64+0x3c/0x90
<3> [218.966955]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
<3> [218.967035] RIP: 0033:0x7f77b194f146
<3> [218.967094] Code: Unable to access opcode bytes at 0x7f77b194f11c.
<3> [218.967174] RSP: 002b:00007ffc64791188 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
<3> [218.967271] RAX: ffffffffffffffda RBX: 00007f77b1a548a0 RCX: 00007f77b194f146
<3> [218.967364] RDX: 0000000000000062 RSI: 000000000000003c RDI: 0000000000000062
<3> [218.967458] RBP: 0000000000000062 R08: 00000000000000e7 R09: ffffffffffffff78
<3> [218.967553] R10: 0000000000000058 R11: 0000000000000246 R12: 00007f77b1a548a0
<3> [218.967648] R13: 0000000000000003 R14: 00007f77b1a5d2e8 R15: 0000000000000000
<3> [218.967745]  </TASK>

Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.19+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 7333f7a87a2f..cb05e0a36576 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
 				       struct ttm_resource *res)
 {
 	if (pos->last != res) {
+		if (pos->first == res)
+			pos->first = list_next_entry(res, lru);
 		list_move(&res->lru, &pos->last->lru);
 		pos->last = res;
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/4] drm/ttm: Don't shadow the operation context
  2023-06-26  9:14 [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
  2023-06-26  9:14 ` [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
@ 2023-06-26  9:14 ` Thomas Hellström
  2023-06-26 15:15   ` Christian König
  2023-06-26  9:14 ` [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26  9:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König, Roger He, dri-devel,
	intel-gfx, stable, Matthew Brost, linux-kernel,
	Christian König, Andi Shyti

ttm_bo_swapout() shadows the ttm operation context which may cause
major confusion in driver callbacks when swapping out !TTM_PL_SYSTEM
memory. Fix this by reusing the operation context argument to
ttm_bo_swapout().

Cc: "Christian König" <christian.koenig@amd.com>
Cc: Roger He <Hongbo.He@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <intel-gfx@lists.freedesktop.org>
Cc: <stable@vger.kernel.org> # v4.16+
Fixes: dc947770cf34 ("drm/ttm: enable swapout for reserved BOs during allocation")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bd5dae4d1624..615d30c4262d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1154,7 +1154,6 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	 * Move to system cached
 	 */
 	if (bo->resource->mem_type != TTM_PL_SYSTEM) {
-		struct ttm_operation_ctx ctx = { false, false };
 		struct ttm_resource *evict_mem;
 		struct ttm_place hop;
 
@@ -1164,7 +1163,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 		if (unlikely(ret))
 			goto out;
 
-		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, &ctx, &hop);
+		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
 		if (unlikely(ret != 0)) {
 			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
 			goto out;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-26  9:14 [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
  2023-06-26  9:14 ` [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
  2023-06-26  9:14 ` [PATCH v2 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
@ 2023-06-26  9:14 ` Thomas Hellström
  2023-06-26 11:24   ` Andi Shyti
  2023-06-26 11:32   ` Christian König
  2023-06-26  9:14 ` [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
  2023-06-27  9:05 ` [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
  4 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26  9:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Andrey Grodzovsky, Christian König,
	Huang Rui, dri-devel, stable, Nirmoy Das, intel-gfx,
	linux-kernel, Christian König, Andi Shyti

On eviction errors other than -EMULTIHOP we were leaking a resource.
Fix.

v2:
- Avoid yet another goto (Andi Shyti)

Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1
---
 drivers/gpu/drm/ttm/ttm_bo.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 615d30c4262d..c0e3bbd21d3d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -458,18 +458,18 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 		goto out;
 	}
 
-bounce:
-	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
-	if (ret == -EMULTIHOP) {
+	do {
+		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
+		if (ret != -EMULTIHOP)
+			break;
+
 		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
-		if (ret) {
-			if (ret != -ERESTARTSYS && ret != -EINTR)
-				pr_err("Buffer eviction failed\n");
-			ttm_resource_free(bo, &evict_mem);
-			goto out;
-		}
-		/* try and move to final place now. */
-		goto bounce;
+	} while (!ret);
+
+	if (ret) {
+		ttm_resource_free(bo, &evict_mem);
+		if (ret != -ERESTARTSYS && ret != -EINTR)
+			pr_err("Buffer eviction failed\n");
 	}
 out:
 	return ret;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error
  2023-06-26  9:14 [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
                   ` (2 preceding siblings ...)
  2023-06-26  9:14 ` [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
@ 2023-06-26  9:14 ` Thomas Hellström
  2023-06-26 11:33   ` Christian König
  2023-06-27  9:05 ` [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26  9:14 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Christian König, dri-devel, stable, Nirmoy Das, Andi Shyti,
	intel-gfx, linux-kernel

If moving the bo to system for swapout failed, we were leaking
a resource. Fix.

Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
Cc: Christian König <christian.koenig@amd.com>
Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.14+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c0e3bbd21d3d..d9a8f227f310 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
 		if (unlikely(ret != 0)) {
 			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
+			ttm_resource_free(bo, &evict_mem);
 			goto out;
 		}
 	}
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
  2023-06-26  9:14 ` [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
@ 2023-06-26 10:21   ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2023-06-26 10:21 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Christian König, Daniel Vetter, dri-devel, stable,
	intel-gfx, linux-kernel, Andi Shyti

I've already pushed the version from Teddy to drm-misc-fixes last week.

So no need for that one any more.

Christian.

Am 26.06.23 um 11:14 schrieb Thomas Hellström:
> The value of pos->first was not updated when the first resource of the
> range was moved. This could lead to errors like the one below.
> Fix this by updating pos->first when needed.
>
> <3> [218.963342] BUG: KASAN: null-ptr-deref in ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.963456] Read of size 8 at addr 0000000000000038 by task xe_evict/1529
> <3> [218.963546]
> <3> [218.963566] CPU: 0 PID: 1529 Comm: xe_evict Not tainted 6.3.0-xe #1
> <3> [218.963664] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake H DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.4064.A00.2102041619 02/04/2021
> <3> [218.963841] Call Trace:
> <3> [218.963881]  <TASK>
> <3> [218.963915]  dump_stack_lvl+0x64/0xb0
> <3> [218.963976]  print_report+0x3e5/0x600
> <3> [218.964036]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964127]  kasan_report+0x96/0xc0
> <3> [218.964183]  ? ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964276]  ttm_lru_bulk_move_del+0xc5/0x180 [ttm]
> <3> [218.964365]  ttm_bo_set_bulk_move+0x92/0x140 [ttm]
> <3> [218.964454]  xe_gem_object_close+0xc8/0x120 [xe]
> <3> [218.964675]  ? __pfx_xe_gem_object_close+0x10/0x10 [xe]
> <3> [218.964908]  ? drm_gem_object_handle_put_unlocked+0xc7/0x170 [drm]
> <3> [218.965071]  drm_gem_object_release_handle+0x45/0x80 [drm]
> <3> [218.965220]  ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
> <3> [218.965381]  idr_for_each+0xc9/0x180
> <3> [218.965437]  ? __pfx_idr_for_each+0x10/0x10
> <3> [218.965504]  drm_gem_release+0x20/0x30 [drm]
> <3> [218.965637]  drm_file_free.part.0+0x4cb/0x4f0 [drm]
> <3> [218.965778]  ? drm_close_helper.isra.0+0xb7/0xe0 [drm]
> <3> [218.965921]  drm_release_noglobal+0x49/0x90 [drm]
> <3> [218.966061]  __fput+0x122/0x450
> <3> [218.966115]  task_work_run+0xfe/0x190
> <3> [218.966175]  ? __pfx_task_work_run+0x10/0x10
> <3> [218.966239]  ? do_raw_spin_unlock+0xa7/0x140
> <3> [218.966308]  do_exit+0x55f/0x1430
> <3> [218.966364]  ? __pfx_lock_release+0x10/0x10
> <3> [218.966431]  ? do_raw_spin_lock+0x11d/0x1e0
> <3> [218.966498]  ? __pfx_do_exit+0x10/0x10
> <3> [218.966554]  ? __pfx_do_raw_spin_lock+0x10/0x10
> <3> [218.966625]  ? mark_held_locks+0x24/0x90
> <3> [218.966688]  ? lockdep_hardirqs_on_prepare+0x136/0x210
> <3> [218.966768]  do_group_exit+0x68/0x110
> <3> [218.966828]  __x64_sys_exit_group+0x2c/0x30
> <3> [218.966896]  do_syscall_64+0x3c/0x90
> <3> [218.966955]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> <3> [218.967035] RIP: 0033:0x7f77b194f146
> <3> [218.967094] Code: Unable to access opcode bytes at 0x7f77b194f11c.
> <3> [218.967174] RSP: 002b:00007ffc64791188 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> <3> [218.967271] RAX: ffffffffffffffda RBX: 00007f77b1a548a0 RCX: 00007f77b194f146
> <3> [218.967364] RDX: 0000000000000062 RSI: 000000000000003c RDI: 0000000000000062
> <3> [218.967458] RBP: 0000000000000062 R08: 00000000000000e7 R09: ffffffffffffff78
> <3> [218.967553] R10: 0000000000000058 R11: 0000000000000246 R12: 00007f77b1a548a0
> <3> [218.967648] R13: 0000000000000003 R14: 00007f77b1a5d2e8 R15: 0000000000000000
> <3> [218.967745]  </TASK>
>
> Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.19+
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/411
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_resource.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 7333f7a87a2f..cb05e0a36576 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
>   				       struct ttm_resource *res)
>   {
>   	if (pos->last != res) {
> +		if (pos->first == res)
> +			pos->first = list_next_entry(res, lru);
>   		list_move(&res->lru, &pos->last->lru);
>   		pos->last = res;
>   	}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-26  9:14 ` [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
@ 2023-06-26 11:24   ` Andi Shyti
  2023-06-26 11:32   ` Christian König
  1 sibling, 0 replies; 14+ messages in thread
From: Andi Shyti @ 2023-06-26 11:24 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Andrey Grodzovsky, Christian König, Huang Rui,
	dri-devel, stable, Nirmoy Das, intel-gfx, linux-kernel,
	Christian König, Andi Shyti

Hi Thomas,

On Mon, Jun 26, 2023 at 11:14:49AM +0200, Thomas Hellström wrote:
> On eviction errors other than -EMULTIHOP we were leaking a resource.
> Fix.
> 
> v2:
> - Avoid yet another goto (Andi Shyti)
> 
> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.15+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> 

Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error
  2023-06-26  9:14 ` [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
  2023-06-26 11:24   ` Andi Shyti
@ 2023-06-26 11:32   ` Christian König
  1 sibling, 0 replies; 14+ messages in thread
From: Christian König @ 2023-06-26 11:32 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Andrey Grodzovsky, Christian König, Huang Rui, dri-devel,
	stable, Nirmoy Das, intel-gfx, linux-kernel, Andi Shyti

Am 26.06.23 um 11:14 schrieb Thomas Hellström:
> On eviction errors other than -EMULTIHOP we were leaking a resource.
> Fix.
>
> v2:
> - Avoid yet another goto (Andi Shyti)
>
> Fixes: 403797925768 ("drm/ttm: Fix multihop assert on eviction.")
> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.15+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 615d30c4262d..c0e3bbd21d3d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -458,18 +458,18 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>   		goto out;
>   	}
>   
> -bounce:
> -	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
> -	if (ret == -EMULTIHOP) {
> +	do {
> +		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
> +		if (ret != -EMULTIHOP)
> +			break;
> +
>   		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
> -		if (ret) {
> -			if (ret != -ERESTARTSYS && ret != -EINTR)
> -				pr_err("Buffer eviction failed\n");
> -			ttm_resource_free(bo, &evict_mem);
> -			goto out;
> -		}
> -		/* try and move to final place now. */
> -		goto bounce;
> +	} while (!ret);
> +
> +	if (ret) {
> +		ttm_resource_free(bo, &evict_mem);
> +		if (ret != -ERESTARTSYS && ret != -EINTR)
> +			pr_err("Buffer eviction failed\n");
>   	}
>   out:
>   	return ret;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error
  2023-06-26  9:14 ` [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
@ 2023-06-26 11:33   ` Christian König
  2023-06-26 12:23     ` Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-06-26 11:33 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Christian König, dri-devel, stable, Nirmoy Das, Andi Shyti,
	intel-gfx, linux-kernel

Am 26.06.23 um 11:14 schrieb Thomas Hellström:
> If moving the bo to system for swapout failed, we were leaking
> a resource. Fix.
>
> Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.14+
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index c0e3bbd21d3d..d9a8f227f310 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>   		if (unlikely(ret != 0)) {
>   			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
> +			ttm_resource_free(bo, &evict_mem);
>   			goto out;
>   		}
>   	}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error
  2023-06-26 11:33   ` Christian König
@ 2023-06-26 12:23     ` Thomas Hellström
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26 12:23 UTC (permalink / raw)
  To: Christian König, intel-xe
  Cc: Christian König, dri-devel, stable, Nirmoy Das, Andi Shyti,
	intel-gfx, linux-kernel

Hi, Christian,

Will you take a look at 2/4 as well? Will you merge these?

Thanks,

Thomas


On 6/26/23 13:33, Christian König wrote:
> Am 26.06.23 um 11:14 schrieb Thomas Hellström:
>> If moving the bo to system for swapout failed, we were leaking
>> a resource. Fix.
>>
>> Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of 
>> embedding it v2")
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: <stable@vger.kernel.org> # v5.14+
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index c0e3bbd21d3d..d9a8f227f310 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -1166,6 +1166,7 @@ int ttm_bo_swapout(struct ttm_buffer_object 
>> *bo, struct ttm_operation_ctx *ctx,
>>           ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>>           if (unlikely(ret != 0)) {
>>               WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput 
>> - likely driver bug.\n");
>> +            ttm_resource_free(bo, &evict_mem);
>>               goto out;
>>           }
>>       }
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] drm/ttm: Don't shadow the operation context
  2023-06-26  9:14 ` [PATCH v2 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
@ 2023-06-26 15:15   ` Christian König
  2023-06-26 15:18     ` Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-06-26 15:15 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Christian König, Roger He, dri-devel, intel-gfx, stable,
	Matthew Brost, linux-kernel, Andi Shyti

Am 26.06.23 um 11:14 schrieb Thomas Hellström:
> ttm_bo_swapout() shadows the ttm operation context which may cause
> major confusion in driver callbacks when swapping out !TTM_PL_SYSTEM
> memory. Fix this by reusing the operation context argument to
> ttm_bo_swapout().
>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Roger He <Hongbo.He@amd.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Cc: <intel-gfx@lists.freedesktop.org>
> Cc: <stable@vger.kernel.org> # v4.16+
> Fixes: dc947770cf34 ("drm/ttm: enable swapout for reserved BOs during allocation")
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Acked-by: Matthew Brost <matthew.brost@intel.com>

We intentionally didn't used the parameter here, but I absolutely can't 
figure out why.

Feel free to add my rb, but let's give it some time upstream before you 
base anything on top of this. Just in case we missed something.

Regards,
Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index bd5dae4d1624..615d30c4262d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1154,7 +1154,6 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   	 * Move to system cached
>   	 */
>   	if (bo->resource->mem_type != TTM_PL_SYSTEM) {
> -		struct ttm_operation_ctx ctx = { false, false };
>   		struct ttm_resource *evict_mem;
>   		struct ttm_place hop;
>   
> @@ -1164,7 +1163,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   		if (unlikely(ret))
>   			goto out;
>   
> -		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, &ctx, &hop);
> +		ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
>   		if (unlikely(ret != 0)) {
>   			WARN(ret == -EMULTIHOP, "Unexpected multihop in swaput - likely driver bug.\n");
>   			goto out;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] drm/ttm: Don't shadow the operation context
  2023-06-26 15:15   ` Christian König
@ 2023-06-26 15:18     ` Thomas Hellström
  2023-06-26 15:23       ` [Intel-xe] " Thomas Hellström
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26 15:18 UTC (permalink / raw)
  To: Christian König, intel-xe
  Cc: Christian König, Roger He, dri-devel, intel-gfx, stable,
	Matthew Brost, linux-kernel, Andi Shyti

On Mon, 2023-06-26 at 17:15 +0200, Christian König wrote:
> Am 26.06.23 um 11:14 schrieb Thomas Hellström:
> > ttm_bo_swapout() shadows the ttm operation context which may cause
> > major confusion in driver callbacks when swapping out
> > !TTM_PL_SYSTEM
> > memory. Fix this by reusing the operation context argument to
> > ttm_bo_swapout().
> > 
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Roger He <Hongbo.He@amd.com>
> > Cc: <dri-devel@lists.freedesktop.org>
> > Cc: <intel-gfx@lists.freedesktop.org>
> > Cc: <stable@vger.kernel.org> # v4.16+
> > Fixes: dc947770cf34 ("drm/ttm: enable swapout for reserved BOs
> > during allocation")
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Acked-by: Matthew Brost <matthew.brost@intel.com>
> 
> We intentionally didn't used the parameter here, but I absolutely
> can't 
> figure out why.
> 
> Feel free to add my rb, but let's give it some time upstream before
> you 
> base anything on top of this. Just in case we missed something.

Sure. Thanks for reviewing,
/Thomas

> 
> Regards,
> Christian.
> 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > b/drivers/gpu/drm/ttm/ttm_bo.c
> > index bd5dae4d1624..615d30c4262d 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1154,7 +1154,6 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > *bo, struct ttm_operation_ctx *ctx,
> >          * Move to system cached
> >          */
> >         if (bo->resource->mem_type != TTM_PL_SYSTEM) {
> > -               struct ttm_operation_ctx ctx = { false, false };
> >                 struct ttm_resource *evict_mem;
> >                 struct ttm_place hop;
> >   
> > @@ -1164,7 +1163,7 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > *bo, struct ttm_operation_ctx *ctx,
> >                 if (unlikely(ret))
> >                         goto out;
> >   
> > -               ret = ttm_bo_handle_move_mem(bo, evict_mem, true,
> > &ctx, &hop);
> > +               ret = ttm_bo_handle_move_mem(bo, evict_mem, true,
> > ctx, &hop);
> >                 if (unlikely(ret != 0)) {
> >                         WARN(ret == -EMULTIHOP, "Unexpected
> > multihop in swaput - likely driver bug.\n");
> >                         goto out;
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2 2/4] drm/ttm: Don't shadow the operation context
  2023-06-26 15:18     ` Thomas Hellström
@ 2023-06-26 15:23       ` Thomas Hellström
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-26 15:23 UTC (permalink / raw)
  To: Christian König, intel-xe
  Cc: intel-gfx, linux-kernel, stable, Roger He, dri-devel,
	Christian König

On Mon, 2023-06-26 at 17:18 +0200, Thomas Hellström wrote:
> On Mon, 2023-06-26 at 17:15 +0200, Christian König wrote:
> > Am 26.06.23 um 11:14 schrieb Thomas Hellström:
> > > ttm_bo_swapout() shadows the ttm operation context which may
> > > cause
> > > major confusion in driver callbacks when swapping out
> > > !TTM_PL_SYSTEM
> > > memory. Fix this by reusing the operation context argument to
> > > ttm_bo_swapout().
> > > 
> > > Cc: "Christian König" <christian.koenig@amd.com>
> > > Cc: Roger He <Hongbo.He@amd.com>
> > > Cc: <dri-devel@lists.freedesktop.org>
> > > Cc: <intel-gfx@lists.freedesktop.org>
> > > Cc: <stable@vger.kernel.org> # v4.16+
> > > Fixes: dc947770cf34 ("drm/ttm: enable swapout for reserved BOs
> > > during allocation")
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > Acked-by: Matthew Brost <matthew.brost@intel.com>
> > 
> > We intentionally didn't used the parameter here, but I absolutely
> > can't 
> > figure out why.
> > 
> > Feel free to add my rb, but let's give it some time upstream before
> > you 
> > base anything on top of this. Just in case we missed something.
> 
> Sure. Thanks for reviewing,

BTW, I'll remove the Fixes: tag as well.

/Thomas


> /Thomas
> 
> > 
> > Regards,
> > Christian.
> > 
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_bo.c | 3 +--
> > >   1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > > b/drivers/gpu/drm/ttm/ttm_bo.c
> > > index bd5dae4d1624..615d30c4262d 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > @@ -1154,7 +1154,6 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > > *bo, struct ttm_operation_ctx *ctx,
> > >          * Move to system cached
> > >          */
> > >         if (bo->resource->mem_type != TTM_PL_SYSTEM) {
> > > -               struct ttm_operation_ctx ctx = { false, false };
> > >                 struct ttm_resource *evict_mem;
> > >                 struct ttm_place hop;
> > >   
> > > @@ -1164,7 +1163,7 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > > *bo, struct ttm_operation_ctx *ctx,
> > >                 if (unlikely(ret))
> > >                         goto out;
> > >   
> > > -               ret = ttm_bo_handle_move_mem(bo, evict_mem, true,
> > > &ctx, &hop);
> > > +               ret = ttm_bo_handle_move_mem(bo, evict_mem, true,
> > > ctx, &hop);
> > >                 if (unlikely(ret != 0)) {
> > >                         WARN(ret == -EMULTIHOP, "Unexpected
> > > multihop in swaput - likely driver bug.\n");
> > >                         goto out;
> > 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves
  2023-06-26  9:14 [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
                   ` (3 preceding siblings ...)
  2023-06-26  9:14 ` [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
@ 2023-06-27  9:05 ` Thomas Hellström
  4 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellström @ 2023-06-27  9:05 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, intel-gfx, linux-kernel, Christian König,
	Christian König, Andi Shyti


On 6/26/23 11:14, Thomas Hellström wrote:
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> A couple of ttm fixes for issues that either were hit while developing the
> xe driver or, for the resource leak patches, discovered during code
> inspection.
>
> v2:
> - Avoid a goto in patch 3 (Andi Shyti)
> - Add some RB's
>
> Thomas Hellström (4):
>    drm/ttm: Fix ttm_lru_bulk_move_pos_tail()
>    drm/ttm: Don't shadow the operation context
>    drm/ttm: Don't leak a resource on eviction error
>    drm/ttm: Don't leak a resource on swapout move error
>
>   drivers/gpu/drm/ttm/ttm_bo.c       | 26 +++++++++++++-------------
>   drivers/gpu/drm/ttm/ttm_resource.c |  2 ++
>   2 files changed, 15 insertions(+), 13 deletions(-)
>
Pushed 2/4 to drm-misc-next, 3/4 & 4/4 to drm-misc-fixes.

/Thomas



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-27  9:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-26  9:14 [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström
2023-06-26  9:14 ` [PATCH v2 1/4] drm/ttm: Fix ttm_lru_bulk_move_pos_tail() Thomas Hellström
2023-06-26 10:21   ` Christian König
2023-06-26  9:14 ` [PATCH v2 2/4] drm/ttm: Don't shadow the operation context Thomas Hellström
2023-06-26 15:15   ` Christian König
2023-06-26 15:18     ` Thomas Hellström
2023-06-26 15:23       ` [Intel-xe] " Thomas Hellström
2023-06-26  9:14 ` [PATCH v2 3/4] drm/ttm: Don't leak a resource on eviction error Thomas Hellström
2023-06-26 11:24   ` Andi Shyti
2023-06-26 11:32   ` Christian König
2023-06-26  9:14 ` [PATCH v2 4/4] drm/ttm: Don't leak a resource on swapout move error Thomas Hellström
2023-06-26 11:33   ` Christian König
2023-06-26 12:23     ` Thomas Hellström
2023-06-27  9:05 ` [PATCH v2 0/4] drm/ttm: Fixes around resources and bulk moves Thomas Hellström

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).