dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
@ 2022-04-11  8:56 Matthew Auld
  2022-04-11  9:56 ` Thomas Hellström
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Matthew Auld @ 2022-04-11  8:56 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Lucas De Marchi, Christian König,
	dri-devel, Nirmoy Das

If we hit the sync case, like when skipping clearing for kernel internal
objects, or when falling back to cpu clearing, like in i915, we end up
trying to add a NULL fence, but with some recent changes in this area
this now just results in NULL deref in dma_resv_add_fence:

<1>[    5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008
<1>[    5.466384] #PF: supervisor read access in kernel mode
<1>[    5.466385] #PF: error_code(0x0000) - not-present page
<6>[    5.466386] PGD 0 P4D 0
<4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1
<4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
<4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82
<4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
<4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff
<4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087
<4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001
<4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000
<4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600
<4>[    5.466401] FS:  00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000
<4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0
<4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[    5.466404] Call Trace:
<4>[    5.466405]  <TASK>
<4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
<4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
<4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
<4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
<4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm]
<4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
<4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
<4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
<4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
<4>[    5.467094]  ____i915_gem_object_get_pages+0x32/0xf0 [i915]
<4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
<4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
<4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
<4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
<4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
<4>[    5.467704]  intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915]
<4>[    5.467833]  intel_crtc_initial_plane_config+0x83/0x340 [i915]

In the ttm_bo_move_sync_cleanup() case it seems we only really care
about calling ttm_bo_wait_free_node(), so let's instead just call that
directly.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
 include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index bc5190340b9c..1cbfb00c1d65 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
 
+void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
+			      struct ttm_resource *new_mem)
+{
+	struct ttm_device *bdev = bo->bdev;
+	struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type);
+	int ret;
+
+	ret = ttm_bo_wait_free_node(bo, man->use_tt);
+	if (WARN_ON(ret))
+		return;
+
+	ttm_bo_assign_mem(bo, new_mem);
+}
+EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
+
 /**
  * ttm_bo_pipeline_gutting - purge the contents of a bo
  * @bo: The buffer object
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 059a595e14e5..897b88f0bd59 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
 			      struct ttm_resource *new_mem);
 
 /**
- * ttm_bo_move_accel_cleanup.
+ * ttm_bo_move_sync_cleanup.
  *
  * @bo: A pointer to a struct ttm_buffer_object.
  * @new_mem: struct ttm_resource indicating where to move.
@@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
  * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed
  * by the caller to be idle. Typically used after memcpy buffer moves.
  */
-static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
-					    struct ttm_resource *new_mem)
-{
-	int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem);
-
-	WARN_ON(ret);
-}
+void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
+			      struct ttm_resource *new_mem);
 
 /**
  * ttm_bo_pipeline_gutting.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11  8:56 [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup Matthew Auld
@ 2022-04-11  9:56 ` Thomas Hellström
  2022-04-11 10:06   ` Matthew Auld
  2022-04-11 12:39 ` Christian König
  2022-04-11 20:05 ` Thomas Zimmermann
  2 siblings, 1 reply; 8+ messages in thread
From: Thomas Hellström @ 2022-04-11  9:56 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Nirmoy Das, Lucas De Marchi, Christian König, dri-devel

Hi, Matthew

On 4/11/22 10:56, Matthew Auld wrote:
> If we hit the sync case, like when skipping clearing for kernel internal
> objects, or when falling back to cpu clearing, like in i915, we end up
> trying to add a NULL fence, but with some recent changes in this area
> this now just results in NULL deref in dma_resv_add_fence:
>
> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008
> <1>[    5.466384] #PF: supervisor read access in kernel mode
> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
> <6>[    5.466386] PGD 0 P4D 0
> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1
> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82
> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff
> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087
> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001
> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000
> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600
> <4>[    5.466401] FS:  00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000
> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0
> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> <4>[    5.466404] Call Trace:
> <4>[    5.466405]  <TASK>
> <4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm]
> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
> <4>[    5.467094]  ____i915_gem_object_get_pages+0x32/0xf0 [i915]
> <4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
> <4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
> <4>[    5.467704]  intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915]
> <4>[    5.467833]  intel_crtc_initial_plane_config+0x83/0x340 [i915]
>
> In the ttm_bo_move_sync_cleanup() case it seems we only really care
> about calling ttm_bo_wait_free_node(), so let's instead just call that
> directly.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>   2 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index bc5190340b9c..1cbfb00c1d65 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>   }
>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>   
> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> +			      struct ttm_resource *new_mem)
> +{
> +	struct ttm_device *bdev = bo->bdev;
> +	struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type);
> +	int ret;
> +
> +	ret = ttm_bo_wait_free_node(bo, man->use_tt);
> +	if (WARN_ON(ret))
> +		return;
> +
> +	ttm_bo_assign_mem(bo, new_mem);
> +}
> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);

I don't think this will help in the case where we call 
ttm_bo_move_accel_cleanup() with a NULL fence....

Perhaps we need to fix it there.

/Thomas



> +
>   /**
>    * ttm_bo_pipeline_gutting - purge the contents of a bo
>    * @bo: The buffer object
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 059a595e14e5..897b88f0bd59 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>   			      struct ttm_resource *new_mem);
>   
>   /**
> - * ttm_bo_move_accel_cleanup.
> + * ttm_bo_move_sync_cleanup.
>    *
>    * @bo: A pointer to a struct ttm_buffer_object.
>    * @new_mem: struct ttm_resource indicating where to move.
> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>    * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed
>    * by the caller to be idle. Typically used after memcpy buffer moves.
>    */
> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> -					    struct ttm_resource *new_mem)
> -{
> -	int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem);
> -
> -	WARN_ON(ret);
> -}
> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> +			      struct ttm_resource *new_mem);
>   
>   /**
>    * ttm_bo_pipeline_gutting.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11  9:56 ` Thomas Hellström
@ 2022-04-11 10:06   ` Matthew Auld
  2022-04-11 10:45     ` Thomas Hellström
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Auld @ 2022-04-11 10:06 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx
  Cc: Nirmoy Das, Lucas De Marchi, Christian König, dri-devel

On 11/04/2022 10:56, Thomas Hellström wrote:
> Hi, Matthew
> 
> On 4/11/22 10:56, Matthew Auld wrote:
>> If we hit the sync case, like when skipping clearing for kernel internal
>> objects, or when falling back to cpu clearing, like in i915, we end up
>> trying to add a NULL fence, but with some recent changes in this area
>> this now just results in NULL deref in dma_resv_add_fence:
>>
>> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 
>> 0000000000000008
>> <1>[    5.466384] #PF: supervisor read access in kernel mode
>> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
>> <6>[    5.466386] PGD 0 P4D 0
>> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
>> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 
>> 5.18.0-rc2-CI-CI_DRM_11481+ #1
>> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
>> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 
>> 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 
>> 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 
>> a5 0a 82
>> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
>> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 
>> 00000000ffffffff
>> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: 
>> ffffffff8233f087
>> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 
>> 0000000000000001
>> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 
>> 0000000000000000
>> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: 
>> ffff88810a88d600
>> <4>[    5.466401] FS:  00007f5fa1193540(0000) 
>> GS:ffff88845d880000(0000) knlGS:0000000000000000
>> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 
>> 00000000003706e0
>> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> <4>[    5.466404] Call Trace:
>> <4>[    5.466405]  <TASK>
>> <4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
>> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
>> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
>> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
>> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm]
>> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
>> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
>> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
>> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
>> <4>[    5.467094]  ____i915_gem_object_get_pages+0x32/0xf0 [i915]
>> <4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
>> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
>> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
>> <4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
>> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
>> <4>[    5.467704]  intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 
>> [i915]
>> <4>[    5.467833]  intel_crtc_initial_plane_config+0x83/0x340 [i915]
>>
>> In the ttm_bo_move_sync_cleanup() case it seems we only really care
>> about calling ttm_bo_wait_free_node(), so let's instead just call that
>> directly.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>>   2 files changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index bc5190340b9c..1cbfb00c1d65 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>   }
>>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
>> +                  struct ttm_resource *new_mem)
>> +{
>> +    struct ttm_device *bdev = bo->bdev;
>> +    struct ttm_resource_manager *man = ttm_manager_type(bdev, 
>> new_mem->mem_type);
>> +    int ret;
>> +
>> +    ret = ttm_bo_wait_free_node(bo, man->use_tt);
>> +    if (WARN_ON(ret))
>> +        return;
>> +
>> +    ttm_bo_assign_mem(bo, new_mem);
>> +}
>> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
> 
> I don't think this will help in the case where we call 
> ttm_bo_move_accel_cleanup() with a NULL fence....

Hmm, do you know if that case actually exists? I thought the only 
current user passing fence == NULL was ttm_bo_move_sync_cleanup().

> 
> Perhaps we need to fix it there.
> 
> /Thomas
> 
> 
> 
>> +
>>   /**
>>    * ttm_bo_pipeline_gutting - purge the contents of a bo
>>    * @bo: The buffer object
>> diff --git a/include/drm/ttm/ttm_bo_driver.h 
>> b/include/drm/ttm/ttm_bo_driver.h
>> index 059a595e14e5..897b88f0bd59 100644
>> --- a/include/drm/ttm/ttm_bo_driver.h
>> +++ b/include/drm/ttm/ttm_bo_driver.h
>> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>                     struct ttm_resource *new_mem);
>>   /**
>> - * ttm_bo_move_accel_cleanup.
>> + * ttm_bo_move_sync_cleanup.
>>    *
>>    * @bo: A pointer to a struct ttm_buffer_object.
>>    * @new_mem: struct ttm_resource indicating where to move.
>> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>    * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed
>>    * by the caller to be idle. Typically used after memcpy buffer moves.
>>    */
>> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object 
>> *bo,
>> -                        struct ttm_resource *new_mem)
>> -{
>> -    int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem);
>> -
>> -    WARN_ON(ret);
>> -}
>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
>> +                  struct ttm_resource *new_mem);
>>   /**
>>    * ttm_bo_pipeline_gutting.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11 10:06   ` Matthew Auld
@ 2022-04-11 10:45     ` Thomas Hellström
  2022-04-11 12:44       ` Christian König
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Hellström @ 2022-04-11 10:45 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Nirmoy Das, Lucas De Marchi, Christian König, dri-devel


On 4/11/22 12:06, Matthew Auld wrote:
> On 11/04/2022 10:56, Thomas Hellström wrote:
>> Hi, Matthew
>>
>> On 4/11/22 10:56, Matthew Auld wrote:
>>> If we hit the sync case, like when skipping clearing for kernel 
>>> internal
>>> objects, or when falling back to cpu clearing, like in i915, we end up
>>> trying to add a NULL fence, but with some recent changes in this area
>>> this now just results in NULL deref in dma_resv_add_fence:
>>>
>>> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 
>>> 0000000000000008
>>> <1>[    5.466384] #PF: supervisor read access in kernel mode
>>> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
>>> <6>[    5.466386] PGD 0 P4D 0
>>> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 
>>> 5.18.0-rc2-CI-CI_DRM_11481+ #1
>>> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
>>> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 
>>> 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 
>>> 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 
>>> 60 a5 0a 82
>>> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
>>> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 
>>> 00000000ffffffff
>>> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: 
>>> ffffffff8233f087
>>> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 
>>> 0000000000000001
>>> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 
>>> 0000000000000000
>>> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: 
>>> ffff88810a88d600
>>> <4>[    5.466401] FS:  00007f5fa1193540(0000) 
>>> GS:ffff88845d880000(0000) knlGS:0000000000000000
>>> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 
>>> 00000000003706e0
>>> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>>> 0000000000000000
>>> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>>> 0000000000000400
>>> <4>[    5.466404] Call Trace:
>>> <4>[    5.466405]  <TASK>
>>> <4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
>>> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
>>> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
>>> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
>>> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 
>>> [ttm]
>>> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
>>> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
>>> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
>>> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
>>> <4>[    5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915]
>>> <4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
>>> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
>>> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
>>> <4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
>>> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
>>> <4>[    5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 
>>> [i915]
>>> <4>[    5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915]
>>>
>>> In the ttm_bo_move_sync_cleanup() case it seems we only really care
>>> about calling ttm_bo_wait_free_node(), so let's instead just call that
>>> directly.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>>>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>>>   2 files changed, 18 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index bc5190340b9c..1cbfb00c1d65 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct 
>>> ttm_buffer_object *bo,
>>>   }
>>>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
>>> +                  struct ttm_resource *new_mem)
>>> +{
>>> +    struct ttm_device *bdev = bo->bdev;
>>> +    struct ttm_resource_manager *man = ttm_manager_type(bdev, 
>>> new_mem->mem_type);
>>> +    int ret;
>>> +
>>> +    ret = ttm_bo_wait_free_node(bo, man->use_tt);
>>> +    if (WARN_ON(ret))
>>> +        return;
>>> +
>>> +    ttm_bo_assign_mem(bo, new_mem);
>>> +}
>>> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
>>
>> I don't think this will help in the case where we call 
>> ttm_bo_move_accel_cleanup() with a NULL fence....
>
> Hmm, do you know if that case actually exists? I thought the only 
> current user passing fence == NULL was ttm_bo_move_sync_cleanup().


Yes, indeed, you're right. Not sure if any other driver hits that, though.

/Thomas



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11  8:56 [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup Matthew Auld
  2022-04-11  9:56 ` Thomas Hellström
@ 2022-04-11 12:39 ` Christian König
  2022-04-12 12:08   ` Matthew Auld
  2022-04-11 20:05 ` Thomas Zimmermann
  2 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2022-04-11 12:39 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Thomas Hellström, Lucas De Marchi, dri-devel, Nirmoy Das

Am 11.04.22 um 10:56 schrieb Matthew Auld:
> If we hit the sync case, like when skipping clearing for kernel internal
> objects, or when falling back to cpu clearing, like in i915, we end up
> trying to add a NULL fence, but with some recent changes in this area
> this now just results in NULL deref in dma_resv_add_fence:
>
> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008
> <1>[    5.466384] #PF: supervisor read access in kernel mode
> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
> <6>[    5.466386] PGD 0 P4D 0
> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1
> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82
> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff
> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087
> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001
> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000
> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600
> <4>[    5.466401] FS:  00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000
> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0
> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> <4>[    5.466404] Call Trace:
> <4>[    5.466405]  <TASK>
> <4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm]
> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
> <4>[    5.467094]  ____i915_gem_object_get_pages+0x32/0xf0 [i915]
> <4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
> <4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
> <4>[    5.467704]  intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915]
> <4>[    5.467833]  intel_crtc_initial_plane_config+0x83/0x340 [i915]
>
> In the ttm_bo_move_sync_cleanup() case it seems we only really care
> about calling ttm_bo_wait_free_node(), so let's instead just call that
> directly.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>

Ideally we wouldn't export that to drivers, but that's a different problem.

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>   2 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index bc5190340b9c..1cbfb00c1d65 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>   }
>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>   
> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> +			      struct ttm_resource *new_mem)
> +{
> +	struct ttm_device *bdev = bo->bdev;
> +	struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type);
> +	int ret;
> +
> +	ret = ttm_bo_wait_free_node(bo, man->use_tt);
> +	if (WARN_ON(ret))
> +		return;
> +
> +	ttm_bo_assign_mem(bo, new_mem);
> +}
> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
> +
>   /**
>    * ttm_bo_pipeline_gutting - purge the contents of a bo
>    * @bo: The buffer object
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 059a595e14e5..897b88f0bd59 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>   			      struct ttm_resource *new_mem);
>   
>   /**
> - * ttm_bo_move_accel_cleanup.
> + * ttm_bo_move_sync_cleanup.
>    *
>    * @bo: A pointer to a struct ttm_buffer_object.
>    * @new_mem: struct ttm_resource indicating where to move.
> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>    * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed
>    * by the caller to be idle. Typically used after memcpy buffer moves.
>    */
> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> -					    struct ttm_resource *new_mem)
> -{
> -	int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem);
> -
> -	WARN_ON(ret);
> -}
> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> +			      struct ttm_resource *new_mem);
>   
>   /**
>    * ttm_bo_pipeline_gutting.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11 10:45     ` Thomas Hellström
@ 2022-04-11 12:44       ` Christian König
  0 siblings, 0 replies; 8+ messages in thread
From: Christian König @ 2022-04-11 12:44 UTC (permalink / raw)
  To: Thomas Hellström, Matthew Auld, intel-gfx
  Cc: Nirmoy Das, Lucas De Marchi, dri-devel

Am 11.04.22 um 12:45 schrieb Thomas Hellström:
>
> On 4/11/22 12:06, Matthew Auld wrote:
>> On 11/04/2022 10:56, Thomas Hellström wrote:
>>> Hi, Matthew
>>>
>>> On 4/11/22 10:56, Matthew Auld wrote:
>>>> If we hit the sync case, like when skipping clearing for kernel 
>>>> internal
>>>> objects, or when falling back to cpu clearing, like in i915, we end up
>>>> trying to add a NULL fence, but with some recent changes in this area
>>>> this now just results in NULL deref in dma_resv_add_fence:
>>>>
>>>> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 
>>>> 0000000000000008
>>>> <1>[    5.466384] #PF: supervisor read access in kernel mode
>>>> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
>>>> <6>[    5.466386] PGD 0 P4D 0
>>>> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 
>>>> 5.18.0-rc2-CI-CI_DRM_11481+ #1
>>>> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
>>>> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 
>>>> 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 
>>>> 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 
>>>> 48 3d 60 a5 0a 82
>>>> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
>>>> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 
>>>> 00000000ffffffff
>>>> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: 
>>>> ffffffff8233f087
>>>> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 
>>>> 0000000000000001
>>>> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 
>>>> 0000000000000000
>>>> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: 
>>>> ffff88810a88d600
>>>> <4>[    5.466401] FS:  00007f5fa1193540(0000) 
>>>> GS:ffff88845d880000(0000) knlGS:0000000000000000
>>>> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 
>>>> 00000000003706e0
>>>> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>>>> 0000000000000000
>>>> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>>>> 0000000000000400
>>>> <4>[    5.466404] Call Trace:
>>>> <4>[    5.466405]  <TASK>
>>>> <4>[    5.466406] ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
>>>> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
>>>> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
>>>> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
>>>> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 
>>>> [ttm]
>>>> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
>>>> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
>>>> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
>>>> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
>>>> <4>[    5.467094] ____i915_gem_object_get_pages+0x32/0xf0 [i915]
>>>> <4>[    5.467210] __i915_gem_object_get_pages+0x89/0xa0 [i915]
>>>> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
>>>> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
>>>> <4>[    5.467570] i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
>>>> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
>>>> <4>[    5.467704] intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 
>>>> [i915]
>>>> <4>[    5.467833] intel_crtc_initial_plane_config+0x83/0x340 [i915]
>>>>
>>>> In the ttm_bo_move_sync_cleanup() case it seems we only really care
>>>> about calling ttm_bo_wait_free_node(), so let's instead just call that
>>>> directly.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>>>>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>>>>   2 files changed, 18 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> index bc5190340b9c..1cbfb00c1d65 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct 
>>>> ttm_buffer_object *bo,
>>>>   }
>>>>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>>>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
>>>> +                  struct ttm_resource *new_mem)
>>>> +{
>>>> +    struct ttm_device *bdev = bo->bdev;
>>>> +    struct ttm_resource_manager *man = ttm_manager_type(bdev, 
>>>> new_mem->mem_type);
>>>> +    int ret;
>>>> +
>>>> +    ret = ttm_bo_wait_free_node(bo, man->use_tt);
>>>> +    if (WARN_ON(ret))
>>>> +        return;
>>>> +
>>>> +    ttm_bo_assign_mem(bo, new_mem);
>>>> +}
>>>> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
>>>
>>> I don't think this will help in the case where we call 
>>> ttm_bo_move_accel_cleanup() with a NULL fence....
>>
>> Hmm, do you know if that case actually exists? I thought the only 
>> current user passing fence == NULL was ttm_bo_move_sync_cleanup().
>
>
> Yes, indeed, you're right. Not sure if any other driver hits that, 
> though.

At least when I came up with that ~halve a year ago I double checked all 
drivers to not call ttm_bo_move_accel_cleanup() without a fence.

But I also somehow missed ttm_bo_move_sync_cleanup(), so take that with 
a grain of salt.

Christian.

>
> /Thomas
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11  8:56 [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup Matthew Auld
  2022-04-11  9:56 ` Thomas Hellström
  2022-04-11 12:39 ` Christian König
@ 2022-04-11 20:05 ` Thomas Zimmermann
  2 siblings, 0 replies; 8+ messages in thread
From: Thomas Zimmermann @ 2022-04-11 20:05 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Thomas Hellström, Lucas De Marchi, Christian König,
	dri-devel, Nirmoy Das


[-- Attachment #1.1: Type: text/plain, Size: 6060 bytes --]



Am 11.04.22 um 10:56 schrieb Matthew Auld:
> If we hit the sync case, like when skipping clearing for kernel internal
> objects, or when falling back to cpu clearing, like in i915, we end up
> trying to add a NULL fence, but with some recent changes in this area
> this now just results in NULL deref in dma_resv_add_fence:
> 
> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 0000000000000008
> <1>[    5.466384] #PF: supervisor read access in kernel mode
> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
> <6>[    5.466386] PGD 0 P4D 0
> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 5.18.0-rc2-CI-CI_DRM_11481+ #1
> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 a5 0a 82
> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 00000000ffffffff
> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: ffffffff8233f087
> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 0000000000000001
> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 0000000000000000
> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: ffff88810a88d600
> <4>[    5.466401] FS:  00007f5fa1193540(0000) GS:ffff88845d880000(0000) knlGS:0000000000000000
> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 00000000003706e0
> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> <4>[    5.466404] Call Trace:
> <4>[    5.466405]  <TASK>
> <4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm]
> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
> <4>[    5.467094]  ____i915_gem_object_get_pages+0x32/0xf0 [i915]
> <4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
> <4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
> <4>[    5.467704]  intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 [i915]
> <4>[    5.467833]  intel_crtc_initial_plane_config+0x83/0x340 [i915]
> 
> In the ttm_bo_move_sync_cleanup() case it seems we only really care
> about calling ttm_bo_wait_free_node(), so let's instead just call that
> directly.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>

with bochs on ppc64le.

> ---
>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>   2 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index bc5190340b9c..1cbfb00c1d65 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>   }
>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>   
> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> +			      struct ttm_resource *new_mem)
> +{
> +	struct ttm_device *bdev = bo->bdev;
> +	struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type);
> +	int ret;
> +
> +	ret = ttm_bo_wait_free_node(bo, man->use_tt);
> +	if (WARN_ON(ret))
> +		return;
> +
> +	ttm_bo_assign_mem(bo, new_mem);
> +}
> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
> +
>   /**
>    * ttm_bo_pipeline_gutting - purge the contents of a bo
>    * @bo: The buffer object
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 059a595e14e5..897b88f0bd59 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>   			      struct ttm_resource *new_mem);
>   
>   /**
> - * ttm_bo_move_accel_cleanup.
> + * ttm_bo_move_sync_cleanup.
>    *
>    * @bo: A pointer to a struct ttm_buffer_object.
>    * @new_mem: struct ttm_resource indicating where to move.
> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
>    * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed
>    * by the caller to be idle. Typically used after memcpy buffer moves.
>    */
> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> -					    struct ttm_resource *new_mem)
> -{
> -	int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem);
> -
> -	WARN_ON(ret);
> -}
> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
> +			      struct ttm_resource *new_mem);
>   
>   /**
>    * ttm_bo_pipeline_gutting.

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup
  2022-04-11 12:39 ` Christian König
@ 2022-04-12 12:08   ` Matthew Auld
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Auld @ 2022-04-12 12:08 UTC (permalink / raw)
  To: Christian König, intel-gfx
  Cc: Thomas Hellström, Lucas De Marchi, dri-devel, Nirmoy Das

On 11/04/2022 13:39, Christian König wrote:
> Am 11.04.22 um 10:56 schrieb Matthew Auld:
>> If we hit the sync case, like when skipping clearing for kernel internal
>> objects, or when falling back to cpu clearing, like in i915, we end up
>> trying to add a NULL fence, but with some recent changes in this area
>> this now just results in NULL deref in dma_resv_add_fence:
>>
>> <1>[    5.466383] BUG: kernel NULL pointer dereference, address: 
>> 0000000000000008
>> <1>[    5.466384] #PF: supervisor read access in kernel mode
>> <1>[    5.466385] #PF: error_code(0x0000) - not-present page
>> <6>[    5.466386] PGD 0 P4D 0
>> <4>[    5.466387] Oops: 0000 [#1] PREEMPT SMP NOPTI
>> <4>[    5.466389] CPU: 5 PID: 267 Comm: modprobe Not tainted 
>> 5.18.0-rc2-CI-CI_DRM_11481+ #1
>> <4>[    5.466391] RIP: 0010:dma_resv_add_fence+0x63/0x260
>> <4>[    5.466395] Code: 38 85 c0 0f 84 df 01 00 00 0f 88 e8 01 00 00 
>> 83 c0 01 0f 88 df 01 00 00 8b 05 35 89 10 01 49 8d 5e 68 85 c0 0f 85 
>> 45 01 00 00 <48> 8b 45 08 48 3d c0 a5 0a 82 0f 84 5c 01 00 00 48 3d 60 
>> a5 0a 82
>> <4>[    5.466396] RSP: 0018:ffffc90000e974f8 EFLAGS: 00010202
>> <4>[    5.466397] RAX: 0000000000000001 RBX: ffff888123e88b28 RCX: 
>> 00000000ffffffff
>> <4>[    5.466398] RDX: 0000000000000001 RSI: ffffffff822e4f50 RDI: 
>> ffffffff8233f087
>> <4>[    5.466399] RBP: 0000000000000000 R08: ffff8881313dbc80 R09: 
>> 0000000000000001
>> <4>[    5.466399] R10: 0000000000000001 R11: 00000000da354294 R12: 
>> 0000000000000000
>> <4>[    5.466400] R13: ffff88810927dc58 R14: ffff888123e88ac0 R15: 
>> ffff88810a88d600
>> <4>[    5.466401] FS:  00007f5fa1193540(0000) 
>> GS:ffff88845d880000(0000) knlGS:0000000000000000
>> <4>[    5.466402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> <4>[    5.466402] CR2: 0000000000000008 CR3: 0000000106dd6003 CR4: 
>> 00000000003706e0
>> <4>[    5.466403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> <4>[    5.466404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> <4>[    5.466404] Call Trace:
>> <4>[    5.466405]  <TASK>
>> <4>[    5.466406]  ttm_bo_move_accel_cleanup+0x62/0x270 [ttm]
>> <4>[    5.466411]  ? i915_rsgt_from_buddy_resource+0x185/0x1e0 [i915]
>> <4>[    5.466529]  i915_ttm_move+0xfd/0x430 [i915]
>> <4>[    5.466833]  ? dma_resv_reserve_fences+0x4e/0x320
>> <4>[    5.466836]  ? ttm_bo_add_move_fence.constprop.20+0xf7/0x140 [ttm]
>> <4>[    5.466841]  ttm_bo_handle_move_mem+0xa1/0x140 [ttm]
>> <4>[    5.466845]  ttm_bo_validate+0xee/0x160 [ttm]
>> <4>[    5.466849]  __i915_ttm_get_pages+0x4f/0x210 [i915]
>> <4>[    5.466976]  i915_ttm_get_pages+0xad/0x140 [i915]
>> <4>[    5.467094]  ____i915_gem_object_get_pages+0x32/0xf0 [i915]
>> <4>[    5.467210]  __i915_gem_object_get_pages+0x89/0xa0 [i915]
>> <4>[    5.467323]  i915_vma_get_pages+0x114/0x1d0 [i915]
>> <4>[    5.467446]  i915_vma_pin_ww+0xd3/0xa90 [i915]
>> <4>[    5.467570]  i915_vma_pin.constprop.10+0x119/0x1b0 [i915]
>> <4>[    5.467700]  ? __mutex_unlock_slowpath+0x3e/0x2b0
>> <4>[    5.467704]  intel_alloc_initial_plane_obj.isra.6+0x1a9/0x390 
>> [i915]
>> <4>[    5.467833]  intel_crtc_initial_plane_config+0x83/0x340 [i915]
>>
>> In the ttm_bo_move_sync_cleanup() case it seems we only really care
>> about calling ttm_bo_wait_free_node(), so let's instead just call that
>> directly.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
> 
> Ideally we wouldn't export that to drivers, but that's a different problem.
> 
> Reviewed-by: Christian König <christian.koenig@amd.com>

Thanks. Would you be able to merge this?

> 
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 15 +++++++++++++++
>>   include/drm/ttm/ttm_bo_driver.h   | 11 +++--------
>>   2 files changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index bc5190340b9c..1cbfb00c1d65 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -572,6 +572,21 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>   }
>>   EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
>> +                  struct ttm_resource *new_mem)
>> +{
>> +    struct ttm_device *bdev = bo->bdev;
>> +    struct ttm_resource_manager *man = ttm_manager_type(bdev, 
>> new_mem->mem_type);
>> +    int ret;
>> +
>> +    ret = ttm_bo_wait_free_node(bo, man->use_tt);
>> +    if (WARN_ON(ret))
>> +        return;
>> +
>> +    ttm_bo_assign_mem(bo, new_mem);
>> +}
>> +EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
>> +
>>   /**
>>    * ttm_bo_pipeline_gutting - purge the contents of a bo
>>    * @bo: The buffer object
>> diff --git a/include/drm/ttm/ttm_bo_driver.h 
>> b/include/drm/ttm/ttm_bo_driver.h
>> index 059a595e14e5..897b88f0bd59 100644
>> --- a/include/drm/ttm/ttm_bo_driver.h
>> +++ b/include/drm/ttm/ttm_bo_driver.h
>> @@ -245,7 +245,7 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>                     struct ttm_resource *new_mem);
>>   /**
>> - * ttm_bo_move_accel_cleanup.
>> + * ttm_bo_move_sync_cleanup.
>>    *
>>    * @bo: A pointer to a struct ttm_buffer_object.
>>    * @new_mem: struct ttm_resource indicating where to move.
>> @@ -253,13 +253,8 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>    * Special case of ttm_bo_move_accel_cleanup where the bo is guaranteed
>>    * by the caller to be idle. Typically used after memcpy buffer moves.
>>    */
>> -static inline void ttm_bo_move_sync_cleanup(struct ttm_buffer_object 
>> *bo,
>> -                        struct ttm_resource *new_mem)
>> -{
>> -    int ret = ttm_bo_move_accel_cleanup(bo, NULL, true, false, new_mem);
>> -
>> -    WARN_ON(ret);
>> -}
>> +void ttm_bo_move_sync_cleanup(struct ttm_buffer_object *bo,
>> +                  struct ttm_resource *new_mem);
>>   /**
>>    * ttm_bo_pipeline_gutting.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-04-12 12:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-11  8:56 [PATCH] drm/ttm: stop passing NULL fence in ttm_bo_move_sync_cleanup Matthew Auld
2022-04-11  9:56 ` Thomas Hellström
2022-04-11 10:06   ` Matthew Auld
2022-04-11 10:45     ` Thomas Hellström
2022-04-11 12:44       ` Christian König
2022-04-11 12:39 ` Christian König
2022-04-12 12:08   ` Matthew Auld
2022-04-11 20:05 ` Thomas Zimmermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).