From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH 4/5] drm/xe: Prevent evicting for page tables
Date: Fri, 26 May 2023 14:35:12 +0200
Message-ID: <b6be6ecf-5737-0f2a-2e7b-7737ef52af50@linux.intel.com>
In-Reply-To: <20230526121101.1619278-5-maarten.lankhorst@linux.intel.com>


On 5/26/23 14:11, Maarten Lankhorst wrote:
> When creating page tables from xe_exec_ioctl, we may end up freeing
> memory we just validated. To be certain this does not happen, do not
> allow the current reservation to be evicted from the ioctl.
>
> Callchain:
> [  109.008522]  xe_bo_move_notify+0x5c/0xf0 [xe]
> [  109.008548]  xe_bo_move+0x90/0x510 [xe]
> [  109.008573]  ttm_bo_handle_move_mem+0xb7/0x170 [ttm]
> [  109.008581]  ttm_bo_swapout+0x15e/0x360 [ttm]
> [  109.008586]  ttm_device_swapout+0xc2/0x110 [ttm]
> [  109.008592]  ttm_global_swapout+0x47/0xc0 [ttm]
> [  109.008598]  ttm_tt_populate+0x7a/0x130 [ttm]
> [  109.008603]  ttm_bo_handle_move_mem+0x160/0x170 [ttm]
> [  109.008609]  ttm_bo_validate+0xe5/0x1d0 [ttm]
> [  109.008614]  ttm_bo_init_reserved+0xac/0x190 [ttm]
> [  109.008620]  __xe_bo_create_locked+0x153/0x260 [xe]
> [  109.008645]  xe_bo_create_locked_range+0x77/0x360 [xe]
> [  109.008671]  xe_bo_create_pin_map_at+0x33/0x1f0 [xe]
> [  109.008695]  xe_bo_create_pin_map+0x11/0x20 [xe]
> [  109.008721]  xe_pt_create+0x69/0xf0 [xe]
> [  109.008749]  xe_pt_stage_bind_entry+0x208/0x430 [xe]
> [  109.008776]  xe_pt_walk_range+0xe9/0x2a0 [xe]
> [  109.008802]  xe_pt_walk_range+0x223/0x2a0 [xe]
> [  109.008828]  xe_pt_walk_range+0x223/0x2a0 [xe]
> [  109.008853]  __xe_pt_bind_vma+0x28d/0xbd0 [xe]
> [  109.008878]  xe_vm_bind_vma+0xc7/0x2f0 [xe]
> [  109.008904]  xe_vm_rebind+0x72/0x160 [xe]
> [  109.008930]  xe_exec_ioctl+0x22b/0xa70 [xe]
> [  109.008955]  drm_ioctl_kernel+0xb9/0x150 [drm]
> [  109.008972]  drm_ioctl+0x210/0x430 [drm]
> [  109.008988]  __x64_sys_ioctl+0x85/0xb0
> [  109.008990]  do_syscall_64+0x38/0x90
> [  109.008991]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Original warning:
> [ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe]
> ...
> [ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe]
> [ 5613.316098] Call Trace:
> [ 5613.318595]  <TASK>
> [ 5613.320743]  xe_exec_ioctl+0x383/0x8a0 [xe]
> [ 5613.325278]  ? __is_insn_slot_addr+0x8e/0x110
> [ 5613.329719]  ? __is_insn_slot_addr+0x8e/0x110
> [ 5613.334116]  ? kernel_text_address+0x75/0xf0
> [ 5613.338429]  ? __pfx_stack_trace_consume_entry+0x10/0x10
> [ 5613.343778]  ? __kernel_text_address+0x9/0x40
> [ 5613.348181]  ? unwind_get_return_address+0x1a/0x30
> [ 5613.353013]  ? __pfx_stack_trace_consume_entry+0x10/0x10
> [ 5613.358362]  ? arch_stack_walk+0x99/0xf0
> [ 5613.362329]  ? rcu_read_lock_sched_held+0xb/0x70
> [ 5613.366996]  ? lock_acquire+0x287/0x2f0
> [ 5613.370873]  ? rcu_read_lock_sched_held+0xb/0x70
> [ 5613.375530]  ? rcu_read_lock_sched_held+0xb/0x70
> [ 5613.380181]  ? lock_release+0x225/0x2e0
> [ 5613.384059]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> [ 5613.389092]  drm_ioctl_kernel+0xc0/0x170
> [ 5613.393068]  drm_ioctl+0x1b7/0x490
> [ 5613.396519]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
> [ 5613.401547]  ? lock_release+0x225/0x2e0
> [ 5613.405432]  __x64_sys_ioctl+0x8a/0xb0
> [ 5613.409232]  do_syscall_64+0x37/0x90
>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/239

Did you look at passing around the ttm_operation_ctx, or an
"allow_res_evict" bool?
In any case, it would be good to have this fixed ASAP, so

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>   drivers/gpu/drm/xe/xe_bo.c | 2 +-
>   drivers/gpu/drm/xe/xe_bo.h | 7 ++++---
>   drivers/gpu/drm/xe/xe_pt.c | 3 ++-
>   3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 0db9c05097d0..8735facb1cf9 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1130,7 +1130,7 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>   	drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
>   
>   	if (resv) {
> -		ctx.allow_res_evict = true;
> +		ctx.allow_res_evict = !(flags & XE_BO_CREATE_NO_RESV_EVICT);
>   		ctx.resv = resv;
>   	}
>   
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index a1c51cc0ac3c..dd27d8c7f3b0 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -27,9 +27,10 @@
>   #define XE_BO_CREATE_GGTT_BIT		BIT(5)
>   #define XE_BO_CREATE_IGNORE_MIN_PAGE_SIZE_BIT BIT(6)
>   #define XE_BO_CREATE_PINNED_BIT		BIT(7)
> -#define XE_BO_DEFER_BACKING		BIT(8)
> -#define XE_BO_SCANOUT_BIT		BIT(9)
> -#define XE_BO_FIXED_PLACEMENT_BIT	BIT(10)
> +#define XE_BO_CREATE_NO_RESV_EVICT	BIT(8)
> +#define XE_BO_DEFER_BACKING		BIT(9)
> +#define XE_BO_SCANOUT_BIT		BIT(10)
> +#define XE_BO_FIXED_PLACEMENT_BIT	BIT(11)
>   /* this one is trigger internally only */
>   #define XE_BO_INTERNAL_TEST		BIT(30)
>   #define XE_BO_INTERNAL_64K		BIT(31)
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index f15282996c3b..30de6e902a8e 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -219,7 +219,8 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_gt *gt,
>   				  ttm_bo_type_kernel,
>   				  XE_BO_CREATE_VRAM_IF_DGFX(gt) |
>   				  XE_BO_CREATE_IGNORE_MIN_PAGE_SIZE_BIT |
> -				  XE_BO_CREATE_PINNED_BIT);
> +				  XE_BO_CREATE_PINNED_BIT |
> +				  XE_BO_CREATE_NO_RESV_EVICT);
>   	if (IS_ERR(bo)) {
>   		err = PTR_ERR(bo);
>   		goto err_kfree;
