From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH 4/5] drm/xe: Prevent evicting for page tables
Date: Mon, 29 May 2023 17:23:29 +0200
Message-ID: <a8e9c44a-84b3-36f6-ba89-d05962bc3491@linux.intel.com>
In-Reply-To: <0d897527-d80b-5fd9-f338-2a5d83d6c682@linux.intel.com>


On 2023-05-29 17:13, Thomas Hellström wrote:
>
> On 5/29/23 17:11, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 2023-05-29 17:02, Thomas Hellström wrote:
>>> On 5/29/23 15:44, Maarten Lankhorst wrote:
>>>> On 2023-05-26 14:35, Thomas Hellström wrote:
>>>>> On 5/26/23 14:11, Maarten Lankhorst wrote:
>>>>>> When creating page tables from xe_exec_ioctl, we may end up freeing
>>>>>> memory we just validated. To be certain this does not happen, do not
>>>>>> allow the current reservation to be evicted from the ioctl.
>>>>>>
>>>>>> Callchain:
>>>>>> [  109.008522]  xe_bo_move_notify+0x5c/0xf0 [xe]
>>>>>> [  109.008548]  xe_bo_move+0x90/0x510 [xe]
>>>>>> [  109.008573]  ttm_bo_handle_move_mem+0xb7/0x170 [ttm]
>>>>>> [  109.008581]  ttm_bo_swapout+0x15e/0x360 [ttm]
>>>>>> [  109.008586]  ttm_device_swapout+0xc2/0x110 [ttm]
>>>>>> [  109.008592]  ttm_global_swapout+0x47/0xc0 [ttm]
>>>>>> [  109.008598]  ttm_tt_populate+0x7a/0x130 [ttm]
>>>>>> [  109.008603]  ttm_bo_handle_move_mem+0x160/0x170 [ttm]
>>>>>> [  109.008609]  ttm_bo_validate+0xe5/0x1d0 [ttm]
>>>>>> [  109.008614]  ttm_bo_init_reserved+0xac/0x190 [ttm]
>>>>>> [  109.008620]  __xe_bo_create_locked+0x153/0x260 [xe]
>>>>>> [  109.008645]  xe_bo_create_locked_range+0x77/0x360 [xe]
>>>>>> [  109.008671]  xe_bo_create_pin_map_at+0x33/0x1f0 [xe]
>>>>>> [  109.008695]  xe_bo_create_pin_map+0x11/0x20 [xe]
>>>>>> [  109.008721]  xe_pt_create+0x69/0xf0 [xe]
>>>>>> [  109.008749]  xe_pt_stage_bind_entry+0x208/0x430 [xe]
>>>>>> [  109.008776]  xe_pt_walk_range+0xe9/0x2a0 [xe]
>>>>>> [  109.008802]  xe_pt_walk_range+0x223/0x2a0 [xe]
>>>>>> [  109.008828]  xe_pt_walk_range+0x223/0x2a0 [xe]
>>>>>> [  109.008853]  __xe_pt_bind_vma+0x28d/0xbd0 [xe]
>>>>>> [  109.008878]  xe_vm_bind_vma+0xc7/0x2f0 [xe]
>>>>>> [  109.008904]  xe_vm_rebind+0x72/0x160 [xe]
>>>>>> [  109.008930]  xe_exec_ioctl+0x22b/0xa70 [xe]
>>>>>> [  109.008955]  drm_ioctl_kernel+0xb9/0x150 [drm]
>>>>>> [  109.008972]  drm_ioctl+0x210/0x430 [drm]
>>>>>> [  109.008988]  __x64_sys_ioctl+0x85/0xb0
>>>>>> [  109.008990]  do_syscall_64+0x38/0x90
>>>>>> [  109.008991]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>>>
>>>>>> Original warning:
>>>>>> [ 5613.149126] WARNING: CPU: 3 PID: 45883 at drivers/gpu/drm/xe/xe_vm.c:504 xe_vm_unlock_dma_resv+0x43/0x50 [xe]
>>>>>> ...
>>>>>> [ 5613.226398] RIP: 0010:xe_vm_unlock_dma_resv+0x43/0x50 [xe]
>>>>>> [ 5613.316098] Call Trace:
>>>>>> [ 5613.318595]  <TASK>
>>>>>> [ 5613.320743]  xe_exec_ioctl+0x383/0x8a0 [xe]
>>>>>> [ 5613.325278]  ? __is_insn_slot_addr+0x8e/0x110
>>>>>> [ 5613.329719]  ? __is_insn_slot_addr+0x8e/0x110
>>>>>> [ 5613.334116]  ? kernel_text_address+0x75/0xf0
>>>>>> [ 5613.338429]  ? __pfx_stack_trace_consume_entry+0x10/0x10
>>>>>> [ 5613.343778]  ? __kernel_text_address+0x9/0x40
>>>>>> [ 5613.348181]  ? unwind_get_return_address+0x1a/0x30
>>>>>> [ 5613.353013]  ? __pfx_stack_trace_consume_entry+0x10/0x10
>>>>>> [ 5613.358362]  ? arch_stack_walk+0x99/0xf0
>>>>>> [ 5613.362329]  ? rcu_read_lock_sched_held+0xb/0x70
>>>>>> [ 5613.366996]  ? lock_acquire+0x287/0x2f0
>>>>>> [ 5613.370873]  ? rcu_read_lock_sched_held+0xb/0x70
>>>>>> [ 5613.375530]  ? rcu_read_lock_sched_held+0xb/0x70
>>>>>> [ 5613.380181]  ? lock_release+0x225/0x2e0
>>>>>> [ 5613.384059]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
>>>>>> [ 5613.389092]  drm_ioctl_kernel+0xc0/0x170
>>>>>> [ 5613.393068]  drm_ioctl+0x1b7/0x490
>>>>>> [ 5613.396519]  ? __pfx_xe_exec_ioctl+0x10/0x10 [xe]
>>>>>> [ 5613.401547]  ? lock_release+0x225/0x2e0
>>>>>> [ 5613.405432]  __x64_sys_ioctl+0x8a/0xb0
>>>>>> [ 5613.409232]  do_syscall_64+0x37/0x90
>>>>>>
>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/239
>>>>> Did you look at passing around the ttm_operation_ctx, or an "allow_res_evict" bool?
>>>>> In any case would be good to have this fixed asap, so
>>>>>
>>>>> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>>
>>>> I considered it, but the original callchain was too long. I don't think there is any use case
>>>> in which we want to evict from the current context to make room for new pagetables for VM_BIND.
>>>> Anything locked is most likely in use, so making room by evicting from the current VM (or its
>>>> bound extobjs) will likely lead to ENOSPC anyway.
>>> Well, I think the use-case where this will cause problems is if we're doing a single VM_BIND on a brand new VRAM BO, and need to evict other VRAM BOs from the same VM to make room.
>>>
>>> This will then of course ENOSPC on the next exec, but if we were to introduce a two-pass validation scheme, where we explicitly move suitable BOs with multiple placement options to TT on the first ENOSPC, we could avoid that...
>> Allowing same-reservation eviction will allow you to evict the BO its VM_BIND page table, leaving no entries to write. :-)
>
> Aren't those pinned?
>
> /Thomas

Not the BO itself.

~Maarten
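
To make the point about same-reservation eviction concrete, here is a minimal userspace sketch of the decision being discussed: whether a buffer object may be chosen as an eviction victim when it shares the reservation the ioctl already holds. The toy_* types and toy_evict_allowable() are hypothetical stand-ins invented for illustration, not the TTM or xe code; only the allow_res_evict flag and the idea of a per-context reservation are taken from the thread. It assumes that pinned BOs are never candidates and models the trylock on a foreign reservation as a simple "not held" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real TTM or xe types. */
struct toy_resv {
	bool held;               /* is the reservation currently locked? */
};

struct toy_bo {
	struct toy_resv *resv;   /* reservation object the BO belongs to */
	unsigned int pin_count;  /* assumption: pinned BOs are never evicted */
};

struct toy_ctx {
	struct toy_resv *resv;   /* reservation already held by the caller */
	bool allow_res_evict;    /* may BOs sharing ctx->resv be evicted? */
};

/* Returns true if bo may be picked as an eviction victim under ctx. */
static bool toy_evict_allowable(const struct toy_bo *bo,
				const struct toy_ctx *ctx)
{
	if (bo->pin_count)
		return false;

	if (bo->resv == ctx->resv) {
		/* Same reservation: the caller must already hold it. */
		assert(ctx->resv->held);
		return ctx->allow_res_evict;
	}

	/* Different reservation: modelled as a trylock that succeeds
	 * only if nobody else holds it. */
	return !bo->resv->held;
}
```

With allow_res_evict cleared, an unpinned BO that shares the VM's reservation is refused as a victim, while an unpinned BO under a different, unlocked reservation is still eligible. Setting the flag makes the first case eligible as well, and a pinned BO is refused either way, which is why pinning the page tables alone does not protect an unpinned BO on the same reservation.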

