* [PATCH v3 00/22] Refactor VM bind code
@ 2024-02-06 23:37 Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 01/22] drm/xe: Lock all gpuva ops during VM bind IOCTL Matthew Brost
                   ` (25 more replies)
  0 siblings, 26 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Implement proper error handling for the VM bind IOCTL by allowing failures
of GPU memory allocation (either system or VRAM) to be propagated to the
user without corrupting VM state. This is mainly implemented by converting
the VM bind IOCTL to one job per IOCTL rather than potentially many jobs.

Broken into roughly 4 parts:
Part 1: Prep patches, patches 1-11
Part 2: 1 job per VM bind IOCTL and error handling, patches 12-17
Part 3: CPU binds, patches 18-21
Part 4: Error injection for testing, patch 22

For review, let's focus on part 1 for now and see if patches from that
part can start to be merged.

Tested with [1], and the new error handling appears to be working. Also
ran the existing tests at every patch in the series; the series should be
functional at each patch.

Matt

[1] https://patchwork.freedesktop.org/series/129606/

Matthew Brost (22):
  drm/xe: Lock all gpuva ops during VM bind IOCTL
  drm/xe: Add ops_execute function which returns a fence
  drm/xe: Move migrate to prefetch to op_lock function
  drm/xe: Add struct xe_vma_ops abstraction
  drm/xe: Update xe_vm_rebind to use dummy VMA operations
  drm/xe: Simplify VM bind IOCTL error handling and cleanup
  drm/xe: Update pagefaults to use dummy VMA operations
  drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue
  drm/xe: Add vm_bind_ioctl_ops_install_fences helper
  drm/xe: Move setting last fence to vm_bind_ioctl_ops_install_fences
  drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use
    this
  drm/xe: Add some members to xe_vma_ops
  drm/xe: Add xe_vm_pgtable_update_op to xe_vma_ops
  drm/xe: Convert multiple bind ops into single job
  drm/xe: Remove old function defs in xe_pt.h
  drm/xe: Update PT layer with better error handling
  drm/xe: Update VM trace events
  drm/xe: Update clear / populate arguments
  drm/xe: Add __xe_migrate_update_pgtables_cpu helper
  drm/xe: CPU binds for jobs
  drm/xe: Don't use migrate exec queue for page fault binds
  drm/xe: Add VM bind IOCTL error injection

 drivers/gpu/drm/xe/xe_bo.c                  |    7 +-
 drivers/gpu/drm/xe/xe_bo.h                  |    4 +-
 drivers/gpu/drm/xe/xe_device.c              |   35 +
 drivers/gpu/drm/xe/xe_device.h              |    2 +
 drivers/gpu/drm/xe/xe_device_types.h        |   16 +
 drivers/gpu/drm/xe/xe_exec.c                |   25 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |   10 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |   60 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |    3 +
 drivers/gpu/drm/xe/xe_guc_submit.c          |   47 +-
 drivers/gpu/drm/xe/xe_migrate.c             |  387 ++----
 drivers/gpu/drm/xe/xe_migrate.h             |   46 +-
 drivers/gpu/drm/xe/xe_pt.c                  | 1223 ++++++++++++-------
 drivers/gpu/drm/xe/xe_pt.h                  |   15 +-
 drivers/gpu/drm/xe/xe_pt_types.h            |   53 +
 drivers/gpu/drm/xe/xe_sched_job.c           |   24 +-
 drivers/gpu/drm/xe/xe_sched_job_types.h     |   31 +-
 drivers/gpu/drm/xe/xe_trace.h               |   10 +-
 drivers/gpu/drm/xe/xe_vm.c                  |  980 +++++++--------
 drivers/gpu/drm/xe/xe_vm.h                  |    7 +
 drivers/gpu/drm/xe/xe_vm_types.h            |  198 +--
 21 files changed, 1773 insertions(+), 1410 deletions(-)

-- 
2.34.1



* [PATCH v3 01/22] drm/xe: Lock all gpuva ops during VM bind IOCTL
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 02/22] drm/xe: Add ops_execute function which returns a fence Matthew Brost
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Lock all gpuva ops and validate all BOs in a single step during the VM
bind IOCTL. This helps with the transition to making all gpuva ops in a
VM bind IOCTL a single atomic job.
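
Roughly, the locking step follows the standard drm_exec retry pattern. A
minimal sketch of the shape (vm_bind_ioctl_ops_lock() is the helper added
below; the ops then execute while everything stays locked):

	struct drm_exec exec;
	int err;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
	drm_exec_until_all_locked(&exec) {
		/* Lock the VM dma-resv plus every BO the ops touch */
		err = vm_bind_ioctl_ops_lock(&exec, vm, ops_list);
		drm_exec_retry_on_contention(&exec);
		if (err)
			break;
		/* ... execute all ops under the held locks ... */
	}
	drm_exec_fini(&exec);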

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 142 ++++++++++++++++++++++++++-----------
 1 file changed, 101 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 9d2e8088d07e..3a7b82ca4b35 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -411,19 +411,23 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
 
 #define XE_VM_REBIND_RETRY_TIMEOUT_MS 1000
 
-static void xe_vm_kill(struct xe_vm *vm)
+static void xe_vm_kill(struct xe_vm *vm, bool unlocked)
 {
 	struct xe_exec_queue *q;
 
 	lockdep_assert_held(&vm->lock);
 
-	xe_vm_lock(vm, false);
+	if (unlocked)
+		xe_vm_lock(vm, false);
+
 	vm->flags |= XE_VM_FLAG_BANNED;
 	trace_xe_vm_kill(vm);
 
 	list_for_each_entry(q, &vm->preempt.exec_queues, compute.link)
 		q->ops->kill(q);
-	xe_vm_unlock(vm);
+
+	if (unlocked)
+		xe_vm_unlock(vm);
 
 	/* TODO: Inform user the VM is banned */
 }
@@ -619,7 +623,7 @@ static void preempt_rebind_work_func(struct work_struct *w)
 
 	if (err) {
 		drm_warn(&vm->xe->drm, "VM worker error: %d\n", err);
-		xe_vm_kill(vm);
+		xe_vm_kill(vm, true);
 	}
 	up_write(&vm->lock);
 
@@ -1773,17 +1777,9 @@ static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue
 		      u32 num_syncs, bool immediate, bool first_op,
 		      bool last_op)
 {
-	int err;
-
 	xe_vm_assert_held(vm);
 	xe_bo_assert_held(bo);
 
-	if (bo && immediate) {
-		err = xe_bo_validate(bo, vm, true);
-		if (err)
-			return err;
-	}
-
 	return __xe_vm_bind(vm, vma, q, syncs, num_syncs, immediate, first_op,
 			    last_op);
 }
@@ -2414,17 +2410,12 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 	return 0;
 }
 
-static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
-		      struct xe_vma *vma, struct xe_vma_op *op)
+static int op_execute(struct xe_vm *vm, struct xe_vma *vma,
+		      struct xe_vma_op *op)
 {
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
-
-	err = xe_vm_prepare_vma(exec, vma, 1);
-	if (err)
-		return err;
-
 	xe_vm_assert_held(vm);
 	xe_bo_assert_held(xe_vma_bo(vma));
 
@@ -2505,19 +2496,10 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
 static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			       struct xe_vma_op *op)
 {
-	struct drm_exec exec;
 	int err;
 
 retry_userptr:
-	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
-	drm_exec_until_all_locked(&exec) {
-		err = op_execute(&exec, vm, vma, op);
-		drm_exec_retry_on_contention(&exec);
-		if (err)
-			break;
-	}
-	drm_exec_fini(&exec);
-
+	err = op_execute(vm, vma, op);
 	if (err == -EAGAIN) {
 		lockdep_assert_held_write(&vm->lock);
 
@@ -2682,29 +2664,107 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 	}
 }
 
+static int vma_lock(struct drm_exec *exec, struct xe_vma *vma, bool validate)
+{
+	struct xe_bo *bo = xe_vma_bo(vma);
+	int err = 0;
+
+	if (bo) {
+		if (!bo->vm)
+			err = drm_exec_prepare_obj(exec, &bo->ttm.base, 1);
+		if (!err && validate)
+			err = xe_bo_validate(bo, xe_vma_vm(vma), true);
+	}
+
+	return err;
+}
+
+static int op_lock(struct drm_exec *exec, struct xe_vm *vm,
+		   struct xe_vma_op *op)
+{
+	int err = 0;
+
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		err = vma_lock(exec, op->map.vma,
+			       op->map.immediate || !xe_vm_in_fault_mode(vm));
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		err = vma_lock(exec, gpuva_to_vma(op->base.remap.unmap->va),
+			       false);
+		if (!err && op->remap.prev)
+			err = vma_lock(exec, op->remap.prev, true);
+		if (!err && op->remap.next)
+			err = vma_lock(exec, op->remap.next, true);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		err = vma_lock(exec, gpuva_to_vma(op->base.unmap.va), false);
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		err = vma_lock(exec, gpuva_to_vma(op->base.prefetch.va), true);
+		break;
+	default:
+		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
+	}
+
+	return err;
+}
+
+static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
+				  struct xe_vm *vm,
+				  struct list_head *ops_list)
+{
+	struct xe_vma_op *op;
+	int err;
+
+	err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), 1);
+	if (err)
+		return err;
+
+	list_for_each_entry(op, ops_list, link) {
+		err = op_lock(exec, vm, op);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 				     struct list_head *ops_list)
 {
+	struct drm_exec exec;
 	struct xe_vma_op *op, *next;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
 
-	list_for_each_entry_safe(op, next, ops_list, link) {
-		err = xe_vma_op_execute(vm, op);
-		if (err) {
-			drm_warn(&vm->xe->drm, "VM op(%d) failed with %d",
-				 op->base.op, err);
-			/*
-			 * FIXME: Killing VM rather than proper error handling
-			 */
-			xe_vm_kill(vm);
-			return -ENOSPC;
+	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+	drm_exec_until_all_locked(&exec) {
+		err = vm_bind_ioctl_ops_lock(&exec, vm, ops_list);
+		drm_exec_retry_on_contention(&exec);
+		if (err)
+			goto unlock;
+
+		list_for_each_entry_safe(op, next, ops_list, link) {
+			err = xe_vma_op_execute(vm, op);
+			if (err) {
+				drm_warn(&vm->xe->drm, "VM op(%d) failed with %d",
+					 op->base.op, err);
+				/*
+				 * FIXME: Killing VM rather than proper error handling
+				 */
+				xe_vm_kill(vm, false);
+				err = -ENOSPC;
+				goto unlock;
+			}
+			xe_vma_op_cleanup(vm, op);
 		}
-		xe_vma_op_cleanup(vm, op);
 	}
 
-	return 0;
+unlock:
+	drm_exec_fini(&exec);
+	return err;
 }
 
 #define SUPPORTED_FLAGS	\
-- 
2.34.1



* [PATCH v3 02/22] drm/xe: Add ops_execute function which returns a fence
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 01/22] drm/xe: Lock all gpuva ops during VM bind IOCTL Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 03/22] drm/xe: Move migrate to prefetch to op_lock function Matthew Brost
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Add an ops_execute function which returns a fence. This will be helpful
for initiating all binds (VM bind IOCTL, rebinds in the exec IOCTL,
rebinds in the preempt rebind worker, and rebinds in pagefaults) via a
gpuva ops list. Returning a fence is needed in various paths.
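
Callers then consume the fence with the usual fence / ERR_PTR convention.
A minimal sketch (assuming the caller drops its reference once the fence
has been installed or waited on):

	struct dma_fence *fence;

	fence = ops_execute(vm, ops_list, true);
	if (IS_ERR(fence))
		return PTR_ERR(fence);
	/* ... install in syncs / last-fence tracking as needed ... */
	dma_fence_put(fence);
	return 0;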

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 212 ++++++++++++++++++++-----------------
 1 file changed, 112 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3a7b82ca4b35..a33c1486ab64 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1738,21 +1738,22 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	return ERR_PTR(err);
 }
 
-static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
-			struct xe_exec_queue *q, struct xe_sync_entry *syncs,
-			u32 num_syncs, bool immediate, bool first_op,
-			bool last_op)
+static struct dma_fence *
+xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
+	   struct xe_bo *bo, struct xe_sync_entry *syncs, u32 num_syncs,
+	   bool immediate, bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
 	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
 
 	xe_vm_assert_held(vm);
+	xe_bo_assert_held(bo);
 
 	if (immediate) {
 		fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, first_op,
 				       last_op);
 		if (IS_ERR(fence))
-			return PTR_ERR(fence);
+			return fence;
 	} else {
 		int i;
 
@@ -1767,26 +1768,14 @@ static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
 
 	if (last_op)
 		xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence);
-	dma_fence_put(fence);
-
-	return 0;
-}
-
-static int xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
-		      struct xe_bo *bo, struct xe_sync_entry *syncs,
-		      u32 num_syncs, bool immediate, bool first_op,
-		      bool last_op)
-{
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(bo);
 
-	return __xe_vm_bind(vm, vma, q, syncs, num_syncs, immediate, first_op,
-			    last_op);
+	return fence;
 }
 
-static int xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
-			struct xe_exec_queue *q, struct xe_sync_entry *syncs,
-			u32 num_syncs, bool first_op, bool last_op)
+static struct dma_fence *
+xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
+	     struct xe_exec_queue *q, struct xe_sync_entry *syncs,
+	     u32 num_syncs, bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
 	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
@@ -1796,14 +1785,13 @@ static int xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 
 	fence = xe_vm_unbind_vma(vma, q, syncs, num_syncs, first_op, last_op);
 	if (IS_ERR(fence))
-		return PTR_ERR(fence);
+		return fence;
 
 	xe_vma_destroy(vma, fence);
 	if (last_op)
 		xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence);
-	dma_fence_put(fence);
 
-	return 0;
+	return fence;
 }
 
 #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE | \
@@ -1946,10 +1934,11 @@ static const u32 region_to_mem_type[] = {
 	XE_PL_VRAM1,
 };
 
-static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
-			  struct xe_exec_queue *q, u32 region,
-			  struct xe_sync_entry *syncs, u32 num_syncs,
-			  bool first_op, bool last_op)
+static struct dma_fence *
+xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
+	       struct xe_exec_queue *q, u32 region,
+	       struct xe_sync_entry *syncs, u32 num_syncs,
+	       bool first_op, bool last_op)
 {
 	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
 	int err;
@@ -1959,27 +1948,24 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 	if (!xe_vma_has_no_bo(vma)) {
 		err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
 		if (err)
-			return err;
+			return ERR_PTR(err);
 	}
 
 	if (vma->tile_mask != (vma->tile_present & ~vma->usm.tile_invalidated)) {
 		return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs,
 				  true, first_op, last_op);
 	} else {
+		struct dma_fence *fence =
+			xe_exec_queue_last_fence_get(wait_exec_queue, vm);
 		int i;
 
 		/* Nothing to do, signal fences now */
 		if (last_op) {
-			for (i = 0; i < num_syncs; i++) {
-				struct dma_fence *fence =
-					xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-
+			for (i = 0; i < num_syncs; i++)
 				xe_sync_entry_signal(&syncs[i], NULL, fence);
-				dma_fence_put(fence);
-			}
 		}
 
-		return 0;
+		return fence;
 	}
 }
 
@@ -2410,10 +2396,10 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 	return 0;
 }
 
-static int op_execute(struct xe_vm *vm, struct xe_vma *vma,
-		      struct xe_vma_op *op)
+static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
+				    struct xe_vma_op *op)
 {
-	int err;
+	struct dma_fence *fence = NULL;
 
 	lockdep_assert_held_write(&vm->lock);
 	xe_vm_assert_held(vm);
@@ -2421,11 +2407,12 @@ static int op_execute(struct xe_vm *vm, struct xe_vma *vma,
 
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
-		err = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
-				 op->syncs, op->num_syncs,
-				 op->map.immediate || !xe_vm_in_fault_mode(vm),
-				 op->flags & XE_VMA_OP_FIRST,
-				 op->flags & XE_VMA_OP_LAST);
+		fence = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
+				   op->syncs, op->num_syncs,
+				   op->map.immediate ||
+				   !xe_vm_in_fault_mode(vm),
+				   op->flags & XE_VMA_OP_FIRST,
+				   op->flags & XE_VMA_OP_LAST);
 		break;
 	case DRM_GPUVA_OP_REMAP:
 	{
@@ -2435,37 +2422,39 @@ static int op_execute(struct xe_vm *vm, struct xe_vma *vma,
 		if (!op->remap.unmap_done) {
 			if (prev || next)
 				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
-			err = xe_vm_unbind(vm, vma, op->q, op->syncs,
-					   op->num_syncs,
-					   op->flags & XE_VMA_OP_FIRST,
-					   op->flags & XE_VMA_OP_LAST &&
-					   !prev && !next);
-			if (err)
+			fence = xe_vm_unbind(vm, vma, op->q, op->syncs,
+					     op->num_syncs,
+					     op->flags & XE_VMA_OP_FIRST,
+					     op->flags & XE_VMA_OP_LAST &&
+					     !prev && !next);
+			if (IS_ERR(fence))
 				break;
 			op->remap.unmap_done = true;
 		}
 
 		if (prev) {
 			op->remap.prev->gpuva.flags |= XE_VMA_LAST_REBIND;
-			err = xe_vm_bind(vm, op->remap.prev, op->q,
-					 xe_vma_bo(op->remap.prev), op->syncs,
-					 op->num_syncs, true, false,
-					 op->flags & XE_VMA_OP_LAST && !next);
+			dma_fence_put(fence);
+			fence = xe_vm_bind(vm, op->remap.prev, op->q,
+					   xe_vma_bo(op->remap.prev), op->syncs,
+					   op->num_syncs, true, false,
+					   op->flags & XE_VMA_OP_LAST && !next);
 			op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND;
-			if (err)
+			if (IS_ERR(fence))
 				break;
 			op->remap.prev = NULL;
 		}
 
 		if (next) {
 			op->remap.next->gpuva.flags |= XE_VMA_LAST_REBIND;
-			err = xe_vm_bind(vm, op->remap.next, op->q,
-					 xe_vma_bo(op->remap.next),
-					 op->syncs, op->num_syncs,
-					 true, false,
-					 op->flags & XE_VMA_OP_LAST);
+			dma_fence_put(fence);
+			fence = xe_vm_bind(vm, op->remap.next, op->q,
+					   xe_vma_bo(op->remap.next),
+					   op->syncs, op->num_syncs,
+					   true, false,
+					   op->flags & XE_VMA_OP_LAST);
 			op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND;
-			if (err)
+			if (IS_ERR(fence))
 				break;
 			op->remap.next = NULL;
 		}
@@ -2473,34 +2462,36 @@ static int op_execute(struct xe_vm *vm, struct xe_vma *vma,
 		break;
 	}
 	case DRM_GPUVA_OP_UNMAP:
-		err = xe_vm_unbind(vm, vma, op->q, op->syncs,
-				   op->num_syncs, op->flags & XE_VMA_OP_FIRST,
-				   op->flags & XE_VMA_OP_LAST);
+		fence = xe_vm_unbind(vm, vma, op->q, op->syncs,
+				     op->num_syncs, op->flags & XE_VMA_OP_FIRST,
+				     op->flags & XE_VMA_OP_LAST);
 		break;
 	case DRM_GPUVA_OP_PREFETCH:
-		err = xe_vm_prefetch(vm, vma, op->q, op->prefetch.region,
-				     op->syncs, op->num_syncs,
-				     op->flags & XE_VMA_OP_FIRST,
-				     op->flags & XE_VMA_OP_LAST);
+		fence = xe_vm_prefetch(vm, vma, op->q, op->prefetch.region,
+				       op->syncs, op->num_syncs,
+				       op->flags & XE_VMA_OP_FIRST,
+				       op->flags & XE_VMA_OP_LAST);
 		break;
 	default:
 		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
 	}
 
-	if (err)
+	if (IS_ERR(fence))
 		trace_xe_vma_fail(vma);
 
-	return err;
+	return fence;
 }
 
-static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
-			       struct xe_vma_op *op)
+static struct dma_fence *
+__xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
+		    struct xe_vma_op *op)
 {
+	struct dma_fence *fence;
 	int err;
 
 retry_userptr:
-	err = op_execute(vm, vma, op);
-	if (err == -EAGAIN) {
+	fence = op_execute(vm, vma, op);
+	if (IS_ERR(fence) && PTR_ERR(fence) == -EAGAIN) {
 		lockdep_assert_held_write(&vm->lock);
 
 		if (op->base.op == DRM_GPUVA_OP_REMAP) {
@@ -2517,22 +2508,24 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			if (!err)
 				goto retry_userptr;
 
+			fence = ERR_PTR(err);
 			trace_xe_vma_fail(vma);
 		}
 	}
 
-	return err;
+	return fence;
 }
 
-static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
+static struct dma_fence *
+xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
 {
-	int ret = 0;
+	struct dma_fence *fence = ERR_PTR(-ENOMEM);
 
 	lockdep_assert_held_write(&vm->lock);
 
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
-		ret = __xe_vma_op_execute(vm, op->map.vma, op);
+		fence = __xe_vma_op_execute(vm, op->map.vma, op);
 		break;
 	case DRM_GPUVA_OP_REMAP:
 	{
@@ -2545,23 +2538,23 @@ static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
 		else
 			vma = op->remap.next;
 
-		ret = __xe_vma_op_execute(vm, vma, op);
+		fence = __xe_vma_op_execute(vm, vma, op);
 		break;
 	}
 	case DRM_GPUVA_OP_UNMAP:
-		ret = __xe_vma_op_execute(vm, gpuva_to_vma(op->base.unmap.va),
-					  op);
+		fence = __xe_vma_op_execute(vm, gpuva_to_vma(op->base.unmap.va),
+					    op);
 		break;
 	case DRM_GPUVA_OP_PREFETCH:
-		ret = __xe_vma_op_execute(vm,
-					  gpuva_to_vma(op->base.prefetch.va),
-					  op);
+		fence = __xe_vma_op_execute(vm,
+					    gpuva_to_vma(op->base.prefetch.va),
+					    op);
 		break;
 	default:
 		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
 	}
 
-	return ret;
+	return fence;
 }
 
 static void xe_vma_op_cleanup(struct xe_vm *vm, struct xe_vma_op *op)
@@ -2730,11 +2723,35 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 	return 0;
 }
 
+static struct dma_fence *ops_execute(struct xe_vm *vm,
+				     struct list_head *ops_list,
+				     bool cleanup)
+{
+	struct xe_vma_op *op, *next;
+	struct dma_fence *fence = NULL;
+
+	list_for_each_entry_safe(op, next, ops_list, link) {
+		if (!IS_ERR(fence)) {
+			dma_fence_put(fence);
+			fence = xe_vma_op_execute(vm, op);
+		}
+		if (IS_ERR(fence)) {
+			drm_warn(&vm->xe->drm, "VM op(%d) failed with %ld",
+				 op->base.op, PTR_ERR(fence));
+			fence = ERR_PTR(-ENOSPC);
+		}
+		if (cleanup)
+			xe_vma_op_cleanup(vm, op);
+	}
+
+	return fence;
+}
+
 static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 				     struct list_head *ops_list)
 {
 	struct drm_exec exec;
-	struct xe_vma_op *op, *next;
+	struct dma_fence *fence;
 	int err;
 
 	lockdep_assert_held_write(&vm->lock);
@@ -2746,19 +2763,14 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 		if (err)
 			goto unlock;
 
-		list_for_each_entry_safe(op, next, ops_list, link) {
-			err = xe_vma_op_execute(vm, op);
-			if (err) {
-				drm_warn(&vm->xe->drm, "VM op(%d) failed with %d",
-					 op->base.op, err);
-				/*
-				 * FIXME: Killing VM rather than proper error handling
-				 */
-				xe_vm_kill(vm, false);
-				err = -ENOSPC;
-				goto unlock;
-			}
-			xe_vma_op_cleanup(vm, op);
+		fence = ops_execute(vm, ops_list, true);
+		if (IS_ERR(fence)) {
+			err = PTR_ERR(fence);
+			/* FIXME: Killing VM rather than proper error handling */
+			xe_vm_kill(vm, false);
+			goto unlock;
+		} else {
+			dma_fence_put(fence);
 		}
 	}
 
-- 
2.34.1



* [PATCH v3 03/22] drm/xe: Move migrate to prefetch to op_lock function
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 01/22] drm/xe: Lock all gpuva ops during VM bind IOCTL Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 02/22] drm/xe: Add ops_execute function which returns a fence Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 04/22] drm/xe: Add struct xe_vma_ops abstraction Matthew Brost
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Migrates need to be done under drm_exec to make lockdep happy; move the
migrate done for prefetches into the op_lock function.
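
In other words, xe_bo_migrate() now runs while the BO is locked via
drm_exec. A sketch of the new prefetch case in op_lock():

	err = vma_lock(exec, vma, false);
	if (!err && !xe_vma_has_no_bo(vma))
		err = xe_bo_migrate(xe_vma_bo(vma),
				    region_to_mem_type[region]);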

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a33c1486ab64..0f552fd8d7e0 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1936,20 +1936,10 @@ static const u32 region_to_mem_type[] = {
 
 static struct dma_fence *
 xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
-	       struct xe_exec_queue *q, u32 region,
-	       struct xe_sync_entry *syncs, u32 num_syncs,
-	       bool first_op, bool last_op)
+	       struct xe_exec_queue *q, struct xe_sync_entry *syncs,
+	       u32 num_syncs, bool first_op, bool last_op)
 {
 	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-	int err;
-
-	xe_assert(vm->xe, region <= ARRAY_SIZE(region_to_mem_type));
-
-	if (!xe_vma_has_no_bo(vma)) {
-		err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
-		if (err)
-			return ERR_PTR(err);
-	}
 
 	if (vma->tile_mask != (vma->tile_present & ~vma->usm.tile_invalidated)) {
 		return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs,
@@ -2467,8 +2457,7 @@ static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
 				     op->flags & XE_VMA_OP_LAST);
 		break;
 	case DRM_GPUVA_OP_PREFETCH:
-		fence = xe_vm_prefetch(vm, vma, op->q, op->prefetch.region,
-				       op->syncs, op->num_syncs,
+		fence = xe_vm_prefetch(vm, vma, op->q, op->syncs, op->num_syncs,
 				       op->flags & XE_VMA_OP_FIRST,
 				       op->flags & XE_VMA_OP_LAST);
 		break;
@@ -2694,8 +2683,17 @@ static int op_lock(struct drm_exec *exec, struct xe_vm *vm,
 		err = vma_lock(exec, gpuva_to_vma(op->base.unmap.va), false);
 		break;
 	case DRM_GPUVA_OP_PREFETCH:
-		err = vma_lock(exec, gpuva_to_vma(op->base.prefetch.va), true);
+	{
+		struct xe_vma *vma = gpuva_to_vma(op->base.prefetch.va);
+		u32 region = op->prefetch.region;
+
+		xe_assert(vm->xe, region < ARRAY_SIZE(region_to_mem_type));
+
+		err = vma_lock(exec, vma, false);
+		if (!err && !xe_vma_has_no_bo(vma))
+			err = xe_bo_migrate(xe_vma_bo(vma), region_to_mem_type[region]);
 		break;
+	}
 	default:
 		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
 	}
-- 
2.34.1



* [PATCH v3 04/22] drm/xe: Add struct xe_vma_ops abstraction
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (2 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 03/22] drm/xe: Move migrate to prefetch to op_lock function Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 05/22] drm/xe: Update xe_vm_rebind to use dummy VMA operations Matthew Brost
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Having a structure which encapsulates a list of VMA operations will help
enable one job for the entire list.
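
For now the abstraction is a thin wrapper around a list; later patches in
the series add more members. A minimal usage sketch:

	struct xe_vma_ops vops;
	struct xe_vma_op *op;

	xe_vma_ops_init(&vops);
	/* vm_bind_ioctl_ops_parse() fills vops.list */
	list_for_each_entry(op, &vops.list, link) {
		/* lock / execute each op */
	}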

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 37 ++++++++++++++++++--------------
 drivers/gpu/drm/xe/xe_vm_types.h |  7 ++++++
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 0f552fd8d7e0..768fb223019b 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2244,7 +2244,7 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
 static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				   struct drm_gpuva_ops *ops,
 				   struct xe_sync_entry *syncs, u32 num_syncs,
-				   struct list_head *ops_list, bool last)
+				   struct xe_vma_ops *vops, bool last)
 {
 	struct xe_vma_op *last_op = NULL;
 	struct drm_gpuva_op *__op;
@@ -2255,11 +2255,11 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 	drm_gpuva_for_each_op(__op, ops) {
 		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 		struct xe_vma *vma;
-		bool first = list_empty(ops_list);
+		bool first = list_empty(&vops->list);
 		unsigned int flags = 0;
 
 		INIT_LIST_HEAD(&op->link);
-		list_add_tail(&op->link, ops_list);
+		list_add_tail(&op->link, &vops->list);
 
 		if (first) {
 			op->flags |= XE_VMA_OP_FIRST;
@@ -2371,7 +2371,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 	}
 
 	/* FIXME: Unhandled corner case */
-	XE_WARN_ON(!last_op && last && !list_empty(ops_list));
+	XE_WARN_ON(!last_op && last && !list_empty(&vops->list));
 
 	if (!last_op)
 		return 0;
@@ -2703,7 +2703,7 @@ static int op_lock(struct drm_exec *exec, struct xe_vm *vm,
 
 static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 				  struct xe_vm *vm,
-				  struct list_head *ops_list)
+				  struct xe_vma_ops *vops)
 {
 	struct xe_vma_op *op;
 	int err;
@@ -2712,7 +2712,7 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 	if (err)
 		return err;
 
-	list_for_each_entry(op, ops_list, link) {
+	list_for_each_entry(op, &vops->list, link) {
 		err = op_lock(exec, vm, op);
 		if (err)
 			return err;
@@ -2722,13 +2722,13 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 }
 
 static struct dma_fence *ops_execute(struct xe_vm *vm,
-				     struct list_head *ops_list,
+				     struct xe_vma_ops *vops,
 				     bool cleanup)
 {
 	struct xe_vma_op *op, *next;
 	struct dma_fence *fence = NULL;
 
-	list_for_each_entry_safe(op, next, ops_list, link) {
+	list_for_each_entry_safe(op, next, &vops->list, link) {
 		if (!IS_ERR(fence)) {
 			dma_fence_put(fence);
 			fence = xe_vma_op_execute(vm, op);
@@ -2746,7 +2746,7 @@ static struct dma_fence *ops_execute(struct xe_vm *vm,
 }
 
 static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
-				     struct list_head *ops_list)
+				     struct xe_vma_ops *vops)
 {
 	struct drm_exec exec;
 	struct dma_fence *fence;
@@ -2756,12 +2756,12 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 
 	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
 	drm_exec_until_all_locked(&exec) {
-		err = vm_bind_ioctl_ops_lock(&exec, vm, ops_list);
+		err = vm_bind_ioctl_ops_lock(&exec, vm, vops);
 		drm_exec_retry_on_contention(&exec);
 		if (err)
 			goto unlock;
 
-		fence = ops_execute(vm, ops_list, true);
+		fence = ops_execute(vm, vops, true);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
 			/* FIXME: Killing VM rather than proper error handling */
@@ -2922,6 +2922,11 @@ static int vm_bind_ioctl_signal_fences(struct xe_vm *vm,
 	return err;
 }
 
+static void xe_vma_ops_init(struct xe_vma_ops *vops)
+{
+	INIT_LIST_HEAD(&vops->list);
+}
+
 int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct xe_device *xe = to_xe_device(dev);
@@ -2935,7 +2940,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	u32 num_syncs, num_ufence = 0;
 	struct xe_sync_entry *syncs = NULL;
 	struct drm_xe_vm_bind_op *bind_ops;
-	LIST_HEAD(ops_list);
+	struct xe_vma_ops vops;
 	int err;
 	int i;
 
@@ -3084,6 +3089,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto free_syncs;
 	}
 
+	xe_vma_ops_init(&vops);
 	for (i = 0; i < args->num_binds; ++i) {
 		u64 range = bind_ops[i].range;
 		u64 addr = bind_ops[i].addr;
@@ -3103,14 +3109,13 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		}
 
 		err = vm_bind_ioctl_ops_parse(vm, q, ops[i], syncs, num_syncs,
-					      &ops_list,
-					      i == args->num_binds - 1);
+					      &vops, i == args->num_binds - 1);
 		if (err)
 			goto unwind_ops;
 	}
 
 	/* Nothing to do */
-	if (list_empty(&ops_list)) {
+	if (list_empty(&vops.list)) {
 		err = -ENODATA;
 		goto unwind_ops;
 	}
@@ -3119,7 +3124,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	if (q)
 		xe_exec_queue_get(q);
 
-	err = vm_bind_ioctl_ops_execute(vm, &ops_list);
+	err = vm_bind_ioctl_ops_execute(vm, &vops);
 
 	up_write(&vm->lock);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5ac9c5bebabc..0c70062f3065 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -366,4 +366,11 @@ struct xe_vma_op {
 		struct xe_vma_op_prefetch prefetch;
 	};
 };
+
+/** struct xe_vma_ops - VMA operations */
+struct xe_vma_ops {
+	/** @list: list of VMA operations */
+	struct list_head list;
+};
+
 #endif
-- 
2.34.1



* [PATCH v3 05/22] drm/xe: Update xe_vm_rebind to use dummy VMA operations
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (3 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 04/22] drm/xe: Add struct xe_vma_ops abstraction Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 06/22] drm/xe: Simplify VM bind IOCTL error handling and cleanup Matthew Brost
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

All bind interfaces are transitioning to use VMA ops; update
xe_vm_rebind to use VMA ops.
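
Rebinds now go through the common ops_execute() path by populating a
single pre-allocated dummy map op per VM. A sketch of the new rebind loop
body:

	xe_vm_populate_dummy_rebind(vm, vma);
	fence = ops_execute(vm, &vm->dummy_ops.vops, false);
	if (IS_ERR(fence))
		return fence;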

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       |  44 ++++---
 drivers/gpu/drm/xe/xe_vm_types.h | 191 ++++++++++++++++---------------
 2 files changed, 130 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 768fb223019b..f59439500095 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -753,10 +753,22 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
 		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }
 
-static struct dma_fence *
-xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
-	       struct xe_sync_entry *syncs, u32 num_syncs,
-	       bool first_op, bool last_op);
+static void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma)
+{
+	vm->dummy_ops.op.base.op = DRM_GPUVA_OP_MAP;
+	vm->dummy_ops.op.base.map.va.addr = vma->gpuva.va.addr;
+	vm->dummy_ops.op.base.map.va.range = vma->gpuva.va.range;
+	vm->dummy_ops.op.base.map.gem.obj = vma->gpuva.gem.obj;
+	vm->dummy_ops.op.base.map.gem.offset = vma->gpuva.gem.offset;
+	vm->dummy_ops.op.map.vma = vma;
+	vm->dummy_ops.op.map.immediate = true;
+	vm->dummy_ops.op.map.read_only = xe_vma_read_only(vma);
+	vm->dummy_ops.op.map.is_null = xe_vma_is_null(vma);
+}
+
+static struct dma_fence *ops_execute(struct xe_vm *vm,
+				     struct xe_vma_ops *vops,
+				     bool cleanup);
 
 struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
@@ -778,7 +790,9 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 			trace_xe_vma_rebind_worker(vma);
 		else
 			trace_xe_vma_rebind_exec(vma);
-		fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false);
+
+		xe_vm_populate_dummy_rebind(vm, vma);
+		fence = ops_execute(vm, &vm->dummy_ops.vops, false);
 		if (IS_ERR(fence))
 			return fence;
 	}
@@ -1267,6 +1281,11 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
 	}
 }
 
+static void xe_vma_ops_init(struct xe_vma_ops *vops)
+{
+	INIT_LIST_HEAD(&vops->list);
+}
+
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 {
 	struct drm_gem_object *vm_resv_obj;
@@ -1287,6 +1306,10 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 	init_rwsem(&vm->lock);
 
+	xe_vma_ops_init(&vm->dummy_ops.vops);
+	INIT_LIST_HEAD(&vm->dummy_ops.op.link);
+	list_add(&vm->dummy_ops.op.link, &vm->dummy_ops.vops.list);
+
 	INIT_LIST_HEAD(&vm->rebind_list);
 
 	INIT_LIST_HEAD(&vm->userptr.repin_list);
@@ -2391,7 +2414,7 @@ static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
 {
 	struct dma_fence *fence = NULL;
 
-	lockdep_assert_held_write(&vm->lock);
+	lockdep_assert_held(&vm->lock);
 	xe_vm_assert_held(vm);
 	xe_bo_assert_held(xe_vma_bo(vma));
 
@@ -2481,7 +2504,7 @@ __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
 retry_userptr:
 	fence = op_execute(vm, vma, op);
 	if (IS_ERR(fence) && PTR_ERR(fence) == -EAGAIN) {
-		lockdep_assert_held_write(&vm->lock);
+		lockdep_assert_held(&vm->lock);
 
 		if (op->base.op == DRM_GPUVA_OP_REMAP) {
 			if (!op->remap.unmap_done)
@@ -2510,7 +2533,7 @@ xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
 {
 	struct dma_fence *fence = ERR_PTR(-ENOMEM);
 
-	lockdep_assert_held_write(&vm->lock);
+	lockdep_assert_held(&vm->lock);
 
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
@@ -2922,11 +2945,6 @@ static int vm_bind_ioctl_signal_fences(struct xe_vm *vm,
 	return err;
 }
 
-static void xe_vma_ops_init(struct xe_vma_ops *vops)
-{
-	INIT_LIST_HEAD(&vops->list);
-}
-
 int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
 	struct xe_device *xe = to_xe_device(dev);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 0c70062f3065..56665f946949 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -18,6 +18,7 @@
 #include "xe_range_fence.h"
 
 struct xe_bo;
+struct xe_device;
 struct xe_sync_entry;
 struct xe_vm;
 
@@ -114,7 +115,96 @@ struct xe_userptr_vma {
 	struct xe_userptr userptr;
 };
 
-struct xe_device;
+/** struct xe_vma_op_map - VMA map operation */
+struct xe_vma_op_map {
+	/** @vma: VMA to map */
+	struct xe_vma *vma;
+	/** @immediate: Immediate bind */
+	bool immediate;
+	/** @read_only: Read only */
+	bool read_only;
+	/** @is_null: is NULL binding */
+	bool is_null;
+	/** @pat_index: The pat index to use for this operation. */
+	u16 pat_index;
+};
+
+/** struct xe_vma_op_remap - VMA remap operation */
+struct xe_vma_op_remap {
+	/** @prev: VMA preceding part of a split mapping */
+	struct xe_vma *prev;
+	/** @next: VMA subsequent part of a split mapping */
+	struct xe_vma *next;
+	/** @start: start of the VMA unmap */
+	u64 start;
+	/** @range: range of the VMA unmap */
+	u64 range;
+	/** @skip_prev: skip prev rebind */
+	bool skip_prev;
+	/** @skip_next: skip next rebind */
+	bool skip_next;
+	/** @unmap_done: unmap operation is done */
+	bool unmap_done;
+};
+
+/** struct xe_vma_op_prefetch - VMA prefetch operation */
+struct xe_vma_op_prefetch {
+	/** @region: memory region to prefetch to */
+	u32 region;
+};
+
+/** enum xe_vma_op_flags - flags for VMA operation */
+enum xe_vma_op_flags {
+	/** @XE_VMA_OP_FIRST: first VMA operation for a set of syncs */
+	XE_VMA_OP_FIRST			= BIT(0),
+	/** @XE_VMA_OP_LAST: last VMA operation for a set of syncs */
+	XE_VMA_OP_LAST			= BIT(1),
+	/** @XE_VMA_OP_COMMITTED: VMA operation committed */
+	XE_VMA_OP_COMMITTED		= BIT(2),
+	/** @XE_VMA_OP_PREV_COMMITTED: Previous VMA operation committed */
+	XE_VMA_OP_PREV_COMMITTED	= BIT(3),
+	/** @XE_VMA_OP_NEXT_COMMITTED: Next VMA operation committed */
+	XE_VMA_OP_NEXT_COMMITTED	= BIT(4),
+};
+
+/** struct xe_vma_op - VMA operation */
+struct xe_vma_op {
+	/** @base: GPUVA base operation */
+	struct drm_gpuva_op base;
+	/**
+	 * @ops: GPUVA ops, when set call drm_gpuva_ops_free after this
+	 * operation is processed
+	 */
+	struct drm_gpuva_ops *ops;
+	/** @q: exec queue for this operation */
+	struct xe_exec_queue *q;
+	/**
+	 * @syncs: syncs for this operation, only used on first and last
+	 * operation
+	 */
+	struct xe_sync_entry *syncs;
+	/** @num_syncs: number of syncs */
+	u32 num_syncs;
+	/** @link: async operation link */
+	struct list_head link;
+	/** @flags: operation flags */
+	enum xe_vma_op_flags flags;
+
+	union {
+		/** @map: VMA map operation specific data */
+		struct xe_vma_op_map map;
+		/** @remap: VMA remap operation specific data */
+		struct xe_vma_op_remap remap;
+		/** @prefetch: VMA prefetch operation specific data */
+		struct xe_vma_op_prefetch prefetch;
+	};
+};
+
+/** struct xe_vma_ops - VMA operations */
+struct xe_vma_ops {
+	/** @list: list of VMA operations */
+	struct list_head list;
+};
 
 struct xe_vm {
 	/** @gpuvm: base GPUVM used to track VMAs */
@@ -276,101 +366,18 @@ struct xe_vm {
 		bool capture_once;
 	} error_capture;
 
+	/** @dummy_ops: dummy VMA ops to issue rebinds */
+	struct {
+		/** @dummy_ops.vops: dummy VMA ops */
+		struct xe_vma_ops vops;
+		/** @dummy_ops.op: dummy VMA op */
+		struct xe_vma_op op;
+	} dummy_ops;
+
 	/** @batch_invalidate_tlb: Always invalidate TLB before batch start */
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
 	struct xe_file *xef;
 };
 
-/** struct xe_vma_op_map - VMA map operation */
-struct xe_vma_op_map {
-	/** @vma: VMA to map */
-	struct xe_vma *vma;
-	/** @immediate: Immediate bind */
-	bool immediate;
-	/** @read_only: Read only */
-	bool read_only;
-	/** @is_null: is NULL binding */
-	bool is_null;
-	/** @pat_index: The pat index to use for this operation. */
-	u16 pat_index;
-};
-
-/** struct xe_vma_op_remap - VMA remap operation */
-struct xe_vma_op_remap {
-	/** @prev: VMA preceding part of a split mapping */
-	struct xe_vma *prev;
-	/** @next: VMA subsequent part of a split mapping */
-	struct xe_vma *next;
-	/** @start: start of the VMA unmap */
-	u64 start;
-	/** @range: range of the VMA unmap */
-	u64 range;
-	/** @skip_prev: skip prev rebind */
-	bool skip_prev;
-	/** @skip_next: skip next rebind */
-	bool skip_next;
-	/** @unmap_done: unmap operation in done */
-	bool unmap_done;
-};
-
-/** struct xe_vma_op_prefetch - VMA prefetch operation */
-struct xe_vma_op_prefetch {
-	/** @region: memory region to prefetch to */
-	u32 region;
-};
-
-/** enum xe_vma_op_flags - flags for VMA operation */
-enum xe_vma_op_flags {
-	/** @XE_VMA_OP_FIRST: first VMA operation for a set of syncs */
-	XE_VMA_OP_FIRST			= BIT(0),
-	/** @XE_VMA_OP_LAST: last VMA operation for a set of syncs */
-	XE_VMA_OP_LAST			= BIT(1),
-	/** @XE_VMA_OP_COMMITTED: VMA operation committed */
-	XE_VMA_OP_COMMITTED		= BIT(2),
-	/** @XE_VMA_OP_PREV_COMMITTED: Previous VMA operation committed */
-	XE_VMA_OP_PREV_COMMITTED	= BIT(3),
-	/** @XE_VMA_OP_NEXT_COMMITTED: Next VMA operation committed */
-	XE_VMA_OP_NEXT_COMMITTED	= BIT(4),
-};
-
-/** struct xe_vma_op - VMA operation */
-struct xe_vma_op {
-	/** @base: GPUVA base operation */
-	struct drm_gpuva_op base;
-	/**
-	 * @ops: GPUVA ops, when set call drm_gpuva_ops_free after this
-	 * operations is processed
-	 */
-	struct drm_gpuva_ops *ops;
-	/** @q: exec queue for this operation */
-	struct xe_exec_queue *q;
-	/**
-	 * @syncs: syncs for this operation, only used on first and last
-	 * operation
-	 */
-	struct xe_sync_entry *syncs;
-	/** @num_syncs: number of syncs */
-	u32 num_syncs;
-	/** @link: async operation link */
-	struct list_head link;
-	/** @flags: operation flags */
-	enum xe_vma_op_flags flags;
-
-	union {
-		/** @map: VMA map operation specific data */
-		struct xe_vma_op_map map;
-		/** @remap: VMA remap operation specific data */
-		struct xe_vma_op_remap remap;
-		/** @prefetch: VMA prefetch operation specific data */
-		struct xe_vma_op_prefetch prefetch;
-	};
-};
-
-/** struct xe_vma_ops - VMA operations */
-struct xe_vma_ops {
-	/** @list: list of VMA operations */
-	struct list_head list;
-};
-
 #endif
-- 
2.34.1



* [PATCH v3 06/22] drm/xe: Simplify VM bind IOCTL error handling and cleanup
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (4 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 05/22] drm/xe: Update xe_vm_rebind to use dummy VMA operations Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 07/22] drm/xe: Update pagefaults to use dummy VMA operations Matthew Brost
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Clean up everything in the VM bind IOCTL in one path for both errors and
non-errors. Also move VM bind IOCTL cleanup from the ops (also used by
non-IOCTL binds) into the VM bind IOCTL itself.
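
The result is one exit sequence regardless of where things fail. Roughly
(-ENODATA is the "nothing to do" case, which still signals fences):

	err = vm_bind_ioctl_ops_execute(vm, &vops);

unwind_ops:
	if (err && err != -ENODATA)
		vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
	/* then: free gpuva ops, signal fences on -ENODATA, drop refs */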

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c       | 60 +++++---------------------------
 drivers/gpu/drm/xe/xe_vm_types.h |  5 ---
 2 files changed, 9 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index f59439500095..50c85da430c6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -767,8 +767,7 @@ static void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma)
 }
 
 static struct dma_fence *ops_execute(struct xe_vm *vm,
-				     struct xe_vma_ops *vops,
-				     bool cleanup);
+				     struct xe_vma_ops *vops);
 
 struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
@@ -792,7 +791,7 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 			trace_xe_vma_rebind_exec(vma);
 
 		xe_vm_populate_dummy_rebind(vm, vma);
-		fence = ops_execute(vm, &vm->dummy_ops.vops, false);
+		fence = ops_execute(vm, &vm->dummy_ops.vops);
 		if (IS_ERR(fence))
 			return fence;
 	}
@@ -2399,7 +2398,6 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 	if (!last_op)
 		return 0;
 
-	last_op->ops = ops;
 	if (last) {
 		last_op->flags |= XE_VMA_OP_LAST;
 		last_op->num_syncs = num_syncs;
@@ -2569,25 +2567,6 @@ xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
 	return fence;
 }
 
-static void xe_vma_op_cleanup(struct xe_vm *vm, struct xe_vma_op *op)
-{
-	bool last = op->flags & XE_VMA_OP_LAST;
-
-	if (last) {
-		while (op->num_syncs--)
-			xe_sync_entry_cleanup(&op->syncs[op->num_syncs]);
-		kfree(op->syncs);
-		if (op->q)
-			xe_exec_queue_put(op->q);
-	}
-	if (!list_empty(&op->link))
-		list_del(&op->link);
-	if (op->ops)
-		drm_gpuva_ops_free(&vm->gpuvm, op->ops);
-	if (last)
-		xe_vm_put(vm);
-}
-
 static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
 			     bool post_commit, bool prev_post_commit,
 			     bool next_post_commit)
@@ -2664,8 +2643,6 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 					 op->flags & XE_VMA_OP_PREV_COMMITTED,
 					 op->flags & XE_VMA_OP_NEXT_COMMITTED);
 		}
-
-		drm_gpuva_ops_free(&vm->gpuvm, __ops);
 	}
 }
 
@@ -2745,8 +2722,7 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 }
 
 static struct dma_fence *ops_execute(struct xe_vm *vm,
-				     struct xe_vma_ops *vops,
-				     bool cleanup)
+				     struct xe_vma_ops *vops)
 {
 	struct xe_vma_op *op, *next;
 	struct dma_fence *fence = NULL;
@@ -2761,8 +2737,6 @@ static struct dma_fence *ops_execute(struct xe_vm *vm,
 				 op->base.op, PTR_ERR(fence));
 			fence = ERR_PTR(-ENOSPC);
 		}
-		if (cleanup)
-			xe_vma_op_cleanup(vm, op);
 	}
 
 	return fence;
@@ -2784,7 +2758,7 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 		if (err)
 			goto unlock;
 
-		fence = ops_execute(vm, vops, true);
+		fence = ops_execute(vm, vops);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
 			/* FIXME: Killing VM rather than proper error handling */
@@ -3138,30 +3112,14 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto unwind_ops;
 	}
 
-	xe_vm_get(vm);
-	if (q)
-		xe_exec_queue_get(q);
-
 	err = vm_bind_ioctl_ops_execute(vm, &vops);
 
-	up_write(&vm->lock);
-
-	if (q)
-		xe_exec_queue_put(q);
-	xe_vm_put(vm);
-
-	for (i = 0; bos && i < args->num_binds; ++i)
-		xe_bo_put(bos[i]);
-
-	kfree(bos);
-	kfree(ops);
-	if (args->num_binds > 1)
-		kfree(bind_ops);
-
-	return err;
-
 unwind_ops:
-	vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
+	if (err && err != -ENODATA)
+		vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
+	for (i = args->num_binds - 1; i >= 0; --i)
+		if (ops[i])
+			drm_gpuva_ops_free(&vm->gpuvm, ops[i]);
 free_syncs:
 	if (err == -ENODATA)
 		err = vm_bind_ioctl_signal_fences(vm, q, syncs, num_syncs);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 56665f946949..5020e0571108 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -171,11 +171,6 @@ enum xe_vma_op_flags {
 struct xe_vma_op {
 	/** @base: GPUVA base operation */
 	struct drm_gpuva_op base;
-	/**
-	 * @ops: GPUVA ops, when set call drm_gpuva_ops_free after this
-	 * operation is processed
-	 */
-	struct drm_gpuva_ops *ops;
 	/** @q: exec queue for this operation */
 	struct xe_exec_queue *q;
 	/**
-- 
2.34.1



* [PATCH v3 07/22] drm/xe: Update pagefaults to use dummy VMA operations
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (5 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 06/22] drm/xe: Simplify VM bind IOCTL error handling and cleanup Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 08/22] drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue Matthew Brost
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

All bind interfaces are transitioning to use VMA ops; update
pagefaults to use VMA ops.
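
With this, the page fault handler no longer calls __xe_pt_bind_vma()
directly; it fills in the dummy op and overrides the exec queue and tile
mask. A sketch:

	xe_vm_populate_dummy_rebind(vm, vma);
	vm->dummy_ops.op.tile_mask = BIT(tile->id);
	vm->dummy_ops.op.q = xe_tile_migrate_engine(tile);
	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);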

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  7 ++--
 drivers/gpu/drm/xe/xe_vm.c           | 54 +++++++++++++++++++---------
 drivers/gpu/drm/xe/xe_vm.h           |  3 ++
 drivers/gpu/drm/xe/xe_vm_types.h     |  2 ++
 4 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index c26e4fcca01e..4fd81a553bd0 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -19,7 +19,6 @@
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
 #include "xe_migrate.h"
-#include "xe_pt.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
 
@@ -207,8 +206,10 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	/* Bind VMA only to the GT that has faulted */
 	trace_xe_vma_pf_bind(vma);
-	fence = __xe_pt_bind_vma(tile, vma, xe_tile_migrate_engine(tile), NULL, 0,
-				 vma->tile_present & BIT(tile->id));
+	xe_vm_populate_dummy_rebind(vm, vma);
+	vm->dummy_ops.op.tile_mask = BIT(tile->id);
+	vm->dummy_ops.op.q = xe_tile_migrate_engine(tile);
+	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
 	if (IS_ERR(fence)) {
 		ret = PTR_ERR(fence);
 		goto unlock_dma_resv;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 50c85da430c6..ebe822855233 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -753,22 +753,27 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
 		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }
 
-static void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma)
+/**
+ * xe_vm_populate_dummy_rebind() - Populate dummy rebind VMA ops
+ * @vm: The VM.
+ * @vma: VMA for which to populate the dummy ops
+ *
+ * Populate dummy VMA ops which can be used to issue a rebind for the VMA
+ */
+void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma)
 {
 	vm->dummy_ops.op.base.op = DRM_GPUVA_OP_MAP;
 	vm->dummy_ops.op.base.map.va.addr = vma->gpuva.va.addr;
 	vm->dummy_ops.op.base.map.va.range = vma->gpuva.va.range;
 	vm->dummy_ops.op.base.map.gem.obj = vma->gpuva.gem.obj;
 	vm->dummy_ops.op.base.map.gem.offset = vma->gpuva.gem.offset;
+	vm->dummy_ops.op.tile_mask = vma->tile_mask;
 	vm->dummy_ops.op.map.vma = vma;
 	vm->dummy_ops.op.map.immediate = true;
 	vm->dummy_ops.op.map.read_only = xe_vma_read_only(vma);
 	vm->dummy_ops.op.map.is_null = xe_vma_is_null(vma);
 }
 
-static struct dma_fence *ops_execute(struct xe_vm *vm,
-				     struct xe_vma_ops *vops);
-
 struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
 	struct dma_fence *fence = NULL;
@@ -791,7 +796,7 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 			trace_xe_vma_rebind_exec(vma);
 
 		xe_vm_populate_dummy_rebind(vm, vma);
-		fence = ops_execute(vm, &vm->dummy_ops.vops);
+		fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
 		if (IS_ERR(fence))
 			return fence;
 	}
@@ -1688,7 +1693,7 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 static struct dma_fence *
 xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	       struct xe_sync_entry *syncs, u32 num_syncs,
-	       bool first_op, bool last_op)
+	       u8 tile_mask, bool first_op, bool last_op)
 {
 	struct xe_tile *tile;
 	struct dma_fence *fence;
@@ -1710,7 +1715,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	}
 
 	for_each_tile(tile, vm->xe, id) {
-		if (!(vma->tile_mask & BIT(id)))
+		if (!(tile_mask & BIT(id)))
 			goto next;
 
 		fence = __xe_pt_bind_vma(tile, vma, q ? q : vm->q[id],
@@ -1763,7 +1768,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 static struct dma_fence *
 xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
 	   struct xe_bo *bo, struct xe_sync_entry *syncs, u32 num_syncs,
-	   bool immediate, bool first_op, bool last_op)
+	   u8 tile_mask, bool immediate, bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
 	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
@@ -1772,8 +1777,8 @@ xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
 	xe_bo_assert_held(bo);
 
 	if (immediate) {
-		fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, first_op,
-				       last_op);
+		fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, tile_mask,
+				       first_op, last_op);
 		if (IS_ERR(fence))
 			return fence;
 	} else {
@@ -1965,7 +1970,7 @@ xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 
 	if (vma->tile_mask != (vma->tile_present & ~vma->usm.tile_invalidated)) {
 		return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs,
-				  true, first_op, last_op);
+				  vma->tile_mask, true, first_op, last_op);
 	} else {
 		struct dma_fence *fence =
 			xe_exec_queue_last_fence_get(wait_exec_queue, vm);
@@ -2270,10 +2275,15 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 {
 	struct xe_vma_op *last_op = NULL;
 	struct drm_gpuva_op *__op;
+	struct xe_tile *tile;
+	u8 id, tile_mask = 0;
 	int err = 0;
 
 	lockdep_assert_held_write(&vm->lock);
 
+	for_each_tile(tile, vm->xe, id)
+		tile_mask |= 0x1 << id;
+
 	drm_gpuva_for_each_op(__op, ops) {
 		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 		struct xe_vma *vma;
@@ -2290,6 +2300,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 		}
 
 		op->q = q;
+		op->tile_mask = tile_mask;
 
 		switch (op->base.op) {
 		case DRM_GPUVA_OP_MAP:
@@ -2418,10 +2429,12 @@ static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
 
 	switch (op->base.op) {
 	case DRM_GPUVA_OP_MAP:
+		/* FIXME: Override vma->tile_mask for page faults */
 		fence = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
 				   op->syncs, op->num_syncs,
 				   op->map.immediate ||
 				   !xe_vm_in_fault_mode(vm),
+				   op->tile_mask,
 				   op->flags & XE_VMA_OP_FIRST,
 				   op->flags & XE_VMA_OP_LAST);
 		break;
@@ -2448,7 +2461,8 @@ static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			dma_fence_put(fence);
 			fence = xe_vm_bind(vm, op->remap.prev, op->q,
 					   xe_vma_bo(op->remap.prev), op->syncs,
-					   op->num_syncs, true, false,
+					   op->num_syncs, op->remap.prev->tile_mask,
+					   true, false,
 					   op->flags & XE_VMA_OP_LAST && !next);
 			op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND;
 			if (IS_ERR(fence))
@@ -2462,7 +2476,7 @@ static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
 			fence = xe_vm_bind(vm, op->remap.next, op->q,
 					   xe_vma_bo(op->remap.next),
 					   op->syncs, op->num_syncs,
-					   true, false,
+					   op->remap.next->tile_mask, true, false,
 					   op->flags & XE_VMA_OP_LAST);
 			op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND;
 			if (IS_ERR(fence))
@@ -2721,8 +2735,16 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 	return 0;
 }
 
-static struct dma_fence *ops_execute(struct xe_vm *vm,
-				     struct xe_vma_ops *vops)
+/**
+ * xe_vm_ops_execute() - Execute VMA ops
+ * @vm: The VM.
+ * @vops: VMA ops to execute
+ *
+ * Execute VMA ops binding / unbinding VMAs
+ *
+ * Return: A fence for VMA ops on success, ERR_PTR on failure
+ */
+struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops)
 {
 	struct xe_vma_op *op, *next;
 	struct dma_fence *fence = NULL;
@@ -2758,7 +2780,7 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 		if (err)
 			goto unlock;
 
-		fence = ops_execute(vm, vops);
+		fence = xe_vm_ops_execute(vm, vops);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
 			/* FIXME: Killing VM rather than proper error handling */
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index df4a82e960ff..1d9a3b13aecc 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -262,6 +262,9 @@ static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
  */
 #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
 
+void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma);
+struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops);
+
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
 #define vm_dbg drm_dbg
 #else
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 5020e0571108..6655a0645a18 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -184,6 +184,8 @@ struct xe_vma_op {
 	struct list_head link;
 	/** @flags: operation flags */
 	enum xe_vma_op_flags flags;
+	/** @tile_mask: Tile mask for operation */
+	u8 tile_mask;
 
 	union {
 		/** @map: VMA map operation specific data */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 08/22] drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (6 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 07/22] drm/xe: Update pagefaults to use dummy VMA operations Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 09/22] drm/xe: Add vm_bind_ioctl_ops_install_fences helper Matthew Brost
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

xe_engine is now xe_exec_queue, so adjust this function's name to match.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 2 +-
 drivers/gpu/drm/xe/xe_migrate.c      | 8 ++++----
 drivers/gpu/drm/xe/xe_migrate.h      | 3 ++-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 4fd81a553bd0..5a7bd63025b3 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -208,7 +208,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	trace_xe_vma_pf_bind(vma);
 	xe_vm_populate_dummy_rebind(vm, vma);
 	vm->dummy_ops.op.tile_mask = BIT(tile->id);
-	vm->dummy_ops.op.q = xe_tile_migrate_engine(tile);
+	vm->dummy_ops.op.q = xe_tile_migrate_exec_queue(tile);
 	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
 	if (IS_ERR(fence)) {
 		ret = PTR_ERR(fence);
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 3d2438dc86ee..7b1dde578176 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -83,15 +83,15 @@ struct xe_migrate {
 #define MAX_PTE_PER_SDI 0x1FE
 
 /**
- * xe_tile_migrate_engine() - Get this tile's migrate engine.
+ * xe_tile_migrate_exec_queue() - Get this tile's migrate exec queue.
  * @tile: The tile.
  *
- * Returns the default migrate engine of this tile.
+ * Returns the default migrate exec queue of this tile.
  * TODO: Perhaps this function is slightly misplaced, and even unneeded?
  *
- * Return: The default migrate engine
+ * Return: The default migrate exec queue
  */
-struct xe_exec_queue *xe_tile_migrate_engine(struct xe_tile *tile)
+struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile)
 {
 	return tile->migrate->q;
 }
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 951f19318ea4..9935ce336bae 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -106,5 +106,6 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 
 void xe_migrate_wait(struct xe_migrate *m);
 
-struct xe_exec_queue *xe_tile_migrate_engine(struct xe_tile *tile);
+struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile);
+
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 09/22] drm/xe: Add vm_bind_ioctl_ops_install_fences helper
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (7 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 08/22] drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 10/22] drm/xe: Move setting last fence to vm_bind_ioctl_ops_install_fences Matthew Brost
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Simplify VM bind code by signaling out-fences / destroying VMAs in a
single location. This will help with the transition to a single job for
many bind ops.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
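For illustration, a minimal sketch of the resulting call pattern (names
as in the diff below); error handling elided:

  fence = xe_vm_ops_execute(vm, vops);
  if (IS_ERR(fence))
          return PTR_ERR(fence);
  /* One place signals out-fences and destroys unmapped VMAs */
  vm_bind_ioctl_ops_install_fences(vm, vops, fence);
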
 drivers/gpu/drm/xe/xe_vm.c | 55 ++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index ebe822855233..383d05f87a79 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1625,7 +1625,7 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	struct dma_fence *fence = NULL;
 	struct dma_fence **fences = NULL;
 	struct dma_fence_array *cf = NULL;
-	int cur_fence = 0, i;
+	int cur_fence = 0;
 	int number_tiles = hweight8(vma->tile_present);
 	int err;
 	u8 id;
@@ -1673,10 +1673,6 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 
 	fence = cf ? &cf->base : !fence ?
 		xe_exec_queue_last_fence_get(wait_exec_queue, vm) : fence;
-	if (last_op) {
-		for (i = 0; i < num_syncs; i++)
-			xe_sync_entry_signal(&syncs[i], NULL, fence);
-	}
 
 	return fence;
 
@@ -1700,7 +1696,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 	struct dma_fence **fences = NULL;
 	struct dma_fence_array *cf = NULL;
 	struct xe_vm *vm = xe_vma_vm(vma);
-	int cur_fence = 0, i;
+	int cur_fence = 0;
 	int number_tiles = hweight8(vma->tile_mask);
 	int err;
 	u8 id;
@@ -1747,12 +1743,6 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 		}
 	}
 
-	if (last_op) {
-		for (i = 0; i < num_syncs; i++)
-			xe_sync_entry_signal(&syncs[i], NULL,
-					     cf ? &cf->base : fence);
-	}
-
 	return cf ? &cf->base : fence;
 
 err_fences:
@@ -1782,15 +1772,8 @@ xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
 		if (IS_ERR(fence))
 			return fence;
 	} else {
-		int i;
-
 		xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
-
 		fence = xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-		if (last_op) {
-			for (i = 0; i < num_syncs; i++)
-				xe_sync_entry_signal(&syncs[i], NULL, fence);
-		}
 	}
 
 	if (last_op)
@@ -1814,7 +1797,6 @@ xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 	if (IS_ERR(fence))
 		return fence;
 
-	xe_vma_destroy(vma, fence);
 	if (last_op)
 		xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence);
 
@@ -1972,17 +1954,7 @@ xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
 		return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs,
 				  vma->tile_mask, true, first_op, last_op);
 	} else {
-		struct dma_fence *fence =
-			xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-		int i;
-
-		/* Nothing to do, signal fences now */
-		if (last_op) {
-			for (i = 0; i < num_syncs; i++)
-				xe_sync_entry_signal(&syncs[i], NULL, fence);
-		}
-
-		return fence;
+		return xe_exec_queue_last_fence_get(wait_exec_queue, vm);
 	}
 }
 
@@ -2764,6 +2736,25 @@ struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops)
 	return fence;
 }
 
+static void vm_bind_ioctl_ops_install_fences(struct xe_vm *vm,
+					     struct xe_vma_ops *vops,
+					     struct dma_fence *fence)
+{
+	struct xe_vma_op *op;
+	int i;
+
+	list_for_each_entry(op, &vops->list, link) {
+		if (op->base.op == DRM_GPUVA_OP_UNMAP)
+			xe_vma_destroy(gpuva_to_vma(op->base.unmap.va), fence);
+		else if (op->base.op == DRM_GPUVA_OP_REMAP)
+			xe_vma_destroy(gpuva_to_vma(op->base.remap.unmap->va),
+				       fence);
+	}
+	for (i = 0; i < vops->num_syncs; i++)
+		xe_sync_entry_signal(vops->syncs + i, NULL, fence);
+	dma_fence_put(fence);
+}
+
 static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 				     struct xe_vma_ops *vops)
 {
@@ -2787,7 +2778,7 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 			xe_vm_kill(vm, false);
 			goto unlock;
 		} else {
-			dma_fence_put(fence);
+			vm_bind_ioctl_ops_install_fences(vm, vops, fence);
 		}
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 10/22] drm/xe: Move setting last fence to vm_bind_ioctl_ops_install_fences
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (8 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 09/22] drm/xe: Add vm_bind_ioctl_ops_install_fences helper Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 11/22] drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this Matthew Brost
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

This moves setting of the last fence to a single location.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 383d05f87a79..72f6faf6f9ef 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1776,9 +1776,6 @@ xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
 		fence = xe_exec_queue_last_fence_get(wait_exec_queue, vm);
 	}
 
-	if (last_op)
-		xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence);
-
 	return fence;
 }
 
@@ -1788,7 +1785,6 @@ xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 	     u32 num_syncs, bool first_op, bool last_op)
 {
 	struct dma_fence *fence;
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
 
 	xe_vm_assert_held(vm);
 	xe_bo_assert_held(xe_vma_bo(vma));
@@ -1797,9 +1793,6 @@ xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
 	if (IS_ERR(fence))
 		return fence;
 
-	if (last_op)
-		xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence);
-
 	return fence;
 }
 
@@ -2740,6 +2733,7 @@ static void vm_bind_ioctl_ops_install_fences(struct xe_vm *vm,
 					     struct xe_vma_ops *vops,
 					     struct dma_fence *fence)
 {
+	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, vops->q);
 	struct xe_vma_op *op;
 	int i;
 
@@ -2752,6 +2746,7 @@ static void vm_bind_ioctl_ops_install_fences(struct xe_vm *vm,
 	}
 	for (i = 0; i < vops->num_syncs; i++)
 		xe_sync_entry_signal(vops->syncs + i, NULL, fence);
+	xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence);
 	dma_fence_put(fence);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 11/22] drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (9 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 10/22] drm/xe: Move setting last fence to vm_bind_ioctl_ops_install_fences Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 12/22] drm/xe: Add some members to xe_vma_ops Matthew Brost
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

xe_gt_tlb_invalidation_range accepts a start and end address rather than
a VMA. This will enable multiple VMAs to be invalidated in a single
invalidation. Update the PT layer to use this new function.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
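A minimal sketch of a direct call, with illustrative addresses (the PT
layer below does the same from its invalidation fence worker):

  /* Invalidate [0x11000, 0x16000) for an address space. Internally the
   * range is widened to a power-of-two block: length 0x5000 rounds up
   * to 0x8000 and start aligns down to 0x10000.
   */
  int seqno = xe_gt_tlb_invalidation_range(gt, NULL, 0x11000, 0x16000,
                                           vm->usm.asid);
  if (seqno >= 0)
          xe_gt_tlb_invalidation_wait(gt, seqno);
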
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 60 +++++++++++++++------
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  3 ++
 drivers/gpu/drm/xe/xe_pt.c                  | 25 ++++++---
 3 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index e3a4131ebb58..3babc143abc3 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -223,11 +223,15 @@ int xe_gt_tlb_invalidation_guc(struct xe_gt *gt)
 }
 
 /**
- * xe_gt_tlb_invalidation_vma - Issue a TLB invalidation on this GT for a VMA
+ * xe_gt_tlb_invalidation_range - Issue a TLB invalidation on this GT for an
+ * address range
+ *
  * @gt: graphics tile
  * @fence: invalidation fence which will be signaled on TLB invalidation
  * completion, can be NULL
- * @vma: VMA to invalidate
+ * @start: start address
+ * @end: end address
+ * @asid: address space id
  *
  * Issue a range based TLB invalidation if supported, if not fallback to a full
  * TLB invalidation. Completion of TLB is asynchronous and caller can either use
@@ -237,24 +241,22 @@ int xe_gt_tlb_invalidation_guc(struct xe_gt *gt)
  * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
  * negative error code on error.
  */
-int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
-			       struct xe_gt_tlb_invalidation_fence *fence,
-			       struct xe_vma *vma)
+int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
+				 struct xe_gt_tlb_invalidation_fence *fence,
+				 u64 start, u64 end, u32 asid)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 #define MAX_TLB_INVALIDATION_LEN	7
 	u32 action[MAX_TLB_INVALIDATION_LEN];
 	int len = 0;
 
-	xe_gt_assert(gt, vma);
-
 	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
 	action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
 	if (!xe->info.has_range_tlb_invalidation) {
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
 	} else {
-		u64 start = xe_vma_start(vma);
-		u64 length = xe_vma_size(vma);
+		u64 orig_start = start;
+		u64 length = end - start;
 		u64 align, end;
 
 		if (length < SZ_4K)
@@ -267,12 +269,12 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 		 * address mask covering the required range.
 		 */
 		align = roundup_pow_of_two(length);
-		start = ALIGN_DOWN(xe_vma_start(vma), align);
-		end = ALIGN(xe_vma_end(vma), align);
+		start = ALIGN_DOWN(start, align);
+		end = ALIGN(end, align);
 		length = align;
 		while (start + length < end) {
 			length <<= 1;
-			start = ALIGN_DOWN(xe_vma_start(vma), length);
+			start = ALIGN_DOWN(orig_start, length);
 		}
 
 		/*
@@ -281,16 +283,17 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 		 */
 		if (length >= SZ_2M) {
 			length = max_t(u64, SZ_16M, length);
-			start = ALIGN_DOWN(xe_vma_start(vma), length);
+			start = ALIGN_DOWN(orig_start, length);
 		}
 
 		xe_gt_assert(gt, length >= SZ_4K);
 		xe_gt_assert(gt, is_power_of_2(length));
-		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1, ilog2(SZ_2M) + 1)));
+		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
+						    ilog2(SZ_2M) + 1)));
 		xe_gt_assert(gt, IS_ALIGNED(start, length));
 
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
-		action[len++] = xe_vma_vm(vma)->usm.asid;
+		action[len++] = asid;
 		action[len++] = lower_32_bits(start);
 		action[len++] = upper_32_bits(start);
 		action[len++] = ilog2(length) - ilog2(SZ_4K);
@@ -299,6 +302,33 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 	xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
 
 	return send_tlb_invalidation(&gt->uc.guc, fence, action, len);
+}
+
+/**
+ * xe_gt_tlb_invalidation_vma - Issue a TLB invalidation on this GT for a VMA
+ * @gt: graphics tile
+ * @fence: invalidation fence which will be signaled on TLB invalidation
+ * completion, can be NULL
+ * @vma: VMA to invalidate
+ *
+ * Issue a range based TLB invalidation if supported, if not fallback to a full
+ * TLB invalidation. Completion of TLB is asynchronous and caller can either use
+ * the invalidation fence or seqno + xe_gt_tlb_invalidation_wait to wait for
+ * completion.
+ *
+ * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
+ * negative error code on error.
+ */
+int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
+			       struct xe_gt_tlb_invalidation_fence *fence,
+			       struct xe_vma *vma)
+{
+	xe_gt_assert(gt, vma);
+
+	return xe_gt_tlb_invalidation_range(gt, fence, xe_vma_start(vma),
+					    xe_vma_end(vma),
+					    xe_vma_vm(vma)->usm.asid);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
index b333c1709397..5bb09885aa0f 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
@@ -20,6 +20,9 @@ int xe_gt_tlb_invalidation_guc(struct xe_gt *gt);
 int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 			       struct xe_gt_tlb_invalidation_fence *fence,
 			       struct xe_vma *vma);
+int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
+				 struct xe_gt_tlb_invalidation_fence *fence,
+				 u64 start, u64 end, u32 asid);
 int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 3a99bf6e558f..ada5a65e62dc 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1058,10 +1058,12 @@ static const struct xe_migrate_pt_update_ops userptr_bind_ops = {
 struct invalidation_fence {
 	struct xe_gt_tlb_invalidation_fence base;
 	struct xe_gt *gt;
-	struct xe_vma *vma;
 	struct dma_fence *fence;
 	struct dma_fence_cb cb;
 	struct work_struct work;
+	u64 start;
+	u64 end;
+	u32 asid;
 };
 
 static const char *
@@ -1104,13 +1106,14 @@ static void invalidation_fence_work_func(struct work_struct *w)
 		container_of(w, struct invalidation_fence, work);
 
 	trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base);
-	xe_gt_tlb_invalidation_vma(ifence->gt, &ifence->base, ifence->vma);
+	xe_gt_tlb_invalidation_range(ifence->gt, &ifence->base, ifence->start,
+				     ifence->end, ifence->asid);
 }
 
 static int invalidation_fence_init(struct xe_gt *gt,
 				   struct invalidation_fence *ifence,
 				   struct dma_fence *fence,
-				   struct xe_vma *vma)
+				   u64 start, u64 end, u32 asid)
 {
 	int ret;
 
@@ -1128,7 +1131,9 @@ static int invalidation_fence_init(struct xe_gt *gt,
 	dma_fence_get(&ifence->base.base);	/* Ref for caller */
 	ifence->fence = fence;
 	ifence->gt = gt;
-	ifence->vma = vma;
+	ifence->start = start;
+	ifence->end = end;
+	ifence->asid = asid;
 
 	INIT_WORK(&ifence->work, invalidation_fence_work_func);
 	ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
@@ -1270,8 +1275,11 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 
 		/* TLB invalidation must be done before signaling rebind */
 		if (ifence) {
-			int err = invalidation_fence_init(tile->primary_gt, ifence, fence,
-							  vma);
+			int err = invalidation_fence_init(tile->primary_gt,
+							  ifence, fence,
+							  xe_vma_start(vma),
+							  xe_vma_end(vma),
+							  xe_vma_vm(vma)->usm.asid);
 			if (err) {
 				dma_fence_put(fence);
 				kfree(ifence);
@@ -1609,7 +1617,10 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 			dma_fence_wait(fence, false);
 
 		/* TLB invalidation must be done before signaling unbind */
-		err = invalidation_fence_init(tile->primary_gt, ifence, fence, vma);
+		err = invalidation_fence_init(tile->primary_gt, ifence, fence,
+					      xe_vma_start(vma),
+					      xe_vma_end(vma),
+					      xe_vma_vm(vma)->usm.asid);
 		if (err) {
 			dma_fence_put(fence);
 			kfree(ifence);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 12/22] drm/xe: Add some members to xe_vma_ops
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (10 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 11/22] drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 13/22] drm/xe: Add xe_vm_pgtable_update_op " Matthew Brost
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

This will help with moving to a single job for many bind operations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
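For illustration, how the new members are expected to be used by the
bind IOCTL (condensed from the diff below):

  struct xe_vma_ops vops;

  /* Queue / sync state now travels with the ops list */
  xe_vma_ops_init(&vops, vm, q, syncs, num_syncs);
  /* ... parse bind ops into vops.list ... */
  err = vm_bind_ioctl_ops_execute(vm, &vops);
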
 drivers/gpu/drm/xe/xe_vm.c       | 13 ++++++++++---
 drivers/gpu/drm/xe/xe_vm_types.h |  8 ++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 72f6faf6f9ef..c66e8cff07f8 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1285,9 +1285,16 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
 	}
 }
 
-static void xe_vma_ops_init(struct xe_vma_ops *vops)
+static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
+			    struct xe_exec_queue *q,
+			    struct xe_sync_entry *syncs, u32 num_syncs)
 {
+	memset(vops, 0, sizeof(*vops));
 	INIT_LIST_HEAD(&vops->list);
+	vops->vm = vm;
+	vops->q = q;
+	vops->syncs = syncs;
+	vops->num_syncs = num_syncs;
 }
 
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
@@ -1310,7 +1317,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 	init_rwsem(&vm->lock);
 
-	xe_vma_ops_init(&vm->dummy_ops.vops);
+	xe_vma_ops_init(&vm->dummy_ops.vops, vm, NULL, NULL, 0);
 	INIT_LIST_HEAD(&vm->dummy_ops.op.link);
 	list_add(&vm->dummy_ops.op.link, &vm->dummy_ops.vops.list);
 
@@ -3089,7 +3096,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto free_syncs;
 	}
 
-	xe_vma_ops_init(&vops);
+	xe_vma_ops_init(&vops, vm, q, syncs, num_syncs);
 	for (i = 0; i < args->num_binds; ++i) {
 		u64 range = bind_ops[i].range;
 		u64 addr = bind_ops[i].addr;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 6655a0645a18..4ba05315476f 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -201,6 +201,14 @@ struct xe_vma_op {
 struct xe_vma_ops {
 	/** @list: list of VMA operations */
 	struct list_head list;
+	/** @vm: VM */
+	struct xe_vm *vm;
+	/** @q: exec queue for these operations */
+	struct xe_exec_queue *q;
+	/** @syncs: syncs for these operations */
+	struct xe_sync_entry *syncs;
+	/** @num_syncs: number of syncs */
+	u32 num_syncs;
 };
 
 struct xe_vm {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 13/22] drm/xe: Add xe_vm_pgtable_update_op to xe_vma_ops
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (11 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 12/22] drm/xe: Add some members to xe_vma_ops Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 14/22] drm/xe: Convert multiple bind ops into single job Matthew Brost
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

This will help with the conversion to 1 job per VM bind IOCTL. Only the
allocation is implemented in this patch.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
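The intended flow, condensed from the diff below: each parsed GPUVA op
bumps a per-tile count, then a single allocation per tile sizes that
tile's array of page table update ops:

  /* While parsing ops */
  xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);

  /* After parsing, before execution */
  err = xe_vma_ops_alloc(&vops);
  if (err)
          goto unwind_ops;
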
 drivers/gpu/drm/xe/xe_pt_types.h | 12 ++++++
 drivers/gpu/drm/xe/xe_vm.c       | 67 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/xe/xe_vm_types.h |  8 ++++
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index cee70cb0f014..2093150f461e 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -74,4 +74,16 @@ struct xe_vm_pgtable_update {
 	u32 flags;
 };
 
+/** struct xe_vm_pgtable_update_op - Page table update operation */
+struct xe_vm_pgtable_update_op {
+	/** @entries: entries to update for this operation */
+	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
+	/** @num_entries: number of entries for this update operation */
+	u32 num_entries;
+	/** @bind: is a bind */
+	bool bind;
+	/** @rebind: is a rebind */
+	bool rebind;
+};
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index c66e8cff07f8..13776d988b49 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1297,6 +1297,45 @@ static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
 	vops->num_syncs = num_syncs;
 }
 
+static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
+{
+	int i, j;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
+		if (!vops->pt_update_ops[i].num_ops)
+			continue;
+
+		vops->pt_update_ops[i].ops =
+			kmalloc_array(vops->pt_update_ops[i].num_ops,
+				      sizeof(*vops->pt_update_ops[i].ops),
+				      GFP_KERNEL);
+		if (!vops->pt_update_ops[i].ops)
+			return -ENOMEM;
+
+		for (j = 0; j < vops->pt_update_ops[i].num_ops; ++j)
+			vops->pt_update_ops[i].ops[j].num_entries = 0;
+	}
+
+	return 0;
+}
+
+static void xe_vma_ops_fini(struct xe_vma_ops *vops)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
+		kfree(vops->pt_update_ops[i].ops);
+}
+
+static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops, u8 tile_mask)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
+		if (BIT(i) & tile_mask)
+			++vops->pt_update_ops[i].num_ops;
+}
+
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 {
 	struct drm_gem_object *vm_resv_obj;
@@ -1320,6 +1359,11 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	xe_vma_ops_init(&vm->dummy_ops.vops, vm, NULL, NULL, 0);
 	INIT_LIST_HEAD(&vm->dummy_ops.op.link);
 	list_add(&vm->dummy_ops.op.link, &vm->dummy_ops.vops.list);
+	for (id = 0; id < XE_MAX_TILES_PER_DEVICE; ++id)
+		vm->dummy_ops.vops.pt_update_ops[id].num_ops = 1;
+	err = xe_vma_ops_alloc(&vm->dummy_ops.vops);
+	if (err)
+		goto err_free;
 
 	INIT_LIST_HEAD(&vm->rebind_list);
 
@@ -1445,11 +1489,13 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	return ERR_PTR(err);
 
 err_no_resv:
+	if (!(flags & XE_VM_FLAG_MIGRATION))
+		xe_device_mem_access_put(xe);
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);
+err_free:
+	xe_vma_ops_fini(&vm->dummy_ops.vops);
 	kfree(vm);
-	if (!(flags & XE_VM_FLAG_MIGRATION))
-		xe_device_mem_access_put(xe);
 	return ERR_PTR(err);
 }
 
@@ -1585,6 +1631,7 @@ static void vm_destroy_work_func(struct work_struct *w)
 
 	trace_xe_vm_free(vm);
 	dma_fence_put(vm->rebind_fence);
+	xe_vma_ops_fini(&vm->dummy_ops.vops);
 	kfree(vm);
 }
 
@@ -2239,7 +2286,6 @@ static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
 	return err;
 }
 
-
 static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				   struct drm_gpuva_ops *ops,
 				   struct xe_sync_entry *syncs, u32 num_syncs,
@@ -2288,6 +2334,9 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				return PTR_ERR(vma);
 
 			op->map.vma = vma;
+			if (op->map.immediate || !xe_vm_in_fault_mode(vm))
+				xe_vma_ops_incr_pt_update_ops(vops,
+							      op->tile_mask);
 			break;
 		}
 		case DRM_GPUVA_OP_REMAP:
@@ -2326,6 +2375,8 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 						xe_vma_end(vma) -
 						xe_vma_start(old);
 					op->remap.start = xe_vma_end(vma);
+				} else {
+					xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 				}
 			}
 
@@ -2356,13 +2407,16 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 					op->remap.range -=
 						xe_vma_end(old) -
 						xe_vma_start(vma);
+				} else {
+					xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 				}
 			}
+			xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 			break;
 		}
 		case DRM_GPUVA_OP_UNMAP:
 		case DRM_GPUVA_OP_PREFETCH:
-			/* Nothing to do */
+			xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 			break;
 		default:
 			drm_warn(&vm->xe->drm, "NOT POSSIBLE");
@@ -3127,11 +3181,16 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		goto unwind_ops;
 	}
 
+	err = xe_vma_ops_alloc(&vops);
+	if (err)
+		goto unwind_ops;
+
 	err = vm_bind_ioctl_ops_execute(vm, &vops);
 
 unwind_ops:
 	if (err && err != -ENODATA)
 		vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
+	xe_vma_ops_fini(&vops);
 	for (i = args->num_binds - 1; i >= 0; --i)
 		if (ops[i])
 			drm_gpuva_ops_free(&vm->gpuvm, ops[i]);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 4ba05315476f..1506c0d1338d 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -21,6 +21,7 @@ struct xe_bo;
 struct xe_device;
 struct xe_sync_entry;
 struct xe_vm;
+struct xe_vm_pgtable_update_op;
 
 #define XE_VMA_READ_ONLY	DRM_GPUVA_USERBITS
 #define XE_VMA_DESTROYED	(DRM_GPUVA_USERBITS << 1)
@@ -209,6 +210,13 @@ struct xe_vma_ops {
 	struct xe_sync_entry *syncs;
 	/** @num_syncs: number of syncs */
 	u32 num_syncs;
+	/** @pt_update_ops: page table update operations */
+	struct {
+		/** @ops: operations */
+		struct xe_vm_pgtable_update_op *ops;
+		/** @num_ops: number of operations */
+		u32 num_ops;
+	} pt_update_ops[XE_MAX_TILES_PER_DEVICE];
 };
 
 struct xe_vm {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 14/22] drm/xe: Convert multiple bind ops into single job
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (12 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 13/22] drm/xe: Add xe_vm_pgtable_update_op " Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 15/22] drm/xe: Remove old functions defs in xe_pt.h Matthew Brost
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

This aligns with the uAPI, where an array of binds, or a single bind
that results in multiple GPUVA ops, is considered a single atomic
operation.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
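With all updates attached to a single xe_vma_ops, the migrate layer can
pick one path for the whole IOCTL (condensed from the diff below): try
the CPU path first, else emit a single GPU job covering every page
table update op:

  fence = xe_migrate_update_pgtables_cpu(m, pt_update);
  if (!IS_ERR(fence))
          return fence;   /* all dependencies signaled, updated by CPU */

  /* Otherwise one batch buffer covers every pt_update_op */
  return __xe_migrate_update_pgtables(m, pt_update, pt_update_ops);
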
 drivers/gpu/drm/xe/xe_bo_types.h     |    2 +
 drivers/gpu/drm/xe/xe_exec.c         |   25 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c |    6 +-
 drivers/gpu/drm/xe/xe_migrate.c      |  298 ++++----
 drivers/gpu/drm/xe/xe_migrate.h      |   32 +-
 drivers/gpu/drm/xe/xe_pt.c           | 1006 ++++++++++++++++----------
 drivers/gpu/drm/xe/xe_pt.h           |    7 +
 drivers/gpu/drm/xe/xe_pt_types.h     |   34 +
 drivers/gpu/drm/xe/xe_vm.c           |  534 ++++----------
 drivers/gpu/drm/xe/xe_vm.h           |    5 +-
 drivers/gpu/drm/xe/xe_vm_types.h     |   29 +-
 11 files changed, 959 insertions(+), 1019 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 14ef13b7b421..1ff5b5a68adc 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -77,6 +77,8 @@ struct xe_bo {
 	} props;
 	/** @freed: List node for delayed put. */
 	struct llist_node freed;
+	/** @update_index: Update index if PT BO */
+	int update_index;
 	/** @created: Whether the bo has passed initial creation */
 	bool created;
 
diff --git a/drivers/gpu/drm/xe/xe_exec.c b/drivers/gpu/drm/xe/xe_exec.c
index 952496c6260d..4feff67620c4 100644
--- a/drivers/gpu/drm/xe/xe_exec.c
+++ b/drivers/gpu/drm/xe/xe_exec.c
@@ -294,30 +294,9 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		err = PTR_ERR(rebind_fence);
 		goto err_put_job;
 	}
+	dma_fence_put(rebind_fence);
 
-	/*
-	 * We store the rebind_fence in the VM so subsequent execs don't get
-	 * scheduled before the rebinds of userptrs / evicted BOs is complete.
-	 */
-	if (rebind_fence) {
-		dma_fence_put(vm->rebind_fence);
-		vm->rebind_fence = rebind_fence;
-	}
-	if (vm->rebind_fence) {
-		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			     &vm->rebind_fence->flags)) {
-			dma_fence_put(vm->rebind_fence);
-			vm->rebind_fence = NULL;
-		} else {
-			dma_fence_get(vm->rebind_fence);
-			err = drm_sched_job_add_dependency(&job->drm,
-							   vm->rebind_fence);
-			if (err)
-				goto err_put_job;
-		}
-	}
-
-	/* Wait behind munmap style rebinds */
+	/* Wait for rebinds */
 	if (!xe_vm_in_lr_mode(vm)) {
 		err = drm_sched_job_add_resv_dependencies(&job->drm,
 							  xe_vm_resv(vm),
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 5a7bd63025b3..27dbbdc9127e 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -206,9 +206,9 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	/* Bind VMA only to the GT that has faulted */
 	trace_xe_vma_pf_bind(vma);
-	xe_vm_populate_dummy_rebind(vm, vma);
-	vm->dummy_ops.op.tile_mask = BIT(tile->id);
-	vm->dummy_ops.op.q = xe_tile_migrate_exec_queue(tile);
+	xe_vm_populate_dummy_rebind(vm, vma, BIT(tile->id));
+	vm->dummy_ops.vops.pt_update_ops[tile->id].q =
+		xe_tile_migrate_exec_queue(tile);
 	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
 	if (IS_ERR(fence)) {
 		ret = PTR_ERR(fence);
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 7b1dde578176..afad09591fbf 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1105,6 +1105,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 }
 
 static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
+			  const struct xe_vm_pgtable_update_op *pt_op,
 			  const struct xe_vm_pgtable_update *update,
 			  struct xe_migrate_pt_update *pt_update)
 {
@@ -1139,8 +1140,12 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
 		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
 		bb->cs[bb->len++] = lower_32_bits(addr);
 		bb->cs[bb->len++] = upper_32_bits(addr);
-		ops->populate(pt_update, tile, NULL, bb->cs + bb->len, ofs, chunk,
-			      update);
+		if (pt_op->bind)
+			ops->populate(pt_update, tile, NULL, bb->cs + bb->len,
+				      ofs, chunk, update);
+		else
+			ops->clear(pt_update, tile, NULL, bb->cs + bb->len,
+				   ofs, chunk, update);
 
 		bb->len += chunk * 2;
 		ofs += chunk;
@@ -1165,114 +1170,58 @@ struct migrate_test_params {
 
 static struct dma_fence *
 xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
-			       struct xe_vm *vm, struct xe_bo *bo,
-			       const struct  xe_vm_pgtable_update *updates,
-			       u32 num_updates, bool wait_vm,
 			       struct xe_migrate_pt_update *pt_update)
 {
 	XE_TEST_DECLARE(struct migrate_test_params *test =
 			to_migrate_test_params
 			(xe_cur_kunit_priv(XE_TEST_LIVE_MIGRATE));)
 	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
-	struct dma_fence *fence;
+	struct xe_vm *vm = pt_update->vops->vm;
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&pt_update->vops->pt_update_ops[pt_update->tile_id];
 	int err;
-	u32 i;
+	u32 j, i;
 
 	if (XE_TEST_ONLY(test && test->force_gpu))
 		return ERR_PTR(-ETIME);
 
-	if (bo && !dma_resv_test_signaled(bo->ttm.base.resv,
-					  DMA_RESV_USAGE_KERNEL))
-		return ERR_PTR(-ETIME);
-
-	if (wait_vm && !dma_resv_test_signaled(xe_vm_resv(vm),
-					       DMA_RESV_USAGE_BOOKKEEP))
-		return ERR_PTR(-ETIME);
-
 	if (ops->pre_commit) {
 		pt_update->job = NULL;
 		err = ops->pre_commit(pt_update);
 		if (err)
 			return ERR_PTR(err);
 	}
-	for (i = 0; i < num_updates; i++) {
-		const struct xe_vm_pgtable_update *update = &updates[i];
-
-		ops->populate(pt_update, m->tile, &update->pt_bo->vmap, NULL,
-			      update->ofs, update->qwords, update);
-	}
-
-	if (vm) {
-		trace_xe_vm_cpu_bind(vm);
-		xe_device_wmb(vm->xe);
-	}
-
-	fence = dma_fence_get_stub();
-
-	return fence;
-}
-
-static bool no_in_syncs(struct xe_vm *vm, struct xe_exec_queue *q,
-			struct xe_sync_entry *syncs, u32 num_syncs)
-{
-	struct dma_fence *fence;
-	int i;
 
-	for (i = 0; i < num_syncs; i++) {
-		fence = syncs[i].fence;
-
-		if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				       &fence->flags))
-			return false;
-	}
-	if (q) {
-		fence = xe_exec_queue_last_fence_get(q, vm);
-		if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-			dma_fence_put(fence);
-			return false;
+	for (j = 0; j < pt_update_ops->num_ops; ++j) {
+		const struct xe_vm_pgtable_update_op *pt_op =
+			&pt_update_ops->ops[j];
+
+		for (i = 0; i < pt_op->num_entries; i++) {
+			const struct xe_vm_pgtable_update *update =
+				&pt_op->entries[i];
+
+			if (pt_op->bind)
+				ops->populate(pt_update, m->tile,
+					      &update->pt_bo->vmap, NULL,
+					      update->ofs, update->qwords,
+					      update);
+			else
+				ops->clear(pt_update, m->tile,
+					   &update->pt_bo->vmap, NULL,
+					   update->ofs, update->qwords, update);
 		}
-		dma_fence_put(fence);
 	}
 
-	return true;
+	trace_xe_vm_cpu_bind(vm);
+	xe_device_wmb(vm->xe);
+
+	return dma_fence_get_stub();
 }
 
-/**
- * xe_migrate_update_pgtables() - Pipelined page-table update
- * @m: The migrate context.
- * @vm: The vm we'll be updating.
- * @bo: The bo whose dma-resv we will await before updating, or NULL if userptr.
- * @q: The exec queue to be used for the update or NULL if the default
- * migration engine is to be used.
- * @updates: An array of update descriptors.
- * @num_updates: Number of descriptors in @updates.
- * @syncs: Array of xe_sync_entry to await before updating. Note that waits
- * will block the engine timeline.
- * @num_syncs: Number of entries in @syncs.
- * @pt_update: Pointer to a struct xe_migrate_pt_update, which contains
- * pointers to callback functions and, if subclassed, private arguments to
- * those.
- *
- * Perform a pipelined page-table update. The update descriptors are typically
- * built under the same lock critical section as a call to this function. If
- * using the default engine for the updates, they will be performed in the
- * order they grab the job_mutex. If different engines are used, external
- * synchronization is needed for overlapping updates to maintain page-table
- * consistency. Note that the meaing of "overlapping" is that the updates
- * touch the same page-table, which might be a higher-level page-directory.
- * If no pipelining is needed, then updates may be performed by the cpu.
- *
- * Return: A dma_fence that, when signaled, indicates the update completion.
- */
-struct dma_fence *
-xe_migrate_update_pgtables(struct xe_migrate *m,
-			   struct xe_vm *vm,
-			   struct xe_bo *bo,
-			   struct xe_exec_queue *q,
-			   const struct xe_vm_pgtable_update *updates,
-			   u32 num_updates,
-			   struct xe_sync_entry *syncs, u32 num_syncs,
-			   struct xe_migrate_pt_update *pt_update)
+static struct dma_fence *
+__xe_migrate_update_pgtables(struct xe_migrate *m,
+			     struct xe_migrate_pt_update *pt_update,
+			     struct xe_vm_pgtable_update_ops *pt_update_ops)
 {
 	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
 	struct xe_tile *tile = m->tile;
@@ -1281,59 +1230,45 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
 	struct drm_suballoc *sa_bo = NULL;
-	struct xe_vma *vma = pt_update->vma;
 	struct xe_bb *bb;
-	u32 i, batch_size, ppgtt_ofs, update_idx, page_ofs = 0;
+	u32 i, j, batch_size = 0, ppgtt_ofs, update_idx, page_ofs = 0;
+	u32 num_updates = 0, current_update = 0;
 	u64 addr;
 	int err = 0;
-	bool usm = !q && xe->info.has_usm;
-	bool first_munmap_rebind = vma &&
-		vma->gpuva.flags & XE_VMA_FIRST_REBIND;
-	struct xe_exec_queue *q_override = !q ? m->q : q;
-	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
+	bool is_migrate = pt_update_ops->q == m->q;
+	bool usm = is_migrate && xe->info.has_usm;
 
-	/* Use the CPU if no in syncs and engine is idle */
-	if (no_in_syncs(vm, q, syncs, num_syncs) && xe_exec_queue_is_idle(q_override)) {
-		fence =  xe_migrate_update_pgtables_cpu(m, vm, bo, updates,
-							num_updates,
-							first_munmap_rebind,
-							pt_update);
-		if (!IS_ERR(fence) || fence == ERR_PTR(-EAGAIN))
-			return fence;
+	for (i = 0; i < pt_update_ops->num_ops; ++i) {
+		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+		struct xe_vm_pgtable_update *updates = pt_op->entries;
+
+		num_updates += pt_op->num_entries;
+		for (j = 0; j < pt_op->num_entries; ++j) {
+			u32 num_cmds = DIV_ROUND_UP(updates[j].qwords, MAX_PTE_PER_SDI);
+
+			/* align noop + MI_STORE_DATA_IMM cmd prefix */
+			batch_size += 4 * num_cmds + updates[j].qwords * 2;
+		}
 	}
 
 	/* fixed + PTE entries */
 	if (IS_DGFX(xe))
-		batch_size = 2;
+		batch_size += 2;
 	else
-		batch_size = 6 + num_updates * 2;
+		batch_size += 6 + num_updates * 2;
 
-	for (i = 0; i < num_updates; i++) {
-		u32 num_cmds = DIV_ROUND_UP(updates[i].qwords, MAX_PTE_PER_SDI);
-
-		/* align noop + MI_STORE_DATA_IMM cmd prefix */
-		batch_size += 4 * num_cmds + updates[i].qwords * 2;
-	}
-
-	/*
-	 * XXX: Create temp bo to copy from, if batch_size becomes too big?
-	 *
-	 * Worst case: Sum(2 * (each lower level page size) + (top level page size))
-	 * Should be reasonably bound..
-	 */
-	xe_tile_assert(tile, batch_size < SZ_128K);
-
-	bb = xe_bb_new(gt, batch_size, !q && xe->info.has_usm);
+	bb = xe_bb_new(gt, batch_size, usm);
 	if (IS_ERR(bb))
 		return ERR_CAST(bb);
 
 	/* For sysmem PTE's, need to map them in our hole.. */
 	if (!IS_DGFX(xe)) {
 		ppgtt_ofs = NUM_KERNEL_PDE - 1;
-		if (q) {
-			xe_tile_assert(tile, num_updates <= NUM_VMUSA_WRITES_PER_UNIT);
+		if (!is_migrate) {
+			u32 num_units = DIV_ROUND_UP(num_updates,
+						     NUM_VMUSA_WRITES_PER_UNIT);
 
-			sa_bo = drm_suballoc_new(&m->vm_update_sa, 1,
+			sa_bo = drm_suballoc_new(&m->vm_update_sa, num_units,
 						 GFP_KERNEL, true, 0);
 			if (IS_ERR(sa_bo)) {
 				err = PTR_ERR(sa_bo);
@@ -1353,14 +1288,26 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 		bb->cs[bb->len++] = ppgtt_ofs * XE_PAGE_SIZE + page_ofs;
 		bb->cs[bb->len++] = 0; /* upper_32_bits */
 
-		for (i = 0; i < num_updates; i++) {
-			struct xe_bo *pt_bo = updates[i].pt_bo;
+		for (i = 0; i < pt_update_ops->num_ops; ++i) {
+			struct xe_vm_pgtable_update_op *pt_op =
+				&pt_update_ops->ops[i];
+			struct xe_vm_pgtable_update *updates = pt_op->entries;
 
-			xe_tile_assert(tile, pt_bo->size == SZ_4K);
+			for (j = 0; j < pt_op->num_entries; ++j, ++current_update) {
+				struct xe_vm *vm = pt_update->vops->vm;
+				struct xe_bo *pt_bo = updates[j].pt_bo;
 
-			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, pat_index, 0);
-			bb->cs[bb->len++] = lower_32_bits(addr);
-			bb->cs[bb->len++] = upper_32_bits(addr);
+				xe_tile_assert(tile, pt_bo->size == SZ_4K);
+
+				/* Map a PT at most once */
+				if (pt_bo->update_index < 0)
+					pt_bo->update_index = current_update;
+
+				addr = vm->pt_ops->pte_encode_bo(pt_bo, 0,
+								 XE_CACHE_WB, 0);
+				bb->cs[bb->len++] = lower_32_bits(addr);
+				bb->cs[bb->len++] = upper_32_bits(addr);
+			}
 		}
 
 		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
@@ -1368,22 +1315,39 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 
 		addr = xe_migrate_vm_addr(ppgtt_ofs, 0) +
 			(page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
-		for (i = 0; i < num_updates; i++)
-			write_pgtable(tile, bb, addr + i * XE_PAGE_SIZE,
-				      &updates[i], pt_update);
+		for (i = 0; i < pt_update_ops->num_ops; ++i) {
+			struct xe_vm_pgtable_update_op *pt_op =
+				&pt_update_ops->ops[i];
+			struct xe_vm_pgtable_update *updates = pt_op->entries;
+
+			for (j = 0; j < pt_op->num_entries; ++j) {
+				struct xe_bo *pt_bo = updates[j].pt_bo;
+
+				write_pgtable(tile, bb, addr +
+					      pt_bo->update_index * XE_PAGE_SIZE,
+					      pt_op, &updates[j], pt_update);
+			}
+		}
 	} else {
 		/* phys pages, no preamble required */
 		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
 		update_idx = bb->len;
 
-		for (i = 0; i < num_updates; i++)
-			write_pgtable(tile, bb, 0, &updates[i], pt_update);
+		for (i = 0; i < pt_update_ops->num_ops; ++i) {
+			struct xe_vm_pgtable_update_op *pt_op =
+				&pt_update_ops->ops[i];
+			struct xe_vm_pgtable_update *updates = pt_op->entries;
+
+			for (j = 0; j < pt_op->num_entries; ++j)
+				write_pgtable(tile, bb, 0, pt_op, &updates[j],
+					      pt_update);
+		}
 	}
 
-	if (!q)
+	if (is_migrate)
 		mutex_lock(&m->job_mutex);
 
-	job = xe_bb_create_migration_job(q ?: m->q, bb,
+	job = xe_bb_create_migration_job(pt_update_ops->q, bb,
 					 xe_migrate_batch_base(m, usm),
 					 update_idx);
 	if (IS_ERR(job)) {
@@ -1391,32 +1355,6 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 		goto err_bb;
 	}
 
-	/* Wait on BO move */
-	if (bo) {
-		err = job_add_deps(job, bo->ttm.base.resv,
-				   DMA_RESV_USAGE_KERNEL);
-		if (err)
-			goto err_job;
-	}
-
-	/*
-	 * Munmap style VM unbind, need to wait for all jobs to be complete /
-	 * trigger preempts before moving forward
-	 */
-	if (first_munmap_rebind) {
-		err = job_add_deps(job, xe_vm_resv(vm),
-				   DMA_RESV_USAGE_BOOKKEEP);
-		if (err)
-			goto err_job;
-	}
-
-	err = xe_sched_job_last_fence_add_dep(job, vm);
-	for (i = 0; !err && i < num_syncs; i++)
-		err = xe_sync_entry_add_deps(&syncs[i], job);
-
-	if (err)
-		goto err_job;
-
 	if (ops->pre_commit) {
 		pt_update->job = job;
 		err = ops->pre_commit(pt_update);
@@ -1427,7 +1365,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	fence = dma_fence_get(&job->drm.s_fence->finished);
 	xe_sched_job_push(job);
 
-	if (!q)
+	if (is_migrate)
 		mutex_unlock(&m->job_mutex);
 
 	xe_bb_free(bb, fence);
@@ -1438,7 +1376,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 err_job:
 	xe_sched_job_put(job);
 err_bb:
-	if (!q)
+	if (is_migrate)
 		mutex_unlock(&m->job_mutex);
 	xe_bb_free(bb, NULL);
 err:
@@ -1446,6 +1384,38 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	return ERR_PTR(err);
 }
 
+/**
+ * xe_migrate_update_pgtables() - Pipelined page-table update
+ * @m: The migrate context.
+ * @pt_update: PT update arguments
+ *
+ * Perform a pipelined page-table update. The update descriptors are typically
+ * built under the same lock critical section as a call to this function. If
+ * using the default engine for the updates, they will be performed in the
+ * order they grab the job_mutex. If different engines are used, external
+ * synchronization is needed for overlapping updates to maintain page-table
+ * consistency. Note that the meaning of "overlapping" is that the updates
+ * touch the same page-table, which might be a higher-level page-directory.
+ * If no pipelining is needed, then updates may be performed by the cpu.
+ *
+ * Return: A dma_fence that, when signaled, indicates the update completion.
+ */
+struct dma_fence *
+xe_migrate_update_pgtables(struct xe_migrate *m,
+			   struct xe_migrate_pt_update *pt_update)
+
+{
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&pt_update->vops->pt_update_ops[pt_update->tile_id];
+	struct dma_fence *fence;
+
+	fence = xe_migrate_update_pgtables_cpu(m, pt_update);
+	if (!IS_ERR(fence))
+		return fence;
+
+	return __xe_migrate_update_pgtables(m, pt_update, pt_update_ops);
+}
+
 /**
  * xe_migrate_wait() - Complete all operations using the xe_migrate context
  * @m: Migrate context to wait for.
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 9935ce336bae..bd8eba1d3552 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -47,6 +47,24 @@ struct xe_migrate_pt_update_ops {
 			 struct xe_tile *tile, struct iosys_map *map,
 			 void *pos, u32 ofs, u32 num_qwords,
 			 const struct xe_vm_pgtable_update *update);
+	/**
+	 * @clear: Clear a command buffer or page-table with ptes.
+	 * @pt_update: Embeddable callback argument.
+	 * @tile: The tile for the current operation.
+	 * @map: struct iosys_map into the memory to be populated.
+	 * @pos: If @map is NULL, map into the memory to be populated.
+	 * @ofs: qword offset into @map, unused if @map is NULL.
+	 * @num_qwords: Number of qwords to write.
+	 * @update: Information about the PTEs to be inserted.
+	 *
+	 * This interface is intended to be used as a callback into the
+	 * page-table system to clear command buffers or shared
+	 * page-tables with PTEs.
+	 */
+	void (*clear)(struct xe_migrate_pt_update *pt_update,
+		      struct xe_tile *tile, struct iosys_map *map,
+		      void *pos, u32 ofs, u32 num_qwords,
+		      const struct xe_vm_pgtable_update *update);
 
 	/**
 	 * @pre_commit: Callback to be called just before arming the
@@ -67,14 +85,10 @@ struct xe_migrate_pt_update_ops {
 struct xe_migrate_pt_update {
 	/** @ops: Pointer to the struct xe_migrate_pt_update_ops callbacks */
 	const struct xe_migrate_pt_update_ops *ops;
-	/** @vma: The vma we're updating the pagetable for. */
-	struct xe_vma *vma;
+	/** @vops: VMA operations */
+	struct xe_vma_ops *vops;
 	/** @job: The job if a GPU page-table update. NULL otherwise */
 	struct xe_sched_job *job;
-	/** @start: Start of update for the range fence */
-	u64 start;
-	/** @last: Last of update for the range fence */
-	u64 last;
 	/** @tile_id: Tile ID of the update */
 	u8 tile_id;
 };
@@ -96,12 +110,6 @@ struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
 
 struct dma_fence *
 xe_migrate_update_pgtables(struct xe_migrate *m,
-			   struct xe_vm *vm,
-			   struct xe_bo *bo,
-			   struct xe_exec_queue *q,
-			   const struct xe_vm_pgtable_update *updates,
-			   u32 num_updates,
-			   struct xe_sync_entry *syncs, u32 num_syncs,
 			   struct xe_migrate_pt_update *pt_update);
 
 void xe_migrate_wait(struct xe_migrate *m);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index ada5a65e62dc..7b7b3f99321d 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -8,12 +8,14 @@
 #include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_drm_client.h"
+#include "xe_exec_queue.h"
 #include "xe_gt.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_migrate.h"
 #include "xe_pt_types.h"
 #include "xe_pt_walk.h"
 #include "xe_res_cursor.h"
+#include "xe_sync.h"
 #include "xe_trace.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_vm.h"
@@ -313,6 +315,7 @@ xe_pt_new_shared(struct xe_walk_update *wupd, struct xe_pt *parent,
 	entry->pt = parent;
 	entry->flags = 0;
 	entry->qwords = 0;
+	entry->pt_bo->update_index = -1;
 
 	if (alloc_entries) {
 		entry->pt_entries = kmalloc_array(XE_PDES,
@@ -815,9 +818,7 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
 
 	lockdep_assert_held(&vm->lock);
 
-	if (xe_vma_is_userptr(vma))
-		lockdep_assert_held_read(&vm->userptr.notifier_lock);
-	else if (!xe_vma_is_null(vma))
+	if (!xe_vma_is_userptr(vma) && !xe_vma_is_null(vma))
 		dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv);
 
 	xe_vm_assert_held(vm);
@@ -839,10 +840,8 @@ static void xe_pt_commit_bind(struct xe_vma *vma,
 		if (!rebind)
 			pt->num_live += entries[i].qwords;
 
-		if (!pt->level) {
-			kfree(entries[i].pt_entries);
+		if (!pt->level)
 			continue;
-		}
 
 		pt_dir = as_xe_pt_dir(pt);
 		for (j = 0; j < entries[i].qwords; j++) {
@@ -855,10 +854,18 @@ static void xe_pt_commit_bind(struct xe_vma *vma,
 
 			pt_dir->dir.entries[j_] = &newpte->base;
 		}
-		kfree(entries[i].pt_entries);
 	}
 }
 
+static void xe_pt_free_bind(struct xe_vm_pgtable_update *entries,
+			    u32 num_entries)
+{
+	u32 i;
+
+	for (i = 0; i < num_entries; i++)
+		kfree(entries[i].pt_entries);
+}
+
 static int
 xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma,
 		   struct xe_vm_pgtable_update *entries, u32 *num_entries)
@@ -903,66 +910,122 @@ static void xe_vm_dbg_print_entries(struct xe_device *xe,
 {}
 #endif
 
-#ifdef CONFIG_DRM_XE_USERPTR_INVAL_INJECT
+static int job_add_deps(struct xe_sched_job *job, struct dma_resv *resv,
+			enum dma_resv_usage usage)
+{
+	return drm_sched_job_add_resv_dependencies(&job->drm, resv, usage);
+}
 
-static int xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
+static bool no_in_syncs(struct xe_sync_entry *syncs, u32 num_syncs)
 {
-	u32 divisor = uvma->userptr.divisor ? uvma->userptr.divisor : 2;
-	static u32 count;
+	int i;
 
-	if (count++ % divisor == divisor - 1) {
-		struct xe_vm *vm = xe_vma_vm(&uvma->vma);
+	for (i = 0; i < num_syncs; i++) {
+		struct dma_fence *fence = syncs[i].fence;
 
-		uvma->userptr.divisor = divisor << 1;
-		spin_lock(&vm->userptr.invalidated_lock);
-		list_move_tail(&uvma->userptr.invalidate_link,
-			       &vm->userptr.invalidated);
-		spin_unlock(&vm->userptr.invalidated_lock);
-		return true;
+		if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				       &fence->flags))
+			return false;
 	}
 
-	return false;
+	return true;
 }
 
-#else
-
-static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
+static int vma_add_deps(struct xe_vma *vma, struct xe_sched_job *job)
 {
-	return false;
+	struct xe_bo *bo = xe_vma_bo(vma);
+
+	xe_bo_assert_held(bo);
+
+	if (bo && !bo->vm) {
+		if (!job) {
+			if (!dma_resv_test_signaled(bo->ttm.base.resv,
+						    DMA_RESV_USAGE_KERNEL))
+				return -ETIME;
+		} else {
+			return job_add_deps(job, bo->ttm.base.resv,
+					    DMA_RESV_USAGE_KERNEL);
+		}
+	}
+
+	return 0;
 }
 
-#endif
+static int op_add_deps(struct xe_vm *vm, struct xe_vma_op *op,
+		       struct xe_sched_job *job)
+{
+	int err = 0;
 
-/**
- * struct xe_pt_migrate_pt_update - Callback argument for pre-commit callbacks
- * @base: Base we derive from.
- * @bind: Whether this is a bind or an unbind operation. A bind operation
- *        makes the pre-commit callback error with -EAGAIN if it detects a
- *        pending invalidation.
- * @locked: Whether the pre-commit callback locked the userptr notifier lock
- *          and it needs unlocking.
- */
-struct xe_pt_migrate_pt_update {
-	struct xe_migrate_pt_update base;
-	bool bind;
-	bool locked;
-};
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
+			break;
+
+		err = vma_add_deps(op->map.vma, job);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		if (op->remap.prev)
+			err = vma_add_deps(op->remap.prev, job);
+		if (!err && op->remap.next)
+			err = vma_add_deps(op->remap.next, job);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		err = vma_add_deps(gpuva_to_vma(op->base.prefetch.va), job);
+		break;
+	default:
+		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
+	}
+
+	return err;
+}
 
-/*
- * This function adds the needed dependencies to a page-table update job
- * to make sure racing jobs for separate bind engines don't race writing
- * to the same page-table range, wreaking havoc. Initially use a single
- * fence for the entire VM. An optimization would use smaller granularity.
- */
 static int xe_pt_vm_dependencies(struct xe_sched_job *job,
-				 struct xe_range_fence_tree *rftree,
-				 u64 start, u64 last)
+				 struct xe_vm *vm,
+				 struct xe_vma_ops *vops,
+				 struct xe_vm_pgtable_update_ops *pt_update_ops,
+				 struct xe_range_fence_tree *rftree)
 {
 	struct xe_range_fence *rtfence;
 	struct dma_fence *fence;
-	int err;
+	struct xe_vma_op *op;
+	int err = 0, i;
+
+	xe_vm_assert_held(vm);
+
+	if (!job && !no_in_syncs(vops->syncs, vops->num_syncs))
+		return -ETIME;
+
+	if (!job && !xe_exec_queue_is_idle(pt_update_ops->q))
+		return -ETIME;
+
+	if (pt_update_ops->wait_vm_bookkeep) {
+		if (!job) {
+			if (!dma_resv_test_signaled(xe_vm_resv(vm),
+						    DMA_RESV_USAGE_BOOKKEEP))
+				return -ETIME;
+		} else {
+			err = job_add_deps(job, xe_vm_resv(vm),
+					   DMA_RESV_USAGE_BOOKKEEP);
+			if (err)
+				return err;
+		}
+	} else if (pt_update_ops->wait_vm_kernel) {
+		if (!job) {
+			if (!dma_resv_test_signaled(xe_vm_resv(vm),
+						    DMA_RESV_USAGE_KERNEL))
+				return -ETIME;
+		} else {
+			err = job_add_deps(job, xe_vm_resv(vm),
+					   DMA_RESV_USAGE_KERNEL);
+			if (err)
+				return err;
+		}
+	}
 
-	rtfence = xe_range_fence_tree_first(rftree, start, last);
+	rtfence = xe_range_fence_tree_first(rftree, pt_update_ops->start,
+					    pt_update_ops->last);
 	while (rtfence) {
 		fence = rtfence->fence;
 
@@ -980,80 +1043,136 @@ static int xe_pt_vm_dependencies(struct xe_sched_job *job,
 				return err;
 		}
 
-		rtfence = xe_range_fence_tree_next(rtfence, start, last);
+		rtfence = xe_range_fence_tree_next(rtfence,
+						   pt_update_ops->start,
+						   pt_update_ops->last);
 	}
 
-	return 0;
+	list_for_each_entry(op, &vops->list, link) {
+		err = op_add_deps(vm, op, job);
+		if (err)
+			return err;
+	}
+
+	for (i = 0; job && !err && i < vops->num_syncs; i++)
+		err = xe_sync_entry_add_deps(&vops->syncs[i], job);
+
+	return err;
 }
 
 static int xe_pt_pre_commit(struct xe_migrate_pt_update *pt_update)
 {
-	struct xe_range_fence_tree *rftree =
-		&xe_vma_vm(pt_update->vma)->rftree[pt_update->tile_id];
+	struct xe_vma_ops *vops = pt_update->vops;
+	struct xe_vm *vm = vops->vm;
+	struct xe_range_fence_tree *rftree = &vm->rftree[pt_update->tile_id];
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[pt_update->tile_id];
+
+	return xe_pt_vm_dependencies(pt_update->job, vm, pt_update->vops,
+				     pt_update_ops, rftree);
+}
+
+#ifdef CONFIG_DRM_XE_USERPTR_INVAL_INJECT
+
+static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
+{
+	u32 divisor = uvma->userptr.divisor ? uvma->userptr.divisor : 2;
+	static u32 count;
+
+	if (count++ % divisor == divisor - 1) {
+		uvma->userptr.divisor = divisor << 1;
+		return true;
+	}
 
-	return xe_pt_vm_dependencies(pt_update->job, rftree,
-				     pt_update->start, pt_update->last);
+	return false;
 }
 
-static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
+#else
+
+static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
 {
-	struct xe_pt_migrate_pt_update *userptr_update =
-		container_of(pt_update, typeof(*userptr_update), base);
-	struct xe_userptr_vma *uvma = to_userptr_vma(pt_update->vma);
-	unsigned long notifier_seq = uvma->userptr.notifier_seq;
-	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
-	int err = xe_pt_vm_dependencies(pt_update->job,
-					&vm->rftree[pt_update->tile_id],
-					pt_update->start,
-					pt_update->last);
+	return false;
+}
 
-	if (err)
-		return err;
+#endif
 
-	userptr_update->locked = false;
+static void vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma)
+{
+	struct xe_userptr_vma *uvma = to_userptr_vma(vma);
+	unsigned long notifier_seq = uvma->userptr.notifier_seq;
 
-	/*
-	 * Wait until nobody is running the invalidation notifier, and
-	 * since we're exiting the loop holding the notifier lock,
-	 * nobody can proceed invalidating either.
-	 *
-	 * Note that we don't update the vma->userptr.notifier_seq since
-	 * we don't update the userptr pages.
-	 */
-	do {
-		down_read(&vm->userptr.notifier_lock);
-		if (!mmu_interval_read_retry(&uvma->userptr.notifier,
-					     notifier_seq))
-			break;
+	lockdep_assert_held_read(&vm->userptr.notifier_lock);
 
-		up_read(&vm->userptr.notifier_lock);
+	if (uvma->userptr.initial_bind || xe_vm_in_fault_mode(vm))
+		return;
 
-		if (userptr_update->bind)
-			return -EAGAIN;
+	if (!mmu_interval_read_retry(&uvma->userptr.notifier,
+				     notifier_seq) &&
+	    !xe_pt_userptr_inject_eagain(uvma))
+		return;
 
-		notifier_seq = mmu_interval_read_begin(&uvma->userptr.notifier);
-	} while (true);
+	spin_lock(&vm->userptr.invalidated_lock);
+	list_move_tail(&uvma->userptr.invalidate_link,
+		       &vm->userptr.invalidated);
+	spin_unlock(&vm->userptr.invalidated_lock);
 
-	/* Inject errors to test_whether they are handled correctly */
-	if (userptr_update->bind && xe_pt_userptr_inject_eagain(uvma)) {
-		up_read(&vm->userptr.notifier_lock);
-		return -EAGAIN;
+	if (xe_vm_in_preempt_fence_mode(vm)) {
+		struct dma_resv_iter cursor;
+		struct dma_fence *fence;
+
+		dma_resv_iter_begin(&cursor, xe_vm_resv(vm),
+				    DMA_RESV_USAGE_BOOKKEEP);
+		dma_resv_for_each_fence_unlocked(&cursor, fence)
+			dma_fence_enable_sw_signaling(fence);
+		dma_resv_iter_end(&cursor);
 	}
+}
+
+static void op_check_userptr(struct xe_vm *vm, struct xe_vma_op *op)
+{
+	lockdep_assert_held_read(&vm->userptr.notifier_lock);
 
-	userptr_update->locked = true;
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
+			break;
 
-	return 0;
+		vma_check_userptr(vm, op->map.vma);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		if (op->remap.prev)
+			vma_check_userptr(vm, op->remap.prev);
+		if (op->remap.next)
+			vma_check_userptr(vm, op->remap.next);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		vma_check_userptr(vm, gpuva_to_vma(op->base.prefetch.va));
+		break;
+	default:
+		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
+	}
 }
 
-static const struct xe_migrate_pt_update_ops bind_ops = {
-	.populate = xe_vm_populate_pgtable,
-	.pre_commit = xe_pt_pre_commit,
-};
+static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
+{
+	struct xe_vm *vm = pt_update->vops->vm;
+	struct xe_vma_ops *vops = pt_update->vops;
+	struct xe_vma_op *op;
+	int err;
 
-static const struct xe_migrate_pt_update_ops userptr_bind_ops = {
-	.populate = xe_vm_populate_pgtable,
-	.pre_commit = xe_pt_userptr_pre_commit,
-};
+	err = xe_pt_pre_commit(pt_update);
+	if (err)
+		return err;
+
+	down_read(&vm->userptr.notifier_lock);
+
+	list_for_each_entry(op, &vops->list, link)
+		op_check_userptr(vm, op);
+
+	return 0;
+}
 
 struct invalidation_fence {
 	struct xe_gt_tlb_invalidation_fence base;
@@ -1150,181 +1269,6 @@ static int invalidation_fence_init(struct xe_gt *gt,
 	return ret && ret != -ENOENT ? ret : 0;
 }
 
-static void xe_pt_calc_rfence_interval(struct xe_vma *vma,
-				       struct xe_pt_migrate_pt_update *update,
-				       struct xe_vm_pgtable_update *entries,
-				       u32 num_entries)
-{
-	int i, level = 0;
-
-	for (i = 0; i < num_entries; i++) {
-		const struct xe_vm_pgtable_update *entry = &entries[i];
-
-		if (entry->pt->level > level)
-			level = entry->pt->level;
-	}
-
-	/* Greedy (non-optimal) calculation but simple */
-	update->base.start = ALIGN_DOWN(xe_vma_start(vma),
-					0x1ull << xe_pt_shift(level));
-	update->base.last = ALIGN(xe_vma_end(vma),
-				  0x1ull << xe_pt_shift(level)) - 1;
-}
-
-/**
- * __xe_pt_bind_vma() - Build and connect a page-table tree for the vma
- * address range.
- * @tile: The tile to bind for.
- * @vma: The vma to bind.
- * @q: The exec_queue with which to do pipelined page-table updates.
- * @syncs: Entries to sync on before binding the built tree to the live vm tree.
- * @num_syncs: Number of @sync entries.
- * @rebind: Whether we're rebinding this vma to the same address range without
- * an unbind in-between.
- *
- * This function builds a page-table tree (see xe_pt_stage_bind() for more
- * information on page-table building), and the xe_vm_pgtable_update entries
- * abstracting the operations needed to attach it to the main vm tree. It
- * then takes the relevant locks and updates the metadata side of the main
- * vm tree and submits the operations for pipelined attachment of the
- * gpu page-table to the vm main tree, (which can be done either by the
- * cpu and the GPU).
- *
- * Return: A valid dma-fence representing the pipelined attachment operation
- * on success, an error pointer on error.
- */
-struct dma_fence *
-__xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
-		 struct xe_sync_entry *syncs, u32 num_syncs,
-		 bool rebind)
-{
-	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
-	struct xe_pt_migrate_pt_update bind_pt_update = {
-		.base = {
-			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops,
-			.vma = vma,
-			.tile_id = tile->id,
-		},
-		.bind = true,
-	};
-	struct xe_vm *vm = xe_vma_vm(vma);
-	u32 num_entries;
-	struct dma_fence *fence;
-	struct invalidation_fence *ifence = NULL;
-	struct xe_range_fence *rfence;
-	int err;
-
-	bind_pt_update.locked = false;
-	xe_bo_assert_held(xe_vma_bo(vma));
-	xe_vm_assert_held(vm);
-
-	vm_dbg(&xe_vma_vm(vma)->xe->drm,
-	       "Preparing bind, with range [%llx...%llx) engine %p.\n",
-	       xe_vma_start(vma), xe_vma_end(vma), q);
-
-	err = xe_pt_prepare_bind(tile, vma, entries, &num_entries);
-	if (err)
-		goto err;
-	xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
-
-	xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
-	xe_pt_calc_rfence_interval(vma, &bind_pt_update, entries,
-				   num_entries);
-
-	/*
-	 * If rebind, we have to invalidate TLB on !LR vms to invalidate
-	 * cached PTEs point to freed memory. on LR vms this is done
-	 * automatically when the context is re-enabled by the rebind worker,
-	 * or in fault mode it was invalidated on PTE zapping.
-	 *
-	 * If !rebind, and scratch enabled VMs, there is a chance the scratch
-	 * PTE is already cached in the TLB so it needs to be invalidated.
-	 * on !LR VMs this is done in the ring ops preceding a batch, but on
-	 * non-faulting LR, in particular on user-space batch buffer chaining,
-	 * it needs to be done here.
-	 */
-	if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) ||
-	    (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) {
-		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
-		if (!ifence)
-			return ERR_PTR(-ENOMEM);
-	}
-
-	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
-	if (!rfence) {
-		kfree(ifence);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	fence = xe_migrate_update_pgtables(tile->migrate,
-					   vm, xe_vma_bo(vma), q,
-					   entries, num_entries,
-					   syncs, num_syncs,
-					   &bind_pt_update.base);
-	if (!IS_ERR(fence)) {
-		bool last_munmap_rebind = vma->gpuva.flags & XE_VMA_LAST_REBIND;
-		LLIST_HEAD(deferred);
-		int err;
-
-		err = xe_range_fence_insert(&vm->rftree[tile->id], rfence,
-					    &xe_range_fence_kfree_ops,
-					    bind_pt_update.base.start,
-					    bind_pt_update.base.last, fence);
-		if (err)
-			dma_fence_wait(fence, false);
-
-		/* TLB invalidation must be done before signaling rebind */
-		if (ifence) {
-			int err = invalidation_fence_init(tile->primary_gt,
-							  ifence, fence,
-							  xe_vma_start(vma),
-							  xe_vma_end(vma),
-							  xe_vma_vm(vma)->usm.asid);
-			if (err) {
-				dma_fence_put(fence);
-				kfree(ifence);
-				return ERR_PTR(err);
-			}
-			fence = &ifence->base.base;
-		}
-
-		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind &&
-				   last_munmap_rebind ?
-				   DMA_RESV_USAGE_KERNEL :
-				   DMA_RESV_USAGE_BOOKKEEP);
-
-		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
-			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
-					   DMA_RESV_USAGE_BOOKKEEP);
-		xe_pt_commit_bind(vma, entries, num_entries, rebind,
-				  bind_pt_update.locked ? &deferred : NULL);
-
-		/* This vma is live (again?) now */
-		vma->tile_present |= BIT(tile->id);
-
-		if (bind_pt_update.locked) {
-			to_userptr_vma(vma)->userptr.initial_bind = true;
-			up_read(&vm->userptr.notifier_lock);
-			xe_bo_put_commit(&deferred);
-		}
-		if (!rebind && last_munmap_rebind &&
-		    xe_vm_in_preempt_fence_mode(vm))
-			xe_vm_queue_rebind_worker(vm);
-	} else {
-		kfree(rfence);
-		kfree(ifence);
-		if (bind_pt_update.locked)
-			up_read(&vm->userptr.notifier_lock);
-		xe_pt_abort_bind(vma, entries, num_entries);
-	}
-
-	return fence;
-
-err:
-	return ERR_PTR(err);
-}
-
 struct xe_pt_stage_unbind_walk {
 	/** @base: The pagewalk base-class. */
 	struct xe_pt_walk base;
@@ -1475,8 +1419,8 @@ xe_migrate_clear_pgtable_callback(struct xe_migrate_pt_update *pt_update,
 				  void *ptr, u32 qword_ofs, u32 num_qwords,
 				  const struct xe_vm_pgtable_update *update)
 {
-	struct xe_vma *vma = pt_update->vma;
-	u64 empty = __xe_pt_empty_pte(tile, xe_vma_vm(vma), update->pt->level);
+	struct xe_vm *vm = pt_update->vops->vm;
+	u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
 	int i;
 
 	if (map && map->is_iomem)
@@ -1520,144 +1464,436 @@ xe_pt_commit_unbind(struct xe_vma *vma,
 	}
 }
 
-static const struct xe_migrate_pt_update_ops unbind_ops = {
-	.populate = xe_migrate_clear_pgtable_callback,
+static void
+xe_pt_update_ops_rfence_interval(struct xe_vm_pgtable_update_ops *pt_update_ops,
+				 struct xe_vma *vma)
+{
+	u32 current_op = pt_update_ops->current_op;
+	struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+	int i, level = 0;
+	u64 start, last;
+
+	for (i = 0; i < pt_op->num_entries; i++) {
+		const struct xe_vm_pgtable_update *entry = &pt_op->entries[i];
+
+		if (entry->pt->level > level)
+			level = entry->pt->level;
+	}
+
+	/* Greedy (non-optimal) calculation but simple */
+	start = ALIGN_DOWN(xe_vma_start(vma), 0x1ull << xe_pt_shift(level));
+	last = ALIGN(xe_vma_end(vma), 0x1ull << xe_pt_shift(level)) - 1;
+
+	if (start < pt_update_ops->start)
+		pt_update_ops->start = start;
+	if (last > pt_update_ops->last)
+		pt_update_ops->last = last;
+}
+
+static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
+			   struct xe_vm_pgtable_update_ops *pt_update_ops,
+			   struct xe_vma *vma)
+{
+	u32 current_op = pt_update_ops->current_op;
+	struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+	struct llist_head *deferred = &pt_update_ops->deferred;
+	int err;
+
+	xe_bo_assert_held(xe_vma_bo(vma));
+
+	vm_dbg(&xe_vma_vm(vma)->xe->drm,
+	       "Preparing bind, with range [%llx...%llx)\n",
+	       xe_vma_start(vma), xe_vma_end(vma) - 1);
+
+	pt_op->bind = true;
+	pt_op->rebind = BIT(tile->id) & vma->tile_present;
+
+	err = xe_pt_prepare_bind(tile, vma, pt_op->entries,
+				 &pt_op->num_entries);
+	if (!err) {
+		xe_tile_assert(tile, pt_op->num_entries <=
+			       ARRAY_SIZE(pt_op->entries));
+		xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries,
+					pt_op->num_entries);
+
+		xe_pt_update_ops_rfence_interval(pt_update_ops, vma);
+		++pt_update_ops->current_op;
+		pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma);
+
+		/*
+		 * If rebind, we have to invalidate the TLB on !LR VMs to
+		 * invalidate cached PTEs pointing to freed memory. On LR VMs
+		 * this is done automatically when the context is re-enabled by
+		 * the rebind worker, or in fault mode it was invalidated on
+		 * PTE zapping.
+		 *
+		 * If !rebind, and on scratch-enabled VMs, there is a chance
+		 * the scratch PTE is already cached in the TLB so it needs to
+		 * be invalidated. On !LR VMs this is done in the ring ops
+		 * preceding a batch, but on non-faulting LR VMs, in particular
+		 * with user-space batch buffer chaining, it needs to be done
+		 * here.
+		 */
+		pt_update_ops->needs_invalidation |=
+			(pt_op->rebind && !xe_vm_in_lr_mode(vm) &&
+			!vm->batch_invalidate_tlb) ||
+			(!pt_op->rebind && vm->scratch_pt[tile->id] &&
+			 xe_vm_in_preempt_fence_mode(vm));
+
+		/* FIXME: Don't commit right away */
+		xe_pt_commit_bind(vma, pt_op->entries, pt_op->num_entries,
+				  pt_op->rebind, deferred);
+	}
+
+	return err;
+}
+
+static int unbind_op_prepare(struct xe_tile *tile,
+			     struct xe_vm_pgtable_update_ops *pt_update_ops,
+			     struct xe_vma *vma)
+{
+	u32 current_op = pt_update_ops->current_op;
+	struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
+	struct llist_head *deferred = &pt_update_ops->deferred;
+
+	xe_bo_assert_held(xe_vma_bo(vma));
+
+	vm_dbg(&xe_vma_vm(vma)->xe->drm,
+	       "Preparing unbind, with range [%llx...%llx)\n",
+	       xe_vma_start(vma), xe_vma_end(vma) - 1);
+
+	pt_op->bind = false;
+	pt_op->rebind = false;
+
+	pt_op->num_entries = xe_pt_stage_unbind(tile, vma, pt_op->entries);
+
+	xe_pt_update_ops_rfence_interval(pt_update_ops, vma);
+	++pt_update_ops->current_op;
+	pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma);
+	pt_update_ops->needs_invalidation = true;
+
+	/* FIXME: Don't commit right away */
+	xe_pt_commit_unbind(vma, pt_op->entries, pt_op->num_entries,
+			    deferred);
+
+	return 0;
+}
+
+static int op_prepare(struct xe_vm *vm,
+		      struct xe_tile *tile,
+		      struct xe_vm_pgtable_update_ops *pt_update_ops,
+		      struct xe_vma_op *op)
+{
+	int err = 0;
+
+	xe_vm_assert_held(vm);
+
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
+			break;
+
+		err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma);
+		pt_update_ops->wait_vm_kernel = true;
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		err = unbind_op_prepare(tile, pt_update_ops,
+					gpuva_to_vma(op->base.remap.unmap->va));
+
+		if (!err && op->remap.prev) {
+			err = bind_op_prepare(vm, tile, pt_update_ops,
+					      op->remap.prev);
+			pt_update_ops->wait_vm_bookkeep = true;
+		}
+		if (!err && op->remap.next) {
+			err = bind_op_prepare(vm, tile, pt_update_ops,
+					      op->remap.next);
+			pt_update_ops->wait_vm_bookkeep = true;
+		}
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		err = unbind_op_prepare(tile, pt_update_ops,
+					gpuva_to_vma(op->base.unmap.va));
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		err = bind_op_prepare(vm, tile, pt_update_ops,
+				      gpuva_to_vma(op->base.prefetch.va));
+		pt_update_ops->wait_vm_kernel = true;
+		break;
+	default:
+		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
+	}
+
+	return err;
+}
+
+static void
+xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops)
+{
+	init_llist_head(&pt_update_ops->deferred);
+	pt_update_ops->start = ~0x0ull;
+	pt_update_ops->last = 0x0ull;
+}
+
+/**
+ * xe_pt_update_ops_prepare() - Prepare PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
+ *
+ * Prepare PT update operations, which includes updating internal PT state,
+ * allocating memory for page tables, populating the new page tables, and
+ * creating PT update operations for leaf insertion / removal.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
+{
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	struct xe_vma_op *op;
+	int err;
+
+	lockdep_assert_held(&vops->vm->lock);
+	xe_vm_assert_held(vops->vm);
+
+	xe_pt_update_ops_init(pt_update_ops);
+
+	list_for_each_entry(op, &vops->list, link) {
+		err = op_prepare(vops->vm, tile, pt_update_ops, op);
+
+		if (err)
+			return err;
+	}
+
+	xe_tile_assert(tile, pt_update_ops->current_op ==
+		       pt_update_ops->num_ops);
+
+	return 0;
+}
+
+static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
+			   struct xe_vm_pgtable_update_ops *pt_update_ops,
+			   struct xe_vma *vma, struct dma_fence *fence)
+{
+	if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
+				   DMA_RESV_USAGE_BOOKKEEP);
+	vma->tile_present |= BIT(tile->id);
+	if (xe_vma_is_userptr(vma)) {
+		lockdep_assert_held_read(&vm->userptr.notifier_lock);
+		to_userptr_vma(vma)->userptr.initial_bind = true;
+	}
+
+	/*
+	 * Kick the rebind worker if this bind triggers preempt fences and
+	 * we are not already in the rebind worker.
+	 */
+	if (pt_update_ops->wait_vm_bookkeep &&
+	    xe_vm_in_preempt_fence_mode(vm) &&
+	    !current->mm)
+		xe_vm_queue_rebind_worker(vm);
+}
+
+static void unbind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
+			     struct xe_vma *vma, struct dma_fence *fence)
+{
+	if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
+				   DMA_RESV_USAGE_BOOKKEEP);
+	vma->tile_present &= ~BIT(tile->id);
+	if (!vma->tile_present) {
+		list_del_init(&vma->combined_links.rebind);
+		if (xe_vma_is_userptr(vma)) {
+			lockdep_assert_held_read(&vm->userptr.notifier_lock);
+
+			spin_lock(&vm->userptr.invalidated_lock);
+			list_del_init(&to_userptr_vma(vma)->userptr.invalidate_link);
+			spin_unlock(&vm->userptr.invalidated_lock);
+		}
+	}
+}
+
+static void op_commit(struct xe_vm *vm,
+		      struct xe_tile *tile,
+		      struct xe_vm_pgtable_update_ops *pt_update_ops,
+		      struct xe_vma_op *op, struct dma_fence *fence)
+{
+	xe_vm_assert_held(vm);
+
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
+			break;
+
+		bind_op_commit(vm, tile, pt_update_ops, op->map.vma, fence);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		unbind_op_commit(vm, tile,
+				 gpuva_to_vma(op->base.remap.unmap->va), fence);
+
+		if (op->remap.prev)
+			bind_op_commit(vm, tile, pt_update_ops, op->remap.prev,
+				       fence);
+		if (op->remap.next)
+			bind_op_commit(vm, tile, pt_update_ops, op->remap.next,
+				       fence);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		unbind_op_commit(vm, tile, gpuva_to_vma(op->base.unmap.va),
+				 fence);
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		bind_op_commit(vm, tile, pt_update_ops,
+			       gpuva_to_vma(op->base.prefetch.va), fence);
+		break;
+	default:
+		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
+	}
+}
+
+static const struct xe_migrate_pt_update_ops migrate_ops = {
+	.populate = xe_vm_populate_pgtable,
+	.clear = xe_migrate_clear_pgtable_callback,
 	.pre_commit = xe_pt_pre_commit,
 };
 
-static const struct xe_migrate_pt_update_ops userptr_unbind_ops = {
-	.populate = xe_migrate_clear_pgtable_callback,
+static const struct xe_migrate_pt_update_ops userptr_migrate_ops = {
+	.populate = xe_vm_populate_pgtable,
+	.clear = xe_migrate_clear_pgtable_callback,
 	.pre_commit = xe_pt_userptr_pre_commit,
 };
 
 /**
- * __xe_pt_unbind_vma() - Disconnect and free a page-table tree for the vma
- * address range.
- * @tile: The tile to unbind for.
- * @vma: The vma to unbind.
- * @q: The exec_queue with which to do pipelined page-table updates.
- * @syncs: Entries to sync on before disconnecting the tree to be destroyed.
- * @num_syncs: Number of @sync entries.
+ * xe_pt_update_ops_run() - Run PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
  *
- * This function builds a the xe_vm_pgtable_update entries abstracting the
- * operations needed to detach the page-table tree to be destroyed from the
- * man vm tree.
- * It then takes the relevant locks and submits the operations for
- * pipelined detachment of the gpu page-table from  the vm main tree,
- * (which can be done either by the cpu and the GPU), Finally it frees the
- * detached page-table tree.
+ * Run PT update operations, which includes committing internal PT state
+ * changes, creating a job for the PT update operations for leaf insertion /
+ * removal, and installing the job fence in the relevant dma-resv slots.
  *
- * Return: A valid dma-fence representing the pipelined detachment operation
- * on success, an error pointer on error.
+ * Return: fence on success, negative ERR_PTR on error.
  */
 struct dma_fence *
-__xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
-		   struct xe_sync_entry *syncs, u32 num_syncs)
+xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 {
-	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
-	struct xe_pt_migrate_pt_update unbind_pt_update = {
-		.base = {
-			.ops = xe_vma_is_userptr(vma) ? &userptr_unbind_ops :
-			&unbind_ops,
-			.vma = vma,
-			.tile_id = tile->id,
-		},
-	};
-	struct xe_vm *vm = xe_vma_vm(vma);
-	u32 num_entries;
-	struct dma_fence *fence = NULL;
-	struct invalidation_fence *ifence;
+	struct xe_vm *vm = vops->vm;
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	struct dma_fence *fence;
+	struct invalidation_fence *ifence = NULL;
 	struct xe_range_fence *rfence;
+	struct xe_vma_op *op;
+	int err = 0;
+	struct xe_migrate_pt_update update = {
+		.ops = pt_update_ops->needs_userptr_lock ?
+			&userptr_migrate_ops :
+			&migrate_ops,
+		.vops = vops,
+		.tile_id = tile->id
+	};
 
-	LLIST_HEAD(deferred);
-
-	xe_bo_assert_held(xe_vma_bo(vma));
+	lockdep_assert_held(&vm->lock);
 	xe_vm_assert_held(vm);
 
-	vm_dbg(&xe_vma_vm(vma)->xe->drm,
-	       "Preparing unbind, with range [%llx...%llx) engine %p.\n",
-	       xe_vma_start(vma), xe_vma_end(vma), q);
-
-	num_entries = xe_pt_stage_unbind(tile, vma, entries);
-	xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries));
-
-	xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries);
-	xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries,
-				   num_entries);
-
-	ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
-	if (!ifence)
-		return ERR_PTR(-ENOMEM);
+	if (pt_update_ops->needs_invalidation) {
+		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
+		if (!ifence)
+			return ERR_PTR(-ENOMEM);
+	}
 
 	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
 	if (!rfence) {
-		kfree(ifence);
-		return ERR_PTR(-ENOMEM);
+		err = -ENOMEM;
+		goto free_ifence;
 	}
 
-	/*
-	 * Even if we were already evicted and unbind to destroy, we need to
-	 * clear again here. The eviction may have updated pagetables at a
-	 * lower level, because it needs to be more conservative.
-	 */
-	fence = xe_migrate_update_pgtables(tile->migrate,
-					   vm, NULL, q ? q :
-					   vm->q[tile->id],
-					   entries, num_entries,
-					   syncs, num_syncs,
-					   &unbind_pt_update.base);
-	if (!IS_ERR(fence)) {
-		int err;
-
-		err = xe_range_fence_insert(&vm->rftree[tile->id], rfence,
-					    &xe_range_fence_kfree_ops,
-					    unbind_pt_update.base.start,
-					    unbind_pt_update.base.last, fence);
-		if (err)
-			dma_fence_wait(fence, false);
+	fence = xe_migrate_update_pgtables(tile->migrate, &update);
+	if (IS_ERR(fence)) {
+		err = PTR_ERR(fence);
+		goto free_rfence;
+	}
+
+	err = xe_range_fence_insert(&vm->rftree[tile->id], rfence,
+				    &xe_range_fence_kfree_ops,
+				    pt_update_ops->start,
+				    pt_update_ops->last, fence);
+	if (err)
+		dma_fence_wait(fence, false);
 
-		/* TLB invalidation must be done before signaling unbind */
+	/* tlb invalidation must be done before signaling rebind */
+	if (ifence) {
 		err = invalidation_fence_init(tile->primary_gt, ifence, fence,
-					      xe_vma_start(vma),
-					      xe_vma_end(vma),
-					      xe_vma_vm(vma)->usm.asid);
-		if (err) {
-			dma_fence_put(fence);
-			kfree(ifence);
-			return ERR_PTR(err);
-		}
+					      pt_update_ops->start,
+					      pt_update_ops->last,
+					      vm->usm.asid);
+		if (err)
+			goto put_fence;
 		fence = &ifence->base.base;
+	}
 
-		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(xe_vm_resv(vm), fence,
-				   DMA_RESV_USAGE_BOOKKEEP);
+	dma_resv_add_fence(xe_vm_resv(vm), fence,
+			   pt_update_ops->wait_vm_bookkeep ?
+			   DMA_RESV_USAGE_KERNEL :
+			   DMA_RESV_USAGE_BOOKKEEP);
 
-		/* This fence will be installed by caller when doing eviction */
-		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
-			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
-					   DMA_RESV_USAGE_BOOKKEEP);
-		xe_pt_commit_unbind(vma, entries, num_entries,
-				    unbind_pt_update.locked ? &deferred : NULL);
-		vma->tile_present &= ~BIT(tile->id);
-	} else {
-		kfree(rfence);
-		kfree(ifence);
-	}
+	list_for_each_entry(op, &vops->list, link)
+		op_commit(vops->vm, tile, pt_update_ops, op, fence);
 
-	if (!vma->tile_present)
-		list_del_init(&vma->combined_links.rebind);
+	if (pt_update_ops->needs_userptr_lock)
+		up_read(&vm->userptr.notifier_lock);
 
-	if (unbind_pt_update.locked) {
-		xe_tile_assert(tile, xe_vma_is_userptr(vma));
+	return fence;
 
-		if (!vma->tile_present) {
-			spin_lock(&vm->userptr.invalidated_lock);
-			list_del_init(&to_userptr_vma(vma)->userptr.invalidate_link);
-			spin_unlock(&vm->userptr.invalidated_lock);
-		}
+put_fence:
+	if (pt_update_ops->needs_userptr_lock)
 		up_read(&vm->userptr.notifier_lock);
-		xe_bo_put_commit(&deferred);
+	dma_fence_put(fence);
+free_rfence:
+	kfree(rfence);
+free_ifence:
+	kfree(ifence);
+
+	return ERR_PTR(err);
+}
+
+/**
+ * xe_pt_update_ops_fini() - Finish PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
+ *
+ * Finish PT update operations by committing the destruction of page-table
+ * memory.
+ */
+void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
+{
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	int i;
+
+	lockdep_assert_held(&vops->vm->lock);
+	xe_vm_assert_held(vops->vm);
+
+	/* FIXME: Not 100% correct */
+	for (i = 0; i < pt_update_ops->num_ops; ++i) {
+		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+
+		if (pt_op->bind)
+			xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
 	}
+	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
+}
 
-	return fence;
+/**
+ * xe_pt_update_ops_abort() - Abort PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
+ *
+ * Abort PT update operations by unwinding internal PT state.
+ */
+void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
+{
+	lockdep_assert_held(&vops->vm->lock);
+	xe_vm_assert_held(vops->vm);
+
+	/* FIXME: Just kill VM for now + cleanup PTs */
+	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
+	xe_vm_kill(vops->vm, false);
 }
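
The greedy range-fence window computed in xe_pt_update_ops_rfence_interval()
above just aligns the VMA bounds out to the highest page-table level touched.
A minimal standalone sketch of that arithmetic (plain C with hypothetical
addresses; assumes the usual 4 KiB page / 512-entry table layout, so level 1
covers 2 MiB, i.e. a shift of 21):

#include <stdint.h>
#include <stdio.h>

/* Power-of-two align helpers, equivalent to the kernel's ALIGN_DOWN/ALIGN. */
#define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))
#define ALIGN(x, a)      (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

int main(void)
{
	/* Hypothetical VMA [0x12345000, 0x12389000) whose largest touched
	 * PT level is 1, i.e. 2 MiB granularity. */
	uint64_t vma_start = 0x12345000ull, vma_end = 0x12389000ull;
	uint64_t sz = 1ull << 21;

	uint64_t start = ALIGN_DOWN(vma_start, sz);
	uint64_t last = ALIGN(vma_end, sz) - 1;

	/* Prints [0x12200000, 0x123fffff]: the fence deliberately covers
	 * the whole aligned span, wider than the VMA itself. */
	printf("[%#llx, %#llx]\n",
	       (unsigned long long)start, (unsigned long long)last);
	return 0;
}
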
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 71a4fbfcff43..cbf8170d89cc 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -17,6 +17,7 @@ struct xe_sync_entry;
 struct xe_tile;
 struct xe_vm;
 struct xe_vma;
+struct xe_vma_ops;
 
 /* Largest huge pte is currently 1GiB. May become device dependent. */
 #define MAX_HUGEPTE_LEVEL 2
@@ -34,6 +35,12 @@ void xe_pt_populate_empty(struct xe_tile *tile, struct xe_vm *vm,
 
 void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred);
 
+int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops);
+struct dma_fence *xe_pt_update_ops_run(struct xe_tile *tile,
+				       struct xe_vma_ops *vops);
+void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops);
+void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
+
 struct dma_fence *
 __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
 		 struct xe_sync_entry *syncs, u32 num_syncs,
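
The four declarations added above form a per-tile prepare / run /
fini-or-abort bracket. A compressed sketch of the intended calling sequence,
with hypothetical stub types and no-op bodies standing in for the real xe
structures (the in-tree caller is xe_vm_ops_execute() in xe_vm.c below):

#include <stdio.h>

struct xe_tile { int id; };
struct xe_vma_ops { int num_ops[2]; };
struct dma_fence { int unused; };

int xe_pt_update_ops_prepare(struct xe_tile *t, struct xe_vma_ops *v)
{
	(void)t; (void)v;
	return 0;	/* the real driver may fail here, e.g. -ENOMEM */
}

struct dma_fence *xe_pt_update_ops_run(struct xe_tile *t,
				       struct xe_vma_ops *v)
{
	static struct dma_fence f;

	(void)t; (void)v;
	return &f;	/* real driver returns the PT update job's fence */
}

void xe_pt_update_ops_fini(struct xe_tile *t, struct xe_vma_ops *v)
{
	(void)t; (void)v;	/* success: free now-stale PT memory */
}

void xe_pt_update_ops_abort(struct xe_tile *t, struct xe_vma_ops *v)
{
	(void)t; (void)v;	/* failure: unwind internal PT state */
}

int main(void)
{
	struct xe_tile tiles[2] = { { 0 }, { 1 } };
	struct xe_vma_ops vops = { { 1, 0 } };	/* ops queued on tile 0 only */
	struct dma_fence *fence;
	int i;

	/* Phase 1: prepare every tile that has ops (PT memory allocated). */
	for (i = 0; i < 2; i++)
		if (vops.num_ops[i] &&
		    xe_pt_update_ops_prepare(&tiles[i], &vops))
			goto abort;

	/* Phase 2: run, collecting one fence per participating tile. */
	for (i = 0; i < 2; i++) {
		if (!vops.num_ops[i])
			continue;
		fence = xe_pt_update_ops_run(&tiles[i], &vops);
		if (!fence)	/* stub signals failure with NULL, not ERR_PTR */
			goto abort;
	}

	/* Phase 3a: all tiles succeeded. */
	for (i = 0; i < 2; i++)
		if (vops.num_ops[i])
			xe_pt_update_ops_fini(&tiles[i], &vops);
	return 0;

abort:
	/* Phase 3b: any failure aborts every tile with queued ops. */
	for (i = 0; i < 2; i++)
		if (vops.num_ops[i])
			xe_pt_update_ops_abort(&tiles[i], &vops);
	return 1;
}
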
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 2093150f461e..16252f1be055 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -86,4 +86,38 @@ struct xe_vm_pgtable_update_op {
 	bool rebind;
 };
 
+/** struct xe_vm_pgtable_update_ops - page table update operations */
+struct xe_vm_pgtable_update_ops {
+	/** @ops: operations */
+	struct xe_vm_pgtable_update_op *ops;
+	/** @deferred: deferred list to destroy PT entries */
+	struct llist_head deferred;
+	/** @q: exec queue for PT operations */
+	struct xe_exec_queue *q;
+	/** @start: start address of ops */
+	u64 start;
+	/** @last: last address of ops */
+	u64 last;
+	/** @num_ops: number of operations */
+	u32 num_ops;
+	/** @current_op: index of the current operation */
+	u32 current_op;
+	/** @needs_userptr_lock: Needs userptr lock */
+	bool needs_userptr_lock;
+	/** @needs_invalidation: Needs invalidation */
+	bool needs_invalidation;
+	/**
+	 * @wait_vm_bookkeep: PT operations need to wait until VM is idle
+	 * (bookkeep dma-resv slots are idle) and stage all future VM activity
+	 * behind these operations (install PT operations into VM kernel
+	 * dma-resv slot).
+	 */
+	bool wait_vm_bookkeep;
+	/**
+	 * @wait_vm_kernel: PT operations need to wait until VM kernel dma-resv
+	 * slots are idle.
+	 */
+	bool wait_vm_kernel;
+};
+
 #endif
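
The wait_vm_bookkeep / wait_vm_kernel flags above select which dma-resv usage
the PT update synchronizes against, and how. A toy sketch of that branch,
mirroring xe_pt_vm_dependencies() earlier in the patch (stand-in functions
only; in the driver these are dma_resv_test_signaled() and job_add_deps()
against the VM's reservation object):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool resv_signaled(const char *usage) { (void)usage; return true; }
static int job_add_deps(const char *usage)
{
	printf("job now depends on %s slots\n", usage);
	return 0;
}

/* Without a job (the CPU bind path) we can only poll the reservation and
 * bail out with -ETIME; with a job we record a scheduler dependency. */
static int add_vm_wait(bool have_job, bool wait_bookkeep, bool wait_kernel)
{
	const char *usage = wait_bookkeep ? "BOOKKEEP" :
			    wait_kernel ? "KERNEL" : NULL;

	if (!usage)
		return 0;
	if (!have_job)
		return resv_signaled(usage) ? 0 : -ETIME;
	return job_add_deps(usage);
}

int main(void)
{
	add_vm_wait(true, true, false);		/* job path: queue a dep */
	add_vm_wait(false, false, true);	/* CPU path: poll and bail */
	return 0;
}
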
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 13776d988b49..94daae93f605 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -411,7 +411,7 @@ int __xe_vm_userptr_needs_repin(struct xe_vm *vm)
 
 #define XE_VM_REBIND_RETRY_TIMEOUT_MS 1000
 
-static void xe_vm_kill(struct xe_vm *vm, bool unlocked)
+void xe_vm_kill(struct xe_vm *vm, bool unlocked)
 {
 	struct xe_exec_queue *q;
 
@@ -575,13 +575,9 @@ static void preempt_rebind_work_func(struct work_struct *w)
 		err = PTR_ERR(rebind_fence);
 		goto out_unlock;
 	}
+	dma_fence_put(rebind_fence);
 
-	if (rebind_fence) {
-		dma_fence_wait(rebind_fence, false);
-		dma_fence_put(rebind_fence);
-	}
-
-	/* Wait on munmap style VM unbinds */
+	/* Wait on rebinds */
 	wait = dma_resv_wait_timeout(xe_vm_resv(vm),
 				     DMA_RESV_USAGE_KERNEL,
 				     false, MAX_SCHEDULE_TIMEOUT);
@@ -757,11 +753,35 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
  * xe_vm_populate_dummy_rebind() - Populate dummy rebind VMA ops
  * @vm: The VM.
  * @vma: VMA to populate dummy VMA ops
+ * @tile_mask: tile mask for VMA ops
  *
  * Populate dummy VMA ops which can be used to issue a rebind for the VMA
  */
-void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma)
+void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
+				 u8 tile_mask)
 {
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
+		if (BIT(i) & tile_mask) {
+			struct xe_vm_pgtable_update_op *pt_op =
+				vm->dummy_ops.vops.pt_update_ops[i].ops;
+
+			memset(&vm->dummy_ops.vops.pt_update_ops[i], 0,
+			       sizeof(vm->dummy_ops.vops.pt_update_ops[i]));
+			vm->dummy_ops.vops.pt_update_ops[i].ops = pt_op;
+			vm->dummy_ops.vops.pt_update_ops[i].num_ops = 1;
+
+			/*
+			 * Wait for VM to be idle / schedule execs + resume
+			 * behind rebinds
+			 */
+			vm->dummy_ops.vops.pt_update_ops[i].wait_vm_bookkeep =
+				true;
+		} else {
+			vm->dummy_ops.vops.pt_update_ops[i].num_ops = 0;
+		}
+	}
 	vm->dummy_ops.op.base.op = DRM_GPUVA_OP_MAP;
 	vm->dummy_ops.op.base.map.va.addr = vma->gpuva.va.addr;
 	vm->dummy_ops.op.base.map.va.range = vma->gpuva.va.range;
@@ -795,7 +815,7 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 		else
 			trace_xe_vma_rebind_exec(vma);
 
-		xe_vm_populate_dummy_rebind(vm, vma);
+		xe_vm_populate_dummy_rebind(vm, vma, vma->tile_present);
 		fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
 		if (IS_ERR(fence))
 			return fence;
@@ -1630,7 +1650,6 @@ static void vm_destroy_work_func(struct work_struct *w)
 		XE_WARN_ON(vm->pt_root[id]);
 
 	trace_xe_vm_free(vm);
-	dma_fence_put(vm->rebind_fence);
 	xe_vma_ops_fini(&vm->dummy_ops.vops);
 	kfree(vm);
 }
@@ -1668,188 +1687,6 @@ to_wait_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 	return q ? q : vm->q[0];
 }
 
-static struct dma_fence *
-xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
-		 struct xe_sync_entry *syncs, u32 num_syncs,
-		 bool first_op, bool last_op)
-{
-	struct xe_vm *vm = xe_vma_vm(vma);
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-	struct xe_tile *tile;
-	struct dma_fence *fence = NULL;
-	struct dma_fence **fences = NULL;
-	struct dma_fence_array *cf = NULL;
-	int cur_fence = 0;
-	int number_tiles = hweight8(vma->tile_present);
-	int err;
-	u8 id;
-
-	trace_xe_vma_unbind(vma);
-
-	if (number_tiles > 1) {
-		fences = kmalloc_array(number_tiles, sizeof(*fences),
-				       GFP_KERNEL);
-		if (!fences)
-			return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_tile(tile, vm->xe, id) {
-		if (!(vma->tile_present & BIT(id)))
-			goto next;
-
-		fence = __xe_pt_unbind_vma(tile, vma, q ? q : vm->q[id],
-					   first_op ? syncs : NULL,
-					   first_op ? num_syncs : 0);
-		if (IS_ERR(fence)) {
-			err = PTR_ERR(fence);
-			goto err_fences;
-		}
-
-		if (fences)
-			fences[cur_fence++] = fence;
-
-next:
-		if (q && vm->pt_root[id] && !list_empty(&q->multi_gt_list))
-			q = list_next_entry(q, multi_gt_list);
-	}
-
-	if (fences) {
-		cf = dma_fence_array_create(number_tiles, fences,
-					    vm->composite_fence_ctx,
-					    vm->composite_fence_seqno++,
-					    false);
-		if (!cf) {
-			--vm->composite_fence_seqno;
-			err = -ENOMEM;
-			goto err_fences;
-		}
-	}
-
-	fence = cf ? &cf->base : !fence ?
-		xe_exec_queue_last_fence_get(wait_exec_queue, vm) : fence;
-
-	return fence;
-
-err_fences:
-	if (fences) {
-		while (cur_fence)
-			dma_fence_put(fences[--cur_fence]);
-		kfree(fences);
-	}
-
-	return ERR_PTR(err);
-}
-
-static struct dma_fence *
-xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
-	       struct xe_sync_entry *syncs, u32 num_syncs,
-	       u8 tile_mask, bool first_op, bool last_op)
-{
-	struct xe_tile *tile;
-	struct dma_fence *fence;
-	struct dma_fence **fences = NULL;
-	struct dma_fence_array *cf = NULL;
-	struct xe_vm *vm = xe_vma_vm(vma);
-	int cur_fence = 0;
-	int number_tiles = hweight8(vma->tile_mask);
-	int err;
-	u8 id;
-
-	trace_xe_vma_bind(vma);
-
-	if (number_tiles > 1) {
-		fences = kmalloc_array(number_tiles, sizeof(*fences),
-				       GFP_KERNEL);
-		if (!fences)
-			return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_tile(tile, vm->xe, id) {
-		if (!(tile_mask & BIT(id)))
-			goto next;
-
-		fence = __xe_pt_bind_vma(tile, vma, q ? q : vm->q[id],
-					 first_op ? syncs : NULL,
-					 first_op ? num_syncs : 0,
-					 vma->tile_present & BIT(id));
-		if (IS_ERR(fence)) {
-			err = PTR_ERR(fence);
-			goto err_fences;
-		}
-
-		if (fences)
-			fences[cur_fence++] = fence;
-
-next:
-		if (q && vm->pt_root[id] && !list_empty(&q->multi_gt_list))
-			q = list_next_entry(q, multi_gt_list);
-	}
-
-	if (fences) {
-		cf = dma_fence_array_create(number_tiles, fences,
-					    vm->composite_fence_ctx,
-					    vm->composite_fence_seqno++,
-					    false);
-		if (!cf) {
-			--vm->composite_fence_seqno;
-			err = -ENOMEM;
-			goto err_fences;
-		}
-	}
-
-	return cf ? &cf->base : fence;
-
-err_fences:
-	if (fences) {
-		while (cur_fence)
-			dma_fence_put(fences[--cur_fence]);
-		kfree(fences);
-	}
-
-	return ERR_PTR(err);
-}
-
-static struct dma_fence *
-xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
-	   struct xe_bo *bo, struct xe_sync_entry *syncs, u32 num_syncs,
-	   u8 tile_mask, bool immediate, bool first_op, bool last_op)
-{
-	struct dma_fence *fence;
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(bo);
-
-	if (immediate) {
-		fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, tile_mask,
-				       first_op, last_op);
-		if (IS_ERR(fence))
-			return fence;
-	} else {
-		xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
-		fence = xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-	}
-
-	return fence;
-}
-
-static struct dma_fence *
-xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
-	     struct xe_exec_queue *q, struct xe_sync_entry *syncs,
-	     u32 num_syncs, bool first_op, bool last_op)
-{
-	struct dma_fence *fence;
-
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(xe_vma_bo(vma));
-
-	fence = xe_vm_unbind_vma(vma, q, syncs, num_syncs, first_op, last_op);
-	if (IS_ERR(fence))
-		return fence;
-
-	return fence;
-}
-
 #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE | \
 				    DRM_XE_VM_CREATE_FLAG_LR_MODE | \
 				    DRM_XE_VM_CREATE_FLAG_FAULT_MODE)
@@ -1990,21 +1827,6 @@ static const u32 region_to_mem_type[] = {
 	XE_PL_VRAM1,
 };
 
-static struct dma_fence *
-xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
-	       struct xe_exec_queue *q, struct xe_sync_entry *syncs,
-	       u32 num_syncs, bool first_op, bool last_op)
-{
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-
-	if (vma->tile_mask != (vma->tile_present & ~vma->usm.tile_invalidated)) {
-		return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs,
-				  vma->tile_mask, true, first_op, last_op);
-	} else {
-		return xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-	}
-}
-
 static void prep_vma_destroy(struct xe_vm *vm, struct xe_vma *vma,
 			     bool post_commit)
 {
@@ -2291,7 +2113,6 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 				   struct xe_sync_entry *syncs, u32 num_syncs,
 				   struct xe_vma_ops *vops, bool last)
 {
-	struct xe_vma_op *last_op = NULL;
 	struct drm_gpuva_op *__op;
 	struct xe_tile *tile;
 	u8 id, tile_mask = 0;
@@ -2305,19 +2126,10 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 	drm_gpuva_for_each_op(__op, ops) {
 		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 		struct xe_vma *vma;
-		bool first = list_empty(&vops->list);
 		unsigned int flags = 0;
 
 		INIT_LIST_HEAD(&op->link);
 		list_add_tail(&op->link, &vops->list);
-
-		if (first) {
-			op->flags |= XE_VMA_OP_FIRST;
-			op->num_syncs = num_syncs;
-			op->syncs = syncs;
-		}
-
-		op->q = q;
 		op->tile_mask = tile_mask;
 
 		switch (op->base.op) {
@@ -2416,197 +2228,21 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 		}
 		case DRM_GPUVA_OP_UNMAP:
 		case DRM_GPUVA_OP_PREFETCH:
+			/* FIXME: Need to skip some prefetch ops */
 			xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 			break;
 		default:
 			drm_warn(&vm->xe->drm, "NOT POSSIBLE");
 		}
 
-		last_op = op;
-
 		err = xe_vma_op_commit(vm, op);
 		if (err)
 			return err;
 	}
 
-	/* FIXME: Unhandled corner case */
-	XE_WARN_ON(!last_op && last && !list_empty(&vops->list));
-
-	if (!last_op)
-		return 0;
-
-	if (last) {
-		last_op->flags |= XE_VMA_OP_LAST;
-		last_op->num_syncs = num_syncs;
-		last_op->syncs = syncs;
-	}
-
 	return 0;
 }
 
-static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
-				    struct xe_vma_op *op)
-{
-	struct dma_fence *fence = NULL;
-
-	lockdep_assert_held(&vm->lock);
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(xe_vma_bo(vma));
-
-	switch (op->base.op) {
-	case DRM_GPUVA_OP_MAP:
-		/* FIXME: Override vma->tile_mask for page faults */
-		fence = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
-				   op->syncs, op->num_syncs,
-				   op->map.immediate ||
-				   !xe_vm_in_fault_mode(vm),
-				   op->tile_mask,
-				   op->flags & XE_VMA_OP_FIRST,
-				   op->flags & XE_VMA_OP_LAST);
-		break;
-	case DRM_GPUVA_OP_REMAP:
-	{
-		bool prev = !!op->remap.prev;
-		bool next = !!op->remap.next;
-
-		if (!op->remap.unmap_done) {
-			if (prev || next)
-				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
-			fence = xe_vm_unbind(vm, vma, op->q, op->syncs,
-					     op->num_syncs,
-					     op->flags & XE_VMA_OP_FIRST,
-					     op->flags & XE_VMA_OP_LAST &&
-					     !prev && !next);
-			if (IS_ERR(fence))
-				break;
-			op->remap.unmap_done = true;
-		}
-
-		if (prev) {
-			op->remap.prev->gpuva.flags |= XE_VMA_LAST_REBIND;
-			dma_fence_put(fence);
-			fence = xe_vm_bind(vm, op->remap.prev, op->q,
-					   xe_vma_bo(op->remap.prev), op->syncs,
-					   op->num_syncs, op->remap.prev->tile_mask,
-					   true, false,
-					   op->flags & XE_VMA_OP_LAST && !next);
-			op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND;
-			if (IS_ERR(fence))
-				break;
-			op->remap.prev = NULL;
-		}
-
-		if (next) {
-			op->remap.next->gpuva.flags |= XE_VMA_LAST_REBIND;
-			dma_fence_put(fence);
-			fence = xe_vm_bind(vm, op->remap.next, op->q,
-					   xe_vma_bo(op->remap.next),
-					   op->syncs, op->num_syncs,
-					   op->remap.next->tile_mask, true, false,
-					   op->flags & XE_VMA_OP_LAST);
-			op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND;
-			if (IS_ERR(fence))
-				break;
-			op->remap.next = NULL;
-		}
-
-		break;
-	}
-	case DRM_GPUVA_OP_UNMAP:
-		fence = xe_vm_unbind(vm, vma, op->q, op->syncs,
-				     op->num_syncs, op->flags & XE_VMA_OP_FIRST,
-				     op->flags & XE_VMA_OP_LAST);
-		break;
-	case DRM_GPUVA_OP_PREFETCH:
-		fence = xe_vm_prefetch(vm, vma, op->q, op->syncs, op->num_syncs,
-				       op->flags & XE_VMA_OP_FIRST,
-				       op->flags & XE_VMA_OP_LAST);
-		break;
-	default:
-		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
-	}
-
-	if (IS_ERR(fence))
-		trace_xe_vma_fail(vma);
-
-	return fence;
-}
-
-static struct dma_fence *
-__xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
-		    struct xe_vma_op *op)
-{
-	struct dma_fence *fence;
-	int err;
-
-retry_userptr:
-	fence = op_execute(vm, vma, op);
-	if (IS_ERR(fence) && PTR_ERR(fence) == -EAGAIN) {
-		lockdep_assert_held(&vm->lock);
-
-		if (op->base.op == DRM_GPUVA_OP_REMAP) {
-			if (!op->remap.unmap_done)
-				vma = gpuva_to_vma(op->base.remap.unmap->va);
-			else if (op->remap.prev)
-				vma = op->remap.prev;
-			else
-				vma = op->remap.next;
-		}
-
-		if (xe_vma_is_userptr(vma)) {
-			err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
-			if (!err)
-				goto retry_userptr;
-
-			fence = ERR_PTR(err);
-			trace_xe_vma_fail(vma);
-		}
-	}
-
-	return fence;
-}
-
-static struct dma_fence *
-xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
-{
-	struct dma_fence *fence = ERR_PTR(-ENOMEM);
-
-	lockdep_assert_held(&vm->lock);
-
-	switch (op->base.op) {
-	case DRM_GPUVA_OP_MAP:
-		fence = __xe_vma_op_execute(vm, op->map.vma, op);
-		break;
-	case DRM_GPUVA_OP_REMAP:
-	{
-		struct xe_vma *vma;
-
-		if (!op->remap.unmap_done)
-			vma = gpuva_to_vma(op->base.remap.unmap->va);
-		else if (op->remap.prev)
-			vma = op->remap.prev;
-		else
-			vma = op->remap.next;
-
-		fence = __xe_vma_op_execute(vm, vma, op);
-		break;
-	}
-	case DRM_GPUVA_OP_UNMAP:
-		fence = __xe_vma_op_execute(vm, gpuva_to_vma(op->base.unmap.va),
-					    op);
-		break;
-	case DRM_GPUVA_OP_PREFETCH:
-		fence = __xe_vma_op_execute(vm,
-					    gpuva_to_vma(op->base.prefetch.va),
-					    op);
-		break;
-	default:
-		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
-	}
-
-	return fence;
-}
-
 static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op,
 			     bool post_commit, bool prev_post_commit,
 			     bool next_post_commit)
@@ -2761,6 +2397,32 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 	return 0;
 }
 
+static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops)
+{
+	struct xe_exec_queue *q = vops->q;
+	struct xe_tile *tile;
+	int number_tiles = 0;
+	u8 id;
+
+	for_each_tile(tile, vm->xe, id) {
+		if (vops->pt_update_ops[id].num_ops)
+			++number_tiles;
+
+		if (vops->pt_update_ops[id].q)
+			continue;
+
+		if (q) {
+			vops->pt_update_ops[id].q = q;
+			if (vm->pt_root[id] && !list_empty(&q->multi_gt_list))
+				q = list_next_entry(q, multi_gt_list);
+		} else {
+			vops->pt_update_ops[id].q = vm->q[id];
+		}
+	}
+
+	return number_tiles;
+}
+
 /**
  * xe_vm_ops_execute() - Execute VMA ops
  * @vm: The VM.
@@ -2772,21 +2434,81 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
  */
 struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops)
 {
-	struct xe_vma_op *op, *next;
+	struct xe_tile *tile;
 	struct dma_fence *fence = NULL;
+	struct dma_fence **fences = NULL;
+	struct dma_fence_array *cf = NULL;
+	int number_tiles = 0, current_fence = 0, err;
+	u8 id;
+
+	number_tiles = vm_ops_setup_tile_args(vm, vops);
+	if (number_tiles == 0)
+		return ERR_PTR(-ENODATA);
+
+	if (number_tiles > 1) {
+		fences = kmalloc_array(number_tiles, sizeof(*fences),
+				       GFP_KERNEL);
+		if (!fences)
+			return ERR_PTR(-ENOMEM);
+	}
+
+	for_each_tile(tile, vm->xe, id) {
+		if (!vops->pt_update_ops[id].num_ops)
+			continue;
 
-	list_for_each_entry_safe(op, next, &vops->list, link) {
-		if (!IS_ERR(fence)) {
-			dma_fence_put(fence);
-			fence = xe_vma_op_execute(vm, op);
+		err = xe_pt_update_ops_prepare(tile, vops);
+		if (err) {
+			fence = ERR_PTR(err);
+			goto err_out;
 		}
-		if (IS_ERR(fence)) {
-			drm_warn(&vm->xe->drm, "VM op(%d) failed with %ld",
-				 op->base.op, PTR_ERR(fence));
-			fence = ERR_PTR(-ENOSPC);
+	}
+
+	for_each_tile(tile, vm->xe, id) {
+		if (!vops->pt_update_ops[id].num_ops)
+			continue;
+
+		fence = xe_pt_update_ops_run(tile, vops);
+		if (IS_ERR(fence))
+			goto err_out;
+
+		if (fences)
+			fences[current_fence++] = fence;
+	}
+
+	if (fences) {
+		cf = dma_fence_array_create(number_tiles, fences,
+					    vm->composite_fence_ctx,
+					    vm->composite_fence_seqno++,
+					    false);
+		if (!cf) {
+			--vm->composite_fence_seqno;
+			fence = ERR_PTR(-ENOMEM);
+			goto err_out;
 		}
+		fence = &cf->base;
+	}
+
+	for_each_tile(tile, vm->xe, id) {
+		if (!vops->pt_update_ops[id].num_ops)
+			continue;
+
+		xe_pt_update_ops_fini(tile, vops);
 	}
 
+	return fence;
+
+err_out:
+	for_each_tile(tile, vm->xe, id) {
+		if (!vops->pt_update_ops[id].num_ops)
+			continue;
+
+		xe_pt_update_ops_abort(tile, vops);
+	}
+	while (current_fence)
+		dma_fence_put(fences[--current_fence]);
+	kfree(fences);
+	kfree(cf);
+
 	return fence;
 }
 
@@ -2830,12 +2552,10 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 		fence = xe_vm_ops_execute(vm, vops);
 		if (IS_ERR(fence)) {
 			err = PTR_ERR(fence);
-			/* FIXME: Killing VM rather than proper error handling */
-			xe_vm_kill(vm, false);
 			goto unlock;
-		} else {
-			vm_bind_ioctl_ops_install_fences(vm, vops, fence);
 		}
+
+		vm_bind_ioctl_ops_install_fences(vm, vops, fence);
 	}
 
 unlock:
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 1d9a3b13aecc..b8134804fce9 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -262,9 +262,12 @@ static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
  */
 #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
 
-void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma);
+void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
+				 u8 tile_mask);
 struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops);
 
+void xe_vm_kill(struct xe_vm *vm, bool unlocked);
+
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
 #define vm_dbg drm_dbg
 #else
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 1506c0d1338d..7e639ed5ba4a 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -156,29 +156,18 @@ struct xe_vma_op_prefetch {
 
 /** enum xe_vma_op_flags - flags for VMA operation */
 enum xe_vma_op_flags {
-	/** @XE_VMA_OP_FIRST: first VMA operation for a set of syncs */
-	XE_VMA_OP_FIRST			= BIT(0),
-	/** @XE_VMA_OP_LAST: last VMA operation for a set of syncs */
-	XE_VMA_OP_LAST			= BIT(1),
 	/** @XE_VMA_OP_COMMITTED: VMA operation committed */
-	XE_VMA_OP_COMMITTED		= BIT(2),
+	XE_VMA_OP_COMMITTED		= BIT(0),
 	/** @XE_VMA_OP_PREV_COMMITTED: Previous VMA operation committed */
-	XE_VMA_OP_PREV_COMMITTED	= BIT(3),
+	XE_VMA_OP_PREV_COMMITTED	= BIT(1),
 	/** @XE_VMA_OP_NEXT_COMMITTED: Next VMA operation committed */
-	XE_VMA_OP_NEXT_COMMITTED	= BIT(4),
+	XE_VMA_OP_NEXT_COMMITTED	= BIT(2),
 };
 
 /** struct xe_vma_op - VMA operation */
 struct xe_vma_op {
 	/** @base: GPUVA base operation */
 	struct drm_gpuva_op base;
-	/** @q: exec queue for this operation */
-	struct xe_exec_queue *q;
-	/**
-	 * @syncs: syncs for this operation, only used on first and last
-	 * operation
-	 */
-	struct xe_sync_entry *syncs;
 	/** @num_syncs: number of syncs */
 	u32 num_syncs;
 	/** @link: async operation link */
@@ -204,19 +193,14 @@ struct xe_vma_ops {
 	struct list_head list;
 	/** @vm: VM */
 	struct xe_vm *vm;
-	/** @q: exec queue these operations */
+	/** @q: exec queue for VMA operations */
 	struct xe_exec_queue *q;
 	/** @syncs: syncs these operation */
 	struct xe_sync_entry *syncs;
 	/** @num_syncs: number of syncs */
 	u32 num_syncs;
 	/** @pt_update_ops: page table update operations */
-	struct {
-		/** @ops: operations */
-		struct xe_vm_pgtable_update_op *ops;
-		/** @num_ops: number of operations */
-		u32 num_ops;
-	} pt_update_ops[XE_MAX_TILES_PER_DEVICE];
+	struct xe_vm_pgtable_update_ops pt_update_ops[XE_MAX_TILES_PER_DEVICE];
 };
 
 struct xe_vm {
@@ -268,9 +252,6 @@ struct xe_vm {
 	 */
 	struct list_head rebind_list;
 
-	/** @rebind_fence: rebind fence from execbuf */
-	struct dma_fence *rebind_fence;
-
 	/**
 	 * @destroy_work: worker to destroy VM, needed as a dma_fence signaling
 	 * from an irq context can be last put and the destroy needs to be able
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 15/22] drm/xe: Remove old functions defs in xe_pt.h
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (13 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 14/22] drm/xe: Convert multiple bind ops into single job Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 16/22] drm/xe: Update PT layer with better error handling Matthew Brost
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

__xe_pt_bind_vma and __xe_pt_unbind_vma are now unused; remove them.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.h | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index cbf8170d89cc..9ab386431cad 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -41,15 +41,6 @@ struct dma_fence *xe_pt_update_ops_run(struct xe_tile *tile,
 void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops);
 void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
 
-struct dma_fence *
-__xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
-		 struct xe_sync_entry *syncs, u32 num_syncs,
-		 bool rebind);
-
-struct dma_fence *
-__xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
-		   struct xe_sync_entry *syncs, u32 num_syncs);
-
 bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
 
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 16/22] drm/xe: Update PT layer with better error handling
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (14 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 15/22] drm/xe: Remove old functions defs in xe_pt.h Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 17/22] drm/xe: Update VM trace events Matthew Brost
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Update the PT layer so that if a memory allocation for a PTE fails, the
error can be propagated to the user without requiring the VM to be killed.
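
A minimal sketch of the commit-prepare / abort split this works out to (toy
types and hypothetical names, not the driver's code): prepare publishes the
new entry in the directory slot but stashes the old one, so a later
allocation failure can be unwound by restoring it.

#include <stdio.h>
#include <stdlib.h>

struct pt_entry { int id; };

struct pt_update {
	struct pt_entry **slot;	/* directory slot being rewritten */
	struct pt_entry *saved;	/* previous occupant, kept for unwind */
};

void commit_prepare(struct pt_update *u, struct pt_entry *newpte)
{
	u->saved = *u->slot;	/* previously this was destroyed right away */
	*u->slot = newpte;	/* publish the new PTE */
}

void commit(struct pt_update *u)
{
	free(u->saved);		/* success: the old PTE really is dead now */
	u->saved = NULL;
}

void abort_bind(struct pt_update *u, struct pt_entry *newpte)
{
	*u->slot = u->saved;	/* failure: restore the old PTE */
	free(newpte);
}

int main(void)
{
	struct pt_entry *oldpte = calloc(1, sizeof(*oldpte));
	struct pt_entry *newpte = calloc(1, sizeof(*newpte));
	struct pt_entry *dir[1] = { oldpte };
	struct pt_update u = { &dir[0], NULL };

	commit_prepare(&u, newpte);
	/* ... a later PTE allocation fails ... */
	abort_bind(&u, newpte);
	printf("restored old PTE: %s\n", dir[0] == oldpte ? "yes" : "no");
	free(oldpte);
	return 0;
}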

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       | 210 ++++++++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_pt_types.h |   2 +
 2 files changed, 165 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 7b7b3f99321d..8160a295dd84 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -796,19 +796,27 @@ xe_vm_populate_pgtable(struct xe_migrate_pt_update *pt_update, struct xe_tile *t
 	}
 }
 
-static void xe_pt_abort_bind(struct xe_vma *vma,
-			     struct xe_vm_pgtable_update *entries,
-			     u32 num_entries)
+static void xe_pt_cancel_bind(struct xe_vma *vma,
+			      struct xe_vm_pgtable_update *entries,
+			      u32 num_entries)
 {
 	u32 i, j;
 
 	for (i = 0; i < num_entries; i++) {
-		if (!entries[i].pt_entries)
+		struct xe_pt *pt = entries[i].pt;
+
+		if (!pt)
 			continue;
 
-		for (j = 0; j < entries[i].qwords; j++)
-			xe_pt_destroy(entries[i].pt_entries[j].pt, xe_vma_vm(vma)->flags, NULL);
+		if (pt->level) {
+			for (j = 0; j < entries[i].qwords; j++)
+				xe_pt_destroy(entries[i].pt_entries[j].pt,
+					      xe_vma_vm(vma)->flags, NULL);
+		}
+
 		kfree(entries[i].pt_entries);
+		entries[i].pt_entries = NULL;
+		entries[i].qwords = 0;
 	}
 }
 
@@ -824,10 +832,61 @@ static void xe_pt_commit_locks_assert(struct xe_vma *vma)
 	xe_vm_assert_held(vm);
 }
 
-static void xe_pt_commit_bind(struct xe_vma *vma,
-			      struct xe_vm_pgtable_update *entries,
-			      u32 num_entries, bool rebind,
-			      struct llist_head *deferred)
+static void xe_pt_commit(struct xe_vma *vma,
+			 struct xe_vm_pgtable_update *entries,
+			 u32 num_entries, struct llist_head *deferred)
+{
+	u32 i, j;
+
+	xe_pt_commit_locks_assert(vma);
+
+	for (i = 0; i < num_entries; i++) {
+		struct xe_pt *pt = entries[i].pt;
+
+		if (!pt->level)
+			continue;
+
+		for (j = 0; j < entries[i].qwords; j++) {
+			struct xe_pt *oldpte = entries[i].pt_entries[j].pt;
+
+			xe_pt_destroy(oldpte, xe_vma_vm(vma)->flags, deferred);
+		}
+	}
+}
+
+static void xe_pt_abort_bind(struct xe_vma *vma,
+			     struct xe_vm_pgtable_update *entries,
+			     u32 num_entries, bool rebind)
+{
+	int i, j;
+
+	xe_pt_commit_locks_assert(vma);
+
+	for (i = num_entries - 1; i >= 0; --i) {
+		struct xe_pt *pt = entries[i].pt;
+		struct xe_pt_dir *pt_dir;
+
+		if (!rebind)
+			pt->num_live -= entries[i].qwords;
+
+		if (!pt->level)
+			continue;
+
+		pt_dir = as_xe_pt_dir(pt);
+		for (j = 0; j < entries[i].qwords; j++) {
+			u32 j_ = j + entries[i].ofs;
+			struct xe_pt *newpte = xe_pt_entry(pt_dir, j_);
+			struct xe_pt *oldpte = entries[i].pt_entries[j].pt;
+
+			pt_dir->dir.entries[j_] = oldpte ? &oldpte->base : 0;
+			xe_pt_destroy(newpte, xe_vma_vm(vma)->flags, NULL);
+		}
+	}
+}
+
+static void xe_pt_commit_prepare_bind(struct xe_vma *vma,
+				      struct xe_vm_pgtable_update *entries,
+				      u32 num_entries, bool rebind)
 {
 	u32 i, j;
 
@@ -847,12 +906,13 @@ static void xe_pt_commit_bind(struct xe_vma *vma,
 		for (j = 0; j < entries[i].qwords; j++) {
 			u32 j_ = j + entries[i].ofs;
 			struct xe_pt *newpte = entries[i].pt_entries[j].pt;
+			struct xe_pt *oldpte = NULL;
 
 			if (xe_pt_entry(pt_dir, j_))
-				xe_pt_destroy(xe_pt_entry(pt_dir, j_),
-					      xe_vma_vm(vma)->flags, deferred);
+				oldpte = xe_pt_entry(pt_dir, j_);
 
 			pt_dir->dir.entries[j_] = &newpte->base;
+			entries[i].pt_entries[j].pt = oldpte;
 		}
 	}
 }
@@ -876,8 +936,6 @@ xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma,
 	err = xe_pt_stage_bind(tile, vma, entries, num_entries);
 	if (!err)
 		xe_tile_assert(tile, *num_entries);
-	else /* abort! */
-		xe_pt_abort_bind(vma, entries, *num_entries);
 
 	return err;
 }
@@ -1366,7 +1424,7 @@ xe_pt_stage_unbind_post_descend(struct xe_ptw *parent, pgoff_t offset,
 				     &end_offset))
 		return 0;
 
-	(void)xe_pt_new_shared(&xe_walk->wupd, xe_child, offset, false);
+	(void)xe_pt_new_shared(&xe_walk->wupd, xe_child, offset, true);
 	xe_walk->wupd.updates[level].update->qwords = end_offset - offset;
 
 	return 0;
@@ -1434,32 +1492,58 @@ xe_migrate_clear_pgtable_callback(struct xe_migrate_pt_update *pt_update,
 		memset64(ptr, empty, num_qwords);
 }
 
+static void xe_pt_abort_unbind(struct xe_vma *vma,
+			       struct xe_vm_pgtable_update *entries,
+			       u32 num_entries)
+{
+	int j, i;
+
+	xe_pt_commit_locks_assert(vma);
+
+	for (j = num_entries - 1; j >= 0; --j) {
+		struct xe_vm_pgtable_update *entry = &entries[j];
+		struct xe_pt *pt = entry->pt;
+		struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
+
+		pt->num_live += entry->qwords;
+
+		if (!pt->level)
+			continue;
+
+		for (i = entry->ofs; i < entry->ofs + entry->qwords; i++)
+			pt_dir->dir.entries[i] =
+				entries[j].pt_entries[i - entry->ofs].pt ?
+				&entries[j].pt_entries[i - entry->ofs].pt->base : NULL;
+	}
+}
+
 static void
-xe_pt_commit_unbind(struct xe_vma *vma,
-		    struct xe_vm_pgtable_update *entries, u32 num_entries,
-		    struct llist_head *deferred)
+xe_pt_commit_prepare_unbind(struct xe_vma *vma,
+			    struct xe_vm_pgtable_update *entries,
+			    u32 num_entries)
 {
-	u32 j;
+	int j, i;
 
 	xe_pt_commit_locks_assert(vma);
 
 	for (j = 0; j < num_entries; ++j) {
 		struct xe_vm_pgtable_update *entry = &entries[j];
 		struct xe_pt *pt = entry->pt;
+		struct xe_pt_dir *pt_dir;
 
 		pt->num_live -= entry->qwords;
-		if (pt->level) {
-			struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt);
-			u32 i;
+		if (!pt->level)
+			continue;
 
-			for (i = entry->ofs; i < entry->ofs + entry->qwords;
-			     i++) {
-				if (xe_pt_entry(pt_dir, i))
-					xe_pt_destroy(xe_pt_entry(pt_dir, i),
-						      xe_vma_vm(vma)->flags, deferred);
+		pt_dir = as_xe_pt_dir(pt);
+		for (i = entry->ofs; i < entry->ofs + entry->qwords; i++) {
+			if (xe_pt_entry(pt_dir, i))
+				entries[j].pt_entries[i - entry->ofs].pt =
+					xe_pt_entry(pt_dir, i);
+			else
+				entries[j].pt_entries[i - entry->ofs].pt = NULL;
 
-				pt_dir->dir.entries[i] = NULL;
-			}
+			pt_dir->dir.entries[i] = NULL;
 		}
 	}
 }
@@ -1496,7 +1580,6 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
 {
 	u32 current_op = pt_update_ops->current_op;
 	struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
-	struct llist_head *deferred = &pt_update_ops->deferred;
 	int err;
 
 	xe_bo_assert_held(xe_vma_bo(vma));
@@ -1505,6 +1588,7 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
 	       "Preparing bind, with range [%llx...%llx)\n",
 	       xe_vma_start(vma), xe_vma_end(vma) - 1);
 
+	pt_op->vma = NULL;
 	pt_op->bind = true;
 	pt_op->rebind = BIT(tile->id) & vma->tile_present;
 
@@ -1538,9 +1622,11 @@ static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile,
 			(!pt_op->rebind && vm->scratch_pt[tile->id] &&
 			 xe_vm_in_preempt_fence_mode(vm));
 
-		/* FIXME: Don't commit right away */
-		xe_pt_commit_bind(vma, pt_op->entries, pt_op->num_entries,
-				  pt_op->rebind, deferred);
+		pt_op->vma = vma;
+		xe_pt_commit_prepare_bind(vma, pt_op->entries,
+					  pt_op->num_entries, pt_op->rebind);
+	} else {
+		xe_pt_cancel_bind(vma, pt_op->entries, pt_op->num_entries);
 	}
 
 	return err;
@@ -1552,7 +1638,6 @@ static int unbind_op_prepare(struct xe_tile *tile,
 {
 	u32 current_op = pt_update_ops->current_op;
 	struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op];
-	struct llist_head *deferred = &pt_update_ops->deferred;
 
 	xe_bo_assert_held(xe_vma_bo(vma));
 
@@ -1560,6 +1645,7 @@ static int unbind_op_prepare(struct xe_tile *tile,
 	       "Preparing unbind, with range [%llx...%llx)\n",
 	       xe_vma_start(vma), xe_vma_end(vma) - 1);
 
+	pt_op->vma = vma;
 	pt_op->bind = false;
 	pt_op->rebind = false;
 
@@ -1570,9 +1656,7 @@ static int unbind_op_prepare(struct xe_tile *tile,
 	pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma);
 	pt_update_ops->needs_invalidation = true;
 
-	/* FIXME: Don't commit right away */
-	xe_pt_commit_unbind(vma, pt_op->entries, pt_op->num_entries,
-			    deferred);
+	xe_pt_commit_prepare_unbind(vma, pt_op->entries, pt_op->num_entries);
 
 	return 0;
 }
@@ -1782,7 +1866,7 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	struct invalidation_fence *ifence = NULL;
 	struct xe_range_fence *rfence;
 	struct xe_vma_op *op;
-	int err = 0;
+	int err = 0, i;
 	struct xe_migrate_pt_update update = {
 		.ops = pt_update_ops->needs_userptr_lock ?
 			&userptr_migrate_ops :
@@ -1796,8 +1880,10 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 
 	if (pt_update_ops->needs_invalidation) {
 		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
-		if (!ifence)
-			return ERR_PTR(-ENOMEM);
+		if (!ifence) {
+			err = -ENOMEM;
+			goto kill_vm_tile1;
+		}
 	}
 
 	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
@@ -1806,10 +1892,19 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 		goto free_ifence;
 	}
 
+	/* FIXME: Point of no return - VM killed if failure after this */
+	for (i = 0; i < pt_update_ops->num_ops; ++i) {
+		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+
+		xe_pt_commit(pt_op->vma, pt_op->entries,
+			     pt_op->num_entries, &pt_update_ops->deferred);
+		pt_op->vma = NULL;	/* skip in xe_pt_update_ops_abort */
+	}
+
 	fence = xe_migrate_update_pgtables(tile->migrate, &update);
 	if (IS_ERR(fence)) {
 		err = PTR_ERR(fence);
-		goto free_rfence;
+		goto kill_vm_tile0;
 	}
 
 	err = xe_range_fence_insert(&vm->rftree[tile->id], rfence,
@@ -1847,10 +1942,15 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	if (pt_update_ops->needs_userptr_lock)
 		up_read(&vm->userptr.notifier_lock);
 	dma_fence_put(fence);
-free_rfence:
+kill_vm_tile0:
+	if (!tile->id)
+		xe_vm_kill(vops->vm, false);
 	kfree(rfence);
 free_ifence:
 	kfree(ifence);
+kill_vm_tile1:
+	if (tile->id)
+		xe_vm_kill(vops->vm, false);
 
 	return ERR_PTR(err);
 }
@@ -1871,12 +1971,10 @@ void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
 	lockdep_assert_held(&vops->vm->lock);
 	xe_vm_assert_held(vops->vm);
 
-	/* FIXME: Not 100% correct */
 	for (i = 0; i < pt_update_ops->num_ops; ++i) {
 		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
 
-		if (pt_op->bind)
-			xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
+		xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
 	}
 	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
 }
@@ -1890,10 +1988,28 @@ void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
  */
 void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
 {
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	int i;
+
 	lockdep_assert_held(&vops->vm->lock);
 	xe_vm_assert_held(vops->vm);
 
-	/* FIXME: Just kill VM for now + cleanup PTs */
+	for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
+		struct xe_vm_pgtable_update_op *pt_op =
+			&pt_update_ops->ops[i];
+
+		if (!pt_op->vma || i >= pt_update_ops->current_op)
+			continue;
+
+		if (pt_op->bind)
+			xe_pt_abort_bind(pt_op->vma, pt_op->entries,
+					 pt_op->num_entries,
+					 pt_op->rebind);
+		else
+			xe_pt_abort_unbind(pt_op->vma, pt_op->entries,
+					   pt_op->num_entries);
+	}
+
 	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
-	xe_vm_kill(vops->vm, false);
 }
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 16252f1be055..384cc04de719 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -78,6 +78,8 @@ struct xe_vm_pgtable_update {
 struct xe_vm_pgtable_update_op {
 	/** @entries: entries to update for this operation */
 	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
+	/** @vma: VMA for operation, operation not valid if NULL */
+	struct xe_vma *vma;
 	/** @num_entries: number of entries for this update operation */
 	u32 num_entries;
 	/** @bind: is a bind */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 17/22] drm/xe: Update VM trace events
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (15 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 16/22] drm/xe: Update PT layer with better error handling Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 18/22] drm/xe: Update clear / populate arguments Matthew Brost
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

The trace events have changed with the move to a single job per VM
bind IOCTL; update the trace events to align with the old behavior as
much as possible.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
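For reference, a minimal sketch of the intended event flow in
xe_vm_ops_execute(), abbreviated to the tracing-relevant lines (the
surrounding setup and per-tile execution are elided):

	struct dma_fence *fence = NULL;

	/* ... set up fences array, goto err_trace on allocation failure ... */

	trace_xe_vm_ops_execute(vops);	/* one bind/unbind event per VMA op */

	/* ... run the pt_update_ops for each tile ... */

err_trace:
	if (IS_ERR(fence))
		trace_xe_vm_ops_fail(vm);	/* ops-level event replacing xe_vma_fail */
	return fence;
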
 drivers/gpu/drm/xe/xe_trace.h | 10 ++++-----
 drivers/gpu/drm/xe/xe_vm.c    | 43 ++++++++++++++++++++++++++++++++++--
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index e4e7262191ad..64f5f5680295 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -400,11 +400,6 @@ DEFINE_EVENT(xe_vma, xe_vma_acc,
 	     TP_ARGS(vma)
 );
 
-DEFINE_EVENT(xe_vma, xe_vma_fail,
-	     TP_PROTO(struct xe_vma *vma),
-	     TP_ARGS(vma)
-);
-
 DEFINE_EVENT(xe_vma, xe_vma_bind,
 	     TP_PROTO(struct xe_vma *vma),
 	     TP_ARGS(vma)
@@ -518,6 +513,11 @@ DEFINE_EVENT(xe_vm, xe_vm_rebind_worker_exit,
 	     TP_ARGS(vm)
 );
 
+DEFINE_EVENT(xe_vm, xe_vm_ops_fail,
+	     TP_PROTO(struct xe_vm *vm),
+	     TP_ARGS(vm)
+);
+
 /* GuC */
 DECLARE_EVENT_CLASS(xe_guc_ct_flow_control,
 		    TP_PROTO(u32 _head, u32 _tail, u32 size, u32 space, u32 len),
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 94daae93f605..9d2d1c262445 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2397,6 +2397,38 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 	return 0;
 }
 
+static void op_trace(struct xe_vma_op *op)
+{
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		trace_xe_vma_bind(op->map.vma);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		trace_xe_vma_unbind(gpuva_to_vma(op->base.remap.unmap->va));
+		if (op->remap.prev)
+			trace_xe_vma_bind(op->remap.prev);
+		if (op->remap.next)
+			trace_xe_vma_bind(op->remap.next);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		trace_xe_vma_unbind(gpuva_to_vma(op->base.unmap.va));
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		trace_xe_vma_bind(gpuva_to_vma(op->base.prefetch.va));
+		break;
+	default:
+		XE_WARN_ON("NOT POSSIBLE");
+	}
+}
+
+static void trace_xe_vm_ops_execute(struct xe_vma_ops *vops)
+{
+	struct xe_vma_op *op;
+
+	list_for_each_entry(op, &vops->list, link)
+		op_trace(op);
+}
+
 static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops)
 {
 	struct xe_exec_queue *q = vops->q;
@@ -2448,8 +2480,10 @@ struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops)
 	if (number_tiles > 1) {
 		fences = kmalloc_array(number_tiles, sizeof(*fences),
 				       GFP_KERNEL);
-		if (!fences)
-			return ERR_PTR(-ENOMEM);
+		if (!fences) {
+			fence = ERR_PTR(-ENOMEM);
+			goto err_trace;
+		}
 	}
 
 	for_each_tile(tile, vm->xe, id) {
@@ -2463,6 +2497,8 @@ struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops)
 		}
 	}
 
+	trace_xe_vm_ops_execute(vops);
+
 	for_each_tile(tile, vm->xe, id) {
 		if (!vops->pt_update_ops[id].num_ops)
 			continue;
@@ -2509,6 +2545,9 @@ struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops)
 	kfree(fences);
 	kfree(cf);
 
+err_trace:
+	if (IS_ERR(fence))
+		trace_xe_vm_ops_fail(vm);
 	return fence;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 18/22] drm/xe: Update clear / populate arguments
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (16 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 17/22] drm/xe: Update VM trace events Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 19/22] drm/xe: Add __xe_migrate_update_pgtables_cpu helper Matthew Brost
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

This will help implement CPU binds in run_job(), as 'struct
xe_migrate_pt_update' is not available by the time run_job() executes.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
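To illustrate the new callback shape, a minimal sketch of a clear()
implementation against the updated signature (the function name is
illustrative; the body mirrors xe_migrate_clear_pgtable_callback
below). The VM is now passed in directly rather than dug out of
'struct xe_migrate_pt_update':

static void
example_clear(struct xe_vm *vm, struct xe_tile *tile,
	      struct iosys_map *map, void *ptr,
	      u32 qword_ofs, u32 num_qwords,
	      const struct xe_vm_pgtable_update *update)
{
	u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
	int i;

	if (map && map->is_iomem)
		for (i = 0; i < num_qwords; ++i)
			xe_map_wr(tile_to_xe(tile), map,
				  (qword_ofs + i) * sizeof(u64), u64, empty);
	else if (map)
		memset64(map->vaddr + qword_ofs * sizeof(u64), empty,
			 num_qwords);
	else
		memset64(ptr, empty, num_qwords);
}
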
 drivers/gpu/drm/xe/xe_migrate.c |  9 +++++----
 drivers/gpu/drm/xe/xe_migrate.h | 12 +++++-------
 drivers/gpu/drm/xe/xe_pt.c      | 12 +++++-------
 3 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index afad09591fbf..9ea7a6a221b5 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1110,6 +1110,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
 			  struct xe_migrate_pt_update *pt_update)
 {
 	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
+	struct xe_vm *vm = pt_update->vops->vm;
 	u32 chunk;
 	u32 ofs = update->ofs, size = update->qwords;
 
@@ -1141,10 +1142,10 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
 		bb->cs[bb->len++] = lower_32_bits(addr);
 		bb->cs[bb->len++] = upper_32_bits(addr);
 		if (pt_op->bind)
-			ops->populate(pt_update, tile, NULL, bb->cs + bb->len,
+			ops->populate(tile, NULL, bb->cs + bb->len,
 				      ofs, chunk, update);
 		else
-			ops->clear(pt_update, tile, NULL, bb->cs + bb->len,
+			ops->clear(vm, tile, NULL, bb->cs + bb->len,
 				   ofs, chunk, update);
 
 		bb->len += chunk * 2;
@@ -1201,12 +1202,12 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 				&pt_op->entries[i];
 
 			if (pt_op->bind)
-				ops->populate(pt_update, m->tile,
+				ops->populate(m->tile,
 					      &update->pt_bo->vmap, NULL,
 					      update->ofs, update->qwords,
 					      update);
 			else
-				ops->clear(pt_update, m->tile,
+				ops->clear(vm, m->tile,
 					   &update->pt_bo->vmap, NULL,
 					   update->ofs, update->qwords, update);
 		}
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index bd8eba1d3552..18f5a8e40b5c 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -31,7 +31,6 @@ struct xe_vma;
 struct xe_migrate_pt_update_ops {
 	/**
 	 * @populate: Populate a command buffer or page-table with ptes.
-	 * @pt_update: Embeddable callback argument.
 	 * @tile: The tile for the current operation.
 	 * @map: struct iosys_map into the memory to be populated.
 	 * @pos: If @map is NULL, map into the memory to be populated.
@@ -43,13 +42,12 @@ struct xe_migrate_pt_update_ops {
 	 * page-table system to populate command buffers or shared
 	 * page-tables with PTEs.
 	 */
-	void (*populate)(struct xe_migrate_pt_update *pt_update,
-			 struct xe_tile *tile, struct iosys_map *map,
+	void (*populate)(struct xe_tile *tile, struct iosys_map *map,
 			 void *pos, u32 ofs, u32 num_qwords,
 			 const struct xe_vm_pgtable_update *update);
 	/**
 	 * @clear: Clear a command buffer or page-table with ptes.
-	 * @pt_update: Embeddable callback argument.
+	 * @vm: VM being updated
 	 * @tile: The tile for the current operation.
 	 * @map: struct iosys_map into the memory to be populated.
 	 * @pos: If @map is NULL, map into the memory to be populated.
@@ -61,9 +59,9 @@ struct xe_migrate_pt_update_ops {
 	 * page-table system to populate command buffers or shared
 	 * page-tables with PTEs.
 	 */
-	void (*clear)(struct xe_migrate_pt_update *pt_update,
-		      struct xe_tile *tile, struct iosys_map *map,
-		      void *pos, u32 ofs, u32 num_qwords,
+	void (*clear)(struct xe_vm *vm, struct xe_tile *tile,
+		      struct iosys_map *map, void *pos, u32 ofs,
+		      u32 num_qwords,
 		      const struct xe_vm_pgtable_update *update);
 
 	/**
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 8160a295dd84..4758ae8a5459 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -778,9 +778,8 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma)
 }
 
 static void
-xe_vm_populate_pgtable(struct xe_migrate_pt_update *pt_update, struct xe_tile *tile,
-		       struct iosys_map *map, void *data,
-		       u32 qword_ofs, u32 num_qwords,
+xe_vm_populate_pgtable(struct xe_tile *tile, struct iosys_map *map,
+		       void *data, u32 qword_ofs, u32 num_qwords,
 		       const struct xe_vm_pgtable_update *update)
 {
 	struct xe_pt_entry *ptes = update->pt_entries;
@@ -1472,12 +1471,11 @@ static unsigned int xe_pt_stage_unbind(struct xe_tile *tile, struct xe_vma *vma,
 }
 
 static void
-xe_migrate_clear_pgtable_callback(struct xe_migrate_pt_update *pt_update,
-				  struct xe_tile *tile, struct iosys_map *map,
-				  void *ptr, u32 qword_ofs, u32 num_qwords,
+xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
+				  struct iosys_map *map, void *ptr,
+				  u32 qword_ofs, u32 num_qwords,
 				  const struct xe_vm_pgtable_update *update)
 {
-	struct xe_vm *vm = pt_update->vops->vm;
 	u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
 	int i;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 19/22] drm/xe: Add __xe_migrate_update_pgtables_cpu helper
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (17 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 18/22] drm/xe: Update clear / populate arguments Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 20/22] drm/xe: CPU binds for jobs Matthew Brost
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

This will help implement CPU binds, as the submission backend can call
this helper once a bind job's dependencies are resolved.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
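The intended caller pattern, as a minimal sketch (the
backend_run_bind_job() name is hypothetical; the real call site is
added by the CPU-binds patch that follows):

static void backend_run_bind_job(struct xe_vm *vm, struct xe_tile *tile,
				 const struct xe_migrate_pt_update_ops *ops,
				 struct xe_vm_pgtable_update_op *pt_op,
				 int num_ops)
{
	/* All bind job dependencies have signaled; apply PTEs with the CPU. */
	__xe_migrate_update_pgtables_cpu(vm, tile, ops, pt_op, num_ops);
}
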
 drivers/gpu/drm/xe/xe_migrate.c | 54 +++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 9ea7a6a221b5..780e6eca2c40 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1169,6 +1169,34 @@ struct migrate_test_params {
 	container_of(_priv, struct migrate_test_params, base)
 #endif
 
+static void
+__xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct xe_tile *tile,
+				 const struct xe_migrate_pt_update_ops *ops,
+				 struct xe_vm_pgtable_update_op *pt_op,
+				 int num_ops)
+{
+	u32 j, i;
+
+	for (j = 0; j < num_ops; ++j, ++pt_op) {
+		for (i = 0; i < pt_op->num_entries; i++) {
+			const struct xe_vm_pgtable_update *update =
+				&pt_op->entries[i];
+
+			if (pt_op->bind)
+				ops->populate(tile, &update->pt_bo->vmap,
+					      NULL, update->ofs, update->qwords,
+					      update);
+			else
+				ops->clear(vm, tile, &update->pt_bo->vmap,
+					   NULL, update->ofs, update->qwords,
+					   update);
+		}
+	}
+
+	trace_xe_vm_cpu_bind(vm);
+	xe_device_wmb(vm->xe);
+}
+
 static struct dma_fence *
 xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 			       struct xe_migrate_pt_update *pt_update)
@@ -1181,7 +1209,6 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 	struct xe_vm_pgtable_update_ops *pt_update_ops =
 		&pt_update->vops->pt_update_ops[pt_update->tile_id];
 	int err;
-	u32 j, i;
 
 	if (XE_TEST_ONLY(test && test->force_gpu))
 		return ERR_PTR(-ETIME);
@@ -1193,28 +1220,9 @@ xe_migrate_update_pgtables_cpu(struct xe_migrate *m,
 			return ERR_PTR(err);
 	}
 
-	for (j = 0; j < pt_update_ops->num_ops; ++j) {
-		const struct xe_vm_pgtable_update_op *pt_op =
-			&pt_update_ops->ops[j];
-
-		for (i = 0; i < pt_op->num_entries; i++) {
-			const struct xe_vm_pgtable_update *update =
-				&pt_op->entries[i];
-
-			if (pt_op->bind)
-				ops->populate(m->tile,
-					      &update->pt_bo->vmap, NULL,
-					      update->ofs, update->qwords,
-					      update);
-			else
-				ops->clear(vm, m->tile,
-					   &update->pt_bo->vmap, NULL,
-					   update->ofs, update->qwords, update);
-		}
-	}
-
-	trace_xe_vm_cpu_bind(vm);
-	xe_device_wmb(vm->xe);
+	__xe_migrate_update_pgtables_cpu(vm, m->tile, ops,
+					 pt_update_ops->ops,
+					 pt_update_ops->num_ops);
 
 	return dma_fence_get_stub();
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 20/22] drm/xe: CPU binds for jobs
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (18 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 19/22] drm/xe: Add __xe_migrate_update_pgtables_cpu helper Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-03-28 11:28   ` Thomas Hellström
  2024-02-06 23:37 ` [PATCH v3 21/22] drm/xe: Don't use migrate exec queue for page fault binds Matthew Brost
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

There is no reason to use the GPU for binds. In run_job(), use the CPU
to perform binds once the bind job's dependencies are resolved.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
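Condensed sketch of the new run_job() dispatch (see the full
xe_guc_submit.c hunk below; registration and emit steps are elided) --
PT jobs are executed on the CPU and never submitted to the GuC:

	if (!exec_queue_killed_or_banned(q) && !xe_sched_job_is_error(job)) {
		if (is_pt_job(job))
			run_pt_job(xe, job);	/* CPU applies the PTEs */
		else
			submit_exec_queue(q);	/* normal GPU submission */
	} else if (is_pt_job(job)) {
		cleanup_pt_job(xe, job);	/* job never ran; just free */
	}
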
 drivers/gpu/drm/xe/xe_bo.c              |   7 +-
 drivers/gpu/drm/xe/xe_bo.h              |   4 +-
 drivers/gpu/drm/xe/xe_bo_types.h        |   2 -
 drivers/gpu/drm/xe/xe_device.c          |  35 +++++
 drivers/gpu/drm/xe/xe_device.h          |   2 +
 drivers/gpu/drm/xe/xe_device_types.h    |   4 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c    |   5 +-
 drivers/gpu/drm/xe/xe_guc_submit.c      |  47 +++++-
 drivers/gpu/drm/xe/xe_migrate.c         | 198 +++---------------------
 drivers/gpu/drm/xe/xe_migrate.h         |   6 +
 drivers/gpu/drm/xe/xe_pt.c              |  36 +++--
 drivers/gpu/drm/xe/xe_pt.h              |   1 +
 drivers/gpu/drm/xe/xe_pt_types.h        |   5 +
 drivers/gpu/drm/xe/xe_sched_job.c       |  24 ++-
 drivers/gpu/drm/xe/xe_sched_job_types.h |  31 +++-
 drivers/gpu/drm/xe/xe_vm.c              | 104 ++++++-------
 drivers/gpu/drm/xe/xe_vm.h              |   5 +-
 17 files changed, 247 insertions(+), 269 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 686d716c5581..3f327c123bbc 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2237,16 +2237,16 @@ void __xe_bo_release_dummy(struct kref *kref)
 
 /**
  * xe_bo_put_commit() - Put bos whose put was deferred by xe_bo_put_deferred().
+ * @xe: Xe device
  * @deferred: The lockless list used for the call to xe_bo_put_deferred().
  *
  * Puts all bos whose put was deferred by xe_bo_put_deferred().
  * The @deferred list can be either an onstack local list or a global
  * shared list used by a workqueue.
  */
-void xe_bo_put_commit(struct llist_head *deferred)
+void xe_bo_put_commit(struct xe_device *xe, struct llist_head *deferred)
 {
 	struct llist_node *freed;
-	struct xe_bo *bo, *next;
 
 	if (!deferred)
 		return;
@@ -2255,8 +2255,7 @@ void xe_bo_put_commit(struct llist_head *deferred)
 	if (!freed)
 		return;
 
-	llist_for_each_entry_safe(bo, next, freed, freed)
-		drm_gem_object_free(&bo->ttm.base.refcount);
+	xe_device_put_deferred(xe, freed);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index db4b2db6b073..2a4bfa4fe6c4 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -10,7 +10,6 @@
 
 #include "xe_bo_types.h"
 #include "xe_macros.h"
-#include "xe_vm_types.h"
 #include "xe_vm.h"
 
 /**
@@ -307,10 +306,11 @@ xe_bo_put_deferred(struct xe_bo *bo, struct llist_head *deferred)
 	if (!kref_put(&bo->ttm.base.refcount, __xe_bo_release_dummy))
 		return false;
 
+	xe_vm_get(bo->vm);
 	return llist_add(&bo->freed, deferred);
 }
 
-void xe_bo_put_commit(struct llist_head *deferred);
+void xe_bo_put_commit(struct xe_device *xe, struct llist_head *deferred);
 
 struct sg_table *xe_bo_sg(struct xe_bo *bo);
 
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 1ff5b5a68adc..14ef13b7b421 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -77,8 +77,6 @@ struct xe_bo {
 	} props;
 	/** @freed: List node for delayed put. */
 	struct llist_node freed;
-	/** @update_index: Update index if PT BO */
-	int update_index;
 	/** @created: Whether the bo has passed initial creation */
 	bool created;
 
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 5b84d7305520..2998c679f3bd 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -198,6 +198,9 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
 {
 	struct xe_device *xe = to_xe_device(dev);
 
+	flush_work(&xe->mem.deferred_work);
+	xe_assert(xe, !llist_del_all(&xe->mem.deferred));
+
 	if (xe->ordered_wq)
 		destroy_workqueue(xe->ordered_wq);
 
@@ -207,6 +210,35 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
 	ttm_device_fini(&xe->ttm);
 }
 
+void xe_device_put_deferred(struct xe_device *xe, struct llist_node *deferred)
+{
+	struct xe_bo *bo, *next;
+
+	llist_for_each_entry_safe(bo, next, deferred, freed) {
+		init_llist_node(&bo->freed);
+		llist_add(&bo->freed, &xe->mem.deferred);
+	}
+	queue_work(system_wq, &xe->mem.deferred_work);
+}
+
+static void deferred_work(struct work_struct *w)
+{
+	struct xe_device *xe = container_of(w, struct xe_device,
+					    mem.deferred_work);
+	struct llist_node *freed = llist_del_all(&xe->mem.deferred);
+	struct xe_bo *bo, *next;
+
+	if (!freed)
+		return;
+
+	llist_for_each_entry_safe(bo, next, freed, freed) {
+		struct xe_vm *vm = bo->vm;
+
+		drm_gem_object_free(&bo->ttm.base.refcount);
+		xe_vm_put(vm);
+	}
+}
+
 struct xe_device *xe_device_create(struct pci_dev *pdev,
 				   const struct pci_device_id *ent)
 {
@@ -274,6 +306,9 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 		goto err;
 	}
 
+	init_llist_head(&xe->mem.deferred);
+	INIT_WORK(&xe->mem.deferred_work, deferred_work);
+
 	err = xe_display_create(xe);
 	if (WARN_ON(err))
 		goto err;
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 462f59e902b1..8991d6a18368 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -180,4 +180,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
 u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
 u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
 
+void xe_device_put_deferred(struct xe_device *xe, struct llist_node *deferred);
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index eb2b806a1d23..3018f1d79177 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -315,6 +315,10 @@ struct xe_device {
 		struct xe_mem_region vram;
 		/** @mem.sys_mgr: system TTM manager */
 		struct ttm_resource_manager sys_mgr;
+		/** @deferred: deferred list to destroy PT entries */
+		struct llist_head deferred;
+		/** @deferred_work: worker to destroy PT entries */
+		struct work_struct deferred_work;
 	} mem;
 
 	/** @sriov: device level virtualization data */
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 27dbbdc9127e..2beba426517e 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -206,10 +206,13 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	/* Bind VMA only to the GT that has faulted */
 	trace_xe_vma_pf_bind(vma);
-	xe_vm_populate_dummy_rebind(vm, vma, BIT(tile->id));
+	ret = xe_vm_populate_dummy_rebind(vm, vma, BIT(tile->id));
+	if (ret)
+		goto unlock_dma_resv;
 	vm->dummy_ops.vops.pt_update_ops[tile->id].q =
 		xe_tile_migrate_exec_queue(tile);
 	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
+	xe_vma_ops_free(&vm->dummy_ops.vops);
 	if (IS_ERR(fence)) {
 		ret = PTR_ERR(fence);
 		goto unlock_dma_resv;
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 4744668ef60a..bcf0563b76ea 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -17,6 +17,7 @@
 #include "abi/guc_klvs_abi.h"
 #include "regs/xe_lrc_layout.h"
 #include "xe_assert.h"
+#include "xe_bo.h"
 #include "xe_devcoredump.h"
 #include "xe_device.h"
 #include "xe_exec_queue.h"
@@ -33,7 +34,9 @@
 #include "xe_lrc.h"
 #include "xe_macros.h"
 #include "xe_map.h"
+#include "xe_migrate.h"
 #include "xe_mocs.h"
+#include "xe_pt.h"
 #include "xe_ring_ops_types.h"
 #include "xe_sched_job.h"
 #include "xe_trace.h"
@@ -719,6 +722,29 @@ static void submit_exec_queue(struct xe_exec_queue *q)
 	}
 }
 
+static bool is_pt_job(struct xe_sched_job *job)
+{
+	return test_bit(JOB_FLAG_PT, &job->fence->flags);
+}
+
+static void cleanup_pt_job(struct xe_device *xe, struct xe_sched_job *job)
+{
+	xe_pt_update_ops_free(job->pt_update[0].pt_op,
+			      job->pt_update[0].num_ops);
+	xe_bo_put_commit(xe, &job->pt_update[0].deferred);
+	kfree(job->pt_update[0].pt_op);
+}
+
+static void run_pt_job(struct xe_device *xe, struct xe_sched_job *job)
+{
+	__xe_migrate_update_pgtables_cpu(job->pt_update[0].vm,
+					 job->pt_update[0].tile,
+					 job->pt_update[0].ops,
+					 job->pt_update[0].pt_op,
+					 job->pt_update[0].num_ops);
+	cleanup_pt_job(xe, job);
+}
+
 static struct dma_fence *
 guc_exec_queue_run_job(struct drm_sched_job *drm_job)
 {
@@ -734,15 +760,22 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
 	trace_xe_sched_job_run(job);
 
 	if (!exec_queue_killed_or_banned(q) && !xe_sched_job_is_error(job)) {
-		if (!exec_queue_registered(q))
-			register_engine(q);
-		if (!lr)	/* LR jobs are emitted in the exec IOCTL */
-			q->ring_ops->emit_job(job);
-		submit_exec_queue(q);
+		if (is_pt_job(job)) {
+			run_pt_job(xe, job);
+		} else {
+			if (!exec_queue_registered(q))
+				register_engine(q);
+			if (!lr)	/* LR jobs are emitted in the exec IOCTL */
+				q->ring_ops->emit_job(job);
+			submit_exec_queue(q);
+		}
+	} else if (is_pt_job(job)) {
+		cleanup_pt_job(xe, job);
 	}
 
-	if (lr) {
-		xe_sched_job_set_error(job, -EOPNOTSUPP);
+	if (lr || is_pt_job(job)) {
+		if (lr)
+			xe_sched_job_set_error(job, -EOPNOTSUPP);
 		return NULL;
 	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
 		return job->fence;
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 780e6eca2c40..7d752e424b3b 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1104,56 +1104,6 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 	return fence;
 }
 
-static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
-			  const struct xe_vm_pgtable_update_op *pt_op,
-			  const struct xe_vm_pgtable_update *update,
-			  struct xe_migrate_pt_update *pt_update)
-{
-	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
-	struct xe_vm *vm = pt_update->vops->vm;
-	u32 chunk;
-	u32 ofs = update->ofs, size = update->qwords;
-
-	/*
-	 * If we have 512 entries (max), we would populate it ourselves,
-	 * and update the PDE above it to the new pointer.
-	 * The only time this can only happen if we have to update the top
-	 * PDE. This requires a BO that is almost vm->size big.
-	 *
-	 * This shouldn't be possible in practice.. might change when 16K
-	 * pages are used. Hence the assert.
-	 */
-	xe_tile_assert(tile, update->qwords < MAX_NUM_PTE);
-	if (!ppgtt_ofs)
-		ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile),
-						xe_bo_addr(update->pt_bo, 0,
-							   XE_PAGE_SIZE));
-
-	do {
-		u64 addr = ppgtt_ofs + ofs * 8;
-
-		chunk = min(size, MAX_PTE_PER_SDI);
-
-		/* Ensure populatefn can do memset64 by aligning bb->cs */
-		if (!(bb->len & 1))
-			bb->cs[bb->len++] = MI_NOOP;
-
-		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
-		bb->cs[bb->len++] = lower_32_bits(addr);
-		bb->cs[bb->len++] = upper_32_bits(addr);
-		if (pt_op->bind)
-			ops->populate(tile, NULL, bb->cs + bb->len,
-				      ofs, chunk, update);
-		else
-			ops->clear(vm, tile, NULL, bb->cs + bb->len,
-				   ofs, chunk, update);
-
-		bb->len += chunk * 2;
-		ofs += chunk;
-		size -= chunk;
-	} while (size);
-}
-
 struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m)
 {
 	return xe_vm_get(m->q->vm);
@@ -1169,11 +1119,10 @@ struct migrate_test_params {
 	container_of(_priv, struct migrate_test_params, base)
 #endif
 
-static void
-__xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct xe_tile *tile,
-				 const struct xe_migrate_pt_update_ops *ops,
-				 struct xe_vm_pgtable_update_op *pt_op,
-				 int num_ops)
+void __xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct xe_tile *tile,
+				      const struct xe_migrate_pt_update_ops *ops,
+				      struct xe_vm_pgtable_update_op *pt_op,
+				      int num_ops)
 {
 	u32 j, i;
 
@@ -1234,131 +1183,15 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
 {
 	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
 	struct xe_tile *tile = m->tile;
-	struct xe_gt *gt = tile->primary_gt;
-	struct xe_device *xe = tile_to_xe(tile);
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
-	struct drm_suballoc *sa_bo = NULL;
-	struct xe_bb *bb;
-	u32 i, j, batch_size = 0, ppgtt_ofs, update_idx, page_ofs = 0;
-	u32 num_updates = 0, current_update = 0;
-	u64 addr;
-	int err = 0;
 	bool is_migrate = pt_update_ops->q == m->q;
-	bool usm = is_migrate && xe->info.has_usm;
-
-	for (i = 0; i < pt_update_ops->num_ops; ++i) {
-		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
-		struct xe_vm_pgtable_update *updates = pt_op->entries;
-
-		num_updates += pt_op->num_entries;
-		for (j = 0; j < pt_op->num_entries; ++j) {
-			u32 num_cmds = DIV_ROUND_UP(updates[j].qwords, 0x1ff);
-
-			/* align noop + MI_STORE_DATA_IMM cmd prefix */
-			batch_size += 4 * num_cmds + updates[j].qwords * 2;
-		}
-	}
-
-	/* fixed + PTE entries */
-	if (IS_DGFX(xe))
-		batch_size += 2;
-	else
-		batch_size += 6 + num_updates * 2;
-
-	bb = xe_bb_new(gt, batch_size, usm);
-	if (IS_ERR(bb))
-		return ERR_CAST(bb);
-
-	/* For sysmem PTE's, need to map them in our hole.. */
-	if (!IS_DGFX(xe)) {
-		ppgtt_ofs = NUM_KERNEL_PDE - 1;
-		if (!is_migrate) {
-			u32 num_units = DIV_ROUND_UP(num_updates,
-						     NUM_VMUSA_WRITES_PER_UNIT);
-
-			sa_bo = drm_suballoc_new(&m->vm_update_sa, num_units,
-						 GFP_KERNEL, true, 0);
-			if (IS_ERR(sa_bo)) {
-				err = PTR_ERR(sa_bo);
-				goto err;
-			}
-
-			ppgtt_ofs = NUM_KERNEL_PDE +
-				(drm_suballoc_soffset(sa_bo) /
-				 NUM_VMUSA_UNIT_PER_PAGE);
-			page_ofs = (drm_suballoc_soffset(sa_bo) %
-				    NUM_VMUSA_UNIT_PER_PAGE) *
-				VM_SA_UPDATE_UNIT_SIZE;
-		}
-
-		/* Map our PT's to gtt */
-		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(num_updates);
-		bb->cs[bb->len++] = ppgtt_ofs * XE_PAGE_SIZE + page_ofs;
-		bb->cs[bb->len++] = 0; /* upper_32_bits */
-
-		for (i = 0; i < pt_update_ops->num_ops; ++i) {
-			struct xe_vm_pgtable_update_op *pt_op =
-				&pt_update_ops->ops[i];
-			struct xe_vm_pgtable_update *updates = pt_op->entries;
-
-			for (j = 0; j < pt_op->num_entries; ++j, ++current_update) {
-				struct xe_vm *vm = pt_update->vops->vm;
-				struct xe_bo *pt_bo = updates[j].pt_bo;
-
-				xe_tile_assert(tile, pt_bo->size == SZ_4K);
-
-				/* Map a PT at most once */
-				if (pt_bo->update_index < 0)
-					pt_bo->update_index = current_update;
-
-				addr = vm->pt_ops->pte_encode_bo(pt_bo, 0,
-								 XE_CACHE_WB, 0);
-				bb->cs[bb->len++] = lower_32_bits(addr);
-				bb->cs[bb->len++] = upper_32_bits(addr);
-			}
-		}
-
-		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
-		update_idx = bb->len;
-
-		addr = xe_migrate_vm_addr(ppgtt_ofs, 0) +
-			(page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
-		for (i = 0; i < pt_update_ops->num_ops; ++i) {
-			struct xe_vm_pgtable_update_op *pt_op =
-				&pt_update_ops->ops[i];
-			struct xe_vm_pgtable_update *updates = pt_op->entries;
-
-			for (j = 0; j < pt_op->num_entries; ++j) {
-				struct xe_bo *pt_bo = updates[j].pt_bo;
-
-				write_pgtable(tile, bb, addr +
-					      pt_bo->update_index * XE_PAGE_SIZE,
-					      pt_op, &updates[j], pt_update);
-			}
-		}
-	} else {
-		/* phys pages, no preamble required */
-		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
-		update_idx = bb->len;
-
-		for (i = 0; i < pt_update_ops->num_ops; ++i) {
-			struct xe_vm_pgtable_update_op *pt_op =
-				&pt_update_ops->ops[i];
-			struct xe_vm_pgtable_update *updates = pt_op->entries;
-
-			for (j = 0; j < pt_op->num_entries; ++j)
-				write_pgtable(tile, bb, 0, pt_op, &updates[j],
-					      pt_update);
-		}
-	}
+	int err;
 
 	if (is_migrate)
 		mutex_lock(&m->job_mutex);
 
-	job = xe_bb_create_migration_job(pt_update_ops->q, bb,
-					 xe_migrate_batch_base(m, usm),
-					 update_idx);
+	job = xe_sched_job_create(pt_update_ops->q, NULL);
 	if (IS_ERR(job)) {
 		err = PTR_ERR(job);
 		goto err_bb;
@@ -1370,6 +1203,19 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
 		if (err)
 			goto err_job;
 	}
+
+	set_bit(JOB_FLAG_PT, &job->fence->flags);
+	job->pt_update[0].vm = pt_update->vops->vm;
+	job->pt_update[0].tile = tile;
+	job->pt_update[0].ops = ops;
+	job->pt_update[0].pt_op = pt_update_ops->ops;
+	job->pt_update[0].num_ops = pt_update_ops->num_ops;
+	job->pt_update[0].deferred = pt_update_ops->deferred;
+
+	/* Submission backend now owns freeing of pt_update_ops->ops */
+	init_llist_head(&pt_update_ops->deferred);
+	pt_update_ops->skip_free = true;
+
 	xe_sched_job_arm(job);
 	fence = dma_fence_get(&job->drm.s_fence->finished);
 	xe_sched_job_push(job);
@@ -1377,9 +1223,6 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
 	if (is_migrate)
 		mutex_unlock(&m->job_mutex);
 
-	xe_bb_free(bb, fence);
-	drm_suballoc_free(sa_bo, fence);
-
 	return fence;
 
 err_job:
@@ -1387,9 +1230,6 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
 err_bb:
 	if (is_migrate)
 		mutex_unlock(&m->job_mutex);
-	xe_bb_free(bb, NULL);
-err:
-	drm_suballoc_free(sa_bo, NULL);
 	return ERR_PTR(err);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 18f5a8e40b5c..f80c94bb8f4c 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -22,6 +22,7 @@ struct xe_pt;
 struct xe_tile;
 struct xe_vm;
 struct xe_vm_pgtable_update;
+struct xe_vm_pgtable_update_op;
 struct xe_vma;
 
 /**
@@ -106,6 +107,11 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 
 struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
 
+void __xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct xe_tile *tile,
+				      const struct xe_migrate_pt_update_ops *ops,
+				      struct xe_vm_pgtable_update_op *pt_op,
+				      int num_ops);
+
 struct dma_fence *
 xe_migrate_update_pgtables(struct xe_migrate *m,
 			   struct xe_migrate_pt_update *pt_update);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 4758ae8a5459..26a00af6c4a6 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -315,7 +315,7 @@ xe_pt_new_shared(struct xe_walk_update *wupd, struct xe_pt *parent,
 	entry->pt = parent;
 	entry->flags = 0;
 	entry->qwords = 0;
-	entry->pt_bo->update_index = -1;
+	entry->level = parent->level;
 
 	if (alloc_entries) {
 		entry->pt_entries = kmalloc_array(XE_PDES,
@@ -1476,7 +1476,7 @@ xe_migrate_clear_pgtable_callback(struct xe_vm *vm, struct xe_tile *tile,
 				  u32 qword_ofs, u32 num_qwords,
 				  const struct xe_vm_pgtable_update *update)
 {
-	u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
+	u64 empty = __xe_pt_empty_pte(tile, vm, update->level);
 	int i;
 
 	if (map && map->is_iomem)
@@ -1890,7 +1890,7 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 		goto free_ifence;
 	}
 
-	/* FIXME: Point of no return - VM killed if failure after this */
+	/* Point of no return - VM killed if failure after this */
 	for (i = 0; i < pt_update_ops->num_ops; ++i) {
 		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
 
@@ -1953,6 +1953,21 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	return ERR_PTR(err);
 }
 
+/**
+ * xe_pt_update_ops_free() - Free PT update operations
+ * @pt_op: Array of PT update operations
+ * @num_ops: Number of PT update operations
+ *
+ * Free PT update operations
+ */
+void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op *pt_op, u32 num_ops)
+{
+	u32 i;
+
+	for (i = 0; i < num_ops; ++i, ++pt_op)
+		xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
+}
+
 /**
  * xe_pt_update_ops_fini() - Finish PT update operations
  * @tile: Tile of PT update operations
@@ -1964,17 +1979,16 @@ void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
 {
 	struct xe_vm_pgtable_update_ops *pt_update_ops =
 		&vops->pt_update_ops[tile->id];
-	int i;
 
 	lockdep_assert_held(&vops->vm->lock);
 	xe_vm_assert_held(vops->vm);
 
-	for (i = 0; i < pt_update_ops->num_ops; ++i) {
-		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
-
-		xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
-	}
-	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
+	xe_bo_put_commit(tile_to_xe(tile), &pt_update_ops->deferred);
+	if (!pt_update_ops->skip_free)
+		xe_pt_update_ops_free(pt_update_ops->ops,
+				      pt_update_ops->num_ops);
+	else
+		pt_update_ops->ops = NULL;
 }
 
 /**
@@ -2009,5 +2023,5 @@ void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
 					   pt_op->num_entries);
 	}
 
-	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
+	xe_pt_update_ops_fini(tile, vops);
 }
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 9ab386431cad..989c9b190fa0 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -40,6 +40,7 @@ struct dma_fence *xe_pt_update_ops_run(struct xe_tile *tile,
 				       struct xe_vma_ops *vops);
 void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops);
 void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);
+void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op *pt_op, u32 num_ops);
 
 bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
 
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 384cc04de719..cfd0d35408a5 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -70,6 +70,9 @@ struct xe_vm_pgtable_update {
 	/** @pt_entries: Newly added pagetable entries */
 	struct xe_pt_entry *pt_entries;
 
+	/** @level: level of update */
+	unsigned int level;
+
 	/** @flags: Target flags */
 	u32 flags;
 };
@@ -120,6 +123,8 @@ struct xe_vm_pgtable_update_ops {
 	 * slots are idle.
 	 */
 	bool wait_vm_kernel;
+	/** @skip_free: Free @ops in submission backend rather than in IOCTL */
+	bool skip_free;
 };
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 8151ddafb940..5930f03a81ad 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -34,8 +34,10 @@ int __init xe_sched_job_module_init(void)
 	xe_sched_job_parallel_slab =
 		kmem_cache_create("xe_sched_job_parallel",
 				  sizeof(struct xe_sched_job) +
+				  max_t(size_t,
 				  sizeof(u64) *
-				  XE_HW_ENGINE_MAX_INSTANCE, 0,
+				  XE_HW_ENGINE_MAX_INSTANCE,
+				  sizeof(struct pt_update_args)), 0,
 				  SLAB_HWCACHE_ALIGN, NULL);
 	if (!xe_sched_job_parallel_slab) {
 		kmem_cache_destroy(xe_sched_job_slab);
@@ -108,7 +110,13 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
 	if (err)
 		goto err_free;
 
-	if (!xe_exec_queue_is_parallel(q)) {
+	if (!batch_addr) {
+		job->fence = dma_fence_allocate_private_stub(ktime_get());
+		if (!job->fence) {
+			err = -ENOMEM;
+			goto err_sched_job;
+		}
+	} else if (!xe_exec_queue_is_parallel(q)) {
 		job->fence = xe_lrc_create_seqno_fence(q->lrc);
 		if (IS_ERR(job->fence)) {
 			err = PTR_ERR(job->fence);
@@ -148,12 +156,14 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q,
 		job->fence = &cf->base;
 	}
 
-	width = q->width;
-	if (is_migration)
-		width = 2;
+	if (batch_addr) {
+		width = q->width;
+		if (is_migration)
+			width = 2;
 
-	for (i = 0; i < width; ++i)
-		job->batch_addr[i] = batch_addr[i];
+		for (i = 0; i < width; ++i)
+			job->batch_addr[i] = batch_addr[i];
+	}
 
 	/* All other jobs require a VM to be open which has a ref */
 	if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL))
diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
index b1d83da50a53..623d6dcc8ae5 100644
--- a/drivers/gpu/drm/xe/xe_sched_job_types.h
+++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
@@ -11,6 +11,28 @@
 #include <drm/gpu_scheduler.h>
 
 struct xe_exec_queue;
+struct xe_migrate_pt_update_ops;
+struct xe_tile;
+struct xe_vm;
+struct xe_vm_pgtable_update_op;
+
+/**
+ * struct pt_update_args - PT update arguments
+ */
+struct pt_update_args {
+	/** @vm: VM */
+	struct xe_vm *vm;
+	/** @tile: Tile */
+	struct xe_tile *tile;
+	/** @ops: Migrate PT update ops */
+	const struct xe_migrate_pt_update_ops *ops;
+	/** @pt_op: PT update ops */
+	struct xe_vm_pgtable_update_op *pt_op;
+	/** @deferred: deferred list to destroy PT entries */
+	struct llist_head deferred;
+	/** @num_ops: number of PT update ops */
+	int num_ops;
+};
 
 /**
  * struct xe_sched_job - XE schedule job (batch buffer tracking)
@@ -27,6 +49,7 @@ struct xe_sched_job {
 	 * can safely reference fence, fence cannot safely reference job.
 	 */
 #define JOB_FLAG_SUBMIT		DMA_FENCE_FLAG_USER_BITS
+#define JOB_FLAG_PT		(DMA_FENCE_FLAG_USER_BITS << 1)
 	struct dma_fence *fence;
 	/** @user_fence: write back value when BB is complete */
 	struct {
@@ -39,8 +62,12 @@ struct xe_sched_job {
 	} user_fence;
 	/** @migrate_flush_flags: Additional flush flags for migration jobs */
 	u32 migrate_flush_flags;
-	/** @batch_addr: batch buffer address of job */
-	u64 batch_addr[];
+	union {
+		/** @batch_addr: batch buffer address of job */
+		u64 batch_addr[0];
+		/** @pt_update: PT update arguments */
+		struct pt_update_args pt_update[0];
+	};
 };
 
 struct xe_sched_job_snapshot {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 9d2d1c262445..3b3136045327 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -749,6 +749,45 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
 		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }
 
+static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
+			    struct xe_exec_queue *q,
+			    struct xe_sync_entry *syncs, u32 num_syncs)
+{
+	memset(vops, 0, sizeof(*vops));
+	INIT_LIST_HEAD(&vops->list);
+	vops->vm = vm;
+	vops->q = q;
+	vops->syncs = syncs;
+	vops->num_syncs = num_syncs;
+}
+
+static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
+		if (!vops->pt_update_ops[i].num_ops)
+			continue;
+
+		vops->pt_update_ops[i].ops =
+			kmalloc_array(vops->pt_update_ops[i].num_ops,
+				      sizeof(*vops->pt_update_ops[i].ops),
+				      GFP_KERNEL);
+		if (!vops->pt_update_ops[i].ops)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void xe_vma_ops_free(struct xe_vma_ops *vops)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
+		kfree(vops->pt_update_ops[i].ops);
+}
+
 /**
  * xe_vm_populate_dummy_rebind() - Populate dummy rebind VMA ops
  * @vm: The VM.
@@ -756,9 +795,11 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
  * @tile_mask: tile mask for VMA ops
  *
  * Populate dummy VMA ops which can be used to issue a rebind for the VMA
+ *
+ * Return: 0 on success, -ENOMEM on failure
  */
-void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
-				 u8 tile_mask)
+int xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
+				u8 tile_mask)
 {
 	int i;
 
@@ -792,12 +833,15 @@ void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
 	vm->dummy_ops.op.map.immediate = true;
 	vm->dummy_ops.op.map.read_only = xe_vma_read_only(vma);
 	vm->dummy_ops.op.map.is_null = xe_vma_is_null(vma);
+
+	return xe_vma_ops_alloc(&vm->dummy_ops.vops);
 }
 
 struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 {
 	struct dma_fence *fence = NULL;
 	struct xe_vma *vma, *next;
+	int err;
 
 	lockdep_assert_held(&vm->lock);
 	if (xe_vm_in_lr_mode(vm) && !rebind_worker)
@@ -815,8 +859,12 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
 		else
 			trace_xe_vma_rebind_exec(vma);
 
-		xe_vm_populate_dummy_rebind(vm, vma, vma->tile_present);
+		err = xe_vm_populate_dummy_rebind(vm, vma, vma->tile_present);
+		if (err)
+			return ERR_PTR(err);
+
 		fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
+		xe_vma_ops_free(&vm->dummy_ops.vops);
 		if (IS_ERR(fence))
 			return fence;
 	}
@@ -1305,48 +1353,6 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
 	}
 }
 
-static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
-			    struct xe_exec_queue *q,
-			    struct xe_sync_entry *syncs, u32 num_syncs)
-{
-	memset(vops, 0, sizeof(*vops));
-	INIT_LIST_HEAD(&vops->list);
-	vops->vm = vm;
-	vops->q = q;
-	vops->syncs = syncs;
-	vops->num_syncs = num_syncs;
-}
-
-static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
-{
-	int i, j;
-
-	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
-		if (!vops->pt_update_ops[i].num_ops)
-			continue;
-
-		vops->pt_update_ops[i].ops =
-			kmalloc_array(vops->pt_update_ops[i].num_ops,
-				      sizeof(*vops->pt_update_ops[i].ops),
-				      GFP_KERNEL);
-		if (!vops->pt_update_ops[i].ops)
-			return -ENOMEM;
-
-		for (j = 0; j < vops->pt_update_ops[i].num_ops; ++j)
-			vops->pt_update_ops[i].ops[j].num_entries = 0;
-	}
-
-	return 0;
-}
-
-static void xe_vma_ops_fini(struct xe_vma_ops *vops)
-{
-	int i;
-
-	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
-		kfree(vops->pt_update_ops[i].ops);
-}
-
 static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops, u8 tile_mask)
 {
 	int i;
@@ -1381,9 +1387,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	list_add(&vm->dummy_ops.op.link, &vm->dummy_ops.vops.list);
 	for (id = 0; id < XE_MAX_TILES_PER_DEVICE; ++id)
 		vm->dummy_ops.vops.pt_update_ops[id].num_ops = 1;
-	err = xe_vma_ops_alloc(&vm->dummy_ops.vops);
-	if (err)
-		goto err_free;
 
 	INIT_LIST_HEAD(&vm->rebind_list);
 
@@ -1513,8 +1516,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		xe_device_mem_access_put(xe);
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);
-err_free:
-	xe_vma_ops_fini(&vm->dummy_ops.vops);
 	kfree(vm);
 	return ERR_PTR(err);
 }
@@ -1650,7 +1651,6 @@ static void vm_destroy_work_func(struct work_struct *w)
 		XE_WARN_ON(vm->pt_root[id]);
 
 	trace_xe_vm_free(vm);
-	xe_vma_ops_fini(&vm->dummy_ops.vops);
 	kfree(vm);
 }
 
@@ -2948,7 +2948,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 unwind_ops:
 	if (err && err != -ENODATA)
 		vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
-	xe_vma_ops_fini(&vops);
+	xe_vma_ops_free(&vops);
 	for (i = args->num_binds - 1; i >= 0; --i)
 		if (ops[i])
 			drm_gpuva_ops_free(&vm->gpuvm, ops[i]);
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index b8134804fce9..58e7490f7401 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -262,8 +262,9 @@ static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
  */
 #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
 
-void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
-				 u8 tile_mask);
+int xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
+				u8 tile_mask);
+void xe_vma_ops_free(struct xe_vma_ops *vops);
 struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops);
 
 void xe_vm_kill(struct xe_vm *vm, bool unlocked);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 21/22] drm/xe: Don't use migrate exec queue for page fault binds
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (19 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 20/22] drm/xe: CPU binds for jobs Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-06 23:37 ` [PATCH v3 22/22] drm/xe: Add VM bind IOCTL error injection Matthew Brost
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Now that the CPU is always used for binds, even in jobs, CPU bind jobs
can pass GPU jobs on the same exec queue, resulting in dma-fences
signaling out of order. Use a dedicated exec queue for binds issued
from page faults to avoid ordering issues.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
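The hazard this avoids, sketched as a timeline (illustrative, not
driver code): a CPU bind "job" completes synchronously in run_job(),
so on a shared queue its fence can signal before an earlier GPU job's
fence:

	/* shared migrate queue:  [GPU copy job] -> [CPU bind job]
	 * signal order:          bind fence (immediate), then copy fence
	 *
	 * With a dedicated bind_q, bind fences only order against binds:
	 */
	vm->dummy_ops.vops.pt_update_ops[tile->id].q =
		xe_tile_migrate_bind_exec_queue(tile);
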
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  2 +-
 drivers/gpu/drm/xe/xe_migrate.c      | 22 +++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_migrate.h      |  1 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 2beba426517e..a30bcf314589 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -210,7 +210,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	if (ret)
 		goto unlock_dma_resv;
 	vm->dummy_ops.vops.pt_update_ops[tile->id].q =
-		xe_tile_migrate_exec_queue(tile);
+		xe_tile_migrate_bind_exec_queue(tile);
 	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
 	xe_vma_ops_free(&vm->dummy_ops.vops);
 	if (IS_ERR(fence)) {
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 7d752e424b3b..1dd73e2117ea 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -40,6 +40,8 @@
 struct xe_migrate {
 	/** @q: Default exec queue used for migration */
 	struct xe_exec_queue *q;
+	/** @bind_q: Default exec queue used for binds */
+	struct xe_exec_queue *bind_q;
 	/** @tile: Backpointer to the tile this struct xe_migrate belongs to. */
 	struct xe_tile *tile;
 	/** @job_mutex: Timeline mutex for @eng. */
@@ -96,6 +98,11 @@ struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile)
 	return tile->migrate->q;
 }
 
+struct xe_exec_queue *xe_tile_migrate_bind_exec_queue(struct xe_tile *tile)
+{
+	return tile->migrate->bind_q;
+}
+
 static void xe_migrate_fini(struct drm_device *dev, void *arg)
 {
 	struct xe_migrate *m = arg;
@@ -110,6 +117,8 @@ static void xe_migrate_fini(struct drm_device *dev, void *arg)
 	mutex_destroy(&m->job_mutex);
 	xe_vm_close_and_put(m->q->vm);
 	xe_exec_queue_put(m->q);
+	if (m->bind_q)
+		xe_exec_queue_put(m->bind_q);
 }
 
 static u64 xe_migrate_vm_addr(u64 slot, u32 level)
@@ -367,6 +376,15 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
 		if (!hwe || !logical_mask)
 			return ERR_PTR(-EINVAL);
 
+		m->bind_q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
+						 EXEC_QUEUE_FLAG_KERNEL |
+						 EXEC_QUEUE_FLAG_PERMANENT |
+						 EXEC_QUEUE_FLAG_HIGH_PRIORITY, 0);
+		if (IS_ERR(m->bind_q)) {
+			xe_vm_close_and_put(vm);
+			return ERR_CAST(m->bind_q);
+		}
+
 		m->q = xe_exec_queue_create(xe, vm, logical_mask, 1, hwe,
 					    EXEC_QUEUE_FLAG_KERNEL |
 					    EXEC_QUEUE_FLAG_PERMANENT |
@@ -378,6 +396,8 @@ struct xe_migrate *xe_migrate_init(struct xe_tile *tile)
 						  EXEC_QUEUE_FLAG_PERMANENT);
 	}
 	if (IS_ERR(m->q)) {
+		if (m->bind_q)
+			xe_exec_queue_put(m->bind_q);
 		xe_vm_close_and_put(vm);
 		return ERR_CAST(m->q);
 	}
@@ -1185,7 +1205,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
 	struct xe_tile *tile = m->tile;
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
-	bool is_migrate = pt_update_ops->q == m->q;
+	bool is_migrate = pt_update_ops->q == m->bind_q;
 	int err;
 
 	if (is_migrate)
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index f80c94bb8f4c..701bb27349b0 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -119,5 +119,6 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 void xe_migrate_wait(struct xe_migrate *m);
 
 struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile);
+struct xe_exec_queue *xe_tile_migrate_bind_exec_queue(struct xe_tile *tile);
 
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 22/22] drm/xe: Add VM bind IOCTL error injection
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (20 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 21/22] drm/xe: Don't use migrate exec queue for page fault binds Matthew Brost
@ 2024-02-06 23:37 ` Matthew Brost
  2024-02-07  0:12 ` ✗ CI.Patch_applied: failure for Refactor VM bind code (rev3) Patchwork
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Matthew Brost @ 2024-02-06 23:37 UTC (permalink / raw)
  To: intel-xe; +Cc: thomas.hellstrom, Matthew Brost

Add VM bind IOCTL error injection, which steals the MSB of the bind
flags field; when set, errors are injected at various points in the VM
bind IOCTL. Intended to validate error paths. Enabled by
CONFIG_DRM_XE_DEBUG.
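
A hypothetical IGT-style sketch of how this could be exercised (uAPI
field names are approximate -- treat as pseudocode). FORCE_OP_ERROR is
bit 31 of the per-op bind flags, and each flagged bind advances
vm_inject_error_position, cycling LOCK -> PREPARE -> RUN:

  bind.bind.flags |= 1u << 31;           /* FORCE_OP_ERROR, debug only */
  for (i = 0; i < 3; i++) {              /* FORCE_OP_ERROR_COUNT == 3 */
          ret = ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);
          igt_assert(ret == -1 && errno == ENOSPC); /* injected -ENOSPC */
  }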

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 12 ++++++++++++
 drivers/gpu/drm/xe/xe_pt.c           | 12 ++++++++++++
 drivers/gpu/drm/xe/xe_vm.c           | 21 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm_types.h     | 14 ++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 3018f1d79177..16bec6d7e724 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -22,6 +22,10 @@
 #include "xe_sriov_types.h"
 #include "xe_step_types.h"
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+#define TEST_VM_OPS_ERROR
+#endif
+
 #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
 #include "soc/intel_pch.h"
 #include "intel_display_core.h"
@@ -467,6 +471,14 @@ struct xe_device {
 	/** @needs_flr_on_fini: requests function-reset on fini */
 	bool needs_flr_on_fini;
 
+#ifdef TEST_VM_OPS_ERROR
+	/**
+	 * @vm_inject_error_position: inject errors at different places in VM
+	 * bind IOCTL based on this value
+	 */
+	u8 vm_inject_error_position;
+#endif
+
 	/* private: */
 
 #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 26a00af6c4a6..818546cc216d 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1748,6 +1748,12 @@ int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops)
 	xe_tile_assert(tile, pt_update_ops->current_op ==
 		       pt_update_ops->num_ops);
 
+#ifdef TEST_VM_OPS_ERROR
+	if (vops->inject_error &&
+	    vops->vm->xe->vm_inject_error_position == FORCE_OP_ERROR_PREPARE)
+	       return -ENOSPC;
+#endif
+
 	return 0;
 }
 
@@ -1876,6 +1882,12 @@ xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
 	lockdep_assert_held(&vm->lock);
 	xe_vm_assert_held(vm);
 
+#ifdef TEST_VM_OPS_ERROR
+	if (vops->inject_error &&
+	    vm->xe->vm_inject_error_position == FORCE_OP_ERROR_RUN)
+	       return ERR_PTR(-ENOSPC);
+#endif
+
 	if (pt_update_ops->needs_invalidation) {
 		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
 		if (!ifence) {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3b3136045327..bd36512ef3d5 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2394,6 +2394,12 @@ static int vm_bind_ioctl_ops_lock(struct drm_exec *exec,
 			return err;
 	}
 
+#ifdef TEST_VM_OPS_ERROR
+	if (vops->inject_error &&
+	    vm->xe->vm_inject_error_position == FORCE_OP_ERROR_LOCK)
+	       return -ENOSPC;
+#endif
+
 	return 0;
 }
 
@@ -2601,9 +2607,15 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
 	return err;
 }
 
+#ifdef TEST_VM_OPS_ERROR
+#define SUPPORTED_FLAGS	\
+	(FORCE_OP_ERROR | DRM_XE_VM_BIND_FLAG_READONLY | \
+	 DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL)
+#else
 #define SUPPORTED_FLAGS	\
 	(DRM_XE_VM_BIND_FLAG_READONLY | \
 	 DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL)
+#endif
 #define XE_64K_PAGE_MASK 0xffffull
 #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
 
@@ -2931,6 +2943,15 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 					      &vops, i == args->num_binds - 1);
 		if (err)
 			goto unwind_ops;
+
+#ifdef TEST_VM_OPS_ERROR
+		if (flags & FORCE_OP_ERROR) {
+			vops.inject_error = true;
+			vm->xe->vm_inject_error_position =
+				(vm->xe->vm_inject_error_position + 1) %
+				FORCE_OP_ERROR_COUNT;
+		}
+#endif
 	}
 
 	/* Nothing to do */
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 7e639ed5ba4a..b7094fc69053 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -23,6 +23,16 @@ struct xe_sync_entry;
 struct xe_vm;
 struct xe_vm_pgtable_update_op;
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+#define TEST_VM_OPS_ERROR
+#define FORCE_OP_ERROR	BIT(31)
+
+#define FORCE_OP_ERROR_LOCK	0
+#define FORCE_OP_ERROR_PREPARE	1
+#define FORCE_OP_ERROR_RUN	2
+#define FORCE_OP_ERROR_COUNT	3
+#endif
+
 #define XE_VMA_READ_ONLY	DRM_GPUVA_USERBITS
 #define XE_VMA_DESTROYED	(DRM_GPUVA_USERBITS << 1)
 #define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
@@ -201,6 +211,10 @@ struct xe_vma_ops {
 	u32 num_syncs;
 	/** @pt_update_ops: page table update operations */
 	struct xe_vm_pgtable_update_ops pt_update_ops[XE_MAX_TILES_PER_DEVICE];
+#ifdef TEST_VM_OPS_ERROR
+	/** @inject_error: inject error to test error handling */
+	bool inject_error;
+#endif
 };
 
 struct xe_vm {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* ✗ CI.Patch_applied: failure for Refactor VM bind code (rev3)
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (21 preceding siblings ...)
  2024-02-06 23:37 ` [PATCH v3 22/22] drm/xe: Add VM bind IOCTL error injection Matthew Brost
@ 2024-02-07  0:12 ` Patchwork
  2024-02-07 22:11 ` ✓ CI.Patch_applied: success for Refactor VM bind code (rev4) Patchwork
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2024-02-07  0:12 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Refactor VM bind code (rev3)
URL   : https://patchwork.freedesktop.org/series/125608/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 8a42fafc7 drm-tip: 2024y-02m-06d-13h-49m-55s UTC integration manifest
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_vm.c:2682
error: drivers/gpu/drm/xe/xe_vm.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/xe: Lock all gpuva ops during VM bind IOCTL
Patch failed at 0001 drm/xe: Lock all gpuva ops during VM bind IOCTL
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✓ CI.Patch_applied: success for Refactor VM bind code (rev4)
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (22 preceding siblings ...)
  2024-02-07  0:12 ` ✗ CI.Patch_applied: failure for Refactor VM bind code (rev3) Patchwork
@ 2024-02-07 22:11 ` Patchwork
  2024-02-07 22:12 ` ✗ CI.checkpatch: warning " Patchwork
  2024-02-07 22:13 ` ✗ CI.KUnit: failure " Patchwork
  25 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2024-02-07 22:11 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Refactor VM bind code (rev4)
URL   : https://patchwork.freedesktop.org/series/125608/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: eb1e13ab1 drm-tip: 2024y-02m-07d-18h-19m-40s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: Lock all gpuva ops during VM bind IOCTL
Applying: drm/xe: Add ops_execute function which returns a fence
Applying: drm/xe: Move migrate to prefetch to op_lock funtion
Applying: drm/xe: Add struct xe_vma_ops abstraction
Applying: drm/xe: Update xe_vm_rebind to use dummy VMA operations
Applying: drm/xe: Simplify VM bind IOCTL error handling and cleanup
Applying: drm/xe: Update pagefaults to use dummy VMA operations
Applying: drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue
Applying: drm/xe: Add vm_bind_ioctl_ops_install_fences helper
Applying: drm/xe: Move setting last fence to vm_bind_ioctl_ops_install_fences
Applying: drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this
Applying: drm/xe: Add some members to xe_vma_ops
Applying: drm/xe: Add xe_vm_pgtable_update_op to xe_vma_ops
Applying: drm/xe: Convert multiple bind ops into single job
Applying: drm/xe: Remove old functions defs in xe_pt.h
Applying: drm/xe: Update PT layer with better error handling
Applying: drm/xe: Update VM trace events
Applying: drm/xe: Update clear / populate arguments
Applying: drm/xe: Add __xe_migrate_update_pgtables_cpu helper
Applying: drm/xe: CPU binds for jobs
Applying: drm/xe: Don't use migrate exec queue for page fault binds
Applying: drm/xe: Add VM bind IOCTL error injection



^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✗ CI.checkpatch: warning for Refactor VM bind code (rev4)
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (23 preceding siblings ...)
  2024-02-07 22:11 ` ✓ CI.Patch_applied: success for Refactor VM bind code (rev4) Patchwork
@ 2024-02-07 22:12 ` Patchwork
  2024-02-07 22:13 ` ✗ CI.KUnit: failure " Patchwork
  25 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2024-02-07 22:12 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Refactor VM bind code (rev4)
URL   : https://patchwork.freedesktop.org/series/125608/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
35591fb8b4d5305b37ce31483f85ac0956eaa536
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 819cd1163cc0ed5bd7b8f428abf8e64ee023e3c8
Author: Matthew Brost <matthew.brost@intel.com>
Date:   Tue Feb 6 15:37:29 2024 -0800

    drm/xe: Add VM bind IOCTL error injection
    
    Add VM bind IOCTL error injection, which steals the MSB of the bind
    flags field; when set, errors are injected at various points in the
    VM bind IOCTL. Intended to validate error paths. Enabled by
    CONFIG_DRM_XE_DEBUG.
    
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch eb1e13ab1374d4785d5f2b2584d88e6e05311229 drm-intel
8ba74fe16 drm/xe: Lock all gpuva ops during VM bind IOCTL
87653de6b drm/xe: Add ops_execute function which returns a fence
883439553 drm/xe: Move migrate to prefetch to op_lock funtion
-:4: WARNING:TYPO_SPELLING: 'funtion' may be misspelled - perhaps 'function'?
#4: 
Subject: [PATCH] drm/xe: Move migrate to prefetch to op_lock funtion
                                                             ^^^^^^^

total: 0 errors, 1 warnings, 0 checks, 49 lines checked
456930f69 drm/xe: Add struct xe_vma_ops abstraction
6200a4c05 drm/xe: Update xe_vm_rebind to use dummy VMA operations
5080c77b5 drm/xe: Simplify VM bind IOCTL error handling and cleanup
76fbc63eb drm/xe: Update pagefaults to use dummy VMA operations
dacb99ec3 drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue
4d1c00496 drm/xe: Add vm_bind_ioctl_ops_install_fences helper
-:9: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Matthew Brost <matthew.brost@intel.com'
#9: 
Signed-off-by: Matthew Brost <matthew.brost@intel.com

-:134: ERROR:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Matthew Brost <matthew.brost@intel.com>'

total: 2 errors, 0 warnings, 0 checks, 111 lines checked
af253508d drm/xe: Move setting last fence to vm_bind_ioctl_ops_install_fences
4f783081a drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this
-:107: CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
#107: FILE: drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c:306:
+
+}

total: 0 errors, 0 warnings, 1 checks, 186 lines checked
75c69c5e8 drm/xe: Add some members to xe_vma_ops
645be5ccb drm/xe: Add xe_vm_pgtable_update_op to xe_vma_ops
06c167151 drm/xe: Convert multiple bind ops into single job
-:1501: WARNING:TYPO_SPELLING: 'commiting' may be misspelled - perhaps 'committing'?
#1501: FILE: drivers/gpu/drm/xe/xe_pt.c:1769:
+ * Run PT update operations which includes commiting internal PT state changes,
                                            ^^^^^^^^^

-:1684: WARNING:TYPO_SPELLING: 'commiting' may be misspelled - perhaps 'committing'?
#1684: FILE: drivers/gpu/drm/xe/xe_pt.c:1863:
+ * Finish PT update operations by commiting to destroy page table memory
                                   ^^^^^^^^^

total: 0 errors, 2 warnings, 0 checks, 2435 lines checked
a3d3d269d drm/xe: Remove old functions defs in xe_pt.h
f2b277f1e drm/xe: Update PT layer with better error handling
5b5bab2d6 drm/xe: Update VM trace events
-:61: ERROR:SPACING: space prohibited after that open parenthesis '('
#61: FILE: drivers/gpu/drm/xe/xe_vm.c:2414:
+		trace_xe_vma_unbind( gpuva_to_vma(op->base.unmap.va));

total: 1 errors, 0 warnings, 0 checks, 88 lines checked
79ef9178a drm/xe: Update clear / populate arguments
7d9dc94ff drm/xe: Add __xe_migrate_update_pgtables_cpu helper
0e0adaf83 drm/xe: CPU binds for jobs
-:672: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#672: FILE: drivers/gpu/drm/xe/xe_sched_job.c:38:
+				  max_t(size_t,
 				  sizeof(u64) *

-:766: ERROR:FLEXIBLE_ARRAY: Use C99 flexible arrays - see https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays
#766: FILE: drivers/gpu/drm/xe/xe_sched_job_types.h:70:
+		struct pt_update_args pt_update[0];
+	};

total: 1 errors, 0 warnings, 1 checks, 838 lines checked
63fa61f34 drm/xe: Don't use migrate exec queue for page fault binds
819cd1163 drm/xe: Add VM bind IOCTL error injection
-:51: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (8, 15)
#51: FILE: drivers/gpu/drm/xe/xe_pt.c:1752:
+	if (vops->inject_error &&
[...]
+	       return -ENOSPC;

-:53: WARNING:TABSTOP: Statements should start on a tabstop
#53: FILE: drivers/gpu/drm/xe/xe_pt.c:1754:
+	       return -ENOSPC;

-:64: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (8, 15)
#64: FILE: drivers/gpu/drm/xe/xe_pt.c:1886:
+	if (vops->inject_error &&
[...]
+	       return ERR_PTR(-ENOSPC);

-:66: WARNING:TABSTOP: Statements should start on a tabstop
#66: FILE: drivers/gpu/drm/xe/xe_pt.c:1888:
+	       return ERR_PTR(-ENOSPC);

-:81: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (8, 15)
#81: FILE: drivers/gpu/drm/xe/xe_vm.c:2398:
+	if (vops->inject_error &&
[...]
+	       return -ENOSPC;

-:83: WARNING:TABSTOP: Statements should start on a tabstop
#83: FILE: drivers/gpu/drm/xe/xe_vm.c:2400:
+	       return -ENOSPC;

total: 0 errors, 6 warnings, 0 checks, 116 lines checked



^ permalink raw reply	[flat|nested] 28+ messages in thread

* ✗ CI.KUnit: failure for Refactor VM bind code (rev4)
  2024-02-06 23:37 [PATCH v3 00/22] Refactor VM bind code Matthew Brost
                   ` (24 preceding siblings ...)
  2024-02-07 22:12 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-02-07 22:13 ` Patchwork
  25 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2024-02-07 22:13 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Refactor VM bind code (rev4)
URL   : https://patchwork.freedesktop.org/series/125608/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../arch/x86/um/user-offsets.c:17:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
   17 | void foo(void)
      |      ^~~
In file included from ../arch/um/kernel/asm-offsets.c:1:
../arch/x86/um/shared/sysdep/kernel-offsets.h:9:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
    9 | void foo(void)
      |      ^~~
../arch/x86/um/fault.c:18:5: warning: no previous prototype for ‘arch_fixup’ [-Wmissing-prototypes]
   18 | int arch_fixup(unsigned long address, struct uml_pt_regs *regs)
      |     ^~~~~~~~~~
../arch/x86/um/bugs_64.c:9:6: warning: no previous prototype for ‘arch_check_bugs’ [-Wmissing-prototypes]
    9 | void arch_check_bugs(void)
      |      ^~~~~~~~~~~~~~~
../arch/x86/um/bugs_64.c:13:6: warning: no previous prototype for ‘arch_examine_signal’ [-Wmissing-prototypes]
   13 | void arch_examine_signal(int sig, struct uml_pt_regs *regs)
      |      ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/os-Linux/mcontext.c:7:6: warning: no previous prototype for ‘get_regs_from_mc’ [-Wmissing-prototypes]
    7 | void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
      |      ^~~~~~~~~~~~~~~~
../arch/x86/um/os-Linux/registers.c:146:15: warning: no previous prototype for ‘get_thread_reg’ [-Wmissing-prototypes]
  146 | unsigned long get_thread_reg(int reg, jmp_buf *buf)
      |               ^~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:16:5: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
   16 | int __vdso_clock_gettime(clockid_t clock, struct __kernel_old_timespec *ts)
      |     ^~~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:30:5: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
   30 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:44:21: warning: no previous prototype for ‘__vdso_time’ [-Wmissing-prototypes]
   44 | __kernel_old_time_t __vdso_time(__kernel_old_time_t *t)
      |                     ^~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:57:1: warning: no previous prototype for ‘__vdso_getcpu’ [-Wmissing-prototypes]
   57 | __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
      | ^~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:107:6: warning: no previous prototype for ‘wait_stub_done’ [-Wmissing-prototypes]
  107 | void wait_stub_done(int pid)
      |      ^~~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:683:6: warning: no previous prototype for ‘__switch_mm’ [-Wmissing-prototypes]
  683 | void __switch_mm(struct mm_id *mm_idp)
      |      ^~~~~~~~~~~
../arch/um/kernel/skas/process.c:36:12: warning: no previous prototype for ‘start_uml’ [-Wmissing-prototypes]
   36 | int __init start_uml(void)
      |            ^~~~~~~~~
../arch/x86/um/ptrace_64.c:111:5: warning: no previous prototype for ‘poke_user’ [-Wmissing-prototypes]
  111 | int poke_user(struct task_struct *child, long addr, long data)
      |     ^~~~~~~~~
../arch/x86/um/ptrace_64.c:171:5: warning: no previous prototype for ‘peek_user’ [-Wmissing-prototypes]
  171 | int peek_user(struct task_struct *child, long addr, long data)
      |     ^~~~~~~~~
../arch/x86/um/signal.c:560:6: warning: no previous prototype for ‘sys_rt_sigreturn’ [-Wmissing-prototypes]
  560 | long sys_rt_sigreturn(void)
      |      ^~~~~~~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:17:5: warning: no previous prototype for ‘init_new_context’ [-Wmissing-prototypes]
   17 | int init_new_context(struct task_struct *task, struct mm_struct *mm)
      |     ^~~~~~~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:60:6: warning: no previous prototype for ‘destroy_context’ [-Wmissing-prototypes]
   60 | void destroy_context(struct mm_struct *mm)
      |      ^~~~~~~~~~~~~~~
../arch/um/os-Linux/main.c:187:7: warning: no previous prototype for ‘__wrap_malloc’ [-Wmissing-prototypes]
  187 | void *__wrap_malloc(int size)
      |       ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:208:7: warning: no previous prototype for ‘__wrap_calloc’ [-Wmissing-prototypes]
  208 | void *__wrap_calloc(int n, int size)
      |       ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:222:6: warning: no previous prototype for ‘__wrap_free’ [-Wmissing-prototypes]
  222 | void __wrap_free(void *ptr)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/mem.c:28:6: warning: no previous prototype for ‘kasan_map_memory’ [-Wmissing-prototypes]
   28 | void kasan_map_memory(void *start, size_t len)
      |      ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/mem.c:212:13: warning: no previous prototype for ‘check_tmpexec’ [-Wmissing-prototypes]
  212 | void __init check_tmpexec(void)
      |             ^~~~~~~~~~~~~
../arch/um/os-Linux/signal.c:75:6: warning: no previous prototype for ‘sig_handler’ [-Wmissing-prototypes]
   75 | void sig_handler(int sig, struct siginfo *si, mcontext_t *mc)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/signal.c:111:6: warning: no previous prototype for ‘timer_alarm_handler’ [-Wmissing-prototypes]
  111 | void timer_alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
      |      ^~~~~~~~~~~~~~~~~~~
../arch/um/os-Linux/start_up.c:301:12: warning: no previous prototype for ‘parse_iomem’ [-Wmissing-prototypes]
  301 | int __init parse_iomem(char *str, int *add)
      |            ^~~~~~~~~~~
../arch/x86/um/syscalls_64.c:48:6: warning: no previous prototype for ‘arch_switch_to’ [-Wmissing-prototypes]
   48 | void arch_switch_to(struct task_struct *to)
      |      ^~~~~~~~~~~~~~
../arch/um/kernel/mem.c:202:8: warning: no previous prototype for ‘pgd_alloc’ [-Wmissing-prototypes]
  202 | pgd_t *pgd_alloc(struct mm_struct *mm)
      |        ^~~~~~~~~
../arch/um/kernel/mem.c:215:7: warning: no previous prototype for ‘uml_kmalloc’ [-Wmissing-prototypes]
  215 | void *uml_kmalloc(int size, int flags)
      |       ^~~~~~~~~~~
../arch/um/kernel/process.c:51:5: warning: no previous prototype for ‘pid_to_processor_id’ [-Wmissing-prototypes]
   51 | int pid_to_processor_id(int pid)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:87:7: warning: no previous prototype for ‘__switch_to’ [-Wmissing-prototypes]
   87 | void *__switch_to(struct task_struct *from, struct task_struct *to)
      |       ^~~~~~~~~~~
../arch/um/kernel/process.c:140:6: warning: no previous prototype for ‘fork_handler’ [-Wmissing-prototypes]
  140 | void fork_handler(void)
      |      ^~~~~~~~~~~~
../arch/um/kernel/process.c:217:6: warning: no previous prototype for ‘arch_cpu_idle’ [-Wmissing-prototypes]
  217 | void arch_cpu_idle(void)
      |      ^~~~~~~~~~~~~
../arch/um/kernel/process.c:253:5: warning: no previous prototype for ‘copy_to_user_proc’ [-Wmissing-prototypes]
  253 | int copy_to_user_proc(void __user *to, void *from, int size)
      |     ^~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:263:5: warning: no previous prototype for ‘clear_user_proc’ [-Wmissing-prototypes]
  263 | int clear_user_proc(void __user *buf, int size)
      |     ^~~~~~~~~~~~~~~
../arch/um/kernel/process.c:271:6: warning: no previous prototype for ‘set_using_sysemu’ [-Wmissing-prototypes]
  271 | void set_using_sysemu(int value)
      |      ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:278:5: warning: no previous prototype for ‘get_using_sysemu’ [-Wmissing-prototypes]
  278 | int get_using_sysemu(void)
      |     ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:316:12: warning: no previous prototype for ‘make_proc_sysemu’ [-Wmissing-prototypes]
  316 | int __init make_proc_sysemu(void)
      |            ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:348:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
  348 | unsigned long arch_align_stack(unsigned long sp)
      |               ^~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:45:6: warning: no previous prototype for ‘machine_restart’ [-Wmissing-prototypes]
   45 | void machine_restart(char * __unused)
      |      ^~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:51:6: warning: no previous prototype for ‘machine_power_off’ [-Wmissing-prototypes]
   51 | void machine_power_off(void)
      |      ^~~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:57:6: warning: no previous prototype for ‘machine_halt’ [-Wmissing-prototypes]
   57 | void machine_halt(void)
      |      ^~~~~~~~~~~~
../arch/um/kernel/tlb.c:579:6: warning: no previous prototype for ‘flush_tlb_mm_range’ [-Wmissing-prototypes]
  579 | void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
      |      ^~~~~~~~~~~~~~~~~~
../arch/um/kernel/tlb.c:594:6: warning: no previous prototype for ‘force_flush_all’ [-Wmissing-prototypes]
  594 | void force_flush_all(void)
      |      ^~~~~~~~~~~~~~~
../arch/um/kernel/kmsg_dump.c:60:12: warning: no previous prototype for ‘kmsg_dumper_stdout_init’ [-Wmissing-prototypes]
   60 | int __init kmsg_dumper_stdout_init(void)
      |            ^~~~~~~~~~~~~~~~~~~~~~~
../arch/um/kernel/um_arch.c:408:19: warning: no previous prototype for ‘read_initrd’ [-Wmissing-prototypes]
  408 | int __init __weak read_initrd(void)
      |                   ^~~~~~~~~~~
../arch/um/kernel/um_arch.c:461:7: warning: no previous prototype for ‘text_poke’ [-Wmissing-prototypes]
  461 | void *text_poke(void *addr, const void *opcode, size_t len)
      |       ^~~~~~~~~
../arch/um/kernel/um_arch.c:473:6: warning: no previous prototype for ‘text_poke_sync’ [-Wmissing-prototypes]
  473 | void text_poke_sync(void)
      |      ^~~~~~~~~~~~~~
In file included from ../drivers/gpu/drm/xe/xe_migrate.c:1303:
../drivers/gpu/drm/xe/tests/xe_migrate.c:91:14: error: initialization of ‘void (*)(struct xe_tile *, struct iosys_map *, void *, u32,  u32,  const struct xe_vm_pgtable_update *)’ {aka ‘void (*)(struct xe_tile *, struct iosys_map *, void *, unsigned int,  unsigned int,  const struct xe_vm_pgtable_update *)’} from incompatible pointer type ‘void (*)(struct xe_migrate_pt_update *, struct xe_tile *, struct iosys_map *, void *, u32,  u32,  const struct xe_vm_pgtable_update *)’ {aka ‘void (*)(struct xe_migrate_pt_update *, struct xe_tile *, struct iosys_map *, void *, unsigned int,  unsigned int,  const struct xe_vm_pgtable_update *)’} [-Werror=incompatible-pointer-types]
   91 |  .populate = sanity_populate_cb,
      |              ^~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/tests/xe_migrate.c:91:14: note: (near initialization for ‘sanity_ops.populate’)
../drivers/gpu/drm/xe/tests/xe_migrate.c: In function ‘test_pt_update’:
../drivers/gpu/drm/xe/tests/xe_migrate.c:239:44: error: passing argument 2 of ‘xe_migrate_update_pgtables’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  239 |  fence = xe_migrate_update_pgtables(m, m->q->vm, NULL, m->q, &update, 1,
      |                                        ~~~~^~~~
      |                                            |
      |                                            struct xe_vm *
../drivers/gpu/drm/xe/xe_migrate.c:1274:36: note: expected ‘struct xe_migrate_pt_update *’ but argument is of type ‘struct xe_vm *’
 1274 |       struct xe_migrate_pt_update *pt_update)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
In file included from ../drivers/gpu/drm/xe/xe_migrate.c:1303:
../drivers/gpu/drm/xe/tests/xe_migrate.c:239:10: error: too many arguments to function ‘xe_migrate_update_pgtables’
  239 |  fence = xe_migrate_update_pgtables(m, m->q->vm, NULL, m->q, &update, 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_migrate.c:1273:1: note: declared here
 1273 | xe_migrate_update_pgtables(struct xe_migrate *m,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:243: drivers/gpu/drm/xe/xe_migrate.o] Error 1
make[7]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
make[6]: *** [../scripts/Makefile.build:481: drivers/gpu/drm/xe] Error 2
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:481: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:481: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:481: drivers] Error 2
make[2]: *** [/kernel/Makefile:1921: .] Error 2
make[1]: *** [/kernel/Makefile:240: __sub-make] Error 2
make: *** [Makefile:240: __sub-make] Error 2

[22:12:35] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[22:12:39] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 20/22] drm/xe: CPU binds for jobs
  2024-02-06 23:37 ` [PATCH v3 20/22] drm/xe: CPU binds for jobs Matthew Brost
@ 2024-03-28 11:28   ` Thomas Hellström
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Hellström @ 2024-03-28 11:28 UTC (permalink / raw)
  To: Matthew Brost, intel-xe

Hi, Matthew,

On Tue, 2024-02-06 at 15:37 -0800, Matthew Brost wrote:
> No reason to use the GPU for binds. In run_job use the CPU to do
> binds once the bind job dependencies are resolved.

I think there are two good reasons to keep the GPU job functionality.

One, which we've discussed previously, is small-aperture situations.
I'm not sure we'll actually see much of this moving forward, but in
that case I don't think we'd want to fill whatever small aperture we
have with page-table bos.

The other is that if we rely on cpu page-table binding only, we're
forced to accept the latency bubbles between finishing a migration job,
signaling the fence, and scheduling the bind job to run. I'm thinking
of a situation where you allocate and clear a number of small bos. The
ideal situation there would be clearing and binding on the same exec
queue; you could then pipeline them all in the exec queue with little
or no latency between the jobs.
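
A rough sketch of the difference (illustrative only; the helpers here
are made up):

  /* GPU binds: clear + bind pipelined on one exec queue; the bind
   * batch starts as soon as the clear retires, with no CPU wakeup
   * in between. */
  submit(q, clear_job);
  submit(q, gpu_bind_job);

  /* CPU binds: the clear fence must signal and the scheduler worker
   * must run before the bind's run_job() touches the page tables --
   * a latency bubble per bo. */
  submit(q, clear_job);
  submit(bind_q, cpu_bind_job);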

In any case, I understand that this simplifies the refactoring, but I
don't think we should *rely* on doing cpu binding for all future
use-cases, and we need a better commit message discussing why, given
the above, we choose to sacrifice gpu binding.

/Thomas


> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c              |   7 +-
>  drivers/gpu/drm/xe/xe_bo.h              |   4 +-
>  drivers/gpu/drm/xe/xe_bo_types.h        |   2 -
>  drivers/gpu/drm/xe/xe_device.c          |  35 +++++
>  drivers/gpu/drm/xe/xe_device.h          |   2 +
>  drivers/gpu/drm/xe/xe_device_types.h    |   4 +
>  drivers/gpu/drm/xe/xe_gt_pagefault.c    |   5 +-
>  drivers/gpu/drm/xe/xe_guc_submit.c      |  47 +++++-
>  drivers/gpu/drm/xe/xe_migrate.c         | 198 +++-------------------
> --
>  drivers/gpu/drm/xe/xe_migrate.h         |   6 +
>  drivers/gpu/drm/xe/xe_pt.c              |  36 +++--
>  drivers/gpu/drm/xe/xe_pt.h              |   1 +
>  drivers/gpu/drm/xe/xe_pt_types.h        |   5 +
>  drivers/gpu/drm/xe/xe_sched_job.c       |  24 ++-
>  drivers/gpu/drm/xe/xe_sched_job_types.h |  31 +++-
>  drivers/gpu/drm/xe/xe_vm.c              | 104 ++++++-------
>  drivers/gpu/drm/xe/xe_vm.h              |   5 +-
>  17 files changed, 247 insertions(+), 269 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 686d716c5581..3f327c123bbc 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -2237,16 +2237,16 @@ void __xe_bo_release_dummy(struct kref *kref)
>  
>  /**
>   * xe_bo_put_commit() - Put bos whose put was deferred by
> xe_bo_put_deferred().
> + * @xe: Xe device
>   * @deferred: The lockless list used for the call to
> xe_bo_put_deferred().
>   *
>   * Puts all bos whose put was deferred by xe_bo_put_deferred().
>   * The @deferred list can be either an onstack local list or a
> global
>   * shared list used by a workqueue.
>   */
> -void xe_bo_put_commit(struct llist_head *deferred)
> +void xe_bo_put_commit(struct xe_device *xe, struct llist_head
> *deferred)
>  {
>  	struct llist_node *freed;
> -	struct xe_bo *bo, *next;
>  
>  	if (!deferred)
>  		return;
> @@ -2255,8 +2255,7 @@ void xe_bo_put_commit(struct llist_head
> *deferred)
>  	if (!freed)
>  		return;
>  
> -	llist_for_each_entry_safe(bo, next, freed, freed)
> -		drm_gem_object_free(&bo->ttm.base.refcount);
> +	xe_device_put_deferred(xe, freed);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index db4b2db6b073..2a4bfa4fe6c4 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -10,7 +10,6 @@
>  
>  #include "xe_bo_types.h"
>  #include "xe_macros.h"
> -#include "xe_vm_types.h"
>  #include "xe_vm.h"
>  
>  /**
> @@ -307,10 +306,11 @@ xe_bo_put_deferred(struct xe_bo *bo, struct
> llist_head *deferred)
>  	if (!kref_put(&bo->ttm.base.refcount,
> __xe_bo_release_dummy))
>  		return false;
>  
> +	xe_vm_get(bo->vm);
>  	return llist_add(&bo->freed, deferred);
>  }
>  
> -void xe_bo_put_commit(struct llist_head *deferred);
> +void xe_bo_put_commit(struct xe_device *xe, struct llist_head
> *deferred);
>  
>  struct sg_table *xe_bo_sg(struct xe_bo *bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h
> b/drivers/gpu/drm/xe/xe_bo_types.h
> index 1ff5b5a68adc..14ef13b7b421 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -77,8 +77,6 @@ struct xe_bo {
>  	} props;
>  	/** @freed: List node for delayed put. */
>  	struct llist_node freed;
> -	/** @update_index: Update index if PT BO */
> -	int update_index;
>  	/** @created: Whether the bo has passed initial creation */
>  	bool created;
>  
> diff --git a/drivers/gpu/drm/xe/xe_device.c
> b/drivers/gpu/drm/xe/xe_device.c
> index 5b84d7305520..2998c679f3bd 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -198,6 +198,9 @@ static void xe_device_destroy(struct drm_device
> *dev, void *dummy)
>  {
>  	struct xe_device *xe = to_xe_device(dev);
>  
> +	flush_work(&xe->mem.deferred_work);
> +	xe_assert(xe, !llist_del_all(&xe->mem.deferred));
> +
>  	if (xe->ordered_wq)
>  		destroy_workqueue(xe->ordered_wq);
>  
> @@ -207,6 +210,35 @@ static void xe_device_destroy(struct drm_device
> *dev, void *dummy)
>  	ttm_device_fini(&xe->ttm);
>  }
>  
> +void xe_device_put_deferred(struct xe_device *xe, struct llist_node
> *deferred)
> +{
> +	struct xe_bo *bo, *next;
> +
> +	llist_for_each_entry_safe(bo, next, deferred, freed) {
> +		init_llist_node(&bo->freed);
> +		llist_add(&bo->freed, &xe->mem.deferred);
> +	}
> +	queue_work(system_wq, &xe->mem.deferred_work);
> +}
> +
> +static void deferred_work(struct work_struct *w)
> +{
> +	struct xe_device *xe = container_of(w, struct xe_device,
> +					    mem.deferred_work);
> +	struct llist_node *freed = llist_del_all(&xe->mem.deferred);
> +	struct xe_bo *bo, *next;
> +
> +	if (!freed)
> +		return;
> +
> +	llist_for_each_entry_safe(bo, next, freed, freed) {
> +		struct xe_vm *vm = bo->vm;
> +
> +		drm_gem_object_free(&bo->ttm.base.refcount);
> +		xe_vm_put(vm);
> +	}
> +}
> +
>  struct xe_device *xe_device_create(struct pci_dev *pdev,
>  				   const struct pci_device_id *ent)
>  {
> @@ -274,6 +306,9 @@ struct xe_device *xe_device_create(struct pci_dev
> *pdev,
>  		goto err;
>  	}
>  
> +	init_llist_head(&xe->mem.deferred);
> +	INIT_WORK(&xe->mem.deferred_work, deferred_work);
> +
>  	err = xe_display_create(xe);
>  	if (WARN_ON(err))
>  		goto err;
> diff --git a/drivers/gpu/drm/xe/xe_device.h
> b/drivers/gpu/drm/xe/xe_device.h
> index 462f59e902b1..8991d6a18368 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -180,4 +180,6 @@ void xe_device_snapshot_print(struct xe_device
> *xe, struct drm_printer *p);
>  u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
>  u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64
> address);
>  
> +void xe_device_put_deferred(struct xe_device *xe, struct llist_node
> *deferred);
> +
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index eb2b806a1d23..3018f1d79177 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -315,6 +315,10 @@ struct xe_device {
>  		struct xe_mem_region vram;
>  		/** @mem.sys_mgr: system TTM manager */
>  		struct ttm_resource_manager sys_mgr;
> +		/** @deferred: deferred list to destroy PT entries
> */
> +		struct llist_head deferred;
> +		/** @deferred_work: worker to destroy PT entries */
> +		struct work_struct deferred_work;
>  	} mem;
>  
>  	/** @sriov: device level virtualization data */
> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> index 27dbbdc9127e..2beba426517e 100644
> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> @@ -206,10 +206,13 @@ static int handle_pagefault(struct xe_gt *gt,
> struct pagefault *pf)
>  
>  	/* Bind VMA only to the GT that has faulted */
>  	trace_xe_vma_pf_bind(vma);
> -	xe_vm_populate_dummy_rebind(vm, vma, BIT(tile->id));
> +	ret = xe_vm_populate_dummy_rebind(vm, vma, BIT(tile->id));
> +	if (ret)
> +		goto unlock_dma_resv;
>  	vm->dummy_ops.vops.pt_update_ops[tile->id].q =
>  		xe_tile_migrate_exec_queue(tile);
>  	fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
> +	xe_vma_ops_free(&vm->dummy_ops.vops);
>  	if (IS_ERR(fence)) {
>  		ret = PTR_ERR(fence);
>  		goto unlock_dma_resv;
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 4744668ef60a..bcf0563b76ea 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -17,6 +17,7 @@
>  #include "abi/guc_klvs_abi.h"
>  #include "regs/xe_lrc_layout.h"
>  #include "xe_assert.h"
> +#include "xe_bo.h"
>  #include "xe_devcoredump.h"
>  #include "xe_device.h"
>  #include "xe_exec_queue.h"
> @@ -33,7 +34,9 @@
>  #include "xe_lrc.h"
>  #include "xe_macros.h"
>  #include "xe_map.h"
> +#include "xe_migrate.h"
>  #include "xe_mocs.h"
> +#include "xe_pt.h"
>  #include "xe_ring_ops_types.h"
>  #include "xe_sched_job.h"
>  #include "xe_trace.h"
> @@ -719,6 +722,29 @@ static void submit_exec_queue(struct
> xe_exec_queue *q)
>  	}
>  }
>  
> +static bool is_pt_job(struct xe_sched_job *job)
> +{
> +	return test_bit(JOB_FLAG_PT, &job->fence->flags);
> +}
> +
> +static void cleanup_pt_job(struct xe_device *xe, struct xe_sched_job
> *job)
> +{
> +	xe_pt_update_ops_free(job->pt_update[0].pt_op,
> +			      job->pt_update[0].num_ops);
> +	xe_bo_put_commit(xe, &job->pt_update[0].deferred);
> +	kfree(job->pt_update[0].pt_op);
> +}
> +
> +static void run_pt_job(struct xe_device *xe, struct xe_sched_job
> *job)
> +{
> +	__xe_migrate_update_pgtables_cpu(job->pt_update[0].vm,
> +					 job->pt_update[0].tile,
> +					 job->pt_update[0].ops,
> +					 job->pt_update[0].pt_op,
> +					 job->pt_update[0].num_ops);
> +	cleanup_pt_job(xe, job);
> +}
> +
>  static struct dma_fence *
>  guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>  {
> @@ -734,15 +760,22 @@ guc_exec_queue_run_job(struct drm_sched_job
> *drm_job)
>  	trace_xe_sched_job_run(job);
>  
>  	if (!exec_queue_killed_or_banned(q) &&
> !xe_sched_job_is_error(job)) {
> -		if (!exec_queue_registered(q))
> -			register_engine(q);
> -		if (!lr)	/* LR jobs are emitted in the exec
> IOCTL */
> -			q->ring_ops->emit_job(job);
> -		submit_exec_queue(q);
> +		if (is_pt_job(job)) {
> +			run_pt_job(xe, job);
> +		} else {
> +			if (!exec_queue_registered(q))
> +				register_engine(q);
> +			if (!lr)	/* LR jobs are emitted in
> the exec IOCTL */
> +				q->ring_ops->emit_job(job);
> +			submit_exec_queue(q);
> +		}
> +	} else if (is_pt_job(job)) {
> +		cleanup_pt_job(xe, job);
>  	}
>  
> -	if (lr) {
> -		xe_sched_job_set_error(job, -EOPNOTSUPP);
> +	if (lr || is_pt_job(job)) {
> +		if (lr)
> +			xe_sched_job_set_error(job, -EOPNOTSUPP);
>  		return NULL;
>  	} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence-
> >flags)) {
>  		return job->fence;
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> b/drivers/gpu/drm/xe/xe_migrate.c
> index 780e6eca2c40..7d752e424b3b 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -1104,56 +1104,6 @@ struct dma_fence *xe_migrate_clear(struct
> xe_migrate *m,
>  	return fence;
>  }
>  
> -static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb,
> u64 ppgtt_ofs,
> -			  const struct xe_vm_pgtable_update_op
> *pt_op,
> -			  const struct xe_vm_pgtable_update *update,
> -			  struct xe_migrate_pt_update *pt_update)
> -{
> -	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
> -	struct xe_vm *vm = pt_update->vops->vm;
> -	u32 chunk;
> -	u32 ofs = update->ofs, size = update->qwords;
> -
> -	/*
> -	 * If we have 512 entries (max), we would populate it
> ourselves,
> -	 * and update the PDE above it to the new pointer.
> -	 * The only time this can only happen if we have to update
> the top
> -	 * PDE. This requires a BO that is almost vm->size big.
> -	 *
> -	 * This shouldn't be possible in practice.. might change
> when 16K
> -	 * pages are used. Hence the assert.
> -	 */
> -	xe_tile_assert(tile, update->qwords < MAX_NUM_PTE);
> -	if (!ppgtt_ofs)
> -		ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile),
> -						xe_bo_addr(update-
> >pt_bo, 0,
> -							  
> XE_PAGE_SIZE));
> -
> -	do {
> -		u64 addr = ppgtt_ofs + ofs * 8;
> -
> -		chunk = min(size, MAX_PTE_PER_SDI);
> -
> -		/* Ensure populatefn can do memset64 by aligning bb-
> >cs */
> -		if (!(bb->len & 1))
> -			bb->cs[bb->len++] = MI_NOOP;
> -
> -		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> MI_SDI_NUM_QW(chunk);
> -		bb->cs[bb->len++] = lower_32_bits(addr);
> -		bb->cs[bb->len++] = upper_32_bits(addr);
> -		if (pt_op->bind)
> -			ops->populate(tile, NULL, bb->cs + bb->len,
> -				      ofs, chunk, update);
> -		else
> -			ops->clear(vm, tile, NULL, bb->cs + bb->len,
> -				   ofs, chunk, update);
> -
> -		bb->len += chunk * 2;
> -		ofs += chunk;
> -		size -= chunk;
> -	} while (size);
> -}
> -
>  struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m)
>  {
>  	return xe_vm_get(m->q->vm);
> @@ -1169,11 +1119,10 @@ struct migrate_test_params {
>  	container_of(_priv, struct migrate_test_params, base)
>  #endif
>  
> -static void
> -__xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct xe_tile
> *tile,
> -				 const struct
> xe_migrate_pt_update_ops *ops,
> -				 struct xe_vm_pgtable_update_op
> *pt_op,
> -				 int num_ops)
> +void __xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct
> xe_tile *tile,
> +				      const struct
> xe_migrate_pt_update_ops *ops,
> +				      struct xe_vm_pgtable_update_op
> *pt_op,
> +				      int num_ops)
>  {
>  	u32 j, i;
>  
> @@ -1234,131 +1183,15 @@ __xe_migrate_update_pgtables(struct
> xe_migrate *m,
>  {
>  	const struct xe_migrate_pt_update_ops *ops = pt_update->ops;
>  	struct xe_tile *tile = m->tile;
> -	struct xe_gt *gt = tile->primary_gt;
> -	struct xe_device *xe = tile_to_xe(tile);
>  	struct xe_sched_job *job;
>  	struct dma_fence *fence;
> -	struct drm_suballoc *sa_bo = NULL;
> -	struct xe_bb *bb;
> -	u32 i, j, batch_size = 0, ppgtt_ofs, update_idx, page_ofs =
> 0;
> -	u32 num_updates = 0, current_update = 0;
> -	u64 addr;
> -	int err = 0;
>  	bool is_migrate = pt_update_ops->q == m->q;
> -	bool usm = is_migrate && xe->info.has_usm;
> -
> -	for (i = 0; i < pt_update_ops->num_ops; ++i) {
> -		struct xe_vm_pgtable_update_op *pt_op =
> &pt_update_ops->ops[i];
> -		struct xe_vm_pgtable_update *updates = pt_op-
> >entries;
> -
> -		num_updates += pt_op->num_entries;
> -		for (j = 0; j < pt_op->num_entries; ++j) {
> -			u32 num_cmds =
> DIV_ROUND_UP(updates[j].qwords, 0x1ff);
> -
> -			/* align noop + MI_STORE_DATA_IMM cmd prefix
> */
> -			batch_size += 4 * num_cmds +
> updates[j].qwords * 2;
> -		}
> -	}
> -
> -	/* fixed + PTE entries */
> -	if (IS_DGFX(xe))
> -		batch_size += 2;
> -	else
> -		batch_size += 6 + num_updates * 2;
> -
> -	bb = xe_bb_new(gt, batch_size, usm);
> -	if (IS_ERR(bb))
> -		return ERR_CAST(bb);
> -
> -	/* For sysmem PTE's, need to map them in our hole.. */
> -	if (!IS_DGFX(xe)) {
> -		ppgtt_ofs = NUM_KERNEL_PDE - 1;
> -		if (!is_migrate) {
> -			u32 num_units = DIV_ROUND_UP(num_updates,
> -						    
> NUM_VMUSA_WRITES_PER_UNIT);
> -
> -			sa_bo = drm_suballoc_new(&m->vm_update_sa,
> num_units,
> -						 GFP_KERNEL, true,
> 0);
> -			if (IS_ERR(sa_bo)) {
> -				err = PTR_ERR(sa_bo);
> -				goto err;
> -			}
> -
> -			ppgtt_ofs = NUM_KERNEL_PDE +
> -				(drm_suballoc_soffset(sa_bo) /
> -				 NUM_VMUSA_UNIT_PER_PAGE);
> -			page_ofs = (drm_suballoc_soffset(sa_bo) %
> -				    NUM_VMUSA_UNIT_PER_PAGE) *
> -				VM_SA_UPDATE_UNIT_SIZE;
> -		}
> -
> -		/* Map our PT's to gtt */
> -		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> MI_SDI_NUM_QW(num_updates);
> -		bb->cs[bb->len++] = ppgtt_ofs * XE_PAGE_SIZE +
> page_ofs;
> -		bb->cs[bb->len++] = 0; /* upper_32_bits */
> -
> -		for (i = 0; i < pt_update_ops->num_ops; ++i) {
> -			struct xe_vm_pgtable_update_op *pt_op =
> -				&pt_update_ops->ops[i];
> -			struct xe_vm_pgtable_update *updates =
> pt_op->entries;
> -
> -			for (j = 0; j < pt_op->num_entries; ++j,
> ++current_update) {
> -				struct xe_vm *vm = pt_update->vops-
> >vm;
> -				struct xe_bo *pt_bo =
> updates[j].pt_bo;
> -
> -				xe_tile_assert(tile, pt_bo->size ==
> SZ_4K);
> -
> -				/* Map a PT at most once */
> -				if (pt_bo->update_index < 0)
> -					pt_bo->update_index =
> current_update;
> -
> -				addr = vm->pt_ops-
> >pte_encode_bo(pt_bo, 0,
> -								
> XE_CACHE_WB, 0);
> -				bb->cs[bb->len++] =
> lower_32_bits(addr);
> -				bb->cs[bb->len++] =
> upper_32_bits(addr);
> -			}
> -		}
> -
> -		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
> -		update_idx = bb->len;
> -
> -		addr = xe_migrate_vm_addr(ppgtt_ofs, 0) +
> -			(page_ofs / sizeof(u64)) * XE_PAGE_SIZE;
> -		for (i = 0; i < pt_update_ops->num_ops; ++i) {
> -			struct xe_vm_pgtable_update_op *pt_op =
> -				&pt_update_ops->ops[i];
> -			struct xe_vm_pgtable_update *updates =
> pt_op->entries;
> -
> -			for (j = 0; j < pt_op->num_entries; ++j) {
> -				struct xe_bo *pt_bo =
> updates[j].pt_bo;
> -
> -				write_pgtable(tile, bb, addr +
> -					      pt_bo->update_index *
> XE_PAGE_SIZE,
> -					      pt_op, &updates[j],
> pt_update);
> -			}
> -		}
> -	} else {
> -		/* phys pages, no preamble required */
> -		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
> -		update_idx = bb->len;
> -
> -		for (i = 0; i < pt_update_ops->num_ops; ++i) {
> -			struct xe_vm_pgtable_update_op *pt_op =
> -				&pt_update_ops->ops[i];
> -			struct xe_vm_pgtable_update *updates =
> pt_op->entries;
> -
> -			for (j = 0; j < pt_op->num_entries; ++j)
> -				write_pgtable(tile, bb, 0, pt_op,
> &updates[j],
> -					      pt_update);
> -		}
> -	}
> +	int err;
>  
>  	if (is_migrate)
>  		mutex_lock(&m->job_mutex);
>  
> -	job = xe_bb_create_migration_job(pt_update_ops->q, bb,
> -					 xe_migrate_batch_base(m,
> usm),
> -					 update_idx);
> +	job = xe_sched_job_create(pt_update_ops->q, NULL);
>  	if (IS_ERR(job)) {
>  		err = PTR_ERR(job);
>  		goto err_bb;
> @@ -1370,6 +1203,19 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
>  		if (err)
>  			goto err_job;
>  	}
> +
> +	set_bit(JOB_FLAG_PT, &job->fence->flags);
> +	job->pt_update[0].vm = pt_update->vops->vm;
> +	job->pt_update[0].tile = tile;
> +	job->pt_update[0].ops = ops;
> +	job->pt_update[0].pt_op = pt_update_ops->ops;
> +	job->pt_update[0].num_ops = pt_update_ops->num_ops;
> +	job->pt_update[0].deferred = pt_update_ops->deferred;
> +
> +	/* Submission backend now owns freeing of pt_update_ops->ops
> */
> +	init_llist_head(&pt_update_ops->deferred);
> +	pt_update_ops->skip_free = true;
> +
>  	xe_sched_job_arm(job);
>  	fence = dma_fence_get(&job->drm.s_fence->finished);
>  	xe_sched_job_push(job);
> @@ -1377,9 +1223,6 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
>  	if (is_migrate)
>  		mutex_unlock(&m->job_mutex);
>  
> -	xe_bb_free(bb, fence);
> -	drm_suballoc_free(sa_bo, fence);
> -
>  	return fence;
>  
>  err_job:
> @@ -1387,9 +1230,6 @@ __xe_migrate_update_pgtables(struct xe_migrate
> *m,
>  err_bb:
>  	if (is_migrate)
>  		mutex_unlock(&m->job_mutex);
> -	xe_bb_free(bb, NULL);
> -err:
> -	drm_suballoc_free(sa_bo, NULL);
>  	return ERR_PTR(err);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_migrate.h
> b/drivers/gpu/drm/xe/xe_migrate.h
> index 18f5a8e40b5c..f80c94bb8f4c 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.h
> +++ b/drivers/gpu/drm/xe/xe_migrate.h
> @@ -22,6 +22,7 @@ struct xe_pt;
>  struct xe_tile;
>  struct xe_vm;
>  struct xe_vm_pgtable_update;
> +struct xe_vm_pgtable_update_op;
>  struct xe_vma;
>  
>  /**
> @@ -106,6 +107,11 @@ struct dma_fence *xe_migrate_clear(struct
> xe_migrate *m,
>  
>  struct xe_vm *xe_migrate_get_vm(struct xe_migrate *m);
>  
> +void __xe_migrate_update_pgtables_cpu(struct xe_vm *vm, struct
> xe_tile *tile,
> +				      const struct
> xe_migrate_pt_update_ops *ops,
> +				      struct xe_vm_pgtable_update_op
> *pt_op,
> +				      int num_ops);
> +
>  struct dma_fence *
>  xe_migrate_update_pgtables(struct xe_migrate *m,
>  			   struct xe_migrate_pt_update *pt_update);
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 4758ae8a5459..26a00af6c4a6 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -315,7 +315,7 @@ xe_pt_new_shared(struct xe_walk_update *wupd,
> struct xe_pt *parent,
>  	entry->pt = parent;
>  	entry->flags = 0;
>  	entry->qwords = 0;
> -	entry->pt_bo->update_index = -1;
> +	entry->level = parent->level;
>  
>  	if (alloc_entries) {
>  		entry->pt_entries = kmalloc_array(XE_PDES,
> @@ -1476,7 +1476,7 @@ xe_migrate_clear_pgtable_callback(struct xe_vm
> *vm, struct xe_tile *tile,
>  				  u32 qword_ofs, u32 num_qwords,
>  				  const struct xe_vm_pgtable_update
> *update)
>  {
> -	u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level);
> +	u64 empty = __xe_pt_empty_pte(tile, vm, update->level);
>  	int i;
>  
>  	if (map && map->is_iomem)
> @@ -1890,7 +1890,7 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
>  		goto free_ifence;
>  	}
>  
> -	/* FIXME: Point of no return - VM killed if failure after
> this */
> +	/* Point of no return - VM killed if failure after this */
>  	for (i = 0; i < pt_update_ops->num_ops; ++i) {
>  		struct xe_vm_pgtable_update_op *pt_op =
> &pt_update_ops->ops[i];
>  
> @@ -1953,6 +1953,21 @@ xe_pt_update_ops_run(struct xe_tile *tile,
> struct xe_vma_ops *vops)
>  	return ERR_PTR(err);
>  }
>  
> +/**
> + * xe_pt_update_ops_free() - Free PT update operations
> + * @pt_op: Array of PT update operations
> + * @num_ops: Number of PT update operations
> + *
> + * Free PT update operations
> + */
> +void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op *pt_op,
> u32 num_ops)
> +{
> +	u32 i;
> +
> +	for (i = 0; i < num_ops; ++i, ++pt_op)
> +		xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
> +}
> +
>  /**
>   * xe_pt_update_ops_fini() - Finish PT update operations
>   * @tile: Tile of PT update operations
> @@ -1964,17 +1979,16 @@ void xe_pt_update_ops_fini(struct xe_tile
> *tile, struct xe_vma_ops *vops)
>  {
>  	struct xe_vm_pgtable_update_ops *pt_update_ops =
>  		&vops->pt_update_ops[tile->id];
> -	int i;
>  
>  	lockdep_assert_held(&vops->vm->lock);
>  	xe_vm_assert_held(vops->vm);
>  
> -	for (i = 0; i < pt_update_ops->num_ops; ++i) {
> -		struct xe_vm_pgtable_update_op *pt_op =
> &pt_update_ops->ops[i];
> -
> -		xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
> -	}
> -	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
> +	xe_bo_put_commit(tile_to_xe(tile), &pt_update_ops-
> >deferred);
> +	if (!pt_update_ops->skip_free)
> +		xe_pt_update_ops_free(pt_update_ops->ops,
> +				      pt_update_ops->num_ops);
> +	else
> +		pt_update_ops->ops = NULL;
>  }
>  
>  /**
> @@ -2009,5 +2023,5 @@ void xe_pt_update_ops_abort(struct xe_tile
> *tile, struct xe_vma_ops *vops)
>  					   pt_op->num_entries);
>  	}
>  
> -	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
> +	xe_pt_update_ops_fini(tile, vops);
>  }
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 9ab386431cad..989c9b190fa0 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -40,6 +40,7 @@ struct dma_fence *xe_pt_update_ops_run(struct
> xe_tile *tile,
>  				       struct xe_vma_ops *vops);
>  void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops
> *vops);
>  void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops
> *vops);
> +void xe_pt_update_ops_free(struct xe_vm_pgtable_update_op *pt_op,
> u32 num_ops);
>  
>  bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h
> b/drivers/gpu/drm/xe/xe_pt_types.h
> index 384cc04de719..cfd0d35408a5 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -70,6 +70,9 @@ struct xe_vm_pgtable_update {
>  	/** @pt_entries: Newly added pagetable entries */
>  	struct xe_pt_entry *pt_entries;
>  
> +	/** @level: level of update */
> +	unsigned int level;
> +
>  	/** @flags: Target flags */
>  	u32 flags;
>  };
> @@ -120,6 +123,8 @@ struct xe_vm_pgtable_update_ops {
>  	 * slots are idle.
>  	 */
>  	bool wait_vm_kernel;
> +	/** @skip_free: Free @ops in submission backend rather than
> in IOCTL */
> +	bool skip_free;
>  };
>  
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_sched_job.c
> b/drivers/gpu/drm/xe/xe_sched_job.c
> index 8151ddafb940..5930f03a81ad 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job.c
> +++ b/drivers/gpu/drm/xe/xe_sched_job.c
> @@ -34,8 +34,10 @@ int __init xe_sched_job_module_init(void)
>  	xe_sched_job_parallel_slab =
>  		kmem_cache_create("xe_sched_job_parallel",
>  				  sizeof(struct xe_sched_job) +
> +				  max_t(size_t,
>  				  sizeof(u64) *
> -				  XE_HW_ENGINE_MAX_INSTANCE, 0,
> +				  XE_HW_ENGINE_MAX_INSTANCE,
> +				  sizeof(struct pt_update_args)), 0,
>  				  SLAB_HWCACHE_ALIGN, NULL);
>  	if (!xe_sched_job_parallel_slab) {
>  		kmem_cache_destroy(xe_sched_job_slab);
> @@ -108,7 +110,13 @@ struct xe_sched_job *xe_sched_job_create(struct
> xe_exec_queue *q,
>  	if (err)
>  		goto err_free;
>  
> -	if (!xe_exec_queue_is_parallel(q)) {
> +	if (!batch_addr) {
> +		job->fence =
> dma_fence_allocate_private_stub(ktime_get());
> +		if (!job->fence) {
> +			err = -ENOMEM;
> +			goto err_sched_job;
> +		}
> +	} else if (!xe_exec_queue_is_parallel(q)) {
>  		job->fence = xe_lrc_create_seqno_fence(q->lrc);
>  		if (IS_ERR(job->fence)) {
>  			err = PTR_ERR(job->fence);
> @@ -148,12 +156,14 @@ struct xe_sched_job *xe_sched_job_create(struct
> xe_exec_queue *q,
>  		job->fence = &cf->base;
>  	}
>  
> -	width = q->width;
> -	if (is_migration)
> -		width = 2;
> +	if (batch_addr) {
> +		width = q->width;
> +		if (is_migration)
> +			width = 2;
>  
> -	for (i = 0; i < width; ++i)
> -		job->batch_addr[i] = batch_addr[i];
> +		for (i = 0; i < width; ++i)
> +			job->batch_addr[i] = batch_addr[i];
> +	}
>  
>  	/* All other jobs require a VM to be open which has a ref */
>  	if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL))
> diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
> index b1d83da50a53..623d6dcc8ae5 100644
> --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> @@ -11,6 +11,28 @@
>  #include <drm/gpu_scheduler.h>
>  
>  struct xe_exec_queue;
> +struct xe_migrate_pt_update_ops;
> +struct xe_tile;
> +struct xe_vm;
> +struct xe_vm_pgtable_update_op;
> +
> +/**
> + * struct pt_update_args - PT update arguments
> + */
> +struct pt_update_args {
> +	/** @vm: VM */
> +	struct xe_vm *vm;
> +	/** @tile: Tile */
> +	struct xe_tile *tile;
> +	/** @ops: Migrate PT update ops */
> +	const struct xe_migrate_pt_update_ops *ops;
> +	/** @pt_op: PT update ops */
> +	struct xe_vm_pgtable_update_op *pt_op;
> +	/** @deferred: deferred list to destroy PT entries */
> +	struct llist_head deferred;
> +	/** @num_ops: number of PT update ops */
> +	int num_ops;
> +};
>  
>  /**
>   * struct xe_sched_job - XE schedule job (batch buffer tracking)
> @@ -27,6 +49,7 @@ struct xe_sched_job {
>  	 * can safely reference fence, fence cannot safely reference job.
>  	 */
>  #define JOB_FLAG_SUBMIT		DMA_FENCE_FLAG_USER_BITS
> +#define JOB_FLAG_PT		(DMA_FENCE_FLAG_USER_BITS << 1)
>  	struct dma_fence *fence;
>  	/** @user_fence: write back value when BB is complete */
>  	struct {
> @@ -39,8 +62,12 @@ struct xe_sched_job {
>  	} user_fence;
>  	/** @migrate_flush_flags: Additional flush flags for migration jobs */
>  	u32 migrate_flush_flags;
> -	/** @batch_addr: batch buffer address of job */
> -	u64 batch_addr[];
> +	union {
> +		/** @batch_addr: batch buffer address of job */
> +		u64 batch_addr[0];
> +		/** @pt_update: PT update arguments */
> +		struct pt_update_args pt_update[0];
> +	};
>  };
>  
>  struct xe_sched_job_snapshot {
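Two notes on the union: the [0]-sized arrays are presumably used
because a true flexible array member is not allowed directly in a
union; if it works here, DECLARE_FLEX_ARRAY() from <linux/stddef.h>
would be the more idiomatic spelling. Filling in the PT-update side
would then look roughly like this (sketch, assuming a helper along
these lines shows up in the later patches):

	struct pt_update_args *args = &job->pt_update[0];

	args->vm = vm;
	args->tile = tile;
	args->ops = pt_update_ops;	/* migrate PT update ops */
	args->pt_op = vops->pt_update_ops[tile->id].ops;
	args->num_ops = vops->pt_update_ops[tile->id].num_ops;
	init_llist_head(&args->deferred);
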
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 9d2d1c262445..3b3136045327 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -749,6 +749,45 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
>  		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
>  }
>  
> +static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
> +			    struct xe_exec_queue *q,
> +			    struct xe_sync_entry *syncs, u32 num_syncs)
> +{
> +	memset(vops, 0, sizeof(*vops));
> +	INIT_LIST_HEAD(&vops->list);
> +	vops->vm = vm;
> +	vops->q = q;
> +	vops->syncs = syncs;
> +	vops->num_syncs = num_syncs;
> +}
> +
> +static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
> +{
> +	int i;
> +
> +	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
> +		if (!vops->pt_update_ops[i].num_ops)
> +			continue;
> +
> +		vops->pt_update_ops[i].ops =
> +			kmalloc_array(vops->pt_update_ops[i].num_ops,
> +				      sizeof(*vops->pt_update_ops[i].ops),
> +				      GFP_KERNEL);
> +		if (!vops->pt_update_ops[i].ops)
> +			return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +void xe_vma_ops_free(struct xe_vma_ops *vops)
> +{
> +	int i;
> +
> +	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
> +		kfree(vops->pt_update_ops[i].ops);
> +}
> +
>  /**
>   * xe_vm_populate_dummy_rebind() - Populate dummy rebind VMA ops
>   * @vm: The VM.
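For reference, the lifecycle these helpers give a bind operation, as I
understand it:

	xe_vma_ops_init(&vops, vm, q, syncs, num_syncs);
	/* ... build the ops list, bumping pt_update_ops[i].num_ops ... */
	err = xe_vma_ops_alloc(&vops);	/* per-tile update-op arrays */
	if (err)
		goto free_ops;
	fence = xe_vm_ops_execute(vm, &vops);
free_ops:
	xe_vma_ops_free(&vops);
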
> @@ -756,9 +795,11 @@ int xe_vm_userptr_check_repin(struct xe_vm *vm)
>   * @tile_mask: tile mask for VMA ops
>   *
>   * Populate dummy VMA ops which can be used to issue a rebind for the VMA
> + *
> + * Return: 0 on success, -ENOMEM on failure
>   */
> -void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
> -				 u8 tile_mask)
> +int xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
> +				u8 tile_mask)
>  {
>  	int i;
>  
> @@ -792,12 +833,15 @@ void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
>  	vm->dummy_ops.op.map.immediate = true;
>  	vm->dummy_ops.op.map.read_only = xe_vma_read_only(vma);
>  	vm->dummy_ops.op.map.is_null = xe_vma_is_null(vma);
> +
> +	return xe_vma_ops_alloc(&vm->dummy_ops.vops);
>  }
>  
>  struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
>  {
>  	struct dma_fence *fence = NULL;
>  	struct xe_vma *vma, *next;
> +	int err;
>  
>  	lockdep_assert_held(&vm->lock);
>  	if (xe_vm_in_lr_mode(vm) && !rebind_worker)
> @@ -815,8 +859,12 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
>  		else
>  			trace_xe_vma_rebind_exec(vma);
>  
> -		xe_vm_populate_dummy_rebind(vm, vma, vma->tile_present);
> +		err = xe_vm_populate_dummy_rebind(vm, vma, vma->tile_present);
> +		if (err)
> +			return ERR_PTR(err);
> +
>  		fence = xe_vm_ops_execute(vm, &vm->dummy_ops.vops);
> +		xe_vma_ops_free(&vm->dummy_ops.vops);
>  		if (IS_ERR(fence))
>  			return fence;
>  	}
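One concern with the early return above: if xe_vma_ops_alloc() fails on
tile N, tiles 0..N-1 keep their freshly allocated arrays, and on a
later trip through the loop the remaining slots still hold the
already-freed pointers from the previous iteration, so blindly calling
xe_vma_ops_free() in the error path could double-free. NULLing the
pointers in the free helper would make both the error path and repeated
calls safe, e.g.:

	void xe_vma_ops_free(struct xe_vma_ops *vops)
	{
		int i;

		for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
			kfree(vops->pt_update_ops[i].ops);
			/* Make repeated / error-path frees safe */
			vops->pt_update_ops[i].ops = NULL;
		}
	}
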
> @@ -1305,48 +1353,6 @@ static void xe_vm_free_scratch(struct xe_vm *vm)
>  	}
>  }
>  
> -static void xe_vma_ops_init(struct xe_vma_ops *vops, struct xe_vm *vm,
> -			    struct xe_exec_queue *q,
> -			    struct xe_sync_entry *syncs, u32 num_syncs)
> -{
> -	memset(vops, 0, sizeof(*vops));
> -	INIT_LIST_HEAD(&vops->list);
> -	vops->vm = vm;
> -	vops->q = q;
> -	vops->syncs = syncs;
> -	vops->num_syncs = num_syncs;
> -}
> -
> -static int xe_vma_ops_alloc(struct xe_vma_ops *vops)
> -{
> -	int i, j;
> -
> -	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
> -		if (!vops->pt_update_ops[i].num_ops)
> -			continue;
> -
> -		vops->pt_update_ops[i].ops =
> -			kmalloc_array(vops->pt_update_ops[i].num_ops,
> -				      sizeof(*vops->pt_update_ops[i].ops),
> -				      GFP_KERNEL);
> -		if (!vops->pt_update_ops[i].ops)
> -			return -ENOMEM;
> -
> -		for (j = 0; j < vops->pt_update_ops[i].num_ops; ++j)
> -			vops->pt_update_ops[i].ops[j].num_entries = 0;
> -	}
> -
> -
> -	return 0;
> -}
> -
> -static void xe_vma_ops_fini(struct xe_vma_ops *vops)
> -{
> -	int i;
> -
> -	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
> -		kfree(vops->pt_update_ops[i].ops);
> -}
> -
>  static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops, u8 tile_mask)
>  {
>  	int i;
> @@ -1381,9 +1387,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  	list_add(&vm->dummy_ops.op.link, &vm->dummy_ops.vops.list);
>  	for (id = 0; id < XE_MAX_TILES_PER_DEVICE; ++id)
>  		vm->dummy_ops.vops.pt_update_ops[id].num_ops = 1;
> -	err = xe_vma_ops_alloc(&vm->dummy_ops.vops);
> -	if (err)
> -		goto err_free;
>  
>  	INIT_LIST_HEAD(&vm->rebind_list);
>  
> @@ -1513,8 +1516,6 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  		xe_device_mem_access_put(xe);
>  	for_each_tile(tile, xe, id)
>  		xe_range_fence_tree_fini(&vm->rftree[id]);
> -err_free:
> -	xe_vma_ops_fini(&vm->dummy_ops.vops);
>  	kfree(vm);
>  	return ERR_PTR(err);
>  }
> @@ -1650,7 +1651,6 @@ static void vm_destroy_work_func(struct work_struct *w)
>  		XE_WARN_ON(vm->pt_root[id]);
>  
>  	trace_xe_vm_free(vm);
> -	xe_vma_ops_fini(&vm->dummy_ops.vops);
>  	kfree(vm);
>  }
>  
> @@ -2948,7 +2948,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  unwind_ops:
>  	if (err && err != -ENODATA)
>  		vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds);
> -	xe_vma_ops_fini(&vops);
> +	xe_vma_ops_free(&vops);
>  	for (i = args->num_binds - 1; i >= 0; --i)
>  		if (ops[i])
>  			drm_gpuva_ops_free(&vm->gpuvm, ops[i]);
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index b8134804fce9..58e7490f7401 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -262,8 +262,9 @@ static inline struct dma_resv *xe_vm_resv(struct xe_vm *vm)
>   */
>  #define xe_vm_assert_held(vm) dma_resv_assert_held(xe_vm_resv(vm))
>  
> -void xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
> -				 u8 tile_mask);
> +int xe_vm_populate_dummy_rebind(struct xe_vm *vm, struct xe_vma *vma,
> +				u8 tile_mask);
> +void xe_vma_ops_free(struct xe_vma_ops *vops);
>  struct dma_fence *xe_vm_ops_execute(struct xe_vm *vm, struct xe_vma_ops *vops);
>  
>  void xe_vm_kill(struct xe_vm *vm, bool unlocked);

